Category: Language Complexity Primer

Language Complexity Primer

David A. Tanzer, January 2021

How complex is the English language? That’s a tough nut to crack. Here we’ll reframe the question in a more formal setting which is shared by computer science and linguistics: the theory of formal languages and their complexity. By the end, we’ll reach a technical understanding of what is meant by the famous conjecture P != NP.

To talk about the unit in general, and … Read more

Language Complexity, Part 1

David A. Tanzer, January 2021. Cross-posted to Azimuth.

A simple view of languages

How complex is the English language? The question has many dimensions and nuances. What does complexity mean, and how could it be measured? This a tough nut to crack, so in this post we’ll make a retrenchment and reconsider the question in a more formal setting — computer science.

In this setting, a language gets identified … Read more

Language Complexity, Part 2

David A. Tanzer, January 2021. Cross-posted to Azimuth.

Decision problems, decision procedures and complexity

In Part 1, we introduced the formal idea of a language, as being just a set of sentences. Now let’s approach the topic of language complexity.

For any given language, there is an associated decision problem:   given a candidate string of characters, determine whether or not it belongs to the language.   A decision Read more

Language Complexity, Part 3

David A. Tanzer, January 2021. Cross-posted to Azimuth.

A detailed measure of computational complexity

In Part 2, we identified language complexity by how “difficult” it is for a program to decide if some given text is a sentence of the language — in other words, by the complexity of the decision problem for the language.  

Note there may be many decision procedures — which are implementations of functions … Read more

Language Complexity, Part 4

David A. Tanzer, January 2021. Cross-posted to Azimuth.

Summarizing computational complexity

In Part 3 we defined, for each program P, a detailed function P'(n) that gives the worst case number of steps that P must perform when given some input of size n. Now we want to summarize P into general classes, such as linear, quadratic, etc.

What’s in a step?

But we should clarify something before proceeding: what … Read more

Language Complexity, Part 5

David A. Tanzer, January 2021. Cross-posted to Azimuth.

Big O analysis

Recall our function $f(n)$ from Part 4, which gives the values 2, 13, 14, 25, 26, 37, …

Using ‘big O’ notation, we write $f(n) = O(n)$ to say that $f$ is linearly bounded.

This means that $f(n)$ will eventually become less than some linear function.

As we said, $f(n)$ has a “rough slope” of 6. So … Read more

Language Complexity, Part 6

David A. Tanzer, January 2021. Cross-posted to Azimuth.

Quadratic complexity

In Part 5 we introduced big O notation for describing linear complexity. Now let’s look at a function with greater than linear complexity.

def square_length(text): 
    # compute the square of the length of text
    # FIXME: not the most elegant or efficient approach
    n = length(text)  
    counter = 0    
    for i = 1 to n:        
        for j = 1 
Read more

Language Complexity, Part 7

David A. Tanzer, February 11, 2021. Cross-posted to Azimuth.

Higher complexity classes

In Part 6, we saw a program with quadratic complexity. The collection of all languages that can be decided in $O(n^k)$ time for some $k$ is called P, for polynomial time complexity.

Now let’s consider languages that appear to require time that is exponential in the size of the input, and hence lie outside of P.

Here … Read more