Language Complexity, Part 2

David A. Tanzer, January 30, 2021, in unit Language Complexity. Cross-posted to the Azimuth Blog.

Decision problems, decision procedures and complexity

In Part 1, we introduced the formal idea of a language, as being just a set of sentences. Now let’s approach the topic of language complexity.

For any given language, there is an associated decision problem:   given a candidate string of characters, determine whether or not it belongs to the language.   A decision procedure, or decider, is a program that solves the decision problem: it returns True or False to indicate whether a given character string belongs to the language.

decider_for_english("This is a tomato") = True
decider_for_english("Tomato a Eso 3e") = False

Here is the key idea for measuring language complexity. A simple language will have a decision procedure that is straightforward and runs quickly. In contrast, a complex language calls for a lot of “thinking” on the part of the decision procedure, in order for it to assess membership in the language. So the complexity of a language will be defined by the computational complexity of its decision procedure.

Now let’s look at the complexity of a tiny language:

L = {“Bob likes Alice”, “Alice likes Bob”}.

def decider_for_L(text):
if text == "Bob likes Alice": return True
if text == "Alice likes Bob": return True
return False # none of the cases matched 

Because this code runs quickly, the theory classifies L as a simple language.  (Since we already knew that L was simple, this is a good check on the theory.) More generally, every language which is finite will get classified as simple, language is simple, since a decision procedure can be written using this pattern.

With infinite languages, things become more interesting.  Take the language ALL_A, with strings consisting only of the character ‘A’:

ALL_A = {“A”, “AA”, “AAA”, “AAAA”, …}

We can’t write a decision procedure for ALL_A using the above code pattern, as it would create a program with an infinite number of lines. That’s not feasible in reality, so we’ll rule it out.

However, we can still write a decision procedure for ALL_A.  This is easy: just loop through every character in the string and check whether it equals ‘A’.

Again, the theory matches our expectations. We see that ALL_A is a simple language, and the theory classifies it as so.

Next consider the language PRIME_A, with strings just containing ‘A’, where the length is a prime number:

PRIME_A = {“AA”, “AAA”, “AAAAA”, “AAAAAAA”, “AAAAAAAAAAA”, …}

We can write a decision procedure for PRIME_A, but now the code has more work to do.  First it has to loop through the input, to count the number of characters N. Then it has to analyze whether N is prime. This is related to the work of factorizing N, which is not a trivial matter. And the work increases as N gets larger.

So the theory tells us that PRIME_A has a greater complexity than ALL_A. And indeed it does. A human decider would have to do a lot more mental processing in order to determine membership in PRIME_A as compared to membership in ALL_A.

With these ideas, you should now have a qualitative understanding for what language complexity means. In the next posts, we will refine it into a quantitative concept.