David A. Tanzer, January 30, 2021, in unit Language Complexity. Cross-posted to the Azimuth Blog.
Decision problems, decision procedures and complexity
In Part 1, we introduced the formal idea of a language, as being just a set of sentences. Now let’s approach the topic of language complexity.
For any given language, there is an associated decision problem: given a candidate string of characters, determine whether or not it belongs to the language. A decision procedure, or decider, is a program that solves the decision problem: it returns True or False to indicate whether a given character string belongs to the language.
decider_for_english("This is a tomato") = True
decider_for_english("Tomato a Eso 3e") = False
Here is the key idea for measuring language complexity. A simple language will have a decision procedure that is straightforward and runs quickly. In contrast, a complex language calls for a lot of “thinking” on the part of the decision procedure, in order for it to assess membership in the language. So the complexity of a language will be defined by the computational complexity of its decision procedure.
Now let’s look at the complexity of a tiny language:
L = {“Bob likes Alice”, “Alice likes Bob”}.
def decider_for_L(text):
if text == "Bob likes Alice": return True
if text == "Alice likes Bob": return True
return False # none of the cases matched
Because this code runs quickly, the theory classifies L as a simple language. (Since we already knew that L was simple, this is a good check on the theory.) More generally, every language which is finite will get classified as simple, language is simple, since a decision procedure can be written using this pattern.
With infinite languages, things become more interesting. Take the language ALL_A, with strings consisting only of the character ‘A’:
ALL_A = {“A”, “AA”, “AAA”, “AAAA”, …}
We can’t write a decision procedure for ALL_A using the above code pattern, as it would create a program with an infinite number of lines. That’s not feasible in reality, so we’ll rule it out.
However, we can still write a decision procedure for ALL_A. This is easy: just loop through every character in the string and check whether it equals ‘A’.
Again, the theory matches our expectations. We see that ALL_A is a simple language, and the theory classifies it as so.
Next consider the language PRIME_A, with strings just containing ‘A’, where the length is a prime number:
PRIME_A = {“AA”, “AAA”, “AAAAA”, “AAAAAAA”, “AAAAAAAAAAA”, …}
We can write a decision procedure for PRIME_A, but now the code has more work to do. First it has to loop through the input, to count the number of characters N. Then it has to analyze whether N is prime. This is related to the work of factorizing N, which is not a trivial matter. And the work increases as N gets larger.
So the theory tells us that PRIME_A has a greater complexity than ALL_A. And indeed it does. A human decider would have to do a lot more mental processing in order to determine membership in PRIME_A as compared to membership in ALL_A.
With these ideas, you should now have a qualitative understanding for what language complexity means. In the next posts, we will refine it into a quantitative concept.
Copyright © 2021, David A. Tanzer. All Rights Reserved.