Language Complexity, Part 1

      Comments Off on Language Complexity, Part 1

David A. Tanzer, December 31, 2020

A simple view of languages

How complex is the English language? The question has many dimensions and nuances. What does complexity mean, and how could it be measured? This a tough nut to crack, so in this post we’ll make a retrenchment and reconsider the question in a more formal setting — computer science.

Here a language is identified with its extension — meaning the collection of all its grammatically correct sentences. So here English is modeled as just a large repository of sentences. Although this is an ‘impoverished’ view of language, it is not without value, as simpler models can still give us ideas to work with.

It is easy to see that the extension of the English language is an infinite set of sentences, as it includes for example the following sentences:

  • The father of Adam did not exist.
  • The father of the father of Adam did not exist.
  • The father of the father of the father of Adam did not exist.
  • Etc., ad infinitum

One can also consider small languages, consisting of some fixed set of sentences. For example, we could take the sublanguage of English consisting of all simple sentences where the subject and object are either Alice or Bob, and the only verb is ‘likes’:

  • Alice likes Alice.
  • Alice likes Bob.
  • Bob likes Alice.
  • Bob likes Bob.

If we called this language AB, we could write an equation using set notation:

AB = {“Alice likes Alice”, “Alice likes Bob”, “Bob likes Alice”, “Bob likes Bob”}.

In this setting, a ‘sentence’ means something pretty simple – a string of characters that belongs to the language. So the sentence “Bob likes Bob” just means [B, o, b, SPACE, l, i, k, e, s, SPACE, B, o, b]. semantics here. And the word “Alice” is just a shorthand for the string [A, l, i, c, e]. There is no semantics here.

Here are a couple of other examples of formal languages.

  • The set of all strings involving just the characters 0 and 1, which start with 0 and alternate. That is: {“0”, “01”, “010”, “0101”, …}. This is a simple example of an infinite language.
  • The set of prime numbers, written out as strings: [“2”, “3”, “5”, “7”, “11”, “13”, “17”, …]. Again, infinite.

Now that we know what these “languages” are, we are ready for the next post, where we will see how their complexity gets measured.

Copyright © 2020, David A. Tanzer. All Rights Reserved.