Language Complexity, Part 4

      No Comments on Language Complexity, Part 4

David A. Tanzer, February 8, 2021, in unit Language Complexity. Cross-posted to the Azimuth Blog.

Summarizing computational complexity

In Part 3 we defined, for each program P, a detailed function P'(n) that gives the worst case number of steps that P must perform when given some input of size n. Now we want to summarize P into general classes, such as linear, quadratic, etc.

What’s in a step?

But we should clarify something before proceeding: what is meant by a ‘step’ of a program?  Do we count it in units of machine language, or in terms of higher level statements?   Before, we said that each iteration of the loop for the ALL_A decision procedure counted as a step. But in a more detailed view, each iteration of the loop includes multiple steps: comparing the input character to ‘A’, incrementing a counter, performing a test.

All these interpretations are plausible. Fortunately, provided that the definition of a program step is ‘reasonable’, all of them will lead to the same general classification of the program’s time complexity. Here, by reasonable I mean that the definition of a step should be such that, on a given computer, there is an absolute bound on the amount of clock time needed for the processor to complete one step.

Approximately linear functions

The classification of programs into complexity classes such as linear, quadratic, etc., is a generalization which doesn’t require that the time complexity be exactly linear, quadratic, etc. For example, consider the following code:

def approximately_linear(text):
    # perform a silly computation
    counter = 0
    for i = 1 to length(text):
        if i is odd:
            for j = 1 to 10:
                counter = counter + 1
    return counter
    

Here are the number of steps it performs, as a function of the input length:

f(0) = 2
f(1) = 13
f(2) = 14
f(3) = 25
f(4) = 26
f(5) = 37
...

The value alternatively increases by 1 and then by 11. As it increases by 12 for every two steps, we could say that it is “approximately linear,” with slope “roughly” equal to 6. But in fine detail, the graph looks like a sawtooth.

Soon, we will explain how this function gets definitively classified as having linear complexity.

Appendix: Python machines versus Turing machines

Here we are programming and measuring complexity on a Python-like machine, rather than a pure Turning machine. This is surfaced, for example, in the fact that without further ado we called a function length(text) to count the number of characters, and will regard this as a single step of the computation. On a true Turing machine, however, counting the length of the string takes N steps, as this operation requires that the tape be advanced one character at a time until the end of the string is detected.

This is a point which turns out not to substantially affect the complexity classification of a language. Assuming that steps are counted reasonably, any optimal decision procedure for a language of strings, whether written in Turing machine language, Python, C# or what have you, will end up with the same complexity classification.

The length function in Python really does take a bounded amount of time, so it is fair to count it as a single step. The crux of this matter is that, in a higher level language, a string is more than a sequence of characters, as it is a data structure containing length information as well. So there is order N work that is implied just by the existence of a string. But this can be folded into the up-front cost of merely reading the input, which is a general precondition for a language decider.

But, you may ask, what about languages which can be decided without even reading all of the input? For example, the language of strings that begin with the prefix “abc”. Ok, so you got me.

Still, as a practical matter, anything with linear or sub-linear complexity can be considered excellent and simple. The real challenges have do with complexity which is greater than linear, and which represents a real practical issue: software performance. So, for intents and purposes, we may treat any implied order N costs as being essentially zeros – as long as they can be on a one-time, up-front basis, e.g., the order N work involved in constructing a string object.

Copyright © 2021, David A. Tanzer. All Rights Reserved.

Leave a Reply