Cookies on this website
We use cookies to ensure that we give you the best experience on our website. If you click 'Continue' we'll assume that you are happy to receive all cookies and you won't see this message again. Click 'Find out more' for information on how to change your cookie settings.

Professor Ard Louis from the Department of Physics explains the maths behind a phenomenon observed in science and engineering termed ‘simplicity bias’, which makes simple outputs far more likely than complex outputs.

2018simplicity bias2019 in science

Despite the apparent complexity of the world around us, there is an inherent bias towards simplicity. This holds true as much for biological processes as for engineering and mathematics in everyday life.

Consider the familiar analogy of monkeys hitting keys at random on a typewriter. As the story goes, if the monkeys could type long enough, then they would eventually produce the full works of Shakespeare (although it would take much longer than the age of the universe to do so ).

To put this into a quantifiable form, for a typewriter with N keys the probability of the monkeys producing a specific sequence of characters of length K is just 1/NK, since at each stroke, the probability of getting the right character is 1/N, and they have to do this correctly K times in a row. This argument also implies that all sequences of length K are equally likely or unlikely (assuming of course that the monkeys type in a truly random way). In other words, the monkeys are equally likely to produce Hamlet, which has about 100,000 characters, as they are any other sequence of about 100,000 characters.

Now consider a different scenario, where the monkeys are typing not into a typewriter but instead into a computer programming language. Then some sequences of characters can be generated quite simply. For example, a short 22 character program ‘Print “0011” 250 times’ will produce a 1000 character sequence of the repeating form ‘00110011001100110011…..0011’.

The probability of accidentally generating this program by monkeys typing on a computer is 1/N22, which is much less than the 1/N1000 probability that they would produce the full sequence on a typewriter (or a word processor). Interestingly, most sequences don’t have programs to generate them that are much shorter than simply ‘Print “sequence” ‘, which means the probability of obtaining them by monkeys either typing into a computer program or on a typewriter is more or less the same. But for some sequences, like the one above, the computer program route is exponentially shorter, and so it is much more likely to occur upon random key strokes.

The basic intuition behind monkeys typing into computer programs was formalised over 50 years ago in the field of algorithmic information theory (AIT). In brief, the simplicity or complexity of a sequence is defined in AIT by the length of the shortest program that can generate the sequence on a universal Turing machine, a basic computer device, first hypothesised by Alan Turing, which can perform any possible computation.

Unfortunately, while the results from AIT are mathematically profound and elegant, they are difficult to apply in practice because universal Turing machines are rather special objects, and moreover, for deep reasons linked to the foundations of mathematics, the AIT definition of complexity is formally incomputable.

To make progress, our team at Oxford derived a form of the AIT coding theorem of Solomonoff and Levin, which calculates the probability that a random program will generate a particular output.

While this new coding theorem result only provides an upper bound on that probability, and so is less powerful than the full AIT coding theorem which provides both an upper and a lower bound, it is much easier to use in practice.

We applied our theorem to input-output maps, which are a general class of systems that take some input, and after some calculation, produce an output. The really interesting thing we found was that for many maps we need to know very little about the map. We can simply convert the outputs to binary strings, and then compress them with a simple compression algorithm (not unlike those you have on your computer to zip files).

The new bound predicts that if an output is highly compressible, then it is much more likely to occur upon random inputs to the map. Not needing to know much about the map is analogous to not needing to know what programming language the monkeys are typing into in order to still predict that they are much more likely to produce something like a highly compressible 1000-character sequence that repeats the pattern ‘0011’, than they are to produce some truly random and incompressible sequence of 1000 characters.

As in AIT, if a sequence is highly compressible, this implies that there exists a short code to generate it, so we called these outputs simple. Since simple outputs are much more likely to occur, we say that this broad class of maps exhibit ‘simplicity bias’.

Many systems in science and engineering can be analysed as input-output maps. Examples range from the biological mapping from RNA sequence to RNA secondary structures, to systems of coupled differential equations, to simple models from financial mathematics; these all show an exponential bias towards simple outputs.

In other words, we can expect that many different situations both in science and engineering will manifest simplicity bias, and it is very likely that simplicity is all around us.

The full paper, 'Input–output maps are strongly biased towards simple outputs,' can be read in Nature Communications.

Story courtesy of the University of Oxford Science Blog