Historians rely on different sources to reconstruct the thought, society and history of past civilisations. Many of these sources are text-based – whether written on scrolls or carved into stone, the preserved records of the past help shed light on ancient societies. Such sources include inscriptions, texts inscribed on a durable surface (such as stone, pottery or metal). Inscriptions are one of the main direct sources of new evidence from the ancient world, but the majority have suffered damage over the centuries, and parts of the text are illegible or lost (Figure 1). Restoring the missing or damaged text is one of the main undertakings of the discipline of Epigraphy. It is a complex and time consuming task, but ancient historians can estimate the likelihood of different possible solutions based on context clues in the inscription – such as grammatical and linguistic considerations, layout and shape, textual parallels, and historical context. Although complex, the restoration of these documents is necessary for a deeper understanding of civilisations past.
We have been using machine learning trained on these ancient inscribed texts to build a system that can furnish a more complete and systematically ranked list of possible restoration solutions, which we hope will augment historians’ understanding of a text.
Pythia, which takes its name from the woman who delivered the god Apollo's oracular responses at the Greek sanctuary of Delphi - is the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Bringing together the disciplines of ancient history and deep learning, this work offers a fully automated aid to the text restoration task, providing ancient historians with multiple textual restorations, as well as the confidence level for each hypothesis.
Pythia takes a sequence of damaged text as input, and is trained to predict character sequences comprising hypothesised restorations of ancient Greek inscriptions (texts written in the Greek alphabet dating between the seventh century BCE and the fifth century CE). The architecture works at both the character- and word-level, thereby effectively handling long-term context information, and dealing efficiently with incomplete word representations (Figure 2). This makes it applicable to all disciplines dealing with ancient texts (philology, papyrology, codicology) and applies to any language (ancient or modern). To train Pythia, the largest digital corpus of ancient Greek inscriptions (PHI Greek Inscriptions) was converted to machine actionable text (called PHI-ML). On PHI-ML, PYTHIA’s predictions achieve a 30.1% character error rate, compared to the 57.3% of evaluated human epigraphists. Moreover, in 73.5% of cases the ground-truth sequence was among the Top-20 restoration hypotheses of Pythia, which effectively demonstrates the impact of this assistive method on the field of digital epigraphy, and sets the state-of-the-art in ancient text restoration.
Figure 2: Pythia processing the phrase μηδέν ἄγαν (Mēdèn ágan) "nothing in excess," a fabled maxim inscribed on Apollo’s temple in Delphi. The letters "γα" are the characters to be predicted, and are annotated with ‘?’. Since ἄ??ν is not a complete word, its embedding is treated as unknown (‘unk’). The decoder outputs correctly "γα".
The combination of machine learning and epigraphy has the potential to transform the study of ancient texts, and widen the scope of the historian’s work. For this reason, the Oxford and DeepMind teams collaborated to create an open-sourced online Python notebook, Pythia, and PHI-ML’s processing pipeline on GitHub. By so doing, we hope to aid future research and inspire further interdisciplinary work.
Read more about this work on the original DeepMind blog post, or the preprint article 'Restoring ancient text using deep learning: a case study on Greek epigraphy' on arXiv.
Article courtesy of the University of Oxford Arts Blog