BreMM19 | Hiippala

Tuomo Hiippala
University of Helsinki | Helsinki, Finland

AI2D-RST: A multimodal corpus of diagrams

The Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset contains nearly 5000 diagrams from school textbooks, which have been annotated for content elements and their interrelations by crowd-sourced, non-expert workers (Kembhavi et al. 2016). This presentation describes an alternative, expert-annotated replacement for the original AI2D annotation, called AI2D-RST, which is intended to support the use of this dataset for studying the multimodality of diagrams (Hiippala & Orekhova 2018) and their interpretation by computers and humans (Alikhani & Stone 2018).

For this purpose, the AI2D-RST dataset presents a new schema for describing the multimodal structure of diagrams, which accounts for (1) the hierarchical organisation of content elements, (2) the discourse relations that hold between them using Rhetorical Structure Theory (Taboada & Mann 2006), and (3) the connections between content elements that are expressed using diagrammatic elements such as lines and arrows. The annotation for each layer of description is represented using graphs, which use common identifiers for content elements to enable relating different layers of description. In addition to presenting the annotation schema and its application to the original AI2D dataset, the presentation discusses the development of an in-house tool to support the creation of the AI2D-RST dataset. Furthermore, the presentation explores what kinds of new methods are needed for empirical study of multimodal corpora that use graph-based representations.
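
The idea of several graphs sharing one set of element identifiers can be illustrated with a minimal Python sketch. It is not the AI2D-RST file format or tooling: the class `Layer`, the function `describe`, the identifiers (`B0`, `T0`, `G0`, `R0`, `ROOT`) and the edge labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of description, stored as a directed graph.

    edges maps a node identifier to a list of (target, label) pairs.
    This structure is an assumption made for illustration only.
    """
    name: str
    edges: dict = field(default_factory=dict)

    def add_edge(self, source, target, label=None):
        # Register both endpoints so that every node appears as a key.
        self.edges.setdefault(source, []).append((target, label))
        self.edges.setdefault(target, [])


# Hypothetical identifiers: B = illustration, T = text, G = group, R = relation.
grouping = Layer("grouping")          # (1) hierarchical organisation
grouping.add_edge("G0", "B0", "contains")
grouping.add_edge("G0", "T0", "contains")
grouping.add_edge("ROOT", "G0", "contains")
grouping.add_edge("ROOT", "B1", "contains")

discourse = Layer("discourse")        # (2) discourse relations
discourse.add_edge("R0", "B0", "nucleus")
discourse.add_edge("R0", "T0", "satellite")

connectivity = Layer("connectivity")  # (3) connections drawn with arrows or lines
connectivity.add_edge("B0", "B1", "arrow")


def describe(element_id, layers):
    """Collect, per layer, every edge that touches the given element."""
    result = {}
    for layer in layers:
        touching = []
        for source, targets in layer.edges.items():
            for target, label in targets:
                if element_id in (source, target):
                    touching.append((source, target, label))
        result[layer.name] = touching
    return result


# Because all layers use the same identifier, one lookup relates them.
info = describe("B0", [grouping, discourse, connectivity])
```

In this sketch, `info` gathers what each layer says about the element `B0`: the group it belongs to, its role in a discourse relation, and the arrow that connects it to `B1`.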


Alikhani, M. & Stone, M. (2018), Arrows are the verbs of diagrams, in Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3552-3563.

Hiippala, T. & Orekhova, S. (2018), Enhancing the AI2 Diagrams dataset using Rhetorical Structure Theory, in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), Paris, pp. 1925-1931.

Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H. & Farhadi, A. (2016), A diagram is worth a dozen images, in Proceedings of the 14th European Conference on Computer Vision (ECCV), Springer, Cham, pp. 235-251.

Taboada, M. & Mann, W. C. (2006), Rhetorical Structure Theory: looking back and moving ahead, Discourse Studies 8(3), 423-459.


Tuomo Hiippala is Assistant Professor of English Language and Digital Humanities at the University of Helsinki, Finland. His current research interests include the application of artificial intelligence in multimodal research and the multimodality of diagrams.
