Text mining in archaeology: Extracting information from archaeological reports

Richards, Julian D; Tudhope, Douglas; Vlachidis, Andreas


Editors: Juan Barcelo; Igor Bogdanovic


Archaeologists generate large quantities of text, ranging from unpublished technical fieldwork reports (the ‘grey literature’) to synthetic journal articles. However, indexing and analysing these documents by hand can be time-consuming and inconsistent. They are also rarely integrated with the wider archaeological information domain; bibliographic searches, for example, have to be undertaken independently of database queries. Text mining offers a means of extracting information from large volumes of text, giving researchers an easy way to locate relevant texts and to identify patterns in the literature. In recent years, techniques from Natural Language Processing (NLP) and its subfield, Information Extraction (IE), have been adopted to allow researchers to find, compare and analyse relevant documents, and to link them to other types of data. This chapter introduces the underpinning mathematics and briefly presents the algorithms and distance measures used, from the point of view of artificial intelligence and computational logic. It describes the different NLP schools of thought and compares the pros and cons of rule-based and machine learning approaches to information extraction. It discusses the role of ontologies and named entity recognition, and demonstrates how IE can provide the basis for semantic annotation and contribute to the construction of a semantic web for archaeology. The authors have worked on a number of projects that have employed NLP and IE techniques in archaeology, including Archaeotools, STAR and STELLAR. The chapter describes archaeological user requirements, drawing examples from several countries, and presents examples from the authors' own projects, and from previous work by others, of how NLP and IE can help address these needs.
The problems and challenges of employing text mining in the archaeological domain are discussed, as well as the potential benefits.


Richards, J. D., Tudhope, D., & Vlachidis, A. (2015). Text mining in archaeology: Extracting information from archaeological reports. In J. Barcelo, & I. Bogdanovic (Eds.), Mathematics and Archaeology (240). CRC Press

Publication Date Jan 1, 2015
Deposit Date Feb 5, 2018
Publicly Available Date Mar 27, 2018
Journal Mathematics and archaeology
Peer Reviewed Peer Reviewed
Pages 240
Book Title Mathematics and Archaeology
ISBN 9781482226812
Additional Information: Copyright ©2015 From Mathematics and Archaeology by Juan Barcelo and Igor Bogdanovic. Reproduced by permission of Taylor and Francis Group, LLC, a division of Informa plc. This material is strictly for personal use only. For any other use, the user must contact Taylor & Francis directly at this address: Printing, photocopying, sharing via any means is a violation of copyright.