Julian D Richards
Text mining in archaeology: Extracting information from archaeological reports
Authors: Richards, Julian D; Tudhope, Douglas; Vlachidis, Andreas
Contributors: Juan Barcelo (Editor); Igor Bogdanovic (Editor)
Abstract
Archaeologists generate large quantities of text, ranging from unpublished technical fieldwork reports (the ‘grey literature’) to synthetic journal articles. However, indexing and analysing these documents by hand is time-consuming and inconsistent. The results are also rarely integrated with the wider archaeological information domain; bibliographic searches, for example, have to be undertaken independently of database queries. Text mining offers a means of extracting information from large volumes of text, providing researchers with an easy way of locating relevant texts and of identifying patterns in the literature. In recent years, techniques of Natural Language Processing (NLP) and its subfield, Information Extraction (IE), have been adopted to allow researchers to find, compare and analyse relevant documents, and to link them to other types of data. This chapter introduces the underpinning mathematics and provides a short presentation of the algorithms and distance measures used, from the point of view of artificial intelligence and computational logic. It describes the different NLP schools of thought and compares the pros and cons of rule-based versus machine-learning approaches to information extraction. The role of ontologies and named entity recognition is discussed, and the chapter demonstrates how IE can provide the basis for semantic annotation and how it contributes to the construction of a semantic web for archaeology. The authors have worked on a number of projects that have employed techniques from NLP and IE in archaeology, including Archaeotools, STAR and STELLAR. The chapter describes archaeological user needs, drawing examples from several countries, and the authors present examples from their own projects, and from previous work by others, of how NLP and IE can contribute to addressing these needs. The problems and challenges of employing text mining in the archaeological domain are discussed, as well as the potential benefits.
Field | Value |
---|---|
Publication Date | Jan 1, 2015 |
Deposit Date | Feb 5, 2018 |
Publicly Available Date | Mar 27, 2018 |
Journal | Mathematics and archaeology |
Peer Reviewed | Peer Reviewed |
Pages | 240 |
Book Title | Mathematics and Archaeology |
ISBN | 9781482226812 |
Public URL | https://uwe-repository.worktribe.com/output/844180 |
Publisher URL | https://www.crcpress.com/Mathematics-and-Archaeology/Barcelo-Bogdanovic/p/book/9781482226812 |
Additional Information | Copyright ©2015 From Mathematics and Archaeology by Juan Barcelo and Igor Bogdanovic. Reproduced by permission of Taylor and Francis Group, LLC, a division of Informa plc. This material is strictly for personal use only. For any other use, the user must contact Taylor & Francis directly at this address: permissions.mailbox@taylorandfrancis.com. Printing, photocopying, sharing via any means is a violation of copyright. |
Contract Date | Feb 5, 2018 |
Files
TextMininginArchaeology-authorVersion.pdf (PDF, 333 Kb)