Dr Nathan Duran Nathan.Duran@uwe.ac.uk
Lecturer in Artificial Intelligence
Sentence encoding for dialogue act classification
Duran, Nathan; Battle, Steve; Smith, Jim
Authors
Steve Battle Steve.Battle@uwe.ac.uk
Senior Lecturer
Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence
Abstract
In this study, we investigate the process of generating single-sentence representations for Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation that are often overlooked or underreported in the literature, for example, the number of words to keep in the vocabulary or in input sequences. We assess each of these on two DA-labelled corpora, using a range of supervised models that represent those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNet, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than those typically used in DA classification experiments, with no significant improvements beyond certain thresholds. We also show that, in some cases, the contextual sentence representations generated by language models do not reliably outperform supervised methods, although BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.
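To make the two pre-processing choices named in the abstract concrete (how many words to keep in the vocabulary, and how many tokens to keep per input sequence), the following is a minimal Python sketch. It is an illustration only, not the authors' pipeline: the function names `build_vocab` and `encode`, the reserved ids `PAD` and `UNK`, the whitespace tokenisation, and the toy utterances are all assumptions made for this example.

```python
from collections import Counter

# Reserved ids (an illustrative convention, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(utterances, vocab_size):
    """Keep only the `vocab_size` most frequent words; all others map to UNK."""
    counts = Counter(word for utt in utterances for word in utt.lower().split())
    # Ids 0 and 1 are reserved for padding and unknown words, so start at 2.
    most_common = counts.most_common(vocab_size)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(utterance, vocab, max_len):
    """Map words to ids, truncate to `max_len` tokens, then right-pad with PAD."""
    ids = [vocab.get(word, UNK) for word in utterance.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy utterances (hypothetical, not drawn from either corpus in the study).
utterances = ["yeah i think so", "do you think so", "yeah"]
vocab = build_vocab(utterances, vocab_size=3)
encoded = [encode(u, vocab, max_len=3) for u in utterances]
```

Here `vocab_size` and `max_len` are the two thresholds being varied: words outside the retained vocabulary collapse to a single unknown id, and every utterance becomes a sequence of exactly `max_len` ids regardless of its original length.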
Journal Article Type | Article |
---|---|
Acceptance Date | Sep 20, 2021 |
Online Publication Date | Nov 2, 2021 |
Publication Date | 2023-05 |
Deposit Date | Nov 19, 2021 |
Publicly Available Date | Jul 7, 2023 |
Journal | Natural Language Engineering |
Print ISSN | 1351-3249 |
Electronic ISSN | 1469-8110 |
Publisher | Cambridge University Press (CUP) |
Peer Reviewed | Peer Reviewed |
Volume | 29 |
Issue | 3 |
Pages | 794-823 |
DOI | https://doi.org/10.1017/S1351324921000310 |
Keywords | Artificial Intelligence; Linguistics and Language; Language and Linguistics; Software |
Public URL | https://uwe-repository.worktribe.com/output/8167980 |
Publisher URL | https://www.cambridge.org/core/journals/natural-language-engineering/article/sentence-encoding-for-dialogue-act-classification/2EF3DC8E57D1019960D18FDE685B1EBA# |
Additional Information | Copyright: © The Author(s), 2021. Published by Cambridge University Press |
Files
Sentence encoding for dialogue act classification
(870 Kb)
PDF
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/