Sentence encoding for dialogue act classification

Duran, Nathan; Battle, Steve; Smith, Jim

doi:10.1017/S1351324921000310

Authors

Dr Nathan Duran Nathan.Duran@uwe.ac.uk
Senior Lecturer in Computer Science

Steve Battle Steve.Battle@uwe.ac.uk
Senior Lecturer

Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence

Abstract

In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation which are often overlooked or underreported within the literature, for example, the number of words to keep in the vocabulary or input sequences. We assess each of these with respect to two DA-labelled corpora, using a range of supervised models which represent those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNet, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than those typically used in DA classification experiments, with no significant improvements beyond certain thresholds. We also show that in some cases the contextual sentence representations generated by language models do not reliably outperform supervised methods, although BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.
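As an illustration of the two pre-processing parameters the abstract singles out (the number of words kept in the vocabulary and the input sequence length), the following is a minimal sketch only. It is not the authors' pipeline: the whitespace tokenisation, the function names `build_vocab` and `encode`, the `<pad>` and `<unk>` tokens, and the toy utterances are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical special tokens, chosen for this sketch only.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(utterances, vocab_size):
    """Keep only the `vocab_size` most frequent words (assumed whitespace tokenisation)."""
    counts = Counter(tok for utt in utterances for tok in utt.lower().split())
    words = [word for word, _ in counts.most_common(vocab_size)]
    # Indices 0 and 1 are reserved for padding and out-of-vocabulary words.
    return {word: idx for idx, word in enumerate([PAD, UNK] + words)}


def encode(utterance, vocab, max_len):
    """Map an utterance to word indices, truncated or padded to exactly `max_len`."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in utterance.lower().split()]
    ids = ids[:max_len]  # truncate long utterances
    return ids + [vocab[PAD]] * (max_len - len(ids))  # pad short ones


# Toy utterances, invented for illustration.
utterances = ["yeah i think so", "i think that is right", "uh-huh"]
vocab = build_vocab(utterances, vocab_size=2)

print(encode("i think so", vocab, max_len=5))             # [2, 3, 1, 0, 0]
print(encode("i think that is right", vocab, max_len=3))  # [2, 3, 1]
```

In a sketch like this, shrinking `vocab_size` maps more words to the unknown token and shrinking `max_len` discards the ends of longer utterances; the study examines how far such values can be reduced before classification accuracy is affected.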

Journal Article Type	Article
Acceptance Date	Sep 20, 2021
Online Publication Date	Nov 2, 2021
Publication Date	2023-05
Deposit Date	Nov 19, 2021
Publicly Available Date	Jul 7, 2023
Journal	Natural Language Engineering
Print ISSN	1351-3249
Electronic ISSN	1469-8110
Publisher	Cambridge University Press (CUP)
Peer Reviewed	Peer Reviewed
Volume	29
Issue	3
Pages	794-823
DOI	https://doi.org/10.1017/S1351324921000310
Keywords	Artificial Intelligence; Linguistics and Language; Language and Linguistics; Software
Public URL	https://uwe-repository.worktribe.com/output/8167980
Publisher URL	https://www.cambridge.org/core/journals/natural-language-engineering/article/sentence-encoding-for-dialogue-act-classification/2EF3DC8E57D1019960D18FDE685B1EBA#
Additional Information	Copyright: © The Author(s), 2021. Published by Cambridge University Press