Sentence encoding for dialogue act classification

Duran, Nathan; Battle, Steve; Smith, Jim


Jim Smith
Professor in Interactive Artificial Intelligence


In this study, we investigate the process of generating single-sentence representations for Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation that are often overlooked or underreported in the literature, such as the number of words to keep in the vocabulary or in input sequences. We assess each of these on two DA-labelled corpora, using a range of supervised models representative of those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNET, which have thus far not been widely explored for DA classification. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than those typically used in DA classification experiments, with no significant improvements beyond certain thresholds. We also show that, in some cases, the contextual sentence representations generated by language models do not reliably outperform supervised methods. However, BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.
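The two pre-processing parameters highlighted in the abstract, vocabulary size and input sequence length, can be illustrated with a minimal sketch. This is not the authors' pipeline: the whitespace tokeniser, the `<pad>` and `<unk>` tokens, and the names `build_vocab` and `encode` are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the paper's actual choices are not stated here.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(utterances, vocab_size):
    """Keep only the `vocab_size` most frequent tokens (assumed whitespace tokenisation)."""
    counts = Counter(tok for utt in utterances for tok in utt.lower().split())
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(vocab_size):
        vocab[tok] = len(vocab)
    return vocab


def encode(utterance, vocab, max_len):
    """Map tokens to ids, truncate to `max_len`, and right-pad shorter inputs."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in utterance.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


utterances = ["yeah i think so", "i think that is right", "yeah"]
vocab = build_vocab(utterances, vocab_size=3)
encoded = [encode(u, vocab, max_len=4) for u in utterances]
```

Varying `vocab_size` and `max_len` in a setup like this is one way to probe the kind of thresholds the abstract describes, beyond which larger values bring no significant improvement in accuracy.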


Duran, N., Battle, S., & Smith, J. (2023). Sentence encoding for dialogue act classification. Natural Language Engineering, 29(3), 794-823.

Journal Article Type: Article
Acceptance Date: Sep 20, 2021
Online Publication Date: Nov 2, 2021
Publication Date: 2023-05
Deposit Date: Nov 19, 2021
Publicly Available Date: Jul 7, 2023
Journal: Natural Language Engineering
Print ISSN: 1351-3249
Electronic ISSN: 1469-8110
Publisher: Cambridge University Press (CUP)
Peer Reviewed: Peer Reviewed
Volume: 29
Issue: 3
Pages: 794-823
Keywords: Artificial Intelligence; Linguistics and Language; Language and Linguistics; Software
Additional Information: Copyright: © The Author(s), 2021. Published by Cambridge University Press
