A novel approach to document similarity retrieval using sentence transformers and vector databases
Authors
Vkg Kalaiselvi
Priya Gopal
Abstract
This study introduces a novel method for document similarity retrieval, leveraging Sentence Transformers for efficient processing and Milvus for vector storage. The workflow starts by extracting text from crowd-sourced vector databases and segmenting it into individual sentences. These sentences are transformed into embeddings using a Sentence Transformer model, creating a robust text representation. The embeddings are stored in Milvus, facilitating high-performance similarity searches. To improve query relevance, we enhance user queries with synonyms from WordNet, addressing different spellings and related terms. Our approach effectively tackles duplicate detection and spelling variations through vector similarity measures and customized indexing, ensuring accurate retrieval and ranking of relevant documents. Unlike traditional methods, our system integrates advanced natural language processing techniques with the capabilities of vector databases, resulting in a precise and efficient information retrieval system. Our method significantly enhances precision, recall, and F-score, making it a valuable contribution to the field of information retrieval.
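The workflow described in the abstract (sentence segmentation, embedding, storage, synonym-expanded querying, similarity ranking) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a Sentence Transformer model, the in-memory list returned by `build_index` stands in for a Milvus collection, and the hand-written `SYNONYMS` table stands in for WordNet. All function names and sample data here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for WordNet lookups (assumption, not from the paper).
SYNONYMS = {
    "colour": ["color"],
    "car": ["automobile"],
}

def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def embed(sentence):
    """Toy bag-of-words vector; a Sentence Transformer model would go here."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_query(query):
    """Append synonyms of each query term to the query text."""
    terms = re.findall(r"[a-z]+", query.lower())
    extra = [syn for t in terms for syn in SYNONYMS.get(t, [])]
    return " ".join(terms + extra)

def build_index(documents):
    """In-memory list of (sentence, vector); stands in for a vector database."""
    return [(s, embed(s)) for doc in documents for s in split_sentences(doc)]

def search(query, index, top_k=3):
    """Rank indexed sentences by similarity to the synonym-expanded query."""
    q = embed(expand_query(query))
    scored = [(cosine(q, vec), sent) for sent, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, sent) for score, sent in scored[:top_k] if score > 0]

documents = [
    "The color of the sky is blue. Dogs bark loudly.",
    "An automobile needs fuel.",
]
index = build_index(documents)
colour_hits = search("colour", index)
car_hits = search("car", index)
```

In this sketch the query "colour" matches the sentence containing "color" only because of the synonym expansion step, which mirrors the spelling-variation handling the abstract describes. A dense sentence embedding would also capture similarity between sentences that share no surface terms, which this toy vector cannot do.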
Presentation Conference Type | Conference Paper (unpublished) |
---|---|
Conference Name | International conference on information and communication systems |
Start Date | Feb 4, 2025 |
Acceptance Date | Jan 15, 2025 |
Deposit Date | Feb 20, 2025 |
Electronic ISSN | 2255-8691 |
Publisher | De Gruyter Open |
Peer Reviewed | Peer Reviewed |
Keywords | Sentence Transformers; Vector storage; WordNet; Vector databases; Natural language processing |
Public URL | https://uwe-repository.worktribe.com/output/13779825 |