
Research Repository


A novel approach to document similarity retrieval using sentence transformers and vector databases

Kalaiselvi, Vkg; Gopal, Priya




Abstract

This study introduces a novel method for document similarity retrieval that uses Sentence Transformers for efficient text encoding and Milvus for vector storage. The workflow starts by extracting text from crowd-sourced vector databases and segmenting it into individual sentences. These sentences are encoded into embeddings with a Sentence Transformer model, creating a robust text representation. The embeddings are stored in Milvus, which supports high-performance similarity search. To improve query relevance, we expand user queries with synonyms from WordNet, addressing different spellings and related terms. Our approach handles duplicate detection and spelling variations through vector similarity measures and customized indexing, supporting accurate retrieval and ranking of relevant documents. Unlike traditional methods, our system combines advanced natural language processing techniques with the capabilities of vector databases, resulting in a precise and efficient information retrieval system. Our method significantly improves precision, recall, and F-score, making it a valuable contribution to the field of information retrieval.
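The workflow in the abstract (segment, embed, store, expand the query, rank by similarity, flag duplicates, score the results) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. As stated assumptions: a unit-length bag-of-words vector stands in for Sentence Transformer embeddings, a brute-force in-memory list (the hypothetical `SentenceIndex` class) stands in for a Milvus collection, a small hand-written `SYNONYMS` table stands in for WordNet, and the 0.9 duplicate threshold is arbitrary.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for WordNet lookups (assumption).
SYNONYMS = {
    "car": ["automobile"],
    "automobile": ["car"],
    "colour": ["color"],
    "color": ["colour"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(tokens):
    """Toy stand-in for a Sentence Transformer: unit-length bag-of-words vector."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Both vectors are unit length, so their dot product is the cosine similarity."""
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())


def expand_query(query):
    """Append synonyms of each query token (WordNet-style query expansion)."""
    tokens = tokenize(query)
    return tokens + [syn for tok in tokens for syn in SYNONYMS.get(tok, [])]


class SentenceIndex:
    """Brute-force in-memory index standing in for a Milvus collection."""

    def __init__(self):
        self.rows = []  # (doc_id, sentence, vector)

    def add_document(self, doc_id, text):
        for sentence in split_sentences(text):
            self.rows.append((doc_id, sentence, embed(tokenize(sentence))))

    def search(self, query, top_k=3):
        """Score each document by its best-matching sentence; return up to
        top_k (doc_id, score) pairs that have a positive score."""
        qvec = embed(expand_query(query))
        best = {}
        for doc_id, _, vec in self.rows:
            best[doc_id] = max(best.get(doc_id, 0.0), cosine(qvec, vec))
        ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(d, s) for d, s in ranked[:top_k] if s > 0.0]

    def near_duplicates(self, threshold=0.9):
        """Sentence pairs whose cosine similarity meets the threshold."""
        pairs = []
        for i in range(len(self.rows)):
            for j in range(i + 1, len(self.rows)):
                if cosine(self.rows[i][2], self.rows[j][2]) >= threshold:
                    pairs.append((self.rows[i][1], self.rows[j][1]))
        return pairs


def precision_recall_f1(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Usage with made-up documents.
index = SentenceIndex()
index.add_document("d1", "The automobile was painted red. It sold quickly.")
index.add_document("d2", "Color theory explains paint mixing.")
index.add_document("d3", "The automobile was painted red!")

results = index.search("red car")      # "car" is expanded to "automobile"; d1 and d3 tie
colour_hits = index.search("colour")   # "colour" is expanded to "color", so d2 is retrieved
dupes = index.near_duplicates()        # d1's first sentence and d3 differ only in punctuation
p, r, f1 = precision_recall_f1([d for d, _ in results], {"d1"})
```

With a real model, `embed` would return a dense vector from a Sentence Transformer and `search` would query Milvus's approximate nearest-neighbour index rather than scanning every row; the linear scan is kept here only so the sketch runs without external services.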

Presentation Conference Type Conference Paper (unpublished)
Conference Name International Conference on Information and Communication Systems
Start Date Feb 4, 2025
Acceptance Date Jan 15, 2025
Deposit Date Feb 20, 2025
Electronic ISSN 2255-8691
Publisher De Gruyter Open
Peer Reviewed Peer Reviewed
Keywords Sentence Transformers; Vector storage; WordNet; Vector databases; Natural language processing
Public URL https://uwe-repository.worktribe.com/output/13779825

