Generating unambiguous URL clusters from web search

Smith, Gavin; Donner, Christoph; Hooijmaijers, Dennis; Truran, Mark; Goulding, James; Ashman, Helen; Brailsford, T.


Gavin Smith

Christoph Donner

Dennis Hooijmaijers

Mark Truran

James Goulding

Helen Ashman

Tim Brailsford
Head of Department, Computer Science and Creative Technologies


This paper reports on the generation of unambiguous clusters of URLs from clickthrough data from the MSN search query log (the RFP 2006 dataset). Selections (clickthroughs) made by a user from a single query can be assumed to have some semantic relevance to one another, and the URLs coselected in this way can be aggregated to form single-sense clusters. When the graphs for a single term separate into distinct clusters, the distinct clusters can be interpreted as disambiguated sets of URLs. This principle had been tested previously on smaller, more constrained datasets, and this paper reports findings from applying a method based on the principle to the RFP 2006 dataset. The paper evaluates the proposed coselection method for generating single-sense clusters against two other methods, with varying parameters. The evaluation is done both with a human evaluation to determine the quality of the clusters generated by the methods, and by a simple "edit distance" analysis to determine the content difference of the methods. The main questions addressed are i) whether it is feasible to generate single-sense / sense-coherent clusters, and ii) whether, in a closed world, it would be feasible to discover ambiguous terms. The experimentation showed that sense-coherent clusters were generated, and further indicated that ambiguous terms could be detected by observing small overlap between large clusters. Copyright 2009.
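The coselection principle described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes click records arrive as hypothetical (query_id, url) pairs, links URLs that were clicked from the same query, and treats each connected component of the resulting graph as one candidate single-sense cluster. The function name, the record format, and the example URLs are all invented for illustration.

```python
from collections import defaultdict


def coselection_clusters(clicks):
    """Group URLs into clusters of transitively coselected URLs.

    clicks: iterable of (query_id, url) pairs (assumed record format).
    Returns a list of URL sets, largest first.
    """
    # Collect the URLs selected from each individual query.
    by_query = defaultdict(set)
    for query_id, url in clicks:
        by_query[query_id].add(url)

    # Union-find over URLs; coselected URLs end up in the same component.
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for urls in by_query.values():
        ordered = sorted(urls)
        find(ordered[0])  # register URLs that were selected alone
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(set)
    for url in parent:
        groups[find(url)].add(url)
    return sorted(groups.values(), key=lambda s: (-len(s), sorted(s)))


# Hypothetical clickthroughs for the ambiguous term "jaguar".
clicks = [
    ("q1", "cars.example/jaguar"),
    ("q1", "dealer.example/xj"),
    ("q2", "dealer.example/xj"),
    ("q2", "reviews.example/f-type"),
    ("q3", "zoo.example/jaguar"),
    ("q3", "wildlife.example/big-cats"),
]
clusters = coselection_clusters(clicks)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy input the car-related and animal-related URLs never share a query, so they fall into separate components. On real logs a single stray click can merge two senses, which is presumably one reason the paper compares methods and parameters rather than relying on raw connectivity.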


Ashman, H., Goulding, J., Truran, M., Hooijmaijers, D., Donner, C., Smith, G., …Ashman, H. (2009). Generating unambiguous URL clusters from web search.

Conference Name: Proceedings of Workshop on Web Search Click Data, WSCD'09
Start Date: Feb 9, 2009
End Date: Feb 11, 2009
Acceptance Date: Jan 2, 2009
Publication Date: Jul 14, 2009
Peer Reviewed: Peer Reviewed
Pages: 28-34
ISBN: 9781605584348
Additional Information: This is the accepted version of the paper. The final version can be found online at
Title of Conference or Conference Proceedings: Proceedings of the 2009 workshop on Web Search Click Data

