
Inverse usage frequency for topic analysis and fingerprinting

Posted on: 2003-02-13
Degree: Ph.D.
Type: Dissertation
University: University of South Carolina
Candidate: Eason, James Christopher
Full Text: PDF
GTID: 1468390011989628
Subject: Computer Science
Abstract/Summary:
The size and rapid growth of the World Wide Web pose a great challenge to Internet users who seek specific information or try to organize the available resources on a particular topic. This dissertation examines how search services on the World Wide Web use information retrieval techniques and natural language processing methods to retrieve documents in response to user queries. The research involved collecting and analyzing word frequency probabilities from a large number of World Wide Web documents, with the aim of improving both the precision of search results and the task of topic management. The word frequency probabilities were then used to examine documents on particular topics and to select automatically the best keywords for finding other documents on the same topic. This usage frequency metric was compared with document frequency metrics for automatic keyword selection by using the automatically selected keywords in search experiments.

After giving background information about natural language processing and information retrieval, this dissertation provides theoretical background for the research and discusses the research plan. A description of the software used to gather word frequency statistics follows, along with an analysis of the data that was gathered. The dissertation then describes the software used to analyze document sample sets to extract keywords and discusses the topic fingerprint sets derived from those keywords. It concludes by explaining and evaluating the search experiments that were performed using the topic fingerprint sets, analyzing the overall results, and describing directions for further research.
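
The keyword selection idea summarized above can be illustrated with a minimal sketch. The abstract does not give the dissertation's actual formula, so the scoring rule here (a word's count in the topic sample multiplied by the negative log of its background usage probability) is an assumption, as are the function names `usage_probabilities` and `select_keywords`, the simple tokenizer, and the floor probability used for words absent from the background collection.

```python
import math
import re
from collections import Counter


def usage_probabilities(background_docs):
    """Estimate each word's usage probability from a background collection."""
    counts = Counter()
    for doc in background_docs:
        # Assumed tokenizer: lowercase runs of ASCII letters.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    total = sum(counts.values())
    probs = {word: count / total for word, count in counts.items()}
    return probs, total


def select_keywords(topic_docs, probs, total, k=3):
    """Rank words in the topic sample by count * -log(background probability)."""
    topic_counts = Counter()
    for doc in topic_docs:
        topic_counts.update(re.findall(r"[a-z]+", doc.lower()))
    # Assumed floor probability for words never seen in the background data.
    floor = 1.0 / (total + 1)
    scores = {
        word: count * -math.log(probs.get(word, floor))
        for word, count in topic_counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Under this assumed rule, a word that is common in general usage scores low even when it appears often in the topic sample, while a word that is rare in general usage but repeated in the sample rises to the top. The selected words could then serve as query terms, in the spirit of the topic fingerprint sets the abstract describes.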
Keywords/Search Tags:Topic, World wide web, Frequency, Search