
Research On Automatic Abstraction Based On Search Engine Result

Posted on: 2011-11-22 | Degree: Master | Type: Thesis
Country: China | Candidate: W J Zhang
GTID: 2178360308490388 | Subject: Computer Science and Technology
Abstract/Summary:
Currently, most search engines retrieve information by keyword matching. Because users typically enter only a few query words, the query does not reflect their intent well. In addition, when a search engine returns results, it shows each one with a snippet: the first few lines of the Web document, or the sentences containing the query keywords, extracted as a brief summary. This method is simple, and the snippets usually cover the content users are most interested in. However, their reliability and accuracy are not high, and from the snippet alone users cannot tell whether a Web document is actually relevant without opening it.

To address these problems, this thesis studied automatic summarization of the Web documents returned by a search engine. Building on expansion of the users' query keywords, a query-relevant sentence weight calculation method was proposed. By making effective use of the distance between keywords within a sentence, the method improved summarization accuracy and made it easier for users to find the information they need.

Query keyword expansion was based on the idea of pseudo-relevance feedback. Using the users' original query keywords, sentences were divided into topic-relevant and topic-irrelevant sentences. Only the nouns and noun phrases in the topic-relevant sentences were selected as candidate expansion words. The expanded words were then chosen by calculating the correlation weight between each candidate and the users' query keywords. As a result, the expanded words both reflected the topic of the Web document and stayed relevant to the query.

When calculating sentence importance, the relationship between the query keywords contained in the sentence was considered. Following the principle that "the nearer the words, the closer their relationship," the sentence importance weight formula incorporated word distance information, obtained by counting the number of words between two neighboring query keywords. Experimental results showed that the proposed methods are promising.
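
The two steps described above can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption made for illustration: the function names, the use of a plain co-occurrence count in place of the correlation weight, the hypothetical `is_candidate` hook standing in for noun and noun-phrase filtering, and the `1 / (1 + gap)` proximity bonus scaled by a hypothetical `alpha` parameter.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for real tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(sentences, query, is_candidate=lambda w: True, top_k=3):
    """Pick expansion words in the spirit of pseudo-relevance feedback.

    A sentence counts as topic-relevant if it contains at least one
    original query keyword. Candidates from those sentences are ranked
    by the number of topic-relevant sentences they occur in. This simple
    co-occurrence count is an assumption, not the thesis's correlation
    weight. `is_candidate` is a hypothetical hook where a noun or
    noun-phrase filter (e.g. from a POS tagger) could be plugged in.
    """
    query_set = set(tokenize(query))
    scores = Counter()
    for sentence in sentences:
        words = set(tokenize(sentence))
        if words & query_set:  # topic-relevant sentence
            for word in words - query_set:
                if is_candidate(word):
                    scores[word] += 1
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def sentence_weight(sentence, keywords, alpha=1.0):
    """Score a sentence by keyword coverage plus a proximity bonus.

    `keywords` must be lowercase tokens. The gap between two neighboring
    keyword occurrences is the number of words between them; each pair
    contributes 1 / (1 + gap), so closer keywords add more weight.
    Both the bonus and `alpha` are assumed forms for illustration.
    """
    tokens = tokenize(sentence)
    keyword_set = set(keywords)
    positions = [i for i, tok in enumerate(tokens) if tok in keyword_set]
    if not positions:
        return 0.0
    coverage = len({tokens[i] for i in positions})  # distinct keywords hit
    proximity = sum(
        1.0 / (1 + (right - left - 1))
        for left, right in zip(positions, positions[1:])
    )
    return coverage + alpha * proximity
```

In use, the original query keywords and the words returned by `expand_query` would be merged into one keyword list, each sentence of a returned Web document would be scored with `sentence_weight`, and the highest-scoring sentences would form the summary. With this scoring, a sentence where two keywords are adjacent ranks above one where the same keywords are separated by several words.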
Keywords/Search Tags: Automatic abstraction, Query expansion, Pseudo-relevance feedback, Sentence weight calculation