Research And Realization Of Text Retrieval Technology Based On Keywords Query Expansion

Posted on:2015-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:D Wang

Full Text:PDF

GTID:2298330434452320

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer and network technology, human society has entered a new information era. Given the huge volume of information resources now available, how to obtain information accurately and apply it efficiently has become an urgent issue in the information retrieval field.

The main research of this thesis is to find, within a specific text library, the texts that are relevant to a text given by the user. In most cases, a user's retrieval requirement is only one keyword or several keywords. In the current study, however, the retrieval requirement is a whole text, which greatly increases the amount of information. Using a weight calculation method, namely TF-IWF, the thesis attempts to improve the accuracy of keyword extraction from the test text. Considering the role of semantics in the text, the thesis obtains expanded forms of the keywords from HowNet by analyzing the semantic similarity among different concepts, and then conducts retrieval using the set of expanded keywords. The results show that this method effectively improves retrieval performance.

The specific work is as follows:

(1) A new word similarity algorithm is proposed. First, the thesis puts forward a new sememe classification method and applies different calculation methods according to the sememe's features. Sememes are divided into the first basic sememe, other basic sememes, and indirect sememes. Second, acknowledging the significance of the first basic sememe in word semantic similarity calculation, the thesis selects the semantic items to be used in the calculation by comparing their first basic sememes. This reduces the calculation difficulty and greatly improves efficiency. Finally, the thesis replaces the maximum value with the arithmetic mean of the similarity degrees of the selected semantic items, which makes the calculation more objective.

(2) The thesis uses the TF-IWF method to calculate the weight of words in the text and then extracts keywords based on the results. Experiments show that this method can effectively reduce the influence of similar corpora on the sample text.

(3) The thesis conducts a query expansion of the selected keywords at the semantic level and compares the similarity of different texts through vector calculations between query texts and library texts in the vector space model. The texts whose similarity meets the threshold value are returned as retrieval results. Finally, analysis of the experimental results shows that the retrieval results are consistent with people's expectations, which supports the feasibility and effectiveness of the method in the current study.

According to the experimental results, the full-text retrieval techniques and methods introduced in this thesis conform to query expectations and have practical value. The method achieves a higher precision ratio while keeping the recall ratio at an appropriate level.
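
The keyword extraction, expansion, and retrieval steps in (2) and (3) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes the commonly used TF-IWF formulation (a word's frequency in the text multiplied by the logarithm of the total corpus word count divided by that word's corpus count). It stands in for HowNet with a hypothetical `related` dictionary of precomputed similarity scores and uses a binary query vector. All function names, parameters, and threshold values are illustrative.

```python
import math
from collections import Counter


def tf_iwf(tokens, corpus_counts, corpus_total):
    """Weight each word of one text by TF * IWF (assumed formulation)."""
    counts = Counter(tokens)
    weights = {}
    for word, c in counts.items():
        tf = c / len(tokens)
        # IWF: log of total corpus word count over this word's corpus count.
        iwf = math.log(corpus_total / corpus_counts[word])
        weights[word] = tf * iwf
    return weights


def top_keywords(weights, k):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def expand(keywords, related, threshold):
    """Add related words whose similarity score meets the threshold.

    `related` is a hypothetical stand-in for a HowNet-based similarity lookup.
    """
    expanded = set(keywords)
    for kw in keywords:
        for word, sim in related.get(kw, {}).items():
            if sim >= threshold:
                expanded.add(word)
    return expanded


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_tokens, library, related, k=3,
             expand_threshold=0.8, match_threshold=0.1):
    """Return (name, score) pairs for library texts that meet the threshold."""
    # Corpus statistics cover the library plus the query text (an assumption),
    # so every word being weighted has a non-zero corpus count.
    corpus_counts = Counter(t for doc in library.values() for t in doc)
    corpus_counts.update(query_tokens)
    corpus_total = sum(corpus_counts.values())

    query_weights = tf_iwf(query_tokens, corpus_counts, corpus_total)
    terms = expand(top_keywords(query_weights, k), related, expand_threshold)
    query_vec = {t: 1.0 for t in terms}  # binary vector over expanded keywords

    results = []
    for name, doc in library.items():
        doc_vec = tf_iwf(doc, corpus_counts, corpus_total)
        score = cosine(query_vec, doc_vec)
        if score >= match_threshold:
            results.append((name, score))
    return sorted(results, key=lambda r: -r[1])
```

In this sketch, a library text that contains none of the original keywords can still be retrieved if it contains an expanded keyword, which is the intended effect of semantic query expansion.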

Related items

1	Chinese Information Extraction And The Method Of Summarization Generating Based On HowNet Semantic
2	Research And Application Of Information Extraction Based On Query Expansion
3	Query Expansion Based On Semantic Analysis And Local Documents
4	Research And Implementation Of Information Retrieval Based On Semantic Expansion And Matching In P2P
5	Keyword Extraction From News Web Pages
6	Research On Semantic Search And Related Technology
7	Research And Implementation On Query Expansion Model Of Information Retrieval Based-on Conceptual Graph
8	The Research Of Semantic Similarity Computing Algorithm Based On HowNet
9	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval
10	Query Expansion Optimization Based On Semantic Similarity