Research And Realization Of Text Retrieval Technology Based On Keywords Query Expansion

Posted on: 2015-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2298330434452320
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, human society has entered a new information era. Given the huge volume of information resources now available, how to obtain information accurately and apply it efficiently has become an urgent issue in the information retrieval field.

This thesis studies how to find, within a specific text library, the texts that are relevant to a text supplied by the user. In most cases, a user's retrieval request is a single keyword or a few keywords. In this study, however, the retrieval request is a whole text, which greatly increases the amount of information. Using a weight calculation method, TF-IWF, the thesis attempts to improve the accuracy of the keywords extracted from the text under test. Considering the role of semantics in the text, the thesis obtains expanded forms of the keywords from HowNet by analyzing the semantic similarity among concepts, and then performs retrieval using the expanded keyword set. The results show that this method effectively improves retrieval performance.

The specific work is as follows:

(1) A new word similarity algorithm is proposed. First, the thesis puts forward a new sememe classification method and applies different calculation methods according to the features of each sememe class. Sememes are divided into the first basic sememe, other basic sememes, and indirect sememes. Second, recognizing the significance of the first basic sememe in word semantic similarity calculation, the thesis selects the semantic items to be used in the calculation by comparing their first basic sememes. This reduces the computational difficulty and greatly improves efficiency. Finally, the thesis replaces the maximum value with the arithmetic mean of the similarity degrees of the selected semantic items, which makes the calculation more objective.

(2) The thesis uses the TF-IWF method to calculate the weight of each word in the text and then extracts keywords based on the results. Experiments show that this method effectively reduces the influence of similar corpora on the sample text.

(3) The thesis performs query expansion of the selected keywords at the semantic level and compares the similarity of different texts through vector calculations between the query text and the library texts in the vector space model. The retrieval results that meet the threshold requirement are then returned. Finally, analysis of the experimental results shows that the retrieval results are consistent with people's expectations, which demonstrates the feasibility and effectiveness of the method.

According to the experimental results, the full-text retrieval techniques and methods introduced in this thesis fully conform to query expectations and have some practical value. The method achieves a higher precision ratio while the recall ratio is maintained at an appropriate level.
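
The abstract does not give the TF-IWF formula, so the following is only a minimal sketch of one common formulation: term frequency within the document multiplied by the log of the inverse frequency of the word in a background corpus. The function names `tf_iwf` and `top_keywords`, the add-one smoothing, and the toy corpus are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def tf_iwf(doc_tokens, corpus_docs):
    """Return {word: weight} for one tokenized document.

    Assumed formulation (not confirmed by the thesis):
      TF  = count of word in document / document length
      IWF = log((total word count in corpus + 1) / (count of word in corpus + 1))
    The +1 terms are add-one smoothing so unseen words do not divide by zero.
    """
    if not doc_tokens:
        return {}
    doc_counts = Counter(doc_tokens)
    corpus_counts = Counter(w for doc in corpus_docs for w in doc)
    total_corpus = sum(corpus_counts.values())
    doc_len = len(doc_tokens)

    weights = {}
    for word, n in doc_counts.items():
        tf = n / doc_len
        iwf = math.log((total_corpus + 1) / (corpus_counts[word] + 1))
        weights[word] = tf * iwf
    return weights


def top_keywords(doc_tokens, corpus_docs, k):
    """Return the k highest-weighted words, ties broken alphabetically."""
    weights = tf_iwf(doc_tokens, corpus_docs)
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Toy data, purely illustrative.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "the", "bird"]]
doc = ["the", "cat", "cat", "sat"]
print(top_keywords(doc, corpus, 2))
```

Because the inverse factor is computed over word counts rather than document counts, a word that is frequent across the whole corpus (here "the") is down-weighted even when the corpus contains many documents on a similar topic.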
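
Step (3) can be sketched in the same hedged way: keywords are expanded with semantically similar words, and library texts are ranked by cosine similarity in a vector space model, keeping only those above a threshold. The `related` dictionary below is a hypothetical stand-in for similarity scores that the thesis derives from HowNet; the function names, the weighting of expanded terms by `weight * similarity`, and both threshold values are assumptions, not the author's settings.

```python
import math


def expand_query(weights, related, min_sim=0.8):
    """Add related words whose similarity to a keyword is at least min_sim.

    weights: {keyword: weight}
    related: {keyword: {other_word: similarity}}  (hypothetical HowNet stand-in)
    An expanded word receives weight * similarity (an assumed scheme).
    """
    expanded = dict(weights)
    for word, w in weights.items():
        for other, sim in related.get(word, {}).items():
            if sim >= min_sim:
                expanded[other] = max(expanded.get(other, 0.0), w * sim)
    return expanded


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, library, threshold=0.3):
    """Return (name, score) pairs with score >= threshold, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: -item[1])


# Toy data, purely illustrative.
query = {"car": 1.0}
related = {"car": {"automobile": 0.9, "road": 0.4}}
library = {"d1": {"automobile": 1.0}, "d2": {"banana": 1.0}}

print(retrieve(query, library))                          # no expansion
print(retrieve(expand_query(query, related), library))   # with expansion
```

In this toy case the unexpanded query shares no term with either library text, so nothing passes the threshold, while the expanded query matches the text that uses a synonym.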
Keywords/Search Tags:Semantic Similarity, Keyword Extraction, Query Expansion, HowNet