Research And Application Of Natural Language Processing In Information Retrieval

Posted on: 2020-09-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhong
GTID: 2428330590979099 | Subject: Engineering
Abstract/Summary:
The 21st century is the golden age of the Internet. Information technology has developed rapidly in this period, and the Internet has become the largest knowledge repository: its content is vast and all-encompassing, and it is an important source for people seeking knowledge and answers. Information retrieval systems, as an efficient tool for accessing network resources, have played an important role throughout. However, traditional keyword-based full-text retrieval systems suffer from problems such as incomplete retrieval results and low relevance. To address these shortcomings, this thesis uses natural language processing techniques to optimize the retrieval system and to expand query keywords. It designs a word similarity calculation method based on encyclopedia entry information. The method obtains the overall similarity of a given word pair from the content similarity of four parts of the corresponding entries: the entry card (the "business card" summary), the main body, the open classification, and the related terms. This method is used to select words with similar meanings from the HowNet Chinese dictionary as expansion words. In addition, the thesis extracts user interest information and uses it as the basis for ranking and optimizing the search results.

The main work of this thesis is as follows:

(1) The Simhash algorithm is studied in depth and an improved algorithm, TTSimhash, is proposed. TTSimhash uses ICTCLAS word segmentation. The TF-IDF method is introduced in the initial weight calculation of keywords, together with part-of-speech and word-length factors. Based on PageRank, a graph model is built for the text. The final weight of each keyword is obtained by letting adjacent nodes vote for the target node according to the edge relationships between them.

(2) Combining the improved TTSimhash algorithm, a word similarity calculation method based on encyclopedia entries is designed. The new method relies on the content of Baidu encyclopedia entries and uses the similarity between the corresponding parts of two entries to measure the overall similarity between the words. The algorithm is used to calculate the similarity between candidate words and the query keywords.

(3) The query expansion module of the information retrieval system is designed and implemented. With the help of HowNet and the word similarity calculation method proposed in this thesis, words with similar semantics are obtained and the query keywords are expanded, making the retrieval results more comprehensive.

(4) The personalization module of the information retrieval system is designed and implemented. By collecting and analyzing information from users' browsers, such as browsing history and bookmarks (favorites), keywords describing each user's interests are extracted. The retrieval results are then optimized based on these interest features.

The system test results show that applying this method to information retrieval is effective and feasible. It can effectively improve the efficiency of information retrieval and help users get the desired results.
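
For readers unfamiliar with Simhash, the weighted fingerprinting that item (1) builds on can be sketched roughly as follows. This is a minimal illustration of the standard Simhash scheme, not the thesis's TTSimhash: the function names, the 64-bit fingerprint length, and the use of MD5 as the token hash are all assumptions made for the example, and the keyword weights are passed in directly rather than computed from TF-IDF, part of speech, word length, and the PageRank-style voting described above.

```python
import hashlib
from typing import Mapping


def token_hash(token: str, bits: int = 64) -> int:
    """Map a token to a fixed-width integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(weights: Mapping[str, float], bits: int = 64) -> int:
    """Standard weighted Simhash over a {keyword: weight} mapping.

    In the thesis the weights would come from the TTSimhash weighting
    scheme; here they are supplied by the caller (an assumption).
    """
    acc = [0.0] * bits
    for token, weight in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Add the weight where the hash bit is 1, subtract it where it is 0.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, value in enumerate(acc):
        if value > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, bits: int = 64) -> float:
    """Fingerprint similarity in [0, 1]; 1.0 means identical fingerprints."""
    return 1.0 - hamming_distance(a, b) / bits


# Hypothetical keyword weights for two short texts.
doc_a = {"information": 2.0, "retrieval": 2.0, "query": 1.0}
doc_b = {"information": 2.0, "retrieval": 1.5, "search": 1.0}
score = similarity(simhash(doc_a), simhash(doc_b))
```

Texts whose heavily weighted keywords overlap tend to produce fingerprints with a small Hamming distance, which is why the choice of keyword weights matters for the result.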
Keywords/Search Tags: information retrieval, natural language processing, word similarity, Simhash, TTSimhash