The Design And Implementation Of Information Retrieval And Retrieval Analysis Subsystem Of Scientific Research Literature

Posted on: 2018-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
Full Text: PDF
GTID: 2348330536981620
Subject: Software engineering
Abstract/Summary:
With the rapid development of computing and the arrival of the data age, people increasingly demand accurate data. Obtaining data that is more accurate and less noisy is the key to accurate data analysis, and fast retrieval over large volumes of data has become a basic requirement of data processing; a system that responds quickly with low latency better meets users' needs. This thesis takes a scientific research information mining system as its background. Driven by specific business requirements, it reviews the current state of text similarity, text keyword extraction, and full-text retrieval, and implements an information extraction and retrieval analysis subsystem for scientific research literature as a subsystem of that information mining system. The subsystem covers the design of the information extraction program, text keyword extraction, text similarity comparison, and full-text search. Information extraction uses POI to read scientific research documents and analyzes their text to obtain the necessary attribute information of each subject. The TextRank algorithm is improved with a text weight calculation method so that it can extract keywords from the text. The full-text search subsystem uses a dedicated search thesaurus to determine the search target; the thesaurus is built from computer science terminology, subject areas, and subject keywords. Word segmentation uses the bidirectional maximum matching algorithm together with this thesaurus. An inverted index over the documents optimizes the query structure and reduces full-text retrieval time. The BM25 probabilistic model ranks results by relevance, so that the subject information most relevant to the query is presented to the user first. The system also uses the in-memory database Redis to store the subject areas, subject directions, key technologies, search thesaurus, and inverted index table. Finally, practical testing shows that the system can extract information automatically and complete full-text searches quickly. The system meets users' functional and performance requirements and can be put into use.
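
The bidirectional maximum matching segmentation named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy thesaurus, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) are assumptions chosen because they are a common textbook formulation.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest thesaurus word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest thesaurus word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, vocab):
    """Run both scans and keep the segmentation preferred by the assumed heuristic."""
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    # Assumed heuristic: fewer tokens wins.
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    # Same token count: fewer single-character tokens wins; ties go to backward.
    fwd_singles = sum(1 for t in fwd if len(t) == 1)
    bwd_singles = sum(1 for t in bwd if len(t) == 1)
    return fwd if fwd_singles < bwd_singles else bwd


# Hypothetical toy thesaurus; the real system would load its own vocabulary.
toy_vocab = {"研究", "研究生", "生命", "起源"}
print(bidirectional_max_match("研究生命起源", toy_vocab))
```

In this toy case the forward scan greedily takes "研究生" and leaves a stray single character, while the backward scan yields three dictionary words, so the heuristic keeps the backward result.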
Keywords/Search Tags:information extraction, full text retrieval, keyword extraction, inverted index, relevance ranking
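
The inverted index and BM25 relevance ranking listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the in-memory dictionaries (the abstract states that the real system keeps its index table in Redis), the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions, not details taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; docs is {doc_id: list of tokens}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def bm25_rank(query_tokens, docs, index, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    scores = defaultdict(float)
    for term in query_tokens:
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Smoothed IDF; the "1 +" inside the log keeps the value non-negative.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer-than-average documents are penalized.
            norm = 1 - b + b * len(docs[doc_id]) / avg_len
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, already-segmented documents.
toy_docs = {
    "d1": ["inverted", "index", "retrieval"],
    "d2": ["keyword", "extraction", "textrank"],
    "d3": ["full", "text", "retrieval", "inverted", "index", "index"],
}
toy_index = build_inverted_index(toy_docs)
for doc_id, score in bm25_rank(["inverted", "index"], toy_docs, toy_index):
    print(doc_id, round(score, 3))
```

Only documents that appear in the postings of some query term are scored, which is how the inverted index avoids scanning the full text of every document at query time.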