
Research On Optimization And Recommendation Of Internet Information Storage And Retrieval

Posted on: 2017-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C C Yao
Full Text: PDF
GTID: 2308330509957104
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet worldwide, the pace of informatization in China is steadily accelerating, and the data on the network is growing explosively and becoming more diverse. In this context, there is an urgent need to store and manage massive volumes of Internet information and to retrieve useful information from them quickly and in a timely manner. Traditional storage and retrieval schemes cannot handle real-time retrieval and personalized retrieval at such data volumes.

Firstly, after experimentally comparing and analyzing widely used open-source distributed storage and retrieval infrastructures, this paper designed a storage and retrieval infrastructure based on Hadoop and Elasticsearch, tailored to the characteristics of Internet information. Hadoop is mainly used to ensure reliable data storage and to implement the personalized cross-domain recommendation algorithm. Elasticsearch is mainly used to let users quickly retrieve the information they need from the massive data.

Secondly, based on an analysis of the characteristics of Internet information data, this paper developed a series of optimizations and improved algorithms to raise the performance and stability of the system. To handle the diversity of Internet information formats, an unstructured data extraction plug-in that supports horizontal scaling was designed, and the data was processed with the Hadoop MapReduce parallel computing framework. Because Internet information is largely Chinese text and news text has a high repetition rate, an optimized word segmentation algorithm and a news deduplication algorithm were designed. Finally, operating system parameters and application parameters were tuned together to further optimize the retrieval system.

Thirdly, combining user information with text information, this paper designed a cross-domain associated text recommendation algorithm. Keyword extraction and classification algorithms for text information were implemented to help users search text more accurately. By mining users' Weibo content, a Weibo-content-based text recommendation algorithm was designed. Features extracted from user information are also used to make personalized recommendations of texts and friends.

Finally, experiments verified the system's performance, the news text classification, and the effect of personalized text recommendation. This paper thus completed the design of the optimized Internet information storage and retrieval system, and presented the entire retrieval and recommendation system through a web interface. After detailed system testing, the system met the expected requirements.
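
The abstract does not spell out how the news deduplication algorithm works. As a rough illustration only, near-duplicate news text could be filtered with character shingles and Jaccard similarity, which suits Chinese text because it needs no word segmentation. The function names, the shingle length `k`, and the 0.8 threshold below are assumptions made for this sketch, not details taken from the thesis.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-character substrings of text.

    Whitespace is removed first so that spacing differences do not matter.
    """
    cleaned = "".join(text.split())
    if len(cleaned) <= k:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + k] for i in range(len(cleaned) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8, k: int = 3) -> list[str]:
    """Keep a document only if it is not too similar to any document kept so far.

    This is a quadratic pairwise comparison, fine for a small illustration;
    a large corpus would need an index such as MinHash with LSH instead.
    """
    kept: list[tuple[str, set[str]]] = []
    for doc in docs:
        doc_shingles = shingles(doc, k)
        if all(jaccard(doc_shingles, s) < threshold for _, s in kept):
            kept.append((doc, doc_shingles))
    return [doc for doc, _ in kept]


if __name__ == "__main__":
    news = [
        "Hadoop stores the news archive reliably.",
        "Hadoop stores the news archive reliably!",
        "Elasticsearch answers user queries quickly.",
    ]
    for item in deduplicate(news):
        print(item)
```

In this toy input the second item differs from the first only in its final punctuation mark, so it is dropped, while the unrelated third item is kept.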
Keywords/Search Tags: storage, retrieval, recommendation, Hadoop, Elasticsearch