
A Research Of Keyword Theme Searching Based On Hot And Cold Anticipation Classification Model

Posted on: 2016-08-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yu  Full Text: PDF
GTID: 2308330467982271  Subject: Computer application technology
Abstract/Summary:
Nowadays, with the development of information technology and the computer industry, people's demand for information grows every day. Faced with vast amounts of messy data, how to quickly obtain the information people need has become a pressing problem, so effective classification and search over such huge amounts of data have become a research hot spot. The rapid development of technology has recently produced a new computing model: cloud computing. Because cloud computing is large in scale, virtualizable, versatile, scalable, and relatively inexpensive, more and more data and applications are moving to cloud computing platforms. Data inside a cloud computing system can exploit the platform's advantages, and its management is already much improved compared with traditional data management. Search technology is therefore gradually moving to the new platform and being built in a distributed way, replacing the centralized approach.

The main research work of this paper is as follows:
(1) First, cluttered web page information makes it hard to find, accurately and quickly, the pages most relevant to the search topic. This paper relies on an improved website ranking algorithm (an improvement on the classic PageRank algorithm) to obtain more data that is close to the topic.
(2) Next, simple text information is obtained from a large number of pages. Current classified data storage does not consider hot and cold features, so this paper builds a hot and cold anticipation model to store hot and cold data separately. Topic categories are then sorted on the basis of the hot and cold classification so that indexes can be built effectively.
(3) To complete the topic category classification, and to address poor topic keyword extraction in specific application scenarios, this paper extracts topic keywords with an improved TF-IDF keyword extraction algorithm. Considering the shortcomings of extraction based on the improved TF-IDF, it also proposes a topic keyword extraction algorithm suited to temporary emergency information data.
(4) Finally, to further improve search performance, which was relatively time-consuming in the past, this paper also improves the indexing technology by using a distributed architecture on the Hadoop platform. Following the principle of distributed storage across nodes, data is separated by hot and cold category, and data with similar topic categories is stored as one class. The inverted index is distributed across nodes according to the data topic held on each node. To improve search efficiency, a distributed inverted index based on keyword topic classification is built.
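
As a rough illustration of how items (3) and (4) fit together, the Python sketch below extracts keywords with plain textbook TF-IDF and then builds one inverted index per (hot/cold, topic) partition. It is a minimal sketch under stated assumptions, not the thesis's method: the abstract does not describe the improved TF-IDF variant or the anticipation model, so the scoring is the standard formula, the hot/cold and topic labels are assigned by hand, and an in-memory dictionary stands in for Hadoop nodes. The names `tfidf_keywords`, `build_partitioned_index`, and `search`, and the sample documents, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_keywords(docs, top_k=2):
    """Return the top_k terms of each tokenized document by standard TF-IDF."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


def build_partitioned_index(docs, labels):
    """Build one inverted index per (temperature, topic) label.

    Each label plays the role of a storage node (an assumption made
    for illustration; no real cluster is involved).
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (doc, label) in enumerate(zip(docs, labels)):
        for term in doc:
            index[label][term].add(doc_id)
    return index


def search(index, label, term):
    """Look up a term inside a single partition only."""
    return sorted(index.get(label, {}).get(term, set()))


# Hypothetical sample data: tokenized pages with hand-assigned labels.
docs = [
    ["cloud", "index", "cloud", "search"],
    ["earthquake", "rescue", "earthquake", "news"],
    ["cloud", "storage", "news"],
]
labels = [("hot", "computing"), ("hot", "emergency"), ("cold", "computing")]

keywords = tfidf_keywords(docs)
index = build_partitioned_index(docs, labels)
hot_hits = search(index, ("hot", "computing"), "cloud")
```

Because a query is routed to one partition, a lookup for "cloud" among hot computing pages never touches the cold partition or the emergency partition. This is the kind of saving that a topic-partitioned distributed index aims for.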
Keywords/Search Tags: cloud platform, hot and cold anticipation classification model, topic keyword extraction, classification, distributed index