
Research On Intelligence Processing Of Searching Result

Posted on: 2015-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2298330452964144
Subject: Computer technology
Abstract/Summary:
With the development of Internet technology, the volume of information is growing exponentially, and searching for needed information has become an important part of daily life. Search engines are one of the most common ways to find information, but the huge number of result items they return makes the information-seeking process much harder, so an intelligent processing method is needed. This thesis studies the whole process of analyzing search results and proposes a solution that combines data grabbing, natural language processing (NLP), and data mining.

For the data-grabbing part, the thesis optimizes the method for parsing HTML web page content so as to avoid the limits that search engines place on visit time and frequency.

For the NLP part, the thesis proposes an algorithm for expanding the word segmentation dictionary, which builds on the traditional dictionary-based word segmentation algorithm. In addition, this algorithm can be used to mine new words from a corpus.

For the construction of the vector space model, the thesis proposes a new method based on the relationships between words rather than on statistics. It uses an iterative process similar to the PageRank algorithm to handle adjacency, modification, and pronominal relationships. The experimental data show that the new vectorization algorithm is more accurate than traditional approaches and is better suited to short texts such as search engine result items.

For the reconstruction of search result items, considering that result items differ in importance, the thesis proposes a solution that clusters the key items and classifies the remaining items. A clustering evaluation system is used to make sure the number of clusters is well chosen. This method achieved the desired results on search results for ambiguous query keywords.
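The PageRank-like iteration over word relationships mentioned above might look roughly like the following sketch. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: only the adjacency relationship is modeled (modification and pronominal relationships are left out), the graph is undirected, and the function names `word_rank` and `vectorize`, the damping factor 0.85, and the iteration count are hypothetical choices.

```python
from collections import defaultdict


def word_rank(sentences, damping=0.85, iterations=50):
    """Score words with a PageRank-style iteration over an adjacency graph.

    `sentences` is a list of token lists (text that is already segmented).
    Only adjacency is modeled; this is an illustrative assumption, not the
    thesis's actual relationship model.
    """
    # Link each word to the words that appear directly next to it.
    neighbors = defaultdict(set)
    vocabulary = set()
    for tokens in sentences:
        vocabulary.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)

    if not vocabulary:
        return {}

    n = len(vocabulary)
    scores = {word: 1.0 / n for word in vocabulary}
    for _ in range(iterations):
        updated = {}
        for word in vocabulary:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(
                scores[nb] / len(neighbors[nb]) for nb in neighbors[word]
            )
            updated[word] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


def vectorize(tokens, scores, vocabulary):
    """Map one short text to a vector of word scores over `vocabulary`."""
    present = set(tokens)
    return [scores.get(w, 0.0) if w in present else 0.0 for w in vocabulary]


if __name__ == "__main__":
    corpus = [
        ["apple", "iphone", "price"],
        ["apple", "fruit", "price"],
        ["apple", "juice"],
    ]
    ranks = word_rank(corpus)
    print(vectorize(["apple", "juice"], ranks, sorted(ranks)))
```

In this sketch a word's weight depends on how it is connected to other words rather than on how often it occurs, which is the general idea the abstract describes for short texts where frequency statistics are sparse.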
Before the final result is returned, the thesis re-sorts the items according to the weights of the search engines, the repetition of result items, and the item contents. This search result processing method supports meta-search, i.e. grabbing data from different search engines, so the processing is highly extensible.
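As a rough illustration of the re-sorting step in a meta-search setting, the sketch below merges ranked lists from several engines. The scoring rule is an assumption, not the formula used in the thesis: each engine contributes `weight / rank` for an item, so items repeated across engines accumulate a higher score. The content-based factor is omitted, and the name `merge_results` and the engine names are hypothetical.

```python
from collections import defaultdict


def merge_results(results_by_engine, engine_weights):
    """Merge ranked result lists from several engines into one ordering.

    results_by_engine: dict mapping engine name -> list of URLs, best first.
    engine_weights: dict mapping engine name -> weight (defaults to 1.0).
    """
    scores = defaultdict(float)
    for engine, urls in results_by_engine.items():
        weight = engine_weights.get(engine, 1.0)
        for rank, url in enumerate(urls, start=1):
            # Repeated items add up, so agreement between engines is rewarded.
            scores[url] += weight / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    results = {
        "engine_a": ["u1", "u2", "u3"],
        "engine_b": ["u2", "u4"],
    }
    print(merge_results(results, {"engine_a": 1.0, "engine_b": 0.8}))
```

Because each engine is only a key in the input dictionary, adding another search engine requires no change to the merging logic, which reflects the extensibility the abstract attributes to meta-search.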
Keywords/Search Tags: Searching results, Segmentation, Data-mining, Vectorization, Clustering