Research and Implementation of Topic-Based Web Information Extraction and Intelligent Search Technology

Posted on: 2008-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Xie
GTID: 2208360212975268
Subject: Software engineering
Abstract/Summary:
Although traditional search engines provide a good way to find useful information on the web, their results do not always meet users' needs; a central difficulty is expressing the query well. A specialized search engine can search for information on a particular topic more efficiently. This thesis analyzes and designs a specialized search engine framework based on that of a traditional search engine. In this framework, a topic-oriented crawler replaces the traditional crawler, and a clustering module is added. Compared with a traditional crawler, the topic-oriented crawler adds a topic relevance decision module, which covers both URL topic relevance and page topic relevance. The clustering module is implemented with the FCM algorithm, with two improvements designed to address its disadvantages: (1) the initial cluster centers are taken from the output of HCM; (2) the best number of clusters is selected by a validity function. Finally, some key implementation details are discussed, including URL seed selection, dictionary preparation, discarding duplicate pages, the MD5 algorithm, the FSM algorithm, and key programming techniques.
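The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes that FCM means fuzzy c-means and that HCM means hard c-means (k-means style); the abstract does not spell out either abbreviation. The function names, the fuzzifier `m = 2.0`, the iteration counts, and the use of Euclidean distance on small numeric tuples are illustrative assumptions, not the thesis's implementation. The validity function used to choose the number of clusters is not specified in the abstract and is not shown here.

```python
import math
import random


def hard_c_means(points, c, iters=20, seed=0):
    """Hard c-means (assumed meaning of HCM): returns c crisp cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)
    for _ in range(iters):
        groups = [[] for _ in range(c)]
        for p in points:
            nearest = min(range(c), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def memberships(points, centers, m, eps=1e-9):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [max(math.dist(p, v), eps) for v in centers]
        rows.append(
            [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
        )
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Fuzzy c-means whose initial centers come from hard c-means output."""
    centers = hard_c_means(points, c)  # improvement (1) from the abstract
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                tuple(
                    sum(w * p[k] for w, p in zip(weights, points)) / total
                    for k in range(dims)
                )
            )
        centers = new_centers
    return centers, memberships(points, centers, m)
```

Calling `fuzzy_c_means(points, c)` on a list of equal-length numeric tuples returns the cluster centers and a membership matrix with one row per point. In a search engine setting the tuples would presumably be document feature vectors, which is also an assumption here.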
Keywords/Search Tags: search engine, IE, topic relativity, fuzzy clustering