City Encyclopedia Auto-construction System

Posted on: 2011-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: H J Yang
Full Text: PDF
GTID: 2198330338489600
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the Internet has an ever greater influence on daily life, and users increasingly want comprehensive, authoritative, and regional information from it. For example, people who live in Shenzhen are more interested in information about Shenzhen. How to meet this demand for regional information, and how to classify text, are the main research questions of this thesis. The thesis automatically constructs city encyclopedia knowledge from Baidu Encyclopedia data, processing web page information with natural language processing techniques. The work has two main parts: building an information retrieval system and building a city classification system.

(1) City encyclopedia information retrieval system
The thesis focuses on processing and analyzing Baidu Encyclopedia data and implements an information retrieval system on top of that data. The system is built in stages: first a web crawler, then web page cleaning, then a forward index and an inverted index, and finally keyword search and related functions. To improve search accuracy, the system was optimized to use the paragraph as the basic index unit. Experiments show that, with other conditions held equal, the precision of the paragraph-indexed system is 50% higher than that of the document-indexed system.

(2) City encyclopedia classification system
The classification approach covers two aspects: city classification and text classification. Different classification systems call for different methods. Two key factors affect a text classification system: feature extraction and the text classification algorithm. The thesis studies the main current methods for both and optimizes parts of their implementation. To satisfy precision and recall requirements under different conditions, it improves both the classification algorithm and the feature extraction algorithm. Experiments show that the improved methods raise the F-measure by 10%. The thesis covers both soft classification and hard classification, applies a different classification strategy to each, and obtains good results.

The techniques described above have been applied in the city encyclopedia auto-construction system.
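To make the paragraph-as-index-unit idea from part (1) concrete, the following is a minimal sketch of a paragraph-level inverted index with AND-style keyword search. It is not the thesis's implementation: the function names (`tokenize`, `build_paragraph_index`, `search`), the blank-line paragraph split, and the English-only tokenizer are all assumptions made for illustration. Real Baidu Encyclopedia text is Chinese and would need a word segmenter in place of the regular expression.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    # Chinese text would need a proper word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_paragraph_index(docs):
    """Build an inverted index whose postings are (doc_id, paragraph_no).

    docs maps a document id to its text; paragraphs are assumed to be
    separated by blank lines.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for para_no, para in enumerate(paragraphs):
            for term in tokenize(para):
                index[term].add((doc_id, para_no))
    return index


def search(index, query):
    """Return the paragraphs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Illustrative data, not taken from the thesis.
docs = {
    "shenzhen": "Shenzhen is a city in Guangdong.\n\nThe metro opened in 2004.",
    "guangzhou": "Guangzhou is the capital of Guangdong.\n\nShenzhen lies to the south.",
}
index = build_paragraph_index(docs)
```

With this toy data, the query "shenzhen capital" matches no paragraph, even though both words occur somewhere in the "guangzhou" document. A document-level index would return that document as a hit, which is one plausible way paragraph-level indexing can improve precision.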
Keywords/Search Tags: information retrieval, text classification, feature extraction, city encyclopedia