
Research On Key Technologies About Labeling The Content Of Internet Websites By Using Multi-tag

Posted on: 2017-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2348330518995955
Subject: Information and Communication Engineering
Abstract/Summary:
As the amount of data on the Internet grows, users place higher demands on information retrieval: they want not only higher accuracy but also the ability to retrieve a range of related resources. Building on a study of text categorization and website labeling technology, this thesis proposes an improved algorithm and, through research on multi-tag labeling of website content, aims to contribute to website retrieval. The main contributions are as follows.

First, a new organization method, called the structure of website information, is proposed. It differs from both the physical structure and the logical structure of a website. It reduces redundant information between the pages of a website, extracts the website's structural information precisely, and yields better classification results and higher computing efficiency.

Second, a tag library system is created to make labeling websites convenient. An analysis of web portals shows that the pages of a website can be divided into three parts according to their form. A three-level tag library, whose structure is the same as that of the website theme, is therefore created. Within this library, pages are labeled differently depending on the level.

Third, an algorithm is proposed that labels the structure of website information with multiple tags. Unlike traditional classification algorithms, it takes both the structural characteristics and the content of websites into account, which improves accuracy when classifying pages.

Finally, the proposed algorithm and an SVM classification algorithm are tested on the same data set, and the accuracy of the proposed algorithm is verified by calculating precision, recall, and F-score.
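
The evaluation measures named in the abstract (precision, recall, and F-score) can be illustrated with a minimal sketch. The abstract does not say how the thesis averages these measures over tags, so the sketch assumes micro-averaging over (page, tag) pairs. The function name `micro_prf` and the tag names are hypothetical and serve only as illustration.

```python
from typing import List, Set, Tuple


def micro_prf(true_tags: List[Set[str]],
              pred_tags: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score for multi-tag labeling.

    Assumption: each page has a set of correct tags and a set of predicted
    tags, and counts are pooled over all pages before the ratios are taken.
    """
    if len(true_tags) != len(pred_tags):
        raise ValueError("true_tags and pred_tags must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_tags, pred_tags):
        tp += len(truth & pred)   # tags predicted and correct
        fp += len(pred - truth)   # tags predicted but wrong
        fn += len(truth - pred)   # correct tags that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Hypothetical example: three pages with made-up tags.
truth = [{"news", "sports"}, {"finance"}, {"tech", "news"}]
pred = [{"news"}, {"finance", "tech"}, {"tech", "news"}]
precision, recall, f_score = micro_prf(truth, pred)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

In this example four predicted tags are correct, one is spurious, and one correct tag is missed, so precision and recall are both 4/5. A macro-averaged variant, which computes the measures per tag and then averages them, would be an equally plausible reading of the abstract.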
Keywords/Search Tags: structure of website information, website features, multi-tag, tag library, classification