Research On The Recognition Of Link's Topic Drift With Short Text

Posted on: 2017-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: X G Jin  Full Text: PDF
GTID: 2428330485461668  Subject: Information Science
Abstract/Summary:
With the rapid development of internet technologies, hyperlinks have become ubiquitous. They come in many varieties, including relevant recommended links, resource links, structural links, ad links, and link spam. The quality, density, and distribution of links are closely related to the quality of a website and to how often users visit it, so these questions have long been research hotspots in link analysis. However, as the number of hyperlinks grows, the problem of link topic drift is becoming increasingly serious. Many links on the internet do not exhibit topic drift, such as relevant recommended links, resource links, and extended links. At the same time, even more links exhibit serious topic drift, such as irrelevant recommended links, structural links, ad links, copyright links, invalid links, and hidden links. These links, which suffer from excessive topic drift, are the focus of this thesis. Link topic drift can affect the quality of a web page, the user experience, and the crawler algorithms of search engines. This thesis studies link topic drift using the short text in a web page, from both qualitative and quantitative perspectives, with the aim of helping to mitigate the problem. The main research contents are as follows:
Chapter one, introduction. This chapter discusses the purpose, foundations, methods, technology roadmap, and innovative points of this research.
Chapter two, research review. This chapter introduces the research status of link analysis and link topic drift, from the perspective of PageRank, HITS, and other link analysis algorithms.
Chapter three, research design. This chapter describes the main concepts and problems, and introduces the research approach, strategy, and technical route.
Chapter four, data collection and pre-processing. This chapter describes the selected data source, the data pre-processing methods, and the data collection process.
Chapter five, qualitative recognition of link topic drift from the link's surrounding text. This chapter describes the similarity calculation, the manual annotation process, and how link topic drift is recognized with a C5.0 decision tree.
Chapter six, the algorithm for the topic drift coefficient based on anchor text. This chapter describes the method of calculating the similarity between words and anchor text using a search engine, the data collection process, and how the formula for the topic drift coefficient is obtained by multiple linear regression.
Chapter seven, summary and outlook. This chapter summarizes the research and proposes several ideas that could be considered in future research.
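The abstract does not say which similarity measure chapter five uses. As a minimal sketch only, assuming a bag-of-words cosine similarity between a link's surrounding text and the text of the page, the idea of scoring short-text similarity could look like the following. The function names and the `drift_score` definition (one minus similarity) are hypothetical illustrations, not the thesis's topic drift coefficient.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text shares nothing with any other text.
    return dot / (norm_a * norm_b)


def drift_score(link_text, page_text):
    """Hypothetical drift score: 1 - similarity (an assumption, not the thesis formula)."""
    return 1.0 - cosine_similarity(link_text, page_text)
```

Under this assumption, a link whose surrounding text shares no terms with the page gets a drift score of 1.0, and a link whose text matches the page's term distribution gets a score near 0. Such scores could then serve as input features for a classifier or a regression, which is the role the abstract assigns to the C5.0 decision tree and the multiple linear regression.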
Keywords/Search Tags: link analysis, topic drift, short text, text mining, decision tree, multiple linear regression