Research Of Dynamic Comment Extraction Based On Web

Posted on:2015-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:H Meng

Full Text:PDF

GTID:2298330467970277

Subject:Computer application technology

Abstract/Summary:

The advent of the Web 2.0 era has transformed the Internet from an information dissemination platform into an information exchange platform, on which people can express their views and discuss any topic they are interested in, producing a public opinion effect. Because some people also exploit Internet public opinion with bad intentions, public opinion analysis has received growing attention, and research on Web information extraction is the basis for such analysis.

Web information extraction is the technology of extracting specific structured information from unstructured or semi-structured web pages. This paper describes the current state of Web information extraction technology. To address two problems of existing techniques, namely that they are sensitive to page structure and that dynamic multi-level comment extraction has received little study, a new semi-automatic information extraction system is designed, consisting of an information access module and a comment extraction module. The information access module is a tool that automatically obtains the full content of dynamic pages, based on the browser API, a message-sending mechanism, and Chrome extension technology. In the comment extraction module, the concept of LFSU is proposed based on the visual, structural, and semantic features of dynamic pages; its location property is used to identify the comment area under different organizational models, and a method is given that can extract both single-level and multi-level comments. The method uses little DOM tree information and does not involve complex structural comparison or cluster analysis, so the algorithm is efficient.

By analyzing the results of coverage experiments in real-world conditions, this paper shows that the information extraction method can meet the actual demand for public opinion data from blogs, and that it performs especially well on pages containing more than one comment. The recall, precision, and F-value are all above 92%.
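
The abstract does not say how recall, precision, and F-value were computed. As a rough illustration only, the sketch below uses the standard set-based definitions and assumes that an extracted comment counts as correct when it exactly matches a manually labelled one. The function name extraction_scores and the sample data are hypothetical and do not come from the thesis.

```python
def extraction_scores(extracted, gold):
    """Return (precision, recall, f_value) for extracted vs. labelled comments.

    Assumption: a comment is correct only if its text exactly matches a
    labelled (gold) comment; the thesis may use a different matching rule.
    """
    extracted_set, gold_set = set(extracted), set(gold)
    correct = len(extracted_set & gold_set)

    # Precision: share of extracted comments that are correct.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    # Recall: share of labelled comments that were extracted.
    recall = correct / len(gold_set) if gold_set else 0.0
    # F-value: harmonic mean of precision and recall.
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


if __name__ == "__main__":
    # Hypothetical data: 4 comments extracted, 5 labelled, 3 in common.
    extracted = ["nice post", "agree", "thanks", "ad banner"]
    gold = ["nice post", "agree", "thanks", "first!", "great read"]
    p, r, f = extraction_scores(extracted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f_value={f:.3f}")
```

With these definitions, an F-value above 92% requires both precision and recall to be high, since the harmonic mean is pulled toward the lower of the two.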

Keywords/Search Tags:

information extraction, dynamic pages, Chrome, LFSU, DOM

Related items

1	Research On Web Information Extraction Technology Based On
2	Design And Implementation Of A Directional Information Extraction Model For Dynamic Web Pages
3	Research Of Web Information Extraction Method Based On Multi-feature Mining
4	Research On Efficient Web Data Extraction Technology Based On Visual Information
5	The Research Of Dynamic Web Pages Information Extraction Algorithm Based On Sequence Alignment
6	Research And Application Of Web Pages Denoising And Information Extraction Algorithm
7	Research Of Automatic Metadata Extraction From Template Web Pages
8	Based On The Key Pages Of Information To Improve The Hits Algorithm, And Location Information Extraction Method
9	The Design And Development Of Textrank And Log-Likelihood Based Chrome Chinese Keyword Cloud Extension
10	Research On The Rapid Extraction Method Of Url For Dynamic Pages