Study Of Extracting Stock Comments On The Internet Based On Semantic

Posted on:2012-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:W T Sun

Full Text:PDF

GTID:2178330335952254

Subject:Computer Science and Technology

Abstract/Summary:

Many financial websites publish stock information every day. These sites carry a huge volume of data, much of it redundant or confusing, so ordinary users must spend a great deal of time finding the stock comments they want. This thesis analyzes domestic stock comments on the Internet and information extraction technology, and studies how to extract stock comments from the Internet. The work is as follows:

(1) Extracting information from web pages with a web spider. The spider first finds the URLs on a page that lead to the expected information and puts them into a queue awaiting extraction. It then downloads the page for each queued URL in turn and analyzes the structure of the HTML document to locate the stock information.

(2) Building the stock feature libraries. First, analysis of a large number of stock comments shows that certain feature words recur frequently and express the characteristics of a stock's trend. Second, the first phrase of a stock comment describes the stock's features, and the last phrase gives an operating proposal for the stock. Analyzing the first phrase yields a set of feature words. Because a stock feature can be described by a single feature word or by a combination of two feature words, two libraries are needed: the feature library and the combined library. Analyzing the last phrase yields the proposal words, which are used to build the proposal library. Finally, the three libraries are discussed.

(3) Parsing stock comments against the established feature libraries. The extraction module fetches one feature word at a time from the feature library and matches it against the stock comment, obtaining the first feature word and then the second. If no word in the feature library matches, the module fetches words from the combined library and parses the phrase to obtain the feature word. After parsing the first phrase, it parses the last phrase in a similar way to obtain the proposal word. Once the whole comment has been parsed, the module uses the database interface to store the stock ID, the stock name, the stock's feature word, and the stock comment in a database table.

(4) Design and implementation of the module that extracts stock comment information from the Internet. The thesis first describes the overall design of the module and gives a diagram of the layered system structure. Then, with a flowchart of the web spider's crawling process, it introduces the information extraction process and how the control module manages the crawling processes. Finally, it describes the design of the stock comment feature library and the main pseudocode for extracting feature words from a stock comment.
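The matching procedure in step (3) can be illustrated with a minimal Python sketch. It is not the thesis's pseudocode, which is not reproduced in this abstract. The library contents, the function names (`split_phrases`, `match_words`, `parse_comment`), the punctuation-based phrase splitting, and the plain substring matching are all assumptions made for illustration, with English placeholder words standing in for the real feature words.

```python
import re

# Hypothetical library contents, for illustration only; the real libraries
# are built from an analysis of a large number of stock comments.
FEATURE_LIBRARY = ["rising", "falling", "volatile", "rebound"]
COMBINED_LIBRARY = ["high-level consolidation", "bottoming out"]
PROPOSAL_LIBRARY = ["buy", "hold", "sell"]


def split_phrases(comment):
    """Split a comment into phrases on commas, semicolons, and periods.

    The delimiter set is an assumption; the abstract does not say how
    phrases are separated.
    """
    parts = re.split(r"[,;.]", comment)
    return [p.strip() for p in parts if p.strip()]


def match_words(phrase, library, limit):
    """Try each library word in turn and return up to `limit` matches."""
    found = []
    for word in library:
        if word in phrase:  # plain substring match (an assumption)
            found.append(word)
            if len(found) == limit:
                break
    return found


def parse_comment(comment):
    """Extract feature words from the first phrase and a proposal word from the last."""
    phrases = split_phrases(comment.lower())
    if not phrases:
        return {"features": [], "proposal": None}
    first, last = phrases[0], phrases[-1]

    # Look for up to two single feature words in the first phrase.
    features = match_words(first, FEATURE_LIBRARY, limit=2)
    if not features:
        # Fall back to the combined library when no single word matches.
        features = match_words(first, COMBINED_LIBRARY, limit=1)

    # The last phrase is parsed the same way against the proposal library.
    proposals = match_words(last, PROPOSAL_LIBRARY, limit=1)
    return {"features": features, "proposal": proposals[0] if proposals else None}
```

With these placeholder libraries, `parse_comment("Volatile but rising this week, volume is shrinking, investors may hold.")` returns `{"features": ["rising", "volatile"], "proposal": "hold"}`. Matches come back in library order rather than in the order they appear in the phrase. Storing the result through a database interface is left out of the sketch.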

Keywords/Search Tags:

stock comments on the Internet, semantics, web spider, information extraction, stock comment features

Related items

1	Study On Reasoning Of Stockcomment Information And Its System Architecture Based On Semantic
2	Application And Research Of Information Extraction And Topic Spider For Criminal Investigation Web Pages
3	Research And Implementation Of A Web Information Extraction System Based On Semantic Structure Of The Website
4	Based The Multidimensional Semantics Internet Drug Information Extraction Research Applications
5	User Web Information Collection And Analysis System Based On The Smart Router
6	Research On Chinese Feature-based Semantic Relation Extraction Between Named Entities
7	Semantic Feature Extraction Algorithm, The Contents Of Text Classification
8	Research And Implementation Of Web Topical Information Extraction Method With Semantic Consideration
9	Design And Research Of Network Spider
10	Research On The Techniques Of Semantic-based Internet Information Analysis