
Study Of Extracting Stock Comments On The Internet Based On Semantic

Posted on: 2012-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: W T Sun
Full Text: PDF
GTID: 2178330335952254
Subject: Computer Science and Technology
Abstract/Summary:
Many financial websites publish stock information every day. These sites carry a huge volume of data, much of it redundant or confusing, so ordinary users must spend considerable time finding the stock comments they want. This thesis analyses domestic stock comments on the Internet and the technology of information extraction, and studies how to extract stock comments from the Internet. The work is as follows:

(1) Extracting information from web pages with a web spider. The spider first finds the URLs on a web page that contain the expected information and puts them into a queue to await extraction. It then downloads the page for each URL in turn and analyses the structure of the HTML document to find the stock information.

(2) Establishing the stock feature libraries. First, analysis of a large number of stock comments shows that certain feature words occur frequently and express the characteristics of the stock trend. Second, the first phrase of a stock comment describes the stock feature, and the last phrase gives the proposed operation on the stock. Analysing the first phrase yields several feature words. A stock feature can be described by one feature word or by a combination of two feature words, so two libraries are needed: the feature library and the combined library. Analysing the last phrase yields the proposed words, which are used to build the proposed library. Finally, the three libraries are discussed.

(3) Parsing stock comment information using the established feature libraries. The extraction module fetches one feature word at a time from the feature library and matches it against the stock comment, obtaining the first feature word and the second one. If no word from the feature library matches, the module fetches words from the combined library and parses the phrase to obtain the feature word. After parsing the first phrase, it parses the last phrase to obtain the proposed word, by a process similar to that used for the first phrase. Once the whole comment has been parsed, the module uses the database interface to store the stock ID, the stock name, the feature word, and the stock comment in a database table.

(4) Designing and implementing the module that extracts stock comment information from the Internet. The thesis first describes the overall design of the module and gives a diagram of the layered system structure. Then, using a flow chart of the web spider's crawling process, it introduces the information extraction process and how the control module manages the crawling processes. Finally, it describes the design of the stock comment feature library and the main pseudo-code for extracting feature words from stock comments.
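
The parsing step in (3) can be illustrated with a minimal Python sketch. It is not the thesis's own pseudo-code, which is not reproduced in this abstract. The library contents, the punctuation-based phrase splitting, the plain substring matching, the modelling of a combined-library entry as an expression mapped to two feature words, and the names `parse_comment`, `store`, and `stock_comment` are all assumptions made for illustration. The thesis works on Chinese stock comments, so the English entries are placeholders only.

```python
import re
import sqlite3

# Hypothetical sample entries; the thesis does not list its actual libraries.
FEATURE_LIBRARY = ["uptrend", "downtrend", "rebound", "consolidation"]
# Assumption: a combined entry maps one expression to two feature words.
COMBINED_LIBRARY = {"volume breakout": ("volume", "breakout")}
PROPOSED_LIBRARY = ["buy", "hold", "sell"]


def split_phrases(comment):
    """Split a comment into phrases on common punctuation (an assumption)."""
    parts = re.split(r"[,;.!?，；。！？]", comment)
    return [p.strip() for p in parts if p.strip()]


def match_words(phrase, library, limit):
    """Try each library word in turn; return hits in order of appearance."""
    hits = [(phrase.find(word), word) for word in library if word in phrase]
    return [word for _, word in sorted(hits)][:limit]


def extract_features(phrase):
    """Match the feature library first, then fall back to the combined library."""
    words = match_words(phrase, FEATURE_LIBRARY, limit=2)
    if words:
        return words
    for expression, pair in COMBINED_LIBRARY.items():
        if expression in phrase:
            return list(pair)
    return []


def parse_comment(comment):
    """Return (feature words from the first phrase, proposed word from the last)."""
    phrases = split_phrases(comment.lower())
    if not phrases:
        return [], None
    features = extract_features(phrases[0])
    proposed = match_words(phrases[-1], PROPOSED_LIBRARY, limit=1)
    return features, (proposed[0] if proposed else None)


def store(conn, stock_id, stock_name, comment):
    """Parse one comment and save the result in a hypothetical table."""
    features, proposed = parse_comment(comment)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock_comment "
        "(stock_id TEXT, stock_name TEXT, features TEXT, proposed TEXT, comment TEXT)"
    )
    conn.execute(
        "INSERT INTO stock_comment VALUES (?, ?, ?, ?, ?)",
        (stock_id, stock_name, ",".join(features), proposed, comment),
    )
    conn.commit()
    return features, proposed
```

Calling `store(sqlite3.connect(":memory:"), "000001", "Example Stock", comment)` with a made-up ID and name parses the comment and writes one row. Substring matching is the simplest possible stand-in for the matching step. A real system for Chinese text would likely need word segmentation first, which this sketch does not attempt.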
Keywords/Search Tags: stock comments on the Internet, semantics, web spider, information extraction, stock comment features