The Design of a Negative Information Collection and Retrieval System for Listed Companies

Posted on: 2014-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: L B Xu
Full Text: PDF
GTID: 2308330464464357
Subject: Software engineering
Abstract/Summary:
With the rapid growth of the Internet, it has become difficult to extract useful information from large volumes of data. Against this background, data mining technology has developed rapidly since the 1990s. The field draws on several areas of computer science, such as machine learning and statistical analysis. It helps people extract useful information from large amounts of data effectively, and then analyse and use that information to make decisions objectively.

The system described here adopts data mining technology and can be applied to all kinds of websites to gather negative information. In this thesis, the system is designed for the Eastern Wealth stock forum and gathers negative information about a given listed company. It implements the whole process of webpage collection, preprocessing, word segmentation, text tendency analysis and full-text search. Its main functions are as follows:

1. Webpage collection: Download webpages from the Eastern Wealth stock forum and save them in a local folder.
2. Webpage preprocessing: Remove useless tags from each webpage and extract the main body text.
3. Word segmentation: As a precondition of data mining, the text is segmented into words before negative information is judged.
4. Negative information judgement: Identify negative information using text classification technology, then save the texts that contain negative information in a local folder.
5. User search: By entering the stock code of a listed company in the forum, users can browse the negative information about that company in the browser.

Besides system design and implementation, the thesis analyses and compares several text classification algorithms and adopts the one with high precision to implement the negative information judgement function.
Finally, the thesis summarizes the results of the project and looks ahead to related technologies and further research.
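The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything in it is an assumption made for illustration: the abstract does not name the classification algorithm that was adopted, so a small multinomial naive Bayes classifier stands in for it; `segment` is a placeholder that splits on Latin letters and digits, whereas Chinese forum posts would need a real word segmenter; and the names `extract_text`, `NaiveBayes`, `NegativeIndex` and `process_page` are hypothetical. Webpage downloading is left out so that the sketch runs without network access.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Preprocessing: strip tags and return the remaining text."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def segment(text):
    """Placeholder segmentation (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, tokens, label):
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class NegativeIndex:
    """Stores negative texts per stock code so users can look them up."""

    def __init__(self):
        self.by_code = defaultdict(list)

    def add(self, stock_code, text):
        self.by_code[stock_code].append(text)

    def search(self, stock_code):
        return list(self.by_code.get(stock_code, []))


def process_page(html, stock_code, classifier, index):
    """Run preprocessing, segmentation and classification for one page."""
    text = extract_text(html)
    label = classifier.predict(segment(text))
    if label == "negative":
        index.add(stock_code, text)
    return label
```

To use the sketch, train the classifier on a few labelled token lists (for example with the labels "negative" and "neutral"), pass each downloaded page to `process_page` together with its stock code, and call `NegativeIndex.search` with that code to retrieve the stored negative texts. A real system would replace the in-memory index with a full-text search engine and the placeholder tokenizer with a proper segmenter.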
Keywords: Search engine, Web crawling, Negative information, Word segmentation, Full-text search