The Study Of Topic-Oriented IT News With Search Engine And Web Page Analysing

Posted on: 2011-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhao
Full Text: PDF
GTID: 2218330338965268
Subject: Computer application technology
Abstract/Summary:
News is an important source of information for people's daily life and entertainment. Influential events covered as topic-oriented news ('News-Series') are both interesting and informative. Here, 'News' stands for freshness and 'Series' indicates features. The Internet has recently become an important platform for news and other information, and among the many kinds of news that spread rapidly online, IT news plays an essential role. Acquiring this knowledge manually is increasingly difficult. With the explosion of Internet information, search engines such as Google have developed rapidly, and they make it easy to retrieve information, including news, from the Internet. The key question of this thesis is therefore how to uncover the hidden meaning of a 'News-Series' and the information related to it.

The process of extracting topic-oriented IT news is essentially web information extraction, or text mining. Based on an analysis of search engines driven by user-interest searches, and on samples of hot IT news events from 2009, we build a model named the Trade-role Model. Mining the topic-oriented news follows two steps. First, we study the characteristics of Google and other search engines. This step is the basis of the research: the quality of the extraction directly determines the success or failure of the follow-up work. According to these characteristics, we use a self-written program to extract information and reuse the search results, extracting the series of URLs in the search results by comparison against the Trade-role Model. Second, we analyse the HTML pages retrieved from the URLs obtained in the first step, convert the HTML pages to text files, and extract the topic-oriented IT news text from them with a text mining program. The text mining is based on the Trade-role Model and works by analysing and comparing paragraphs of the text. Finally, the program narrows the text down to IT-related news.
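The second step (HTML page to text, then paragraph-level filtering) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the Trade-role Model, so a plain keyword match stands in for it, and the names `IT_TERMS`, `ParagraphExtractor`, `is_it_related`, and `extract_it_news` are hypothetical rather than taken from the thesis.

```python
import re
from html.parser import HTMLParser

# Hypothetical keyword list. The Trade-role Model is not described in the
# abstract, so simple keyword matching is used here as a stand-in.
IT_TERMS = {"google", "software", "internet", "microsoft", "chip"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None   # text pieces of the <p> currently open, or None
        self._skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self.flush()       # an unclosed <p> ends when the next one starts
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._buf is not None and not self._skip:
            self._buf.append(data)

    def flush(self):
        """Store the open paragraph (whitespace-normalised) if it has text."""
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._buf = None


def html_to_paragraphs(html):
    """Convert an HTML page to a list of plain-text paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.paragraphs


def is_it_related(paragraph, terms=IT_TERMS, min_hits=1):
    """Keyword stand-in for the model comparison: count distinct term hits."""
    words = set(re.findall(r"[a-z0-9]+", paragraph.lower()))
    return len(words & terms) >= min_hits


def extract_it_news(html):
    """Keep only the paragraphs that look IT-related."""
    return [p for p in html_to_paragraphs(html) if is_it_related(p)]


page = (
    "<html><body><script>var google = 1;</script>"
    "<p>Google released new <b>software</b> today.</p>"
    "<p>The weather was sunny.</p></body></html>"
)
print(extract_it_news(page))
```

Filtering paragraph by paragraph mirrors the abstract's description of comparing paragraphs of the text; a real system would replace `is_it_related` with whatever comparison the Trade-role Model defines.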
In the mining experiments, the average precision is 90.2% and the average recall is 72.8%. Because searching for and extracting news manually costs people a great deal of time and energy, a search engine combined with a mining program that is both efficient and accurate can accelerate the spread of news. This is the purpose and value of this topic.
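For readers unfamiliar with the two measures reported above, the sketch below shows how precision and recall are conventionally computed for one extraction run. The function name `precision_recall` and the sample item identifiers are hypothetical and purely illustrative; they are not data from the thesis, and the abstract does not say how its averages were taken.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for one extraction run.

    precision = correctly extracted items / all extracted items
    recall    = correctly extracted items / all relevant items
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative identifiers only: 4 items extracted, 5 truly relevant, 3 shared.
extracted_items = ["a", "b", "c", "d"]
relevant_items = ["a", "b", "c", "e", "f"]
print(precision_recall(extracted_items, relevant_items))
```

A precision higher than recall, as reported in the abstract, means that most of what the program extracts is correct, while some relevant news is missed.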
Keywords/Search Tags: topic-oriented IT news, search engine, trade-role model, text mining