
An Intelligence Collecting Platform Based on Web-Page Analysis for Corporate Competitive Intelligence

Posted on: 2007-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Zhu
GTID: 2208360185953688
Subject: Computer application technology
Abstract/Summary:
With China's accession to the WTO and the accelerating integration of the world economy, the competitive market environment has changed dramatically. Corporate decision-makers can no longer rely on intuition and instinct to make business decisions. Correct decisions require frequent analysis of competitors and a timely understanding of their situation, so a well-developed competitive intelligence collection system has become very important.
With the rapid development of the Internet, the Web stores a massive amount of knowledge and has become a huge global knowledge repository. Obtaining information from the Web is now one of the main ways people acquire knowledge. At the same time, the number of corporate websites keeps growing, which makes it possible to collect, understand, and study competitors' situations effectively through their websites.
The Web is organized mainly as HTML pages, a semi-structured format. Web pages are often poorly structured, hyperlinks are arbitrary and disorganized, and Web content is massive, diverse, and constantly changing; together these characteristics create unavoidable difficulties for the people who use it.
To address these problems, this paper builds on the topic-based Information Collection, Classification and Resource Management Platform. It describes the platform's architecture and the function of each part, with emphasis on the Web Page Collection Module and the VSM Web Page Classification Module. To implement these two modules, it introduces techniques such as Bot-based web page grabbing and HTMLParser-based web page parsing.
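The parsing step can be illustrated with a small sketch. The thesis names the HTMLParser tool but the abstract gives no code, so the following is an assumption-laden stand-in written with Python's standard-library `html.parser.HTMLParser`, which is a different component that merely shares the name. The class `PageFieldParser` and the function `parse_page` are hypothetical. The sketch pulls out the four fields the platform stores (title, Meta tag content, anchor text, and main text), skips `<script>` and `<style>` bodies, and keeps link text out of the main text as a crude placeholder for real main-content extraction.

```python
from html.parser import HTMLParser


class PageFieldParser(HTMLParser):
    """Hypothetical helper: collects the title, <meta name/content> pairs,
    anchor texts and remaining visible text of one HTML page."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []   # raw text found inside <title>
        self.meta = {}          # lower-cased meta name -> content
        self.anchors = []       # (href, anchor text) pairs
        self.text_parts = []    # visible text outside <title> and links
        self._in_title = False
        self._skip_depth = 0    # > 0 while inside <script> or <style>
        self._href = None       # href of the <a> element currently open
        self._anchor_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._anchor_parts = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._anchor_parts).split())
            self.anchors.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return                               # ignore script/style bodies
        if self._in_title:
            self.title_parts.append(data)
        elif self._href is not None:
            self._anchor_parts.append(data)      # link text kept as anchor text only
        elif data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return the title, meta tags, anchor texts and main text of one page."""
    parser = PageFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join("".join(parser.title_parts).split()),
        "meta": parser.meta,
        "anchors": parser.anchors,
        "main_text": " ".join(parser.text_parts),
    }
```

Keeping anchor text separate from the main text mirrors the abstract's list of stored fields. A production parser would also need charset detection for Chinese pages and a real noise-removal step for navigation bars and advertisements.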
The platform collects HTML pages from the Internet, parses each page's main content, title, Meta tag content, and anchor text, and stores this information in the corresponding locations on the hard disk; it then uses this information to classify the pages with the VSM (vector space model) method. This lays a foundation for further research on corporate competitive intelligence collection systems.
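A minimal sketch of VSM classification follows. The abstract does not state the term weighting or decision rule, so this assumes TF-IDF weights with a smoothed IDF, one centroid vector per category (a Rocchio-style classifier), and assignment to the category with the highest cosine similarity. `CentroidVSMClassifier`, `tokenize`, and `normalize` are hypothetical names, not the platform's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased alphanumeric tokens; Chinese pages would first
    # need a word-segmentation step, which is not shown here.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class CentroidVSMClassifier:
    """Hypothetical VSM classifier: TF-IDF page vectors, one centroid per
    category, prediction by highest cosine similarity."""

    def fit(self, docs, labels):
        token_lists = [tokenize(d) for d in docs]
        n_docs = len(token_lists)
        df = Counter()
        for tokens in token_lists:
            df.update(set(tokens))               # document frequency per term
        # Smoothed IDF: terms occurring in every document keep a weight > 0.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        sums = {}
        for tokens, label in zip(token_lists, labels):
            acc = sums.setdefault(label, Counter())
            acc.update(self.vectorize(tokens))   # adds the float weights
        # Normalising the summed vectors gives each category centroid's direction.
        self.centroids = {label: normalize(acc) for label, acc in sums.items()}
        return self

    def vectorize(self, tokens):
        """Unit-length TF-IDF vector; terms unseen in training are dropped."""
        tf = Counter(t for t in tokens if t in self.idf)
        return normalize({t: c * self.idf[t] for t, c in tf.items()})

    def predict(self, text):
        vec = self.vectorize(tokenize(text))

        def cosine(centroid):
            # Both vectors have unit length, so the dot product is the cosine.
            return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

        return max(self.centroids, key=lambda label: cosine(self.centroids[label]))
```

The text fed to `fit` and `predict` could be the concatenated fields from the parsing step. A k-nearest-neighbour rule over the same TF-IDF vectors is another common VSM decision rule; the abstract does not say which one the platform uses.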
Keywords/Search Tags: Web page grabbing, Web page parsing, Text classifier, Web page classifier, VSM classifier