
Research On Web-based Collection Technique For Enterprise Competitive Intelligence

Posted on: 2013-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Sun
Full Text: PDF
GTID: 2248330371497508
Subject: Systems analysis and integration
Abstract/Summary:
With the rapid development of information technology, a growing number of shared resources on the Web are available to users and enterprises, which opens new opportunities for intelligence gathering. At the same time, it confronts some enterprises with new challenges. How to collect timely and reliable information resources from the Internet effectively has become a research focus among business users. General-purpose search engines (SE) meet the needs of most users, but they cannot guarantee the timeliness of collected pages or support personalized searching when used for enterprise intelligence gathering. This paper uses the Internet as a channel for collecting intelligence and Web mining techniques as the means of information access, and designs and implements an automated intelligence platform that enterprises can use to carry out competitive intelligence. Ultimately, our purpose is to improve the efficiency of intelligence work in the Web information age, to find new opportunities in the market, and to reduce business costs. In this paper, we research and design an automated business competitive intelligence gathering system. The system aims to solve some of the problems that intelligence agents face when gathering Web documents, and to provide decision support for senior managers.
The specific work is as follows:
(1) First, this paper discusses the practical importance for enterprises of exploiting competitive intelligence (CI) in a global context, and points out the shortcomings of existing competitive intelligence software in domestic and overseas markets.
(2) Second, we introduce the principles of the topic crawler and study in depth a series of key technologies, including crawler seed customization, Web document pre-processing, character encoding, Chinese word segmentation, and page formatting.
(3) Third, we study the architecture of the topic crawler in depth and optimize the crawler structure according to the features of third-party portal Web pages.
(4) Fourth, we use an improved TF-IDF method based on high-quality data sources to extract thematic words from Web documents. Our results show that the improved algorithm achieves higher accuracy and a higher recall rate at much lower cost.
(5) Finally, we designed and developed an automatic medicine-oriented intelligence collection system. The system can customize competitors' web pages, collect information from authoritative portal websites, and present intelligence reports to intelligence agents in a specified format.
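To make the thematic-word extraction step concrete, the following is a minimal sketch of the standard (unimproved) TF-IDF weighting that the thesis takes as its starting point. The abstract does not describe the improved variant, so it is not reproduced here. The function name tf_idf_keywords, the top_k parameter, and the sample documents are hypothetical, and the input is assumed to be already tokenized (for Chinese text, after word segmentation).

```python
import math
from collections import Counter


def tf_idf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring terms for each tokenized document.

    Plain TF-IDF only: tf = term count / document length,
    idf = log(N / document frequency). This is an illustrative baseline,
    not the improved method from the thesis.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for stability.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


if __name__ == "__main__":
    # Hypothetical, already-segmented documents.
    sample = [
        ["drug", "trial", "drug", "market"],
        ["market", "price", "report"],
        ["market", "trial", "news"],
    ]
    print(tf_idf_keywords(sample, top_k=2))
```

A term that appears in every document (here "market") receives an IDF of zero and drops out of the ranking, which is why the method favors words that distinguish one page from the rest of the collection.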
Keywords/Search Tags: Competitive Intelligence, Web Information Extraction, Topic Crawler, Web Page Formatting