Research On Enterprise Competitive Intelligence Acquisition Based On Web Information Extraction

Posted on:2016-05-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y G He

GTID:2208330464463531

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development and spread of the Internet, the network has become an indispensable part of people's lives. It carries many kinds of information, which are delivered to users through webpages, and this rich webpage content offers a new source of intelligence for an Enterprise Competitive Intelligence System (ECIS). The purpose of this thesis is to develop a general method for acquiring enterprise competitive intelligence. Building on a study of existing web information extraction techniques, the thesis proposes a new web information extraction algorithm based on the DOM tree and the DBSCAN clustering algorithm, and then designs and constructs a model of enterprise competitive intelligence acquisition based on web information extraction.

First, the thesis gives a comprehensive and systematic review of the current state of web information extraction and enterprise competitive intelligence, and discusses the basic theory of ECIS and of competitive intelligence acquisition. It then analyzes the web data processing technologies used in this work: web crawling, webpage parsing with Jsoup, the DOM, and the DBSCAN algorithm. After that, the basic concepts, techniques, and evaluation criteria of web information extraction are introduced in detail.

Second, by studying the general regularities behind the varied and frequently changing structures of webpages on the Internet, the thesis presents a new web information extraction algorithm that combines the DOM tree with DBSCAN. Each part of the algorithm is described in detail: webpage preprocessing, DOM tree construction and acquisition of segmented text content, and webpage content extraction based on DBSCAN. Experimental results show that the algorithm extracts the main text of a webpage effectively.
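The DOM-plus-DBSCAN idea can be illustrated with a small sketch. The abstract does not state the thesis's features or parameters, so everything here is an assumption: Python's standard `html.parser` stands in for Jsoup, each text segment is described by its position in reading order and its nesting depth in the tag tree, and the cluster holding the most text is taken as the main content. The names `SegmentCollector`, `dbscan`, `extract_main_text` and the values `eps=1.5`, `min_pts=3` are hypothetical, chosen only for illustration.

```python
import math
from html.parser import HTMLParser

# Preprocessing assumption: text inside these tags is noise, not content.
SKIP_TAGS = {"script", "style", "noscript"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SegmentCollector(HTMLParser):
    """Walks the tag tree and records every non-empty text node as
    (order, depth, text), where depth is its nesting level in the DOM."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, i.e. the DOM path
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP_TAGS.intersection(self.stack):
            self.segments.append((len(self.segments), len(self.stack), text))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # noise for now; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
    return labels


def extract_main_text(html, eps=1.5, min_pts=3):
    parser = SegmentCollector()
    parser.feed(html)
    parser.close()
    segs = parser.segments
    if not segs:
        return ""
    # Feature vector per segment: (order, depth). Assumption: main-text
    # segments are adjacent in reading order and sit at similar depths.
    labels = dbscan([(order, depth) for order, depth, _ in segs], eps, min_pts)
    # Score each cluster by its total text length and keep the best one.
    scores = {}
    for (_, _, text), label in zip(segs, labels):
        if label >= 0:
            scores[label] = scores.get(label, 0) + len(text)
    if not scores:
        return ""
    best = max(scores, key=scores.get)
    return "\n".join(t for (_, _, t), label in zip(segs, labels) if label == best)


page = """
<html><body>
<div id="nav"><ul><li><a>Home</a></li><li><a>News</a></li><li><a>About</a></li></ul></div>
<div id="main"><h1>Quarterly report</h1>
<p>Company A released a new product line.</p>
<p>Its prices are ten percent lower than last year.</p>
<p>Analysts expect market share to grow.</p></div>
<div id="footer"><p><span><small>Copyright 2016</small></span></p></div>
</body></html>
"""
print(extract_main_text(page))
```

With these settings the deeply nested navigation links form their own small cluster, the footer is labelled noise, and the heading plus the three paragraphs form the winning cluster. On real pages `eps` and `min_pts` would need tuning, and richer segment features (for example text or link density) would likely be needed.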
The proposed DOM/DBSCAN algorithm is also highly general, since it does not depend on the structure of any particular webpage.

Finally, for an enterprise in a given industry, a model of enterprise competitive intelligence acquisition is constructed from web crawling, webpage parsing, and the web information extraction algorithm. Starting from a pre-specified website, the model uses a web crawler to collect the URLs of all links on the site. It then filters the webpages by judging whether each page's title is related to the industry, and obtains the main text of the pages that pass the filter. The competitive information is then extracted from that main text according to pre-specified items the enterprise is interested in. Based on this model, a prototype enterprise competitive intelligence acquisition system is designed and implemented. The experimental results indicate that the model of enterprise competitive intelligence acquisition based on web information extraction is valid and achieves a certain degree of correctness.
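The acquisition pipeline described above (crawl, title filter, main-text extraction, focus-item extraction) can be sketched as follows. The abstract does not say how title relevance or the pre-specified focus items are matched, so this sketch assumes simple case-insensitive keyword matching on the title and on sentences of the main text. The `fetch` callback is a hypothetical stand-in for an HTTP download (here a dictionary lookup, so the example runs offline), `strip_tags` is a crude placeholder for the DOM/DBSCAN main-text extractor, and all function names are illustrative.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTitleParser(HTMLParser):
    """Collects <a href> targets and the text of <title>."""

    def __init__(self):
        super().__init__()
        self.links, self.title, self._in_title = [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML or None.
    Returns {url: (title, html)}. A real crawler would also restrict
    links to the pre-specified site's domain."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkTitleParser()
        parser.feed(html)
        parser.close()
        pages[url] = (parser.title.strip(), html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def is_relevant(title, industry_terms):
    """Title filter: keep a page whose title mentions any industry term."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in industry_terms)


def extract_intelligence(text, focus_terms):
    """Return the sentences of the main text that mention a focus item."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if any(t.lower() in s.lower() for t in focus_terms)]


def acquire(start_url, fetch, industry_terms, focus_terms, main_text):
    """Crawl -> filter by title -> main text -> focus-item extraction."""
    results = {}
    for url, (title, html) in crawl(start_url, fetch).items():
        if not is_relevant(title, industry_terms):
            continue
        hits = extract_intelligence(main_text(html), focus_terms)
        if hits:
            results[url] = hits
    return results


# Offline demo: a dictionary plays the role of the pre-specified website.
SITE = {
    "http://example.com/": '<title>Home</title><a href="/steel">Steel</a><a href="/jobs">Jobs</a>',
    "http://example.com/steel": (
        "<title>Steel market news</title>"
        "<p>Rival B cut its price. Weather was mild. Rival B opened a plant.</p>"
    ),
    "http://example.com/jobs": "<title>Jobs</title><p>Rival B is hiring.</p>",
}


def strip_tags(html):
    # Placeholder main-text extractor: concatenate the contents of <p> elements.
    return " ".join(re.findall(r"<p>(.*?)</p>", html, re.S))


report = acquire("http://example.com/", SITE.get, ["steel"], ["Rival B"], strip_tags)
print(report)
```

In the demo the jobs page mentions the competitor but is dropped by the title filter, so only the two matching sentences from the steel page are reported. Keyword matching is the simplest possible relevance test; a production system would plausibly use a richer classifier for both steps.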

Keywords/Search Tags:

DOM Tree, DBSCAN, Web Information Extraction, Enterprise Competitive Intelligence, Competitive Intelligence Acquisition

Related items

1	Web Information Extraction Technology Applied Research, Competitive Intelligence Platform In The Enterprise
2	Research On Web-based Collection Technique For Enterprise Competitive Intelligence
3	Based On The Information Environment Of The Securities Company Model Of Competitive Intelligence Research
4	Based On The Protection Of The Asymmetric Information Theory Of Competitive Intelligence Research
5	Research And Implementation Of Key Technologies Of Data Mining Oriented To Enterprise Competitive Intelligence
6	Extracting Enterprise Competitive Intelligence From The Web
7	The Research On Construction Of Enterprise Competitive Intelligence System Of HS Company
8	Research On The Construction Of Competitive Intelligence System For Enterprise Merger And Acquisition
9	Study On College Library's Competitive Intelligence Service
10	Research On The Construction Of The Cultivation System Of Competitive Intelligence Capability Under The Environment Of "Internet +"