Commercial Information Mining From Competitors' Websites

Posted on:2012-12-02

Degree:Master

Type:Thesis

Country:China

Candidate:J C Liu

Full Text:PDF

GTID:2218330338970523

Subject:Computer software and theory

Abstract/Summary:

As a popular communication channel, the Web has attracted a growing number of companies to publish their information online. With more competitors' information publicly available, a company has an opportunity to learn more about its competitors and to gain business intelligence and competitive advantage. However, finding valuable information on competitors' websites is not easy: first, the number of web pages is so large that seeking such information manually is not feasible; second, patterns and hidden relationships between entities cannot be found without a collective analysis.

Information retrieval applications, especially web search engines, can be designed to overcome the first difficulty. But to use a search engine, users must know their information need well enough to formulate a query. When the goal is to find unexpected information, that need remains unclear until the results are presented. Search engines also lack analysis of the retrieved documents, so patterns across documents cannot be found easily. The ability of current search tools to obtain business intelligence is therefore very limited.

We propose a number of concepts and methods for mining commercial information that is unexpected to users from competitors' websites. For example, we use the user's own company website as the user's background knowledge about competitors in the same business. This is justified because it is reasonable to assume that similar companies publish similar basic information and that people in the same business know the basics of that business. We then compare the user's website with the competitor's website to mine various kinds of unexpected information, corresponding to various types of user requests. The core technique applied in these methods is mining the keywords of web pages and counting the raw frequency of each keyword.
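
The keyword-frequency comparison described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the regex tokenizer (which only handles space-delimited English text), and the scoring rule (competitor count minus the user's own count) are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def keyword_counts(pages):
    """Count raw keyword frequencies over a list of page texts."""
    counts = Counter()
    for text in pages:
        # Assumption: a keyword is any run of ASCII letters, lowercased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def unexpected_keywords(user_pages, competitor_pages, top_n=5):
    """Rank competitor keywords that the user's own site does not explain.

    Hypothetical score: competitor frequency minus user frequency.
    Keywords with a non-positive score are treated as expected and dropped.
    """
    user = keyword_counts(user_pages)
    comp = keyword_counts(competitor_pages)
    scored = [(word, n - user.get(word, 0)) for word, n in comp.items()]
    scored = [(word, score) for word, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Calling `unexpected_keywords(user_pages, competitor_pages)` with two lists of page texts returns the competitor's keywords ranked by how strongly they exceed the user's own usage; words that both sites use equally often score zero and are filtered out.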
We then analyze the results, comparing and ranking them to filter out useless information and obtain what the user wants.

Compared with English websites, however, mining Chinese websites is harder because Chinese sentences have no explicit delimiter between words. The first step of mining is therefore to segment the Chinese character sequence and identify each word in a sentence. Dictionary-free Chinese word extraction based on statistical information is good at identifying unlisted words (such as person names, place names, and company names) and technical terms in specialized fields. In this thesis, we study an automatic, dictionary-free Chinese word segmentation method based on suffix arrays; we improve the method for counting the frequency of candidate words and reduce the number of checks of whether two candidate words have a parent-child (father-and-son) relationship when filtering candidates. We also introduce three typical dictionary mechanisms and design a new one, the First Two Hash Sub-PATRICIA tree. The new mechanism combines the advantages of the existing linear dictionary mechanism with those of the PATRICIA tree mechanism, and it can improve the speed of word segmentation.
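
As a rough illustration of suffix-array-based candidate extraction, the sketch below counts repeated character strings and then filters nested candidates. It is an assumption-laden toy, not the thesis's improved method: the suffix array is built naively by sorting, the `min_freq` and `max_len` parameters are hypothetical, and the parent-child relationship is interpreted here as "a shorter candidate is contained in a longer candidate with the same frequency".

```python
from itertools import groupby


def build_suffix_array(text):
    """Naive suffix array: start offsets ordered by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def candidate_words(text, min_freq=2, max_len=4):
    """Count every string of 2..max_len characters occurring >= min_freq times."""
    sa = build_suffix_array(text)
    counts = {}
    for k in range(2, max_len + 1):
        # Suffixes sharing a length-k prefix sit next to each other in the
        # suffix array, so one linear pass with groupby counts them.
        prefixes = [text[i:i + k] for i in sa if i + k <= len(text)]
        for prefix, group in groupby(prefixes):
            n = sum(1 for _ in group)
            if n >= min_freq:
                counts[prefix] = n
    return counts


def drop_subsumed(counts):
    """Drop a candidate contained in a longer candidate of equal frequency.

    Assumed reading of the parent-child filter: such a shorter string never
    occurs on its own, so it is unlikely to be a word by itself.
    """
    kept = {}
    for word, n in counts.items():
        subsumed = any(
            len(other) > len(word) and word in other and m == n
            for other, m in counts.items()
        )
        if not subsumed:
            kept[word] = n
    return kept
```

For the input `"北京大学在北京北京大学很大"`, `drop_subsumed(candidate_words(...))` keeps `北京` (3 occurrences) and `北京大学` (2 occurrences) and discards fragments such as `京大` and `京大学`. The pairwise check in `drop_subsumed` is quadratic in the number of candidates, which is the kind of cost the thesis reports reducing; this sketch makes no attempt to reproduce that optimization.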

Keywords/Search Tags:

Text Mining, Competitor, Commercial Information, Association Rule, Word Segmentation

Related items

1	Study On The System Of Chinese Automatic Word Segmentation Based On Text Information Of BBS
2	The Research And Application Of Text Association Rule Mining Method
3	The Key Techniques Research On Text Mining
4	Building Of Classification Method And Classifier About Text Complaints Information Based On Association Rules
5	Research On Mining Algorithm Of Association Rule And Its Application For Biological Data
6	Research And Implementation Of Text Mining Technology Based On Public Security Information
7	Text-oriented Disciplines Correlation Analysis Association Rule Mining Technology Research
8	The Design And Implementation Of Text Topic Key Word Processing System Based Chinese Word Segmentation
9	Association Rule Mining Expansion Of Research In The Area Of Disaggregated Data
10	Studies And Applications Of Association Rule Mining Methods In Data Mining