Commercial Information Mining From Competitors' Websites

Posted on: 2012-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: J C Liu
GTID: 2218330338970523
Subject: Computer software and theory
Abstract/Summary:
As a popular communication channel, the Web has attracted a growing number of companies to publish their information online. With more competitor information publicly available, a company has an opportunity to learn about its competitors and to gain business intelligence and competitive advantage. However, finding valuable information on competitors' websites is not easy, for two reasons. First, the number of web pages is so large that seeking such information manually is infeasible. Second, patterns and hidden relationships between entities cannot be found without a collective analysis.

Information retrieval applications, especially web search engines, can be designed to overcome the first difficulty. But to use a search engine, a user must know his or her information need in order to formulate a query. When the goal is to find unexpected information, that need remains unclear until the results are presented. Search engines also lack analysis of the retrieved documents, so patterns across documents cannot be found easily. The ability of current search tools to obtain business intelligence is very limited.

We propose a number of concepts and methods for mining commercial information that is unexpected to users from competitors' websites. For example, we use the user's own company website as the user's background knowledge of the business shared with competitors. This is reasonable because similar companies can be assumed to publish similar basic information, and people in the same business know the basics of that business. We then compare the user's website with a competitor's website to mine various kinds of unexpected information, according to the type of user request. The core technique in these methods is to mine the keywords of web pages and count the raw frequency of each keyword.
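The keyword comparison described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: the function names, the regular-expression tokenizer (English text only), and the sample pages are all assumptions made for illustration. The sketch counts raw keyword frequencies on both sites and ranks competitor keywords that never appear on the user's own site.

```python
import re
from collections import Counter


def keyword_counts(pages):
    """Count the raw frequency of each keyword over a list of page texts."""
    counts = Counter()
    for text in pages:
        # Hypothetical tokenizer: runs of ASCII letters, lowercased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def unexpected_keywords(user_pages, competitor_pages, top_n=5):
    """Rank competitor keywords that are absent from the user's own site."""
    user = keyword_counts(user_pages)
    competitor = keyword_counts(competitor_pages)
    novel = {kw: n for kw, n in competitor.items() if kw not in user}
    # Highest raw frequency first; ties are broken alphabetically.
    return sorted(novel.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up sample pages, for illustration only.
user_site = ["We sell laptops and tablets", "Laptops ship worldwide"]
rival_site = ["We sell laptops and drones", "Drones ship worldwide with warranty"]
print(unexpected_keywords(user_site, rival_site))
```

In this toy example the function word "with" is also reported, because the sketch applies no stop-word filtering; a practical system would need a further filtering and ranking step.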
We then analyze, compare, and rank the results to filter out useless information and keep what the user wants.

Compared with English websites, however, Chinese websites are harder to mine, because Chinese sentences have no explicit delimiter between words. The first step of mining is therefore to segment the Chinese characters and identify each word in a sentence. Dictionary-free Chinese word extraction based on statistical information is good at identifying unlisted words (such as person names, place names, and company names) and technical terms in professional fields. In this paper, we study an automatic, dictionary-free Chinese word segmentation method based on suffix arrays. We improve the method for counting the frequency of candidate words, and we reduce the number of checks for whether two candidate words have a parent-child (father-and-son) relationship when filtering the candidate words. We also introduce three typical dictionary mechanisms and design a new one, the First Two Hash Sub-PATRICIA tree. The new mechanism combines the advantages of the existing linear dictionary mechanism and of the PATRICIA tree mechanism, and it can improve the speed of word segmentation.
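The general idea of suffix-array-based candidate extraction can be sketched as follows. This is a naive illustration under stated assumptions, not the improved counting or filtering method of the thesis: the function names, the `max_len` and `min_freq` parameters, and the sample string are hypothetical. Candidates are repeated substrings, and a shorter candidate is dropped when a longer candidate contains it with the same frequency (a simple reading of what the abstract calls the father-and-son relationship).

```python
from itertools import groupby


def suffix_array(text):
    """Naive suffix array: start offsets sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def candidate_words(text, max_len=4, min_freq=2):
    """Count substrings of length 2..max_len that occur at least min_freq times."""
    sa = suffix_array(text)
    counts = {}
    for k in range(2, max_len + 1):
        # Suffixes sharing a k-character prefix are adjacent in the suffix array,
        # so equal prefixes form consecutive runs that groupby can count.
        prefixes = [text[i:i + k] for i in sa if i + k <= len(text)]
        for prefix, group in groupby(prefixes):
            freq = sum(1 for _ in group)
            if freq >= min_freq:
                counts[prefix] = freq
    return counts


def drop_subsumed(counts):
    """Drop a candidate when a longer candidate contains it with equal frequency."""
    return {
        word: freq
        for word, freq in counts.items()
        if not any(
            word != longer and word in longer and freq == other
            for longer, other in counts.items()
        )
    }


# Made-up sample: "data mining and data mining with data" without spaces.
sample = "数据挖掘和数据挖掘与数据"
print(drop_subsumed(candidate_words(sample)))
```

Here the filter compares every pair of candidates, which is quadratic in the number of candidates; reducing the number of such parent-child checks is one of the improvements the abstract reports.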
Keywords/Search Tags: Text Mining, Competitor, Commercial Information, Association Rule, Word Segmentation