| With the popularity of the Internet and the development of Web 2.0, more and more consumers are accustomed to posting comments online, and the number of comments is growing explosively. Because different consumers generally care about different aspects of a product, the information a given consumer is interested in is likely to be submerged in a sea of other information, and finding it has become increasingly difficult. These comments are of great value to both consumers and businesses, so it has become increasingly important to mine that value and use it to give consumers quick access to the aspects they care about and to support finer-grained services. To this end, product review analysis based on emotion information extraction is necessary and of great significance. In this task, the extraction of product attribute words is a very important part. This study therefore focuses on extracting product attribute words and clustering them into word clusters. In this paper, a product attribute word is also called a feature. A bootstrapping-based feature extraction method for English product comments is proposed. The method starts with a set of extraction patterns as seeds and then applies an incremental, iterative procedure to find new features. In each iteration, the system ranks the new features by a score computed from the closeness ("intimacy") between candidate features and patterns, which helps prevent topic drift. After the features are extracted, a search engine is used to calculate the similarity between features. The features are then clustered by similarity score with the K-Link hierarchical clustering algorithm to obtain the different aspects of the product, and low-scoring clusters are filtered out to remove noise. 
Moreover, to improve the portability of the system, the seed features are replaced by seed patterns, which are selected by association rule mining. The main work and conclusions are as follows: 1) a method for identifying the initial seed patterns based on association rule mining is proposed; 2) a feature extraction framework based on bootstrapping is proposed; 3) a method for calculating the degree of closeness between a feature and a pattern is proposed; 4) a word similarity calculation method based on a search engine is proposed; 5) an improved K-Link hierarchical clustering algorithm for feature clustering is proposed; 6) a bootstrapping-based feature extraction system, called SSPA, is designed and implemented; 7) six groups of feature extraction experiments are designed for comparison. In conclusion, these improvements to the bootstrapping framework improve not only the portability of the system but also the accuracy and recall of feature extraction. Experimental results show that the method extracts features well: precision, recall and F-measure reach 0.819, 0.799 and 0.809, respectively. |
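
The pattern-seeded bootstrapping loop described above can be illustrated with a minimal sketch. It is not the SSPA implementation: the pattern representation (a pair of left and right neighbouring tokens), the scoring rule (the number of distinct patterns that extract a candidate, used here as a simple stand-in for the closeness score between features and patterns), and all function and parameter names are assumptions made for illustration only.

```python
from collections import defaultdict


def extract(sentences, patterns):
    """Map each candidate word to the set of patterns that extracted it.

    A pattern is assumed to be a (left_token, right_token) context pair.
    """
    hits = defaultdict(set)
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            context = (tokens[i - 1], tokens[i + 1])
            if context in patterns:
                hits[tokens[i]].add(context)
    return hits


def induce_patterns(sentences, features):
    """Count the (left, right) contexts in which accepted features occur."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            if tokens[i] in features:
                counts[(tokens[i - 1], tokens[i + 1])] += 1
    return counts


def bootstrap(sentences, seed_patterns, iterations=5, top_k=2, min_count=2):
    """Grow features and patterns iteratively from seed patterns only."""
    patterns = set(seed_patterns)
    features = set()
    for _ in range(iterations):
        hits = extract(sentences, patterns)
        # Hypothetical score: how many distinct patterns support a candidate.
        ranked = sorted(
            ((len(pats), word) for word, pats in hits.items()
             if word not in features),
            key=lambda item: (-item[0], item[1]),
        )
        accepted = [word for _, word in ranked[:top_k]]
        if not accepted:
            break  # nothing new was found, so stop early
        features.update(accepted)
        # Keep only contexts seen often enough, to limit topic drift.
        counts = induce_patterns(sentences, features)
        patterns |= {p for p, c in counts.items() if c >= min_count}
    return features, patterns


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "the screen is bright",
        "the battery is weak",
        "the lens is sharp",
        "its battery was dead",
        "its lens was scratched",
        "its zoom was slow",
    ]]
    found, learned = bootstrap(corpus, {("the", "is")})
    print(sorted(found))
    print(sorted(learned))
```

Accepting only the top-ranked candidates in each round, and admitting a new pattern only when it co-occurs with several accepted features, is one simple way to keep the iteration from drifting away from genuine product attributes.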