| It is well known that comparison is an important way of understanding human language. With the development of Web 2.0, Internet technology has placed growing emphasis on user interaction: users are no longer only readers of Web content but also its authors. In recent years in particular, blogs, podcasts, logs, wikis, social networks and forums have emerged as new elements of the Web, making its information more individual and diverse. Much of this text carries new innovations, theories, techniques, ideas, art and so on. As a result, using natural language processing to identify comparative sentences and comparative relations has become a topic of international academic research. Building on existing work, this dissertation studies the main technologies used to identify comparative sentences and relations. The findings and main contributions are as follows: ① We propose a novel Entropy-value Balancing Algorithm (EBA) for balancing the classes of an imbalanced text corpus. Based on the principle that entropy measures the degree of order in a system, we calculate the entropy of each word and construct keyword sets that filter the text of each class, so that the size of the majority class approaches that of the minority class. Applied to an imbalanced comparison corpus, the algorithm yielded a ratio of 701:1226 from an original corpus of 796 comparative and 8010 non-comparative sentences, which met the expected goal. ② We present a novel method, based on information entropy, for identifying comparative sentences. The method extracts both semantic and structural features from comparative sentences, which avoids relying on only the statistical information or only the structural patterns of sentences. Structural features are extracted with the Apriori algorithm under suitably chosen minimum support (minsup) and minimum confidence (minconf) thresholds. 
Features are then selected by information gain (IG), and the resulting feature vectors are classified with SVM and NB classifiers. Experimental results show that the method identifies comparative sentences with an F1-value of 81%. ③ We also propose a new method for extracting comparison relations, based on a semantic role syntax parse tree (SRSPT). The syntax parse tree is combined with semantic role labels to form a new structure, the SRSPT, from which the compared entities, features and relationships of a comparison relation are extracted. A similarity matching function is designed to compute the maximum probability between two sub-trees. Experimental results indicate that the method works well for sentences containing a single relation but is not effective for sentences containing multiple relations. ④ This work applies the identification of comparative sentences and comparative relations to product review mining. Within the proposed identification framework, we present a flow diagram for product review mining and implement the application in this field. The experimental results show that the method for identifying comparative sentences and comparative relations is effective in this application. |
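
The feature-selection step in the second contribution relies on information gain. As a minimal sketch, the code below computes the standard textbook information gain of a binary "term occurs in the sentence" feature with respect to the class label. The toy sentences, the labels and the function names `entropy` and `information_gain` are illustrative assumptions. They are not taken from the dissertation, whose exact formulation may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the sentence'.

    docs is a list of token sets and labels is the parallel list of classes.
    IG = H(class) - [P(term) * H(class | term) + P(no term) * H(class | no term)]
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: two comparative and two non-comparative sentences.
docs = [
    {"camera", "a", "is", "better", "than", "b"},
    {"phone", "x", "is", "faster", "than", "y"},
    {"i", "like", "this", "camera"},
    {"the", "phone", "is", "good"},
]
labels = ["comparative", "comparative", "other", "other"]

ig_than = information_gain(docs, labels, "than")  # separates the classes perfectly here
ig_is = information_gain(docs, labels, "is")      # occurs in both classes
```

In this toy corpus "than" scores higher than "is", so a selector that keeps the top-ranked terms would retain "than" as a feature before the vectors are passed to a classifier.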