With the rapid development of Web 2.0, users have become major contributors of Internet content, and a large volume of user-generated subjective text now appears online. Sentiment analysis of such text has become a research focus in recent years, and a domain-oriented sentiment lexicon is an important foundation for it. Constructing such a lexicon, however, is hampered by the unbalanced distribution of corpora. Cross-domain construction of domain-oriented sentiment lexicons is therefore attracting growing attention.

Most current methods for cross-domain construction of domain-oriented sentiment lexicons require a large amount of manually labeled information, which is somewhat unreliable and costly in time and effort. To address this problem, we propose a new method based on the AF model. Using only word-level labeled information from the source domain, the method exploits the context similarity of sentiment words to judge the semantic orientation of unknown sentiment words, and thereby constructs a sentiment lexicon for the target domain. The method has three main parts. The first is preprocessing of the corpus data, which mainly consists of sentence segmentation and word segmentation. The second is building the AF model for the source-domain and target-domain corpora. The third, which is the core of the method, is determining the semantic orientation of sentiment words. Here we propose two statistical measures based on the AF model, called domain difference and related affinity. Domain difference distinguishes domain-dependent sentiment words from domain-independent ones. Related affinity measures the similarity of two words from two different domains. Based on these two measures and the AF model built earlier, we propose an algorithm to judge the semantic orientation of sentiment words. The sentiment lexicon is then formed by combining the positive words and the negative words identified by the algorithm.

Finally, we compare our method with SO_PMI and the context-based method on the COAE2011 data set to demonstrate its effectiveness. We also analyze, through experiments, how the parameters influence the performance of the method.
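For orientation, the SO_PMI baseline used in the comparison can be sketched as follows. This is a minimal illustration of the general SO-PMI idea only (orientation as PMI with positive seed words minus PMI with negative seed words), not the AF-model method and not the exact configuration used in the experiments. The function names, the document-level co-occurrence window, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi_scores(docs, candidates, pos_seeds, neg_seeds, smoothing=1.0):
    """Score candidate words by SO-PMI over tokenized documents.

    Assumptions (illustrative only): two words co-occur when they appear in
    the same document, and counts are smoothed additively to avoid log(0).
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(frozenset(pair) for pair in combinations(vocab, 2))

    def pmi(a, b):
        joint = pair_df[frozenset((a, b))] + smoothing
        marginals = (word_df[a] + smoothing) * (word_df[b] + smoothing)
        return math.log2(joint * n_docs / marginals)

    scores = {}
    for word in candidates:
        pos = sum(pmi(word, seed) for seed in pos_seeds if seed != word)
        neg = sum(pmi(word, seed) for seed in neg_seeds if seed != word)
        scores[word] = pos - neg
    return scores


def build_lexicon(scores):
    """Combine positively and negatively scored words into one lexicon."""
    return {
        "positive": sorted(w for w, s in scores.items() if s > 0),
        "negative": sorted(w for w, s in scores.items() if s < 0),
    }
```

On a toy corpus in which "bright" always appears with the seed "good" and "weak" always appears with the seed "bad", the sketch assigns "bright" a positive score and "weak" a negative one. A baseline of this kind relies on co-occurrence with fixed seed words inside a single corpus.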