| With the rapid development of Web 3.0 and "Internet +", the number of Internet users worldwide continues to grow. According to data released by the United Nations in December 2018, the number of Internet users worldwide reached 3.9 billion, or 51.2% of the global population, exceeding half of the world's population for the first time. At the same time, the vigorous growth of the global economy and the digital economy has promoted economic, trade, cultural, political, and military exchange and cooperation between countries. People all over the world now take part through the Internet, using it as a public communication platform to publish opinions on global events, cross-border trade, and major military and political events on major social networking sites and media. These positions and opinions contain very important information, but Internet users from different countries express them in different languages. It is therefore both significant and difficult to retrieve and extract the views expressed in different languages about the same event or thing. Few studies have been completed or are under way on this problem. The most direct solution is a translation-based approach: manually translate the source-language keywords into the target language, use the translated keywords to retrieve the relevant result set in the target language, and then manually select the sentences that contain opinions from that result set. This method depends on manual participation and on the accuracy of translation. Although translation systems are now relatively mature and accurate, their output is too uniform across
different contexts, and it is difficult to cover all the relevant words. Moreover, manual participation introduces unnecessary cost and errors. This paper maps Chinese and English vocabulary into the same vector space by exploiting the links between Chinese and English words. Based on this vector space, we carry out similarity calculation and cross-language sentiment classification for Chinese and English sentences, and thereby complete the cross-language retrieval task. Using cross-language similarity calculation makes the search results more accurate and avoids some of the drawbacks of human involvement. Building on cross-language lexical alignment, cross-language similarity calculation, and opinion retrieval, this paper proposes a cross-language opinion retrieval model based on lexical alignment (AW-CLORM) to solve the problem of cross-language opinion retrieval. We select Facebook's large-scale Chinese-English word vector dataset, trained with fastText, for cross-language lexical alignment. We then use the SemEval2014 Chinese-English parallel corpus to train a cross-language similarity calculation model for Chinese and English text. Finally, we apply sentiment analysis tools to manually collected Baidu Post Bar and Twitter posts to generate mixed Chinese-English sentiment classification data, and train a cross-language sentiment classification model on this data. For a given Chinese keyword, we first use the keyword to retrieve the related document set Ds in the source language with the retrieval system. We then use the cross-language similarity calculation model to select the documents in the target-language corpus that are most similar to Ds as the candidate document set Dc. Finally, we use cross-language sentiment classification and opinion feature matching to retrieve the documents in Dc that contain opinions. This document set is the result set of cross-language opinion retrieval, thus completing the
cross-language opinion retrieval task. This paper makes three main contributions: (1) We propose and complete the task of cross-language opinion retrieval. Cross-language retrieval has been studied relatively widely, but there is almost no research on cross-language opinion retrieval. (2) Using a completely unsupervised method, we construct a cross-language word vector space and complete the cross-language vocabulary alignment task. The method aligns vocabulary using multi-dimensional word vectors, which effectively improves the accuracy of lexical alignment. (3) We apply a twin (Siamese) neural network based on Manhattan distance to cross-language similarity calculation. The framework, built on dual LSTMs, takes the two cross-language texts as simultaneous inputs, effectively extracts their similarity features, and improves the accuracy of the similarity calculation. Experimental results show that the AW-CLORM model can effectively complete the cross-language opinion retrieval task with relatively high accuracy: P@10 reaches 70%. However, this work still has shortcomings: the accuracy of cross-language similarity calculation needs to be improved, and the opinion retrieval model needs to be enriched to handle more complex opinion retrieval tasks. These are directions for future research and improvement. |
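
The candidate-selection step and the P@10 metric mentioned above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the sentence encodings (in the paper, the outputs of the two LSTMs) are stood in for by plain lists of floats, the Manhattan distance is mapped to a score with exp(-d), a common choice for Manhattan-distance twin LSTMs but not confirmed here, and the names `manhattan_similarity`, `top_k`, and `precision_at_k` are hypothetical.

```python
import math


def manhattan_similarity(u, v):
    """Map the L1 (Manhattan) distance between two encodings to a score in (0, 1].

    Assumption: the exp(-distance) mapping is illustrative; the abstract only
    says the twin network is "based on Manhattan distance".
    """
    if len(u) != len(v):
        raise ValueError("encodings must have the same dimension")
    l1 = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1)


def top_k(query_vec, candidates, k):
    """Return the ids of the k candidates most similar to query_vec.

    `candidates` is a list of (doc_id, encoding) pairs; this plays the role of
    building the candidate set Dc from the target-language corpus.
    """
    ranked = sorted(
        candidates,
        key=lambda item: manhattan_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the first k retrieved documents that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k
```

For example, if 7 of the first 10 retrieved documents are judged relevant, `precision_at_k` returns 0.7, which is how a P@10 of 70% would be read.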