The Method Of Chinese Synonym Extraction Based On Large-scale Corpus

Posted on:2015-08-03

Degree:Master

Type:Thesis

Country:China

Candidate:H C Ma

Full Text:PDF

GTID:2298330422983998

Subject:Computer software and theory

Abstract/Summary:


With the spread of computers and the rapid development of the Internet, the amount of information online is growing exponentially. At the same time, information resources are shared more and more widely, which brings great convenience to people's daily lives. People now face a great deal of information every day, and extracting valuable information from huge amounts of data has become a hot topic in information technology research. Chinese synonym extraction is a foundation of Chinese information processing research and plays different roles in different application fields. Because synonyms are scattered across this sea of information, the paper takes a large-scale corpus as its research object in order to extract as many synonyms as possible.

With the continuous development of Internet technology and the explosive growth of information, natural language processing and information retrieval play an increasingly important role in processing and extracting information, and synonyms have important research significance and application value across natural language processing tasks. On this basis, the paper proposes two synonym extraction methods: one combining literal similarity with PageRank, and one combining Pointwise Mutual Information (PMI) with Latent Semantic Analysis (LSA).

The method based on literal similarity and PageRank makes full use of the literal (surface-form) similarity of words and the semantic relations captured by PageRank. It considers the matching order and the compatibility of the two words as well as the relationship between them.

The combined PMI and LSA method builds on the theory behind both techniques. PMI uses the mutual information of two words to estimate their association simply and effectively. LSA draws on ideas and techniques from computer science, mathematics, and information science to uncover the latent meaning of words. The method determines the semantic association between two words from retrieval results.
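The abstract gives no formulas for the literal similarity and PageRank method. As a minimal sketch, assuming literal similarity is measured by the longest common subsequence (LCS) of two words, which respects the order in which characters match, and that PageRank runs over a hypothetical word-relation graph, the two scores could be computed and mixed as below. The Dice-style normalization, the graph, and the weight `alpha` in the hypothetical `combined_score` helper are illustrative assumptions, not the thesis's exact formulation.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, i.e. characters matched in order."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def literal_similarity(a: str, b: str) -> float:
    """Assumed literal similarity: Dice-style ratio 2 * LCS / (|a| + |b|), in [0, 1]."""
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def pagerank(graph: Dict[str, List[str]], d: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Standard PageRank by power iteration; rank held by dangling nodes is spread uniformly."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u, outs in graph.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        rank = new
    return rank


def combined_score(word: str, candidate: str, rank: Dict[str, float], alpha: float = 0.5) -> float:
    """Hypothetical mix: alpha * literal similarity + (1 - alpha) * max-normalized PageRank."""
    top = max(rank.values())
    return alpha * literal_similarity(word, candidate) + (1 - alpha) * rank.get(candidate, 0.0) / top
```

For example, `literal_similarity("计算机", "计算器")` is 2/3: the words share the ordered characters 计算, two of three characters in each word. Literal similarity alone would also pair words like 计算机 and 计算器 that are not synonyms, which is one reason to combine it with a relation-based score such as PageRank.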
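PMI compares how often two words occur together with how often they would co-occur if they were independent: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The abstract derives association from retrieval results; as a hedged stand-in, the sketch below estimates the probabilities from document frequencies in a small in-memory corpus, assuming each document is a set of already segmented words. The corpus and function name are hypothetical.

```python
import math
from typing import List, Set


def pmi(docs: List[Set[str]], x: str, y: str) -> float:
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))) estimated from document frequencies.

    Each document is assumed to be a set of segmented words, with
    p(w) = df(w) / N and p(x, y) = df(x and y) / N.
    Returns -inf when the two words never co-occur.
    """
    n = len(docs)
    nx = sum(1 for doc in docs if x in doc)
    ny = sum(1 for doc in docs if y in doc)
    nxy = sum(1 for doc in docs if x in doc and y in doc)
    if nxy == 0:
        return float("-inf")
    # The N factors cancel: (nxy / N) / ((nx / N) * (ny / N)) = nxy * N / (nx * ny).
    return math.log2(nxy * n / (nx * ny))


docs = [
    {"电脑", "计算机"},
    {"电脑", "计算机", "软件"},
    {"软件", "硬件"},
    {"电脑", "硬件"},
]
print(pmi(docs, "电脑", "计算机"))  # log2(2 * 4 / (3 * 2)) = log2(4/3), about 0.415
```

A positive PMI means the pair co-occurs more often than chance would predict. Because PMI is sensitive to rare words, it is natural to pair it with a second signal such as LSA, as the combined method does.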
The LSA-based synonym extraction method starts from a large matrix associating words with documents and automatically builds a semantic space in which users can find relevant information. A word connected to the main content of a document stays close to that document in the semantic space. The positions of words and documents in this space can therefore guide extraction, and extracting information amounts to locating a point in the space. Results are ranked by the cosine value between the word vector and the document vector, computed from their dot product. The paper thus presents two feasible synonym extraction methods. Finally, experiments verify both methods and show improvements in recall, precision, and F-measure.
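LSA factorizes a word-by-document matrix and compares words in a reduced latent space. The sketch below is a minimal pure-Python illustration assuming a tiny dense term-document count matrix A: it obtains the top-k left singular vectors U_k from the eigenvectors of A A^T (whose eigenvalues are the squared singular values) by power iteration, then compares words by the cosine of their U_k Sigma_k rows. It compares word vectors with each other, a simplification of the word-to-document cosine ranking described above; the function names and toy matrix are hypothetical, and a real corpus would call for a sparse truncated SVD routine rather than this toy eigen-solver.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def top_eigenpairs(m: Matrix, k: int, iters: int = 300, seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(m)
    mat = [[float(x) for x in row] for row in m]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenpair.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_term_vectors(a: Matrix, k: int) -> List[List[float]]:
    """Rows of U_k * Sigma_k for a term-document matrix A (rows = terms, columns = documents)."""
    m = len(a)
    aat = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(m)] for i in range(m)]
    pairs = top_eigenpairs(aat, k)
    # sigma = sqrt(eigenvalue of A A^T); clamp tiny negative round-off to zero.
    return [[u[i] * math.sqrt(max(lam, 0.0)) for lam, u in pairs] for i in range(m)]


def cosine(x: List[float], y: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is the zero vector)."""
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


terms = ["电脑", "计算机", "苹果", "香蕉"]
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
]
vecs = lsa_term_vectors(A, k=2)
print(cosine(vecs[0], vecs[1]))  # close to 1: same latent topic
print(cosine(vecs[0], vecs[2]))  # close to 0: different latent topic
```

In this toy matrix 电脑 and 计算机 have different raw counts, but after truncation to k = 2 they collapse onto the same latent direction, which is the effect LSA relies on to surface synonyms.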
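The thesis's exact evaluation protocol is not given here. Assuming the standard set-based definitions, with E the extracted synonym pairs and G a gold-standard list: precision (the "accuracy" of extracted pairs) is |E ∩ G| / |E|, recall is |E ∩ G| / |G|, and the F-measure is F1 = 2PR / (P + R). The sketch treats pairs as unordered; the example pairs are illustrative, not data from the thesis.

```python
from typing import FrozenSet, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered synonym pair, so (a, b) and (b, a) count as the same item."""
    return frozenset((a, b))


def evaluate(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of extracted pairs against a gold standard."""
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


extracted = {pair("电脑", "计算机"), pair("土豆", "马铃薯"), pair("电脑", "手机")}
gold = {pair("计算机", "电脑"), pair("马铃薯", "土豆"), pair("番茄", "西红柿"), pair("自行车", "单车")}
print(evaluate(extracted, gold))  # precision 2/3, recall 1/2, F1 4/7
```

Because F1 is the harmonic mean of precision and recall, a method cannot raise it simply by extracting more candidates: recall goes up, but precision falls.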

Keywords/Search Tags:

Synonym, Synonym extraction, Literal similarity, Pattern matching, PageRank, Pointwise Mutual Information, Latent Semantic Analysis


Related items

1	Synonym Discovery Based On The Searching Information
2	Analysis Of Sentiment Tendency Based On Sentiment Dictionary And Semantic Orientation Pointwise Mutual Information Algorithm
3	Automatic Recognition Of Synonym In Construction Of Intelligent Search Engine
4	Research Of Text Recommended Methods Based On Synonym Network
5	Research On Synonym Search And Leakage-resilient In SSE
6	Research On Chinese Hyponymy Relation Automatic Extraction
7	The Research And Implement Of Synonym Expanding Retrieval Based On Lucene
8	The Design & Implementation Of Industry Attribute Keyword Expansion Method Based On JAVA
9	Crowdsourcing For Synonyms Proofreading And Acquisition In Chinese Large-scale Semantic Knowledge Base
10	Research On Text Similarity Detection Algorithm Based On Simhash