
Improved Edit Distance Algorithm And Its Application In E-government

Posted on: 2012-01-30; Degree: Master; Type: Thesis
Country: China; Candidate: B Wu; Full Text: PDF
GTID: 2208330332986798; Subject: Software engineering
Abstract/Summary:
In recent years, natural language processing (NLP) has been widely used on the Internet: popular search engines and social networks rely on NLP technology, and the field has promising prospects. NLP is an important research direction combining artificial intelligence and computer science. It aims to let computers understand human language, and even human ways of thinking, so that computers can serve people better.

Similarity research is important foundational work in NLP, with applications in many areas. Besides basic fields such as translation and information retrieval, it supports higher-end applications such as automatic recognition and intelligent retrieval. Many similarity algorithms exist, but most are designed for English text. Chinese is a complex language: its word order, omission, and substitution are all highly flexible, and it has no tense inflection, so algorithms designed for English are almost impossible to apply directly to Chinese.

Many scholars have studied Chinese similarity and made progress, and a number of good algorithms have emerged, each with limitations. For example: TF-IDF is a statistics-based method, but it ignores semantics and is unsuitable when the corpus is small; the semantic dictionary method is based on the theory of semantic annotation, a technology that is not yet mature; the combined word-form and word-order method does not consider semantics; the dependency tree method is based on syntactic structure analysis, which is also not yet mature. This shows that research on Chinese sentence similarity is difficult.

The author makes several improvements to the currently popular algorithm based on edit distance: the edit unit is changed from English characters to words; the replacement cost of an edit unit is computed by a semantics-based cost function; the exchange of adjacent blocks is supported; and the results are normalized, among other changes. Combining theory with practice, the author also built an experimental system on this basis, applying the theory in e-government so that the system can judge cases automatically...
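The improvements listed in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy synonym table, the unit insertion/deletion cost, and the swap cost are all assumptions made for illustration. The exchange of adjacent blocks is simplified here to the exchange of two adjacent words, and the input is assumed to be already segmented into words (Chinese text would first need a word segmenter).

```python
from typing import Callable, Sequence


def toy_substitution_cost(a: str, b: str) -> float:
    """Hypothetical semantic cost: 0 for equal words, 0.5 for listed synonyms, else 1."""
    # Toy synonym table; a real system would consult a semantic dictionary.
    synonyms = {frozenset(("buy", "purchase")), frozenset(("car", "automobile"))}
    if a == b:
        return 0.0
    if frozenset((a, b)) in synonyms:
        return 0.5
    return 1.0


def word_edit_distance(
    s: Sequence[str],
    t: Sequence[str],
    sub_cost: Callable[[str, str], float] = toy_substitution_cost,
    swap_cost: float = 1.0,  # assumed cost of exchanging two adjacent words
) -> float:
    """Edit distance whose edit unit is a word rather than a character."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)  # delete i words
    for j in range(1, n + 1):
        d[0][j] = float(j)  # insert j words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
            # Simplification: exchange of two adjacent words only.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap_cost)
    return d[m][n]


def similarity(s: Sequence[str], t: Sequence[str]) -> float:
    """Normalize the distance into [0, 1], assuming substitution costs lie in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - word_edit_distance(s, t) / longest


if __name__ == "__main__":
    print(similarity(["I", "buy", "car"], ["I", "purchase", "car"]))
```

Dividing by the length of the longer sentence is one common normalization choice; the abstract does not say which normalization the thesis uses.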
Keywords/Search Tags:Natural language, similarity, edit distance, automatic judgment