Similarity Determination Of Chinese Sentences In A Crowdsourcing Solution

Posted on: 2015-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: D N Shi
Full Text: PDF
GTID: 2268330428454791
Subject: Computer technology
Abstract/Summary:
With the spread of computers and the rapid development of the Internet, large amounts of information now exist as electronic documents, and it has become harder to quickly pick out the information one needs from such large volumes of text. Natural language text is the most common form of information storage and communication. Similarity determination of Chinese sentences is one of the basic problems in natural language processing and a precondition for information retrieval, information extraction, data mining, artificial intelligence, and related fields.

A Chinese sentence takes the form of a long string. Semantic analysis of Chinese sentences is very complicated for a computer because of their complex semantic expression and many ambiguous words, so an efficient and accurate method for determining Chinese sentence similarity is urgently needed. Manual tagging of sentences is highly accurate, but it is costly, because workers must be hired, and inefficient. Traditional methods of semantic similarity computation are based on word frequency statistics, syntactic analysis, sentence structure analysis, and so on, but existing natural language processing technology still has shortcomings owing to the semantic ambiguity, structural diversity, and other characteristics of Chinese sentences. Thus, Chinese sentence semantic similarity computation still needs to be improved.

"Crowdsourcing", which is based on the idea of human computing, is a flexible and effective way to solve this problem and has begun to attract more and more attention. Some calculations and functions are outsourced to the crowd, especially the online community, so that humans and computers can work together to achieve the best result. Many of the problems involved in Chinese sentence semantic similarity are undecidable problems or NP problems. These problems are very difficult for computers to solve, whereas humans have more background knowledge and better comprehension and inductive ability, which is the key to solving them. Therefore, these problems can be addressed through crowdsourcing. In a crowdsourcing system, a complex task is usually broken up into a series of simple tasks, which are then sent to crowdsourcing workers so that they can complete them well. Finally, the crowdsourcing system collects and aggregates the answers provided by users to obtain the result. In this way, many results can be obtained in a short time and their quality can be ensured.

In this thesis, we design a crowdsourcing solution for similarity determination of Chinese sentences. A sentence is expanded into a collection of similar sentences, the sentences are combined into pairs, and the resulting tasks are assigned to the crowd. We propose a method of related-sentence extension and put forward a sorting algorithm for Chinese sentence semantic similarity. We then analyze the computational complexity of the sorting algorithm and propose a polynomial-time heuristic algorithm that produces the sorted sequence of sentences by semantic similarity. We also design an evaluation algorithm for crowdsourcing workers to ensure the quality of the crowdsourcing work. Finally, we carry out experiments to verify the correctness and feasibility of the algorithm and analyze the factors that affect its accuracy and efficiency.
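The abstract does not specify the algorithms, so the following is only a minimal sketch of the general workflow it describes: pair up candidate sentences, collect crowd answers, aggregate them, order the sentences, and score the workers. It rests on assumptions that are not from the thesis. Each pair task is assumed to ask which of two candidates is closer to a target sentence, answers are aggregated by simple majority vote, sentences are ordered by a win-count heuristic, and workers are scored by their agreement with the majority. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def make_pair_tasks(candidates):
    """Turn a list of candidate sentences into unordered pair tasks."""
    return list(combinations(candidates, 2))


def aggregate_votes(votes):
    """Majority vote per pair (assumed aggregation rule).

    votes: iterable of (worker_id, pair, chosen_sentence)
    returns: dict mapping each pair to the sentence chosen most often
    """
    tallies = defaultdict(Counter)
    for _worker, pair, choice in votes:
        tallies[pair][choice] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in tallies.items()}


def rank_by_wins(candidates, winners):
    """Heuristic ordering: most pairwise wins first (stable for ties)."""
    wins = Counter(winners.values())
    return sorted(candidates, key=lambda s: -wins[s])


def worker_agreement(votes, winners):
    """Fraction of each worker's answers that match the majority answer."""
    hits, total = Counter(), Counter()
    for worker, pair, choice in votes:
        total[worker] += 1
        hits[worker] += int(choice == winners[pair])
    return {w: hits[w] / total[w] for w in total}


# Toy data: "a", "b", "c" stand in for candidate sentences.
candidates = ["a", "b", "c"]
tasks = make_pair_tasks(candidates)
votes = [
    ("w1", ("a", "b"), "a"), ("w2", ("a", "b"), "a"), ("w3", ("a", "b"), "b"),
    ("w1", ("a", "c"), "a"), ("w2", ("a", "c"), "a"), ("w3", ("a", "c"), "c"),
    ("w1", ("b", "c"), "b"), ("w2", ("b", "c"), "b"), ("w3", ("b", "c"), "b"),
]
winners = aggregate_votes(votes)
ranking = rank_by_wins(candidates, winners)
scores = worker_agreement(votes, winners)
```

Sorting by win count takes polynomial time in the number of candidates, which is why it is used here as a stand-in for a heuristic. It is not claimed to be the heuristic proposed in the thesis.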
Keywords/Search Tags: sentence semantic similarity, crowdsourcing, human computing