The Design And Implementation Of Pretreatment Subsystem For The Finance Q&A System

Posted on: 2016-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wu
Full Text: PDF
GTID: 2298330452461275
Subject: Software engineering
Abstract/Summary:
In recent years, with the development of the financial sector, more and more people have been paying attention to financial information search. However, traditional keyword search can hardly meet the needs of financial users, so semantic search products have gained popularity. The semantic parsing process, however, is complex and diverse. When analyzing large-scale real text, sentence analysis at word granularity is very difficult, but after chunk analysis the accuracy improves significantly and the complexity is clearly reduced. In addition, almost all users currently type with a Pinyin input method, so it can easily happen that the phonetic letters entered are correct while the Chinese characters chosen are wrong. Pinyin-based text proofreading was introduced to improve the user experience in this case.

This thesis presents a pretreatment subsystem for the finance Q&A system. It provides two functions: Pinyin text proofreading and chunk partition. Text proofreading is based on Pinyin. The query sequence is first converted into a phonetic sequence. Higher-order n-gram, trigram, and bigram models are then used to fill in the Chinese character sequence. If the filling is unambiguous, the proofread result is used; otherwise, proofreading is skipped.

For chunk partition, this thesis uses three different methods, all of which the subsystem designs and implements: shortest path chunk partition, CRF chunk partition, and neural network chunk partition. Shortest path chunk partition is similar to dictionary-based word segmentation. CRF chunk partition recasts the problem as a sequence labeling problem and obtains the chunk partition from the optimal label sequence. Neural network chunk partition builds on word segmentation: the query sequence is converted into a sequence of nodes, and if adding a chunk tag at a position in the node sequence increases the probability of the sequence, that position is taken as a chunk position.

In practical tests, the dictionary-based shortest path method cannot process queries whose chunks are not in the chunk dictionary. The CRF method can handle queries with unknown chunks, but its label information is too simple: as the training corpus grows, the error rate of the results increases significantly. With the neural network method, prediction time increases significantly; its query generalization, however, is a good idea. Based on this research, a reasonable approach to chunk partition is the CRF method combined with appropriate use of query generalization.

Finally, testing shows that the pretreatment subsystem not only meets the needs of the finance Q&A system and greatly improves its user experience, but also reduces analytical complexity and significantly improves correctness and performance.
Keywords/Search Tags: Pretreatment system question analysis, chunk, text proofreading, CRF, neural network
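
The dictionary-based shortest path chunk partition described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that "shortest path" means covering the already segmented query with the fewest segments, where each segment is either a single word or an entry in a chunk dictionary. The function name `shortest_path_chunks`, the parameters `chunk_dict` and `max_len`, and the English sample tokens are all hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Chunk = Tuple[str, ...]


def shortest_path_chunks(
    tokens: List[str],
    chunk_dict: Set[Chunk],   # hypothetical chunk dictionary
    max_len: int = 4,         # assumed upper bound on chunk length, in words
) -> List[Chunk]:
    """Cover `tokens` with the fewest segments (assumed path cost).

    A segment is a dictionary chunk or a single word. Single words are
    always allowed, so a path from start to end always exists.
    """
    n = len(tokens)
    inf = float("inf")
    # best[i] = minimum number of segments needed to cover tokens[:i]
    best = [0] + [inf] * n
    # back[i] = start index of the last segment on the best path to i
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = tuple(tokens[start:end])
            if len(span) == 1 or span in chunk_dict:
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start

    # Walk the back-pointers from the end to recover the segments.
    chunks: List[Chunk] = []
    i = n
    while i > 0:
        j = back[i]
        chunks.append(tuple(tokens[j:i]))
        i = j
    chunks.reverse()
    return chunks


# Hypothetical example data, not taken from the thesis.
demo_dict = {("price", "earnings", "ratio"), ("earnings", "ratio"), ("bank", "stocks")}
demo_query = ["price", "earnings", "ratio", "of", "bank", "stocks"]
print(shortest_path_chunks(demo_query, demo_dict))
# [('price', 'earnings', 'ratio'), ('of',), ('bank', 'stocks')]
```

With an empty dictionary the same function falls back to one segment per word, which mirrors the limitation reported in the abstract: a purely dictionary-based method cannot recover chunks that are missing from the chunk dictionary.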