The Design And Implementation Of Pretreatment Subsystem For The Finance Q&A System

Posted on:2016-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Wu

Full Text:PDF

GTID:2298330452461275

Subject:Software engineering

Abstract/Summary:

In recent years, with the development of the financial sector, more and more people have turned their attention to searching for financial information. However, traditional keyword search can hardly meet the needs of financial users, so semantic search products have gained popularity. The semantic parsing process is complex and varied. When analyzing large-scale real text, sentence analysis at word granularity is very difficult, but after chunk analysis the accuracy improves significantly and the complexity is clearly reduced. In addition, users currently type almost exclusively with Pinyin input methods, so it often happens that the phonetic letters entered are correct while the Chinese character selected is wrong. Pinyin-based text proofreading was introduced to improve the user experience in this situation.

This thesis presents a pretreatment subsystem for the finance Q&A system. It provides two functions: Pinyin text proofreading and chunk partition. Text proofreading is based on Pinyin. It first converts the query sequence into a phonetic sequence, then uses higher-order n-grams, trigrams and bigrams to fill in the Chinese character sequence. If the filling is unambiguous, the correction is applied; otherwise, proofreading is skipped.

For chunk partition, this thesis uses three different methods: shortest path chunk partition, CRF chunk partition and neural network chunk partition. The subsystem designs and implements all three. Shortest path chunk partition is similar to dictionary-based word segmentation. CRF chunk partition recasts the task as a sequence labeling problem and obtains the chunk partition by finding the optimal label sequence. Neural network chunk partition builds on word segmentation and converts the query sequence into a sequence of nodes. If adding a chunk tag at a node increases the probability of the sequence, the tag position is taken as a chunk position.

In practical tests, shortest path chunk partition, being dictionary based, cannot process queries that are not covered by the chunk dictionary. CRF chunk partition can handle queries containing unknown chunks, but its label information is too simple, so the error rate of its results rises significantly as the training corpus grows. With neural network chunk partition, the prediction time increases significantly; however, its query generalization is a useful idea. These results suggest that a reasonable approach to chunk partition is the CRF method combined with appropriate use of query generalization.

Finally, testing shows that the pretreatment subsystem meets the needs of the finance Q&A system: it greatly improves the user experience, reduces analytical complexity, and significantly improves correctness and performance.
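
The abstract does not give the details of the shortest path chunk partition, so the following is only a minimal sketch of the general idea it names: a dictionary-based split that covers the query with the fewest pieces. It assumes the query has already been segmented into tokens and that the chunk dictionary is a set of token tuples; the function name `shortest_path_chunks`, the sample dictionary and the fallback to single tokens are illustrative assumptions, not the thesis's implementation.

```python
def shortest_path_chunks(tokens, chunk_dict):
    """Split tokens into the fewest pieces.

    A piece is either an entry of chunk_dict (a set of token tuples,
    an assumed representation) or a single token used as a fallback.
    """
    n = len(tokens)
    # Longest dictionary entry bounds how far back we need to look.
    max_len = max((len(c) for c in chunk_dict), default=1)
    # best[i] = (pieces needed to cover tokens[:i], start of the last piece)
    best = [(0, 0)] + [(n + 1, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = tuple(tokens[start:end])
            if end - start == 1 or piece in chunk_dict:
                cost = best[start][0] + 1
                if cost < best[end][0]:
                    best[end] = (cost, start)
    # Walk the back-pointers from the end to recover the pieces.
    chunks = []
    end = n
    while end > 0:
        start = best[end][1]
        chunks.append(tokens[start:end])
        end = start
    return chunks[::-1]


# Hypothetical chunk dictionary and query, for illustration only.
sample_dict = {("stock", "price"), ("bank", "of", "china")}
sample_query = ["stock", "price", "of", "bank", "of", "china"]
result = shortest_path_chunks(sample_query, sample_dict)
```

In this sketch, a query whose chunks are all missing from the dictionary simply falls back to one piece per token, which mirrors the limitation reported above for the dictionary-based method.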

Keywords/Search Tags:

Pretreatment system, question analysis, chunk, text proofreading, CRF, neural network

Related items

1	Design And Implementation Of Online Intelligent Text Proofreading System
2	The Research Of Chinese Automatic Question Answering And Proofreading Based On Deep Learning
3	Design And Implementation Of Intelligent Course Question Answering System In MOOC Environment
4	Research And Application Of Key Techniques In Chinese Text Proofreading
5	Research On Key Technologies Of Question Answering Systems
6	Research And Implementation Of Chinese Text Automatic Proofreading Based On Deep Learning
7	Research On Chinese Text Proofreading Algorithm Based On The Combination Of Statistical Features And Rules
8	Research On Text Proofreading Method Based On The Analysis Of The Mongolian Syllable
9	Research On Question Text Classification And Answer Extraction Technology In Automatic Question Answering System
10	Research On Automatic Generation Technology Of Chinese Text Proofreading Corpora