Research On Analysis And Computation Methods For Short Text With Deep Learning

Posted on:2017-02-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Li

GTID:1108330485450025

Subject:Computer application technology

Abstract/Summary:

With the development of the Internet and mobile devices, users can easily express their emotions, opinions, and comments online and on mobile platforms, producing a huge amount of text. Short texts have become the main carrier of this information, so their analysis and computation have gradually become an active research topic in natural language processing. However, because short texts are casually expressed and grammatically irregular, traditional processing methods yield sparse representations and lose semantic information in short text computation, suffer from word matching failures and out-of-vocabulary words in Chinese word segmentation, and lack semantic representations for words and characters. Traditional methods are therefore not fully suited to short text computation. With the development of deep learning, feature learning is becoming a new branch of machine learning. It is therefore important, and significant for short text applications, to study the problems of short text using semantic representation and deep learning.

To address these problems, and according to the characteristics of short text, this dissertation studies semantic representation, Chinese word segmentation, and short text similarity computation using the theories and methods of deep learning, forming a complete short text computing framework. The main contents and innovations of the dissertation are as follows:

(1) To extract semantic representations of Chinese characters and words, a semantic vector representation method based on local and global context is proposed. Using the semantic relations between a word and its context, the method constructs a neural network model that computes the semantics of the local context and the global context. The model learns the semantic vectors of characters and words in an unsupervised way, so that each character or word is semantically irreplaceable in its context. Two sets of representations with broad coverage are trained by the model, one for Chinese characters and one for Chinese words. Experimental results show that the learned vector representations capture effective semantic relations, and that low-dimensional continuous vectors are more advantageous for short text computation.

(2) To avoid the word matching failures and out-of-vocabulary words of traditional Chinese word segmentation methods, a segmentation method based on Chinese character vector representations is proposed. The method takes each character's position within its word as the prediction target, converting word segmentation into a sequence annotation problem. A neural network model is constructed as the annotation classifier, which analyzes the semantics of the context. Segmentation is then performed according to the estimated in-word position of each character. In comparisons with ICTCLAS, the cloud platform of HIT, and the Paodingjieniu Chinese word segmentation tool, the experimental results show that the method achieves higher accuracy and recall.

(3) To address the sparse representation and semantic loss of traditional short text representation methods, a short text representation method based on pooling is proposed. Taking into account the semantically similar words in the target text and the candidate text, short texts are represented by weighted pooling over word vectors. In addition, features obtained from a recursive auto-encoder are fused in to construct a short text similarity computing framework. The experimental results demonstrate that the proposed framework effectively improves short text retrieval results.

Finally, to meet the practical needs of the biomedical information retrieval task, the text representation method is applied to query expansion to compensate for the lack of a domain dictionary and a synonym thesaurus. A biomedical information retrieval system based on the semantic representations and short text representations is designed. In the BioASQ evaluation, the system took first place twice and second place twice in document retrieval, and second place four times in snippet retrieval. This application further demonstrates the validity of the method.
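
As a rough illustration of the sequence annotation view of word segmentation described in (2), the sketch below converts between segmented words and per-character position tags. The four-tag scheme (B = begin, M = middle, E = end, S = single-character word) and the function names `words_to_tags` and `tags_to_words` are assumptions made for this example; the abstract does not state which tag set the dissertation uses, and the neural network that would predict the tags is not shown.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Label every character with its position in its word.

    Assumed tag set (not specified in the abstract):
    B = begin, M = middle, E = end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from per-character position tags.

    In a full system the tags would come from a classifier; here they
    are simply passed in. Inconsistent tag sequences are handled
    leniently by closing any unfinished word when a new one starts.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts while one is still open: flush it.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The current word is complete.
            words.append(current)
            current = ""
    if current:
        # Trailing characters without a closing E tag.
        words.append(current)
    return words


if __name__ == "__main__":
    gold = ["深度", "学习", "很", "有意思"]
    tags = words_to_tags(gold)
    print(tags)
    print(tags_to_words("".join(gold), tags))
```

Under this framing, training data is produced by tagging already segmented text with `words_to_tags`, and a segmentation is recovered from predicted tags with `tags_to_words`, so no dictionary lookup is needed at decoding time.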

Keywords/Search Tags:

Short Text, Deep Learning, Semantic Representation, Chinese Word Segmentation, Similarity Computation

Related items

1	Research On Semantic Similarity Calculation Of Chinese Short Text
2	Research On Semantic Similarity Calculation Of Chinese Short Text Based On Deep Learning
3	Research On Chinese Word Segmentation Based On Deep Learning
4	Research On Short Text Semantic Similarity Computation
5	Research And Application Of Short Text Semantic Similarity Model Based On Deep Learning
6	Research On Chinese Word Semantic Similarity Computation
7	Research On Word Similarity Computation Method Based On Non-IID Learning
8	Research On Chinese Word Segmentation Based On Deep Learning
9	Applied Study On Chinese Word Segmentation Based On Deep Learning
10	Research On Key Technologies Of Chinese Text Classification Based On Deep Learning