Research On Machine Learning For Natural Language Processing And Transmission

Posted on: 2019-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L T Fang
Full Text: PDF
GTID: 1368330590460091
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of information technology, a variety of services and applications have been developed to assist people with a host of everyday tasks. However, many of these services require expertise in computer science, which makes it difficult for non-programmers to interact with the systems and obtain useful knowledge. Hence, natural language processing, which enables computers to analyze, understand, and derive meaning from human language, has attracted the interest of researchers around the world. In this thesis, we focus on machine learning techniques in natural language processing and investigate them from two aspects: the applications of natural language processing and the transmission of natural language. For the applications, we first study ensemble embedding, a general technique that combines existing word embeddings and semantic knowledge bases to obtain a common representation of the vocabulary. We then investigate two applications: grammar question retrieval and personalized recommendation in online learning. For these two applications, we propose a relaxed tree matching algorithm and a content-based recommendation approach, respectively. Moreover, the ensemble word embedding is used as an assistive technology to further improve the retrieval and recommendation results. For the transmission of natural language, we study deep-learning-based transmission of natural language at the physical layer. The main contents of this thesis are as follows:

First, we study an ensemble method that combines different word embedding sets and semantic knowledge bases. Existing word embeddings are learned from the distribution of words in large corpora. Although such embeddings contain semantic information, they disregard the valuable information contained in semantic knowledge bases such as ConceptNet. In addition, different embedding sets vary greatly in quality and in the characteristics of the information they capture. We propose a method to learn high-quality ensemble word embeddings from a variety of public word embedding sets and semantic knowledge bases. Experiments on several standard natural language processing evaluation tasks, including word similarity and word analogy, show that the proposed ensemble embedding outperforms both the individual embedding sets and the ensemble method without semantic knowledge bases.

Secondly, we introduce a new grammar question retrieval problem. Specifically, given a query grammar question, the problem is to find a question with a similar grammatical focus. Since this search objective differs from that of general retrieval tasks, the existing statistical-analysis-based and syntactic-analysis-based methods cannot be adopted directly. To address the problem, we propose a tree-matching-based method for grammar question retrieval. Specifically, we first propose a new syntactic tree, namely the parse-key tree, to capture the grammatical information of an English grammar question. We then propose two kernel functions, namely the relaxed tree kernel and the part-of-speech order kernel, with which we can compute the similarity between two parse-key trees. In addition, word-embedding similarity, conceptual similarity, and textual similarity are incorporated to further improve the retrieval accuracy.

Thirdly, we propose a personalized recommendation method for grammar questions. We recommend materials to users based on their choices and on the feature information of the grammar questions. In our solution, we propose a novel and efficient feature extraction method. We extract four types of features from grammar questions: statistical features, part-of-speech features, relationship features, and word embedding features. Based on these features, we recommend questions with standard linear models. To the best of our knowledge, our work is the first attempt to provide personalized grammar question recommendation. The experimental results show that standard scoring models using the proposed features perform well in both accuracy and efficiency.

Finally, we study natural language transmission based on deep learning. Traditional communication methods have several inherent limitations in complex scenarios, such as difficult optimization problems and high computational complexity. We propose an end-to-end natural language transmission method. Unlike a traditional communication system, which optimizes each module individually, our method treats the entire communication system as an encoder and a decoder. The approach can learn the optimal solution to a complex scenario from a large volume of data.
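As a rough illustration of the ensemble embedding idea in the first contribution above, the sketch below combines two word embedding sets by L2-normalizing each vector and concatenating the results over the shared vocabulary. This is a generic baseline and not necessarily the method proposed in the thesis: it leaves out the semantic knowledge bases entirely, and the function names and toy vectors are hypothetical, chosen only for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so no embedding set dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def ensemble_by_concatenation(embedding_sets):
    """Concatenate normalized vectors for the words present in every set.

    embedding_sets: list of dicts mapping word -> list of floats.
    Assumption: a simple concatenation baseline, not the thesis's method.
    """
    shared = set(embedding_sets[0])
    for emb in embedding_sets[1:]:
        shared &= set(emb)
    ensemble = {}
    for word in sorted(shared):
        combined = []
        for emb in embedding_sets:
            combined.extend(l2_normalize(emb[word]))
        ensemble[word] = combined
    return ensemble


def cosine(u, v):
    """Cosine similarity, the usual score in word similarity evaluations."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy embedding sets with different dimensions and vocabularies (made-up values).
set_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
set_b = {
    "cat": [0.2, 0.8, 0.0],
    "dog": [0.3, 0.7, 0.0],
    "car": [0.0, 0.1, 0.9],
    "bus": [0.0, 0.2, 0.8],  # absent from set_a, so dropped from the ensemble
}

ensemble = ensemble_by_concatenation([set_a, set_b])
sim_cat_dog = cosine(ensemble["cat"], ensemble["dog"])
sim_cat_car = cosine(ensemble["cat"], ensemble["car"])
```

With these toy vectors, "cat" ends up closer to "dog" than to "car" in the combined space. Restricting the ensemble to the shared vocabulary is the simplest design choice; handling words missing from some sets would need an additional strategy.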
Keywords/Search Tags:Natural language processing, word embedding, information retrieval, personalized recommendation system, natural language transmission, ensemble algorithm, tree matching, feature extraction, deep learning