Research Of Deep Representation About Web Knowledge Resource

Posted on:2016-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:C Q Li

Full Text:PDF

GTID:2348330503978265

Subject:Computer software and theory

Abstract/Summary:


With the rapid development of computer technology and the explosive growth of the Internet, people often fail to acquire and make effective use of the rich and diverse knowledge resources available on the web. To make better use of these web knowledge resources, we need more automated and intelligent data mining and information extraction methods. Web documents, as one kind of web knowledge resource, consist of unstructured natural language. Before clustering, classification, or other text mining techniques can be applied, web documents must be converted into a format that machine learning algorithms can process, so converting text data into numerical data is an important foundation of text mining. This thesis focuses on text representation and presents an in-depth study of named entity recognition. First, we summarize the basic concepts and theories of text representation and domain-specific named entity recognition. Then, we analyze the most popular representation models, the vector space model and word vectors, as well as deep learning frameworks. On this basis, we propose a new representation approach for web knowledge resources based on named entity recognition and word vectors, and carry out experiments in the algorithm knowledge domain.

The thesis explores the following aspects:

First, based on a survey of common text representations, we point out the limitations of the vector space model, the most popular text representation method. We then use word vectors of the named entities in a text to mine deep syntactic and semantic features, and propose a new representation method for web knowledge resources based on named entity recognition and word vectors.

Secondly, as the first part of the proposed model framework for the algorithm knowledge domain, we carry out research and experiments on named entity recognition.
We crawl web documents, preprocess and annotate them to build a corpus for the algorithm knowledge domain, and use conditional random fields (CRFs) as the main algorithm in a model that combines rules, dictionaries, and statistical methods. Targeting the characteristics of algorithm knowledge and of web problem-solving reports, we select relevant features to generate a feature template and complete training with the open-source tool CRF++.

Then, as the second part of the proposed model framework for the algorithm knowledge domain, we train a word vector model on the domain corpus and combine it with the named entity recognition results of the first part to obtain vector representations of web knowledge resources. Clustering experiments on web problem-solving reports using the proposed method show that it performs well.

Finally, we discuss the experimental results of the two steps above, analyze the reasons behind them, and set goals for further improvement and future research.
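In CRF++, training data has one token per line with whitespace-separated columns (the last column is the label) and a blank line between sentences; the feature template is a plain-text file whose macros `%x[row,col]` pick the cell at a row offset from the current token and an absolute column. Lines starting with `U` yield unigram features and a lone `B` adds label-bigram features; training is then run as `crf_learn template train.data model`. The abstract does not give the thesis's actual template, so the following is only a sketch under stated assumptions: the column layout (word, POS tag, dictionary-match flag) and the helper `expand_unigrams` are hypothetical, and the helper merely mimics how unigram macros are instantiated for one token rather than reproducing CRF++ internals.

```python
import re
from typing import List

# A CRF++-style template. Hypothetical column layout:
# 0 = word, 1 = POS tag, 2 = dictionary-match flag (1 if the word is in an
# algorithm-term dictionary). The label column is never referenced.
TEMPLATE = """\
# Unigram
U00:%x[-1,0]
U01:%x[0,0]
U02:%x[1,0]
U03:%x[0,1]
U04:%x[0,2]
U05:%x[-1,0]/%x[0,0]

# Bigram
B
"""

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_unigrams(template: str, rows: List[List[str]], i: int) -> List[str]:
    """Instantiate every unigram ("U") template line for token i.

    Out-of-range rows become <BOS-k>/<EOS+k> placeholders; these strings are
    this sketch's own choice, not the exact ones CRF++ uses.
    """

    def cell(match):
        offset, col = int(match.group(1)), int(match.group(2))
        j = i + offset
        if j < 0:
            return f"<BOS{j}>"
        if j >= len(rows):
            return f"<EOS+{j - len(rows) + 1}>"
        return rows[j][col]

    features = []
    for line in template.splitlines():
        line = line.strip()
        # Skip comments, blank lines and the label-bigram macro "B".
        if not line.startswith("U"):
            continue
        features.append(MACRO.sub(cell, line))
    return features


if __name__ == "__main__":
    # One toy sentence in CRF++ column format: word, POS, dict flag, BIO label.
    rows = [
        ["use", "VB", "0", "O"],
        ["dynamic", "JJ", "1", "B-ALG"],
        ["programming", "NN", "1", "I-ALG"],
    ]
    print(expand_unigrams(TEMPLATE, rows, 1))
```

For the middle token this yields features such as `U01:dynamic`, `U03:JJ` and the word-pair feature `U05:use/dynamic`. A dictionary-flag column is one simple way to let dictionary knowledge enter a CRF model as a feature; whether the thesis fused dictionaries this way is not stated.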
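The limitation of the vector space model and the idea of representing a document through the word vectors of its named entities can be illustrated on two short texts that describe related algorithms with no words in common. This is a minimal sketch under explicit assumptions: the abstract does not say how entity vectors are combined, so simple averaging is assumed here; the embeddings in `TOY_EMBEDDINGS` are made-up 3-dimensional numbers, not vectors trained on the thesis corpus; and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = List[float]

# Made-up toy embeddings (NOT trained): related algorithm terms point in
# similar directions, as a trained domain word-vector model is expected to do.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dynamic programming": [0.9, 0.1, 0.0],
    "memoization": [0.8, 0.2, 0.1],
    "knapsack": [0.7, 0.3, 0.0],
    "subset sum": [0.6, 0.4, 0.1],
}


def bio_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Join B-/I- tagged tokens into entity strings; orphan I- tags are dropped."""
    entities, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bow_vector(tokens: List[str], vocab: List[str]) -> Vector:
    """Vector space model baseline: raw term counts over a shared vocabulary."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def entity_vector(entities: List[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of recognized entities, skipping unknown ones."""
    vecs = [emb[e] for e in entities if e in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


if __name__ == "__main__":
    tokens_a = ["use", "dynamic", "programming", "on", "the", "knapsack"]
    tags_a = ["O", "B-ALG", "I-ALG", "O", "O", "B-PROB"]
    tokens_b = ["apply", "memoization", "to", "subset", "sum"]
    tags_b = ["O", "B-ALG", "O", "B-PROB", "I-PROB"]

    vocab = sorted(set(tokens_a) | set(tokens_b))
    print("bag-of-words cosine:", cosine(bow_vector(tokens_a, vocab), bow_vector(tokens_b, vocab)))

    ea = entity_vector(bio_entities(tokens_a, tags_a), TOY_EMBEDDINGS, 3)
    eb = entity_vector(bio_entities(tokens_b, tags_b), TOY_EMBEDDINGS, 3)
    print("entity-vector cosine:", cosine(ea, eb))
```

Because the two texts share no tokens, their bag-of-words cosine similarity is exactly 0.0, while the averaged entity vectors are close to parallel for these toy numbers. Document vectors built this way could then be passed to a standard clustering algorithm such as k-means; the specific clustering procedure used in the thesis is not described in the abstract.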

Keywords/Search Tags:

Text representation, named entity recognition, CRF, algorithm knowledge, distributed representation


Related items

1	Algorithm Research On Text Classification And Named Entity Recognition Based On Deep Text Feature Representation
2	Research On Key Technologies Of Named Entity Recognition And Linking Based On Representation Learning
3	Research On Named Entity Recognition And Knowledge Representation In Knowledge Graph Construction
4	Research On Knowledge Representation Learning Based On Entity Description And Entity Similarity
5	Research On Named Entity Recognition And Entity Link Method For Short Text Questions
6	Research On Knowledge Graph Construction Technologies Based On Text Feature Learning
7	Recognition And Discovery Of Programing Design Network Resource Named Knowledge Entity
8	Research On Chinese Named Entity Recognition With External Knowledge And Application In Medical Field
9	Knowledge Graph Embedding With Triple Context And Text
10	A Research On Question Answering System Based On The Knowledge Graph