Research on Deep Representation of Web Knowledge Resources

Posted on: 2016-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Li
GTID: 2348330503978265
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer technology and the explosive growth of the Internet, people often fail to acquire and use the rich and diverse knowledge resources available on the network effectively. To make better use of these resources, we need more automated and intelligent data mining and information extraction methods. Web documents, as one kind of network knowledge resource, consist of unstructured natural language. Before clustering, classification, or other text mining techniques can be applied, web documents must be converted into a format that machine-learning algorithms can process. Text representation is an important basis for text mining because it converts text data into numerical data. This thesis focuses on text representation and on an in-depth study of named entity recognition. First, we summarize the basic concepts and theories of text representation and domain-specific named entity recognition. We then analyze the widely used vector space model, word vectors, and deep learning frameworks. On this basis, we propose a new representation approach for network knowledge resources based on named entity recognition and word vectors, and carry out experiments in the domain of algorithm knowledge. The thesis covers the following aspects. First, building on a review of common text representations, it points out the limitations of the vector space model, the most popular text representation method. It then uses word vectors to mine the deep syntactic and semantic features of the named entities in a text, and proposes a new representation method for network knowledge resources based on named entity recognition and word vectors. Second, as the first part of the proposed framework, it studies and experiments with named entity recognition in the algorithm knowledge domain.
We crawl web documents, preprocess and annotate them, and so build a corpus of algorithm knowledge. We take conditional random fields (CRFs) as the main algorithm and combine rules, dictionaries, and statistical methods in a single model. Based on the characteristics of algorithm knowledge and of online reports, we select relevant features to generate a feature template, and use the open-source tool CRF++ to complete the training process. Then, as the second part of the proposed framework, we train a word vector model on the domain corpus and combine it with the named entity recognition results of the first part to obtain a vector representation of network knowledge resources. We apply the proposed method to clustering online problem-solving reports, and the experimental results indicate that the method is effective. Finally, we discuss the experimental results of the two steps above, analyze the reasons behind them, and set out goals for improvement and future research.
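
As a rough illustration of the second part of the framework described in the abstract (combining recognized entities with word vectors to obtain a document vector), the sketch below averages the vectors of a document's entities and compares documents by cosine similarity. The averaging step, the toy 3-dimensional vectors, and the names `doc_vector` and `cosine` are assumptions made for illustration only; the abstract does not state how the thesis combines entities and word vectors or which clustering algorithm it uses.

```python
import math


def doc_vector(entities, word_vectors):
    """Average the vectors of the entities that have a known vector.

    Assumption: simple averaging stands in for whatever combination
    the thesis actually uses. Returns None if no entity is known.
    """
    known = [word_vectors[e] for e in entities if e in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy word vectors (hypothetical values, not trained on any corpus).
word_vectors = {
    "dijkstra": [1.0, 0.0, 0.0],
    "shortest path": [0.8, 0.2, 0.0],
    "knapsack": [0.0, 1.0, 0.0],
    "dynamic programming": [0.0, 0.8, 0.2],
}

# Entity lists as they might come out of a named entity recognizer.
doc_a = doc_vector(["dijkstra", "shortest path"], word_vectors)
doc_b = doc_vector(["shortest path", "dijkstra", "unknown term"], word_vectors)
doc_c = doc_vector(["knapsack", "dynamic programming"], word_vectors)

print(cosine(doc_a, doc_b))  # documents with the same entities
print(cosine(doc_a, doc_c))  # documents about different topics
```

Document vectors built this way could then be passed to any standard clustering routine; documents that share entities end up close together, while documents about unrelated topics end up far apart.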
Keywords/Search Tags: Text representation, named entity recognition, CRF, algorithm knowledge, distributed representation