Research On Micro-blog Oriented Knowledge Entries Discovery Method

Posted on:2014-02-01

Degree:Master

Type:Thesis

Country:China

Candidate:H M Shi

Full Text:PDF

GTID:2298330422490427

Subject:Computer Science and Technology

Abstract/Summary:

With the arrival of the information age, fragmentation has become a trend in how information is served and shared. Mining interesting and useful knowledge entries from fragmented information has therefore become a critical issue and one of the hot topics in text processing research. Micro-blogs, as the most popular social media and the most representative type of fragmented information service, have become one of the most important resources for acquiring novel knowledge. Nevertheless, because micro-blogs have a short history and their texts are short, the information provided by a single micro-blog text is usually not enough for traditional methods to reach satisfactory knowledge mining performance, and further research is required.

In this paper, motivated by the requirements of real applications, we mainly focus on the problem of mining more similar knowledge entries from a few knowledge entries provided as seeds. Considering the irregular expression of micro-blog text, we first combine dependency parsing with a rule-based matching algorithm for knowledge mining from micro-blog texts. Using the dependency parse of each sentence, we analyze only the components linked by a direct dependency relation, which frees template learning from the restriction of word distance and reduces its dependence on the training texts. This method achieves a sufficiently high recall rate for knowledge expansion, with relatively lower precision.

Building on the previous method, we design and implement a knowledge entry mining algorithm that introduces traditional statistical methods. Among existing statistical models for named entity recognition, CRFs reach state-of-the-art performance. However, applying CRFs directly to micro-blog texts gives poor results, especially in terms of recall. To address this issue, we combine dependency parsing and word vector technology with CRFs. The former frees the CRF model from the restriction of distance between words, while the latter expands each single word into a vector and uses a clustering algorithm to assign a class label to each word; the class label is then added to the feature templates of the CRF model. Both methods make the templates in the CRF model more general. By combining these two methods, we improve the recall rate and the overall performance of the knowledge mining algorithm.

Finally, we apply this method to the micro-blog oriented knowledge entry extraction system and obtain an obvious improvement in system performance.
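
The abstract does not give the actual feature templates, so the following is only a minimal sketch of the general idea, under stated assumptions: each token's features include its dependency head (regardless of how far away the head is in the sentence) and a word-cluster label. The `Token` structure, the feature names, the toy sentence, and the `clusters` mapping are all hypothetical; in practice the heads would come from a dependency parser and the cluster labels from clustering word vectors.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str    # surface form of the token
    head: int    # index of the dependency head in the sentence, -1 for the root
    deprel: str  # dependency relation label between the token and its head


def token_features(sent: List[Token], i: int, clusters: Dict[str, str]) -> Dict[str, str]:
    """Build a feature dict for token i (hypothetical feature names).

    The head is looked up through the dependency arc, so the feature is
    available however many words separate the token from its head.
    """
    tok = sent[i]
    feats = {
        "word": tok.word,
        "cluster": clusters.get(tok.word, "UNK"),  # class label from clustering
        "deprel": tok.deprel,
    }
    if tok.head >= 0:
        head = sent[tok.head]
        feats["head_word"] = head.word
        feats["head_cluster"] = clusters.get(head.word, "UNK")
    else:
        # The root token has no head; use a placeholder value.
        feats["head_word"] = "<ROOT>"
        feats["head_cluster"] = "<ROOT>"
    return feats


def sentence_features(sent: List[Token], clusters: Dict[str, str]) -> List[Dict[str, str]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(sent, i, clusters) for i in range(len(sent))]


# Toy input: hand-written parse and cluster labels, for illustration only.
sentence = [
    Token("Alice", 3, "nsubj"),
    Token("really", 3, "advmod"),
    Token("often", 3, "advmod"),
    Token("reads", -1, "root"),
    Token("novels", 3, "obj"),
]
clusters = {"Alice": "C1", "reads": "C2", "novels": "C3"}
features = sentence_features(sentence, clusters)
```

In this toy sentence, "Alice" is three positions away from "reads", yet the head feature links them directly, which a fixed-width window template would miss. Per-token dictionaries like these could then be passed to a CRF toolkit that accepts such features; which toolkit and templates the thesis used is not stated in the abstract.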

Keywords/Search Tags:

micro-blog, dependency parsing, conditional random fields, term vectors, clustering algorithm

Related items

1	Research On Fast Exact Structured Learning
2	Research On Japanese Dependency Parsing Technology
3	Research Of Sentiment Analysis For Chinese Micro Blog Based On Conditional Random Field
4	Research On Chinese Syntactic Parsing Based On Cascaded Conditional Random Fields
5	Research Of Sentiment Analysis For Chinese Micro Blog Based On Conditional Random Field
6	Research On Dependency Parsing With Partial Annotations
7	The Research Of Applying Conditional Random Fields To Chinese Lexical Analysis And Chunk Parsing
8	Research On Dependency-based Chinese Semantic Role Labeling
9	Research Of Chinese Phrase Identification Based On Conditional Random Fields
10	Multi-Task Learning In Conditional Random Fields For Chunking In Shallow Semantic Parsing