Research On Micro-blog Oriented Knowledge Entries Discovery Method

Posted on: 2014-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: H M Shi
Full Text: PDF
GTID: 2298330422490427
Subject: Computer Science and Technology
Abstract/Summary:
With the coming of the information age, fragmentation has become a trend in how information is served and shared. Mining interesting and useful knowledge entries from fragmented information has therefore become a critical issue, and it is one of the hot topics in text processing research. Micro-blog, as the most popular social medium and the most representative type of fragmented information service, has become one of the most important resources for acquiring novel knowledge. Nevertheless, because micro-blogs have a short history and their texts are short, the information provided by a single micro-blog text is usually not enough for traditional methods to reach satisfactory knowledge mining performance, and further research is required.

In this paper, motivated by the requirements of real applications, we mainly focus on the problem of mining more similar knowledge entries from a few knowledge entries provided as seeds. Considering the irregular expression of micro-blog text, we first combine dependency parsing with a rule-based matching algorithm for knowledge mining from micro-blog texts. Using the dependency parse of each sentence, we analyze only the components linked by a direct dependency relation, which frees template learning from the restriction of word distance and reduces its dependence on the training texts. This method achieves a sufficiently high recall rate for knowledge expansion, but with relatively low precision.

Building on this method, we design and implement a knowledge entry mining algorithm that introduces a traditional statistical method. Among existing statistical models for named entity recognition, conditional random fields (CRFs) achieve state-of-the-art performance. However, applying CRFs directly to micro-blog texts gives poor results, and the recall rate is especially low. To address this issue, we combine dependency parsing and word vector technology with CRFs. The former frees the CRF model from the restriction of distance between words, while the latter represents each word as a vector and uses a clustering algorithm to assign a class label to each word, which is then added to the feature template of the CRF model. Both methods make the templates in the CRF model more general. By combining these two methods, we improve the recall rate and the overall performance of the knowledge mining algorithm.

Finally, we apply this method in a micro-blog oriented knowledge entry extraction system and obtain a clear improvement in system performance.
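
The abstract does not give the actual feature templates, so the following Python sketch is only an illustration of the idea described above: each token's CRF features are extended with its dependency head (instead of a fixed-distance neighbor) and with a class label obtained from word-vector clustering. The function names, the feature keys, the `"UNK"` label, and the toy sentence are all assumptions made for this example, not details taken from the thesis.

```python
from typing import Dict, List


def token_features(tokens: List[str], heads: List[int],
                   clusters: Dict[str, str], i: int) -> Dict[str, str]:
    """Build a feature dict for token i (hypothetical feature layout).

    heads[i] is the index of the dependency head of token i, or -1 for the root.
    clusters maps a word to a class label assumed to come from clustering
    word vectors; unseen words fall back to "UNK".
    """
    word = tokens[i]
    feats = {
        "word": word,
        "cluster": clusters.get(word, "UNK"),
    }
    h = heads[i]
    if h >= 0:
        # Use the directly dependent word, however far away it is in the text.
        feats["head_word"] = tokens[h]
        feats["head_cluster"] = clusters.get(tokens[h], "UNK")
    else:
        feats["is_root"] = "1"
    return feats


def sentence_features(tokens: List[str], heads: List[int],
                      clusters: Dict[str, str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, heads, clusters, i)
            for i in range(len(tokens))]


# Toy input: the parse and the cluster labels are made up for illustration.
tokens = ["Alice", "really", "loves", "jazz"]
heads = [2, 2, -1, 2]
clusters = {"Alice": "C1", "loves": "C7", "jazz": "C3"}
features = sentence_features(tokens, heads, clusters)
```

A list of per-token dictionaries like `features` is a common input shape for CRF toolkits. Replacing exact words with cluster labels and head words is what lets a learned template match sentences it has not seen verbatim, which is the generalization effect the abstract attributes to the two techniques.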
Keywords/Search Tags: micro-blog, dependency parsing, conditional random fields, term vectors, clustering algorithm