Unstructured Data Semantic Relationship Mining For The Field Of Ethnic Information Resources

Posted on: 2017-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: P Huang
Full Text: PDF
GTID: 2358330503973363
Subject: Engineering
Abstract/Summary:
Unstructured data such as text is an important part of ethnic information resources. Developing and using these resources fully benefits society and promotes cultural communication between ethnic groups. This thesis discusses the use of data mining technology for semantic analysis of unstructured data in ethnic information resources, from three aspects:

1. Word segmentation. We describe word segmentation of the text data in ethnic information resources. The maximum matching method on strings is used for rough segmentation of Chinese words, and the bidirectional maximum matching method is then used to recognize ambiguity. We compare statistical methods for ambiguity processing and, based on that analysis, propose some modified methods to further improve their capability.

2. New word recognition. Ethnic information resources contain many new words, so we use a large-scale corpus to recognize them. The MapReduce parallel model is used to reduce the long computation time this requires. We use the N-Gram method to find candidate strings, choose chi-square, entropy, and word frequency as features, and then apply rules to filter the new words.

3. Relation extraction. Relation extraction is an important task in text mining of ethnic information resources. We find that certain words in the context of a sentence can describe the semantic relation. To address the known difficulties of building a tagged corpus and predefining the entity-relationship model, the thesis proposes a density-based multi-clustering method over semantic similarity to mine binary entity relationship tuples.

Mining the unstructured data in this way can help solve management and service problems for ethnic information resources.
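
The ambiguity recognition step in the first aspect can be illustrated with a minimal sketch of bidirectional maximum matching. It is not the thesis's implementation: the function names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration. The idea shown is that a sentence is segmented once left to right and once right to left, and a disagreement between the two results flags a segmentation ambiguity.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment left to right, taking the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len=4):
    """Segment right to left, taking the longest lexicon word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


def find_ambiguity(text, lexicon, max_len=4):
    """Return both segmentations and whether they disagree."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return fwd, bwd, fwd != bwd


# Hypothetical toy lexicon, chosen only to trigger a well-known ambiguity.
TOY_LEXICON = {"研究", "研究生", "生命", "起源"}

if __name__ == "__main__":
    fwd, bwd, ambiguous = find_ambiguity("研究生命起源", TOY_LEXICON)
    print(fwd)        # ['研究生', '命', '起源']
    print(bwd)        # ['研究', '生命', '起源']
    print(ambiguous)  # True
```

When the two passes disagree, as in this example, the span would be handed to a separate disambiguation step such as the statistical comparison the abstract mentions; that step is not sketched here.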
Keywords/Search Tags:Text Mining, National Information, Unstructured information