Research And Implementation Of Text Semantic Matching For Data Fusion

Posted on: 2016-07-01   Degree: Master   Type: Thesis
Country: China   Candidate: Y Hu
GTID: 2428330542457306   Subject: Computer technology
Abstract/Summary:
Data fusion, as studied in the database and information retrieval communities, is an important task in data integration. It detects and merges records from different sources that represent the same real-world entity. As modern society develops, people generate massive amounts of data, about 80% of which is unstructured, and text is regarded as one of the most important kinds of such data. It is therefore necessary to bring multimedia data, such as text, into data fusion systems.

The integration of text data can be understood as fusion at two levels. The first level is entity linking in semantic information retrieval: retrieving all the information that represents the same real-world entity and presenting the results to users. The second level is multi-source data mining, such as extracting integrated attributes and predicting future trends. Most existing approaches, such as DATA TAMER, focus on direct keyword retrieval and are weak at semantic information retrieval. Other work, such as TextFlow, analyzes topic evolution across multiple documents. In this thesis, we propose a semantic entity linking method and a text matching method. The methods combine the advantages of data fusion and entity linking and help users fuse semantically similar information. The main contributions are:

(1) We systematically survey the state of research, in China and abroad, on data fusion and entity linking, briefly summarize representative related work, point out its advantages and disadvantages, and analyze the shortcomings of current research.

(2) We propose a Skip-gram-based entity linking approach called SimNet, which divides the entity resolution process into three stages. In the first stage, we use word embeddings to map texts to a lower-dimensional space in which each dimension carries semantic meaning. In the second stage, we compute a confidence for each candidate. In the third stage, we iteratively complete the entity linking task.

(3) We propose a text matching method based on word alignment. This approach models a text as a text graph, which extends the text features through word alignment and the generation of adjacent points.

(4) We build a tool for the biomedical domain that helps users retrieve semantically similar entities from multi-source texts.

As an emerging field of study, text data fusion offers a perspective on integrating structured records with unstructured text data. In this thesis, we study two basic problems in the data fusion process: entity linking and the semantic matching of the texts that describe the entities. The experimental results verify the feasibility and effectiveness of the key techniques proposed in this thesis. Compared with other entity linking and text matching methods, the methods in this thesis are reliable.
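
The three stages described for SimNet can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the confidence formula or the iteration rule, so the sketch assumes cosine similarity as the confidence, a greedy "most confident link first" loop, and a simple averaging update of the linked entity's vector. The toy vectors in `EMB` and the names `embed`, `cosine`, and `link` are hypothetical; a real system would load vectors trained with a Skip-gram model.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for Skip-gram embeddings.
EMB = {
    "aspirin": [0.9, 0.1, 0.0],
    "pain":    [0.8, 0.2, 0.1],
    "drug":    [0.7, 0.3, 0.0],
    "apple":   [0.0, 0.9, 0.2],
    "fruit":   [0.1, 0.8, 0.3],
    "tree":    [0.0, 0.7, 0.5],
}
DIM = 3


def embed(text):
    """Stage 1: map a text to the mean of its known word vectors."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, used here as an assumed confidence score."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def link(mentions, entities):
    """Stages 2 and 3: score candidates, then link iteratively."""
    ent_vecs = {name: embed(desc) for name, desc in entities.items()}
    pending = {m: embed(m) for m in mentions}
    links = {}
    while pending:
        # Stage 2: confidence of every remaining (mention, candidate) pair.
        conf, mention, entity = max(
            (cosine(mv, ev), m, e)
            for m, mv in pending.items()
            for e, ev in ent_vecs.items()
        )
        # Stage 3: commit the most confident link, then fold the mention
        # into the entity's vector so later rounds can use it (assumed rule).
        links[mention] = (entity, conf)
        ent_vecs[entity] = [
            (a + b) / 2 for a, b in zip(ent_vecs[entity], pending[mention])
        ]
        del pending[mention]
    return links


result = link(
    ["aspirin pain", "apple tree"],
    {"Drug": "drug pain", "Fruit": "fruit apple"},
)
for mention, (entity, conf) in result.items():
    print(mention, "->", entity, round(conf, 3))
```

With these toy vectors, each mention is linked to the entity whose description is closest in the embedding space. Linking the most confident pair first is one common way to make an iterative scheme stable, since early, reliable links can inform later, more ambiguous ones.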
Keywords/Search Tags:entity linking, text data fusion, text matching, similar network, word alignment