Research And Implementation Of Industry-Oriented Information Integration Prototype System

Posted on: 2014-01-10 Degree: Master Type: Thesis
Country: China Candidate: Z Lin Full Text: PDF
GTID: 2248330398972265 Subject: Computer Science and Technology
Abstract/Summary:
The data available on the Internet is growing at a tremendous rate owing to the rapid development of the information industry, and users conduct more and more searches that contain information about entities, such as the names of persons, corporations and places. They try to search not only by keyword matching, but also by building search conditions from a semantic analysis of those entities and related information.

Existing general-purpose document search engines, such as Google, Baidu and Yahoo, all rely on keyword matching to meet users' needs. However, Internet users find these technologies unsatisfactory, and an entity-centred search engine is needed.

This paper first investigates the shortcomings of existing search engines and users' habits, and proposes a method for information integration based on an entity model. It then builds an industry-oriented information integration prototype system, with the help of machine learning algorithms, that integrates information around entity concepts so that ordinary Internet users can use this entity-based search engine more efficiently.

This paper has conducted the following research work. First, we built an entity dictionary by extracting, classifying and sorting entries from Baidu Baike. Second, we collected IT news and well-known IT industry blogs from portal sites and built an industry-oriented Chinese news corpus after extracting and sorting those passages. Third, we built an industry-oriented web information integration prototype system based on machine learning algorithms, which uses sorting algorithms to calculate the correlation between text and entity, yielding a semantically based weight for each entity in each text; in addition, the correlations between entities are calculated from the texts containing each entity and its weight in them.
Finally, an entity-centred prototype search engine is built on top of the above research.

This paper also reports experiments on the industry-oriented information integration prototype system, using an existing Chinese news corpus as a test set. The results show that, compared with hand-annotated results, the model presented in this paper deviates by less than 0.1 in both the text-entity correlation and the entity-entity correlation, which is consistent with human judgment and thus indicates good accuracy.
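
The abstract does not give the formulas used, so the following Python sketch is only an illustration of the general idea: deriving a correlation between two entities from the texts that contain them and their weights, and then measuring the deviation from hand-annotated scores. Cosine similarity, mean absolute deviation, the function names and the sample weights are all assumptions made for this example, not the thesis's actual method or data.

```python
import math
from collections import defaultdict


def entity_vectors(text_weights):
    """Invert {text_id: {entity: weight}} into {entity: {text_id: weight}}.

    The per-text entity weights are assumed to come from an earlier
    text-entity correlation step (hypothetical input format).
    """
    vectors = defaultdict(dict)
    for text_id, weights in text_weights.items():
        for entity, weight in weights.items():
            vectors[entity][text_id] = weight
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts.

    Assumption: cosine similarity stands in for the entity-entity
    correlation; the source does not name the measure it uses.
    """
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_abs_deviation(predicted, annotated):
    """Average absolute gap between model scores and hand-annotated scores."""
    keys = predicted.keys() & annotated.keys()
    if not keys:
        raise ValueError("no items in common to compare")
    return sum(abs(predicted[k] - annotated[k]) for k in keys) / len(keys)


# Made-up weights for three texts; not taken from the thesis.
sample_weights = {
    "t1": {"Baidu": 0.8, "Google": 0.6},
    "t2": {"Baidu": 0.5},
    "t3": {"Google": 0.9, "Yahoo": 0.7},
}
vectors = entity_vectors(sample_weights)
baidu_google = cosine(vectors["Baidu"], vectors["Google"])
baidu_yahoo = cosine(vectors["Baidu"], vectors["Yahoo"])
```

In this sketch, two entities that never appear in the same text get a correlation of zero, while entities that co-occur with high weights score closer to one. A deviation check in the spirit of the reported experiment would compare such scores against annotator-assigned values with mean_abs_deviation.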
Keywords/Search Tags: Information Integration, Entity Model, Industry-oriented, Machine Learning