Research and Implementation of Clustering-Oriented Entity Discovery Technology for Network Information

Posted on: 2016-11-12  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Wang  Full Text: PDF
GTID: 2348330509960906  Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, network information resources have multiplied many times over and now cover all aspects of people's lives. Yet as the amount of information grows, users still find in-depth information about their search target extremely scarce. The content of a web page can be characterized by the entities that appear in its title and body. It is therefore particularly important to cluster search results in order to find the entities and attribute information related to the retrieval target. Currently popular traditional search-result clustering algorithms can group highly related pages effectively, but they do not consider the entity information in web pages, which could effectively guide the user's search behavior.

To address these problems, this thesis researches and implements, as part of a project, clustering-oriented entity discovery technology for network information. Entities in network information are discovered in combination with an iterative search system, and entity information related to the target entity is obtained by effectively clustering the returned web pages. The following research work has been done.

(1) Because the entities in the title and body summarize a web page's content, an entity factor is introduced into the feature-weight calculation when representing web page text. Combining word features with an increased weight for entities improves clustering accuracy.

(2) The Single-pass clustering algorithm requires a threshold to be set in advance for deciding which cluster a text belongs to, and this value is very difficult to choose in practice. An adaptive-threshold clustering algorithm, SPT (Single-pass-Threshold), is proposed. First, the similarities between adjacent texts, taken in input order, are computed and used as data samples; a minimum variance algorithm is applied to these samples to determine the adaptive threshold. Second, the texts are clustered with the Single-pass algorithm using this threshold, which yields the category of each text.

(3) An efficient text clustering algorithm that uses the entity factor is designed and implemented on the basis of the SPT algorithm. Its effectiveness is verified by two groups of experiments. The first verifies the effectiveness of the adaptive threshold through exhaustive testing. The second evaluates the performance of the SPT algorithm on web page text from the same field and from different fields. The experimental results show that the SPT algorithm achieves better accuracy, recall, and F value than the original algorithm.

(4) The clustering module of an entity-finding-oriented iterative search prototype system is implemented using the above method. The clustering module displays entity-related texts in aggregated form. A named entity recognition module extracts and counts the entity information in the texts of each category, and the resulting related-entity information is presented to users. Based on this entity information, users can expand the clues for their search target, and the entity information related to the target is gradually enriched through iterative searches in the prototype system.
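
The two-step idea in item (2), together with the entity weighting from item (1), can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the weighting formula, the similarity measure, or the details of the minimum variance algorithm. Here the entity factor is assumed to be a simple multiplier (entity_boost) on term frequency, similarity is assumed to be cosine similarity, and the "minimum variance" step is assumed to mean splitting the adjacent-text similarities into two groups with the smallest total within-group squared deviation and placing the threshold between them. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens, entities, entity_boost=2.0):
    """Term-frequency vector; entity tokens are multiplied by entity_boost.

    The multiplier is an assumed form of the entity factor.
    """
    counts = Counter(tokens)
    return {t: c * (entity_boost if t in entities else 1.0)
            for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adaptive_threshold(sims):
    """Assumed 'minimum variance' rule: split the sorted similarities into
    two groups with the smallest total within-group squared deviation and
    return the midpoint of the gap between the groups."""
    s = sorted(sims)
    if len(s) < 2:
        return s[0] if s else 0.0

    def sse(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    k = min(range(1, len(s)), key=lambda i: sse(s[:i]) + sse(s[i:]))
    return (s[k - 1] + s[k]) / 2


def single_pass(vectors, threshold):
    """Single-pass clustering: each text joins its most similar existing
    cluster if the similarity reaches the threshold, else starts a new one."""
    clusters = []  # each cluster: {"centroid": summed vector, "members": indices}
    for i, v in enumerate(vectors):
        best, best_sim = None, -1.0
        for c in clusters:
            sim = cosine(v, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Cosine ignores scale, so a running sum serves as the centroid.
            for t, w in v.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
        else:
            clusters.append({"centroid": dict(v), "members": [i]})
    return [c["members"] for c in clusters]


def spt_cluster(docs, entities, entity_boost=2.0):
    """Step 1: threshold from adjacent-text similarities. Step 2: Single-pass."""
    vectors = [vectorize(d, entities, entity_boost) for d in docs]
    sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    return single_pass(vectors, adaptive_threshold(sims))


if __name__ == "__main__":
    docs = [
        ["beijing", "university", "campus"],
        ["beijing", "university", "students"],
        ["football", "match", "score"],
        ["football", "match", "goal"],
    ]
    print(spt_cluster(docs, entities={"beijing", "football"}))  # [[0, 1], [2, 3]]
```

In this toy input the adjacent-text similarities are roughly 0.83, 0, and 0.83, so the assumed rule places the threshold near 0.42 and the four texts fall into two clusters. Because Single-pass depends on input order, a different ordering of the same texts can produce different adjacent similarities and therefore a different threshold.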
Keywords/Search Tags: Entity Finding, Clustering, Iterative Search