Design And Implementation Of Aggregation System For Personal Information In Cyberspace

Posted on: 2020-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: X N Ye
GTID: 2428330575957089
Subject: Computer technology
Abstract/Summary:
With the rapid development of core Internet technologies and the continued growth of the online user base, cyberspace now holds a vast amount of data. This information touches every aspect of social life, and because people take part in every social activity, information about people accounts for a large and important share of it. Users want to find information about specific people in cyberspace, but the sheer volume of online information makes retrieval far more difficult. This thesis therefore designs a system that crawls information a user may be interested in from cyberspace, aggregates it into separate results for each distinct person entity, and presents the information the user actually cares about correctly, quickly, and as a whole. Starting from users' actual needs, the thesis analyzes the functional and non-functional requirements of a people information aggregation system, designs the system architecture and the functions of its sub-modules, and studies practical technical solutions for aggregating online people information in order to implement such a system. The specific work is as follows:

Multi-threaded crawlers are combined with search engines to collect the URLs of pages about the people a user is interested in, and a web page information library for people who share the same name is built by extracting text based on web page structure and statistical features.

The bag-of-words model, the TF-IDF algorithm, and the N-gram model are used to extract different features from people web pages, and a text feature vector is built for each page's text with the vector space model.

Different text clustering methods are compared and analyzed, and the clustering performance of the Affinity Propagation algorithm and the hierarchical clustering algorithm on web page text is evaluated. The collection of web pages about people who share a name is then aggregated with a clustering method for online people information that incorporates the Silhouette Coefficient.

The Django and AdminLTE frameworks are used to integrate the functional modules into a single system. Users can interact with the system through various forms and can check and adjust the aggregation results, and front-end technologies such as ECharts are used to visualize the operations.

Building on this work, the thesis designs and implements a personal information aggregation system for cyberspace that helps users quickly and accurately obtain the collection of web pages about the people they are interested in, and it carries out a series of functional and non-functional tests on the system. The results show that the system can accept input from different data sources and aggregate people information with the clustering algorithm designed in this thesis, and that the aggregation results meet expectations. The system therefore meets users' need to obtain a collection of web pages about specific people, provides functions such as information management and adjustment of aggregation results, and has considerable practical value.
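The feature extraction and clustering steps described above can be illustrated with a small, dependency-free sketch. It is not the thesis's implementation: the abstract gives no parameters, so the function names, the smoothed IDF formula, the cosine distance, the average linkage, and the rule of keeping the cluster count with the highest mean Silhouette Coefficient are all assumptions made for illustration. Affinity Propagation, which the thesis also evaluates, is not shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Word 1-grams up to n_max-grams of a lower-cased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs):
    """One sparse, L2-normalised TF-IDF vector (a dict) per document."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption, not taken from the thesis).
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two unit-length sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerate(dist, k):
    """Average-linkage hierarchical clustering, merged down to k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(dist, labels):
    """Mean Silhouette Coefficient; points in singleton clusters score 0."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)


def cluster_pages(docs):
    """Try k = 2 .. len(docs) - 1 and keep the best-scoring labelling."""
    vectors = tfidf_vectors(docs)
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    best_labels, best_score = None, float("-inf")
    for k in range(2, len(docs)):
        labels = agglomerate(dist, k)
        score = silhouette(dist, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Calling `cluster_pages` on a list of extracted page texts for one name returns a cluster label per page together with the winning silhouette score. With fewer than three pages there is no candidate cluster count and the labels come back as `None`. The pairwise search in `agglomerate` is cubic in the number of pages, which is acceptable for a sketch but not for a large crawl.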
Keywords/Search Tags: cyberspace, people search, information aggregation, text clustering