Research On Auto-Construction Technology For University Teacher Social Network

Posted on: 2012-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: C W Wang
Full Text: PDF
GTID: 2218330362450424
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the number of web pages has grown explosively. This makes a vast amount of information available on the web, but acquiring useful information quickly and effectively from this sea of information has become an urgent problem. Meanwhile, the rise of social networking has promoted communication among people and, to some extent, changed the way people access information. This work aims to use machine learning, data mining, and natural language processing techniques to automatically build a social network of university teachers. The goal is not only to provide Internet users with teachers' personal and research information through a more direct, highly integrated, all-round, multi-angle information platform, but also to create an academic exchange platform for researchers. This thesis focuses on the following issues.

First, this thesis implements a block segmentation model for teacher information extraction. A teacher's personal information, such as name, university, and professional title, forms the basic components of the teacher's profile. We first preprocess teacher introduction web pages and then divide them into discrete information blocks. A conditional random field (CRF) model is employed to label the information fields in each block. For basic information and contact information, word-level features achieve good results. Expanding the features from the word level to the block level effectively solves the long-distance dependency problem for education-related information fields.

Second, because published papers best reflect a teacher's research, we design a framework to obtain the set of papers for a teacher. The paper set contains two types of errors: inexact name matches and name ambiguity. The first type is easily removed with rules, so this thesis focuses on the author name disambiguation problem and uses a method based on hierarchical clustering. Only basic paper information is used as features. The method uses two cluster termination conditions: one based on prior knowledge and one based on a similarity threshold.

Finally, based on teachers' personal and research information, we study the construction of a teacher social network and community detection. Teachers can be related in multiple ways; here we build the network according to teachers' research areas. Two methods are employed. In the first, a topic model is used to find the topic distribution of each teacher's paper set, the distance between two teachers is calculated from these distributions, and a Markov clustering model is then applied to find communities. The second method uses the keyword collections of papers to establish links among teachers, and two complex network clustering algorithms are employed to detect communities in the network. We then analyze the two methods in terms of community quality and time complexity.
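
The author name disambiguation step described in the abstract can be illustrated with a minimal sketch of agglomerative (hierarchical) clustering with two stopping rules. The abstract does not specify the features, similarity measure, or linkage, so everything concrete here is an assumption made for illustration: the feature set (coauthors, title words, venue), Jaccard similarity, average linkage, the parameter names `threshold` and `known_authors`, and the toy records in `papers`, which are invented and not taken from the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paper_features(paper):
    """Flatten basic paper fields into one set of tagged tokens (assumed features)."""
    feats = {"co:" + c.lower() for c in paper["coauthors"]}
    feats |= {"w:" + w.lower() for w in paper["title"].split()}
    feats.add("v:" + paper["venue"].lower())
    return feats


def cluster_similarity(c1, c2, feats):
    """Average-link similarity between two clusters of paper indices."""
    total = sum(jaccard(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def disambiguate(papers, threshold=0.2, known_authors=None):
    """Group papers that share an author name into per-person clusters.

    Stopping rules (treated here as alternatives, which is an assumption):
      - prior knowledge: stop once `known_authors` clusters remain;
      - similarity threshold: stop when the best pair scores below `threshold`.
    """
    feats = [paper_features(p) for p in papers]
    clusters = [[i] for i in range(len(papers))]
    while len(clusters) > 1:
        if known_authors is not None and len(clusters) <= known_authors:
            break
        # Find the most similar pair of clusters.
        best_sim, (a, b) = max(
            (cluster_similarity(clusters[x], clusters[y], feats), (x, y))
            for x, y in combinations(range(len(clusters)), 2)
        )
        if known_authors is None and best_sim < threshold:
            break
        # Merge cluster b into cluster a (a < b, so deleting b keeps a valid).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented toy records that all carry the same ambiguous author name.
papers = [
    {"title": "Conditional random fields for web information extraction",
     "coauthors": ["Li Ming", "Zhao Lei"], "venue": "VenueA"},
    {"title": "Block level features for web information extraction",
     "coauthors": ["Li Ming"], "venue": "VenueA"},
    {"title": "Protein folding simulation on GPU clusters",
     "coauthors": ["Chen Hua"], "venue": "VenueB"},
    {"title": "GPU simulation of protein docking",
     "coauthors": ["Chen Hua", "Sun Yu"], "venue": "VenueB"},
]

if __name__ == "__main__":
    print(disambiguate(papers, threshold=0.2))
    print(disambiguate(papers, known_authors=2))
```

On these toy records, both stopping rules separate the two information-extraction papers from the two protein-simulation papers. In practice the threshold would need tuning on labeled data, and the prior-knowledge rule applies only when the number of distinct people sharing a name is known in advance.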
Keywords/Search Tags:information extraction, name disambiguation, social network, community detection