
Research On Entity Clustering And Tag Extraction From Multi-views

Posted on: 2015-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: G X Wu
GTID: 2298330422490914
Subject: Computer Science and Technology
Abstract/Summary:
Nowadays, as the Internet develops in depth, humans produce huge amounts of data every day. The Internet has become the main source of Big Data. One of the significant data types is the entity. Unlike common data, an entity is an object that usually consists of many different attributes, each describing one aspect of the object. In other words, an entity has multiple views. For example, a product can be treated as an entity. A product has many attributes such as price, size, and weight. Besides that, a product may have description texts and user-generated content such as reviews, comments, and ratings. All of this information together gives a complete description of the product. The Internet has become the largest entity repository, and how to organize these entities and provide a better search experience for users is significant. In this paper, we study how to use clustering and tag-extraction techniques to organize huge numbers of entities. However, traditional techniques often focus on single-view data. The common way to process multi-view data such as entities is to convert the multiple views into a single view. This simple method neither considers the difference in importance between views nor takes into account the relationships between them. To address these two issues, we introduce multi-view methods into the clustering and tag extraction of entities. Our work includes:

Firstly, we extended the K-means method to the multi-view setting using the idea of co-training and applied it to entities. We first reviewed several existing multi-view clustering methods and pointed out their weaknesses. We then proposed a new clustering objective and obtained a new clustering algorithm by minimizing that objective. We evaluated our method on three datasets, and the results show the effectiveness of the new algorithm.

Secondly, we introduced the multi-view idea into the tag extraction of entities. We first analyzed two single-view tag-extraction methods: TF-IDF and LDA. In contrast with the single-view methods, we first extracted tags independently from multiple texts about the entity, then ranked them by taking into account the relative importance of the different views. Experiments on an open dataset show an improvement when multi-view information is considered.

Thirdly, for smartphone apps, a special type of entity, we built a vertical retrieval system by combining the multi-view entity clustering and tag-extraction techniques proposed in Chapters 2 and 3. Using multi-view clustering, the system can reorganize the search results efficiently and effectively. At the same time, the system generates tags for each cluster, so users can explore the search results quickly through the tags. Thus we can build a user-friendly smartphone app retrieval system.
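
The multi-view tag ranking described in the abstract might look roughly like the following sketch. It is not the thesis's actual algorithm, which the abstract does not specify. It assumes each view is a tokenized text, scores candidate tags per view with a smoothed TF-IDF variant, rescales each view's scores to a common range, and then combines them with hand-picked view weights. The function names, the weights, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus_tokens):
    """Score each term of one document against a corpus of token lists.

    Uses a smoothed IDF, log((1 + N) / (1 + df)), so a term that occurs
    in every document of the corpus scores exactly zero.
    """
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # document frequency: count each term once per doc
    tf = Counter(doc_tokens)
    total = len(doc_tokens) or 1
    return {
        term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }


def rank_tags(entity_views, corpus_views, view_weights, top_k=3):
    """Extract tags per view independently, then rank by weighted sum.

    Assumption: scores are divided by the per-view maximum so that views
    with different text lengths contribute on a comparable scale.
    """
    combined = Counter()
    for view, tokens in entity_views.items():
        scores = tfidf_scores(tokens, corpus_views[view])
        top = max(scores.values(), default=0.0)
        if top <= 0:
            continue  # this view carries no distinguishing terms
        for term, score in scores.items():
            combined[term] += view_weights[view] * score / top
    return [term for term, _ in combined.most_common(top_k)]


# Hypothetical toy data: three apps, each with a description and a review view.
corpus_views = {
    "description": [
        ["photo", "editor", "filters", "app"],
        ["music", "player", "app"],
        ["chat", "messenger", "app"],
    ],
    "reviews": [
        ["great", "filters", "fast"],
        ["great", "sound"],
        ["great", "stickers"],
    ],
}
entity_views = {view: docs[0] for view, docs in corpus_views.items()}
view_weights = {"description": 0.6, "reviews": 0.4}  # assumed, not learned

tags = rank_tags(entity_views, corpus_views, view_weights)
print(tags)
```

In this toy run, "filters" ranks first because it is distinctive in both the description and the review view, while "app" and "great" receive no weight because they appear in every document of their view. Converting the views into a single concatenated text, the common approach the abstract criticizes, would lose the ability to weight the views differently.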
Keywords/Search Tags: entity, multi-view clustering, tag extraction, retrieval system