
Web-based Study Of Web Classification Mining Technology

Posted on: 2009-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Xu
Full Text: PDF
GTID: 2208360242493265
Subject: Computer application technology
Abstract/Summary:
With improvements in computer hardware storage capacity and in software environments, the expansion of data on the World Wide Web has increased the data and resources available to people, and the structure of the Web has become more complex as well. Its massive, heterogeneous, and distributed nature poses challenges for this area. Web mining has recently attracted much attention in the information industry. The reason is that vast amounts of data are now available worldwide, and it is necessary to transform that data into useful information and knowledge.

The goals of users' online activities are diverse. Understanding those goals and intentions can greatly help information providers personalize content and thus improve user satisfaction. For example, e-commerce Web sites can display entertainment content based on users' Entertainment Intention (EI).

Recently, a new family of "Web 2.0" applications has been emerging on the Web. These include user-centric publishing and knowledge management platforms such as wikis, blogs, and social sharing systems. Social bookmarking services, such as Del.icio.us and Flickr, have attracted considerable user interest and achieved significant success. These services not only provide user-friendly interfaces for people to annotate Web resources, but also enable them to share the annotations on the Web. Social annotations reflect how users understand the content of Web resources and provide rich metadata for Web page classification. This thesis combines each web page with its related tags to create a virtual document, classifies web pages on that basis, and obtains promising results, which provide a basis for further web mining tasks.

The main work of this thesis is as follows:

1. User entertainment intention mining. Understanding the goals and intentions behind a user's activities can greatly help information providers. In this thesis, we define the Entertainment Intention (EI) and present a framework for building machine learning models that learn EI from Web page content. Based on that framework, we build models to detect EI from web pages. Our experiments show that frequent keywords are more likely to carry entertainment intention, and the EI detection results are promising.

2. Social annotation representation and distribution. An annotation is freely and openly assigned text: a set of keywords that describe the content of an item from different aspects and thus provide rich metadata for Web page classification. We analyze the dynamics of tagging systems and the tag distribution of a popular Web site. We then build a tripartite model for the related heterogeneous objects (user, tag, and URL) and give a representation of social annotations.

3. Web page classification based on social annotation. In a social annotation environment, users with common interests usually assign annotations of the same category to web pages of the same category. Conversely, the annotations assigned to web pages of the same category tend to belong to the same category. In this thesis, we build classification models on three representations: the web page content, the annotation metadata for the corresponding pages, and a virtual document that integrates the annotation metadata with the content of the web page. Experiments confirm that tags are effective for web page classification and that the virtual-document-based method shows promising results.
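
As an illustration of the virtual document idea described in the abstract, the following minimal sketch merges a page's text with its social tags and classifies the result. It is not the thesis's implementation: the abstract does not name a classifier or a tag-weighting scheme, so the multinomial Naive Bayes model, the `tag_weight` repetition heuristic, the function and class names, and the toy training pages are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def virtual_document(content, tags, tag_weight=2):
    """Merge page text and tags into one token list.

    Repeating each tag `tag_weight` times is an assumed heuristic
    for emphasising annotations; the abstract does not specify one.
    """
    tokens = content.lower().split()
    tokens += [tag.lower() for tag in tags] * tag_weight
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (page content, tags, category).
pages = [
    ("final score of the league match", ["sports", "football"], "sports"),
    ("team wins the championship game", ["sports", "soccer"], "sports"),
    ("new album released by the band", ["music", "rock"], "music"),
    ("concert tour dates announced", ["music", "live"], "music"),
]

docs = [virtual_document(content, tags) for content, tags, _ in pages]
labels = [label for _, _, label in pages]
model = NaiveBayes().fit(docs, labels)

# The page text alone is ambiguous; the tags tip the decision.
query = virtual_document("dates for the next game", ["football", "sports"])
prediction = model.predict(query)
print(prediction)
```

The same classifier could be trained on content-only or tag-only token lists to compare the three representations the abstract mentions; only the document-building step changes.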
Keywords/Search Tags:Web Mining, Social Annotation, Entertainment Intention, Web Page Classification, Virtual Document