Research And Implementation Of Web Information Extraction In A Bookmark System

Posted on:2015-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:D M Yang

GTID:2268330425487890

Subject:Software engineering

Abstract/Summary:

A social bookmark system is an effective tool for collecting, managing and sharing web information, but its social features depend on the number of users and resources. This thesis studies how to apply web information extraction and related natural language processing techniques to a bookmark system in order to address the system's cold-start problem and thereby improve the user experience.

The thesis first surveys web information extraction algorithms. Building on the open source project Goose, it improves the scraping of web page data, adds automatic detection of a page's charset, improves the preprocessing of web pages, adds support for Chinese web pages, and finally adds a formatting function for the extracted page text to improve the user's reading experience. It then implements a web information extraction module based on ElementTree that is practical enough to be used in a production system. The thesis also presents a tag recommendation algorithm that incorporates web development patterns, and implements a simple web summary function based on the results of web information extraction and on web metadata.

The thesis designs and implements a bookmark system. The reference architecture uses Tornado as the web/application server and web development framework, MongoDB as the database server, AngularJS and jQuery on the client side, and Bootstrap 3 for styling. It implements a client application with a responsive layout and a flat-style grid, and develops a Chrome plug-in. The web information extraction module is integrated into the system so that users can read and edit bookmark content, which effectively improves the user experience. Based on the extracted information, the system uses full-text search to implement its search function, avoiding the limitations of searching only page titles as well as those of searching entire web pages.

The system introduced in this thesis differs from currently popular reading recommendation systems: it focuses on managing bookmarks rather than on reading. Combining the bookmark system with a note-taking system would make secondary filtering of information more efficient.
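As an illustration of the automatic charset detection mentioned in the abstract, the following is a minimal sketch, not the thesis's actual implementation and not part of Goose. It assumes a common strategy: trust the HTTP Content-Type header first, then look for a meta declaration near the top of the page, and otherwise fall back to UTF-8. The function names `detect_charset` and `decode_page` are hypothetical.

```python
import re

# Hypothetical helpers for illustration; not taken from the thesis or from Goose.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def detect_charset(body: bytes, content_type: str = "", default: str = "utf-8") -> str:
    """Guess a page's charset: HTTP header first, then <meta>, then a default."""
    match = _HEADER_CHARSET.search(content_type or "")
    if match:
        return match.group(1).lower()
    # Scan only the first few kilobytes, where <meta> declarations normally appear.
    match = _META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_page(body: bytes, content_type: str = "") -> str:
    """Decode raw page bytes to text using the detected charset."""
    charset = detect_charset(body, content_type)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The page declared a codec name that Python does not know.
        return body.decode("utf-8", errors="replace")
```

A page encoded as GBK that declares `<meta charset="gbk">` would be decoded with that codec, so Chinese text survives instead of turning into mojibake. A production module would likely add statistical detection for pages that declare no charset or declare the wrong one.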

Keywords/Search Tags:

bookmark system, web information extraction, tag recommendation, MongoDB

Related items

1	Design And Implement The Social Bookmark And Statistics Recommendation System Based On Web2.0
2	Bookmarking Content Management System Design And Implementation
3	Design And Implementation Of Query Optimization Module In MongoDB Control System
4	Research On Forum Information Extraction And Storage Based On Cloud-Based MongoDB
5	Related Studies On Information Extraction And Information Recommendation Based On Web Data Mining
6	Research On Intellisensing Of User Preferences Based On Bookmark Social Network
7	Research And Application Of Product Information Extraction Analysis And Recommendation Based On NLP
8	Chinese Resume Information Extraction And Recommendation
9	Design And Implementation Of Travel Vertical Search System Based On MongoDB
10	Research And Implementation Of REST-Style Based Social Bookmark Service