The Research On Semantic-Based Web Information Automatic Aggregation System And Key Technology

Posted on:2015-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:N Gong

GTID:2298330467463873

Subject:Electronic Science and Technology

Abstract/Summary:

In recent years, with the success of social networks, personal blogs and Twitter, the Internet has entered the Web 2.0 era, which is characterized by openness, equality and decentralization. The enormous growth of Web information resources has made information overload an increasingly serious problem. How to dynamically associate and aggregate semi-structured, scattered information, so as to provide effective services and promote knowledge sharing, has therefore become a major research direction. Building on a study of text clustering analysis, this thesis uses text processing techniques such as Chinese word segmentation, combined with traditional search engine technology and RSS information aggregation technology, to present an information processing method for refining information. The method automatically aggregates identical or similar information based on latent semantics, so as to discover new topics and track existing ones. The main work of this study is as follows.

First, to address the lack of information processing in traditional information aggregation technology, this thesis proposes a Web information automatic aggregation system. By function, the system is divided into three parts: information acquisition, information preprocessing and semantic aggregation.

Second, this thesis proposes a Web content extraction method based on punctuation distribution and HTML tag similarity. Experimental results show that the method extracts Web content effectively and accurately from pages on different themes.

Third, this thesis studies topic models of text in depth, in particular the LDA (Latent Dirichlet Allocation) model, which clusters texts based on latent semantic information.
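The abstract names the two signals behind the content extraction method (punctuation distribution and HTML tag similarity) but not how they are combined. The following is a minimal sketch under stated assumptions, not the thesis's algorithm: each run of page text is recorded with its tag path (e.g. `html/body/div/p`), the block with the most punctuation is assumed to be body text and used as a seed, and a block is kept only if its punctuation density and its tag-path similarity to the seed both pass a threshold. The names `BlockCollector`, `extract_content`, `path_similarity`, the punctuation set and both threshold values are hypothetical; only `html.parser.HTMLParser` and `difflib.SequenceMatcher` are real standard-library APIs.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Assumed punctuation set: Chinese full-width and ASCII sentence punctuation.
PUNCT = set("，。！？；：、,.!?;:")


class BlockCollector(HTMLParser):
    """Collects (tag_path, text) pairs, one per non-empty run of page text."""

    SKIP = {"script", "style"}
    VOID = {"br", "img", "meta", "link", "hr", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.blocks = []  # collected (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:  # void elements never get an end tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not self.SKIP.intersection(self.stack):
            self.blocks.append(("/".join(self.stack), text))


def punct_count(text):
    return sum(ch in PUNCT for ch in text)


def path_similarity(a, b):
    """Similarity of two tag paths compared tag by tag, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split("/"), b.split("/")).ratio()


def extract_content(html, min_density=0.02, min_similarity=0.8):
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    blocks = collector.blocks
    if not blocks:
        return ""
    # Seed: the block with the most punctuation, assumed to be body text.
    seed_path = max(blocks, key=lambda b: punct_count(b[1]))[0]
    kept = [
        text
        for path, text in blocks
        if punct_count(text) / len(text) >= min_density
        and path_similarity(path, seed_path) >= min_similarity
    ]
    return "\n".join(kept)
```

On a typical news page, navigation links and footers have almost no punctuation and sit under different tag paths from the article paragraphs, so both filters tend to drop them; the thresholds would need tuning on real pages.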
Building on this study, and because Web information is diverse and its topics change frequently, this thesis improves LDA so that the model, which originally handles only offline information, can be applied to an online Web information aggregation system. Experimental analysis shows that the algorithm can group documents with similar subjects based on latent semantics, and can also analyze topic trends from the topic distribution and topic popularity in different time periods.
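The abstract does not define "topic popularity". One common choice, assumed here, is the mean document-topic proportion θ of the documents published in each time slice, with documents grouped by their dominant (highest-probability) topic. The sketch below takes θ as given, e.g. from any trained LDA model (offline or an online variant); it does not reproduce the thesis's improved LDA. The function names and the use of sortable strings such as `"2015-01"` as time-slice labels are hypothetical.

```python
from collections import defaultdict


def dominant_topic_groups(theta):
    """Group document indices by their highest-probability topic."""
    groups = defaultdict(list)
    for doc_id, dist in enumerate(theta):
        best = max(range(len(dist)), key=dist.__getitem__)
        groups[best].append(doc_id)
    return dict(groups)


def topic_popularity(theta, time_slices):
    """Mean topic proportion of the documents in each time slice.

    popularity[t][k] = average of theta[d][k] over the documents d in slice t.
    """
    sums = {}
    counts = defaultdict(int)
    for dist, t in zip(theta, time_slices):
        if t not in sums:
            sums[t] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[t][k] += p
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}


def topic_trend(popularity, k):
    """Popularity of topic k across time slices, in chronological order."""
    return [popularity[t][k] for t in sorted(popularity)]
```

A topic whose trend rises across slices can be flagged as emerging, and one that falls as fading; this is the kind of per-period comparison the abstract describes, under the popularity definition assumed above.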

Keywords/Search Tags:

information aggregation, LDA model, content extraction, latent semantic

Related items

1	The Implementation And Research Of The Probabilistic Latent Semantic Analysis Model In The Search Engine's Business Text Classification System
2	Objectionable Information Filtering System Based On ATN Algorithm And Latent Semantic Indexing
3	The Study Of Latent Semantic-Based Personalized Search Key Technology
4	High Dimensional Aggregation Model Of Digital Literature Resource
5	Research On Heterogeneous Academic Information Extraction And Aggregation Based On Web
6	The Semantic Query Expansion Model Based On Latent Semantics Index Model
7	Latent Semantic Analysis-based Spam Filtering System Design And Realization
8	Study On Ontology-based Micro-content Aggregation And Inquiry Technology
9	The Research On Latent Semantic Classification Model
10	Application Of Latent Semantic Indexing In Chinese Information Retrieve