
Analysis Of Network Text Based On The Semantic Framework

Posted on: 2015-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Qu
Full Text: PDF
GTID: 2268330428972591
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet, the classification, retrieval and filtering of text information have become a focus of the field. Most traditional text filtering techniques rely on simple keyword matching or on statistical methods based on word frequency. These methods are widely applicable, but they are hard to make intelligent: they can only judge surface-level structural correspondence and cannot determine the meaning of a text. To address this problem, this paper proposes an information filtering and matching algorithm based on a hierarchically structured semantic framework for filtering network text. The algorithm consists of five parts: word segmentation of the target text, part-of-speech tagging, feature extraction, semantic frame extraction, and similarity calculation between semantic frames. For key text extraction, web pages are divided into two layers according to the differing importance of page elements: the title layer and the text layer. The title layer is given a larger weight than the text layer. In the text processing phase, the sentences are first segmented into words and tagged with parts of speech. Second, using grammar rules, the part-of-speech (POS) tags stored in the word segmentation results and the positions of words in the sentence, keywords are selected and filled into the semantic frames. Finally, the similarity between the semantic frame of the sample text and that of the text to be filtered is calculated. The algorithm also improves the weighting strategy: the weight is computed and normalized from three factors, namely the semantic distance between frame elements and action verbs, the relevance of the frame elements, and the hierarchical structure.
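The matching step described above can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm: the abstract gives no formulas, so the way the three factors are combined, the exact-match slot comparison, the 0.7 title weight, and all names (`slot_weights`, `frame_similarity`, `page_similarity`) are assumptions chosen only to show the general shape of a weighted, two-layer frame comparison.

```python
from typing import Dict

# A semantic frame is modelled here as slot name -> keyword (an assumption).
Frame = Dict[str, str]


def normalize(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale the raw weights so that they sum to 1."""
    total = sum(raw.values())
    if total == 0:
        return {slot: 0.0 for slot in raw}
    return {slot: value / total for slot, value in raw.items()}


def slot_weights(distance: Dict[str, float],
                 relevance: Dict[str, float],
                 depth: Dict[str, int]) -> Dict[str, float]:
    """Combine the three factors into one normalized weight per slot.

    Hypothetical formula: a slot counts more when it is relevant, close to
    the action verb, and high in the hierarchy (small depth).
    """
    raw = {
        slot: relevance[slot] / ((1.0 + distance[slot]) * (1.0 + depth[slot]))
        for slot in distance
    }
    return normalize(raw)


def frame_similarity(sample: Frame, target: Frame,
                     weights: Dict[str, float]) -> float:
    """Sum the weights of slots filled with the same keyword in both frames."""
    score = 0.0
    for slot, weight in weights.items():
        if slot in sample and sample[slot] == target.get(slot):
            score += weight
    return score


def page_similarity(title_sim: float, body_sim: float,
                    title_weight: float = 0.7) -> float:
    """Mix the two layers; the title layer gets the larger (assumed) weight."""
    return title_weight * title_sim + (1.0 - title_weight) * body_sim


weights = slot_weights(
    distance={"agent": 1, "action": 0, "object": 1},
    relevance={"agent": 1.0, "action": 1.0, "object": 1.0},
    depth={"agent": 0, "action": 0, "object": 0},
)
sample = {"agent": "user", "action": "download", "object": "file"}
target = {"agent": "user", "action": "download", "object": "movie"}
body_sim = frame_similarity(sample, target, weights)
```

In this toy example the frames agree on the agent and the action but not on the object, so the body-layer similarity is the sum of the two matching slot weights. A real system would replace the exact keyword match with a semantic comparison between slot fillers.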
Tests show that, compared with the traditional algorithm, both recall and precision are improved to some extent. On the basis of the information filtering algorithm, this paper designs a web text filtering system. Testing shows that the system's performance and filtering effect are improved.
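For reference, the two evaluation measures mentioned above can be computed as in the following sketch. It uses the standard definitions of precision and recall; the function name and the sample document IDs are illustrative and do not come from the thesis.

```python
from typing import Set, Tuple


def precision_recall(filtered: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of filtered document IDs.

    precision = correctly filtered / all filtered
    recall    = correctly filtered / all that should have been filtered
    """
    true_positives = len(filtered & relevant)
    precision = true_positives / len(filtered) if filtered else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: the filter flagged documents 1-4; documents 2, 3 and 5
# are the ones that should have been flagged.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 5})
```

Here two of the four flagged documents are correct, and two of the three target documents were found.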
Keywords/Search Tags: text filtration, semantic framework, hierarchical structure, similarity calculation