Information Extraction And Information Visualization Based On Conditional Random Fields

Posted on:2018-02-04

Degree:Master

Type:Thesis

Country:China

Candidate:Z R Li

Full Text:PDF

GTID:2348330515973893

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, with the rapid development of networks, security in cyberspace has received growing attention. The rapid growth in the volume, velocity, and variety of security data raises the problem of how to integrate, store, and manage massive heterogeneous data. As the amount of information in cyberspace grows, the personal information it contains also grows geometrically, yet the situation is data-rich but information-poor. Text is the main source of this information, so extracting information effectively from massive volumes of text has become a hot research topic. The traditional approach is manual: analysts view and analyze the data by hand and extract the required information from it. Although personal information extracted this way is highly accurate, the process consumes a great deal of manpower, is inefficient, and can no longer meet the efficiency requirements of information acquisition. Information extraction technology emerged to address this problem.

The main results of this thesis are as follows:

1. A rule for extracting personal information. By studying the format and characteristics of network data, the thesis establishes extraction rules for personal information. Each rule has three parts: feature leading words, location, and method. The location is one of three types: Body, Cookies, or Url. The method is the request type used by the current session, GET or POST. The feature leading words are the three keywords immediately preceding the value of the relevant personal attribute at that location, obtained by word segmentation. These rules allow personal information to be extracted accurately.

2. A CRFsuite-based method for extracting personal attributes. CRFsuite is an implementation of the conditional random field (CRF) algorithm for labeling sequence data, and it trains quickly with high accuracy. Drawing on existing research in the field, the leading-word, location, and method characteristics of personal information in network data are used to establish the extraction rules. A model is trained with CRFsuite and applied to network data to identify personal information, and the results are stored in a structured information database.

3. Design and implementation of a visual analysis system. After extraction, the relationships among the structured records are displayed graphically, and virtual identity information is associated with real identity information. In this way, an advantage in information resources is turned into an advantage in decision-making.
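
The three-part extraction rule in result 1 (feature leading words, location, method) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `ExtractionRule`, `HttpSession`, `segment`, and `apply_rule` are hypothetical, a simple delimiter split stands in for real word segmentation, and "leading words" is read as the tokens immediately before the value.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for word segmentation: split on common URL/form/cookie
# delimiters. Chinese text would need a real segmenter. Values are not URL-decoded.
_TOKEN = re.compile(r"[^\s=&;?/:]+")


@dataclass(frozen=True)
class ExtractionRule:
    attribute: str                  # hypothetical attribute name, e.g. "email"
    leading_words: Tuple[str, ...]  # up to three feature leading words before the value
    location: str                   # "Body", "Cookies" or "Url"
    method: str                     # "GET" or "POST"


@dataclass
class HttpSession:
    method: str
    url: str
    cookies: str
    body: str

    def field(self, location: str) -> str:
        # Map a rule location onto the matching part of the session.
        return {"Body": self.body, "Cookies": self.cookies, "Url": self.url}[location]


def segment(text: str) -> List[str]:
    """Split text into tokens (placeholder for proper word segmentation)."""
    return _TOKEN.findall(text)


def apply_rule(rule: ExtractionRule, session: HttpSession) -> Optional[str]:
    """Return the first token directly preceded by the rule's leading words, or None."""
    if session.method.upper() != rule.method:
        return None  # the rule only fires for its own request method
    tokens = segment(session.field(rule.location))
    n = len(rule.leading_words)
    for i in range(n, len(tokens)):
        if tuple(tokens[i - n:i]) == rule.leading_words:
            return tokens[i]
    return None
```

For example, a POST body `action=signup&email=alice@example.com` segments into `action`, `signup`, `email`, `alice@example.com`, so a Body/POST rule with leading words `("action", "signup", "email")` returns `alice@example.com`, while the same rule restricted to GET returns nothing.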
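
For result 2, CRFsuite's command-line tool reads training data as plain text: one item per line, the label first and the attributes after it, separated by tabs, with an empty line ending each sequence. The sketch below converts labeled token sequences into that layout, encoding the leading-word, location, and method characteristics as attributes. The feature template (`w[0]`, `w[-1]` to `w[-3]`, `w[1]`, `loc`, `method`, `is_digit`), the BIO-style labels such as `B-EMAIL`, the colon escaping, and all function names are assumptions for illustration, not the feature set used in the thesis.

```python
from typing import Iterable, List, Tuple

# One training sequence: tokens, one label per token, request location, request method.
Example = Tuple[List[str], List[str], str, str]


def escape(attr: str) -> str:
    # ':' separates an attribute name from an optional weight in CRFsuite's
    # text format, so escape backslashes first and then colons (assumed convention).
    return attr.replace("\\", "\\\\").replace(":", "\\:")


def token_attributes(tokens: List[str], i: int, location: str, method: str) -> List[str]:
    """Attributes for token i: the word itself, its location/method, and context words."""
    attrs = [f"w[0]={tokens[i]}", f"loc={location}", f"method={method}"]
    # The three preceding words play the role of the feature leading words.
    for k in (1, 2, 3):
        attrs.append(f"w[-{k}]={tokens[i - k]}" if i - k >= 0 else f"w[-{k}]=__BOS__")
    attrs.append(f"w[1]={tokens[i + 1]}" if i + 1 < len(tokens) else "w[1]=__EOS__")
    if tokens[i].isdigit():
        attrs.append("is_digit")
    return [escape(a) for a in attrs]


def to_crfsuite(examples: Iterable[Example]) -> str:
    """Serialize examples as 'LABEL<TAB>attr<TAB>attr...' lines; a blank line ends each sequence."""
    lines: List[str] = []
    for tokens, labels, location, method in examples:
        if len(tokens) != len(labels):
            raise ValueError("every token needs exactly one label")
        for i, label in enumerate(labels):
            lines.append("\t".join([label] + token_attributes(tokens, i, location, method)))
        lines.append("")  # an empty line marks the end of a sequence
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and passed to the CRFsuite command-line tool, for example `crfsuite learn -m person.model train.txt` for training and `crfsuite tag -m person.model test.txt` for labeling new sessions. Keeping the three preceding words as separate attributes lets the model learn which leading words matter instead of requiring an exact three-word match, as the rule sketch does.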
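
The association step in result 3 is not specified in detail, so the following is only one plausible sketch: records produced by extraction are linked whenever they share an (attribute, value) pair, using union-find to group virtual identifiers (nicknames, emails) with records that carry real identity fields, and the relationships are exported as a Graphviz DOT graph for display. The record layout, the "any shared value links records" policy, and the names `link_records` and `to_dot` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # one structured record produced by the extraction step


def _find(parent: List[int], x: int) -> int:
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_records(records: List[Record]) -> List[List[int]]:
    """Group record indices connected through shared (attribute, value) pairs."""
    parent = list(range(len(records)))
    first_seen: Dict[Tuple[str, str], int] = {}
    for idx, rec in enumerate(records):
        for pair in rec.items():
            if pair in first_seen:
                root_a, root_b = _find(parent, idx), _find(parent, first_seen[pair])
                if root_a != root_b:
                    parent[root_a] = root_b  # merge the two identity groups
            else:
                first_seen[pair] = idx
    groups: Dict[int, List[int]] = {}
    for idx in range(len(records)):
        groups.setdefault(_find(parent, idx), []).append(idx)
    return sorted(groups.values())


def to_dot(records: List[Record]) -> str:
    """Emit an undirected Graphviz DOT graph linking each record to its attribute values."""
    value_ids: Dict[Tuple[str, str], str] = {}
    lines = ["graph identities {"]
    for idx, rec in enumerate(records):
        lines.append(f'  r{idx} [shape=box, label="record {idx}"];')
        for key, value in sorted(rec.items()):
            if (key, value) not in value_ids:
                value_ids[(key, value)] = f"v{len(value_ids)}"
                text = f"{key}={value}".replace('"', '\\"')  # escape quotes in labels
                lines.append(f'  {value_ids[(key, value)]} [label="{text}"];')
            lines.append(f"  r{idx} -- {value_ids[(key, value)]};")
    lines.append("}")
    return "\n".join(lines)
```

With records for an email and nickname, the same nickname plus a phone number, and a real name with that phone number, all three fall into one group, so the virtual identifiers are tied to the real name through the shared nickname and phone. Linking on any shared value is deliberately simple; common values would create false links, so a practical system would likely restrict or weight the attributes used for matching. The DOT output can be rendered with Graphviz, for example `dot -Tsvg identities.dot -o identities.svg`.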

Keywords/Search Tags:

CRFs, CRFsuite, Machine learning, Information extraction

Related items

1	Based On The Same Field Crfs And Interdisciplinary Under Brand Word Extraction
2	Research On Event Extraction
3	Research And Implementation On The Technique Of Citation Labeling Based On CRFs Model
4	Research And Application On Key Technology Of Chinese Information Extraction
5	The Research Of Land Cover Information Extraction With Remote Sensing Data Based On Machine Learning
6	Research On Extraction Of Web Textual Geographic Information
7	Information Extraction Of Chinese Biodiversity Document Based On Machine Learning
8	Research On Deep Learning Technology For Information Extraction Applications
9	Research On The Automatic Term Extraction In The Area Of Information Science
10	Machine learning for information extraction in informal domains