
Information Extraction And Information Visualization Based On Conditional Random Fields

Posted on: 2018-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Li
Full Text: PDF
GTID: 2348330515973893
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the rapid development of networks, security in cyberspace has received increasing attention. The rapid growth in the volume, velocity, and variety of security data raises the problem of how to integrate, store, and manage massive heterogeneous data. As the amount of information in cyberspace grows, personal information grows geometrically as well, yet the data are rich while the usable information is poor. Because the main source of information is text data, how to extract information effectively from massive amounts of text has become a hot issue. The traditional manual approach is to view and analyze the data by hand and extract the required information from it. Although the personal information extracted this way is highly accurate, the approach requires a great deal of manpower and is therefore inefficient, and it can no longer meet the efficiency requirements of information acquisition. Information extraction technology emerged in response.

The main results of this thesis are as follows:

1. A rule for extracting personal information is proposed. Based on a study of the format and characteristics of network data, a personal-information extraction rule is established. The rule consists of three parts: the feature leading words, the location, and the method. The location takes one of three values: Body, Cookies, or Url. The method records whether the current session uses GET or POST. The feature leading words are the first three keywords at the position of the relevant personal-information value, obtained by word segmentation. With this rule, personal information can be extracted accurately.

2. A CRFsuite-based method for extracting person attributes is presented. CRFsuite is an implementation of conditional random fields (CRFs) for labeling sequence data; the model trains quickly and achieves high accuracy. Drawing on existing work in the field, the features of personal information in network data (the leading words, the location, and the method) are used to establish the extraction rules. CRFsuite is used to train the model, and the trained model is applied to network data to match personal information and build a structured information database, which yields the structured information data.

3. A visual analysis system is designed and implemented. After information extraction, the relationships among the structured information are displayed graphically, and information about virtual identities is associated with information about real persons. In this way, an advantage in information resources is turned into a decision-making advantage.
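
The three-part rule described in the abstract (feature leading words, location, method) can be illustrated with a short Python sketch. This is not the thesis's implementation. The function names, the feature keys, the "<BOS>" padding marker, and the example request are all hypothetical. A simple delimiter split stands in for real word segmentation, and the "leading words" are assumed to be the three tokens that precede each token.

```python
import re
from urllib.parse import urlsplit


def tokenize(text):
    # Assumption: a delimiter split stands in for real word segmentation.
    return [t for t in re.split(r"[^0-9A-Za-z_@.]+", text) if t]


def token_features(tokens, i, location, method):
    # One feature dict per token: the token itself, where it was found
    # (Body, Cookies, or Url), the request method (GET or POST), and the
    # three tokens before it (assumed meaning of "feature leading words").
    feats = {"token": tokens[i].lower(), "location": location, "method": method}
    for k in (1, 2, 3):
        feats[f"lead-{k}"] = tokens[i - k].lower() if i - k >= 0 else "<BOS>"
    return feats


def request_to_sequences(method, url, cookies, body):
    # Split one HTTP request into the three locations named in the abstract
    # and build a feature sequence for each of them.
    parts = {"Url": urlsplit(url).query, "Cookies": cookies, "Body": body}
    sequences = {}
    for location, text in parts.items():
        tokens = tokenize(text)
        sequences[location] = [
            token_features(tokens, i, location, method)
            for i in range(len(tokens))
        ]
    return sequences


# Hypothetical request, used only to show the shape of the features.
seqs = request_to_sequences(
    "POST",
    "http://example.com/login?lang=en",
    "sid=abc123",
    "username=alice&email=alice@example.com",
)
print(seqs["Body"][3])
```

Sequences of feature dictionaries like these, paired with one label per token, are the kind of input a CRF toolkit such as CRFsuite is trained on. The training step itself is not shown here.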
Keywords/Search Tags: CRFs, CRFsuite, Machine learning, Information extraction