
Research And Implementation Of Key Technologies In Information Extraction And Analysis Of Police Intelligence On The Internet

Posted on: 2020-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Fang
Full Text: PDF
GTID: 2428330596975125
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the times, the Internet has gradually become the primary channel through which individuals and authorities release and share information, and the quantity of information on the Internet is increasing rapidly. This massive body of data contains a large amount of valuable police intelligence, and obtaining and analyzing it is crucial for the relevant departments. By using network police intelligence, these departments can respond promptly to emergencies that endanger social security and stability, and can keep abreast of the public security situation in a specific area over a given period. However, because Internet data is huge in scale, grows quickly, and has no standard format, efficient and automatic methods are needed to process and analyze it. Against this background, this paper studies in depth the key technologies for extracting and analyzing network police intelligence, and designs and implements a network police intelligence analysis system that applies natural language processing techniques. This paper focuses on the key issues in this system and, building on that work, improves the CBOW and Skip-Gram algorithms. The main research contents are as follows:

a) Present a word sense disambiguation algorithm that uses a CNN to extract sample features automatically. The algorithm generates a character list from the set of clause samples containing polysemous words, converts the clause samples into data matrices that serve as the input of a 6-layer CNN model, and trains the model parameters. Finally, the output of the CNN's fully connected layer is used as the input of a Support Vector Machine (SVM) classifier, which predicts the sense of the polysemous word. This method achieves higher accuracy than algorithms based on hand-designed features on both police intelligence data and general word sense disambiguation data, and it is more general.

b) Optimize the CBOW and Skip-Gram algorithms to address the problem that Word Embedding cannot accurately describe polysemous words, because it generates only a single representation per word. Present a Sense Embedding model that uses a CNN to recognize the meaning of a polysemous word. On this basis, present two Sense Embedding models that use the DBSCAN clustering algorithm and one Sense Embedding model that uses the One-Pass algorithm, which combines the recognition process and the generation process. These models generate word representations with stronger representational ability for multiple application scenarios.

c) Design and implement approximate text de-duplication based on event features to recognize repeated and redundant text in network police intelligence. First, a CRF model extracts the named entities in the text, and these entities are used for event-level de-duplication. Second, two VSM models, the bag-of-words model and the TF-IDF model, are built for language-level de-duplication. In addition, this paper designs and implements a text similarity comparison algorithm based on document fingerprints to further improve the efficiency of the VSM-based de-duplication algorithm.
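The language-level de-duplication step can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the fingerprint scheme, so SimHash is assumed here as a stand-in, and the function names, the whitespace-style tokenizer, the smoothed IDF formula, and both thresholds are hypothetical choices made for illustration. The fingerprint acts as a cheap pre-filter, and only pairs that pass it are compared with the more expensive TF-IDF cosine similarity.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: simple word tokenizer; Chinese text would need a word segmenter.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    # One sparse TF-IDF vector (dict: term -> weight) per document.
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an illustrative choice, not taken from the thesis).
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def simhash(text, bits=64):
    # SimHash fingerprint: each token votes on every bit, weighted by its count.
    weights = [0] * bits
    for term, count in Counter(tokenize(text)).items():
        digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def near_duplicates(docs, max_hamming=20, min_cosine=0.8):
    # Hypothetical thresholds; real values would be tuned on labeled data.
    vectors = tfidf_vectors(docs)
    prints = [simhash(d) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if hamming(prints[i], prints[j]) > max_hamming:
                continue  # fingerprints too far apart: skip the cosine step
            if cosine(vectors[i], vectors[j]) >= min_cosine:
                pairs.append((i, j))
    return pairs
```

Calling `near_duplicates(docs)` on a list of strings returns index pairs judged to be near-duplicates. Identical texts always pass both stages, since their fingerprints match exactly and their cosine similarity is 1. For short texts the fingerprint filter can be too strict, so the Hamming threshold in particular should be treated as a tunable assumption.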
Keywords/Search Tags: Network police intelligence, Word sense disambiguation, Word Embedding, Sense Embedding, Approximate text de-duplication