Research And Implementation Of Text Mining Technology Based On Public Security Information

Posted on:2009-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Xu

GTID:2178360242983007

Subject:Computer application technology

Abstract/Summary:

With the spread of computers and the rapid growth of the Internet, the volume of text in the public security field, including existing text databases and up-to-date criminal information on web pages, is increasing dramatically. Police officers therefore need automatic tools that help them retrieve relevant information efficiently from massive criminal case databases, and text mining has accordingly become one of the active topics in data mining.

In current text mining research, the major techniques for plain-text processing focus mostly on Chinese word segmentation, text feature extraction, and classification and clustering algorithms. However, little research has addressed the systematic integration of these algorithms and their application. Based on an analysis of various text mining techniques, this thesis develops a text mining model built on dissimilarity calculation between criminal cases. The model effectively addresses weaknesses of traditional systems, such as sparse information and weak representation in the extracted features.

The text mining model consists mainly of two parts: dissimilarity calculation between criminal cases and text clustering of those cases. For dissimilarity calculation, the model integrates a knowledge-matching method based on decomposing criminal cases, supported by improved Chinese word segmentation algorithms. By extracting and matching the keywords of criminal case texts, this method strengthens the semantic analysis used to match similar criminal cases and improves the precision of the mining system. It also reduces the number of similar criminal cases stored in the repository by enhancing the knowledge representation of each individual case, while increasing the learning ability of the criminal case repository.
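The abstract does not specify the segmentation algorithm or the dissimilarity measure. A minimal sketch of the keyword extraction and matching step follows. It assumes a dictionary-based forward maximum matching segmenter and uses one minus the cosine similarity of keyword counts as the dissimilarity. The lexicon, stop-word list, and function names (`fmm_segment`, `keywords`, `dissimilarity`) are hypothetical illustrations, not the thesis's own implementation.

```python
import math
from collections import Counter

# Hypothetical toy lexicon and stop-word list; a real system would load a large dictionary.
# Glosses: 盗窃 theft, 电动车 e-bike, 小区 residential compound, 夜间 at night,
#          抢劫 robbery, 手机 mobile phone, 街道 street
LEXICON = {"盗窃", "电动车", "小区", "夜间", "抢劫", "手机", "街道"}
STOPWORDS = {"在", "的", "被", "了"}  # "at", possessive, passive marker, aspect marker
MAX_WORD_LEN = 3  # length of the longest lexicon entry


def fmm_segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when no longer word matches."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def keywords(text):
    """Count multi-character, non-stop-word tokens as the case's keywords."""
    return Counter(w for w in fmm_segment(text) if len(w) > 1 and w not in STOPWORDS)


def dissimilarity(case_a, case_b):
    """1 - cosine similarity of keyword counts: 0.0 for identical keywords, 1.0 for disjoint."""
    ka, kb = keywords(case_a), keywords(case_b)
    dot = sum(ka[w] * kb[w] for w in ka.keys() & kb.keys())
    norm_a = math.sqrt(sum(v * v for v in ka.values()))
    norm_b = math.sqrt(sum(v * v for v in kb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no keywords on one side: treat as completely dissimilar
    return 1.0 - dot / (norm_a * norm_b)


print(fmm_segment("夜间在小区盗窃电动车"))  # ['夜间', '在', '小区', '盗窃', '电动车']
print(dissimilarity("夜间在小区盗窃电动车", "街道抢劫手机"))  # 1.0
```

In this sketch, a new case whose dissimilarity to a stored case falls below some threshold would be treated as a match rather than stored again. The threshold is left open because the abstract gives no value.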
In the text clustering part, the K-Means algorithm is implemented based on an analysis of criminal case information on web pages. The method clusters texts according to the keywords of the criminal cases and thus effectively merges similar information from web pages.
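A minimal K-Means sketch over keyword count vectors follows. It assumes Euclidean distance and seeds the centroids with the first k reports. The abstract states neither the distance measure nor the initialization, and the helper `cluster_reports` and the sample keywords are hypothetical.

```python
import math
from collections import Counter


def kmeans(vectors, k, max_iter=100):
    """Plain K-Means with Euclidean distance. Centroids are seeded with the first k
    vectors, a simplifying assumption that keeps the result deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_reports(keyword_lists, k):
    """Hypothetical helper: map each report's keywords onto a shared vocabulary
    as a count vector, then cluster the vectors."""
    vocab = sorted({w for kws in keyword_lists for w in kws})
    vectors = []
    for kws in keyword_lists:
        counts = Counter(kws)
        vectors.append([float(counts[w]) for w in vocab])
    return kmeans(vectors, k)


reports = [
    ["theft", "e-bike", "night"],
    ["robbery", "phone", "street"],
    ["theft", "e-bike", "compound"],
    ["robbery", "phone", "night"],
]
print(cluster_reports(reports, 2))  # [0, 1, 0, 1]
```

Reports that share a cluster label, here the two theft reports and the two robbery reports, are the candidates for merging. Seeding with the first k vectors only keeps the example reproducible. K-Means++ seeding or several random restarts are common alternatives.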

Keywords/Search Tags:

Text mining, Chinese Word Segmentation, Keyword extraction, Matching, Similarity, Text clustering

Related items

1	Study On Chinese Text Similarity Computing Based On Word Segmentation
2	Key Techniques Of Text Mining On Criminal Cases
3	The Design And Implementation Of Text Topic Key Word Processing System Based Chinese Word Segmentation
4	Design And Implementation Of The Structured System For Pathological Microscopy Text
5	Research On Text Similarity Algorithm Based On VSM Combined With Word Semantics
6	The Analysis On The Basic Techniques For Preprocess Of Text Mining And The Study On The Application Of Text Mining
7	Research And Implement Of An Optimal Approximate Matching System Of Structureless Text
8	Study On Chinese Text Classification Technology Based On Improved Text Similarity Algorithm
9	Research On Automatic Chinese Text Summarization Of Web-oriented Text Mining
10	Study On Text Category Oriented Chinese Text Mining And Its Implementation