Research And Implementation Of Text Mining Technology Based On Public Security Information

Posted on: 2009-09-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Xu | Full Text: PDF
GTID: 2178360242983007 | Subject: Computer application technology
Abstract/Summary:
With the spread of computers and the rapid development of the Internet, the amount of text information in the public security field, including existing text databases and up-to-date criminal information on web pages, is increasing dramatically. This creates a need for automatic tools that help police officers retrieve information of interest from massive criminal case databases efficiently, and text mining has therefore become one of the active topics in the field of data mining.
In current text mining research, the major techniques for plain text processing focus on Chinese word segmentation, text feature extraction, and classification and clustering algorithms. However, there is still little research on the systematic integration of these algorithms and its application. Based on an analysis of various text mining techniques, this thesis proposes a text mining model based on dissimilarity calculation of criminal cases, which can effectively address issues in traditional systems such as sparse information and weak representation of the extracted features.
The text mining model consists mainly of dissimilarity calculation of criminal cases and text clustering on these cases. In the dissimilarity calculation, the model integrates a knowledge matching method based on decomposing criminal cases, using an improved Chinese word segmentation algorithm. By extracting and matching the keywords of criminal case texts, this enhances the semantic analysis ability when matching similar criminal cases and improves the precision of the mining system. It also reduces the number of similar criminal cases in the repository by strengthening the knowledge representation ability of each single criminal case, and at the same time increases the learning ability of the criminal case repository.
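The abstract does not give the dissimilarity formula, so the following is only a minimal sketch of the general idea of comparing cases by their extracted keywords. It assumes a Jaccard distance over keyword sets, which is a stand-in and not necessarily the thesis's method; the function name `keyword_dissimilarity` and the sample keyword lists are hypothetical.

```python
def keyword_dissimilarity(case_a, case_b):
    """Jaccard distance between two keyword collections.

    Returns 0.0 for identical keyword sets and 1.0 for disjoint ones.
    This is an assumed measure, used here only for illustration.
    """
    a, b = set(case_a), set(case_b)
    if not a and not b:
        # Two empty cases are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical keyword lists, as if produced by word segmentation
# followed by keyword extraction on three case descriptions.
case_1 = ["burglary", "night", "residence", "window"]
case_2 = ["burglary", "night", "shop", "window"]
case_3 = ["fraud", "telephone", "bank"]

d12 = keyword_dissimilarity(case_1, case_2)  # shares 3 of 5 distinct keywords
d13 = keyword_dissimilarity(case_1, case_3)  # shares no keywords

print(d12, d13)
```

Under this assumed measure, a new case whose dissimilarity to a stored case falls below some threshold could be merged into the stored case instead of being added separately, which is one way the repository could avoid accumulating near-duplicate cases.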
In the text clustering part, the K-Means algorithm is implemented on the basis of an analysis of the criminal case information on web pages. The method clusters texts according to the keywords of the criminal cases and thereby effectively merges similar information from web pages.
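As a rough illustration of keyword-based K-Means clustering, the sketch below represents each case as a binary keyword vector and runs plain K-Means on those vectors. The vectorization scheme, the seeding with the first k documents, and the names `vectorize` and `kmeans` are assumptions made for this example; the abstract does not specify how the thesis builds its features or initializes the clusters.

```python
def vectorize(docs):
    """Turn keyword lists into binary vectors over a shared, sorted vocabulary."""
    vocab = sorted({word for doc in docs for word in doc})
    vectors = [[float(word in doc) for word in vocab] for doc in docs]
    return vectors, vocab


def kmeans(points, k, iterations=20):
    """Plain K-Means with squared Euclidean distance.

    The first k points seed the centroids (an assumption that keeps the
    example deterministic; real systems usually seed randomly).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical keyword lists extracted from four web-page case reports.
docs = [
    ["burglary", "night", "window"],
    ["fraud", "telephone", "bank"],
    ["burglary", "window", "shop"],
    ["fraud", "bank", "transfer"],
]

vectors, vocab = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

With these sample inputs the two burglary reports end up in one cluster and the two fraud reports in the other, which mirrors the abstract's goal of merging similar case information found on different pages.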
Keywords/Search Tags:Text mining, Chinese Word Segmentation, Keywords extraction, Matching, Similarity, Text clustering