Research On Fine-grained Classification And Cluster Analysis For Criminal Cases

Posted on: 2017-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: M Xia
GTID: 2348330503489794
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, public security intelligence systems face great challenges from the huge volume of data they must handle, most of it text. Traditional manual processing can no longer meet operational needs, so more automated and intelligent text mining techniques are urgently needed to improve work efficiency.

This thesis presents a new system that focuses on fine-grained case classification and on mining the implicit cases that are of particular concern to the police.

We propose a two-level classifier based on Naïve Bayes and a keyword co-occurrence graph. The method takes into account the characteristics of case text: short length, low word frequency, and a hierarchical but imbalanced distribution of categories. It has two steps. First, it incorporates part-of-speech information into the widely used document frequency (DF) formula, giving a double-factor evaluation method for feature selection, and then uses a multivariate Bernoulli model adapted to imbalanced categories to perform the first-level classification quickly and accurately. Second, based on the result of the first-level classification, we build keyword co-occurrence vectors for the document set of each second-level category; the weight of each keyword in a vector is computed from word co-occurrence, corrected by an inverse class frequency factor. A simple vector distance algorithm then performs the second-level classification. In addition, we use a synonym network to eliminate interference from domain-specific synonyms.

We also propose a method based on feature density for mining implicit cases. It has three steps. First, we extract structured features from the unstructured case descriptions. Second, we define a formula for the feature similarity between case texts; it takes into account the time of the crime, the scene of the crime, and the second-level category, and uses the Analytic Hierarchy Process (AHP) to determine the weights of these three features. Third, we propose a feature density clustering algorithm based on OPTICS, named OPTICS-FD, which helps investigators solve cases by efficiently finding clusters of implicit cases.

Finally, we conducted experiments on double-factor evaluation, the two-level classifier, case feature extraction, and implicit case clustering. The results show that, compared with traditional approaches, our methods improve three indicators: accuracy, recall, and F-measure.
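
As a rough illustration of the feature similarity step, the Python sketch below combines time, place, and category similarity using AHP-style weights. The abstract does not give the actual formula, so everything specific here is an assumption made for illustration: the `Case` fields, the three component similarity functions, the 5 km distance scale, and the pairwise comparison matrix. The weights are approximated with the row geometric-mean method, a common stand-in for AHP's principal-eigenvector weights.

```python
from dataclasses import dataclass
from math import exp, prod
from typing import List, Tuple


@dataclass
class Case:
    """Hypothetical structured features extracted from a case description."""
    hour: float                  # time of the crime, hour of day in [0, 24)
    place: Tuple[float, float]   # scene of the crime, planar coordinates in km
    category: str                # second-level category label


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(means)
    return [m / total for m in means]


def time_similarity(a: Case, b: Case) -> float:
    """1.0 for the same hour of day, 0.0 for hours 12 h apart (assumed form)."""
    diff = abs(a.hour - b.hour)
    diff = min(diff, 24.0 - diff)  # wrap around midnight
    return 1.0 - diff / 12.0


def place_similarity(a: Case, b: Case, scale_km: float = 5.0) -> float:
    """Exponential decay with Euclidean distance (assumed form and scale)."""
    dx = a.place[0] - b.place[0]
    dy = a.place[1] - b.place[1]
    return exp(-((dx * dx + dy * dy) ** 0.5) / scale_km)


def category_similarity(a: Case, b: Case) -> float:
    """1.0 if both cases share a second-level category, else 0.0."""
    return 1.0 if a.category == b.category else 0.0


def case_similarity(a: Case, b: Case, weights: List[float]) -> float:
    """Weighted sum of the three component similarities."""
    w_time, w_place, w_cat = weights
    return (w_time * time_similarity(a, b)
            + w_place * place_similarity(a, b)
            + w_cat * category_similarity(a, b))


# Hypothetical pairwise judgments, in the order (time, place, category):
# category is judged more important than place, and place more than time.
PAIRWISE = [
    [1.0, 1 / 2, 1 / 3],
    [2.0, 1.0, 1 / 2],
    [3.0, 2.0, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)

if __name__ == "__main__":
    a = Case(hour=23.0, place=(0.0, 0.0), category="burglary")
    b = Case(hour=1.0, place=(3.0, 4.0), category="burglary")
    print(WEIGHTS)
    print(case_similarity(a, b, WEIGHTS))
```

A similarity of this kind could be turned into a distance (for example `1 - case_similarity(a, b, WEIGHTS)`) and passed to a density-based clustering algorithm such as OPTICS; how OPTICS-FD itself uses the similarity is not specified in the abstract.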
Keywords/Search Tags: Fine-grained classification, Cluster analysis for series of cases, Two-level classifier, Double-factor evaluation, Feature similarity