
Design and Implementation of a Maximum-Matching-Based Feature Extraction System for the Library of Shanghai Dian Ji University

Posted on: 2013-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Lu
Full Text: PDF
GTID: 2248330395974060
Subject: Software engineering
Abstract/Summary:
In Chinese search engines, the role of word segmentation is obvious: its results directly affect search engine performance. Current Chinese word segmentation methods fall into three main families: methods based on string matching, methods based on understanding, and methods based on statistics. A Chinese word segmentation system divides a modern Chinese sentence into its constituent words. By the conventions of modern written Chinese, there is no delimiter between the words of a sentence. English words are separated by spaces, so English has no word segmentation problem. Chinese sentences have no spaces between words, so the words must be separated by some automated technique. Automatic Chinese word segmentation has been an active research topic in computer science since the 20th century. Because language is complicated, it remains a bottleneck technology that is still under development.

This thesis first analyzes and summarizes existing segmentation algorithms and discusses the two major unsolved problems in Chinese word segmentation: ambiguity resolution and the recognition of unregistered (out-of-vocabulary) words. Identifying ambiguity and identifying new words are the most difficult problems in the development of Chinese word segmentation. Future work must both solve these problems, so that segmentation reaches a higher accuracy, and continue to broaden the industrial applications of Chinese word segmentation. For feature extraction, the thesis designs a method based on term frequency: the frequency of each term in a term-frequency matrix is counted, and a predetermined number of terms is selected according to frequency to form the feature subset (i.e., the keywords).
The input file is first segmented with the maximum matching algorithm. The resulting words are then loaded into the term-frequency matrix, their frequencies of occurrence are counted, and finally the text features are extracted. For segmentation, the thesis concentrates on the maximum matching algorithm.

The thesis mainly studies the design and development of a feature extraction system for libraries. Chinese word segmentation and feature extraction techniques are applied to design such a system, and the design process and experimental results are presented in detail. With the system in use, school library management becomes faster and more efficient.
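
The pipeline described in the abstract (segment with maximum matching, count term frequencies, keep a fixed number of the most frequent terms as keywords) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The abstract does not say whether matching runs forward or backward, so forward matching is assumed here. The toy dictionary, the function names `forward_max_match` and `top_k_terms`, and the choice to ignore single-character fallback tokens when counting are all hypothetical.

```python
from collections import Counter

# Hypothetical toy dictionary; a real system would load a full lexicon.
DICTIONARY = {"图书", "图书馆", "管理", "系统", "特征", "提取"}


def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching (assumed variant).

    At each position, take the longest dictionary word that starts there.
    If nothing matches, emit a single character and move on.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def top_k_terms(documents, dictionary, k):
    """Count term frequencies over all documents and return the k most frequent.

    Assumption: single-character tokens are skipped, since they are mostly
    fallbacks for text the dictionary does not cover.
    """
    counts = Counter()
    for doc in documents:
        tokens = forward_max_match(doc, dictionary)
        counts.update(t for t in tokens if len(t) > 1)
    return [term for term, _ in counts.most_common(k)]


if __name__ == "__main__":
    print(forward_max_match("图书馆管理系统", DICTIONARY))
    # ['图书馆', '管理', '系统']
    docs = ["图书馆管理系统", "图书馆特征提取", "特征提取系统"]
    print(top_k_terms(docs, DICTIONARY, 4))
```

Note that greedy matching picks "图书馆" over the shorter "图书" in the first example. This longest-first preference is also where the ambiguity problem discussed in the abstract comes from, and any word missing from the dictionary falls apart into single characters, which is the unregistered-word problem.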
Keywords/Search Tags: Chinese word segmentation, Research progress, Probability, New word recognition