Filter Key Technology Research, Based On The Text Of The Particle Swarm And Genetic Optimization

Posted on: 2013-02-22  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z F Zhu  Full Text: PDF
GTID: 1118330371969965  Subject: Management Science and Engineering
Abstract/Summary:
The rapid development of computer and network technology has led to a large amount of redundant and junk information, giving rise to problems such as information flooding, information labyrinths and information disease. The overflow of redundant and junk information reduces the efficiency and quality of Internet use and hinders the healthy development of the Internet. Information filtering technology arose in response. Information filtering is the process of automatically selecting, by certain technical methods, the information that satisfies users' demands from a large volume of dynamic information, while screening out useless information. In the general sense, information filtering covers information of various forms, including text, voice, image and video; in the narrow sense, it refers to text information filtering. The research in this paper addresses text information filtering, and Chinese text in particular.

In recent years, many research institutions and individuals have studied information filtering technology, especially for Chinese text, and have accumulated valuable experience and achieved some good results.
However, because of the complexity and ambiguity of text information, especially Chinese text, several problems in text information filtering remain to be solved:

(1) In content-based text information filtering, word segmentation of the training document set usually produces a large number of words. If all of them are used to represent classes, time and space complexity increase, and many words contribute little to filtering or even degrade it. Finding a suitable term weighting method is therefore a problem to be solved.

(2) Once feature items have been extracted, an optimization algorithm is needed to generate the filtering template, but templates built with current optimization methods cannot satisfy the filtering requirement. The problem is to select a better optimization method, one that generates a better class template and allows the template to be improved continuously.

(3) Matching against the template usually uses the whole text, so paragraph features are ignored. In network text filtering in particular, the acquired texts also carry additional information. How to optimize the texts to be filtered and increase the match rate is a problem to be solved.

(4) A filtering template can only approach the real template, and some deviation always exists, so feedback information is required to adjust the template continually. The problem is how to collect feedback results to improve the filtering effect.

The goal of this paper is to overcome the above problems. Its five innovation points are as follows:

(1) This paper establishes a feature weight calculation method that combines document weight, paragraph weight, sentence weight and feature weight.

In content-based text information filtering, the training sets are converted to a vector space for analysis by the classification algorithm.
However, word segmentation of the training document sets usually produces a large vocabulary. If all words are used to express the category, the complexity of text filtering increases, and many words contribute very little to filtering. This paper presents a feature weight calculation method that combines document weight, paragraph weight, sentence weight and feature weight.

(2) This paper establishes a Chinese text information filtering model using a genetic algorithm and demonstrates its feasibility through theory and experiment.

Whatever method is used to establish it, a filtering template is only an approximate expression of the filtering requirements. In theory there is a real filtering template that expresses the true filtering needs accurately, but it cannot be obtained by mathematical calculation or by experiment; the most common approach is to adjust an initial template. This paper establishes a Chinese text information filtering model using a genetic algorithm and demonstrates its feasibility through theory and experiment.

(3) This paper constructs logical paragraphs using the concepts of characteristic words and implements a paragraph matching mechanism to improve the classification results.

When the vector space model is applied to matching and classification, the whole document is usually matched and classified, which ignores the paragraph features of the text to be classified. Existing paragraph-based matching mechanisms, moreover, usually rely on traditional physical paragraphs, assigning different weights to different paragraphs. This approach is rather mechanical, because physical paragraphs tend to be short or to contain too little information.
Especially in network text filtering, documents from the network often carry additional information, which frequently leads to matching errors. This paper constructs logical paragraphs using the concepts of characteristic words and implements a paragraph matching mechanism to improve the classification results.

(4) This paper introduces an improved particle swarm optimization algorithm and implements collaborative filtering of profiles based on it.

To achieve better classification results, a large number of training texts must be used to train the system, but collecting training texts takes a lot of time. If the texts to be classified could be used effectively to adjust the filtering system, the results would be considerably better. This paper discusses content-based filtering and collaborative filtering and proposes a hybrid filtering model that combines the two methods to overcome their respective shortcomings. In this hybrid method, a genetic algorithm generates the initial profiles on the server side, and the modified particle swarm optimization updates the profiles with information from users.

(5) This paper implements an information filtering system based on the text information filtering model and the improved strategies described above.

The system integrates the paragraph feature weight calculation method, the filtering template generation algorithm based on a fuzzy genetic algorithm, the concept-based logical paragraph division method and the feedback mechanism using particle swarm optimization, among others. It also introduces a hierarchical filtering mechanism, yielding a hierarchical information filtering system that improves both the adaptability and the efficiency of filtering.
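
As an illustration only, the idea of weighting terms by where they occur and then matching a text against a filtering template in vector space might be sketched as follows. The abstract does not give the dissertation's actual weighting formula, so everything here is an assumption: the function names, the product of a paragraph weight and a first-sentence boost, the cosine similarity measure and the threshold value are all hypothetical choices made for the sketch.

```python
import math
from collections import defaultdict


def weighted_vector(doc, para_weights, first_sentence_boost=2.0):
    """Build a term-weight vector for a segmented document.

    doc is a list of paragraphs, each a list of sentences, each a list
    of terms. A term's weight is the sum, over its occurrences, of
    (paragraph weight) * (sentence weight). The sentence weight is an
    assumed rule: first_sentence_boost for the first sentence of a
    paragraph, 1.0 otherwise.
    """
    vec = defaultdict(float)
    for p_idx, paragraph in enumerate(doc):
        for s_idx, sentence in enumerate(paragraph):
            sent_weight = first_sentence_boost if s_idx == 0 else 1.0
            weight = para_weights[p_idx] * sent_weight
            for term in sentence:
                vec[term] += weight
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_template(doc_vec, template, threshold=0.5):
    """Return True if the document is similar enough to the template.

    The threshold is an arbitrary illustrative value.
    """
    return cosine(doc_vec, template) >= threshold


# Toy document: two paragraphs; the second (for example, page furniture
# picked up from a web page) is given a low paragraph weight.
doc = [
    [["spam", "offer"], ["offer", "now"]],
    [["contact", "page"]],
]
vec = weighted_vector(doc, para_weights=[1.0, 0.2])
template = {"spam": 1.0, "offer": 1.0}
is_match = matches_template(vec, template)
```

Down-weighting a paragraph in this way is one simple means of keeping additional information attached to network text from dominating the match; the logical-paragraph construction and the genetic and particle swarm steps described in the abstract are not modeled here.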
Keywords/Search Tags: Text information filtering, Weight calculation, Fuzzy genetic algorithm, Particle swarm algorithm, Shallow semantics