Research Of Automatic Classification Technique In Uyghur Kazak Kyrgyz Search Engine

Posted on: 2011-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
GTID: 2178360305487260
Subject: Computer application technology
Abstract/Summary:
Web search engines have become a widely used tool for information retrieval. However, the results returned by traditional search engines often contain a great deal of irrelevant information, which is inconvenient for users. Improving the precision and recall of traditional search engines therefore remains a key problem in information retrieval research, and applying automatic classification is one effective way to address it.

Building on traditional search engine technology, this thesis incorporates automatic webpage classification and studies in detail webpage preprocessing, term weighting, classification algorithms, and the application of webpage classification to Uyghur, Kazak and Kyrgyz multilingual information retrieval. We apply webpage classification in a Uyghur, Kazak, Kyrgyz search engine and optimize queries to further improve precision and recall. The main contributions of the thesis are as follows:

1. This thesis introduces the general process of webpage classification, existing feature selection evaluation functions, and common classification algorithms.
2. We identify a shortcoming of the traditional TF*IDF term weighting approach and, taking into account how feature terms are distributed across different classes and web documents, propose an improved weighting function, TFIDF-DI.
3. Based on the above, we implement a Uyghur webpage classifier in Visual Studio 2005 and collect a large number of Uyghur webpages for classification experiments. The results show that the classifier's precision increases when the improved term weighting approach is used.
4. We add a webpage classification module to the traditional Uyghur, Kazak, Kyrgyz search engine and implement a multilingual information retrieval system with webpage classification.
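The classical TF*IDF weight that item 2 starts from can be sketched as follows. This is a minimal sketch assuming the common form w(t, d) = tf(t, d) · log(N / df(t)); the thesis's TFIDF-DI function is not reproduced, because the abstract does not give its formula. The hypothetical toy corpus illustrates the kind of gap that class-distribution information addresses: IDF is computed over the whole collection, so two terms with the same document frequency get the same weight even if one occurs in only one class.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document,
    using w(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical toy corpus: two "sports" pages, then two "economy" pages.
docs = [
    ["match", "goal", "news"],     # sports
    ["match", "team", "report"],   # sports
    ["market", "price", "news"],   # economy
    ["bank", "price", "report"],   # economy
]
w = tf_idf(docs)
```

Here `w[0]["match"]` and `w[0]["news"]` both equal log 2 (about 0.693), even though "match" appears only in sports pages while "news" appears in both classes. Plain TF*IDF cannot tell them apart.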
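Item 1 mentions existing feature selection evaluation functions without naming them. The chi-square (χ²) statistic is one widely used example, shown here as a minimal sketch. It assumes pages are already tokenized into term sets and labeled with a class, and it is not necessarily the function the thesis evaluates. The score is computed from a 2×2 contingency table of term presence versus class membership.

```python
from typing import List, Set


def chi_square(term: str, cls: str, docs: List[Set[str]], labels: List[str]) -> float:
    """Chi-square score of `term` for class `cls` from a 2x2 contingency table."""
    n = len(docs)
    # n11: in cls, contains term; n10: outside cls, contains term
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    # n01: in cls, lacks term; n00: outside cls, lacks term
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    n00 = n - n11 - n10 - n01
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table, e.g. the term occurs in every document
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Hypothetical toy corpus (term sets) with class labels.
docs = [{"match", "goal", "news"}, {"match", "team", "report"},
        {"market", "price", "news"}, {"bank", "price", "report"}]
labels = ["sports", "sports", "economy", "economy"]
```

On this corpus `chi_square("match", "sports", docs, labels)` is 4.0 while `chi_square("news", "sports", docs, labels)` is 0.0. Terms would typically be ranked by such a score and only the top ones kept as features.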
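The abstract does not say which classification algorithm the Uyghur classifier uses. k-nearest neighbours (kNN) with cosine similarity is one common algorithm that works directly on term-weight vectors, so it is a natural place where a different weighting scheme changes the results. The sketch below assumes sparse vectors stored as dicts. The training vectors are hypothetical; in practice they would come from a weighting function such as TF*IDF.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query: Vector, train: List[Vector], labels: List[str], k: int = 3) -> str:
    """Label `query` by similarity-weighted voting among its k most similar training pages."""
    sims = [(cosine(query, vec), label) for vec, label in zip(train, labels)]
    sims.sort(key=lambda p: p[0], reverse=True)
    votes: Counter = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical weighted training pages.
train = [{"match": 1.0, "goal": 1.0}, {"match": 1.0, "team": 1.0},
         {"market": 1.0, "price": 1.0}, {"bank": 1.0, "price": 1.0}]
labels = ["sports", "sports", "economy", "economy"]
```

For example, `knn_classify({"goal": 1.0, "team": 1.0}, train, labels)` returns `"sports"`. Weighting each vote by similarity, rather than counting neighbours equally, keeps zero-similarity neighbours from influencing the label.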
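Precision and recall, the measures the abstract aims to improve, can be computed as shown below. The example is hypothetical. It assumes each result page already carries a category assigned by a classifier and shows how restricting results to the category a user selects can raise precision. It does not describe how the thesis's search engine actually presents or filters results.

```python
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = relevant hits / retrieved; recall = relevant hits / all relevant."""
    retrieved = list(retrieved)
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list: page id -> category assigned by a classifier.
results = {"p1": "sports", "p2": "economy", "p3": "sports", "p4": "economy"}
relevant = {"p1", "p3"}  # pages the user actually wanted

before = precision_recall(results, relevant)  # all four pages
after = precision_recall([p for p, c in results.items() if c == "sports"], relevant)
```

Here `before` is (0.5, 1.0) and `after` is (1.0, 1.0). If the classifier mislabeled a relevant page, the same filter would drop it and lower recall, so classifier precision directly limits the benefit.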
The Uyghur, Kazak, Kyrgyz multilingual search engine with automatic classification developed in this thesis provides a convenient retrieval service for minority web users in the Xinjiang region. It also provides a good starting point for developing higher-quality Web search tools.
Keywords/Search Tags: Search Engine, Automatic Classification, Term Weighting, Information Retrieval