Research And Implementation On Intelligent Information Retrieval Based On Classification

Posted on:2006-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:C P Cheng

Full Text:PDF

GTID:2168360155964217

Subject:Computer software and theory

Abstract/Summary:

Information retrieval usually refers to text information retrieval, which covers the storage, organization, representation, querying, and access of information. Its core is text indexing and retrieval technology. Historically, information retrieval has passed through the stages of manual retrieval, computer retrieval, networked retrieval, and intelligent retrieval. The objects of retrieval have expanded from the relatively closed, stable, and consistent content managed centrally by independent databases to the open, dynamic, rapidly updated, widely distributed, and loosely managed content of the Web. The users have expanded from the original information professionals to include business people, managers, teachers, students, and professionals of all kinds, who place higher and more diverse demands on the efficiency and accuracy of information retrieval. The recall and precision of current retrieval systems (search engines) are low. To improve the recall and precision of retrieval tools, various techniques and algorithms have been proposed, with the aim of making these tools gradually more user-friendly and intelligent.

This thesis combines existing web page classification technology with the study of traditional information retrieval technology and carries out a relatively systematic study of intelligent search engines. On this basis, it presents some ideas and opinions on Chinese word segmentation, web page indexing, web page feature selection, and web page classification in classification-based intelligent information retrieval. The main work of the thesis is as follows:

(1) Starting from the structural features of web pages, the thesis first analyzes which components of a web page contribute information to the classification process. A relatively advanced dictionary storage structure is used, which greatly improves segmentation speed, and the accuracy of the segmentation results basically satisfies the needs of automatic classification of Chinese web pages. A string-frequency counting method is adopted, which raises the probability of recognizing recorded words.

(2) Traditional feature selection methods do not consider the semantic relations between Chinese words (antonyms, near-synonyms, synonyms). In addition to considering these semantic relations, the thesis also extracts the web page title and lets it take part in feature word extraction, which makes the extraction more reasonable than the traditional method. The CHI formula is also improved so that it better fits the characteristics of Chinese web pages.

(3) Existing web page classification methods are studied and, taking the features of web pages into account, a new feature weighting formula for web pages is proposed.

(4) Indexing and searching of web pages.

(5) A relatively complete classification-based retrieval system is built on the above theories, and the experimental results are evaluated.
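
For readers unfamiliar with the CHI formula mentioned in item (2), the following is a minimal Python sketch of the standard chi-square statistic as it is commonly used for feature selection in text categorization. The abstract does not give the thesis's improved formula, so this shows only the conventional baseline. The function names (`chi_square`, `chi_scores`) and the toy English tokens are illustrative assumptions, not taken from the thesis.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score of a term for one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate contingency table: no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_scores(docs, labels, category):
    """Score every term against one category.

    docs is a list of token lists (already segmented);
    labels holds the category name of each document.
    """
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    in_df, out_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (in_df if y == category else out_df).update(set(tokens))
    scores = {}
    for term in set(in_df) | set(out_df):
        a, b = in_df[term], out_df[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return scores


if __name__ == "__main__":
    docs = [["football", "match"], ["football", "goal"],
            ["stock", "market"], ["stock", "match"]]
    labels = ["sports", "sports", "finance", "finance"]
    ranked = sorted(chi_scores(docs, labels, "sports").items(),
                    key=lambda kv: -kv[1])
    for term, score in ranked:
        print(f"{term}\t{score:.3f}")
```

Note that the statistic is symmetric: a term that appears only outside the category scores as high as one that appears only inside it, so a feature selector built on this sketch would typically keep the top-scoring terms per category and let the classifier decide the direction.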

Keywords/Search Tags:

Information Retrieval, Chinese Word Segmentation, Feature Selection, Inverted Document, Vector Space Model, KNN Categorization Algorithm

Related items

1	Research On The Chinese Science And Technology Document Information Retrieval System Based On The Vector Space
2	Research And Implementation Of Text Categorization System Based On VSM
3	Research And Application Of Automatic Categorization Of The Chinese Documents
4	Study On Feature Selection Of Chinese Document Categorization
5	Research On Chinese Text Categorization Algorithms Based On Technology Text
6	Chinese Text Data Classification
7	Research Of Chinese Text Categorization Algorithms Based On Information Entropy
8	The Studies On Chinese Text Categorization Based On Pso And Svm
9	The Research And Implementation Of Chinese Text Categorization
10	Improved Vector Space Model And Its Application To Document Classification System