Research And Implementation On Intelligent Information Retrieval Based On Classification

Posted on: 2006-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: C P Cheng
Full Text: PDF
GTID: 2168360155964217
Subject: Computer software and theory
Abstract/Summary:
Information retrieval usually refers to text information retrieval, which covers the storage, organization, representation, search, and access of information. Its core is text indexing and retrieval technology. Historically, information retrieval has passed through several stages: manual retrieval, computer retrieval, networked retrieval, and intelligent retrieval. The objects of retrieval have expanded from relatively closed, stable, and consistent content managed centrally in independent databases to Web content that is open, dynamic, quickly updated, widely distributed, and loosely managed. The user base has likewise expanded from professional information specialists to business people, managers, teachers, students, and professionals of many kinds, who place higher and more diverse demands on the efficiency and accuracy of information retrieval. The recall and precision of current retrieval systems (search engines) are low. To raise them, researchers have proposed various techniques and algorithms, with the aim of making retrieval tools gradually more user-friendly and intelligent.

Building on a study of traditional information retrieval technology, this thesis incorporates existing web page classification techniques and carries out a fairly systematic study of intelligent search engines. On this basis it offers some ideas and observations on Chinese word segmentation, web page indexing, web page feature selection, and web page classification in classification-based intelligent information retrieval. The main work of the thesis is as follows:

(1) Starting from the structural features of web pages, the thesis analyzes which components of a page contribute information to the classification process. A comparatively advanced dictionary storage structure is used, which greatly improves segmentation speed, and the accuracy of the segmentation results largely meets the needs of automatic classification of Chinese web pages. A string frequency counting method is adopted, which raises the recognition rate of recorded words.
(2) Traditional feature selection does not consider semantic relations between Chinese words (antonyms, near-synonyms, synonyms). Besides taking these semantic relations into account, the thesis also brings the web page title into feature word extraction, which makes extraction more reasonable than the traditional method. It also makes some improvements to the CHI formula so that it better fits the characteristics of Chinese web pages.
(3) The thesis studies existing web page classification methods and, taking the features of web pages into account, proposes a new web page feature weighting formula.
(4) Indexing and searching of web pages.
(5) A fairly complete classification-based retrieval system is built on the above theory, and the experimental results are evaluated.
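For readers unfamiliar with the CHI formula mentioned in item (2), the following is a minimal sketch of the standard textbook chi-square statistic for feature selection. It is not the improved formula proposed in the thesis, which the abstract does not give. The function names, the toy corpus, and the representation of a document as a set of terms with a category label are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Doc = Tuple[Set[str], str]  # (terms in the document, category label); hypothetical layout


def contingency(docs: Iterable[Doc], term: str, category: str) -> Tuple[int, int, int, int]:
    """Count the 2x2 contingency table for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score: N(ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # The term or the category never varies, so it carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Toy corpus, invented purely for illustration.
corpus = [
    ({"football", "match"}, "sports"),
    ({"football", "goal"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"stock", "match"}, "finance"),
]

score_football = chi_square(*contingency(corpus, "football", "sports"))
score_match = chi_square(*contingency(corpus, "match", "sports"))
```

In this toy corpus, "football" occurs only in sports documents and so scores higher than "match", which is spread evenly across both categories. A feature selector would keep the highest-scoring terms per category.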
Keywords/Search Tags: Information Retrieval, Chinese Word Segmentation, Feature Selection, Inverted Document, Vector Space Model, KNN Categorization Algorithm