Research And Application Of Word Segmentation Technology In Heterogeneous Data’s Unified Retrieval

Posted on:2013-08-19

Degree:Master

Type:Thesis

Country:China

Candidate:X M Han

Full Text:PDF

GTID:2248330362470885

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of informatization, data of all kinds accumulates quickly and data structures grow more and more complex. Faced with so much information, which differs widely in logical structure and storage structure, people in the information age urgently need a way to search for useful information easily, quickly and accurately. To solve the problem of unified retrieval over heterogeneous data, this thesis presents a unified retrieval system for heterogeneous data and introduces word segmentation technology to improve the precision and efficiency of the search system.

This thesis reviews the state of research at home and abroad on word segmentation and heterogeneous data retrieval. It analyzes and summarizes the basic theory, common technologies and solutions, and typical algorithms of both fields. On this basis, it proposes a general framework for heterogeneous data retrieval and describes in detail the framework's division into layers, the function modules of each layer, the system's operating process, and the characteristics of the structure. After analyzing traditional word segmentation algorithms and dictionary mechanisms, it designs a fast word segmentation method that takes the characteristics of heterogeneous data retrieval into account and is based on a modified whole-word dichotomy dictionary. A concrete implementation of the algorithm is also given. Experimental results show that the algorithm segments text into words precisely and responds quickly. It performs well in segmenting queries in heterogeneous retrieval, extracting keywords, and comparing the similarity of search results. The thesis also studies similarity calculation, which forms the core of the layer that processes retrieval results. A similarity calculation algorithm based on a Bayesian model is devised, and, to improve retrieval efficiency, the improved fast word segmentation is applied in the preprocessing step of the similarity calculation.

Finally, the word segmentation technology for unified retrieval of heterogeneous data is applied to the ship information management system of a provincial affairs bureau. The application results show clear improvements in data retrieval coverage, retrieval system response time, and retrieval precision. The approach can solve the problem of unified retrieval of heterogeneous data effectively.
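The abstract does not spell out the modified whole-word dichotomy dictionary, so the following is only a minimal sketch of the general idea it builds on: a sorted whole-word dictionary searched by binary search ("dichotomy"), combined with forward maximum matching. The function names, the `max_len` parameter, the sample dictionary, and the choice of forward maximum matching are all illustrative assumptions, not the thesis's actual design.

```python
from bisect import bisect_left


def in_dictionary(sorted_words, candidate):
    """Binary-search lookup of a whole word in a sorted word list."""
    i = bisect_left(sorted_words, candidate)
    return i < len(sorted_words) and sorted_words[i] == candidate


def segment(text, sorted_words, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position.

    max_len is an assumed upper bound on word length (hypothetical parameter).
    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if length == 1 or in_dictionary(sorted_words, candidate):
                tokens.append(candidate)
                pos += length
                break
    return tokens


# Hypothetical sample dictionary; it must be sorted for the binary search to work.
dictionary = sorted(["船舶", "信息", "管理", "系统", "管理系统"])
print(segment("船舶信息管理系统", dictionary))
```

With this sample dictionary the query is split into "船舶", "信息" and "管理系统": the longer entry "管理系统" wins over "管理" followed by "系统" because longer candidates are tried first. Each lookup costs O(log n) comparisons in the dictionary size, which is the property a dichotomy dictionary relies on for fast response.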

Keywords/Search Tags:

Word segmentation, heterogeneous data retrieval, meta search engine, XML document, Bayes classifier

Related items

1	Current Status Research And Improved Design Of Meta Search Engine
2	Research Of Search Engine Key Technique And Optimize Performance
3	Based On Meta-search Business Model Practice
4	Research On The Key Technique Of The Meta Search Engine
5	Research And Implementation Of The Chinese Search Engine Based On Meta Search
6	Research On Key Techniques Of Intelligent Meta-search Engine
7	Design And Implementation Of Meta-search Engine System Based On Distributed Architecture
8	The Design And Implementation Of Vertical Search Engine For Position Query
9	Meta Search Engine Based On Neural Network
10	Research On Some Key Technologies Of Personalized Meta Search