
Research And Application Of Techniques For Collection And Retrieval On Unstructured Data

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: H F Ma
Full Text: PDF
GTID: 2248330395480764
Subject: Computer software and theory
Abstract/Summary:
Data in computerized information systems can be divided into two organizational forms: structured and unstructured. Unstructured data is data that has no predefined organizational model or that cannot conveniently be represented in a two-dimensional data structure. In fact, 80 percent of business-relevant information originates in unstructured data, mainly text. Research on techniques for collecting and retrieving unstructured data has therefore continued from the last century to the present. With the spread of the Internet and Web applications, information now exists everywhere, and these techniques remain worth studying and putting into practice.

Information extraction is the basis of unstructured data processing. Once key information has been extracted from unstructured text, it can be used for further analysis, so that the information is put to effective use.

This thesis takes one functional module of a city safety production supervision information system, the notice/document management module, as its object of study. Key information is collected from different types of unstructured text using open-source program libraries, replacing the traditional practice of manual data entry. With full-text retrieval in place, the archiving and retrieval of unstructured data became an important application within the city safety production supervision information system.

The main work of this thesis is summarized as follows:
1) Study and analysis of unstructured text using open-source library tools such as PDFBox, to obtain the relevant information of each document/notice.
2) Research and implementation of a dictionary-based word segmentation algorithm that splits the key fields, followed by creation of a new inverted index file with the help of a search framework.
3) Implementation of a user-friendly graphical user interface (GUI) for searching.
4) Study and description of extended applications of full-text retrieval.

The technical results that the author took part in studying, designing, and implementing have been put into use successfully in the city safety production supervision information system.
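
The dictionary-based segmentation and index construction named in item 2 can be illustrated with a minimal sketch. The abstract does not say which matching strategy or search framework the thesis used, so everything here is an assumption: the sketch uses forward maximum matching, replaces the search framework with a plain in-memory dictionary, and the names `segment`, `build_inverted_index`, `search`, `DICTIONARY`, and `DOCS` are hypothetical, as are the toy notices.

```python
from collections import defaultdict


def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest dictionary word."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        # A single character is always accepted so the scan cannot stall.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_inverted_index(docs, dictionary):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in segment(text, dictionary):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query, dictionary):
    """Return ids of documents that contain every token of the segmented query."""
    postings = [set(index.get(t, ())) for t in segment(query, dictionary)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical toy dictionary and notice titles, for illustration only.
DICTIONARY = {"安全", "生产", "安全生产", "监督", "通知", "检查"}
DOCS = {
    1: "安全生产监督通知",
    2: "生产检查通知",
}
INDEX = build_inverted_index(DOCS, DICTIONARY)
```

With this toy data, `search(INDEX, "生产通知", DICTIONARY)` matches only notice 2: maximum matching indexes notice 1 under the longer word 安全生产 rather than 生产. This is a known trade-off of the strategy, and a real system would likely need finer-grained tokens or a framework-provided analyzer.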
Keywords/Search Tags: unstructured data, information extraction, full-text retrieval, Chinese word segmentation