
Research And Implementation On Massive Unstructured Data Organization

Posted on: 2009-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: B Zou
Full Text: PDF
GTID: 2178360275971924
Subject: Computer system architecture
Abstract/Summary:
The continuous development of computer applications has led to a dramatic increase in the amount of data. Unstructured data grows far faster than structured data, because the rate at which data can be structured is limited by the speed of manual processing. Organizing large-scale unstructured data with a traditional directory hierarchy has many shortcomings: a directory tree cannot express the logical relationships among massive data items, and keeping the tree consistent becomes difficult and expensive as the volume of unstructured data grows. The organization of massive unstructured data has therefore become a pressing research problem.

Based on an analysis of existing methods of information organization and management (directory hierarchies, indexing and retrieval, databases, and semantic file systems), combined with the requirements of massive information management (user participation, automation, model extraction, and so on), this thesis designs and implements MUDOMS, a massive unstructured data organization and management system. MUDOMS uses an object model to describe information and attribute-value pairs to describe its characteristics. It provides an interface through which users create attribute-value pairs and relationships among attributes based on their own understanding, so that the system records how users understand the data. The system uses a hybrid index mechanism, THLI (Tree Hash and Link-list Indexing), to index the attributes and relationships. MUDOMS also provides hot navigation, a user-friendly feature through which users can find and access data quickly. Based on user habits, it builds a personalized logical view that uses different classifications and display orders for convenience.

Building on user participation in attribute creation, the thesis also discusses a mechanism for obtaining attributes and relationships automatically according to time, space, and context, and then reorganizing them.

Testing and comparison show that MUDOMS provides a method for managing massive unstructured data and adds artificial intelligence to obtain semantic attributes. In tests against similar software (Baidu hard disk search and Google Desktop), MUDOMS takes 60 percent less time to index data and, on average, 70 percent less space. When memory capacity is large enough, the average time MUDOMS takes to retrieve data is 20 times less than that of similar software.
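The abstract does not describe how THLI works internally, so the following Python sketch is only one hypothetical reading of an attribute-value index with attribute relationships, not the thesis's implementation. The class name `AttributeIndex` and all method names are assumptions made for illustration. A dictionary stands in for the hash level, a sorted list of values stands in for the tree level, and a plain list of object identifiers stands in for the linked list.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class AttributeIndex:
    """Toy attribute-value index. Hypothetical; NOT the actual THLI design."""

    def __init__(self):
        # Hash level: attribute name -> sorted list of distinct values.
        self._values = defaultdict(list)
        # List level: (attribute, value) -> ids of objects carrying that pair.
        self._postings = defaultdict(list)
        # User-declared relationships between attribute names.
        self._related = defaultdict(set)

    def tag(self, object_id, attribute, value):
        """Attach an attribute-value pair to an object."""
        key = (attribute, value)
        if not self._postings[key]:
            # First time this value is seen for the attribute.
            # Values of one attribute must be mutually comparable.
            insort(self._values[attribute], value)
        if object_id not in self._postings[key]:
            self._postings[key].append(object_id)

    def relate(self, attr_a, attr_b):
        """Record a symmetric relationship between two attributes."""
        self._related[attr_a].add(attr_b)
        self._related[attr_b].add(attr_a)

    def find(self, attribute, value):
        """Exact match: ids of objects tagged with attribute == value."""
        return list(self._postings.get((attribute, value), []))

    def find_range(self, attribute, low, high):
        """Range match: ids of objects with low <= value <= high."""
        values = self._values.get(attribute, [])
        result, seen = [], set()
        for v in values[bisect_left(values, low):]:
            if v > high:
                break
            for object_id in self._postings[(attribute, v)]:
                if object_id not in seen:
                    seen.add(object_id)
                    result.append(object_id)
        return result

    def related(self, attribute):
        """Attributes the user has linked to the given attribute."""
        return sorted(self._related.get(attribute, ()))


# Usage with made-up file names and attributes.
index = AttributeIndex()
index.tag("report.doc", "year", 2008)
index.tag("photo.jpg", "year", 2009)
index.tag("notes.txt", "year", 2009)
index.relate("year", "project")
matches = index.find("year", 2009)
in_range = index.find_range("year", 2008, 2009)
```

In this sketch, exact lookups go through the dictionary, while range queries walk the sorted values, which is why the two levels are kept separate. A real system at the scale the abstract targets would presumably need persistent, disk-backed structures rather than in-memory ones.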
Keywords/Search Tags:Unstructured Data, Massive Data Organization, Attribute Semantics, Semantic Extraction