
Text Search Techniques And Optimization Strategies On Hybrid Data

Posted on: 2013-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhu
Full Text: PDF
GTID: 2298330467478166
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of computer science and networking, the amount of data is growing exponentially. Most of this data is hybrid: it contains both structured data and unstructured data, and the unstructured part is mainly text. Finding the information users want in such large data sets is both challenging and urgent. Text retrieval studies how to find, within a large data set, the set of texts related to a target text. It has therefore long been an active research topic in computer science, and text retrieval (search) is the foundation and core of information retrieval. This thesis focuses on text search over hybrid data and on how to improve it.

In recent years there have been many excellent contributions on text search. These approaches fall mainly into two categories: some aim to construct an accurate and reasonable similarity function, while others focus on improving or expanding the query object (texts, keywords, documents, etc.). However, these techniques mainly match the query text against a single text and ignore the structured data that accompanies it in hybrid data. When they are applied to hybrid data, the ranking of the returned results is therefore often unsatisfactory. On the other hand, hybrid data may contain many kinds of structured and unstructured data, which makes the data complex. Making full use of the structured data to improve text search on hybrid data is thus a challenging task.

Based on the characteristics of hybrid data and users' real search needs, this thesis presents an attribute classification strategy and, building on it, proposes a basic method and an improved method for text search.
Our contributions are summarized as follows:
(i) We build an attribute classification strategy.
(ii) We provide scoring methods for the attributes based on the attribute classification strategy and propose the basic method of text search on hybrid data.
(iii) We mine some useful and non-trivial rules from the structured data.
(iv) We use the mined rules to rank the results and derive filtering rules that reduce the search space. This yields our improved method of text search on hybrid data and a modified version of the improved method.

Finally, extensive experiments on real HP data show that our approaches achieve high recall, top-k precision, and mean average precision, together with good search performance in terms of time.
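The abstract does not give the scoring functions or the mined rules, so the following is only a minimal sketch of the general idea: filter candidates with rules over structured attributes, then rank by a combination of text similarity and attribute score. Everything here is an assumption made for illustration, including the `Record` schema, the Jaccard similarity stand-in, the exact-match attribute score, the weight `alpha`, and the `same_product` rule. It is not the thesis's actual method.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hybrid record: free text plus structured attributes (hypothetical schema)."""
    text: str
    attrs: dict = field(default_factory=dict)


def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def text_similarity(query, text):
    """Jaccard overlap of token sets; a stand-in for any text similarity function."""
    q, t = tokenize(query), tokenize(text)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def attribute_score(query_attrs, record_attrs):
    """Fraction of the query's attributes that the record matches exactly (assumed scoring)."""
    if not query_attrs:
        return 0.0
    hits = sum(1 for k, v in query_attrs.items() if record_attrs.get(k) == v)
    return hits / len(query_attrs)


def search(query_text, query_attrs, records, filters=(), alpha=0.7, k=3):
    """Apply filtering rules, then rank the survivors by a weighted combined score."""
    # Filtering rules discard records early and so reduce the search space.
    candidates = [r for r in records if all(rule(query_attrs, r) for rule in filters)]
    scored = [
        (alpha * text_similarity(query_text, r.text)
         + (1 - alpha) * attribute_score(query_attrs, r.attrs), r)
        for r in candidates
    ]
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def same_product(query_attrs, record):
    """Hypothetical filtering rule: keep only records for the queried product."""
    return record.attrs.get("product") == query_attrs.get("product")


records = [
    Record("printer paper jam on tray two", {"product": "printer", "region": "EU"}),
    Record("laptop battery drains quickly", {"product": "laptop", "region": "EU"}),
    Record("printer paper jam error", {"product": "printer", "region": "US"}),
]
results = search("paper jam", {"product": "printer", "region": "US"},
                 records, filters=[same_product])
for score, record in results:
    print(round(score, 3), record.text)
```

In this sketch the laptop record is removed by the filtering rule before any similarity is computed, and the remaining records are ordered by both their text overlap and how many structured attributes they share with the query.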
Keywords/Search Tags: structured data, text, text search, rules, improvement