
Research On Scientific Literature And Scientific Data Storage Retrieval Based On Elastic Search

Posted on: 2017-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Chen
Full Text: PDF
GTID: 2278330488964860
Subject: Computer application technology
Abstract/Summary:
Universities and research institutions in Yunnan province hold a large number of scientific and technological resources, but these resources are heterogeneous, very numerous, and widely distributed. To promote the development of science and technology, unified storage and retrieval of these massive heterogeneous resources is urgently needed, together with a storage and retrieval platform for scientific literature and data that meets the requirements of Yunnan province.

With this goal, we studied the storage and retrieval of scientific literature and data and made some progress. The main research work is as follows.

First, we studied the unified storage of massive scientific literature and data. In line with current application requirements, we collected the metadata of the resources to build their indexes and stored the indexes in a relational database. We then synchronized them to the distributed retrieval cluster using the JDBC plugin of ElasticSearch, which realized unified storage of the resources and laid the foundation for unified management of scientific and technological resources.

Second, we systematically studied the efficient retrieval of scientific literature and data. We designed and implemented the standard unified description language (SUDL) and a unified preprocessing step for retrieval requests, discussed index building for the resources, and achieved efficient distributed retrieval with ElasticSearch. Finally, we designed and implemented a prototype system and verified it experimentally.

Third, we handled in a unified way two problems with retrieval results: overlap and sheer volume. Given the great number of resources, these problems cannot be avoided.
For the overlap problem, we analyzed a duplicate-removal algorithm based on characteristics and keywords, and processed repeated retrieval results with an access algorithm based on a simple access price. For the large volume of retrieval results, we sorted the results so that the most useful information appears at the top, to provide the best service to users. Finally, we analyzed the performance of the algorithm and put forward an optimization method.

On the whole, this paper effectively brings scientific data into unified management of scientific and technological resources, and thereby implements distributed storage and retrieval of the resources, which is valuable for resource management. It realizes a service platform for scientific and technical intelligence based on ElasticSearch. Experiments showed that the method is feasible.
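
The abstract does not give the details of the duplicate-removal or sorting algorithms, so the following is only an illustrative sketch of the general idea of removing overlapping results by title and keyword features and then ranking what remains. The `Hit` record, its field names, the Jaccard measure, and the 0.8 threshold are all assumptions made for this example and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieval result. All field names are hypothetical."""
    doc_id: str
    title: str
    keywords: frozenset
    score: float


def normalize(title):
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedupe_and_rank(hits, threshold=0.8):
    """Return hits in descending score order with near-duplicates dropped.

    A hit counts as a duplicate of an already kept hit when their
    normalized titles match or their keyword sets overlap by at least
    `threshold` (an assumed cut-off, not a value from the thesis).
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        is_duplicate = any(
            normalize(hit.title) == normalize(k.title)
            or jaccard(hit.keywords, k.keywords) >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(hit)
    return kept
```

Because the hits are visited from highest to lowest score, the copy that survives from each group of duplicates is the best-scoring one, and the returned list is already in display order.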
Keywords/Search Tags: Scientific and Technological Resources, Massive, Heterogeneous, Storage and Retrieval, ElasticSearch