
Research and Implementation of a Distributed Full-Text Retrieval System Based on Golang

Posted on: 2016-02-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Tang
GTID: 2308330479993937 | Subject: Computer application technology
Abstract/Summary:
With the development of the urban rail industry, the volume of text data generated by its systems grows dramatically every day, and extracting the information staff need from this large body of data has become a major challenge. Full-text search retrieves information from text data: users query in natural language against the content of the information rather than its external features, which supports using information resources from multiple angles and perspectives. Because enterprise environments are closed, good search engines on the open Internet, such as Google, Baidu, and Bing, cannot be applied directly to an enterprise production environment. A full-text retrieval system designed for the urban rail industry is therefore needed. Against the background of the urban rail network, this thesis uses the Go (Golang) programming language, nginx, Flask, and Wukong to implement a distributed full-text retrieval system for the urban rail system, focusing on key problems in distributed text retrieval: string hashing, Chinese word segmentation, and term weighting.
This thesis consists of the following parts:
(1) It analyzes and compares commonly used string hashing algorithms and adopts MurmurHash to implement a distributed search model, removing the performance bottleneck of a single-node retrieval model.
(2) It compares several common types of Chinese word segmentation algorithms, focusing on those based on statistical models: the hidden Markov model (HMM) and the conditional random field (CRF) model. It improves HMM-based word segmentation and combines the HMM with the CRF to implement an improved Chinese word segmenter. Experimental results show that the algorithm performs well and segments effectively.
(3) Building on these studies, it designs and implements a distributed full-text retrieval system, using nginx as the proxy server, Flask as the web framework, and Wukong as the search engine. Experimental results show that the system performs well, achieves high retrieval recall and precision, and returns the results users want accurately and promptly.
Keywords/Search Tags: full-text retrieval, string hash algorithms, Chinese word segmentation, Golang, distributed system