Design And Implementation Of Text Retrieval System Based On Deep Learning

Posted on: 2020-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z L Tang  Full Text: PDF
GTID: 2428330575957050  Subject: Computer technology
Abstract/Summary:
As the volume of Internet data grows, text retrieval systems have been applied in a wide range of products. The same growth in data has greatly advanced the development of neural networks and deep learning. However, existing text retrieval systems seldom use deep learning. This thesis therefore designs and implements a text retrieval system with which users can search a text collection and obtain the texts closest to their search intent. The work focuses on text retrieval technology and deep learning algorithms, and the system is built on a distributed architecture. Three pieces of work have been completed.

First, a distributed crawler based on a Master/Slave architecture is used to crawl data, and the crawled data is cleaned. Samples are constructed from the crawled data and combined with the TREC data set.

Second, to improve deep text matching, two text matching models are introduced: the Siamese semantic network model, which is based on single semantic feature extraction, and the MatchPyramid model, which is based on direct semantic modeling. A new semantic network model is then proposed on the basis of these two: a joint model that combines the features extracted by the Siamese semantic model with the features extracted by the MatchPyramid model. Experiments show that, with MAP as the evaluation metric, the joint model scores more than 8% higher than traditional retrieval methods and 3% higher than existing deep learning algorithms.

Third, a text retrieval system based on a distributed architecture is designed and implemented. To speed up retrieval, the system uses the Hadoop and Spark streaming computing frameworks, which are among the most commonly used distributed systems in industry. The system implements the following major modules:
1. Offline data processing module: data cleaning and distributed index construction.
2. Offline model training module: to speed up online retrieval, the system trains models offline and loads them online.
3. Cache module: to further improve retrieval speed, historical high-frequency queries are cached together with their search results.
4. Search term processing module: text error correction and feature extraction for search terms.
5. Search result display: a web server built on Flask displays the final search results.
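The abstract reports its comparisons in terms of MAP (mean average precision) but does not spell out the computation. The following is a minimal sketch of the standard definition, in which each query's average precision is normalized by its number of relevant documents. The function names and the data layout (document IDs as strings, relevance judgments as sets) are assumptions for illustration and are not taken from the thesis, which may use a different variant.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked result list (hypothetical helper).

    Precision is sampled at every rank where a relevant document appears,
    and the sum is divided by the total number of relevant documents.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    rankings: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
) -> float:
    """Mean of the per-query average precision over all queries in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)
```

For example, a query whose relevant documents appear at ranks 1 and 3 has an average precision of (1/1 + 2/3) / 2, so a model that moves relevant documents toward the top of the list raises MAP even when the set of retrieved documents is unchanged.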
Keywords/Search Tags:text retrieval, deep learning, semantic similarity matching, distributed system, retrieval system