
Application Of Chinese Document Retrieval Based On Deep Learning

Posted on: 2016-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Sun
GTID: 2298330467495558
Subject: Software engineering
Abstract/Summary:
Social progress is often accompanied by improvements in the means of production, and people, who keep satisfying their needs through a variety of social tools, are becoming ever smarter. With the spread of Internet technologies in recent years, people increasingly look online for the information they need, and as those needs grow, information retrieval technology is receiving more and more attention. This is especially true of the past 10 years: with the arrival of the big data era, network databases are constantly being filled with ever more text, images, sound, and other resources, and statistics suggest that a striking share of all the data in human history has been produced within these 10 years. This raises the question of how to extract useful information from such vast amounts of data quickly and efficiently. Traditional information retrieval models clearly struggle to meet these needs, so another approach is required, and machine learning techniques have emerged to fill this role. Because Chinese parallel documents are hard to mine quickly in the context of big data, this thesis applies deep learning, a cutting-edge branch of machine learning, to the fast and accurate extraction of relevant Chinese documents, and analyzes situations that may arise in a practical retrieval system.

First, the thesis analyzes the characteristics of Chinese parallel documents and gives a comprehensive analysis of the characteristics of deep learning. It then examines traditional information retrieval models, including the more widely used ones, and compares their strengths, weaknesses, and efficiency when applied to Chinese parallel document retrieval.
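As a point of reference for the traditional models discussed above, a classic vector space model ranks documents by the cosine similarity of their TF-IDF vectors. This is an illustrative sketch under stated assumptions, not the thesis's baseline: the abstract does not say which traditional models were compared, and the sketch assumes documents are already segmented into words (Chinese text has no spaces between words, so a word segmenter would have to run first). The function names `tfidf_vectors`, `cosine`, and `search`, and the sample documents, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of a classic vector space model (TF-IDF + cosine similarity).
# Assumes each document is already segmented into a list of word tokens.

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: rarer terms get larger weights; every seen term stays positive.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query_tokens, docs, top_k=3):
    """Return indices of up to top_k documents with positive similarity, best first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = sorted(((cosine(query, vec), i) for i, vec in enumerate(vectors)),
                    key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]

# Toy, pre-segmented Chinese documents.
docs = [
    ["深度", "学习", "文档", "检索"],
    ["传统", "信息", "模型", "检索"],
    ["图像", "识别"],
]
print(search(["深度", "学习"], docs))  # [0]
```

A model of this kind matches only on shared surface terms: a query with no word in common with a relevant document scores zero, which is one motivation for the learned document representations the thesis turns to.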
Experiments on the relevant retrieval metrics then verify the deep neural network approach, showing that it retrieves large numbers of Chinese parallel documents containing hidden information more accurately and comprehensively.

Second, the thesis proposes a new model that combines deep learning with a traditional information retrieval model, addressing the long training time of deep learning models. The whole neural network is then optimized by tuning its parameters, the number of hidden layers, and the number of nodes in each layer.

Third, it proposes using Doc2Vec, a recent deep learning tool from Google, to convert each document into a vector and then training the deep neural network on these vectors. The results show that representing documents as vectors in this way better captures, in some respects, the meaning a document is intended to convey, so that Chinese parallel documents can be retrieved more accurately and comprehensively.

Finally, the thesis designs and implements a deep learning based Chinese document retrieval system. Given search terms from the user, the system retrieves the Chinese parallel documents the user needs more quickly and comprehensively.
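To make the Doc2Vec-based retrieval step more concrete, the following sketch ranks documents by cosine similarity between a query vector and precomputed document vectors. It assumes the vectors already exist; in practice they might come from a trained Doc2Vec model (for example gensim's `Doc2Vec`, whose `infer_vector` method embeds unseen text), but the abstract does not say how the thesis computes query vectors or how its deep neural network uses the ranking. The toy 3-dimensional vectors and the names `rank_documents` and `min_score` are hypothetical.

```python
import math

# Hypothetical sketch: ranks documents by cosine similarity to a query vector.
# Assumes document vectors were computed beforehand (e.g. by a trained Doc2Vec model).

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_documents(query_vec, doc_vecs, top_k=5, min_score=0.0):
    """Return up to top_k (doc_id, score) pairs with score > min_score, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored = [pair for pair in scored if pair[1] > min_score]
    # Sort by descending score; break ties by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

# Toy 3-dimensional vectors standing in for real Doc2Vec output.
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]
for doc_id, score in rank_documents(query_vec, doc_vecs):
    print(doc_id, round(score, 3))
```

This prints the documents in the order `doc_a`, `doc_c`, `doc_b`. Cosine similarity is used because document vectors of this kind are commonly compared by direction rather than magnitude; that is a common convention, not something the abstract specifies.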
Keywords/Search Tags: Information retrieval, Deep learning, Neural network, Chinese parallel documents, Doc2Vec