Applied Research Of Chinese Word Segmentation In Agriculture Search

Posted on: 2016-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L J Zhou
GTID: 2308330482975270
Subject: Agricultural informatization
Abstract/Summary:
Accelerating agricultural informatization and the development of intelligent systems is an irreversible trend: it gives researchers and professionals in agriculture accurate, timely access to relevant information and a basis for decision-making. Chinese word segmentation is an indispensable component of agricultural vertical search, agricultural expert systems, agricultural knowledge push, agricultural information retrieval, agricultural data mining, and related applications. A review of existing Chinese word segmentation methods shows that segmentation accuracy depends mainly on the segmentation method and the segmentation dictionary. This thesis therefore proposes an N-shortest-paths method based on a particle swarm segmentation model and applies it to agricultural search. The main research results are as follows:
(1) The segmentation method builds on an N-gram model. Because the method is dictionary-based, the thesis constructs all candidate segmentation paths for a sentence and then uses a search algorithm to select the least-cost path as the final segmentation result.
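The path construction in result (1) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a unigram cost (negative log relative frequency) as the simplest N-gram case, uses a hypothetical toy dictionary (`TOY_DICT`) and penalty (`UNKNOWN_COST`), and finds the least-cost path with plain dynamic programming rather than the particle swarm search the thesis proposes.

```python
import math

# Hypothetical toy dictionary: word -> frequency (illustrative, not from the thesis).
TOY_DICT = {"水稻": 50, "病虫害": 30, "病虫": 5, "虫害": 8,
            "防治": 40, "害": 2, "水": 10, "稻": 3}
TOTAL = sum(TOY_DICT.values())
UNKNOWN_COST = 20.0   # assumed penalty for a character missing from the dictionary
MAX_WORD_LEN = 4      # assumed upper bound on dictionary word length


def word_cost(word):
    """Unigram cost: -log of relative frequency, or a fixed penalty if unknown."""
    freq = TOY_DICT.get(word)
    return -math.log(freq / TOTAL) if freq else UNKNOWN_COST


def build_graph(sentence):
    """edges[i] lists (j, word) for every candidate word sentence[i:j]."""
    n = len(sentence)
    edges = []
    for i in range(n):
        out = [(i + 1, sentence[i])]  # a single character is always a candidate
        for j in range(i + 2, min(n, i + MAX_WORD_LEN) + 1):
            if sentence[i:j] in TOY_DICT:
                out.append((j, sentence[i:j]))
        edges.append(out)
    return edges


def segment(sentence):
    """Return the least-cost segmentation over all candidate paths."""
    n = len(sentence)
    edges = build_graph(sentence)
    best = [math.inf] * (n + 1)   # best[j]: lowest cost to reach position j
    back = [None] * (n + 1)       # back[j]: (start position, word) on that path
    best[0] = 0.0
    for i in range(n):
        for j, word in edges[i]:
            cost = best[i] + word_cost(word)
            if cost < best[j]:
                best[j] = cost
                back[j] = (i, word)
    words, pos = [], n
    while pos > 0:                # walk the back-pointers from the end
        pos, word = back[pos]
        words.append(word)
    return words[::-1]


print(segment("水稻病虫害防治"))  # ['水稻', '病虫害', '防治']
```

Every position has at least a single-character edge, so a path from start to end always exists; the dictionary only decides which longer edges are available and how much each costs.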
The search algorithm used in this thesis is an improved particle swarm algorithm with two improvements. First, to address low convergence precision and the tendency to converge to a local optimum, the thesis introduces an inertia weight that changes dynamically with the number of iterations and the distance between particles, with a ratio that controls how much each of the two influences the weight. To increase population diversity, it also introduces a hybrid mutation operator. The result is a dynamic particle swarm optimization algorithm based on hybrid mutation; tests show that these changes effectively improve the algorithm's efficiency. Second, exploiting the different roles that the optimal particle and the other particles play in the population, the thesis proposes an adaptive mutation particle swarm optimization (PSO) algorithm. In this algorithm, the optimal particle adaptively adjusts the size of its search neighborhood according to the degree of population evolution, which enhances the population's local search ability. Non-optimal particles have their positions randomly reinitialized with a small probability, and when a particle's velocity is zero, the velocity is adaptively changed; both measures increase population diversity and global search ability.
(2) The improved particle swarm algorithm is used to find the shortest path in word segmentation. The thesis thereby creates the N-Shortest-Paths Method based on a particle swarm segmentation model and uses it for word segmentation.
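The abstract does not give the formulas for these improvements, so the following is only a rough sketch of the general idea on a continuous test function. All specifics are assumptions: the inertia weight blends a linearly decreasing iteration term with a term based on each particle's distance to the global best, `ratio` sets the balance between the two, and mutation is reduced to random reinitialization of non-best particles with probability `p_mut`. The adaptive neighborhood search of the optimal particle and the zero-velocity handling are omitted.

```python
import random


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(f, dim=2, n_particles=20, iters=200, lo=-5.0, hi=5.0,
        w_max=0.9, w_min=0.4, ratio=0.7, c1=1.5, c2=1.5,
        p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])  # index of best particle
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    diag = (dim ** 0.5) * (hi - lo)  # longest possible distance inside the box

    for t in range(iters):
        # Iteration term: decreases linearly from w_max to w_min.
        w_iter = w_max - (w_max - w_min) * t / max(1, iters - 1)
        for i in range(n_particles):
            # Distance term: particles far from the global best keep more inertia.
            dist = sum((a - b) ** 2 for a, b in zip(pos[i], gbest)) ** 0.5
            w_dist = w_min + (w_max - w_min) * min(1.0, dist / diag)
            w = ratio * w_iter + (1.0 - ratio) * w_dist

            if i != g and rng.random() < p_mut:
                # Assumed mutation: reinitialize a non-best particle at random.
                pos[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                vel[i] = [0.0] * dim
            else:
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                    # Clamp the new position to the search box.
                    pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val, g = pos[i][:], val, i
    return gbest, gbest_val


best, value = pso(sphere)
print(best, value)
```

Applying such a search to segmentation would additionally require encoding a segmentation path as a particle position, a step the abstract does not describe and this sketch does not attempt.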
The experimental results show that, with the same core segmentation dictionary, the particle swarm N-Shortest-Paths Method achieves a higher sentence-level correct recall rate than the other algorithms compared. Further analysis of the experiments also shows that segmentation accuracy depends to a large extent on the core dictionary.
(3) Using Python and web scraping, a real corpus for the agricultural domain is established. The corpus covers 694 journals, drawn mainly from Chinese online agricultural sources, in the following areas: basic agricultural science, agricultural engineering, agronomy, plant protection, crops, horticulture, forestry, animal husbandry and veterinary medicine, sericulture, apiculture and wildlife protection, and aquaculture and fisheries. The thesis takes the abstracts from the first 2014 issue of each of the 694 journals, 21,269 records in total, as the standard segmentation corpus, and on this basis builds a training corpus and the segmentation dictionary.
(4) Combining the N-Shortest-Paths Method based on a particle swarm segmentation model with web scraping and vertical search technology, Chinese word segmentation is applied to search in the agricultural domain. The thesis analyzes the requirements of a keyword vertical search system, designs the relevant technical modules, and develops a working keyword vertical search tool.
Keywords/Search Tags: Chinese word segmentation model, web crawler, agricultural corpus, agriculture specialized dictionaries, agriculture vertical search