Research On Information Extraction And Full Text Retrieval Of Crop Diseases Articles

Posted on:2014-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:M X Zhu

Full Text:PDF

GTID:2248330395992363

Subject:Computer application technology

Abstract/Summary:

With the rapid growth of information on the Internet, people can access vast amounts of information without leaving home. However, finding the information we need in this huge pool quickly and accurately has become an urgent problem, and solving it calls for a flexible, fast, and accurate information extraction technology.

Crop diseases and pests are among the main agricultural disasters in China. They involve many species, have a strong impact, and break out frequently. The area and severity of their occurrence strongly affect the national economy and, in particular, cause heavy losses in agricultural production. Establishing a database of crop diseases and pests is important for guiding their prevention and control. At present, there are many Chinese webpages about crop diseases and pests on the Internet, and they are very useful for prevention and control. However, these webpages are often dispersed across different sites, and their structure and content format differ from one another. Using information extraction technology, we extract crop disease and pest information from different websites and manage it with full-text retrieval technology.

To extract crop disease and pest information, we first need to filter out webpage noise so that it does not interfere with the extraction algorithm. We propose a noise removal method based on the position of the content blocks produced by webpage partitioning and the number of words in each block. Then, after studying ontology technology and analyzing the crop diseases and pests domain, we construct a crop diseases and pests ontology using a stratified, event-based construction mode. By combining the ontology with information extraction, we present information from different sources in a unified view.
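
The noise removal step described above filters content blocks by their position and word count. The abstract does not give the actual rule or thresholds, so the following is only a minimal sketch under stated assumptions: the `Block` type, the relative `position` value, the thresholds `min_words`, `top`, and `bottom`, and the keep/drop rule itself are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical representation of one content block after page partitioning.
    position: float  # assumed: relative vertical position, 0.0 = top, 1.0 = bottom
    text: str


def word_count(text: str) -> int:
    # Rough proxy for Chinese text: count non-whitespace characters.
    return sum(1 for ch in text if not ch.isspace())


def remove_noise(blocks, min_words=20, top=0.1, bottom=0.9):
    """Keep blocks that are long enough and lie in the central page region.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    kept = []
    for block in blocks:
        in_margin = block.position < top or block.position > bottom
        too_short = word_count(block.text) < min_words
        if in_margin or too_short:
            continue  # treated as navigation, advertising, or footer noise
        kept.append(block)
    return kept
```

Under these assumptions, a short navigation bar at the top of the page and a copyright line at the bottom would both be dropped, while a long block in the middle of the page would be kept as candidate content.
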
We present a double-layer cascading text classification algorithm that partitions the effective information and judges the category of each block according to the optimal classes of all text blocks, thus completing event extraction.

To manage and query the long text fields of crop disease and pest information, we build a full-text index of this information based on Lucene. Because Lucene's support for Chinese retrieval is not good enough, we add our own Chinese word segmentation tool on top of Lucene to achieve efficient Chinese word segmentation.

Based on the above research, we have implemented a crop diseases and pests information extraction and full-text retrieval system. The system can extract and retrieve crop disease and pest information effectively.
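
To illustrate why word segmentation matters for a full-text index over Chinese text, here is a small sketch of an inverted index with a pluggable tokenizer. It is written in plain Python and does not use Lucene or the thesis's segmentation tool, neither of which is described in enough detail here. The character-bigram tokenizer is a stand-in assumption, and the names `bigram_tokenize` and `InvertedIndex` are hypothetical.

```python
from collections import defaultdict


def bigram_tokenize(text):
    # Stand-in segmenter (assumption): overlapping character bigrams.
    # Punctuation is not filtered, so it would end up inside bigrams.
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return chars
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self, tokenize=bigram_tokenize):
        self.tokenize = tokenize
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Because the tokenizer is passed in as a parameter, a dictionary-based or statistical segmenter could replace the bigram stand-in without changing the index, which mirrors the idea of plugging a custom Chinese segmenter into an existing indexing library.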

Keywords/Search Tags:

information extraction, webpage segmentation, domain ontology, Lucene, inverted index

Related items

1	Distributed Retrieval System With Webpage Ranking Improvement Based On Lucene
2	Based On Research And Optimization Lucene Inverted Index Performance
3	Research And Application Of Sorting Algorithm Based On Lucene
4	Research And Application Of Full Text Retrieval Technology Based On Lucene
5	Military Retrieval System Design And Implementation
6	A Study On Compression Algorithm Performance Based Inverted Index
7	Website Crawler And Retrieval System Based On Lucene
8	Design And Implementation Of Chinese Webpage Automatic Collection And Classification
9	Parallel Search On Ciphertext Based On Index In Cloud Computing
10	Research And Design Products Lucene Search System Based On Parity