Research On Information Extraction And Full Text Retrieval Of Crop Diseases Articles

Posted on: 2014-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: M X Zhu
GTID: 2248330395992363 | Subject: Computer application technology
Abstract/Summary:
With the rapid growth of online information, people can access vast amounts of information without leaving home. How to find the needed information quickly and accurately in this huge pool has become an urgent problem, and solving it requires flexible, fast, and accurate information extraction technology.
Crop diseases and pests are among the main agricultural disasters in China. They involve many species, have a strong impact, and break out frequently. Their area of occurrence and severity strongly affect the national economy and, in particular, cause heavy losses in agricultural production. Establishing a database of crop diseases and pests is therefore important for guiding their prevention and control. At present there are many Chinese webpages about crop diseases and pests on the Internet, and they are very useful for prevention and control. However, these webpages are scattered across different sites, and their structure and content format differ from one another. Using information extraction technology, we extract crop disease and pest information from different websites and manage it with full-text retrieval technology.
To extract crop disease and pest information, we first need to filter out webpage noise so that it does not interfere with the extraction algorithm. We propose a noise removal method based on the position of each content block and the number of words it contains, where the content blocks are produced by partitioning the webpage. Then, after studying ontology technology and analyzing crop diseases and pests, we construct a crop disease and pest ontology using a stratified, event-based construction mode. By combining the ontology with information extraction, we present information from different sources in a unified view.
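The noise-removal idea described above (judging content blocks by their position and word count) can be illustrated with a minimal sketch. The abstract does not state the actual rule or thresholds, so the `Block` type, the `filter_noise` function, the margin bands, and the length threshold below are all hypothetical. Character count stands in for word count here, because Chinese text is not whitespace-delimited.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A content block produced by some webpage partitioning step (assumed)."""
    text: str
    position: float  # relative vertical position: 0.0 = top of page, 1.0 = bottom


def filter_noise(
    blocks: List[Block],
    min_chars: int = 30,          # hypothetical threshold, not from the thesis
    top_margin: float = 0.1,      # hypothetical header band
    bottom_margin: float = 0.9,   # hypothetical footer band
) -> List[Block]:
    """Keep blocks that are long enough and lie outside the header/footer bands."""
    kept = []
    for block in blocks:
        length = len(block.text.strip())
        in_margin = block.position < top_margin or block.position > bottom_margin
        if length >= min_chars and not in_margin:
            kept.append(block)
    return kept


if __name__ == "__main__":
    page = [
        Block("Home | About | Login", 0.02),   # navigation bar
        Block("稻瘟病是水稻的重要病害之一。" * 5, 0.50),  # main content
        Block("Copyright 2014", 0.97),         # footer
    ]
    for block in filter_noise(page):
        print(block.position, block.text[:14])
```

Requiring both conditions is only one plausible way to combine the two signals; a weighted score over position and length would be an equally reasonable reading of the abstract.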
We present a double-layer cascading text classification algorithm that partitions the effective information and judges the category of each block according to the optimal classes of all text blocks, thus completing event extraction.
To manage and query the long text fields of crop disease and pest information, we build a full-text index of this information based on Lucene. Because Lucene does not perform well enough on Chinese retrieval, we add our own Chinese word segmentation tool on top of Lucene to achieve efficient Chinese word segmentation.
Based on the above research, we have implemented a crop disease and pest information extraction and full-text retrieval system. The system can extract and retrieve crop disease and pest information effectively.
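To show why word segmentation matters for a Chinese full-text index, here is a toy inverted index in Python. It is not Lucene's API and not the segmentation tool used in the thesis, which the abstract does not describe. Overlapping character bigrams are used as a simple stand-in segmenter, and the names `bigrams`, `build_index`, and `search` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def bigrams(text: str) -> List[str]:
    """Split text into overlapping character bigrams (a stand-in segmenter)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in bigrams(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents that contain every bigram of the query."""
    terms = bigrams(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "水稻稻瘟病防治",
        2: "小麦锈病防治",
        3: "水稻纹枯病",
    }
    index = build_index(docs)
    print(sorted(search(index, "防治")))
    print(sorted(search(index, "水稻")))
```

Without some segmentation step, a run of Chinese text with no spaces would be indexed as a single term and a query such as 防治 would match nothing. A dictionary-based segmenter would produce fewer and more meaningful terms than bigrams, at the cost of maintaining a domain vocabulary.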
Keywords/Search Tags: information extraction, webpage segmentation, domain ontology, Lucene, inverted index