Research and Implementation of a Biological Information Mining Model Based on a Document Set

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: N N Jiang
Full Text: PDF
GTID: 2180330485953316
Subject: Computer application technology
Abstract/Summary:
With science and technology changing by the day, the Internet information era has arrived. A wide variety of information resources are aggregated with the Internet as their carrier, and these resources have given rise to information databases of broad scope and large capacity. How to extract the information of interest accurately and quickly has become a new research topic. Researchers in many fields use concise, standardized scientific language to describe the methods and results of their experimental studies accurately, store them as text, and share them over the Internet, so the shared scientific literature contains a great deal of expert knowledge that can be reused. Scientific literature is a highly condensed expression of research practice, and it closely reflects how experts in each field present their knowledge. In bioinformatics, a number of experts have proposed knowledge discovery from the literature: mining titles, abstracts, keywords, and other technical terms to uncover little-known information hidden in the biological literature. A large number of important experimental results are recorded in the bioinformatics literature. For example, quantitative trait locus (QTL) location information is usually presented in tables. A QTL is the position in the genome of genes controlling a quantitative trait, and it is an important basis for crop breeding research. However, with the rapid development of bioinformatics technology, the volume of biological literature is growing ever faster, and manually extracting the required information can no longer keep pace with this growth. Text mining technology can automatically discover knowledge from text, so information extraction methods have come into widespread use.
However, most work that uses text mining to extract knowledge from the literature focuses on unstructured text. In this paper, we propose a method for extracting structured information from the literature, converting its complex and diverse forms into structured data, and then combining it with information extracted from the text to build a database. We implement this text mining method in a Soybean QTL data mining system that automatically mines and analyzes QTL, gene function, and other information from the relevant literature. Using the soybean literature as the base data, we give a comprehensive description of the current state and development of information extraction in China and abroad. On the basis of text preprocessing, syntactic analysis, and training samples, we derive extraction rules and propose a distinctive text information extraction approach, and we build a soybean QTL database to provide data resources for research on soybean breeding and molecular bioinformatics. This method also greatly reduces the manual effort of building the database. In constructing the Soybean QTL database, we extracted 245 records from 3638 literature records, with an accuracy of 94.3%, a recall of 80.5%, and an F value of 0.87.
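The reported F value can be checked against the other two figures. The sketch below assumes that the reported accuracy plays the role of precision and that the F value is the balanced F-measure (beta = 1); neither assumption is stated in the abstract, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score; this choice is an assumption,
    since the abstract does not say which F variant was used.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both inputs are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Figures from the abstract, treating "accuracy" as precision (assumption).
score = f_measure(precision=0.943, recall=0.805)
print(f"F = {score:.2f}")
```

Under these assumptions the result rounds to 0.87, which agrees with the value given in the abstract.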
Keywords/Search Tags:Text mining, Information extraction, Stanford Parser, Text preprocessing, Dependencies