Research on Domain-Oriented Extraction Method of Text Information

Posted on: 2015-11-14 | Degree: Master | Type: Thesis
Country: China | Candidate: F K Zhou | Full Text: PDF
GTID: 2298330467977029 | Subject: Natural language processing
Abstract/Summary:
As computers come into wide use across many domains and the Internet develops rapidly, more and more event information is stored and processed as electronic documents. The Internet is gradually becoming the main carrier of information and the main communication platform, and it is now the largest collection of information of all kinds. With the arrival of the big data era, 80% of information is stored on the network as unstructured data (natural language, images, videos, etc.).

Because Chinese text is unstructured, non-standardized and uncertain, this thesis adopts the technical roadmap of "text description, normalized expression, structured extraction, pattern mining" and focuses on temporal attribute information extraction and on the classification, resolution and extraction methods for the emergency event domain. The work lays a theoretical foundation for the study of event information extraction and provides a viable solution for the construction of national geography-based information services.

First, based on a study of the structured expression of emergencies, several methods are proposed for extracting event attribute information from Chinese text, so that the information can be extracted accurately. For Chinese text classification, an SVM model is applied and achieves good results. For the non-temporal attribute information of emergencies, both a rule-based model and a statistical model are proposed and applied. The two kinds of model produce different results in natural language processing tasks, so combining them can be an effective way to extract event information from Chinese text in specific domains. For text attribute extraction, the thesis finally uses a method that combines an HMM model with a syntax analysis model, and experiments show that the combined method gives better results. Finally, the feasibility of the method is demonstrated by implementing a prototype system.
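
The idea of combining a rule with a statistical model can be illustrated with a small sketch. It is not the thesis's implementation: the label set (O, LOC, EVT, TIME), the hand-set probabilities, the date pattern and the function names `viterbi` and `extract_attributes` are all assumptions made for illustration, and English tokens stand in for segmented Chinese text. An HMM decoded with the Viterbi algorithm labels non-temporal attributes, while a regular-expression rule labels the temporal attribute.

```python
import math
import re

# Hypothetical label set: O = other, LOC = location, EVT = event type.
STATES = ["O", "LOC", "EVT"]

# Toy parameters set by hand for illustration only; a real system would
# estimate them from annotated text.
START = {"O": 0.6, "LOC": 0.2, "EVT": 0.2}
TRANS = {
    "O":   {"O": 0.6, "LOC": 0.2, "EVT": 0.2},
    "LOC": {"O": 0.5, "LOC": 0.4, "EVT": 0.1},
    "EVT": {"O": 0.7, "LOC": 0.1, "EVT": 0.2},
}
EMIT = {
    "O":   {"an": 0.3, "struck": 0.3, "on": 0.2, "in": 0.2},
    "LOC": {"sichuan": 0.6, "province": 0.4},
    "EVT": {"earthquake": 0.7, "flood": 0.3},
}
UNSEEN = 1e-6  # floor probability for words a state never emits

# Assumed rule for the temporal attribute: ISO-style dates only.
DATE_RULE = re.compile(r"\d{4}-\d{2}-\d{2}")


def viterbi(tokens):
    """Return the most probable state sequence for tokens (log space)."""
    if not tokens:
        return []

    def emit(state, word):
        return math.log(EMIT[state].get(word.lower(), UNSEEN))

    score = [{s: math.log(START[s]) + emit(s, tokens[0]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for word in tokens[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + emit(s, word)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)

    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def extract_attributes(text):
    """Label tokens with the HMM, then let the date rule override it."""
    tokens = text.split()
    labels = viterbi(tokens)
    found = {}
    for token, label in zip(tokens, labels):
        if DATE_RULE.fullmatch(token):
            label = "TIME"  # the rule takes precedence for temporal tokens
        if label != "O":
            found.setdefault(label, []).append(token)
    return found


if __name__ == "__main__":
    print(extract_attributes("An earthquake struck Sichuan province on 2008-05-12"))
```

In this sketch the rule handles the attribute with a regular surface form, and the statistical model handles attributes that depend on context. The method described in the abstract additionally uses syntax analysis, which this sketch leaves out.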
Keywords/Search Tags: Domain-Oriented Extraction Method of Text Information, HMM Model, SVM Model, Natural Language Processing, Syntax Analysis