
Research on Oil and Gas Resources Network Information Collection and Analysis Methods

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: B H Li
GTID: 2370330542996547
Subject: Cartography and Geographic Information System
Abstract/Summary:
With the massive growth of oil and gas resource information on the Internet, accurately extracting the information that meets users' needs from this large and complex body of data has become particularly important. The development of the oil and gas industry requires collecting and extracting information from the major oil portals on the Internet, storing and analyzing it, and presenting it to users in a personalized, customized way according to their needs and interests. When collecting oil and gas resource information from the web, users require the collected information to be both deeply focused and complete. This thesis studies three aspects:

1) By combining jsoup and Lucene with research on the extensibility of Heritrix, this thesis proposes a strategy and analysis method for collecting industry dynamic information. The strategy and method achieve three objectives: timeliness, accuracy of content extraction, and completeness.

2) To meet users' needs and focus on the information they are interested in, this thesis uses a comprehensive professional thesaurus of oil and gas resource information to improve the focusing strategy applied to users' information searches.

3) Building the professional vocabulary of oil and gas resource information requires identifying new words in industry documents. This thesis develops a method for extracting new words from Chinese text based on an improved PrefixSpan algorithm, applying the sequential pattern mining algorithm PrefixSpan to this task. The sequence patterns mined by PrefixSpan are not necessarily contiguous, and the mined patterns can contain one another (inclusion relations). The proposed method improves the algorithm to address these issues and combines semantic features with statistics to extract new words from a Chinese corpus effectively.

These results were applied in the Dynamic Information Collecting System of Oil and Gas Resources, a project of the Ministry of Land and Resources. The application results show that:

1) The improved method recognizes new words in professional-domain Chinese text with high accuracy.

2) The proposed industry dynamic information collection strategy and analysis method better meets users' requirements for personalized, customized collection of industry dynamic information. It not only ensures that the information of interest published on the monitored websites within a specified time interval is fully captured, but also delivers information according to each user's needs and interests, improving the efficiency of the industry dynamic information collection system. The strategy and method are general and can be used to build dynamic network information collection systems for other industries.
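The abstract names two weaknesses of plain PrefixSpan for new-word extraction (non-contiguous patterns and patterns included in longer ones) but does not give the exact improvement. The following is a minimal sketch under stated assumptions, not the thesis's method: prefix growth is restricted so a pattern is extended only by the character immediately after each occurrence (guaranteeing contiguous substrings); a pattern is dropped when a longer pattern containing it has the same support (a closed-pattern style filter for inclusion relations); and left/right boundary entropy is an assumed stand-in for the unspecified statistics. Semantic features are not modelled, and `lexicon` is a hypothetical placeholder for an existing thesaurus. All function names are hypothetical.

```python
import math
from collections import defaultdict

# Sketch only: function names are hypothetical, and boundary entropy is an
# assumed stand-in for the statistics used in the thesis. Semantic features
# are not modelled here.


def mine_contiguous_patterns(corpus, min_support=2, max_len=6):
    """PrefixSpan-style pattern growth restricted to contiguous extensions.

    A projection entry (sentence_index, end) records where one occurrence of
    the current prefix ends. The prefix is extended only by the character
    immediately after `end`, so every mined pattern is a contiguous substring.
    Support is the number of distinct sentences containing the pattern.
    """
    patterns = {}

    def grow(prefix, projection):
        if len(prefix) >= max_len:
            return
        extensions = defaultdict(list)  # next character -> new projection
        for sid, end in projection:
            if end < len(corpus[sid]):
                extensions[corpus[sid][end]].append((sid, end + 1))
        for ch, proj in extensions.items():
            support = len({sid for sid, _ in proj})
            if support >= min_support:
                patterns[prefix + ch] = support
                grow(prefix + ch, proj)

    # The empty prefix "ends" at every position of every sentence.
    grow("", [(sid, pos) for sid, s in enumerate(corpus) for pos in range(len(s))])
    return patterns


def remove_included(patterns, min_len=2):
    """Drop a pattern when a longer pattern containing it has the same support."""
    return {
        p: sup
        for p, sup in patterns.items()
        if len(p) >= min_len
        and not any(p != q and p in q and patterns[q] == sup for q in patterns)
    }


def boundary_entropy(pattern, corpus):
    """min(left, right) entropy of the characters adjacent to the pattern."""
    left, right = defaultdict(int), defaultdict(int)
    for sid, s in enumerate(corpus):
        start = s.find(pattern)
        while start != -1:
            end = start + len(pattern)
            # Each sentence boundary counts as a distinct neighbour.
            left[s[start - 1] if start > 0 else ("BOS", sid, start)] += 1
            right[s[end] if end < len(s) else ("EOS", sid, start)] += 1
            start = s.find(pattern, start + 1)

    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * math.log(c / total) for c in counts.values())

    return min(entropy(left), entropy(right))


def extract_new_words(corpus, min_support=2, max_len=6, min_entropy=0.5,
                      lexicon=frozenset()):
    """Frequent, non-included, freely bounded patterns absent from the lexicon."""
    kept = remove_included(mine_contiguous_patterns(corpus, min_support, max_len))
    words = [p for p in kept
             if p not in lexicon and boundary_entropy(p, corpus) >= min_entropy]
    return sorted(words, key=lambda p: (-kept[p], p))


if __name__ == "__main__":
    corpus = ["页岩气开发加快", "国内页岩气产量增长", "推进页岩气勘探",
              "页岩气资源丰富", "国内油价上涨"]
    print(extract_new_words(corpus))                    # ['页岩气', '国内']
    print(extract_new_words(corpus, lexicon={"国内"}))  # ['页岩气']
```

On this toy corpus "页岩" and "岩气" are mined but discarded because "页岩气" contains them with equal support, while the known word "国内" is removed only by the lexicon check. The thresholds (`min_support`, `max_len`, `min_entropy`) are illustrative and would need tuning on a real industry corpus.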
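The abstract does not specify how the professional thesaurus sharpens the search focusing strategy. One plausible reading is a term-weighting filter, sketched below under stated assumptions: each thesaurus term carries a weight, a collected page is scored by weighted term occurrences per character, and only pages that mention at least one user-selected term and reach a relevance threshold are delivered, best first. The `Page` class, the `relevance` and `focus` functions, the weights, and the threshold are all hypothetical illustrations, not the thesis's design.

```python
from dataclasses import dataclass

# Sketch only: Page, relevance, focus, the term weights and the threshold are
# hypothetical; the thesis's actual thesaurus and focusing rules are not given.


@dataclass
class Page:
    url: str
    text: str


def relevance(text, thesaurus):
    """Weighted count of thesaurus terms per character of text."""
    if not text:
        return 0.0
    # str.count counts non-overlapping occurrences of each term.
    score = sum(weight * text.count(term) for term, weight in thesaurus.items())
    return score / len(text)


def focus(pages, thesaurus, user_terms, threshold):
    """URLs of pages that mention a user-selected term and whose thesaurus
    relevance reaches the threshold, most relevant first."""
    scored = [(relevance(p.text, thesaurus), p) for p in pages
              if any(term in p.text for term in user_terms)]
    scored = [(s, p) for s, p in scored if s >= threshold]
    scored.sort(key=lambda sp: -sp[0])
    return [p.url for _, p in scored]


if __name__ == "__main__":
    thesaurus = {"页岩气": 2.0, "天然气": 1.0, "勘探": 1.0}
    pages = [Page("a", "页岩气勘探取得进展"), Page("b", "天然气价格上涨"),
             Page("c", "体育新闻")]
    print(focus(pages, thesaurus, {"页岩气", "天然气"}, 0.1))  # ['a', 'b']
    print(focus(pages, thesaurus, {"天然气"}, 0.1))            # ['b']
```

Separating the shared thesaurus (domain relevance) from per-user terms (personal interest) is one way to combine industry-wide focusing with personalized delivery; a production crawler would more likely compute these scores over a Lucene index than over raw strings.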
Keywords/Search Tags: Dynamic information collecting, Web crawler, PrefixSpan, Sequence pattern mining