Discovery And Classification Of Information Sources In High-end Equipment Manufacturing Industry

Posted on: 2019-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
GTID: 2348330542458085
Subject: Computer technology
Abstract/Summary:
The Internet has become an important source of information. However, the rapid growth of online information has also created the problem of information overload. As a way to relieve information overload and meet the information needs of specific domains, topic-oriented information integration technology has received wide attention and become a research focus. Information integration technology gathers topic-related information distributed across different information sources and provides information services on top of it. Effective and accurate discovery and classification of information sources is a key technology for web-based information integration and information services, which makes this research particularly important.

This research on the discovery and classification of information sources in the high-end equipment manufacturing industry (HEMI) addresses the demand for HEMI information services and supports the integration of informatization and industrialization. Because identifying HEMI information sources manually involves a heavy workload, is inefficient, and can hardly meet real-time requirements, an automatic discovery method is needed. After surveying existing related technologies and considering the characteristics of HEMI websites, we propose a method based on distributed representation for automatically discovering HEMI information sources and their related topics. First, two different distributed representation methods are used to expand the query keywords. Second, the relevance of each web page to the query keywords is calculated from their distributed representations. Then, whether a website column is a HEMI information source is determined by the proportion of relevant pages in that column. In addition, we propose a web page relevance computation method based on Bi-LSTM and distributed representation. Finally, CNN and RCNN models are used to classify the text of the pages under the relevant columns identified in the previous step.

Experimental results show that: (1) for web page relevance computation, the Bi-LSTM and distributed-representation-based method outperforms methods based on the vector space model or the LDA model and effectively improves the accuracy of relevance computation; (2) on the information source classification task, RCNN achieves better results than CNN, with an F1 score above 90%.
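The three discovery steps described above (keyword expansion, page relevance scoring, and the column-level decision) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the word vectors are a tiny hand-made dictionary standing in for trained embeddings, a single representation replaces the two used in the thesis, a page is represented by the average of its word vectors, and all function names and the parameters `top_k`, `min_sim`, `page_threshold`, and `column_ratio` are hypothetical. The Bi-LSTM relevance model and the CNN/RCNN classifiers are not shown.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy embeddings standing in for trained word vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "machining": [1.0, 0.0],
    "cnc": [0.95, 0.1],
    "lathe": [0.9, 0.2],
    "football": [0.0, 1.0],
    "match": [0.1, 0.95],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(keywords: List[str], emb: Dict[str, List[float]],
                 top_k: int = 2, min_sim: float = 0.8) -> List[str]:
    """Step 1 (assumed form): add each keyword's nearest neighbours."""
    expanded = list(keywords)
    for kw in keywords:
        if kw not in emb:
            continue  # out-of-vocabulary keyword: nothing to expand with
        # Rank all words not already in the query by similarity to kw.
        neighbours = sorted(
            ((cosine(emb[kw], vec), word) for word, vec in emb.items()
             if word not in expanded),
            reverse=True,
        )
        for sim, word in neighbours[:top_k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


def mean_vector(tokens: List[str],
                emb: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens (None if none are known)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def page_relevance(tokens: List[str], query_terms: List[str],
                   emb: Dict[str, List[float]]) -> float:
    """Step 2 (assumed form): cosine of averaged page and query vectors."""
    page_vec = mean_vector(tokens, emb)
    query_vec = mean_vector(query_terms, emb)
    if page_vec is None or query_vec is None:
        return 0.0
    return cosine(page_vec, query_vec)


def is_information_source(column_pages: List[List[str]],
                          query_terms: List[str],
                          emb: Dict[str, List[float]],
                          page_threshold: float = 0.7,
                          column_ratio: float = 0.5) -> bool:
    """Step 3: a column qualifies if a large enough share of pages is relevant."""
    if not column_pages:
        return False
    relevant = sum(1 for tokens in column_pages
                   if page_relevance(tokens, query_terms, emb) >= page_threshold)
    return relevant / len(column_pages) >= column_ratio


if __name__ == "__main__":
    query = expand_query(["machining"], TOY_EMBEDDINGS)
    print(query)  # ['machining', 'cnc', 'lathe']
    equipment_column = [["cnc", "lathe"], ["machining", "match"]]
    sports_column = [["football", "match"], ["football"]]
    print(is_information_source(equipment_column, query, TOY_EMBEDDINGS))  # True
    print(is_information_source(sports_column, query, TOY_EMBEDDINGS))  # False
```

Averaging word vectors is the simplest way to obtain a page-level distributed representation; in a fuller version, a learned encoder such as the Bi-LSTM mentioned in the abstract would take the place of `mean_vector` when scoring relevance.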
Keywords/Search Tags: Information Integration, Information Sources, High-end Equipment Manufacturing Industry, Distributed Representation, Relevancy