
Research On Resource Text Classification Algorithm And Construction Of Resource Database

Posted on: 2019-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: G S Liu
Full Text: PDF
GTID: 2428330623468998
Subject: Computer Science and Technology
Abstract/Summary:
Resources is the collective name for all substances, energy, and information that humans can exploit and use. A resource text is a text that describes a resource and its related information, mainly the resource name, properties, parameters, values, functions, and application fields. Classifying resource functions by application area makes resource search more targeted, and a reasonable representation of resource texts helps solve problems encountered in practice. Resource function texts are usually short texts, which are sparse, real-time, and non-standard, and these traits can reduce classification accuracy. For text feature selection, some researchers select high-quality features by weighting features; for text representation, others expand the features of short texts. This paper likewise works on text feature selection and text representation to improve classification. To address the curse of dimensionality in text feature selection, it proposes a two-stage text feature selection algorithm. In the first stage, the mutual information algorithm is extended with a balance parameter, frequency, concentration, part of speech, and position. In the second stage, the resulting feature set initializes the training of a genetic algorithm, which then searches for a better feature set. Because the TFIDF formula does not account for unevenness between classes, this paper improves TFIDF and weights Word2Vec. Finally, the improved algorithms are applied to resource function classification and achieve good precision.

This paper also proposes a method for building a resource database. The data come from Baidu Encyclopedia entries on the chemical industry. Extensible knowledge representation is applied to knowledge representation, the most critical part of resource database construction. Resource names, properties, parameters, and values are represented as a structured four-tuple; resource names, functions, and application areas are represented as a structured compound three-tuple. The paper also completes a visualization of the resource database.
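The abstract names mutual information as the starting point of the first feature selection stage but does not give the improved formula. As a rough illustration only, the sketch below scores terms with plain pointwise mutual information, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts, and keeps the top k terms. It is not the thesis's algorithm: the balance parameter, frequency, concentration, part of speech, and position terms are omitted, and the function names mutual_information_scores and select_top_k are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each term by its maximum pointwise MI over all classes.

    docs:   list of token lists, one per document (assumed pre-tokenized)
    labels: list of class labels, one per document
    Baseline only; the thesis's extra weighting factors are NOT included.
    """
    n_docs = len(docs)
    class_count = Counter(labels)        # number of documents per class
    term_count = Counter()               # number of documents containing a term
    joint_count = defaultdict(Counter)   # joint_count[term][cls] = co-occurrences

    for tokens, label in zip(docs, labels):
        for term in set(tokens):         # count each term once per document
            term_count[term] += 1
            joint_count[term][label] += 1

    scores = {}
    for term, per_class in joint_count.items():
        best = float("-inf")
        for cls, joint in per_class.items():
            # log( P(t, c) / (P(t) * P(c)) ) with probabilities as count ratios
            mi = math.log(joint * n_docs / (term_count[term] * class_count[cls]))
            best = max(best, mi)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On a toy corpus, a term that appears equally often in every class scores 0, while a term confined to one class scores higher. Plain mutual information also tends to favor rare terms, which is one plausible reason for adding a frequency factor of the kind the abstract mentions.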
Keywords/Search Tags:Text Classification, Text Feature Selection, Text Representation, Resource Database, Knowledge Representation