Semi-automatic Construction Of Classification Trees Oriented To Commodity Field

Posted on:2018-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:D Pan

Full Text:PDF

GTID:2348330512487256

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of e-commerce, the T2O (TV to Online) business model, in which e-commerce and media companies cooperate, has been widely discussed. In this model, users receive real-time recommendations for products that appear in videos. However, current recommendations are produced manually and show only links to the goods, without the corresponding attributes and attribute values. Applying a product tree to the T2O model helps locate the goods in videos and display their attributes, which is friendlier to users. Commodity classification trees are usually constructed manually, which requires a great deal of human effort, and existing commodity classifications do not include the specific attributes of the goods. Semi-automatically constructing a product classification tree that contains category attributes is therefore of great value for business applications.

The key steps in semi-automatically constructing a product classification tree are knowledge fusion and commodity category clustering. Knowledge fusion merges different expressions of the same concept across heterogeneous databases; in category tree construction it is mainly used to fuse categories. Category clustering groups identical categories together, which saves time and labor when constructing the category tree; its core task is computing the similarity between categories. This paper focuses on knowledge fusion and commodity category clustering. The specific contributions are as follows:

(1) An attribute matching approach based on Word2Vec and structure information is proposed for computing the similarity between attributes so that they can be fused. Traditional knowledge fusion methods, based on a knowledge base or semantic dictionary, compute similarity directly between the attributes themselves. However, different attributes of the same kind of goods usually have high similarity to one another, so the traditional methods are not effective. We observe that the values of the same attribute in different databases are usually similar. A method that fuses attributes by computing the similarity between their values is therefore proposed to improve fusion accuracy.

(2) A semantic gain preprocessing method, using semantic extension and semantic complement, is proposed to enhance the semantic context. Word2Vec can be used to compute the similarity between attribute values, but commodity data is semi-structured text with weak semantics, which leads to poor Word2Vec training results. To solve this problem, a data preprocessing method that strengthens the semantic relations in commodity data is proposed: the values of the same attribute are merged, and the values of binary ("two-valued") attributes are substituted. The data is thereby enhanced at the semantic level.

(3) A clustering algorithm based on hybrid similarity computation is proposed for commodity category clustering. Traditional clustering algorithms based on edit distance consider only the literal features of the category name and cannot capture the semantic features or attribute characteristics of a category. In this paper, four kinds of similarity calculation are proposed, drawing on word-level features, semantic features, attribute features, keyword features, and related information, which improves clustering accuracy.

(4) An evaluation method based on edit distance is proposed. The accuracy of fusion and clustering results normally has to be judged manually, and judging the results of every experiment costs a great deal of manpower and time. An evaluation index based on edit distance, called class edit distance, is therefore proposed. Class edit distance measures the cost of transforming a result into the correct result through move, insert, and delete operations. This evaluation saves a great deal of manpower and time.

Two sets of experiments are performed to verify the proposed algorithms, and the experimental results show the effectiveness of our methods. In addition, a commodity classification tree management system (CCTM) is designed and implemented on the basis of the two algorithms proposed in this paper.
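
The idea behind contribution (1), matching attributes by comparing their values rather than their names, can be illustrated with a minimal sketch. This is not the thesis's algorithm: Jaccard overlap of value sets stands in for the Word2Vec-based value similarity, structure information is not modeled, and the function names, the 0.5 threshold, and the toy data are all hypothetical.

```python
# Illustrative sketch only. Jaccard overlap is a stand-in for the
# Word2Vec-based value similarity described in the abstract; the names,
# threshold, and toy data below are assumptions, not from the thesis.

def jaccard(values_a, values_b):
    """Overlap of two value collections, in the range [0, 1]."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_attributes(source, target, threshold=0.5):
    """Map each source attribute to the target attribute whose values
    overlap most, keeping only matches at or above the threshold.

    source, target: dict mapping attribute name -> list of observed values.
    """
    matches = {}
    for name_a, values_a in source.items():
        best_name, best_score = None, 0.0
        for name_b, values_b in target.items():
            score = jaccard(values_a, values_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matches[name_a] = best_name
    return matches


# Toy data: two hypothetical databases naming the same attributes differently.
shop_a = {"colour": ["red", "blue", "black"], "size": ["S", "M", "L"]}
shop_b = {"color": ["red", "black", "white"], "dimensions": ["S", "M", "L", "XL"]}

print(match_attributes(shop_a, shop_b))
```

In this toy case the attribute names "size" and "dimensions" share little at the string level, yet their value sets overlap strongly, which is the observation the value-based approach relies on. A real system would replace `jaccard` with a learned similarity so that near-synonymous values also count as matches.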

Keywords/Search Tags:

semi-automatic classification tree construction, knowledge fusion, category clustering, Word2Vec, mixed similarity


Related items

1	Research Of Clustering Algorithms For Mixed Data Based On Attribute Weighting And Similarity Measuring
2	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
3	The Research Of Personalized Recommendation Methods Based On Category Similarity And Classification
4	Fuzzy Clustering Applied Research, To Automatically Determine The Field Of Expert Knowledge
5	Research On Clustering Algorithm For Mixed Datasets
6	Extension Of Twin Support Vector Machine In Clustering And Fuzzy Semi-supervised Classification
7	Reasoning Recognition Of Multidimensional Attributes Text Based On Decision Tree
8	Large Scale Classification Algorithms Based On Clustering Feature Trees
9	Research On Clustering Methods And Their Applications
10	Tree Decomposition For Large Scale Semi-supervised Classification