Research On Keywords Extraction Techniques For E-commerce

Posted on: 2019-09-15   Degree: Master   Type: Thesis
Country: China   Candidate: J K Fan
GTID: 2428330566997295   Subject: Software engineering
Abstract/Summary:
Keyword extraction is one of the most important topics in natural language processing. It helps people find the key points in massive amounts of information, and it plays an important role in today's explosion of online information. Research on keyword extraction for the e-commerce domain aims to extract keywords from product titles. Compared with other keyword extraction problems, one difficulty is that a keyword in an e-commerce title is often a compound of several words (for example, "television remote control").

As an important step in research on keyword extraction for e-commerce, this thesis first builds a large-scale e-commerce corpus. On this corpus, we implement the TextRank algorithm for Chinese keyword extraction, as well as keyword extraction based on an LSTM classification model. The latter lets the model learn the characteristics of keywords automatically, integrating feature learning into model building and avoiding manual feature engineering. In addition, building the neural network hierarchy with LSTM makes better use of the semantic information of words. Experiments show that keyword extraction based on the LSTM classification model performs well in the e-commerce domain.

This thesis also focuses on a keyword extraction algorithm based on an RNN sequence labeling model. The algorithm feeds the word sequence obtained by segmenting the product title into the model, which outputs the probability that each word is a keyword. Because LSTM outperforms a plain RNN, and a bidirectional LSTM makes better use of context, we improve the model and implement keyword extraction based on a Bi-LSTM sequence labeling model. Experiments show that the improved model achieves better keyword extraction results.

However, keyword candidates in a title often consist of multiple words, so an extra post-processing module is required to compute each candidate's score in the title and then rank the candidates. We therefore build a large-grained word segmentation dictionary and experiment with a large-grained segmentation strategy in order to remove this rigid post-processing module. Experiments show that the large-grained segmentation strategy improves keyword extraction.

Finally, we further improve the Bi-LSTM sequence labeling model by introducing an attention mechanism, which combines the fixed-length sentence vector produced by the LSTM with attention information between the sentence representation and the representation of each word in the sentence. Experiments show that the keyword extraction algorithm built on this attention-based model is effective.
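To make the TextRank baseline and the compound-keyword problem more concrete, here is a minimal Python sketch, assuming the title has already been segmented into words (English tokens stand in for a segmented Chinese title). It builds an unweighted co-occurrence graph with a window of 2 and a damping factor of 0.85 (common TextRank settings, not necessarily those used in this thesis), ranks the words, and then collapses runs of adjacent top-ranked words into phrases, as in the post-processing step of the original TextRank method. The function names `textrank_scores`, `merge_adjacent` and `extract_keywords`, and the rule of scoring a phrase by the sum of its word scores, are hypothetical choices for illustration, not the thesis's post-processing module.

```python
from collections import defaultdict

# Hypothetical helpers; phrase score = sum of word scores is an assumption.


def textrank_scores(words, window=2, damping=0.85, iterations=100, tol=1e-6):
    """Score each distinct word with TextRank on an unweighted co-occurrence graph.

    Two distinct words are linked when they occur fewer than `window`
    positions apart (window=2 links only adjacent words).
    """
    nodes = list(dict.fromkeys(words))  # distinct words, first-seen order
    if not nodes:
        return {}
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    score = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        # S(v) = (1 - d) + d * sum over neighbors u of S(u) / deg(u)
        new = {
            v: (1 - damping)
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in nodes
        }
        delta = max(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def merge_adjacent(words, selected):
    """Collapse each maximal run of adjacent selected words into one word list."""
    phrases, run = [], []
    for w in list(words) + [None]:  # the None sentinel flushes the final run
        if w is not None and w in selected:
            run.append(w)
        elif run:
            phrases.append(run)
            run = []
    return phrases


def extract_keywords(words, top_k=5, sep=" ", **kwargs):
    """Return distinct keyword phrases, ranked by the summed scores of their words."""
    score = textrank_scores(words, **kwargs)
    selected = set(sorted(score, key=score.get, reverse=True)[:top_k])
    ranked = {}
    for run in merge_adjacent(words, selected):
        ranked[sep.join(run)] = sum(score[w] for w in run)
    return sorted(ranked, key=ranked.get, reverse=True)


title = ["television", "remote", "control", "universal", "remote", "control", "black"]
print(extract_keywords(title, top_k=2))  # ['remote control']
```

Here "remote" and "control" are ranked separately and only become the compound keyword "remote control" through the merging step. This is the kind of extra post-processing that a large-grained segmentation dictionary is meant to make unnecessary. Summing word scores favours longer phrases; averaging them would be an equally plausible choice. For Chinese titles, `sep=""` joins the words without spaces.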
Keywords/Search Tags: E-commerce, keyword extraction, sequence annotation, Recurrent Neural Network, Bi-directional Long Short-Term Memory