
Information Extraction Based on E-commerce Data and User Behavior

Posted on: 2017-04-12; Degree: Master; Type: Thesis
Country: China; Candidate: J Gan; Full Text: PDF
GTID: 2348330485488114; Subject: Computer software and theory
Abstract/Summary:
As the Internet and e-commerce grow explosively in China, massive amounts of data are being generated and hundreds of millions of users are being attracted by Alibaba and other companies. In other words, the big data age is arriving. Faced with massive data, the core problems are how to use the data effectively, how to extract what users most want, and how to identify the most valuable information. We need to turn data into information more and more quickly, especially in e-commerce data processing. This thesis therefore studies information extraction, with a particular focus on e-commerce.

Existing information extraction technologies include named entity recognition (NER) and relation extraction. NER methods currently fall into three groups: methods based on rules and dictionaries, methods based on statistics, and hybrid methods that combine the two. Rule- and dictionary-based methods achieve high accuracy when their rules are optimized for a specific target, but writing the rules requires a great deal of manual work, and the rules do not transfer easily to other domains. Statistical methods have lower precision and recall and higher algorithmic complexity, but they scale better and leave more room for improvement. For this reason, many researchers work on improving mathematical and statistical models to achieve higher accuracy and recall, with the aim of truly intelligent machine recognition. Classic models for named entity recognition include the hidden Markov model (HMM), the maximum entropy hidden Markov model (ME-HMM), and the conditional random field (CRF).
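To make the HMM baseline concrete, the following is a minimal sketch of Viterbi decoding for a two-tag entity tagger. It is an illustration of the classic technique only, not the thesis's implementation: the tag names (`ENT`, `O`), the hand-picked probabilities, the toy query, and the `unk` floor for unseen words are all assumptions introduced here.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely tag sequence for obs under a first-order HMM."""
    # Work in log space so long sequences do not underflow.
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s].get(obs[0], unk))
        for s in states
    }
    backpointers = []
    for word in obs[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            # Best previous tag for reaching tag s at this position.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[prev]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s].get(word, unk))
            )
            ptr[s] = prev
        scores = new_scores
        backpointers.append(ptr)
    # Follow the backpointers from the best final tag.
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy parameters (assumed for illustration, not estimated from real data).
states = ("ENT", "O")
start_p = {"ENT": 0.5, "O": 0.5}
trans_p = {"ENT": {"ENT": 0.3, "O": 0.7}, "O": {"ENT": 0.2, "O": 0.8}}
emit_p = {
    "ENT": {"nike": 0.6, "adidas": 0.4},
    "O": {"buy": 0.3, "running": 0.3, "shoes": 0.4},
}

tags = viterbi(["buy", "nike", "running", "shoes"], states, start_p, trans_p, emit_p)
print(tags)
```

In practice the start, transition, and emission probabilities would be estimated from a labeled corpus rather than written by hand.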
Relation extraction extracts the relationships between entities from a massive corpus, such as relationships involving geographical names, institutional affiliations of names, similarity between item names, and the various synonym relationships between short and full names. It is also important to implement a working information extraction system, since that is the way to evaluate the algorithms. OPENIE is regarded here as the best system for extracting English information; the goal of this work is to build a system that extracts Chinese information.

The main contributions of this thesis are:
1) It introduces classic information extraction models such as the hidden Markov model (HMM), the maximum entropy hidden Markov model (ME-HMM), the conditional random field (CRF), and the word embedding model. These models and algorithms serve as the baselines in the experiments.
2) Building on the classic HMM for named entity recognition on e-commerce data, it proposes a new model, the Lexical Hidden Markov Model (Lexical-HMM), which uses keywords to enhance the model so that it achieves higher named entity recognition accuracy in e-commerce application scenarios. It also proposes a method for extracting similarity relationships between entities based on users' search and browse behavior; the novelty lies in adding the bipartite relations between user searches and clicks to the regular text data. The proposed algorithms are tested and analyzed experimentally. Compared with the classic models, the proposed models achieve better accuracy, recall, and F-values. The experimental results show that the proposed models and techniques perform better on real data sets, especially data from multiple e-commerce websites.
3) It designs and implements an information extraction system based on the algorithms in this thesis, and presents the design process and program results.

This shows that the system is a better way to mine information from big data.
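The idea of deriving similarity relationships from the search-click bipartite relations can be sketched as follows. The abstract does not state which similarity measure the thesis uses, so this sketch assumes Jaccard similarity over the sets of queries that led to clicks on each item; the function name `item_similarity`, the log format, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_similarity(click_log):
    """Score item pairs by the overlap of the queries that led to clicks on them.

    click_log is an iterable of (query, clicked_item) pairs, i.e. the edges
    of a query-item bipartite graph. Returns {(item_a, item_b): jaccard}.
    """
    queries_by_item = defaultdict(set)
    for query, item in click_log:
        queries_by_item[item].add(query)

    sims = {}
    for a, b in combinations(sorted(queries_by_item), 2):
        qa, qb = queries_by_item[a], queries_by_item[b]
        shared = len(qa & qb)
        if shared:  # Keep only pairs that share at least one query.
            sims[(a, b)] = shared / len(qa | qb)
    return sims


# Hypothetical search-click log (assumed data, for illustration only).
click_log = [
    ("iphone case", "item_A"),
    ("iphone case", "item_B"),
    ("phone cover", "item_A"),
    ("phone cover", "item_B"),
    ("usb cable", "item_C"),
    ("iphone case", "item_C"),
]

sims = item_similarity(click_log)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Items reached from the same queries score high even when their titles share no words, which is the kind of signal that plain text data alone would not provide.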
Keywords/Search Tags:named entity recognition, relation extraction, hidden Markov model, user behavior, Thesaurus relations