Research And Application Of Product Information Extraction Analysis And Recommendation Based On NLP

Posted on:2019-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:X B Huang

Full Text:PDF

GTID:2428330572469120

Subject:Computer technology

Abstract/Summary:

The role and impact of the Internet in the era of knowledge and information are self-evident. It not only affects our living habits but also improves our quality of life. The continued development of e-commerce websites has made it convenient to browse product information online. However, the defining feature of an e-commerce website is the trading relationship, whose essence is profit making. As a result, the information on many products is supplied by intermediaries, and the websites often give only prices, pictures, introductions and similar details. In addition, deciding which products to display to users, and how to recommend products from a massive catalogue according to user preferences, remain challenges for e-commerce. Continuously growing user demands have left e-commerce websites unable to meet some special needs. For example, an entrepreneur wants to customize a car. To control cost, he wants to know the manufacturers of auto parts and choose his favorite steering wheel and lighting, and the parts need not all come from the same manufacturer. Current e-commerce websites are at a loss for this kind of demand. In this context, it is especially important to recommend products together with their specific manufacturers according to user preference.

This paper mainly studies the extraction of enterprise product information, so that personalized product information can be recommended to users after analysis. The product information comes from the official websites of the relevant manufacturers. With the development of web design, however, an increasing number of web building tools are available to designers. Besides the layout template itself, web pages usually contain many items unrelated to the page's topic, such as navigation bars, ads and pictures, and the official websites of different enterprises use different template layouts. This makes product information very difficult to extract. For personalized product recommendation, the traditional collaborative filtering method easily suffers from inaccurate user similarity calculation and the cold start problem.

To address these issues in product information extraction and personalized recommendation, a conditional random field (CRF) method was combined with the DOM (Document Object Model) tree to acquire product information. First, product names were taken from the annual reports of enterprises as the major categories of the products on the official websites; then specific product information was extracted from the official websites based on these categories. Product names were extracted from the annual reports as follows: product names in the discussion and analysis of operations (Section 4) of an annual report were replaced with the product information displayed in the enterprise profile on www.0033.com, and product names in the business overview (Section 3) were identified with a conditional random field model. To extract product information from the official websites, the product names extracted from the annual reports were used as the keyword group for each website to determine its critical path, after which the product information was extracted.

To solve the problems of user similarity and cold start, a collaborative filtering algorithm based on trust and item preference was used. Its main steps are establishing trust between users, calculating user personal feasibility, calculating user preference similarity, and calculating product attribute preference.

The study takes A-share listed enterprises in the auto parts industry as its subject. The data sources are the annual reports of these enterprises, their official websites, and the website www.0033.com. For product information extraction, product information was extracted from the official websites of the auto parts industry and a product system was established, with the product category as the first level and the product name as the second level. This solves the problem that e-commerce websites cannot provide the relationship between manufacturers and product information. For personalized recommendation, the similarity between users is calculated from their browsing history, products are scored in advance, and the products with the top 4 predicted scores are recommended to the user. This method not only alleviates the inaccurate user similarity caused by the sparse user rating matrix in traditional collaborative filtering, but also addresses the cold start problem for new items and new users. The information extraction, data analysis and personalized recommendation methods used in this paper can fully meet users' demand to purchase products from different enterprises according to their own preferences. This not only solves an urgent problem for A-share listed enterprises in the auto parts industry but, more importantly, provides a complete and workable solution for personalized product recommendation to users with similar needs in other fields, as well as a more effective theoretical basis and practical application for the personalized recommendation of products with enterprise architecture.
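The CRF step identifies product names in the business overview as a sequence labeling task. The sketch below shows only the decoding half of a linear-chain CRF: Viterbi search for the best BIO tag sequence given emission and transition scores. It is a minimal illustration under stated assumptions, not the thesis's model: the tag set, the cue words in the hypothetical `TRIGGERS` and `HEADS` sets, and all weights are hand-set for illustration (a real CRF learns its weights from annotated Chinese annual-report text), and English tokens stand in for segmented Chinese.

```python
from typing import Dict, List

TAGS = ["O", "B-PROD", "I-PROD"]                  # BIO tags for product-name spans
TRIGGERS = {"produces", "manufactures", "and"}    # hypothetical cue words
HEADS = {"pads", "wheels", "lamps"}               # hypothetical product head nouns


def emission(tokens: List[str], i: int, tag: str) -> float:
    """Weighted sum of hand-set (not learned) feature functions for token i."""
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    score = 0.0
    if tag == "O":
        score += 1.0   # bias: most tokens are not product names
    if tag == "B-PROD" and prev in TRIGGERS:
        score += 2.0   # a product name often starts right after a cue word
    if tag == "I-PROD" and word in HEADS:
        score += 3.0   # head nouns usually continue/end a product name
    return score


def transition(prev: str, cur: str) -> float:
    """Penalize I-PROD unless it follows B-PROD or I-PROD."""
    if cur == "I-PROD" and prev in ("START", "O"):
        return -10.0
    return 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Highest-scoring tag sequence under the linear-chain scores above."""
    if not tokens:
        return []
    scores: List[Dict[str, float]] = [
        {t: transition("START", t) + emission(tokens, 0, t) for t in TAGS}
    ]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores.append({})
        back.append({})
        for t in TAGS:
            best = max(TAGS, key=lambda p: scores[i - 1][p] + transition(p, t))
            scores[i][t] = scores[i - 1][best] + transition(best, t) + emission(tokens, i, t)
            back[i][t] = best
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: scores[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Join B-PROD/I-PROD runs into product-name strings."""
    found: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B-PROD":
            if current:
                found.append(" ".join(current))
            current = [tok]
        elif tag == "I-PROD" and current:
            current.append(tok)
        else:
            if current:
                found.append(" ".join(current))
            current = []
    if current:
        found.append(" ".join(current))
    return found


tokens = "The company produces brake pads and steering wheels".split()
tags = viterbi(tokens)
names = spans(tokens, tags)
print(names)  # ['brake pads', 'steering wheels']
```

The transition penalty is what keeps `I-PROD` from appearing without a preceding `B-PROD`; in a trained CRF that constraint would be learned rather than hard-coded.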
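The official-website step uses product names from the annual reports as a keyword group to find a "critical path" in the DOM tree. One plausible reading, sketched below with Python's standard `html.parser`, is: record the tag path of every text node, take the path that most keyword-matching text nodes share, then extract every text node on that path. The path format (tag plus `class`), the majority-vote rule, and the sample page are assumptions for illustration; the abstract does not give the thesis's exact path definition.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List, Optional, Tuple

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class TextPathCollector(HTMLParser):
    """Records every non-empty text node together with its tag path from the root."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.texts: List[Tuple[str, str]] = []  # (path, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(("/".join(self.stack), text))


def extract_by_critical_path(html: str, keywords: List[str]) -> Tuple[Optional[str], List[str]]:
    """Pick the path shared by most keyword hits, then return all texts on it."""
    collector = TextPathCollector()
    collector.feed(html)
    collector.close()
    lowered = [k.lower() for k in keywords]
    hits = Counter(path for path, text in collector.texts
                   if any(k in text.lower() for k in lowered))
    if not hits:
        return None, []
    path = hits.most_common(1)[0][0]
    return path, [text for p, text in collector.texts if p == path]


page = """
<html><body>
  <div class="nav"><a href="/">Home</a><a href="/news">News</a></div>
  <div class="ad">Brake sale this week!</div>
  <ul class="products">
    <li><a href="/p/1">Disc brake pads</a></li>
    <li><a href="/p/2">LED head lamp</a></li>
    <li><a href="/p/3">Brake caliper</a></li>
    <li><a href="/p/4">Steering wheel</a></li>
  </ul>
</body></html>
"""
path, items = extract_by_critical_path(page, ["brake", "lamp"])
print(path)   # html/body/ul.products/li/a
print(items)
```

The ad block also mentions a keyword, but it matches only once, so the product list's path wins; "Steering wheel" is then recovered even though it matches no keyword, which is the point of generalizing from keywords to a path.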
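The recommendation step combines trust between users, browsing-history similarity and product attribute preference, then recommends the top 4 predicted items. The sketch below is one hypothetical way to wire those pieces together, not the thesis's formulas (which the abstract does not give): Jaccard overlap of browsed items stands in for preference similarity, `alpha` linearly blends it with explicit trust, `beta` mixes in a category-based attribute preference, and that attribute score alone is the fallback when no weighted neighbour has rated an item. All data are made up.

```python
from typing import Dict, List, Set

Ratings = Dict[str, Dict[str, float]]  # user -> item -> score (0..5)
History = Dict[str, Set[str]]          # user -> set of browsed items
Trust = Dict[str, Dict[str, float]]    # truster -> trustee -> trust in [0, 1]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two browsing histories (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbour_weight(u: str, v: str, history: History, trust: Trust, alpha: float) -> float:
    """Hypothetical linear blend of history similarity and explicit trust."""
    sim = jaccard(history.get(u, set()), history.get(v, set()))
    return alpha * sim + (1 - alpha) * trust.get(u, {}).get(v, 0.0)


def attribute_preference(user: str, item: str, history: History, attrs: Dict[str, str]) -> float:
    """Fraction of the user's browsed items in the candidate item's category."""
    seen = history.get(user, set())
    if not seen:
        return 0.0
    return sum(attrs.get(i) == attrs[item] for i in seen) / len(seen)


def predict(user: str, item: str, ratings: Ratings, history: History, trust: Trust,
            attrs: Dict[str, str], alpha: float = 0.5, beta: float = 0.3) -> float:
    num = den = 0.0
    for other, scores in ratings.items():
        if other == user or item not in scores:
            continue
        w = neighbour_weight(user, other, history, trust, alpha)
        num += w * scores[item]
        den += w
    attr = 5.0 * attribute_preference(user, item, history, attrs)  # rescale to 0..5
    if den == 0:
        return attr  # cold start: no weighted neighbour has rated this item
    return (1 - beta) * (num / den) + beta * attr


def recommend(user: str, ratings: Ratings, history: History, trust: Trust,
              attrs: Dict[str, str], k: int = 4) -> List[str]:
    seen = history.get(user, set())
    candidates = [i for i in attrs if i not in seen]
    candidates.sort(key=lambda i: predict(user, i, ratings, history, trust, attrs), reverse=True)
    return candidates[:k]


attrs = {"brake_pad_A": "brake", "brake_pad_B": "brake", "caliper_C": "brake",
         "lamp_D": "lighting", "lamp_E": "lighting",
         "wheel_F": "steering", "wheel_G": "steering"}
history = {"alice": {"brake_pad_A", "lamp_D"},
           "bob": {"brake_pad_A", "lamp_D", "caliper_C"},
           "carol": {"wheel_F", "wheel_G"},
           "dave": set()}  # new user with no browsing history
ratings = {"bob": {"brake_pad_A": 4, "lamp_D": 5, "caliper_C": 5, "lamp_E": 4},
           "carol": {"wheel_F": 5, "wheel_G": 3, "lamp_E": 2}}
trust = {"alice": {"carol": 0.2}, "dave": {"bob": 0.9}}

alice_top = recommend("alice", ratings, history, trust, attrs)
dave_top = recommend("dave", ratings, history, trust, attrs)
print(alice_top)  # ['caliper_C', 'wheel_F', 'lamp_E', 'brake_pad_B']
print(dave_top)
```

Here the new user `dave` still gets recommendations through trust alone, and the never-rated `brake_pad_B` is scored from attribute preference; these are the two cold-start cases the abstract mentions, handled in this sketch's own assumed way.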

Keywords/Search Tags:

Product name, conditional random field, Web information extraction, personalized recommendation, annual report of listed companies

Related items

1	Research On The Influence Of The Tone Of The Annual Report Of Listed Companies On Investors’ Attention
2	Information Recognition And Extraction From Chinese Periodical Papers Based On Conditional Random Fields
3	Product Information Words Recognition Based On Conditional Random Field In Electronic Commerce
4	Conditional Random Field Based Object Extraction
5	Research On Personnel Resume Intelligent Extraction System Based On Conditional Random Fields
6	Financial Risk Warning For Listed Companies Based On Deep Learning
7	Research On The Influence Of Negative Reports On The Stock Price Fluctuation Of Listed Companies
8	Research For Event Extraction Method In Specific Domain Based On Tree Conditional Random Field
9	Research On Object Extraction Of Automobile Product Based On Sequence Labeling
10	Research Of Text Structure Information Extraction Methods