
The Design & Implementation Of Industry Attribute Keyword Expansion Method Based On JAVA

Posted on: 2017-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Liu
Full Text: PDF
GTID: 2348330512964409
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet and the growing public acceptance of computers, the volume of available information has expanded rapidly. This abundance has a downside: people face large amounts of complex information and find it difficult to locate exactly what they need. How to extract valuable information from large amounts of data is therefore a hot topic in current research, and information extraction emerged against this background. An attribute is a characteristic that distinguishes one thing from another, and it is key to understanding information. Attribute extraction has important practical value and broad application prospects; it has become a hotspot in information extraction research and has attracted extensive, in-depth study by a large number of scholars. Although some progress has been made, current attribute extraction methods still have many problems: they demand a high level of skill from the user, port poorly to other domains, and have low recall and low extraction efficiency.

On the one hand, existing text information extraction methods are either rule-based or statistics-based. The rule-based approach does not need a large annotated corpus, which reduces the workload, but its accuracy and recall depend entirely on the rules, and writing those rules requires rich experience and prior knowledge. The statistics-based approach learns from a large corpus and then derives an extraction strategy, so its accuracy and recall depend on the richness and coverage of the corpus. On the other hand, attribute keywords are an important feature of attribute descriptions and a necessary condition for writing rules. However, Chinese expression is flexible: the same attribute can be described with several different words, and the wording of an attribute varies in its literal form because attribute values are scattered. Existing keyword expansion algorithms expand keywords from an existing word base, so they rely on the completeness of that word base and make little effective use of the information in the available corpus.

This thesis addresses these issues in keyword expansion and information extraction. First, open-source web crawlers are used to download data from online encyclopedias such as Baidu Encyclopedia, Wikipedia, and Interactive Encyclopedia. These encyclopedias contain many category links for the field, and the category entries hold a wealth of attribute information, so research on attribute keyword expansion based on online encyclopedias is of great significance to attribute extraction. Second, the thesis proposes an attribute keyword expansion algorithm that builds on an existing word base and combines it with the degree of word relatedness observed in the existing corpus data, and it implements attribute keyword expansion for the fast-moving consumer goods (FMCG) category. Third, a semi-automatic, self-learning information extraction method is proposed on top of this keyword expansion algorithm; it reduces the workload of information extraction while maintaining accuracy and recall. Finally, the effectiveness of the method is verified by experiment, achieving the goal of accurate and efficient extraction of FMCG attribute information from encyclopedia data.
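
The keyword expansion step described in the abstract (synonyms from an existing word base, combined with word relatedness measured on the corpus) can be illustrated with a minimal Java sketch. This is not the thesis's algorithm: the class name `KeywordExpansionSketch`, the methods `relatedness` and `expand`, the use of sentence-level Jaccard co-occurrence as the relatedness measure, the threshold value, and the toy data are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical sketch: expands seed attribute keywords using a synonym word base
// plus a corpus-based relatedness score (Jaccard co-occurrence, an assumption).
class KeywordExpansionSketch {

    // Jaccard co-occurrence: sentences containing both words / sentences containing either.
    static double relatedness(List<List<String>> sentences, String a, String b) {
        int both = 0;
        int either = 0;
        for (List<String> sentence : sentences) {
            boolean hasA = sentence.contains(a);
            boolean hasB = sentence.contains(b);
            if (hasA && hasB) both++;
            if (hasA || hasB) either++;
        }
        return either == 0 ? 0.0 : (double) both / either;
    }

    // Returns the seeds, their word-base synonyms, and every corpus word whose
    // relatedness to at least one seed reaches the threshold.
    static Set<String> expand(Set<String> seeds,
                              Map<String, Set<String>> wordBase,
                              List<List<String>> sentences,
                              double threshold) {
        Set<String> expanded = new LinkedHashSet<>(seeds);

        // Step 1: add synonyms listed in the existing word base.
        for (String seed : seeds) {
            expanded.addAll(wordBase.getOrDefault(seed, Collections.emptySet()));
        }

        // Step 2: add corpus words that are strongly related to a seed.
        Set<String> vocabulary = new LinkedHashSet<>();
        for (List<String> sentence : sentences) {
            vocabulary.addAll(sentence);
        }
        for (String candidate : vocabulary) {
            for (String seed : seeds) {
                if (!candidate.equals(seed)
                        && relatedness(sentences, seed, candidate) >= threshold) {
                    expanded.add(candidate);
                    break;
                }
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Toy, already-tokenized corpus (made-up data).
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("net content", "weight", "500g"),
                Arrays.asList("weight", "net content"),
                Arrays.asList("brand", "Acme"));
        Map<String, Set<String>> wordBase = new HashMap<>();
        wordBase.put("weight", new LinkedHashSet<>(Arrays.asList("net weight")));
        Set<String> seeds = new LinkedHashSet<>(Arrays.asList("weight"));

        System.out.println(expand(seeds, wordBase, sentences, 0.6));
    }
}
```

In this sketch the word base contributes terms that never appear in the corpus, while the corpus contributes related terms that the word base lacks, which is the gap in word-base-only expansion that the abstract points out. A real system would need Chinese word segmentation and a relatedness measure tuned to the data.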
Keywords/Search Tags:Information Extraction, Attribute Keyword, Synonym Dictionary, Pattern Matching