Knowledge-based Access To Web Pages

Posted on: 2016-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: J W Liu
GTID: 2348330536987037
Subject: Computer Science and Technology
Abstract/Summary:
Knowledge acquisition from web pages mines the topics carried by large numbers of pages and converts them into data in a machine-internal structure. Because this moves information from outside the computer to inside it, the process is called knowledge formalization or knowledge extraction. The acquired data is turned into knowledge that can be shared, retrieved, and used. Two steps in this process remain particularly difficult today: extracting the main content of the text, and converting it into RDF statements. This thesis studies both difficulties. Traditional topic-sentence extraction does not take into account the semantic association behind the text. Consider, for example, these two sentences: "Jobs has left us" and "Will Apple's prices drop or not?"
Traditional information extraction treats these two sentences as unrelated, so it cannot identify an accurate key sentence. This is the first problem. Second, when the current LDA topic model computes the topics of a document, it does not consider the length of the document, the length of its words, or how many paragraphs discuss a topic; it assigns topics directly to the whole document. A topic may concern only one passage or a few paragraphs, so a minor topic can be mistaken for a major one, which reduces the accuracy of the final topics. Finally, when LDA computes the topic distribution over lexical items, it ignores the relationships between words, including the possibility that consecutive lexical items share the same meaning, so the extracted topics are bound to be inaccurate. To address these points, this thesis proposes an improved LDA model for extracting topic sentences from web pages. The first point is the reason for choosing the LDA model; the latter two are the innovations of this thesis.

After the key topic sentences are extracted, they are converted by rule into verb-object phrase form. The next problem is semantic role labeling of the verb-object phrases, which is treated as a classification problem. A glossary of domain-specific terms is first obtained by statistical methods. Each sentence is then segmented, POS-tagged, and parsed into a syntax tree. Finally, a maximum entropy classifier trained in advance on manually labeled data converts the sentence into RDF triples of resource, property, and property value, which populate the knowledge base. Experimental results show that, compared with traditional knowledge extraction techniques and the original LDA model, the proposed model mines content more deeply, with higher accuracy and strong adaptability to new samples.
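The two technical ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption for illustration only: the length-proportional weighting, the function names `weighted_document_topics` and `to_ntriple`, the topic labels, the sample numbers, and the `http://example.org/` base IRI are all hypothetical. The first function shows one possible way to keep a topic that appears in a single short paragraph from dominating the document. The second serializes a (resource, property, property value) triple as one N-Triples line.

```python
from typing import Dict, List, Tuple


def weighted_document_topics(
    paragraphs: List[Tuple[int, Dict[str, float]]]
) -> Dict[str, float]:
    """Combine per-paragraph topic distributions into a document-level one.

    Each paragraph is (length_in_tokens, {topic: probability}).
    Assumption: weight each paragraph by its share of the total length.
    """
    total_len = sum(length for length, _ in paragraphs)
    if total_len == 0:
        return {}
    doc_topics: Dict[str, float] = {}
    for length, topics in paragraphs:
        weight = length / total_len
        for topic, prob in topics.items():
            doc_topics[topic] = doc_topics.get(topic, 0.0) + weight * prob
    return doc_topics


def to_ntriple(resource: str, prop: str, value: str,
               base: str = "http://example.org/") -> str:
    """Serialize (resource, property, property value) as one N-Triples line.

    Assumption: resource and prop are already IRI-safe names; the value is
    emitted as a plain string literal with backslashes and quotes escaped.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{base}{resource}> <{base}{prop}> "{escaped}" .'


# Hypothetical input: a long paragraph mostly about pricing and a short
# one mostly about an obituary (made-up numbers, not thesis data).
paragraphs = [
    (300, {"pricing": 0.9, "obituary": 0.1}),
    (100, {"pricing": 0.2, "obituary": 0.8}),
]
doc_topics = weighted_document_topics(paragraphs)
main_topic = max(doc_topics, key=doc_topics.get)

# Hypothetical triple derived from a verb-object phrase.
triple = to_ntriple("Apple", "mayLower", "prices")
print(main_topic)
print(triple)
```

With these made-up inputs the short paragraph contributes only a quarter of the weight, so its dominant topic stays a minor topic at document level. A real pipeline would take the per-paragraph distributions from a trained LDA model and the triple components from the classifier described above.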
Keywords/Search Tags: Knowledge Extraction, Semantic Association, Information Extraction, Named Entities, Excavation