Knowledge Discovery Model And Empirical Study On Social Media Textual Data

Posted on: 2017-05-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Huang
GTID: 1364330548494123
Subject: Medical informatics
Abstract/Summary:
With the advent of the big data era, social media, characterized by fast data transmission, a wide range of applications, and frequent updates, has become an important part of the data warehouse of that era. Social media contains a large amount of high-value data that is complex and varied in form. This lays a solid foundation for data mining and knowledge discovery and has attracted the attention of many researchers in mathematics, computer science, and library and information science. The huge amount of user-generated textual data on social media can be processed as follows: data collection, cleaning, and structuring, followed by information analysis and data mining, in order to explore research hotspots, research frontiers, and research trends in a field, to identify special cases in a specific field, and to reveal correlations between events. Providing new information, new clues, and new knowledge for scientific research and practical application in this way is of practical significance for current work. The DIKW (Data-Information-Knowledge-Wisdom) system describes the progression from data to information, to knowledge, and finally to wisdom. We can therefore work out a general model based on the data-information-knowledge conversion process of the DIKW system, which can provide guidance for users conducting knowledge discovery research on social media textual data. In addition, the syntactic structure and semantic relations of sentences are the key problem in text content analysis: whether entity relations can be correctly identified and extracted from social media text is an important precondition for implicit knowledge discovery. Most traditional entity relation extraction methods consider only lexical information, without considering the influence of the semantic information
of entity semantic relations, and few studies take the word order of entities into account when extracting entity semantic relations. Therefore, this study proposes inference rules that are based on the theory of syntactic analysis, have high resolution capability, and comprehensively consider the effect of entity word order, in order to extract entity semantic relations. In addition, applying the knowledge discovery model of social media textual data to the analysis of specific social media textual data to discover implicit knowledge can verify the feasibility of the proposed model, and can also show that the model helps to discover implicit knowledge in large-scale text data resources. This study therefore builds a knowledge discovery model of social media textual data and applies it to knowledge discovery research on virtual health community data. To this end, this paper summarizes related research results at home and abroad and, in view of the non-standardized nature of social media textual data, analyzes the problems and difficulties that may be encountered in the data mining process. Under the guidance of theories from linguistics, information organization, and computer science, it puts forward a named entity recognition strategy, an entity semantic relation extraction strategy, and an event detection strategy for social media resources, and on this basis forms a relatively complete social media data mining and knowledge discovery strategy to guide the analysis and solution of knowledge discovery problems in social media textual data. In addition, social media textual data has many characteristics, such as multiple sources, a variety of forms, large data volume, wide application, and complexity of knowledge. These features increase the difficulty of semantic analysis and semantic description after data extraction from social media textual data, which makes it hard for
domain users to perform semantic analysis and semantic description after information extraction and to discover new knowledge. Given that there is no widely accepted data mining method or knowledge discovery model, this study uses the DIKW system as theoretical guidance for research on domain knowledge discovery from social media textual data. Inspired by the DIKW transformation from data to wisdom, it builds a knowledge discovery model of social media textual data and establishes data extraction and semantic tagging rules, in order to automate semantic analysis, improve the efficiency of semantic annotation and semantic description of data, and verify the soundness and effectiveness of the proposed model using data from a virtual health community.
The main contents of this paper include:
(1) Study on the knowledge discovery strategy for social media text data. We summarize the difficulties of data mining and knowledge discovery on social media data. In view of the textual character of social media data, namely colloquial concept descriptions, loosely expressed relationships, blurred events, and knowledge hidden in text, and under the guidance of theories from linguistics, information organization, computer science, and other fields, we drew up a named entity recognition strategy, an entity semantic relation extraction strategy, and an event detection strategy for social media resources. On this basis we formed a relatively complete social media data mining and knowledge discovery strategy to guide the analysis and solution of knowledge discovery problems in social media textual data.
(2) Building the knowledge discovery model of social media textual data. Under the guidance of the knowledge discovery strategy for social media textual data and based on the DIKW system, we constructed an outline model of data mining and knowledge discovery for social media textual data, detailed the model
layers, which include a natural language processing layer, a semantic analysis layer, a relationship extraction layer, and an event detection layer, and then described the functions of each layer in detail.
(3) Operation mechanism of the subsystems based on the knowledge discovery model of social media text data. On the basis of the proposed outline model and detailed model, we complete the construction of the modules of the knowledge discovery subsystem of social media textual data and expound in detail the function of each module and the correlations among them. We discuss the operation mechanism of the knowledge discovery model from several aspects: the external birth condition/demand mechanism of the subsystem model, and the semantic mapping mechanism, rule-based reasoning mechanism, and event detection feedback mechanism within the subsystems. Within each module, the elements of each mechanism interact to realize its function; the modules together form the social media knowledge discovery model, and the operating mechanisms work together to complete the social media knowledge discovery task.
(4) Empirical study on the knowledge discovery model of social media textual data. In this study, we chose data from a virtual health community named MedHelp as the research object and applied the proposed model to mine adverse drug effects from text generated by community users. We applied database technology and Java programming to collect free-text data from the virtual health community and build a local text library, and used a medical domain ontology to perform semantic mapping between the health-related concepts in the free text and the domain ontology. By mining potential adverse drug reactions from virtual health community data, we verify
the operability of the proposed theoretical model. The adverse drug reaction information found by data mining is provided to domain experts for verification and finally to domain users.
The significance of this study is as follows:
(1) This study takes free-text data from social media as its research object, which differs from studies of traditional structured data and from studies of scientific literature and institutional repositories; it supplements knowledge discovery research on formal disciplinary research data.
(2) Building a knowledge discovery model of social media textual data based on DIKW theory can provide a good environment for data analysis aimed at discovering valuable information in social media. Mining adverse drug reactions from virtual health community data can provide a reference for the monitoring of adverse drug reactions in China, which will contribute to drug safety and to disease control and prevention, and can also provide data-driven validation for clinical practice.
(3) This study puts forward a method of formulating inference rules to realize data extraction, semantic analysis, semantic interconnection, and knowledge discovery for virtual health community data presented as free text. This will help to promote research on the theory and methods of data integration and knowledge discovery in medical informatics and information science.
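As a rough illustration of the layered idea described in the abstract (concept mapping against a domain ontology, followed by rule-based relation extraction that takes entity word order into account), the following Python sketch may help. It is not the dissertation's method: the tiny `ONTOLOGY` lexicon, the `CUES` table, and all function names are hypothetical stand-ins chosen for this example, and real posts would need far richer syntactic analysis.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for a medical domain ontology.
ONTOLOGY = {
    "ibuprofen": "Drug",
    "aspirin": "Drug",
    "nausea": "Symptom",
    "headache": "Symptom",
}

# Hypothetical cue phrases. The value says which side of the cue the drug
# is expected on, so the rule depends on entity word order.
CUES = {
    "caused": "drug_first",           # "<drug> caused <symptom>"
    "gave me": "drug_first",          # "<drug> gave me <symptom>"
    "after taking": "symptom_first",  # "<symptom> after taking <drug>"
}


@dataclass(frozen=True)
class Relation:
    drug: str
    effect: str
    cue: str


def tokenize(sentence):
    """Natural language processing layer (toy): lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def map_concepts(tokens):
    """Semantic mapping layer (toy): (position, token, ontology class)."""
    return [(i, t, ONTOLOGY[t]) for i, t in enumerate(tokens) if t in ONTOLOGY]


def extract_relations(sentence):
    """Relation extraction layer (toy): apply order-sensitive cue rules."""
    tokens = tokenize(sentence)
    concepts = map_concepts(tokens)
    relations = []
    for cue, order in CUES.items():
        cue_tokens = cue.split()
        n = len(cue_tokens)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] != cue_tokens:
                continue
            before = [c for c in concepts if c[0] < start]
            after = [c for c in concepts if c[0] >= start + n]
            if not before or not after:
                continue
            left, right = before[-1], after[0]  # entities nearest the cue
            drug, effect = (left, right) if order == "drug_first" else (right, left)
            # Accept the pair only if the classes fit the expected order.
            if drug[2] == "Drug" and effect[2] == "Symptom":
                relations.append(Relation(drug[1], effect[1], cue))
    return relations
```

For example, `extract_relations("I had a headache after taking aspirin")` pairs `aspirin` with `headache`, while a sentence with the entities in the wrong order for its cue, such as "Nausea caused ibuprofen", yields no relation. In the workflow the abstract describes, candidate pairs like these would still go to domain experts for verification.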
Keywords/Search Tags: Knowledge Discovery Strategy, Social Media, Knowledge Discovery, Data Mining, Knowledge Discovery Model