Symptom Named Entity Recognition System In Web Text

Posted on: 2020-11-21; Degree: Master; Type: Thesis
Country: China; Candidate: P Peng; Full Text: PDF
GTID: 2404330575492691; Subject: Engineering
Abstract/Summary:
A large amount of valuable clinical medical information is available on the Internet, yet there are only dozens of professional medical websites. Extracting medical knowledge from the Web, structuring it, and building medical knowledge graphs has long been a very challenging task in Web mining. In previous studies, many researchers used named entity recognition to identify terms such as diseases, symptoms, causes, laboratory indicators, and treatment measures in web page text, and established relationships between those terms. Because symptom entities are often expressed colloquially, there is no standard, complete symptom lexicon, and no good method for recognizing symptom entities has been available. To address this problem, this thesis develops, in Java, a symptom entity recognition system based on the rules by which symptom entities are composed, and designs corresponding strategies for the specific problems that arise during symptom entity recognition:
(1) Identify and extract structured symptom information from professional medical websites. Strategy 1 gives the procedure for extracting symptom entities that appear as anchors on a generic website. Strategy 2 provides a way to resume crawling after an interruption, which solves the program interruption problem. Running the system yielded a total of 18,114 symptom entities.
(2) Extract part words and symptom words from the symptom list, providing the basic data for later combining part words + symptom words into symptoms. Because most symptoms are composed of part word + symptom word, Algorithm 3 was designed. From the symptom list obtained by Algorithms 1 and 2, 1,209 part words and 1,111 symptom words were extracted.
(3) Extract part words and strong symptom words from the symptom list, providing the basic data for later combining part words + strong symptom words into symptoms. Because most symptoms are composed of part word + strong symptom word, Strategy 3 was designed; it extracts part words from the 18,114 symptom entities acquired by Strategy 1 and Strategy 2. Running the system yielded a total of 1,209 part words and 1,111 strong symptom words.
(4) Extract orientation (position) words from the symptom list. Strategy 5 implements this function. Running the system yielded a total of 47 orientation words.
(5) Extract prefix modifiers and suffix modifiers from the symptom list. Strategy 6 implements this function. Running the system yielded a total of 706 prefix modifiers and 320 suffix modifiers.
(6) Propose a method for recognizing symptom named entities based on symptom formation rules. Using the extracted component vocabularies, the formation patterns of symptom entities are analyzed, and composition patterns such as "part word + symptom word", "orientation word + part word + symptom word", "prefix + part word + symptom word", and "part word + symptom word + suffix" are proposed. On this basis, a symptom entity recognition method was developed. Each newly combined symptom entity was used as a keyword to retrieve relevant Web text data. According to the rules, the number of generalized texts containing the symptom entity was used to measure the rationality of the new symptom. Six methods were designed to judge the rationality of new symptoms, and the symptoms obtained after running the system were verified manually.
(7) Remove unreasonable symptoms. In Web text, the part word of a symptom entity is rarely preceded by a preposition; Method 7 applies this finding to remove unreasonable symptoms. Likewise, most strong symptom words in a symptom entity are not followed by a noun or pronoun in Web text; Method 8 applies this finding to remove unreasonable symptoms. Method 9 removes context-related data in the Web text that involve the contextual words or symptom words of the symptom entity. Method 10 removes data in which the symptom entity appears in the Web text only as an incomplete symptom. Method 11 removes data that come from the same source, after stripping the website-source portion from the relevant data. Method 12 removes relevant data with a high degree of similarity.
Building on another author's triple model of symptoms, this thesis proposes a five-tuple model of symptom composition, namely <prefix modifier, orientation word, part word, strong symptom word, suffix modifier>, and develops the corresponding system, with which the elements of the tuple were extracted. It also proposes a strategy for judging the rationality of the new symptoms formed by combining these tuple elements. After the system was run, the irrational symptom vocabulary was approximately 97%.
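The abstract gives no code, so the following is only a minimal Java sketch of how the composition patterns and the text-count check described above might fit together. The class name SymptomComposer, its methods, the English placeholder words, the space-separated rendering, and the minHits threshold are all assumptions made for illustration; they are not taken from the thesis, and the six filtering methods are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and thresholds are illustrative, not the thesis's code. */
final class SymptomComposer {

    /** Joins the non-empty slots in five-tuple order; "" marks an unused slot. */
    static String render(String prefix, String orientation, String part,
                         String symptom, String suffix) {
        List<String> used = new ArrayList<>();
        for (String slot : Arrays.asList(prefix, orientation, part, symptom, suffix)) {
            if (!slot.isEmpty()) {
                used.add(slot);
            }
        }
        return String.join(" ", used);
    }

    /** Builds "part + symptom" and "orientation + part + symptom" candidates. */
    static List<String> compose(List<String> orientations, List<String> parts,
                                List<String> symptoms) {
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            for (String symptom : symptoms) {
                out.add(render("", "", part, symptom, ""));
                for (String orientation : orientations) {
                    out.add(render("", orientation, part, symptom, ""));
                }
            }
        }
        return out;
    }

    /**
     * Keeps candidates supported by at least minHits retrieved texts.
     * The hit counts would come from a Web search step that is not shown here;
     * a candidate missing from the map is treated as having zero hits.
     */
    static List<String> filterByEvidence(List<String> candidates,
                                         Map<String, Integer> hits, int minHits) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (hits.getOrDefault(candidate, 0) >= minHits) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = compose(
                Arrays.asList("left"), Arrays.asList("knee", "hair"), Arrays.asList("pain"));
        Map<String, Integer> hits = new HashMap<>();  // made-up counts for the demo
        hits.put("knee pain", 12);
        hits.put("left knee pain", 5);
        hits.put("hair pain", 0);
        System.out.println(filterByEvidence(candidates, hits, 3));
    }
}
```

In this sketch, a combined candidate such as "hair pain" is dropped because too few texts support it, which mirrors the idea of measuring a new symptom's rationality by the number of texts that contain it. A real system for Chinese text would concatenate the slots without spaces and would apply further filters before accepting a candidate.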
Keywords/Search Tags: Web Text, Symptom, Entity recognition