Research And Implementation Of Retrieval Technology In Web Medical Consultation Data

Posted on: 2016-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: C W Guo
Full Text: PDF
GTID: 2298330452966407
Subject: Computer Science and Technology
Abstract/Summary:
The Web has become the world's largest source of information, and retrieving the information a user is interested in from massive Web data has become a hot issue attracting growing attention from both academia and industry. Healthcare has always been a popular topic closely tied to people's lives, and searching for information about diseases and seeking consultation on the Internet have become more and more frequent. Research on querying and retrieval technology for large-scale medical consultation data therefore has important application value and practical significance.

This thesis studies text retrieval methods from the information retrieval field, in particular the vector space model and the TFIDF algorithm. The vector space model represents a text as a vector and uses the cosine similarity between vectors as the similarity between texts, which is simple and intuitive. The TFIDF algorithm is a classic algorithm for calculating the weights of terms in texts. It is simple and effective, but it calculates term weights uniformly from each term's frequency in the text and in the text set. Web medical consultation data, however, has its own distinct semantic segment structure, and some terms in consultation texts carry specialized domain meanings, so the plain vector space model and TFIDF algorithm are not well suited to it. To address these problems, a segmented vector space model is first proposed and then applied to the retrieval of Web medical consultation data; finally, the Web medical consultation data is analyzed from different perspectives. The major work of this thesis is summarized as follows:

(1) We studied the vector space model and the TFIDF algorithm in the text retrieval area and proposed a segmented vector space model (SVSM). First, the text is represented as a segmented vector in accordance with its semantic segment structure. Then, the component values and the similarity of each segment of the segmented vectors can be computed separately, in different ways. Finally, the weighted similarities of the segments are combined into an overall similarity between two vectors. The segmented vector space model is flexible in representation, which improves the accuracy of retrieval results as well as time and space efficiency.

(2) We studied the characteristics of Web medical consultation data and carried out preprocessing work on it. We then applied the SVSM to the retrieval of Web medical consultation data. The results of experiments conducted on real-world Web medical consultation data sets indicate that the presented strategy can effectively improve the precision of retrieval results.

(3) Web medical consultation data was analyzed from different perspectives. One perspective is the user's basic information: gender, age and region were analyzed to explore how diseases are distributed across these attributes, using the SAP HANA in-memory database as the analysis tool. The other perspective is the user's description of the disease: symptoms and drugs were analyzed to explore which symptoms and drugs are related to each disease, using statistical analysis methods.
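
The three steps in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the TF-IDF variant (tf * log(N / df)), the use of cosine similarity in every segment, the weighted sum used to combine segments, the segment names ("disease", "symptoms") and the weights are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse TF-IDF vector using tf * log(N / df) (one common variant, assumed here)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
        if doc_freq.get(term, 0) > 0  # skip terms never seen in this segment
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment_doc_freq(docs, segment):
    """Number of documents whose given segment contains each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.get(segment, [])))
    return df


def segmented_similarity(query, doc, docs, segment_weights):
    """Weighted sum of per-segment cosine similarities (assumed combination rule)."""
    n_docs = len(docs)
    score = 0.0
    for segment, weight in segment_weights.items():
        df = segment_doc_freq(docs, segment)
        q_vec = tfidf_vector(query.get(segment, []), df, n_docs)
        d_vec = tfidf_vector(doc.get(segment, []), df, n_docs)
        score += weight * cosine(q_vec, d_vec)
    return score


# Hypothetical, already tokenized consultation records split into two segments.
docs = [
    {"disease": ["flu"], "symptoms": ["fever", "cough"]},
    {"disease": ["gastritis"], "symptoms": ["stomach", "pain"]},
    {"disease": ["flu"], "symptoms": ["headache", "fever"]},
]
query = {"disease": ["flu"], "symptoms": ["fever", "cough"]}
weights = {"disease": 0.4, "symptoms": 0.6}  # illustrative segment weights

# Rank the records by their segmented similarity to the query.
ranking = sorted(
    range(len(docs)),
    key=lambda i: segmented_similarity(query, docs[i], docs, weights),
    reverse=True,
)
print(ranking)
```

Because each segment is scored independently, a different weighting scheme or similarity measure could be substituted for a single segment without touching the others, which is the flexibility the abstract attributes to the segmented representation.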
Keywords/Search Tags: Text Retrieval, TFIDF Algorithm, Segmented Vector Space Model, Web Medical Consultation Data, Preprocessing, SAP HANA In-Memory Database