Research And Implementation Of Retrieval Technology In Web Medical Consultation Data

Posted on:2016-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:C W Guo

Full Text:PDF

GTID:2298330452966407

Subject:Computer Science and Technology

Abstract/Summary:

The Web has become the world's largest source of information, and how to retrieve the information a user is interested in from massive Web data has become a hot issue attracting growing attention from both academia and industry. Healthcare has always been a hot topic and is closely related to people's lives; searching for information about diseases and seeking consultation on the Internet have become more and more frequent. Research on query and retrieval technology for large-scale medical consultation data therefore has important application value and practical significance.

This thesis studies text retrieval methods from the field of information retrieval, especially the vector space model and the TFIDF algorithm. The vector space model represents a text as a vector and uses the cosine similarity between vectors as the similarity between texts, which is simple and intuitive. The TFIDF algorithm is a classic algorithm for calculating the weight of terms in texts. It is simple and effective, but it weights all terms uniformly according to their frequencies in the text and in the text set. However, Web medical consultation data has its own distinct semantic segment structure, and some terms in consultation texts carry domain-specific meanings, so the plain vector space model and TFIDF algorithm are not well suited to it. To address these problems, a segmented vector space model is first proposed, then applied to the retrieval of Web medical consultation data, and finally the Web medical consultation data is analyzed from different perspectives.

The major work of this thesis is summarized as follows:

(1) We studied the vector space model and the TFIDF algorithm in the text retrieval area and proposed a segmented vector space model (SVSM). First, a text is represented as a segmented vector in accordance with its semantic segment structure. Then, the component values and the similarity of each segment of the segmented vectors can be computed separately, in different ways. Finally, the weighted similarities of the segments are combined into the overall similarity between two vectors. The segmented vector space model is flexible in representation, which improves the accuracy of retrieval results as well as time and space efficiency.

(2) We studied the characteristics of Web medical consultation data and preprocessed the data. We then applied the SVSM to the retrieval of Web medical consultation data. Experimental results on real-world Web medical consultation data sets indicate that the presented strategy can effectively improve the precision of retrieval results.

(3) Web medical consultation data was analyzed from different perspectives. One perspective is the user's basic information, which includes the analysis of the user's gender, age and region, aiming to explore how diseases are distributed across gender, age and region, using the SAP HANA in-memory database as the analysis tool. The other perspective is the user's description of the disease, which includes the analysis of symptoms and drugs, aiming to explore the symptoms and drugs related to diseases, using statistical analysis methods.
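
The segmented similarity described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the segment names, the segment weights, or the per-segment weighting schemes, so the segments "symptom" and "history", the weights 0.7 and 0.3, the smoothed IDF formula, and the function names below are all assumptions chosen for illustration. For simplicity the sketch uses TFIDF weights and cosine similarity in every segment, whereas the abstract notes that each segment may be computed in a different way.

```python
import math
from collections import Counter


def idf_table(token_lists):
    """Smoothed inverse document frequency per term (smoothing is an assumption)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TFIDF vector as a dict; terms unseen in the corpus get weight 0."""
    if not tokens:
        return {}
    tf = Counter(tokens)
    return {term: (count / len(tokens)) * idf.get(term, 0.0)
            for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[term] for term, w in u.items() if term in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def segmented_scores(query, corpus, weights):
    """Weighted sum of per-segment cosine similarities for each document.

    query and each corpus entry map a segment name to a token list;
    weights maps a segment name to its (hypothetical) weight.
    """
    scores = [0.0] * len(corpus)
    for segment, weight in weights.items():
        # Each segment gets its own IDF table, built from that segment only.
        idf = idf_table([doc.get(segment, []) for doc in corpus])
        q_vec = tfidf(query.get(segment, []), idf)
        for i, doc in enumerate(corpus):
            d_vec = tfidf(doc.get(segment, []), idf)
            scores[i] += weight * cosine(q_vec, d_vec)
    return scores


# Hypothetical toy data: two consultation records and one query.
corpus = [
    {"symptom": ["fever", "cough"], "history": ["asthma"]},
    {"symptom": ["headache", "nausea"], "history": ["migraine"]},
]
query = {"symptom": ["cough", "fever"], "history": ["asthma"]}
weights = {"symptom": 0.7, "history": 0.3}

scores = segmented_scores(query, corpus, weights)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Keeping a separate IDF table per segment means a term is weighted only against the other texts in the same segment, which is one plausible reading of computing each part of the segmented vector separately.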

Keywords/Search Tags:

Text Retrieval, TFIDF Algorithm, Segmented Vector Space Model, Web Medical Consultation Data, Preprocessing, SAP HANA In-Memory Database

Related items

1	Improvement And Application To Weighting Terms Based On Text Classification
2	Research On Data Mining Technologies Applied To Web Chinese Text
3	Research On Core Technologies Of Full Text Retrieval In DM DBMS
4	Study On Chinese Text Classification Algorithm Based On Rough Set And Its Application
5	Application And Research Based On SAP HANA In-memory Computing Database Technology
6	Research On Feature Selection Of Text Classification
7	The Research And Implementation Of Chinese Text Categorization
8	Research On Classification Of Chinese Documents Based On Vector Space Model
9	Design And Implementation Of Large Scale Data Analysis System Based On Hana Sap Memory Computing
10	Research And Improvement Of Automatic Text Classification Algorithm Based On The Vector Space Model