Behavioral Analysis Of Male Sex Workers And The Application Of Data Classification Technology

Posted on: 2020-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: C M Guo
Full Text: PDF
GTID: 2370330590998240
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Money Boys (MBs) are at high risk of HIV infection. Because of their lack of legitimacy, men who have sex with men (MSM) often face public discrimination, and research on this marginalized subgroup is extremely limited. Previous studies have focused on HIV prevalence, risk factors associated with sexually transmitted infections, and consistent condom use. With the rapid development of the Internet and the growing popularity of smart devices, online social networks have become a faster and more convenient channel for seeking sexual partners. However, few studies have examined how MBs use internet-based venues to seek sexual partners, or their mobility patterns. The purposes of this study were: (1) to understand the characteristics of MBs who seek sexual partners through internet-based venues, and their mobility patterns; (2) to use latent class analysis (LCA) to identify high-risk MBs; and (3) to use data classification to predict HIV status, with the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance.

Methods: This study was conducted in Tianjin from December 2014 to June 2015. Convenience sampling was used to recruit the target population, and 330 MBs were finally included. Each participant completed a questionnaire and provided a blood sample. Descriptive statistics were produced with SAS 9.4, latent class analysis was performed with the SAS PROC LCA procedure, and data classification was carried out in R. A hierarchical bootstrap method was used to sample the original data set with replacement; the sampled records formed the training set, and the unsampled records formed the test set. For each algorithm, the training set and three-fold cross-validation were used to find the optimal parameters, and the test set was used to compare the classification results of the classifiers (logistic regression, neural network, support vector machine, random forest and CART). The required data sets were then generated with the SMOTE technique, and the classification performance of each algorithm was compared on the newly synthesized data set.

Results: Among the 330 MBs, 38 (11.52%) were laboratory-confirmed HIV positive and 63 (19.09%) had a history of sexually transmitted infections (STI). A total of 147 (44.55%) had used internet-based venues to seek sexual partners. The three most common sexual practices were anal sex (99.39%), masturbation (86.39%) and oral sex (83.23%). MBs who used internet-based venues to seek sexual partners were more likely to be local, to have a higher monthly income, to be employed part-time, to engage in high-risk sexual behaviors (such as more anal sex, anal kissing and finger sex), to use sex aids, and to have a history of sexually transmitted diseases. The HIV infection rates among MBs who used internet-based venues and those who did not were 12.93% and 10.38%, respectively, but the difference between the two groups was not statistically significant.

The MBs in this study came mainly from northern and northeastern China, such as Liaoning, Jilin, Heilongjiang and Shandong provinces. They were highly mobile, and Tianjin, Beijing and Shanghai were the three cities with the largest inflow of MBs. Of the surveyed MBs, 257 (77.9%) had visited two or more places in the past six months. Further analysis of mobility patterns showed that those who had visited two destinations and had sex in the past six months were more likely to be non-local (99.1%), to have a monthly income below 8,000 (88.2%), never to have been tested for HIV (51.8%), and to know less about free antiviral treatment policies (59.1%). Meanwhile, MBs who had been to three or more destinations in the past six months were more likely to have been engaged in the sex trade for more than 12 months (69.4%), to be employed full-time (88.4%), to have had more than 16 sexual partners (46.3%), to have had anal intercourse more than 16 times (55.1%), to have been tested for HIV (76.9%), and to know the relevant free antiviral treatment policies (60.5%).

According to the latent class analysis, MBs can be divided into four subgroups: the "relatively safe behavior" group, the "higher sexual risk" group, the "multiple sexual partners" group and the "unprotected sex and substance abuse" group. The differences in HIV infection rate among the four subgroups were statistically significant. The study also found statistically significant differences in residence, monthly income, employment status, and knowledge of HIV testing and of the free antiviral policy. Moreover, the probability of HIV infection in the "higher sexual risk" group was 4.06 times higher than that in the "multiple sexual partners" group (1.31-12.59).

The data classification results showed that running time increased in the order logistic regression, CART, SVM, random forest and neural network. On the original data set, the neural network performed best in AUC, the support vector machine performed best in F1, and the random forest performed best in G-mean. On the newly generated data set, the support vector machine performed best in AUC, the random forest performed best in F1, and the neural network performed best in G-mean. Compared with logistic regression, the other algorithms showed varying degrees of improvement across data sets and indicators, and each algorithm showed some degree of improvement on the new data set compared with the original data set. The application of data classification algorithms significantly improved classification performance for MBs. Furthermore, SMOTE can address the problem of unbalanced data. When the performance of the classification algorithms was compared, logistic regression, random forest and support vector machine improved slightly in running speed and AUC, while logistic regression and neural network improved slightly in G-mean. All five classification algorithms improved in F1, which shows that the SMOTE technique can increase the efficiency of the classifiers.

Conclusion: MBs who seek sex through internet-based venues, are highly mobile, engage in high-risk sexual behavior and have poor protection awareness are at the highest risk of HIV infection, which indicates that special attention should be paid to this group and that targeted interventions should be carried out. The data classification algorithms (logistic regression, neural network, support vector machine, random forest and CART) can accurately and reliably identify the HIV infection risk of MBs, and the SMOTE technique can address the problem of imbalanced data classification to a certain extent.
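The resampling and evaluation steps described in the Methods can be illustrated with a minimal sketch. The thesis reports using R; the Python below is only an illustration, and every function name in it is hypothetical. It assumes that the "hierarchical bootstrap" means resampling with replacement within each class, that SMOTE interpolates between a minority record and one of its k nearest minority neighbours, and that G-mean is the geometric mean of sensitivity and specificity. The abstract does not state k, the number of synthetic records, or the features used, so the values and toy points here are placeholders.

```python
import math
import random

def bootstrap_split(labels, rng):
    """Resample with replacement within each class (assumed reading of the
    'hierarchical bootstrap'); rows never drawn form the test set."""
    train_idx = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        train_idx += [rng.choice(members) for _ in members]
    drawn = set(train_idx)
    test_idx = [i for i in range(len(labels)) if i not in drawn]
    return train_idx, test_idx

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

def f1_and_gmean(y_true, y_pred):
    """F1 and G-mean for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return f1, math.sqrt(recall * specificity)

# Class sizes taken from the abstract: 38 HIV-positive among 330 participants.
labels = [1] * 38 + [0] * 292
train_idx, test_idx = bootstrap_split(labels, random.Random(0))

# Toy minority points (placeholders, not study data) to show the interpolation.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(toy_minority, n_new=10, k=2, rng=random.Random(1))
```

Because each class is resampled separately, the training set keeps the original 38:292 class ratio, which is why an oversampling step such as SMOTE is applied afterwards to the training data. Metrics such as F1 and G-mean are used alongside AUC because plain accuracy can look high on imbalanced data even when few positives are found.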
Keywords/Search Tags: Money Boys, Net-based venues sex seeking, Mobility, LCA, Data Mining Algorithm, Data synthesis technology