
Research And Implementation Of Hot Medical News System Based On Big Data

Posted on: 2019-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2428330545959289
Subject: Software engineering
Abstract/Summary:
Since the Web 2.0 era, Internet medical information has grown exponentially, and traditional technology can no longer meet the performance requirements of acquiring and storing massive unstructured data. In response, this thesis studies data processing and public opinion information processing technology and implements a hot medical news system for a big data environment, so that users can reach the most popular health information more quickly and accurately. The main work is as follows:
1. A distributed incremental Nutch crawler was built in the big data environment and used to crawl news data from well-known Chinese medical websites in real time. A text parsing algorithm based on tag attributes extracts data from the crawled pages, and a Chinese word segmentation algorithm using a double-word hash mechanism processes the text.
2. The TF-IDF algorithm extracts keywords from the segmentation results: the 16 words with the highest TF-IDF values in the word set are selected, and a 16-dimensional vector space model of the text is constructed from them.
3. The traditional single-pass clustering algorithm is improved. A method for calculating keyword weights is introduced to improve clustering accuracy. Each topic's news collection is represented by its cluster center, which improves clustering efficiency. A time function model tracks the heat of news over time, which improves the accuracy of hot news detection.
4. Building on the research and improvement of the above algorithms, a hot medical news system for the big data environment was implemented with the SSH web framework, large-scale data stored in the HBase unstructured database, and a graph model.
Keywords/Search Tags: Internet healthcare, Big data, Nutch crawler, Single-Pass clustering
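
Steps 2 and 3 of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the keyword-weight formula or the time function model, so neither is reproduced here. The sketch assumes a smoothed TF-IDF variant, cosine similarity, a running-mean cluster center, and a similarity threshold of 0.3; all function names and parameters are hypothetical. Each document is truncated to its top k terms (k = 16 in the thesis) and then assigned, in a single pass, to the most similar existing cluster center or to a new cluster.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k=16):
    """Return one sparse vector {term: weight} per document, keeping only
    the k terms with the highest TF-IDF score (assumed smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        }
        # Sort by descending score, break ties alphabetically for determinism.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        vectors.append(dict(top))
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.3):
    """Baseline single-pass clustering: each vector is compared only with
    cluster centers, never with every member of every cluster."""
    clusters = []  # each cluster: {"centroid": dict, "members": [doc index]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Update the center as the running mean of its member vectors.
            m = len(best["members"])
            old = best["centroid"]
            terms = set(old) | set(vec)
            best["centroid"] = {
                t: (old.get(t, 0.0) * m + vec.get(t, 0.0)) / (m + 1)
                for t in terms
            }
            best["members"].append(i)
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return clusters


if __name__ == "__main__":
    # Toy pre-segmented documents (hypothetical data, not from the thesis).
    docs = [
        ["flu", "vaccine", "flu", "season"],
        ["flu", "vaccine", "shortage"],
        ["diabetes", "insulin", "price"],
        ["insulin", "price", "diabetes", "diabetes"],
    ]
    clusters = single_pass(tfidf_top_k(docs))
    print([c["members"] for c in clusters])
```

Comparing each incoming document against one center per cluster, rather than against all stored news items, is what keeps the cost of a single pass proportional to the number of clusters. The threshold controls granularity: a higher value yields more, smaller topics.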