
Design And Implementation Of Topic Tracking System Based On Frontier Hot Technology

Posted on: 2021-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhu
GTID: 2518306575453884
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the continued spread of 5G, the network has gradually become a main way for people to obtain information. Frontier hot technologies are major technologies in the high-tech field that are forward-looking, pioneering, and exploratory. They are an important foundation for future high-tech upgrades and the development of emerging industries, and a comprehensive manifestation of a country's capacity for technological innovation. For a company or department, it is very important to find the frontier hot topics that suit its own needs within this massive, disorderly body of network information, and to follow their development status and trends. It is therefore of great value to research and design a topic tracking system for frontier hot technologies.

The core idea of topic generation is to merge similar text information. The traditional method forms topics by calculating text similarity and clustering the texts. However, when the amount of data is very large, the vector dimension often becomes too large, the calculation becomes difficult, and the approach no longer fits real-world conditions. The traditional topic generation model, LDA, is prone to producing topics whose keywords are highly similar to one another. In the frontier hot technology topic tracking system, the raw data consists of news, blogs, and papers, whose titles can represent the central idea of the article. This thesis therefore introduces the concept of information entropy, used here to represent the influence of an event on what follows, and proposes a title-based text clustering algorithm that combines K-means and Canopy. On the basis of this clustering, an LDA topic model based on information entropy is used for topic generation, which further improves topic generation accuracy.

In terms of topic tracking, most traditional methods predict an expected result value from previous data, but this approach is not suitable here. In response, this thesis proposes a hot topic evaluation model to evaluate topic popularity, and tracks topics using a text similarity algorithm. In the word segmentation module, the jieba ("stuttering") word segmentation model is improved, and word segmentation testing and entity extraction functions are designed and implemented, allowing users to better understand frontier hot topics.

On the basis of this research, we designed and implemented the frontier hot technology topic tracking system, which mainly includes web crawler, information display, log recording, text word segmentation, text clustering, topic generation, and topic tracking functions. In a comparison test between the traditional topic generation model and the entropy-based LDA topic generation model, the improved topic model achieved an accuracy of 80% in topic generation and tracking, which meets the company's basic requirements.
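
To illustrate the idea of combining Canopy with K-means on title text, the following is a minimal sketch and not the thesis implementation. It assumes whitespace-tokenized English titles, bag-of-words vectors, cosine distance, and a single tight Canopy threshold `t2` used only to pick initial centers. All function names, the sample titles, and the threshold value are hypothetical. The information-entropy weighting and the LDA stage are not shown.

```python
import math
from collections import Counter


def vectorize(title):
    """Bag-of-words term counts for a whitespace-tokenized title."""
    return Counter(title.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def canopy_centers(vectors, t2=0.6):
    """Canopy pass used only for seeding: take the first remaining point
    as a center, then drop every point within distance t2 of it."""
    remaining = list(range(len(vectors)))
    centers = []
    while remaining:
        c = remaining[0]
        centers.append(vectors[c])
        remaining = [i for i in remaining[1:]
                     if cosine_distance(vectors[c], vectors[i]) > t2]
    return centers


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in members:
        total.update(v)  # Counter.update adds counts
    return Counter({t: c / len(members) for t, c in total.items()})


def kmeans(vectors, seeds, iterations=10):
    """K-means with the number of clusters and the initial centers
    both taken from the Canopy seeds."""
    centers = list(seeds)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by cosine distance.
        labels = [min(range(len(centers)),
                      key=lambda k: cosine_distance(v, centers[k]))
                  for v in vectors]
        # Update step: recompute each non-empty cluster's center.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = mean_vector(members)
    return labels


# Hypothetical sample titles, for illustration only.
titles = [
    "5G network slicing advances",
    "5G network deployment trends",
    "quantum computing error correction",
    "quantum computing hardware roadmap",
]
vectors = [vectorize(t) for t in titles]
seeds = canopy_centers(vectors)
labels = kmeans(vectors, seeds)
print(len(seeds), labels)
```

In this sketch the Canopy pass decides how many clusters there are, so K-means does not need a cluster count supplied in advance. Clustering on titles alone keeps the vectors short, which relates to the dimensionality concern raised in the abstract.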
Keywords/Search Tags:Web Crawler, Text Clustering, Topic Generation, Topic Tracking