Design And Implementation Of Blog Media Analysis System

Posted on: 2019-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: R L Zhu
Full Text: PDF
GTID: 2428330548467235
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, blogs are used by more and more people to share information and discuss hot topics. By the end of the first half of 2015, the number of users had reached 47,457. Faced with massive and complicated blog data, users need ways to find information that is interesting and valuable to them, and they want to know which topics have been popular recently. Blog analysis, as an essential means of monitoring public opinion, is therefore increasingly necessary. However, some problems remain to be solved. For example, blogs are updated every day, so how can the analysis be made real-time, so that the corpus being analyzed is always the most recently published one? Another example is how to make better use of blog data so that blogs can be analyzed in a more targeted and comprehensive way.

The main work of this thesis is as follows:

(1) A blog media analysis system was designed and implemented. The system is divided into an early-stage corpus preprocessing module and a later-stage analysis module. The preprocessing module crawls and extracts the blog corpus. First, the blog corpus is analyzed in detail to determine the data and attributes needed to implement the system's functions, which guides the crawler. After crawling, each attribute to be used is extracted from the blog corpus, and the extracted corpus is indexed. Index construction is the prerequisite for retrieval, and the analysis module depends on this indexed corpus. The analysis module retrieves data from the previously built index and performs the corresponding analysis. Its functions include time search, keyword search, trend analysis, cluster analysis, and user analysis. After the design stage, the system framework and the technologies to be used were determined, and the system was implemented.

(2) Several distinctive designs and optimizations were added during implementation. For example, because the blog corpus is massive, the system uses distributed processing based on the distributed search engine ElasticSearch, whose underlying search library, Lucene, is currently recognized as having high performance. This improves the system's response speed when the data volume is huge and strengthens the system's scalability.
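
To make the time search, keyword search, and trend analysis functions more concrete, the sketch below builds request bodies in the ElasticSearch query DSL. It is only an illustration under stated assumptions, not the thesis's implementation: the field names `content` and `published_at` and both function names are hypothetical, and `calendar_interval` assumes ElasticSearch 7.x or later (older releases use `interval` instead).

```python
from datetime import date


def build_search_body(keyword, start, end, size=10):
    """Keyword search restricted to a publication-time window.

    `content` and `published_at` are hypothetical field names; the
    abstract does not name the indexed attributes.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the post body (scored).
                "must": [{"match": {"content": keyword}}],
                # Time window applied as a non-scoring filter.
                "filter": [
                    {
                        "range": {
                            "published_at": {
                                "gte": start.isoformat(),
                                "lte": end.isoformat(),
                            }
                        }
                    }
                ],
            }
        },
        # Newest posts first.
        "sort": [{"published_at": {"order": "desc"}}],
    }


def build_trend_body(keyword, start, end, interval="day"):
    """Trend analysis: count matching posts per time bucket."""
    body = build_search_body(keyword, start, end, size=0)
    body.pop("sort")  # no hits are returned, so sorting is unnecessary
    body["aggs"] = {
        "posts_over_time": {
            "date_histogram": {
                "field": "published_at",
                "calendar_interval": interval,  # assumes ElasticSearch 7.x+
            }
        }
    }
    return body


search_body = build_search_body("election", date(2015, 1, 1), date(2015, 6, 30))
trend_body = build_trend_body("election", date(2015, 1, 1), date(2015, 6, 30))
```

Either dictionary could be sent as the body of a search request against the index built by the preprocessing module. Keeping the time window in `filter` rather than `must` means it does not affect relevance scoring.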
Keywords/Search Tags: Blog media, Building index, ElasticSearch, Full text retrieval