
Research and Implementation of the "Yilan" Intelligent News Client

Posted on: 2016-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: G S Lv
Full Text: PDF
GTID: 2298330467993037
Subject: Mechanical Manufacturing and Automation
Abstract/Summary:
Online news systems began to appear in the 1990s, and with the rapid development of the Internet they are gradually replacing traditional media. Browsing real-time news on the web has become an important way for people to get information. In this age of information explosion, however, the problem of information overload has gradually worsened, and leaving users to find information manually wastes a great deal of their time and effort. More and more users want a pleasant reading experience in which they quickly and accurately obtain the information that interests them. These new user needs, together with the large volume of news data, have driven the further development of news systems and placed higher demands on news system developers.

With the help of a user experience designer, I designed, researched, and developed a complete news client system. The client offers the innovative functions of massive news aggregation and news speech playback, along with the conventional functions of news reading, collecting, sharing, and subscription. This paper describes the design, development, and testing of the system in detail, together with the related research behind its implementation: web search, machine learning, text deduplication, and WebService technology.

The system efficiently and automatically aggregates news resources from major sites for its users, collecting tens of thousands of real-time news items every day and providing a pleasant reading experience. For the task of extracting structured data from news pages, this paper presents a noise elimination algorithm based on KNN, which achieved a relatively high accuracy of 96% when applied to news data acquisition. Test results and user feedback indicate that the client offers good functionality and stability.
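
The abstract does not describe how the KNN-based noise elimination works, so the following is only a minimal sketch of the general idea: label each block of a web page as "content" or "noise" by majority vote among its k nearest labeled neighbors. The features (log text length, link density, punctuation density), the training blocks, and the names `block_features`, `knn_predict`, `classify`, and `TRAIN` are all hypothetical choices made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def block_features(text, link_chars):
    """Hypothetical features for one page block: log-scaled text length,
    link density (share of characters inside links), punctuation density."""
    n = len(text)
    if n == 0:
        return (0.0, 0.0, 0.0)
    punct = sum(text.count(c) for c in ",.;:!?")
    return (math.log1p(n), link_chars / n, punct / n)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`
    (Euclidean distance). `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled blocks: (text, characters inside links, label).
_SAMPLES = [
    ("The city council approved the new transit budget on Monday, "
     "after a long debate.", 0, "content"),
    ("Researchers said the results, published this week, could change "
     "how batteries are made.", 0, "content"),
    ("Officials confirmed that the bridge will reopen in March; "
     "repairs cost less than expected.", 0, "content"),
    ("Home | News | Sports | About", 28, "noise"),
    ("Share on Weibo  Share on WeChat", 31, "noise"),
    ("Copyright 2014 All rights reserved", 0, "noise"),
    ("Related articles  More from this author", 39, "noise"),
]
TRAIN = [(block_features(t, links), label) for t, links, label in _SAMPLES]


def classify(text, link_chars, k=3):
    """Label one page block as 'content' or 'noise'."""
    return knn_predict(TRAIN, block_features(text, link_chars), k)


if __name__ == "__main__":
    nav = "Login | Register | Subscribe"
    body = ("The minister announced a new education plan on Friday, "
            "and schools will adopt it next year.")
    print(classify(nav, len(nav)))
    print(classify(body, 0))
```

In a sketch like this, a crawler would keep only the blocks labeled "content" before indexing or deduplication. A real system would need far more labeled blocks and features scaled to comparable ranges, since KNN is sensitive to both.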
Keywords/Search Tags: web crawler, full-text index, machine learning, webpage noise elimination, text similarity