
Research And Implementation Of Aggregation Analysis System Of Science And Technology Information

Posted on: 2018-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2348330518495961
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the internet, the volume of information and data is growing rapidly, and valuable information is buried under redundant and useless content. To find information of interest, users have to turn to a search engine, enter keywords, and review the results page by page, which is distracting. Likewise, when users encounter an interesting or unfamiliar entity while reading, they typically search the web for detailed information about it. On mobile terminals, especially mobile phones, this takes considerable effort: users must select the entity and submit a query to a search engine, which diverts their attention and reduces their interest in reading. A short passage that analyzes and explains the entity would greatly improve the reading experience.

Some information aggregation systems based on RSS technology make it much more convenient for users to obtain information. However, not all sites support the RSS specification, and no existing aggregation system performs entity analysis in the field of science and technology information. This paper uses crawler and RSS technologies to aggregate topical articles from various fields of science and technology information, helping users keep up with the latest technology and research progress. Moreover, to help users analyze valuable entities, this paper studies in depth the algorithms at each stage of Entity Linking, based on open source Entity Linking systems such as Dexter, TAGME and WAT. It then combines an open source Chinese word segmenter with English Entity Linking methods to explore and implement a Chinese Entity Linking system.
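The RSS side of the aggregation described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's implementation: the function names `parse_rss` and `aggregate`, the sample feeds, and the choice to deduplicate by link are all assumptions made for the example, and the crawler path for sites without RSS is not covered.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feeds; a real system would download these over HTTP.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Entity linking toolkit released</title>
<link>http://example.org/el</link>
<pubDate>Mon, 07 May 2018 10:00:00 GMT</pubDate></item>
<item><title>Crawler scheduling notes</title>
<link>http://example.org/crawl</link>
<pubDate>Sun, 06 May 2018 08:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Entity linking toolkit released</title>
<link>http://example.org/el</link>
<pubDate>Mon, 07 May 2018 10:00:00 GMT</pubDate></item>
<item><title>Word segmentation benchmark</title>
<link>http://example.org/seg</link>
<pubDate>Tue, 08 May 2018 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_rss(xml_text, source):
    """Extract title, link and publication date from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return entries


def aggregate(feeds):
    """Merge several feeds, drop duplicate links, and sort newest first."""
    seen_links = set()
    merged = []
    for source, xml_text in feeds.items():
        for entry in parse_rss(xml_text, source):
            if entry["link"] in seen_links:
                continue  # the same article was already seen in another feed
            seen_links.add(entry["link"])
            merged.append(entry)
    merged.sort(key=lambda e: e["published"], reverse=True)
    return merged


articles = aggregate({"feed-a": FEED_A, "feed-b": FEED_B})
for article in articles:
    print(article["published"].date(), article["source"], article["title"])
```

Deduplicating by link keeps the first feed that reported an article; a production aggregator would likely also normalize URLs and handle feeds with missing or malformed dates.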
However, after developing our system, we found that excessive linking of entities can be distracting and degrade the user experience, especially when the links are useless or unhelpful. Some methods have been proposed to solve this problem, but traditional ones such as LinProb and SVM are either too complex to implement or too simple to give helpful results. This paper proposes a new method for evaluating how helpful a linked entity is to users, so that unhelpful results can be pruned. Experimental results show that our method outperforms baseline pruning methods in the science and technology information field.

Based on the above research, this paper designs and implements an information aggregation system and a Chinese-English Entity Linking system. The third chapter demonstrates the correctness of our pruning algorithm and the necessity of using a Chinese word segmenter in our system. The fifth chapter shows that the proposed algorithm meets the needs of the system, and that the system gives users access to the science and technology information they need and makes their reading more engaging.
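To make the idea of pruning unhelpful links concrete, the sketch below shows a simple link-probability threshold of the kind used as a baseline in systems such as TAGME. It is not the method proposed in the thesis, whose details are not given in this abstract. Link probability is taken here as the number of times a phrase appears as a link anchor divided by the number of times it appears at all; the counts, the function names, and the threshold value are hypothetical.

```python
# Hypothetical anchor statistics: phrase -> (times used as a link, total occurrences).
ANCHOR_COUNTS = {
    "entity linking": (80, 100),
    "machine learning": (300, 1000),
    "the": (1, 10000),
}


def link_probability(anchor_counts, phrase):
    """Fraction of a phrase's occurrences in which it is used as a link anchor."""
    linked, total = anchor_counts.get(phrase, (0, 0))
    return linked / total if total else 0.0


def prune(candidates, anchor_counts, threshold=0.1):
    """Keep only candidate mentions whose link probability reaches the threshold."""
    return [
        phrase for phrase in candidates
        if link_probability(anchor_counts, phrase) >= threshold
    ]


mentions = ["entity linking", "the", "machine learning", "unseen phrase"]
kept = prune(mentions, ANCHOR_COUNTS, threshold=0.1)
print(kept)
```

A single threshold like this is easy to implement but ignores context and reader interest, which is consistent with the abstract's remark that the simpler traditional methods may not give helpful results.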
Keywords/Search Tags:science and technology information, aggregation, entity linking, machine learning