The Research And Realization Of Screening System Orienting To The Sensitive Message About Web

Posted on: 2017-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yuan
GTID: 2348330512458002
Subject: Software engineering
Abstract/Summary:
The Internet is now part of daily life: we use it to keep in touch with friends, play games, write, and much more, and it is hard to imagine living without it. A vast amount of information is stored on the Internet, and collecting and using that information benefits both work and everyday life. The drawback is the sheer volume. Finding the useful messages in it requires technology that retrieves the relevant content for us, and that is the aim of this research. Existing technology can already retrieve such content, but we want to make the retrieval more precise. In this thesis we therefore combine a Web spider with indexing to screen the data twice, and we design a system that is easier to operate.

First, we crawl the Web by topic. The topics are set according to the needs of our department. A module then analyzes and filters the crawled data; this is the first screening, and the resulting file set is kept locally, where it can be read in its converted format. The second screening is performed with lucene, an indexing tool. After the two screenings the results are more precise. We then analyze these results and decide what to do next.

The system is built with two main tools, nutch and lucene, and is written in Java. After the data is crawled from the Web it is stored locally, and the system then screens the stored data again. At the same time, we extract the words related to each sensitive topic word, which gives officers more information on which to base their judgment. This is the goal of the system: it makes the work more convenient and helps us act more decisively on intelligence.

In the future we will further improve this system, and we will do our best to contribute to our country through our work.
Keywords/Search Tags: Web information extraction, search engine, Web spider, nutch, lucene
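
The two-stage screening described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis implementation: it assumes the pages have already been crawled into memory (no nutch), and it swaps lucene for a plain in-memory inverted index so that the example runs with the standard library alone. The class name `TwoPassScreening`, its method names, the topic terms, and the sample pages are all hypothetical.

```java
import java.util.*;

// Illustrative sketch only: an in-memory stand-in for the crawl -> filter -> index pipeline.
public class TwoPassScreening {

    // Lowercase the text and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // First screening: keep only pages that mention at least one (lowercase) topic term.
    // Returns page id -> tokens, where the id is the page's position in the input list.
    static Map<Integer, List<String>> firstPass(List<String> pages, Set<String> topicTerms) {
        Map<Integer, List<String>> kept = new LinkedHashMap<>();
        for (int id = 0; id < pages.size(); id++) {
            List<String> tokens = tokenize(pages.get(id));
            for (String t : tokens) {
                if (topicTerms.contains(t)) {
                    kept.put(id, tokens);
                    break;
                }
            }
        }
        return kept;
    }

    // Build an inverted index (term -> ids of pages containing it) over the kept pages.
    // This plays the role that lucene plays in the thesis.
    static Map<String, SortedSet<Integer>> buildIndex(Map<Integer, List<String>> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : docs.entrySet()) {
            for (String t : e.getValue()) {
                index.computeIfAbsent(t, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    // Second screening: look up a sensitive term in the index.
    static SortedSet<Integer> secondPass(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Collect the words within `window` positions of each occurrence of the term,
    // as a simple reading of "words related to the sensitive topic word".
    static List<String> context(List<String> tokens, String term, int window) {
        List<String> related = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(needle)) continue;
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size(), i + window + 1);
            for (int j = from; j < to; j++) {
                if (j != i) related.add(tokens.get(j));
            }
        }
        return related;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "Football scores from the weekend",
            "Leaked budget report mentions the border incident",
            "Border crossing times for tourists");
        Map<Integer, List<String>> kept = firstPass(pages, Set.of("border", "budget"));
        Map<String, SortedSet<Integer>> index = buildIndex(kept);
        for (int id : secondPass(index, "incident")) {
            System.out.println(id + " -> " + context(kept.get(id), "incident", 2));
        }
    }
}
```

In this sketch the first pass discards off-topic pages before anything is indexed, so the second pass searches a smaller, already relevant set. That is one plausible reading of why screening twice would give more precise results than a single keyword search.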