Study And Implementation Of Attribute Discovery Oriented Collaborative And Iterative Search System

Posted on: 2015-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Zheng
GTID: 2348330509460907
Subject: Software engineering
Abstract/Summary:
Search engines usually return a very large number of results, and users must open several result pages to find the information they need. In general, the entities and attribute information extracted from a webpage's title or text indicate the content of that page well. Sometimes those entities and attributes are the exact answer the user is looking for, for example when searching for a famous person's birthplace or spouse's name, or for the price or manufacturer of a certain product. In addition, some search tasks, such as making travel plans, require searching iteratively and sharing the task with collaborators who work out the plan together. On this basis, we developed an attribute discovery oriented collaborative and iterative search system. Our main work includes:

1. We propose a model of an attribute discovery oriented collaborative and iterative search system. The model consists of several functional modules: web crawling, information processing, information storage, iterative searching, and multi-user collaboration. From the result list returned by a search engine, we extract the title, text, and other information of every webpage, then preprocess the text with Chinese word segmentation tools. The entities and their attribute information are then extracted, sorted, and displayed in the system UI, where users can pick the entities they find useful. Users can set up a search task, and the picked entities are stored in the task as clues. During iterative searching, the clues are used to generate new search keywords through keyword weight calculation. Each task can be completed by multiple collaborators working together.

2. We provide an implementation of the model. We use the CAS NLPIR Chinese word segmentation tool and the Stanford Named Entity Recognizer for word segmentation and named entity recognition in text processing. We use Redis as the database because it handles frequent reads and writes well and can process the large amounts of data generated by collaborators working together. We design the storage scheme around Redis data structures such as key-value pairs, hashes, and sorted sets. Finally, we design and implement the system UI using Java SWT.

3. We ran tests on the system, including functional tests to verify that the system works properly and accuracy tests to evaluate the correctness of the extracted data. We analyzed the experimental results and pointed out the system's deficiencies and room for improvement.
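
The abstract does not give the keyword weight formula or the Redis key layout, so the following is only a minimal sketch of how picked clues might drive the next query in an iterative search. It assumes, purely for illustration, that a clue's weight grows each time a collaborator picks it and that the new query is the original keyword plus the heaviest clues. The `SearchTask` class, its method names, and the sample clues are hypothetical, and a plain dictionary stands in for a Redis sorted set.

```python
from collections import defaultdict


class SearchTask:
    """Hypothetical in-memory stand-in for one shared search task.

    The dict plays the role of a Redis sorted set (member -> score).
    The thesis stores task data in Redis; its actual layout is not given.
    """

    def __init__(self, seed_keyword):
        self.seed_keyword = seed_keyword
        self.clue_scores = defaultdict(float)  # clue text -> accumulated weight

    def add_clue(self, clue, weight=1.0):
        # Roughly what ZINCRBY does: repeated picks raise the clue's score.
        self.clue_scores[clue] += weight

    def top_clues(self, k):
        # Highest score first; ties are broken alphabetically (our own choice).
        ranked = sorted(self.clue_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [clue for clue, _ in ranked[:k]]

    def next_query(self, k=2):
        # Assumed rule: seed keyword followed by the k heaviest clues.
        return " ".join([self.seed_keyword] + self.top_clues(k))


task = SearchTask("Hangzhou travel")
task.add_clue("West Lake")          # picked by collaborator A
task.add_clue("Lingyin Temple")     # picked by collaborator A
task.add_clue("West Lake")          # picked again by collaborator B
task.add_clue("Longjing tea", 0.5)  # a weaker clue
print(task.next_query())  # Hangzhou travel West Lake Lingyin Temple
```

In a Redis-backed version, the same accumulation and ranking could be done with sorted set commands on a per-task key, which would let several collaborators update the clue weights concurrently.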
Keywords/Search Tags: Attribute Discovery, Iterative Search, Collaborative Search, Named Entity, Redis