
Research On Domain-Oriented IR System Architecture And Related Techniques

Posted on: 2011-09-09  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Huang  Full Text: PDF
GTID: 2178330338979955  Subject: Computer Science and Technology
Abstract/Summary:
For a domain-oriented information retrieval (IR) system, precision is usually the most important requirement. Current general-purpose IR systems perform poorly on highly domain-specific information, and some cannot be used at all, so systems must be built for specific domains. Two main aspects should be considered when building such a system: (1) design the system architecture according to the actual deployment situation; (2) design the functional components according to the domain. The task of this thesis is to design and implement a domain-oriented IR system, and these two aspects are its main research focus.

1. System architecture is always a key part of applied computing research. The goal of an application is not to demonstrate theoretical value but to make daily work easier, more efficient, and more automated. Architecture design addresses how to fit the system to the specific domain, how to work within the application environment, and how to guarantee deployment. Building on the general IR system architecture, this thesis presents a domain-oriented IR system architecture. It meets requirements on timely response, query throughput, and failure recovery, which helps the system handle daily workloads.

2. What a system can do determines its quality. Starting from the basic functions of a general IR system, this thesis designs the functional components of the system according to the characteristics of the domain. These components use a range of text mining techniques and incorporate domain knowledge, which makes them effective and efficient at handling domain information. The components are word segmentation, keyword extraction, clustering, query expansion, and automatic summarization. Each is designed and implemented by adapting both the algorithm and the implementation technique to the domain. Word segmentation achieves a precision of 97.10% and a recall of 98.99%; keyword extraction achieves a precision of 93.36% and a recall of 95.06%; and the keyword coverage rate of automatic summarization reaches 94.56%.

3. The system is implemented with a popular lightweight J2EE framework, which keeps each part clearly separated and makes the system simple and highly maintainable.
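The abstract reports precision, recall, and keyword coverage without giving their definitions. The following is a minimal sketch of how such metrics are commonly computed, not the thesis's own evaluation code. It assumes that segmentation is scored by exact word-span matches against a gold standard and that keyword coverage is the fraction of keywords appearing in the summary; the function names and the toy data are hypothetical.

```python
def to_spans(words):
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_precision_recall(predicted, gold):
    """Score a predicted segmentation against a gold one by exact span match.

    Assumed definition: a predicted word is correct only if both of its
    boundaries agree with a word in the gold segmentation.
    """
    pred_spans, gold_spans = to_spans(predicted), to_spans(gold)
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


def keyword_coverage(summary, keywords):
    """Fraction of keywords that occur as substrings of the summary (assumed definition)."""
    if not keywords:
        return 0.0
    covered = sum(1 for keyword in keywords if keyword in summary)
    return covered / len(keywords)


# Toy data: ASCII strings stand in for segmented text.
gold = ["ab", "cd", "ef"]
predicted = ["ab", "cdef"]  # the last two gold words were merged into one
precision, recall = segmentation_precision_recall(predicted, gold)

coverage = keyword_coverage(
    "domain ir system architecture",
    ["domain", "architecture", "cluster"],
)
```

In the toy case, one of the two predicted words matches a gold word, so precision is 1/2 and recall is 1/3. Two of the three keywords appear in the summary, so coverage is 2/3.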
Keywords/Search Tags: system architecture, J2EE architecture, information retrieval, domain knowledge mining