Study On The Theory & Practice Of Automatic Indexing Of WWW Science And Technology Information Resources

Posted on: 2002-10-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: M Xiao  Full Text: PDF
GTID: 1118360032451208  Subject: Library science
Abstract/Summary:
By XIAO MING
Directed by Professor SHEN YING

With the rapid development and widespread application of the Internet, it has become the largest information resource library and the main channel of information communication today. The primary characteristic of Internet information resources is that they are massive in quantity and disordered in quality, which gives rise to a strange phenomenon: "rich in data, poor in knowledge". Most current search engines are full-text retrieval systems based on keywords. They seldom consider the semantics of Internet information resources and cannot meet the differing demands of most Internet users, so many people find it difficult to use them to reach the valuable knowledge on the Internet. In this doctoral dissertation the author studies the theory and practice of automatic indexing of WWW science and technology information resources.

The main purposes of this research are: (1) to provide technical support for the processing of Internet information resources; (2) to make it convenient for ordinary users to access Internet information resources; (3) to provide new research ideas for the development of digital libraries in China.

The dissertation presents a design scheme for an automatic indexing system for WWW science and technology information resources (STAI), based on the Chinese Classification & Subject Thesaurus. The STAI system has several functions: automatic detection and conversion of web page file formats, automatic word segmentation and automatic keyword indexing, automatic classification indexing, and automatic subject indexing. STAI is easy-to-use software with a high degree of automation.
This experimental system can index Chinese and English web pages automatically at the same time, and it considers how to combine the respective retrieval advantages of natural language and information retrieval language. During the design and implementation of STAI, the author made innovative attempts in several areas: applying the structured programming method; designing and applying ActiveX controls to improve the reuse and portability of program code; introducing the new concept of the "class phrase" for the first time in this dissertation; and designing mapping tables that tightly integrate natural language with information retrieval language, such as the "class phrase and subject mapping table" and the "subject and class numbers mapping table", which are used to implement automatic classification indexing and automatic subject indexing. This research has laid a good foundation for the future development of STAI as a software product with independently owned intellectual property rights.

In addition, the author makes a systematic study of the theory and methods of automatic document indexing. The many reference materials cited in this dissertation should be useful to researchers in related fields...
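
As a rough illustration of how the two mapping tables named in the abstract might chain together, the following Python sketch looks up class phrases in a page's text, maps them to subject headings, and then maps those headings to class numbers. It is only a sketch under stated assumptions: the abstract does not describe the tables' contents or the matching method, so every table entry, the class numbers, the function name index_page, and the simple substring matching are hypothetical and are not taken from STAI.

```python
# Hypothetical "class phrase and subject mapping table".
# Every entry is invented for illustration; none comes from the dissertation.
CLASS_PHRASE_TO_SUBJECT = {
    "automatic indexing": "Automatic indexing",
    "word segmentation": "Word segmentation",
    "search engine": "Information retrieval",
}

# Hypothetical "subject and class numbers mapping table".
# The class numbers are placeholders, not real classification codes.
SUBJECT_TO_CLASS_NUMBERS = {
    "Automatic indexing": ["C001"],
    "Word segmentation": ["C002"],
    "Information retrieval": ["C003", "C004"],
}


def index_page(text):
    """Return (subjects, class_numbers) for the class phrases found in text.

    Matching is a plain case-insensitive substring test, which is an
    assumption made here; a real system would work on segmented words.
    """
    lowered = text.lower()

    # Step 1: class phrase -> subject heading (subject indexing).
    subjects = []
    for phrase, subject in CLASS_PHRASE_TO_SUBJECT.items():
        if phrase in lowered and subject not in subjects:
            subjects.append(subject)

    # Step 2: subject heading -> class numbers (classification indexing).
    class_numbers = []
    for subject in subjects:
        for number in SUBJECT_TO_CLASS_NUMBERS.get(subject, []):
            if number not in class_numbers:
                class_numbers.append(number)

    return subjects, class_numbers


subjects, class_numbers = index_page(
    "A search engine that relies on automatic indexing of web pages."
)
print(subjects)
print(class_numbers)
```

Keeping the two tables separate means a subject heading can be reassigned to different class numbers without touching the phrase list, which is one plausible reason for a two-table design.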
Keywords/Search Tags: automatic classification, automatic indexing, automatic word segmentation, automatic keyword indexing, automatic subject indexing