
The Effect and Role of Part of Speech in the Retrieval of Chinese Scientific and Technical Literature

Posted on: 2009-09-23  Degree: Master  Type: Thesis
Country: China  Candidate: B B Cheng  Full Text: PDF
GTID: 2178360272988285  Subject: Information Science
Abstract/Summary:
Part-of-speech (POS) tagging is a relatively mature technique in the lexical analysis stage of natural language processing, and natural language processing plays a decisive role in information retrieval. POS has been studied to some extent in information retrieval for foreign-language literature, where it was shown to have some effect, but only a small one. The purpose of this thesis is to study the use of part of speech in searching Chinese scientific and technical literature, and to use survey data to measure the size of its effect and role.

In the course of the study, an Animal Husbandry and Veterinary corpus and a working vocabulary were built. POS tagging was carried out with ICTCLAS, the Chinese lexical analysis system based on a multi-layer Hidden Markov Model developed by the Institute of Computing Technology of the Chinese Academy of Sciences, together with CARMM, a system with an unknown-word function designed by Chengchong, a graduate student at Nanjing Agricultural University, and the Animal Husbandry and Veterinary vocabulary table. The POS tag set used for Chinese text is the optional POS tagging tag set (Peking University edition).

The study used two ways of extracting search terms and several retrieval models. Search terms were extracted either by a 14-dimensional POS approach, which retains terms according to their part of speech, or with manual participation. The retrieval models were the traditional Boolean logic retrieval model, a "partial match" Boolean logic model, and the vector space model. Because the vector space model depends on a threshold value, which has its own shortcomings, this study used two thresholds, 2% and 5%, and so obtained several sets of survey data. From these data, results were obtained for retrieval with POS and retrieval without POS.

The survey results are evaluated in four ways: summary statistics tables (including a table of the four indicators R, P, Ray, and Pay for each retrieval question), line charts of R and P, histograms of the differences in R and P, and a table of the average differences in R and P. The final results show that, in terms of recall ratio, retrieval without POS performs better than retrieval with POS. In terms of precision ratio, retrieval with POS performs better than retrieval without POS, except under the "partial match" Boolean logic model, where retrieval without POS achieves higher precision. Overall, retrieval with POS did not show much superiority. Moreover, the results indicate that the choice of retrieval model is a factor that constrains the final result of POS-based retrieval.

The contributions of this study fall mainly into four areas. First, part of speech is applied to Chinese literature retrieval for the first time. Earlier work examined the effect of part of speech on information retrieval performance, but it was based on non-Chinese literature; this thesis extends that work to another language, makes the study of part of speech in retrieval more comprehensive, and fills a gap in research on part of speech in Chinese literature retrieval. Second, the words in the documents and the search terms were reduced to 14 POS dimensions, which improved retrieval efficiency. Third, a "partial match" Boolean logic model that can be used for POS-based retrieval was designed. Fourth, survey data were used to measure the degree to which part of speech affects Chinese document retrieval.
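The evaluation described above compares recall (R) and precision (P) per question for retrieval with and without POS, then averages the differences. A minimal sketch of that arithmetic follows, assuming the standard set-based definitions of recall and precision. The function names, the question IDs, and the toy document sets are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def recall_precision(retrieved, relevant):
    """Return (R, P) for one question, given collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def average_margin(with_pos, without_pos, relevant):
    """Average difference in (R, P), POS minus non-POS, over all questions."""
    r_diffs, p_diffs = [], []
    for question in relevant:
        r_pos, p_pos = recall_precision(with_pos[question], relevant[question])
        r_non, p_non = recall_precision(without_pos[question], relevant[question])
        r_diffs.append(r_pos - r_non)
        p_diffs.append(p_pos - p_non)
    return mean(r_diffs), mean(p_diffs)


# Hypothetical toy data: relevant documents and the documents returned
# by each retrieval run, keyed by question ID.
relevant = {"q1": {1, 2, 3, 4}, "q2": {5, 6}}
with_pos = {"q1": {1, 2}, "q2": {5, 7}}
without_pos = {"q1": {1, 2, 3, 8, 9, 10}, "q2": {5, 6, 7, 8}}

r_margin, p_margin = average_margin(with_pos, without_pos, relevant)
print(f"average R margin: {r_margin:+.3f}")
print(f"average P margin: {p_margin:+.3f}")
```

A negative R margin together with a positive P margin in this toy data mirrors the general pattern the abstract reports: POS-based retrieval returns fewer documents, which tends to lower recall and raise precision.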
Keywords/Search Tags: part-of-speech tagging, Chinese document retrieval, natural language questions, retrieval evaluation