Font Size: a A A

Compound Analysis And Its Application In IR

Posted on: 2009-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: C X Chen
GTID: 2178360242976736
Subject: Computer Software and Theory
Abstract/Summary:
Nominal compounds (compound nouns) are an important topic in natural language processing and play an increasingly important role in the computer processing of language. Research on compound nouns has wide practical application in fields such as information retrieval, machine translation, and text classification.

Methods for extracting and recognizing compound nouns generally fall into three categories: statistics-based, rule-based, and hybrid methods that combine the two. Statistics-based methods depend heavily on the corpus and often make errors on low-frequency compound nouns and on high-frequency word sequences that are not compound nouns. Rule-based methods require manually written rules, but the relevant linguistic regularities are usually hard to discover, and a purely rule-based method cannot update its rules dynamically. Hybrid methods rely on language-specific rules, so they are tied to a single language and to the resources available for it, which limits their applicability. This thesis explores the extraction and recognition of compound nouns, applies the maximum entropy (ME) model to this task in a novel way, and achieves some positive results.

The thesis begins with compound noun recognition and explores the feasibility of extracting and recognizing compound nouns. It then uses an ME model to classify candidate terms. The ME model takes into account the frequency of the words that make up a compound noun, as well as part-of-speech (POS) information, the number of times the candidate appears, the candidate's length, and other features. The thesis also makes novel use of the Web as a large corpus for mining information about compound nouns.
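As a rough illustration of the candidate-judging step described above, here is a minimal Python sketch. It assumes several things the abstract does not specify: a hypothetical tag set in which "n" marks nouns and "vn" marks nominalized verbs, a simple POS-pattern filter for generating candidates, real-valued encodings of the listed features (candidate frequency, component-word frequency, length, POS), and a stand-in Web hit count in place of real Web queries. For two classes, a conditional maximum entropy model has the same form as logistic regression, so the sketch trains it by batch gradient ascent on the L2-regularized log-likelihood. All names and counts are made up for illustration; this is not the thesis's implementation.

```python
import math
from collections import Counter

# Hypothetical tag set: "n" = noun, "vn" = nominalized verb, anything else = other.
NOUN_LIKE = {"n", "vn"}


def extract_candidates(tagged, max_len=4):
    """Collect contiguous runs of 2..max_len tokens whose tags are all noun-like."""
    candidates = []
    for i in range(len(tagged)):
        for j in range(i + 2, min(i + max_len, len(tagged)) + 1):
            span = tagged[i:j]
            tags = tuple(tag for _, tag in span)
            if all(t in NOUN_LIKE for t in tags):
                candidates.append((tuple(w for w, _ in span), tags))
    return candidates


def features(words, tags, cand_count, word_freq, web_hits):
    """Real-valued features for one candidate; web_hits stands in for a Web page count."""
    comp = [math.log1p(word_freq[w]) for w in words]
    return [
        1.0,                           # bias
        math.log1p(cand_count),        # occurrences of the whole candidate
        sum(comp) / len(comp),         # mean log-frequency of the component words
        float(len(words)),             # candidate length in words
        1.0 if "vn" in tags else 0.0,  # contains a nominalized verb
        math.log1p(web_hits),          # hypothetical Web hit count
    ]


def predict_proba(w, x):
    """P(compound | x) under a binary ME (logistic) model, computed stably."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_me(X, y, lr=0.05, l2=0.01, epochs=2000):
    """Maximize the L2-regularized conditional log-likelihood by batch gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [-l2 * wi for wi in w]
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for k, xk in enumerate(xi):
                grad[k] += err * xk
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w


# Toy, made-up counts purely for illustration.
word_freq = Counter({"information": 30, "retrieval": 20, "machine": 25,
                     "translation": 18, "result": 40, "page": 35,
                     "system": 50, "today": 12})
labelled = [
    (("information", "retrieval"), ("n", "vn"), 12, 9000, 1),
    (("machine", "translation"), ("n", "vn"), 9, 7000, 1),
    (("result", "page"), ("n", "n"), 1, 40, 0),
    (("system", "today"), ("n", "n"), 1, 5, 0),
]
X = [features(w, t, c, word_freq, h) for w, t, c, h, _ in labelled]
y = [label for *_, label in labelled]
weights = train_me(X, y)
for (words, *_), x in zip(labelled, X):
    print(" ".join(words), round(predict_proba(weights, x), 3))
```

The toy labels only demonstrate the mechanics; with real data, the feature encoding, the regularization strength, and the source of the Web counts would all have to be chosen and validated.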
The thesis also applies compound analysis to Web information retrieval, which suggests a possible direction for intelligent search. Three experiments are reported. The first assigns a dedicated POS tag to nominalized verbs; it indicates that nominalized verbs, which are especially significant for compound noun extraction and recognition, should be tagged differently from ordinary verbs. The second uses the ME model and Web information to classify compound noun candidates, showing that compound analysis based on Web mining is feasible. The third explores the application of compound analysis to information retrieval.
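The abstract does not describe how compound analysis is applied to retrieval. As one illustrative possibility, not the thesis's method, the Python sketch below ranks documents by bag-of-words overlap with the query and adds a bonus whenever a compound recognized in the query appears intact in a document. The function names, the bonus weight, and the scoring formula are hypothetical.

```python
from collections import Counter


def contains_phrase(tokens, phrase):
    """True if the tuple `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def score(doc_tokens, query_tokens, query_compounds, bonus=2.0):
    """Bag-of-words overlap plus a hypothetical bonus for each intact query compound."""
    tf = Counter(doc_tokens)
    s = float(sum(tf[t] for t in set(query_tokens)))
    for compound in query_compounds:
        if contains_phrase(doc_tokens, compound):
            s += bonus  # reward documents that keep the compound together
    return s


def rank(docs, query_tokens, query_compounds):
    """Return document ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(docs[d], query_tokens, query_compounds),
                  reverse=True)


docs = {
    "d1": "retrieval of information about a system".split(),
    "d2": "an information retrieval system".split(),
}
query = ["information", "retrieval"]
compounds = [("information", "retrieval")]
print(rank(docs, query, compounds))  # ['d2', 'd1']
```

Without the compound bonus both documents tie at 2.0, since each contains "information" and "retrieval" once; recognizing the query compound is what separates the document that uses the term from the one that merely mentions both words.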
Keywords/Search Tags: verb nominalization, compound, maximum entropy (ME) model, corpus, information retrieval (IR)
PDF Full Text Request
Related items