The Construction Of Chinese Lexical Analysis Platform

Posted on: 2017-01-12  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Ye  Full Text: PDF
GTID: 2348330488959917  Subject: Computer technology
Abstract/Summary:
With the rapid development of modern information technology, the number of computers and the amount of information on the Internet are both increasing rapidly, and computer technology has become closely tied to people's daily lives. Natural language processing (NLP) is an important research topic in computer science whose object of study is human natural language. By studying and processing natural language, it aims to enable normal communication and mutual understanding between people and computers. Lexical analysis is an important problem in machine translation, and most machine translation tasks require it. At present, most research on and implementations of lexical analysis are confined to the experimental level, but in an Internet environment of mobile office work and resource sharing, an online platform has greater practical significance. Some practical lexical analysis platforms have appeared on the Internet that provide word segmentation and part-of-speech tagging. However, because their source code and part-of-speech tag sets cannot be modified, they cannot support follow-up research work.

Based on the laboratory's research on word segmentation and part-of-speech tagging, this thesis develops a lexical analysis platform. The platform adds a manual intervention module through which users can modify segmentation results. The platform periodically starts a thread that scans the modified results and finds those containing new words, which are used to expand the new word dictionary. Each user can also apply for a temporary dictionary and add the words they need to it in order to correct results temporarily. When submitting a task, users can choose the original segmentation method or the methods that use the new word dictionary and the temporary dictionary.
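As a rough illustration of how layered dictionaries and new-word collection might fit together, the sketch below uses forward maximum matching, a simple dictionary-based technique chosen here only for brevity; the abstract does not say which segmentation algorithm the platform uses. The function names `segment` and `find_new_words`, the sample sentence, and the dictionary contents are all hypothetical.

```python
from typing import Iterable, List, Set


def segment(text: str, *dictionaries: Set[str]) -> List[str]:
    """Forward maximum matching over the union of the given dictionaries.

    This is an assumed stand-in for the platform's segmenter, not its actual method.
    """
    vocab = set().union(*dictionaries)
    max_len = max((len(w) for w in vocab), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def find_new_words(corrected: Iterable[List[str]], *dictionaries: Set[str]) -> Set[str]:
    """Collect multi-character words from user-corrected results that no dictionary contains."""
    vocab = set().union(*dictionaries)
    return {w for sent in corrected for w in sent if len(w) > 1 and w not in vocab}


# Hypothetical data: a small base dictionary and one user-corrected sentence.
base = {"自然", "语言", "处理"}
text = "我爱自然语言处理"

original = segment(text, base)                    # original method: base dictionary only
corrected = [["我", "爱", "自然语言处理"]]          # result after manual intervention
new_word_dict = find_new_words(corrected, base)   # what a periodic scan might harvest
updated = segment(text, base, new_word_dict)      # segmentation with the new word dictionary
```

A per-user temporary dictionary could be passed as one more argument to `segment`, which keeps it separate from the shared dictionaries. In a running service, `find_new_words` would presumably be called from a scheduled background thread rather than inline as it is here.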
Users can run part-of-speech tagging either directly on a sentence or on manually corrected segmentation results. The corrected tagging results can also provide a foundation for improving tagging quality in the future.

This thesis compensates for the limitations of statistical methods by adding rules to the platform that cover low-probability events, achieving a better integration of statistics and rules. An open interface makes the functions convenient to call. The platform also includes modules for rights management and workload statistics. The back-end server is developed with SSM, a J2EE framework stack; the database is MySQL, and the system runs on Ubuntu.
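One way to picture the combination of statistics and rules is a tagger that picks the most probable tag for each word and then lets hand-written rules override that choice in rare contexts. The sketch below is an assumption about the general idea, not the thesis's implementation: the probability table `TAG_PROBS`, the rule `spend_rule`, and the tag labels (`n` noun, `v` verb, `r` pronoun, `x` unknown) are all invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical estimates of P(tag | word), as a statistical model might produce.
TAG_PROBS: Dict[str, Dict[str, float]] = {
    "他": {"r": 1.0},
    "花": {"n": 0.9, "v": 0.1},  # noun "flower" dominates; verb "to spend" is the rare case
    "钱": {"n": 1.0},
}

# A rule inspects the sentence at position i and returns a tag, or None if it does not apply.
Rule = Callable[[List[str], int], Optional[str]]


def spend_rule(words: List[str], i: int) -> Optional[str]:
    """Illustrative rule for a low-probability event: 花 directly before 钱 is a verb."""
    if words[i] == "花" and i + 1 < len(words) and words[i + 1] == "钱":
        return "v"
    return None


def tag(words: List[str], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Choose the most probable tag per word, then let the first matching rule override it."""
    tagged = []
    for i, word in enumerate(words):
        probs = TAG_PROBS.get(word, {"x": 1.0})
        best = max(probs, key=probs.get)
        for rule in rules:
            override = rule(words, i)
            if override is not None:
                best = override
                break
        tagged.append((word, best))
    return tagged


sentence = ["他", "花", "钱"]
statistical_only = tag(sentence, [])
with_rules = tag(sentence, [spend_rule])
```

With no rules, 花 receives its most frequent tag; with `spend_rule` enabled, the rare verb reading is chosen in this context while the other words are unaffected.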
Keywords/Search Tags: Lexical Analysis, Manual Intervention, New Word Dictionary, User Dictionary