
Information Extraction Technology Based On Semantic Expansion

Posted on: 2012-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2218330368482546
Subject: Signal and Information Processing
Abstract/Summary:
With the arrival of the information age, computers are being applied ever more widely across human society. In particular, rapid progress in speech technology and spoken document retrieval now makes it possible to find the desired sources quickly within large volumes of spoken information, making daily life increasingly convenient. Integrating query expansion and feature extraction techniques into a speech recognition platform in order to raise the recognition rate is therefore of considerable practical value.

Building on a study of traditional feature extraction, this thesis selects three representative techniques: a statistical method, maximum a posteriori (MAP) probability, and term frequency-inverse document frequency (TF-IDF). Each is used to extract features from the training documents. The extracted features serve as base features, which are then re-weighted to construct new hybrid features that substantially improve the speech recognition rate of the documents. In addition, the forward-backward algorithm is used to obtain posterior probabilities from the recognition lattice files, and this information is combined with the weighted probability information of the document text and integrated into the speech recognition platform, further improving spoken document retrieval results.

The thesis also addresses a practical problem with user-entered queries: users who lack knowledge of a specific domain, or who find it hard to express their information need, often supply inadequate queries, which makes the search inefficient. Drawing on research into intelligent retrieval of scientific articles, the thesis applies query expansion and proposes expansion based on document frequency: candidate terms are selected from the training texts according to their document frequency, so that topic information implied by the intrinsic links between documents is added to the query term list, enriching the user's query. To further improve the spoken document retrieval platform, the Rocchio rule is then introduced to weight the most informative expansion terms within this document-frequency-based expansion, which yields very good retrieval results.

However, the Rocchio rule requires many experiments to determine its optimal parameters, and the optimal values differ from one training set to another, so every change of text requires repeating the training experiments. This makes query expansion research considerably more difficult. Building on the expansion method above, the thesis therefore proposes an expansion technique based on focus information, in which a focus factor replaces the fixed optimal Rocchio parameters. The focus factor is set according to, and changes with, the text, and it reflects the internal relations within a document, making the query expansion technique more general. Experiments show that focus-based expansion further improves the retrieval performance of the spoken document retrieval platform.
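The query-expansion step described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation, only a minimal example under stated assumptions: documents are already tokenized transcripts, term weights are plain TF-IDF (our reading of the "inverse text word frequency" feature), the top-ranked documents are treated as pseudo-relevant, and the classic Rocchio update is applied with fixed parameters alpha, beta and gamma set to commonly cited textbook defaults. The function names `tfidf_vectors`, `rocchio` and `expand_query` and the toy data are hypothetical. The thesis's focus factor, which replaces these fixed parameters, is not modeled because its definition is not given here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    alpha, beta, gamma are fixed hand-tuned constants (assumed defaults);
    the thesis replaces them with a text-dependent focus factor (not modeled).
    """
    new_q = {t: alpha * w for t, w in query.items()}
    for group, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for vec in group:
            for t, w in vec.items():
                new_q[t] = new_q.get(t, 0.0) + coef * w / len(group)
    # Terms pushed to zero or below carry no positive evidence and are dropped.
    return {t: w for t, w in new_q.items() if w > 0}


def expand_query(query_terms, weights, k):
    """Append the k highest-weighted terms not already in the query."""
    new_terms = [(t, w) for t, w in weights.items() if t not in query_terms]
    new_terms.sort(key=lambda tw: (-tw[1], tw[0]))  # ties broken alphabetically
    return list(query_terms) + [t for t, _ in new_terms[:k]]


if __name__ == "__main__":
    # Toy tokenized transcripts (hypothetical data).
    docs = [
        ["speech", "recognition", "recognition", "lattice"],
        ["speech", "retrieval", "query"],
        ["cooking", "recipe"],
    ]
    vecs = tfidf_vectors(docs)
    query = {"speech": 1.0, "retrieval": 1.0}
    # Pseudo-relevance feedback: assume the first two docs ranked highest.
    weights = rocchio(query, relevant=vecs[:2], nonrelevant=vecs[2:])
    print(expand_query(["speech", "retrieval"], weights, k=1))
    # → ['speech', 'retrieval', 'recognition']
```

In this toy run the query "speech retrieval" gains "recognition", the term weighted most heavily in the pseudo-relevant documents, while terms from the non-relevant document are clipped away. The hand-chosen alpha, beta and gamma are exactly the parameters the abstract says must be re-tuned whenever the training texts change.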
Keywords: speech recognition, feature extraction, query words, focus factor