
Research And Implementation Of Intelligent Information Acquisition Function Based On Android Platform

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: A R Shen
Full Text: PDF
GTID: 2428330572972152
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of modern society and the maturity of Internet technology, daily life is filled with a growing volume of information, which is carried by smart devices such as mobile phones and computers and exists in the form of text and images. Much image information contains important text that needs to be extracted so that it can be stored and used. Optical Character Recognition (OCR) locates text areas in an image, converts the light and dark regions formed by text and background into a two-dimensional black-and-white digital image, and automatically transcribes the text image into a text document through feature extraction and template matching. The accuracy of the recognized text cannot be guaranteed to be 100%, so the extracted text needs further processing at the semantic level.

This thesis studies in depth the text post-processing technology of the intelligent information acquisition function. Building on the existing N-gram language model, a bidirectional N-gram model is proposed that combines the characteristics of a word with those of its adjacent words. According to the characteristics of OCR output, an adaptive text post-processing method based on a sliding window is proposed, and an intelligent information acquisition system is designed and implemented on the Android platform. The thesis mainly completes the following parts:

(1) The key technologies used in the intelligent information acquisition system are reviewed, including OCR for extracting text from text images, with a focus on text post-processing after extraction and on probability calculation in the N-gram language model.

(2) Based on the characteristics of the N-gram language model and the positional relationship between a word and the words before and after it, a bidirectional N-gram probability model is proposed, that is, a conditional probability model of the current word given its adjacent words. The concept of a sliding window is also introduced: three characters in the text sequence are treated as one processing unit, the probability of the middle character is calculated, and whether to correct it is decided by comparing that probability with a threshold. This method makes use of both linguistic knowledge and the candidate set produced by OCR, which improves the accuracy of the information acquisition system from the perspective of the language itself.

(3) To meet the practical requirement of extracting Chinese character information from pictures in a patrol inspection system, an intelligent information acquisition system based on an Android mobile terminal is designed and implemented. It consists of four modules: an image acquisition module, an image preprocessing module, an information extraction module and a text post-processing module. The system's functions are tested to verify its accuracy and feasibility.
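
The sliding-window correction described in part (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the scoring formula, so the code assumes that the score of a middle character is the product of a forward and a backward character-bigram probability with add-one smoothing. The function names (`train`, `bidirectional_score`, `correct`), the toy corpus, the candidate lists and the threshold value are all hypothetical.

```python
from collections import Counter


def train(corpus):
    """Count single characters and adjacent character pairs in a list of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def bidirectional_score(left, mid, right, unigrams, bigrams):
    """Assumed score: P(mid | left) * P(mid | right), both add-one smoothed."""
    vocab = len(unigrams)
    forward = (bigrams[(left, mid)] + 1) / (unigrams[left] + vocab)
    backward = (bigrams[(mid, right)] + 1) / (unigrams[right] + vocab)
    return forward * backward


def correct(candidates, unigrams, bigrams, threshold):
    """Slide a three-character window over the OCR output.

    candidates[i] is the OCR candidate list for position i, best guess first.
    The middle character of each window is kept if its score reaches the
    threshold; otherwise it is replaced by the best-scoring candidate.
    """
    text = [options[0] for options in candidates]
    for i in range(1, len(text) - 1):
        left, right = text[i - 1], text[i + 1]

        def score(ch):
            return bidirectional_score(left, ch, right, unigrams, bigrams)

        if score(text[i]) < threshold:
            text[i] = max(candidates[i], key=score)
    return "".join(text)


# Toy corpus and OCR output (hypothetical): the second character was misread.
unigrams, bigrams = train(["巡检系统", "检查设备", "巡检记录"])
ocr_candidates = [["巡"], ["捡", "检"], ["系"], ["统"]]
result = correct(ocr_candidates, unigrams, bigrams, threshold=0.02)
print(result)
```

In this sketch the first and last characters are never changed because they lack a neighbor on one side; a fuller system would need sentence-boundary markers, a much larger training corpus, and a threshold tuned on real OCR output.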
Keywords/Search Tags: Information Acquisition, Optical Character Recognition, Post-processing of Text, N-gram Model, Patrol Inspection System