As carriers of information, mathematical expressions are widely used in education and scientific research because they express content objectively and precisely, and they appear widely on the Web. To meet the demand for accurate information, it is important to extract mathematical expressions quickly and automatically. However, owing to the particularities of unstructured text and the complexity of mathematical expressions, there has been little research on extracting mathematical expressions from unstructured text. Focusing on English text, this paper proposes a method for extracting mathematical expressions from unstructured text based on the Hidden Markov Model (HMM). First, to address structural defects of the traditional HMM, a strategy is adopted in which each observation depends directly on the subsequent state, and the model's parameter learning and training algorithms are improved accordingly; on this basis, an improved HMM with a new structure is designed. Then, because HMM training results are sensitive to the initial parameters, the quantum particle swarm optimization (QPSO) algorithm is introduced to optimize the choice of initial parameters, yielding a QPSO-based HMM initial-parameter optimization algorithm. Next, the improved HMM is applied to extracting mathematical expressions from unstructured text: a method for constructing HMM states for unstructured text is given, and a mathematical expression extraction algorithm based on the improved HMM is proposed. Finally, the proposed algorithms are evaluated on 13,423 standard mathematical expressions and on unstructured texts containing mathematical expressions. The experimental results demonstrate the effectiveness and superiority of the improved HMM and the extraction algorithm, indicating strong practical application value.
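The abstract does not state the QPSO update rule or the fitness used to rank candidate initial parameters. As a hedged illustration only, the sketch below applies the standard QPSO update (mean-best position, per-dimension local attractor, contraction-expansion coefficient beta decreased linearly from 1.0 to 0.5) to search the parameters of an ordinary discrete first-order HMM. Each candidate is scored by its scaled forward-algorithm log-likelihood on a toy observation sequence. The softmax parameterization, the fitness function, the toy data, and the names `decode`, `log_likelihood`, and `qpso` are all assumptions. The sketch does not implement the paper's improved HMM structure, in which observations depend on the subsequent state.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax: turns an unconstrained row into probabilities.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def decode(vec, n, m):
    """Map a flat unconstrained vector to (pi, A, B) for n states, m symbols (assumed layout)."""
    i = 0
    pi = softmax(vec[i:i + n]); i += n
    A = []
    for _ in range(n):
        A.append(softmax(vec[i:i + n])); i += n
    B = []
    for _ in range(n):
        B.append(softmax(vec[i:i + m])); i += m
    return pi, A, B

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a standard first-order HMM: log P(obs | pi, A, B)."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    c = sum(alpha)
    ll = math.log(c)
    alpha = [a / c for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o] for s in range(n)]
        c = sum(alpha)
        ll += math.log(c)
        alpha = [a / c for a in alpha]
    return ll

def qpso(fitness, dim, n_particles=20, iters=100, bound=5.0, seed=0):
    """Standard QPSO maximizing `fitness`; positions are clamped to [-bound, bound]."""
    rng = random.Random(seed)
    X = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    P = [x[:] for x in X]                      # personal best positions
    P_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda k: P_fit[k])
    G, G_fit = P[g][:], P_fit[g]               # global best
    for t in range(iters):
        beta = 1.0 - 0.5 * t / iters           # contraction-expansion coefficient, 1.0 -> 0.5
        mbest = [sum(p[d] for p in P) / n_particles for d in range(dim)]
        for k in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                attractor = phi * P[k][d] + (1 - phi) * G[d]
                u = 1.0 - rng.random()         # u in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - X[k][d]) * math.log(1.0 / u)
                x = attractor + step if rng.random() < 0.5 else attractor - step
                X[k][d] = min(bound, max(-bound, x))
            f = fitness(X[k])
            if f > P_fit[k]:
                P[k], P_fit[k] = X[k][:], f
                if f > G_fit:
                    G, G_fit = X[k][:], f
    return G, G_fit

# Hypothetical toy setup: 2 hidden states, 3 observation symbols.
N_STATES, N_SYMBOLS = 2, 3
obs = [0, 0, 1, 2, 2, 1, 0, 0, 2, 2]
dim = N_STATES + N_STATES * N_STATES + N_STATES * N_SYMBOLS
fit = lambda v: log_likelihood(obs, *decode(v, N_STATES, N_SYMBOLS))
best_vec, best_ll = qpso(fit, dim)
pi, A, B = decode(best_vec, N_STATES, N_SYMBOLS)
```

In the pipeline the abstract describes, the best QPSO candidate would presumably seed the (improved) training algorithm rather than be used as the final model. Using log-likelihood as the fitness is only one plausible choice.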