
Extracting High-level Multimodal Features

Posted on: 2018-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: X Li  Full Text: PDF
GTID: 2348330518996542  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of information and computer technology, massive amounts of information are emerging. Feature extraction is the primary, and most important, task in machine learning. Deep neural networks can learn from this information thoroughly and obtain deep features, and they have made remarkable progress in recent years. Information exists in different forms and modalities, and different modalities require different processing and extraction methods. For example, information about a film exists in several modalities, including text (film synopses, etc.) and images (film posters, etc.). The film industry is booming, with annual total film consumption exceeding tens of billions of dollars. This provides abundant data for pattern recognition, feature extraction, and related fields, and it also raises application requirements such as movie recommendation systems. Feature extraction from multimodal movie information has therefore attracted much attention.

The main work of this paper is divided into three parts: deep feature extraction from text, deep feature extraction from images, and deep feature extraction from text-image multimodal data.

For text, this paper builds on deep neural networks and extracts text features using convolutional and recurrent neural networks. That is, convolutional and recurrent neural networks are trained to obtain word embeddings. The feature representation of a movie synopsis (one or more sentences are segmented into words, and the word vectors of all words in a sentence are concatenated) is then fed into a recurrent neural network according to the obtained word-vector matrix. In the movie sentiment prediction task, the whole network structure is trained to fit the emotional tendency of the film, and the results are compared with traditional text feature extraction methods.

For deep image feature extraction, we use a deep convolutional neural network to obtain deep image features; convolutional neural networks have achieved great success in recent years. We also compare deep learning methods with traditional image feature extraction methods.

For the text-image multimodal experiment, we design a neural network structure to capture the relationship between text and image data. To compare multimodal features with unimodal features, we also conduct experiments on movie sentiment prediction and movie rating prediction. The experimental results demonstrate the effectiveness of our structure.
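The text pipeline described above (segment a synopsis into words, look up a word vector for each word, feed the sequence through a recurrent network, and fit a sentiment label) can be sketched as follows. This is a minimal illustrative sketch with untrained random weights: the toy vocabulary, the dimensions, the vanilla tanh recurrence, and the sigmoid output are all assumptions for illustration, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and sizes (illustrative assumptions).
VOCAB = {"great": 0, "movie": 1, "boring": 2, "plot": 3}
EMB_DIM, HID_DIM = 8, 16

E = rng.normal(size=(len(VOCAB), EMB_DIM))        # word-embedding matrix
W_xh = rng.normal(size=(EMB_DIM, HID_DIM)) * 0.1  # input -> hidden weights
W_hh = rng.normal(size=(HID_DIM, HID_DIM)) * 0.1  # hidden -> hidden weights
w_out = rng.normal(size=HID_DIM) * 0.1            # hidden -> sentiment score

def sentiment(words):
    """Run a vanilla RNN over the word sequence and squash to (0, 1)."""
    h = np.zeros(HID_DIM)
    for w in words:
        x = E[VOCAB[w]]                  # look up the word vector
        h = np.tanh(x @ W_xh + h @ W_hh) # recurrent state update
    # Sigmoid output: training would fit this to the film's sentiment label.
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))

s = sentiment(["great", "movie", "plot"])
print(s)  # a score in (0, 1)
```

In practice the embedding matrix and the recurrent weights would be learned jointly by backpropagation on labeled synopses, as the abstract describes; only the forward pass is shown here.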
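For the multimodal part, one simple way to realize "capturing the relationship between text and image data" is to concatenate a text feature vector (e.g. from the RNN) with an image feature vector (e.g. from a CNN) and project the joint vector to a prediction such as a movie rating. The sketch below uses this concatenation-plus-dense-layer fusion as an illustrative assumption; the thesis's actual fusion structure may differ, and the dimensions and random stand-in features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative feature sizes (assumptions, not the thesis's settings).
TXT_DIM, IMG_DIM, JOINT_DIM = 16, 32, 12

W_fuse = rng.normal(size=(TXT_DIM + IMG_DIM, JOINT_DIM)) * 0.1  # fusion layer
w_rating = rng.normal(size=JOINT_DIM) * 0.1                     # rating head

def fuse_and_predict(text_feat, image_feat):
    """Concatenate the two modalities, apply a nonlinearity, predict a rating."""
    joint = np.tanh(np.concatenate([text_feat, image_feat]) @ W_fuse)
    # Map the score into a (1, 10) rating range via a scaled sigmoid.
    return 1.0 + 9.0 / (1.0 + np.exp(-(joint @ w_rating)))

t = rng.normal(size=TXT_DIM)  # stand-in for an RNN text feature
v = rng.normal(size=IMG_DIM)  # stand-in for a CNN image feature
r = fuse_and_predict(t, v)
print(r)  # a rating in (1, 10)
```

Comparing this fused prediction against predictions from either modality alone is exactly the kind of unimodal-vs-multimodal experiment the abstract reports for sentiment and rating prediction.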
Keywords/Search Tags: deep learning, feature extraction, text features, image features, text-image multimodal features