
Extracting High-level Multimodal Features

Posted on: 2018-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: X Li  Full Text: PDF
GTID: 2348330518996542  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of information and computer technology, massive amounts of information are emerging. Feature extraction is the primary, and most important, task in machine learning. Deep neural networks can learn from this information thoroughly and obtain deep features, and they have made remarkable progress in recent years. Information exists in different forms and modalities, and different modalities require different processing and extraction methods. For example, information about a film exists in several modalities, including text (film synopses, etc.) and images (film posters, etc.). The film industry is booming, with annual total film consumption exceeding tens of billions of dollars. This provides abundant data for pattern recognition, feature extraction, and related fields, and it also raises application requirements such as movie recommendation systems. Feature extraction from multimodal movie information has therefore attracted much attention.

The main work of this paper is divided into three parts: deep feature extraction from text, deep feature extraction from images, and deep feature extraction from text-image multimodal data.

For text, this paper builds on deep neural networks and extracts text features using convolutional and recurrent neural networks. That is, convolutional and recurrent neural networks are trained to obtain word embeddings. The feature representation of a movie synopsis (one or more sentences are segmented into words, and the word vectors of all words in a sentence are concatenated) is then fed into a recurrent neural network according to the obtained word-vector matrix. In the movie sentiment prediction task, the whole network structure is trained to fit the emotional tendency of the film, and the results are compared with traditional text feature extraction methods.

For deep image feature extraction, we use a deep convolutional neural network to obtain deep image features; convolutional neural networks have achieved great success in recent years. We also compare deep learning methods with traditional image feature extraction methods.

For the text-image multimodal experiment, we design a neural network structure to capture the relationship between text and image data. To compare multimodal features with unimodal features, we also conduct experiments on movie sentiment prediction and movie rating prediction. The experimental results demonstrate the effectiveness of our structure.
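The text pipeline described above (segment a synopsis into words, look up a word vector for each word, feed the sequence through a recurrent network, and fit a sentiment label) can be sketched as follows. This is a minimal illustrative sketch with untrained random weights: the toy vocabulary, the dimensions, the vanilla tanh recurrence, and the sigmoid output are all assumptions for illustration, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and sizes (illustrative assumptions).
VOCAB = {"great": 0, "movie": 1, "boring": 2, "plot": 3}
EMB_DIM, HID_DIM = 8, 16

E = rng.normal(size=(len(VOCAB), EMB_DIM))        # word-embedding matrix
W_xh = rng.normal(size=(EMB_DIM, HID_DIM)) * 0.1  # input -> hidden weights
W_hh = rng.normal(size=(HID_DIM, HID_DIM)) * 0.1  # hidden -> hidden weights
w_out = rng.normal(size=HID_DIM) * 0.1            # hidden -> sentiment score

def sentiment(words):
    """Run a vanilla RNN over the word sequence and squash to (0, 1)."""
    h = np.zeros(HID_DIM)
    for w in words:
        x = E[VOCAB[w]]                  # look up the word vector
        h = np.tanh(x @ W_xh + h @ W_hh) # recurrent state update
    # Sigmoid output: training would fit this to the film's sentiment label.
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))

s = sentiment(["great", "movie", "plot"])
print(s)  # a score in (0, 1)
```

In practice the embedding matrix and the recurrent weights would be learned jointly by backpropagation on labeled synopses, as the abstract describes; only the forward pass is shown here.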
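For the multimodal part, one simple way to realize "capturing the relationship between text and image data" is to concatenate a text feature vector (e.g. from the RNN) with an image feature vector (e.g. from a CNN) and project the joint vector to a prediction such as a movie rating. The sketch below uses this concatenation-plus-dense-layer fusion as an illustrative assumption; the thesis's actual fusion structure may differ, and the dimensions and random stand-in features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative feature sizes (assumptions, not the thesis's settings).
TXT_DIM, IMG_DIM, JOINT_DIM = 16, 32, 12

W_fuse = rng.normal(size=(TXT_DIM + IMG_DIM, JOINT_DIM)) * 0.1  # fusion layer
w_rating = rng.normal(size=JOINT_DIM) * 0.1                     # rating head

def fuse_and_predict(text_feat, image_feat):
    """Concatenate the two modalities, apply a nonlinearity, predict a rating."""
    joint = np.tanh(np.concatenate([text_feat, image_feat]) @ W_fuse)
    # Map the score into a (1, 10) rating range via a scaled sigmoid.
    return 1.0 + 9.0 / (1.0 + np.exp(-(joint @ w_rating)))

t = rng.normal(size=TXT_DIM)  # stand-in for an RNN text feature
v = rng.normal(size=IMG_DIM)  # stand-in for a CNN image feature
r = fuse_and_predict(t, v)
print(r)  # a rating in (1, 10)
```

Comparing this fused prediction against predictions from either modality alone is exactly the kind of unimodal-vs-multimodal experiment the abstract reports for sentiment and rating prediction.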
Keywords/Search Tags: deep learning, feature extraction, text features, image features, text-image multimodal features