| Question answering is currently an active topic in natural language processing research. A question answering system returns precise answers directly to users who pose queries in natural language. In question analysis, keywords help capture the semantics of a given question; in information retrieval, the results of keyword extraction influence the retrieved results, the calculation of answer similarity, and answer ranking. Keyword extraction is therefore the foundation of a question answering system, and research on question keyword extraction techniques can improve both system performance and user experience. In this thesis, we focus on two kinds of techniques for question keyword extraction: unsupervised and supervised approaches. For the supervised approaches, we study two techniques: a machine learning-based approach and a deep learning-based approach. In recent years, there has been a great deal of work on graph-based keyword extraction. In this thesis, we propose an unsupervised dependency-based ranking approach for question keyword extraction. We use word embeddings to measure the similarity between two words at the semantic level, and the dependency relationship between two words to measure their relevance at the syntactic level. A graph-based ranking model then ranks the candidate keywords more precisely, which improves the performance of question keyword extraction. For the machine learning-based approach, we integrate dependency features into our model and select the most effective features through feature analysis. A maximum entropy algorithm is used to train a classifier that decides whether a given word is a keyword. 
Experimental results show that dependency features help improve the performance of question keyword extraction. By using deep learning to extract keywords, we can integrate feature selection and model building into a single process, automatically learning effective features for question keyword extraction and avoiding manual feature engineering. In this thesis, to make better use of the semantic information of contextual words, an LSTM architecture is employed to build the neural network. In addition, we propose a two-phase training method to address the lack of sufficient manually annotated training data. Experimental results demonstrate the effectiveness of the deep learning-based approach. |
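
To make the unsupervised approach more concrete, the following is a minimal sketch of a graph-based ranking over the words of a question, assuming a PageRank-style iteration. The abstract does not state how the semantic and syntactic signals are combined, so the edge weight used here (cosine similarity of the word embeddings plus a fixed bonus for word pairs linked by a dependency relation) is purely an illustrative assumption. The function names, the `dep_bonus` parameter, and the toy embeddings are likewise hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_keywords(words, embeddings, dep_edges,
                  damping=0.85, iters=50, dep_bonus=1.0):
    """Rank candidate keywords with a weighted PageRank-style iteration.

    words:      list of candidate words in the question
    embeddings: dict mapping each word to its embedding vector
    dep_edges:  pairs of word indices linked by a dependency relation
    dep_bonus:  assumed extra weight for dependency-linked pairs
    """
    n = len(words)
    linked = {frozenset(edge) for edge in dep_edges}

    # Edge weight = semantic similarity + syntactic bonus (an assumption).
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sim = max(cosine(embeddings[words[i]], embeddings[words[j]]), 0.0)
            syn = dep_bonus if frozenset((i, j)) in linked else 0.0
            weight[i][j] = sim + syn

    out_strength = [sum(row) for row in weight]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1.0 - damping) / n
            + damping * sum(
                scores[j] * weight[j][i] / out_strength[j]
                for j in range(n)
                if out_strength[j] > 0
            )
            for i in range(n)
        ]
    return sorted(zip(words, scores), key=lambda pair: -pair[1])


# Toy usage: 2-dimensional embeddings and dependency links are made up.
toy_words = ["capital", "city", "france"]
toy_embeddings = {
    "capital": [1.0, 0.2],
    "city": [0.9, 0.3],
    "france": [0.2, 1.0],
}
toy_deps = [(0, 1), (0, 2)]  # "capital" is linked to both other words
for word, score in rank_keywords(toy_words, toy_embeddings, toy_deps):
    print(f"{word}\t{score:.3f}")
```

In this sketch a word accumulates score when it is both semantically close to and syntactically connected with other words in the question, which is the intuition behind combining the two levels of evidence.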