In this research, the author extracts verb patterns based on Pattern Grammar, from a large scale corpus in a data driven manner. The pattern components are semantically classified in terms of semantic roles. A knowledge base and a web-based search engine is built upon the extracted patterns that allows querying the patterns in a semantic way. The pattern types extracted in existing studies such as StringNet, Grasp. Linggle are not rigorously defined, e.g., represented in word lemmas and/or pure part-of-speech. In additional, the query criterion is not quite user-friendly, whereupon it motivates this research.Method in this study takes advantage of full syntactic parsing and through a supervised binary classifier recognizes the verb arguments, which constitutes the candidates of pattern components together with the particle and the verb itself. The pattern components are sorted by the order in which the corresponding syntactic constituents appear in the sentences to form a pattern candidate, which is then ranked using statistics measures and the probabilities. A multi-class classifier is trained to label the semantic roles on the pattern components, by using the standard Semantic Role Labeling (SRL) model. This study extracts the whole structure patterns by looking at both sides of a verb, treating the passive voice as a different pattern. Further, the particles and phrase head words are included as pattern component types. As a unified way to extract a word’s selectional lexicons and structures based on Pattern Grammar, it is, to our best knowledge, a novel data driven approach. Using standard SRL classification schema also provides new insights for the semantic preference study. The author proposes a model to both extract pattern and label semantic roles on the components, as an examined solution for the research questions. The other contributions of the work includes a significant improvement for SRL core arguments, and a demonstrated parallel software framework based on Map/Reduce to process large scale data to carry out the tasks. |