Translates Natural-language Command To User Action

Posted on: 2011-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
Full Text: PDF
GTID: 2178360305454757
Subject: Computer software and theory
Abstract/Summary:
Keyword command is a research field related to natural-language programming. A keyword-command system splits a sentence into words (keywords) and then translates those keywords into application script code. Its accuracy is very high, so even people who have never written program code can easily write keyword commands. Meanwhile, the number of applications with macro-recording functions is increasing rapidly. Some research focuses on fulfilling user requirements by watching user behavior, as a macro recorder does, and then performing the needed actions; this line of research is called End-user programming. Instead of translating user actions into script code, such systems perform the real actions for the user. Both keyword command and End-user programming are significant research directions in Human-Computer Interaction, and both help users finish their work better and faster.

Although keyword command translates natural-language commands into application scripts with very high accuracy, it has some limitations, such as: 1) the host application must support script programming, and 2) the script language should have no more than 20 APIs. By contrast, an outstanding feature of End-user programming is that, instead of translating macros into script code, it outputs macros directly as actions and does the work for the user, which is more direct for end users. Therefore, I present an approach that translates natural-language commands directly into actions. The approach can apply to far more applications, whether or not they support script programming, and it is also easier for end users.

My contributions are an approach for translating natural-language commands into user actions and an implementation of it under Windows 7. The translation has three steps:

Step-1: Tokenization. The user input (a natural-language command) is tokenized: the program splits the input sentence into words and converts all upper-case letters to lower case, so that matching is easier in the following steps.
Step-2: Construct "Action-Tree". One or more Action-Trees are created according to the different types of actions, the elements in the system UI, and the tokens. An Action-Tree is a tree with a set of actions as its root and objects as its leaves. There are 6 types of general actions: 1) click, 2) double-click, 3) drag, 4) mouse scroll, 5) text input, and 6) single keystroke. Note that the root of an Action-Tree is not a single word but a set of words with similar meaning, for example {click, tap…}.
Step-3: Select the best-matching tree. The last step selects the Action-Tree most likely to reflect the user's intention. This requires predefined scoring rules, which can vary between environments (operating system or application). Four rules can be used in most cases:
1) Rule-1: +1 if a parameter is supplied in the token set;
2) Rule-2: +2 if a word in the token set equals the name of a UI element;
3) Rule-3: +1 if a word in the token set equals the type of a UI element;
4) Rule-4: for "Click action" and "Drag action", +3 if both the second parameter and the last parameter are supplied and, in the UI, the last parameter is the parent element of the second parameter.

My implementation under Windows 7 has 4 major modules:
1) Observer: the observation module. It stays in the background and observes changes in the OS and in application windows. It also obtains the handle and the Microsoft Active Accessibility (MSAA) pointer of the active application window, which are used to access the elements of the window.
2) Command Parser: the command parsing module. It waits for user input and then parses it. The output of parsing is a set of lower-cased words; the words are called tokens and the set is called the token set. The token set is the input of the following two modules (UI Finder and Arm).
3) UI Finder: the UI query module. Using the window handle and MSAA pointer obtained by the Observer, it queries the UI elements for their information and relationships according to the words in the token set, and then constructs the Action-Trees.
4) Arm: the action module. It scores all Action-Trees and executes the tree with the highest score.

In this work, I implemented the system under Windows 7 to verify the approach. To keep the experimental data reliable, I selected 2 applications that ship with Windows 7 as test applications and took all natural-language commands from their help documents. I chose 8 operations from each help document, 16 operations in total. Some are simple, for example: "On the View tab, in the Show or hide group, select the Ruler check box" (the intention of this operation is to display the ruler). Others are more complicated and have more than one step, such as: "1) On the Home tab, in the Shapes group, click the Line tool. 2) Click Size, and then click a line size, which determines the thickness of the line. 3) In the Colors group, click Color 1, click a color, and then drag the pointer to draw the line". In the final result, 87% of the natural-language commands were successfully translated into user actions, which is satisfactory.

Research on translating natural-language commands into user actions is useful. It can serve as a foundation for related research such as voice control of computers, where it can help parse and execute the user's commands. In applications, it can become a standalone program that parses and executes application help documentation, as similar systems such as DocWizards do.

The experiments indicate that the approach is feasible and useful and that it applies widely: most user operations were translated successfully. The research on translating natural language into user actions therefore has value both for research and for applications, and it can serve as a foundation for other related work in the HCI field.
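The three translation steps described in the abstract (tokenization, Action-Tree construction, and rule-based selection) can be illustrated with a small sketch. This is not the thesis implementation, which queries real windows through Microsoft Active Accessibility; here the UI is a hand-written list of elements. All names (UIElement, ActionTree, ACTION_WORDS, best_tree) are hypothetical, only the {click, tap} word set comes from the abstract, and the way trees are built is a simplification. Rule-1 is assumed to add +1 per supplied leaf, and Rule-4 is assumed to compare the first leaf (the target) with the last leaf (its container).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Root word sets per action type. Only {click, tap} appears in the abstract;
# the remaining words and action names are illustrative assumptions.
ACTION_WORDS = {
    "click": {"click", "tap", "select"},
    "text_input": {"type", "enter"},
    "scroll": {"scroll"},
}

@dataclass(frozen=True)
class UIElement:
    name: str                     # e.g. "ruler"
    kind: str                     # element type, e.g. "checkbox"
    parent: Optional[str] = None  # name of the parent element, if any

@dataclass
class ActionTree:
    action: str                   # key into ACTION_WORDS (the root word set)
    leaves: List[UIElement]       # objects the action applies to

def tokenize(command: str) -> List[str]:
    """Step 1: split the command into lower-cased word tokens."""
    return re.findall(r"[a-z0-9]+", command.lower())

def build_action_trees(tokens: List[str], elements: List[UIElement]) -> List[ActionTree]:
    """Step 2 (simplified): one tree per (mentioned action, mentioned element)."""
    token_set = set(tokens)
    by_name = {e.name: e for e in elements}
    trees = []
    for action, words in ACTION_WORDS.items():
        if not words & token_set:
            continue  # no root word for this action appears in the command
        for target in elements:
            if target.name not in token_set:
                continue
            leaves = [target]
            parent = by_name.get(target.parent)
            if parent is not None and parent.name in token_set:
                leaves.append(parent)  # the container was also mentioned
            trees.append(ActionTree(action, leaves))
    return trees

def score(tree: ActionTree, tokens: List[str]) -> int:
    """Step 3: apply the four scoring rules under the stated assumptions."""
    token_set = set(tokens)
    total = 0
    for leaf in tree.leaves:
        total += 1                      # Rule-1: a parameter is supplied
        if leaf.name in token_set:
            total += 2                  # Rule-2: token equals element name
        if leaf.kind in token_set:
            total += 1                  # Rule-3: token equals element type
    if (tree.action in {"click", "drag"} and len(tree.leaves) >= 2
            and tree.leaves[-1].name == tree.leaves[0].parent):
        total += 3                      # Rule-4: last leaf is parent of target
    return total

def best_tree(command: str, elements: List[UIElement]) -> Optional[ActionTree]:
    """Run all three steps and return the highest-scoring tree, or None."""
    tokens = tokenize(command)
    trees = build_action_trees(tokens, elements)
    return max(trees, key=lambda t: score(t, tokens), default=None)

# Hypothetical UI: a "view" tab that contains a "ruler" check box.
ELEMENTS = [UIElement("view", "tab"), UIElement("ruler", "checkbox", parent="view")]
best = best_tree("Click the Ruler checkbox on the View tab", ELEMENTS)
print(best.action, [leaf.name for leaf in best.leaves])
```

In this sketch the command yields two candidate trees, one that targets only the tab and one that targets the check box inside the tab. The second tree scores higher because it matches more names and types and also satisfies the parent relationship in Rule-4. A real system would additionally need multi-word element names and an execution step, both of which are left out here.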
Keywords/Search Tags:Keyword command, End-user programming, Natural-language command, User action, Human-Computer Interaction