| Over the long-term operation of the Alipay service system, a large amount of service data has accumulated, including static knowledge compiled by customer service staff and the communication records between customers and customer service staff, among others. However, this mass of data has not been fully exploited. To improve the overall quality of service, cut service costs, and support a variety of service businesses, we design and build a vertical search engine on top of this service data that is efficient, stable, and adaptable to the evolving demands of the service. On this basis, this paper describes in detail the implementation of the user input processing module, which is the core module of the engine. The user input processing module pre-processes the user's input query, covering query parsing, word segmentation, keyword extraction, synonym substitution, annotation of token proximity information, and second search. To generate the stop words, business-specific synonyms, and word proximity information that this process requires, we combine, on the Hadoop distributed computation framework, the N-gram model, the vector space model, the edit distance algorithm, the cosine similarity algorithm, and the part of speech of words, optimizing the model calculation and the logical operations between the different algorithms. We also propose an efficient keyword extraction algorithm that helps capture the user's search intention and extract the key points of the search, thereby improving the accuracy and recall of our vertical search engine. In addition, for synonym substitution in the user input processing module, we offer a selective synonym substitution strategy, which tries to convert the user's search intention into several similar or more accurate expressions, so as to increase the coverage of the search and the effectiveness of the returned results. |
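
As a rough illustration of two of the techniques named above, the following Python sketch computes the edit distance between two terms and the cosine similarity of their character-bigram vectors, then combines both scores to flag a pair as a synonym candidate. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say how the algorithms are combined, so the function names, the choice of character bigrams, and the thresholds `max_edits` and `min_cosine` are all hypothetical, and the Hadoop distribution step is omitted.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Bag of character n-grams, used here as a simple vector space representation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def is_synonym_candidate(a: str, b: str,
                         max_edits: int = 2,        # hypothetical threshold
                         min_cosine: float = 0.5    # hypothetical threshold
                         ) -> bool:
    """Flag a pair only when both similarity signals agree."""
    return (edit_distance(a, b) <= max_edits
            and cosine_similarity(char_ngrams(a), char_ngrams(b)) >= min_cosine)
```

For example, `is_synonym_candidate("refund", "refunds")` passes both checks, while `is_synonym_candidate("refund", "transfer")` fails on edit distance. Requiring both signals to agree is one simple way to reduce false positives from either measure alone; a production system would presumably also use part-of-speech and business-specific evidence, as the abstract indicates.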