Returning exactly the results that users want has become the main goal of modern search engines. A search engine relies on several techniques. Natural language processing is one of the most important of them and also underpins progress in other research areas. Together with word segmentation and stop-word handling, term tightness is an important signal for relevance ranking in web search: it strongly affects the ranked results and plays a large role in the search engine as a whole. Tightness matters for improving both the precision and the recall of search results. A segmenter splits a query into units that are as small as possible, which breaks long phrases apart into several separate terms. As a result, many web pages that do not match the user's query are recalled, precision drops, and the user experience suffers. Based on an actual project in the Sogou search engine, this paper studies strategies and algorithms for new-phrase discovery in Chinese segmentation. It designs a strategy-based method for extracting the relations between terms, converts those relations into features, and classifies terms with a Support Vector Machine (SVM) to improve the tightness results. The main work of this paper is as follows:
(1) Preprocessing. Processing metadata, segmenting the original query logs and computing statistics over them, which yields the base data for the subsequent algorithms.
(2) Classification based on session logs. Computing the distance between queries within query sessions to obtain session data.
(3) Classification based on web pages. To improve results on proper nouns, computing base statistics with new-phrase discovery algorithms such as information entropy and mutual information, deriving relations and features between terms, and classifying those features with an SVM.
(4) Validation and analysis. Checking the final offline data against the training set, and applying post-processing strategies that improve relevance ranking and the precision of search results.
(5) Classification results. After tightness post-processing, relevance ranking becomes more accurate: good pages move toward the top and poor ones toward the bottom.
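Item (3) names information entropy and mutual information as the statistics behind new-phrase discovery, but gives no formulas. The following is a minimal sketch, assuming a common formulation: pointwise mutual information (PMI) between two adjacent segmented terms, plus the left and right branching entropy of the neighbours that surround the pair in a query log. The function names `tightness_features` and `entropy`, the shared boundary tokens `<s>`/`</s>`, and the toy queries are all hypothetical illustrations, not code or data from the Sogou project.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy (in bits) of a frequency Counter."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def tightness_features(segmented_queries, left, right):
    """Hypothetical helper: PMI and branching entropies for the adjacent pair (left, right).

    segmented_queries is a list of token lists (queries already run through a segmenter).
    """
    unigram, bigram = Counter(), Counter()
    left_ctx, right_ctx = Counter(), Counter()  # neighbours just before / after the pair
    for tokens in segmented_queries:
        unigram.update(tokens)
        for i in range(len(tokens) - 1):
            bigram[(tokens[i], tokens[i + 1])] += 1
            if tokens[i] == left and tokens[i + 1] == right:
                # Assumed simplification: every query boundary counts as the same pseudo-token.
                left_ctx[tokens[i - 1] if i > 0 else "<s>"] += 1
                right_ctx[tokens[i + 2] if i + 2 < len(tokens) else "</s>"] += 1

    pair_count = bigram[(left, right)]
    if pair_count == 0:
        raise ValueError(f"{left!r} and {right!r} never occur adjacently")

    n_uni, n_bi = sum(unigram.values()), sum(bigram.values())
    p_xy = pair_count / n_bi        # joint probability, estimated over all adjacent pairs
    p_x = unigram[left] / n_uni     # marginal probabilities, estimated over all tokens
    p_y = unigram[right] / n_uni
    return {
        "freq": pair_count,
        "pmi": math.log2(p_xy / (p_x * p_y)),  # high PMI: the terms co-occur more than chance
        "left_entropy": entropy(left_ctx),     # high entropy: varied left neighbours
        "right_entropy": entropy(right_ctx),   # high entropy: varied right neighbours
    }


if __name__ == "__main__":
    # Made-up toy queries, for illustration only.
    queries = [
        ["new", "york", "hotel"],
        ["new", "york", "weather"],
        ["cheap", "new", "york", "flights"],
        ["new", "car"],
    ]
    print(tightness_features(queries, "new", "york"))
```

For the toy pair ("new", "york") this gives a PMI of about 2.17 bits, a left entropy of about 0.92 and a right entropy of about 1.58. A pair with high PMI and varied outer neighbours is a candidate for being kept together as one tight unit. In a pipeline like the one described in item (3), such per-pair values would be assembled into feature vectors and passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`). The abstract does not specify the exact feature set or kernel.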