Chinese Webpage Feature Extraction In Learning To Rank Algorithms

Posted on:2010-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:J Liu

Full Text:PDF

GTID:2178360332457875

Subject:Computer Science and Technology

Abstract/Summary:

Ranking technology in search engines has evolved through two generations. First-generation search engines, e.g., Infoseek, Excite and Lycos, rank pages statistically by word frequency and position. This method has disadvantages: it does not use properties specific to web pages, such as hyperlinks and anchor text, and many webpage editors stuff keywords into their pages to manipulate the search engine's judgment and obtain a higher ranking. Second-generation search engines rank pages by link analysis, such as Hyperlink Analysis from Baidu and PageRank from Google. To appear on the first page of results, websites often add links, exchange links with one another, or set up fraudulent links. As a result, websites with excellent content but few links are very difficult for search engines to find.

Learning to rank, a newer approach to webpage ranking, can compensate for the shortcomings of both methods. However, existing learning to rank methods have been applied mainly to English webpages, while learning to rank for Chinese webpages has received little research attention.

To address this, the thesis takes the particular characteristics of Chinese webpages into account and designs and implements a Chinese webpage feature extraction system for learning to rank. In addition to the traditional word-frequency statistics such as TF, IDF and DL, it applies classic document relevance measures from retrieval and language models, such as BM25, LMIR_ABS, LMIR_DIR and LMIR_JM. The thesis also applies Edit Length (EL) to Chinese webpage feature extraction in learning to rank.

We then set up a learning to rank platform and implemented the classical RankNet and RankSVM algorithms on the extracted Chinese webpage features, and used the experimental results to compare how the features perform in Chinese webpage ranking.

Finally, we fed the feature set with EL and the feature set without EL into the RankNet and RankSVM systems, respectively, and compared the error rates. The experimental results show that, for both RankNet and RankSVM, the error rate with EL was 3% to 10% lower than the error rate without EL, which confirms the contribution of the additional feature.
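
As an illustration of the kind of per-document features the abstract lists, here is a minimal Python sketch, not the thesis's implementation. It makes several assumptions: text is tokenized into single characters rather than segmented words, "Edit Length" is taken to mean Levenshtein edit distance between the query and the document text, and the BM25 parameters `k1=1.2` and `b=0.75` are common defaults rather than values from the thesis. The function names and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25(query, doc, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if tf[term] == 0:
            continue  # terms absent from the document contribute nothing
        df = doc_freq.get(term, 0)
        # Smoothed IDF that stays non-negative even for very common terms.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


def edit_distance(a, b):
    """Levenshtein distance (assumed reading of 'Edit Length')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def extract_features(query, doc, doc_freq, n_docs, avg_len):
    """Build a small feature dict for one (query, document) pair."""
    q, d = list(query), list(doc)  # character-level tokens (assumption)
    tf = Counter(d)
    return {
        "tf": sum(tf[t] for t in q),  # total query-term frequency
        "dl": len(d),                 # document length
        "bm25": bm25(q, d, doc_freq, n_docs, avg_len),
        "el": edit_distance(query, doc),
    }


# Toy corpus (hypothetical): two short Chinese strings.
docs = ["搜索引擎排序", "网页特征提取"]
doc_freq = Counter(ch for text in docs for ch in set(text))
avg_len = sum(len(text) for text in docs) / len(docs)
features = [extract_features("搜索", text, doc_freq, len(docs), avg_len)
            for text in docs]
```

Each feature dict could then be turned into a vector and passed to a pairwise ranker such as RankNet or RankSVM. Comparing feature sets with and without the `el` entry mirrors the comparison described above.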

Keywords/Search Tags:

learning to rank, feature extraction, Edit Length, RankNet, RankSVM

Related items

1	Research On Training Learning To Rank Algorithm With Heterogeneous Data
2	Research On Multimodal Learning To Rank Based On Deep Semantic Features
3	Exploring The Use Of Low-rank Matrix Recovery In Dimensionality Reduction Of High-dimensional Data
4	Rank Optimization For Person Re-identification Through Intelligent Machine Learning Techniques
5	Research On Learning To Rank Answers For Vcsa
6	Research Of Dimension Reduction Algorithm And Its Application
7	Research On Low-rank Presentation And Recognition Of Human Actions In Video Sequences
8	Research Of Learning To Rank In Information Retrieval
9	Research On The Application Of Learning To Rank In Recommendation Systems
10	Research On Model Learning Based On Sparse And Low-rank Constraints