
Research On Time-Aware Web Search

Posted on: 2016-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Lin
GTID: 1228330470957949
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and information technologies, Web data is growing explosively. This massive volume of data makes it difficult for people to find useful information quickly, which has driven the development of Web search engines. Search engines collect a great number of Web pages from the Internet through crawlers. After some specific processing, these pages are indexed and stored by the search engine, which then returns ranked results to users using ranking algorithms. Search engines have thus become an important tool for obtaining Web information, and optimizing them has been a hot topic in the Web area.
One critical problem in current search engines is that they lack efficient techniques for dealing with time information within Web pages. Time is closely related to people's daily life, and people often submit time-related query terms to search engines. A recent survey showed that about 1.5% of user queries contain explicit time constraints, while about 7% contain implicit time constraints. In addition, our experimental results indicated that a Web news report generally contains about five time expressions. These data emphasize the importance of time in Web search. However, current search engines treat time expressions as textual keywords, which is not enough for processing time-related queries. Further, they consider only the publishing time of Web pages in their indexing and ranking processes and do not take into account the time expressions embedded in the text of Web pages or in user queries. Therefore, existing search engines cannot achieve high effectiveness when evaluating time-related user queries.
Based on the aforementioned background and the lack of efficient techniques for processing time-aware queries in current search engines, in this dissertation we studied some key issues in time-aware Web search and proposed a series of solutions. We first discussed the background and importance of time-aware search and analyzed the existing problems and challenges. Then, we focused on several issues, including time information extraction from Web pages, time-aware ranking, time expansion for user queries, and time-aware prototype search engines. Briefly, we made the following contributions in this dissertation:
(1) Because there are few studies on the relevance between Web pages and time expressions, we proposed a focused-time extraction algorithm for Web pages, which is based on the frequency of time expressions in a Web page and the relationships between these time expressions. The algorithm considers not only the number of time expressions that appear in a Web page but also the inherent meaning of time, which makes it better suited to understanding the text. It also accounts for the difference in extraction accuracy between explicit time and implicit time. The algorithm achieves a high, acceptable accuracy.
(2) To address the issue that existing time-aware ranking algorithms do not sufficiently consider the content time of a Web page or the relevance between time expressions and the page, we proposed a time-aware ranking algorithm based on the focused time of Web pages. The algorithm fully considers the content time of Web pages, uses the relevance weight between time expressions and Web pages, and accounts for the difference in extraction accuracy between explicit time and implicit time. It outperforms the algorithms it was compared against.
(3) To address the issue that users do not know the time constraints of their queries, we proposed a query time-word expansion algorithm based on a weight matrix. The algorithm analyzes the content of Web pages, considers the co-occurrence relationship between time expressions and text keywords, computes relevance scores, and, when a user submits query keywords, returns a list of time expressions sorted by relevance score. Because the method relies on the content of Web pages, it can update the expansion time words promptly whenever a new Web page is added to the search engine's index. The algorithm achieved high accuracy in our experiments.
(4) To address the lack of a unified experimental platform for time-aware ranking algorithms, we implemented a time-aware search prototype system called TASE (Time-Aware Search Engine). The system introduces a representation model for the time expressions of Web pages that can support a variety of time-aware ranking algorithms, and it combines time similarity and text similarity through linear weighting to obtain the final similarity. As a result, a time-aware ranking algorithm can be added to the prototype simply by implementing its time similarity, which makes the system highly extensible. In this dissertation, the prototype implements a variety of time-aware ranking algorithms and offers a good user experience through front-end technologies such as AJAX.
The studies in this dissertation offer feasible solutions for some critical issues in time-aware Web search. We proposed several new methods for time information extraction from Web pages, time-aware ranking, and time expansion for user queries, and built a prototype system to demonstrate the performance of our proposals on real data sets. The results showed that our proposals can effectively improve the performance of search engines in processing time-related queries. Further, our studies can offer some new references for the development of future search engines as well as time-related Web applications.
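The linear weighting mentioned for the prototype system in contribution (4) can be illustrated with a minimal Python sketch. It is not the TASE implementation: the `Page` record, the year-interval representation of focused time, the Jaccard-style `overlap_similarity`, and the weight `alpha` are all assumptions introduced here for illustration. The abstract states only that the final similarity is a linear combination of text similarity and time similarity, and that a new ranking algorithm is added by supplying its time similarity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: an inclusive range of years (start, end).
Interval = Tuple[int, int]


@dataclass
class Page:
    url: str
    text_score: float       # text similarity to the query, assumed to lie in [0, 1]
    focused_time: Interval  # content time of the page, assumed already extracted


def overlap_similarity(query_time: Interval, page_time: Interval) -> float:
    """Placeholder time similarity: Jaccard overlap of two inclusive year ranges."""
    q_start, q_end = query_time
    p_start, p_end = page_time
    intersection = min(q_end, p_end) - max(q_start, p_start) + 1
    if intersection <= 0:
        return 0.0
    union = max(q_end, p_end) - min(q_start, p_start) + 1
    return intersection / union


def rank(
    pages: List[Page],
    query_time: Interval,
    time_similarity: Callable[[Interval, Interval], float] = overlap_similarity,
    alpha: float = 0.5,  # assumed weight on text similarity; 1 - alpha goes to time
) -> List[Tuple[float, str]]:
    """Score each page as alpha * text + (1 - alpha) * time, highest score first."""
    scored = [
        (
            alpha * page.text_score
            + (1 - alpha) * time_similarity(query_time, page.focused_time),
            page.url,
        )
        for page in pages
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


pages = [
    Page("a", text_score=0.9, focused_time=(1990, 1995)),
    Page("b", text_score=0.6, focused_time=(2008, 2008)),
]
ranking = rank(pages, query_time=(2008, 2008))
```

In this sketch, swapping in a different ranking algorithm means passing another function as `time_similarity`; the text score and the combination step stay unchanged, which mirrors the extensibility argument in the abstract.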
Keywords/Search Tags: Web-Page Focused Time, Time-Aware Web Search, Time Expansion for Queries, Web Search