Research And Implementation Of A Time-based Vertical Search Engine

Posted on: 2013-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: R Li
Full Text: PDF
GTID: 2248330374972560
Subject: Control theory and control engineering
Abstract/Summary:
In recent years, the rapid development of the Internet has led to explosive growth of information, which makes it increasingly difficult for users to obtain information accurately and in a timely manner. Search engines emerged to ease this problem. Later, vertical search engines oriented to a given field appeared. They offer users a personalized search service for a particular domain and compensate for the overly broad coverage of general search engines, making retrieval results more targeted and improving user satisfaction with queries.

Temporal information plays a very important role in natural language. According to statistics, temporal information accounts for 27% of all textual information, second only to proper nouns, which account for 31%. This paper therefore studies how to introduce temporal information into a vertical search engine as a retrieval factor.

This paper studies how to recognize and normalize the time expressions in the text of web pages, proposes representing a document as a vector whose components are the times mentioned in the document, and proposes an index structure that uses time as an index term. Finally, based on these methods, we implement a vertical search engine that can query information according to the times in a web page's text.

The main work of this paper is as follows:

1. Reviews the development and current state of search engines, and introduces their working principle, basic architecture, and core technologies such as the crawler, the text processor, and the retrieval component.
2. Analyzes why the emergence of vertical search engines was inevitable, and how they differ from general search engines in effect and in technical implementation.
3. Classifies time expressions with reference to TIMEX2, presents a recognition method that combines rule-based templates with time dictionaries, and then discusses how to normalize time expressions.
4. Proposes a rule-based method for recognizing Chinese place names; the experimental results show that the recall rate of this method is around 90%.
5. Based on the vector space model, proposes representing a document as a vector whose components are the times in the document, proposes an index structure that uses time as an index term, and proposes a method for calculating the similarity of time vectors. Finally, a document ranking algorithm and search rules based on time vector similarity are given.
6. Based on the preceding theory and algorithms, designs and implements a temporal vertical search engine, and describes the system architecture and the concrete implementation of the functional modules in detail.
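
As a rough illustration of the idea in item 5, the sketch below represents each document as a vector of normalized time terms, builds an inverted index keyed by time, and ranks documents by similarity to the query's time vector. It is not the thesis's implementation: the single regular-expression template, the cosine measure, and all function names (`extract_times`, `time_vector`, `build_time_index`, `search`) are assumptions made for this example, and the abstract does not state which similarity formula the author actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical rule template: numeric dates such as "2012-05-01" or "2012/5/1".
# A real system would combine many templates with a time dictionary.
DATE_PATTERN = re.compile(r"(\d{4})[-/](\d{1,2})[-/](\d{1,2})")


def extract_times(text):
    """Return every matched date, normalized to YYYY-MM-DD (no calendar validation)."""
    return [
        f"{int(y):04d}-{int(m):02d}-{int(d):02d}"
        for y, m, d in DATE_PATTERN.findall(text)
    ]


def time_vector(text):
    """Represent a text as a vector whose components are time-term frequencies."""
    return Counter(extract_times(text))


def build_time_index(docs):
    """Inverted index: normalized time term -> {doc_id: frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in time_vector(text).items():
            index[term][doc_id] = freq
    return index


def cosine(a, b):
    """Cosine similarity between two sparse time vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Rank documents that share a time term with the query, best match first."""
    index = build_time_index(docs)
    q = time_vector(query)
    candidates = {doc_id for term in q for doc_id in index.get(term, {})}
    scored = [(doc_id, cosine(q, time_vector(docs[doc_id]))) for doc_id in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "a": "Meeting on 2012-05-01 and again on 2012/5/1.",
        "b": "Held 2012-05-01, follow-up on 2012-06-15.",
        "c": "No dates here.",
    }
    for doc_id, score in search("2012-05-01", docs):
        print(doc_id, round(score, 3))
```

Using the inverted index to select candidates means that documents with no time term in common with the query are never scored, which is the usual reason for indexing on the term being searched.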
Keywords/Search Tags: information retrieval, search engine, temporal information, entity recognition, VSM