Research On Blog's Tags And Contents Search Engine

Posted on: 2012-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: H X Song
GTID: 2178330335460305
Subject: Signal and Information Processing
Abstract/Summary:
With the development and maturation of Chinese Web 2.0 websites, a new platform for information sharing has emerged, and the amount of information on the Internet is growing sharply. People rely on search engines as a powerful tool for finding information and expect them to return better results. Making search engines personalized, specialized, real-time, and intelligent has therefore become a trend. Tags, one of the characteristic features of Web 2.0, are keywords used to annotate data resources and are a common way of organizing and finding resources. Blog tags are therefore a valuable signal for a blog search engine.

First, we analyze the current state and development of tagging on Chinese blogs, including tag classification, hot tags, named entities in tags, and newly popular words in tags, and we put forward a set of issues that need attention when tagging. We then build a tag recommendation model that addresses these issues. The proposed model is based mainly on text classification and keyword extraction from blog content; it aims to help blog writers add more precise tags to their posts so that the posts perform better in blog retrieval. To evaluate the model from the perspective of folksonomy, we collected a test corpus from the del.icio.us site, which provides tags assigned both by the author and by the public. Initial tests show that the model is effective and achieves good recall in tag recommendation.

Second, we study the working principles of search engines and related technologies, such as web crawlers, HTML parsers, data indexing, and search algorithms. Finally, we design and implement a blog search engine test system based on tags and content that makes full use of natural language processing techniques. Experimental results show that the search algorithm is more effective and that the system better meets users' search needs, which further illustrates that tags play an important role in a blog search engine.
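
The abstract does not specify the model or the ranking function, so the following is only a minimal sketch of the general idea under stated assumptions: TF-IDF keyword extraction stands in for the thesis's classification-plus-extraction model, recall is computed against a set of reference tags, and search scores a post by query matches in its content plus a boosted count of matches in its tags. The function names, the `tag_weight` parameter, and the sample data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recommend_tags(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest TF-IDF score.

    Assumption: plain TF-IDF is used here as a stand-in for the
    thesis's model, which the abstract does not describe in detail.
    """
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(corpus)
    scores = {}
    for term, count in Counter(tokenize(doc)).items():
        df = sum(1 for tokens in doc_sets if term in tokens)
        idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = count * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def recall(recommended, reference):
    """Fraction of reference tags that appear among the recommended tags."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(recommended) & reference) / len(reference)


def search(query, posts, tag_weight=2.0):
    """Rank posts by query-term matches in content plus weighted tag matches.

    Assumption: `tag_weight` is a hypothetical boost, not a value
    taken from the thesis.
    """
    query_terms = set(tokenize(query))
    results = []
    for post in posts:
        content_hits = len(query_terms & set(tokenize(post["content"])))
        tag_hits = len(query_terms & {t.lower() for t in post["tags"]})
        score = content_hits + tag_weight * tag_hits
        if score > 0:
            results.append((score, post["title"]))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]
```

In this sketch a tag match counts more than a content match, which is one simple way to let author-assigned tags influence ranking; the weighting used in the actual system may differ.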
Keywords/Search Tags: blog search, tag recommendation, semantic similarity, keyword extraction, folksonomy