
Research On Entity Retrieval And Mining With The Web

Posted on: 2009-06-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S H Bao
Full Text: PDF
GTID: 1118360242483561
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Web technologies, the World Wide Web has come to comprise several coexisting generations, each of which continues to develop quickly. The traditional web (Web 1.0) still forms the principal part of the current Web. Recently, the social web (Web 2.0) has developed rapidly and become an increasingly notable part of today's Web. At the same time, many people are working on the Semantic Web, where machines can understand and process various web data as human beings do. It is expected to be a mainstream of the next generation of the Web (Web 3.0). Applications keep emerging in all of these generations. They bring web users great convenience as well as a key problem: information overload. How to effectively find the desired information for the user in such a huge and complex information space has become a hot research topic in recent years. In this dissertation, we propose to mine entity information in Web 1.0, 2.0 and 3.0. For each generation, we analyze the properties of the Web and propose a series of mining algorithms as follows.

In the traditional web (Web 1.0), most work aims at providing the user with the most relevant web pages. In reality, more and more users are concerned with information about the entities scattered across web pages rather than with the pages themselves. Motivated by this, the first part of this dissertation proposes the following algorithms for entity mining. 1) Expert search: We propose a new fine-grained model to address the problem. 2) Expert-expertise mining: We propose a new typed separable mixture model to mine the latent associations between experts and expertise effectively. 3) Competitor mining: We propose a new algorithm, CoMiner, to mine competitors automatically in a domain-independent manner. 4) Temporal event mining: We propose a new algorithm, TESer, to mine events chronologically.

With the boom of Web 2.0, more and more web resources, such as web pages and pictures, are annotated by web users with different backgrounds, for example through services such as Del.icio.us and Flickr. The second part of this dissertation analyzes the properties of Web 2.0 and mines entity relations. 1) Social search: We propose two new algorithms to improve web pages' similarity ranking and static ranking, respectively. 2) Social language model: We propose a new algorithm to smooth the estimation of the language model with social annotations. 3) Social browsing: We propose an effective algorithm that uses semantic associations and hierarchical information to improve the social browsing experience.

To make machines understand web information, researchers have proposed the Semantic Web, which defines the semantics of web resources explicitly. The Semantic Web is at an early stage of rapid development. As a natural extension of the current web, the Semantic Web (referred to as Web 3.0 here) is expected to be the next generation of the Web. The third part of this dissertation makes an initial attempt at mining the semantic information of Web 3.0. 1) Emergent semantics: We propose an effective algorithm for deriving hierarchical semantics from social annotations. 2) Semantic web service composition: We propose a semantic rewriting approach, based on query rewriting, for semantic web service composition.

The experimental results show that mining entities in Web 1.0, 2.0 and 3.0 saves web users considerable time in finding the target information and helps them understand the target entities.
Keywords/Search Tags: Web 1.0, 2.0 and 3.0, entity information mining, expert search, competitor mining, temporal event search, social search, social language model, social browsing, emergent semantics, semantic web service composition