A Study Of Context Modeling Based On Probabilistic Topic Models

Posted on: 2015-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B X Huai
Full Text: PDF
GTID: 1228330434966100
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the IT industry and related technologies, people face a huge amount of information every day. Meanwhile, the increasing prevalence of smart mobile devices and the mobile Internet enables the easy collection of user behavior data and the corresponding contextual information. The arrival of the big data era brings many opportunities, but it also makes it harder to discover useful knowledge in such massive data. Data mining is an effective approach to the problem of information overload, and it makes data research easier to carry out. However, traditional data mining technologies usually focus on the whole data space and therefore cannot accurately capture users' various information needs. It is thus important for researchers to exploit rich contextual information when modeling the behavior of a specific entity, which can be a person, a product, a news item, or another meaningful item. In this dissertation, we carry out a comprehensive study of context-aware data mining in different Internet environments, such as the traditional Internet and the mobile Internet. Specifically, based on different application scenarios, we address three research problems: context recognition for mobile users, context-aware intent modeling for mobile users, and named entity linking for Internet documents. Our research contributions can be summarized as follows.

First of all, the problem of mobile context recognition aims at identifying the semantic meaning of context in a mobile environment. It plays an important role in understanding mobile user behavior and thus provides the opportunity to develop better intelligent context-aware services. A key step of context recognition is to model the personalized contextual information of mobile users.
While many studies have been devoted to mobile context modeling, limited effort has been made to exploit the sequential and dependency characteristics of mobile contextual information. Also, the latent semantics behind mobile context are often ambiguous and poorly understood. A promising direction is to incorporate domain knowledge of common contexts, such as "waiting for a bus" or "having dinner", by modeling both labeled and unlabeled context data from mobile users, since very few labeled contexts are usually available in practice. To this end, we propose a sequence-based semi-supervised approach to model personalized context for mobile users. Specifically, we first exploit the semi-supervised Bayesian Hidden Markov Model (S-BHMM) to model context in the form of the probabilistic distributions and transitions of raw context data. We also propose a sequential model that extends S-BHMM with prior knowledge of contextual features to model context more accurately. Then, to efficiently learn the parameters and initial values of the proposed models, we develop a novel approach for parameter estimation that integrates the Dirichlet Process Mixture (DPM) model and the Mixture Unigram (MU) model. Furthermore, by incorporating both user-labeled and unlabeled data, we propose a semi-supervised learning algorithm to identify and model the latent semantics of context. Finally, experimental results on real-world data clearly validate both the efficiency and the effectiveness of the proposed approaches in recognizing the personalized context of mobile users.

Secondly, turning to the contact logs of mobile users, we propose to model the contact intents of mobile users based on probabilistic topic models, which can be used to build context-aware contact recommendation services. Specifically, in this work we first introduce the importance of modeling contact intents for mobile users.
Indeed, with the popularity of smart mobile devices and the extensive use of various smart Internet applications, the "contact" operation is frequently used in many situations. However, due to some common drawbacks of mobile devices, such as the small screen, it is necessary to model the latent contact intent, which is the basis of many intelligent services. In the data preprocessing stage, we present a simple but effective contact session segmentation method and a context region partition algorithm. In particular, we assume that a user usually contacts someone based on some latent topic, and that the time gap between contact operations in the same session is short. At the same time, latent contact topics also depend on the geographical location context. Based on this assumption, we propose to model the latent contact topics of mobile users with probabilistic topic models, and the experimental results show that the model proposed in this chapter is reasonable for user contact modeling.

Finally, by analyzing text in the Internet environment from two angles, words and entities, we propose an entity context modeling method based on probabilistic topic models. As with context modeling for humans, context modeling for named entities is also important: it helps us understand entities better, leads to more accurate and reasonable use of entities, and provides higher-quality services to users. Specifically, we divide the text information into two levels: words and named entities. Since the two types of data belong to the same document, they should have the same topic distribution and can therefore be mapped to the same topic space.
Based on this assumption, we put forward a topic modeling method based on variational inference and use it to estimate the parameters, which makes the model easier to parallelize and provides the theoretical foundation for the subsequent processing of massive data. The experimental results show that the model proposed in this chapter is practical and that the assumption is reasonable.
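
The contact session segmentation mentioned for the second contribution can be illustrated with a minimal sketch. The abstract states only that the time gap between contact operations in the same session is short; the 30-minute threshold, the function name `segment_sessions`, and the sample log below are assumptions made for illustration and are not taken from the dissertation.

```python
from datetime import datetime, timedelta


def segment_sessions(events, max_gap=timedelta(minutes=30)):
    """Split (timestamp, contact) events into sessions.

    A new session starts whenever the gap to the previous event
    exceeds max_gap. The threshold is a hypothetical value.
    """
    sessions = []
    # Sort by timestamp only, so ties never compare contact names.
    for ts, contact in sorted(events, key=lambda e: e[0]):
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            # Close enough to the previous event: same session.
            sessions[-1].append((ts, contact))
        else:
            # First event, or gap too large: open a new session.
            sessions.append([(ts, contact)])
    return sessions


# Hypothetical contact log: two morning contacts, two afternoon contacts.
log = [
    (datetime(2015, 5, 4, 9, 0), "alice"),
    (datetime(2015, 5, 4, 9, 10), "bob"),
    (datetime(2015, 5, 4, 14, 0), "carol"),
    (datetime(2015, 5, 4, 14, 5), "alice"),
]
sessions = segment_sessions(log)
```

Each resulting session could then be treated as one "document" of contacts for a topic model, which is one plausible reading of the assumption that contacts within a session share a latent topic.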
Keywords/Search Tags: Context Modeling, Context Recognition, Probabilistic Topic Models, Named Entity Linking