
Summary extraction from chat data

Posted on: 2008-07-07
Degree: M.S.
Type: Thesis
University: University of Kansas
Candidate: Lingor, Christopher
Full Text: PDF
GTID: 2448390005970571
Subject: Computer Science
Abstract/Summary:
Chat rooms are becoming a mainstream form of communication across the Internet. Internet crime is also increasing and is facilitated by the anonymity of these chat rooms. Consequently, while there is increased interest in monitoring chat rooms, there are too many of them for this to be feasible. This thesis research takes well-known statistical information retrieval algorithms that have been used successfully on other forms of text, such as newspapers and journals, and applies them to chat data in an attempt to automatically generate summaries that are considerably smaller than the original logs of chat room discussions.

Three experiments are performed to determine whether standard information retrieval algorithms can be applied to chat data. First, five different methods of calculating the inverse document frequency, or IDF, are tested. The most effective of these is used in the second experiment to determine the best summarization algorithm. Finally, a segmentation algorithm is added to determine whether it provides information useful for producing a better summary.

The results of these experiments indicate that the standard tf*idf algorithm can be used on chat data to produce short summaries that are at best 40% (at worst 10%) better than reading the entire chat log to gather the most important information.
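
As an illustration of the general idea only, and not the thesis's actual implementation, a tf*idf extractive summarizer for a chat log might be sketched as follows. The sketch assumes that each chat message is treated as its own "document" for the IDF computation and uses the common log(N/df) form of IDF; the abstract does not say which of the five IDF variants performed best, and the function names `tokenize` and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def summarize(messages, k=2):
    """Return the k highest-scoring messages, kept in their original order.

    Assumption: each message counts as one document, and
    idf(term) = log(N / df(term)), where N is the number of messages and
    df(term) is the number of messages containing the term.
    """
    tokenized = [tokenize(m) for m in messages]
    n = len(tokenized)

    # Document frequency: in how many messages does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}

    # Score each message as the sum of tf * idf over its terms.
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append(sum(freq * idf[term] for term, freq in tf.items()))

    # Pick the k best messages, then restore chronological order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [messages[i] for i in sorted(top)]


log = ["hi", "hi all", "meet at the old warehouse at midnight", "ok", "hi"]
print(summarize(log, k=2))
```

Because the score is an unnormalized sum, longer messages tend to score higher; a real system would likely need some form of length normalization, which this sketch omits for brevity.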
Keywords/Search Tags:Chat data, Chat rooms, Information retrieval algorithms