Research On Technologies And Methods Of User-oriented Short Text

Posted on:2019-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:G Chen

Full Text:PDF

GTID:2428330566473964

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development and popularization of mobile Internet technology, text has become an important part of daily life, work, and social interaction, and most of the text found on the Internet is short text. Short text comes from many sources, such as chat systems, social software, and question-answering systems. The rapid growth in the volume of short text also makes it harder to extract the main information quickly. In systems that must respond quickly, such as a question-answering system, the core issue of the user's consultation statement has to be identified first and a response given within a short time, which is challenging. It is therefore of great significance to mine and analyze short texts with computer technology. Clustering is an important means of effectively organizing, summarizing, and navigating text, and it can also uncover relationships between texts, which helps further processing. Short texts contain few characters and little information, are strongly affected by noise, and lack context, so their features are relatively sparse. As a result, short texts cannot be modeled with common long-text modeling methods, which poses many challenges for short text research. At present, short text clustering faces the following problems: How can the influence of irrelevant information be mitigated? How can the sparse features of short text be represented? How can the quality of short text clustering be improved? How can its efficiency be improved?
To address these problems, this thesis proposes a clustering method for short texts from user consultations. The main work is as follows:
1. A second-order Hidden Markov Model is used to identify irrelevant words in user-oriented short text, and a dictionary of irrelevant words is then built so that these words can be filtered out.
2. To alleviate the sparse-feature problem, short texts are represented with word vectors, based on an analysis of the characteristics of short text. A selective weighting method is used to construct the text vector, and the similarity between word vectors is used to express the similarity between short texts.
3. To adapt the clustering algorithm to incremental data sets and improve its efficiency, the clustering process is divided into two steps: offline clustering and online clustering.
Clustering experiments were carried out on user consultation short texts. The results confirm the validity of the similarity calculation method adopted here: the accuracy of the clustering results is 82% and the recall is 73%. The clustering experiment on incremental data sets shows that combining offline clustering and online clustering greatly improves the efficiency of short text clustering.
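The text representation and the online clustering step described in the main work above can be illustrated with a minimal Python sketch. The abstract does not give the weighting scheme, the similarity threshold, or the clustering algorithm, so everything here is an assumption made for illustration: the toy 3-dimensional vectors stand in for trained word2vec embeddings, the `IRRELEVANT` set stands in for the HMM-derived dictionary, per-word weights default to 1.0, and the online step is a simple nearest-centroid assignment with a hypothetical cosine threshold of 0.8.

```python
import math

# Toy word vectors (assumption); a real system would load trained word2vec embeddings.
TOY_VECTORS = {
    "refund": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.0],
    "return": [0.85, 0.15, 0.05],
    "password": [0.0, 0.1, 0.9],
    "reset": [0.05, 0.2, 0.8],
}

# Hypothetical irrelevant-word dictionary; the thesis builds its own with an HMM.
IRRELEVANT = {"hello", "please", "thanks"}


def text_vector(tokens, weights=None):
    """Weighted average of word vectors, skipping irrelevant and unknown words."""
    weights = weights or {}
    dim = len(next(iter(TOY_VECTORS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in IRRELEVANT or tok not in TOY_VECTORS:
            continue
        w = weights.get(tok, 1.0)  # illustrative weighting, not the thesis's scheme
        for i, x in enumerate(TOY_VECTORS[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return None  # no usable words in this text
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_online(vec, centroids, threshold=0.8):
    """Online step: join the most similar existing cluster, or open a new one.

    Centroids are not updated on assignment in this sketch.
    """
    best, best_sim = None, -1.0
    for idx, centroid in enumerate(centroids):
        sim = cosine(vec, centroid)
        if sim > best_sim:
            best, best_sim = idx, sim
    if best is not None and best_sim >= threshold:
        return best
    centroids.append(list(vec))
    return len(centroids) - 1


# Pretend the offline phase produced one cluster about refunds.
centroids = [text_vector(["refund", "money"])]
first = assign_online(text_vector(["hello", "please", "return", "money"]), centroids)
second = assign_online(text_vector(["reset", "password"]), centroids)
print(first, second, len(centroids))
```

In this sketch the first incoming text joins the existing refund cluster, while the second is too dissimilar and opens a new cluster. Assigning new texts against existing centroids avoids re-clustering the whole data set, which is the kind of efficiency gain the offline/online split aims for.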

Keywords/Search Tags:

Short Text, Irrelevant Words, Clustering, Word Vector, word2vec

Related items

1	Automatic Summarization Algorithm For Chinese Short Text
2	Research On Short Text Clustering Of Social Networks Based On Word2vec
3	Research On Unknown Words Recognition And Word Meaning Discovery Based On Short Text Of Micro-blog
4	Research On Chinese Short Text Classification Based On Word Embedding
5	Research On Short Text Emotion Classification Method Based On Word2Vec And N-Gram
6	Research Of Short-text Clustering Method
7	Research And Application Of Short Text Clustering Based On Word Representations
8	Social Media Short Text Clustering And Its Applications
9	Research On Text Classification Based On Word2vec Word Vector
10	Research And Application Of PCA-PSO-FCM In Short Text Clustering