
Text Representation Model Based On Semantics And Structured Tensor

Posted on: 2020-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zhuang
GTID: 2428330578479993
Subject: Applied Mathematics
Abstract/Summary:
Because text is unstructured data, a computer cannot operate on it directly until it has been converted into structured data. Constructing a text representation model is therefore the first goal in text processing. This thesis focuses on text representation and proposes two text representation models for different purposes: the Local Buzzword Model (LBM) and the Structured Tensor Space Model (STSM). In addition, a piece of software named Text Layered Software (TLS) was developed according to the principle of STSM. The work is summarized as follows:

1. Local Buzzword Model (LBM). A two-step keyword extraction strategy was developed first. The local buzzword model was then obtained by combining the word2vec model with clustering of the extracted keywords. The model can extract features from domain-specific corpora and reduces the negative impact of an imbalanced corpus distribution. We apply the model to a corpus of tourism comments, and the experimental results verify that LBM is effective at extracting domain features.

2. Structured Tensor Space Model (STSM). STSM is designed around the structural characteristics of the texts themselves. The content of a text can be divided into several major levels. We assume that paragraphs within the same level are close in meaning, while paragraphs in different levels are far apart in meaning. Based on this hypothesis, we propose the hierarchy structure extraction algorithm (HSEA), which layers a text according to its hierarchical structure. STSM is built by applying the extracted hierarchies to text representation. We use the Sogou corpus and the Fudan corpus to verify the effectiveness of STSM through text classification experiments. The experimental results show that a classifier combining STSM with a higher-order support tensor machine classifies better on small-sample corpora, which indicates that STSM is an effective text representation model.

3. Text Layered Software (TLS). TLS is designed based on HSEA and serves as a visual demonstration of hierarchical text layering. Besides layering text, TLS can extract the abstract and the central sentences of an article.
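
The hypothesis behind the layering step (paragraphs in the same level are close in meaning, paragraphs in different levels are far apart) can be illustrated with a small sketch. This is not the HSEA described in the thesis, whose details are not given here. It assumes, purely for illustration, that meaning is approximated by bag-of-words term frequencies, that closeness is cosine similarity between adjacent paragraphs, and that a fixed threshold marks a level boundary. The function names and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(paragraph):
    """Lowercased term-frequency vector for one paragraph (illustrative stand-in for meaning)."""
    return Counter(re.findall(r"[a-z0-9]+", paragraph.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_into_levels(paragraphs, threshold=0.2):
    """Group consecutive paragraphs into levels.

    A new level starts whenever the similarity between a paragraph and the
    one before it falls below `threshold` (an assumed, hand-picked value).
    """
    if not paragraphs:
        return []
    vectors = [bag_of_words(p) for p in paragraphs]
    levels = [[paragraphs[0]]]
    for i in range(1, len(paragraphs)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            levels[-1].append(paragraphs[i])  # close in meaning: same level
        else:
            levels.append([paragraphs[i]])    # far in meaning: new level
    return levels


paragraphs = [
    "the hotel room was clean",
    "the hotel room was quiet",
    "tensor models classify text",
    "tensor models represent text",
]
levels = split_into_levels(paragraphs)
for index, level in enumerate(levels, start=1):
    print(index, level)
```

In this toy input the first two paragraphs share most of their words and the last two do as well, so the sketch groups them into two levels. A realistic system would likely replace the term-frequency vectors with learned embeddings such as word2vec, which the abstract mentions for LBM.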
Keywords/Search Tags: text representation model, word2vec, keyword extraction, hierarchical structure, structured tensor space model, text classification