
Research On The Semantic Retrieval Method Of Tibetan Language Based On Neural Network Language Model

Posted on: 2022-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y. Xiao
Full Text: PDF
GTID: 2505306509997769
Subject: Computer software and theory
Abstract/Summary:
With the acceleration of information globalization, people's ways of living, working, and learning are changing rapidly, which has also driven the networked application of minority languages and scripts. In recent years, Tibetan-language information on the Internet has grown increasingly abundant. How to quickly and accurately find the Tibetan information that meets a user's needs among vast network information resources has become an urgent problem for Tibetan information processing technology. Traditional information retrieval relies largely on keyword matching, which considers only the literal match between words and ignores semantic associations; since Tibetan grammar is diverse and polysemy is common, this leads to a poor retrieval experience for users. Accordingly, this paper introduces a neural network language model into Tibetan information retrieval and extracts the semantic relationship between query terms and documents through BERT pre-training, thereby improving the performance of Tibetan semantic retrieval. The work and main contributions of this paper are as follows.

1. To address the lack of a public Tibetan dataset, this paper uses crawler tools to collect Tibetan news corpora from Tibetan News Network, China Tibetan Network, Tibet Daily, Qinghai Lake Tibetan Network, and other websites as the training data for the pre-trained model, collects data from China Tibetan Netcom as the dataset for Tibetan semantic retrieval, and then builds the pre-trained BERT language model.

2. The BERT pre-trained Tibetan language model is fine-tuned and applied to the Tibetan semantic retrieval task. The semantic information shared by documents and query terms is fully exploited: a linear combination of probability distributions over documents and queries is computed, the similarity between documents and query terms is calculated from this distribution, and the N documents most relevant to the query keywords are returned, so that the documents closest in meaning to the user's query are retrieved.

3. The effectiveness of the proposed method is verified by comparing the pre-trained BERT Tibetan semantic retrieval model against TF-IDF and word-vector baselines on the information retrieval task, further improving the performance of Tibetan semantic retrieval. The results show that the comprehensive evaluation score is 24.31% higher than the traditional keyword-based retrieval method and 19.57% higher than the word-vector-based semantic retrieval method.

4. Based on the above experiments, a simple Tibetan semantic retrieval system was developed. It supports full-text search over the corpus contents, and the result pages are ranked from top to bottom by semantic relevance.
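The ranking step described in contribution 2 can be sketched as follows: score every candidate document against the query by the similarity of their embeddings and return the top N. This is only a minimal illustration, assuming query and document vectors (e.g. pooled BERT sentence embeddings) have already been computed; the toy vectors, document names, and the `rank_documents` helper are invented for illustration and are not the thesis's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_documents(query_vec, doc_vecs, n=3):
    # Score every document against the query embedding and return the
    # top-N (doc_id, similarity) pairs, highest relevance first.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Toy 3-dimensional vectors standing in for BERT sentence embeddings.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.0]
top = rank_documents(query, docs, n=2)
```

Here `top` holds the two documents whose embeddings point in nearly the same direction as the query, which is how a semantic retriever can match a relevant document even when it shares no surface keywords with the query.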
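For contrast, the keyword-matching baseline that contribution 3 compares against can be illustrated with a minimal TF-IDF scorer: a document scores high only if the query terms literally occur in it, which is exactly the limitation the thesis attributes to traditional retrieval. The tiny tokenized corpus and the `tfidf_vectors` / `keyword_score` helpers below are hypothetical, chosen only to make the contrast concrete.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: {doc_id: list of tokens}. Returns {doc_id: {term: tf-idf weight}}.
    n = len(docs)
    df = Counter()  # document frequency of each term
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors

def keyword_score(query_tokens, doc_weights):
    # Literal keyword matching: sum the TF-IDF weights of the query
    # terms that actually appear in the document; no semantics involved.
    return sum(doc_weights.get(term, 0.0) for term in query_tokens)

# Invented toy corpus of pre-tokenized documents.
docs = {
    "d1": ["tibetan", "news", "corpus"],
    "d2": ["tibetan", "grammar"],
    "d3": ["weather", "report"],
}
weights = tfidf_vectors(docs)
scores = {d: keyword_score(["tibetan", "corpus"], w) for d, w in weights.items()}
best = max(scores, key=scores.get)
```

A document with no literal overlap (`d3`) scores zero here even if it were semantically related, which is the gap the BERT-based ranking above is meant to close.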
Keywords/Search Tags: neural network language model, Tibetan, semantic retrieval, BERT pre-training