
Uyghur Language Information Retrieval And Management Platform

Posted on: 2019-08-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Cui | Full Text: PDF
GTID: 2428330569496205 | Subject: Electronics and Communications Engineering
Abstract/Summary:
As China pursues its strategy of becoming a digitally powerful nation, digital development in Xinjiang's ethnic minority areas has attracted considerable attention. Although Chinese is now widely used, many members of Xinjiang's ethnic minorities still use their own languages and scripts, and the resulting language barrier greatly restricts their communication and learning. As more Uyghur-language websites are created, more Uyghur users are turning to the Internet to learn and to exchange experience. This not only strengthens ties among ethnic minority communities but also plays an important role in national unity. Users rely on search engines to find information quickly and accurately on the vast Internet. Mainstream search engines work well for Chinese and English, but their performance on Uyghur is too poor to meet the growing demand for information retrieval, which has greatly restricted economic, social, and educational development in Xinjiang's ethnic minority areas.

The main work of this thesis is summarized as follows:

1. A Uyghur search engine retrieval system is designed and implemented using a high-concurrency, high-availability software architecture. High concurrency is achieved with Nginx acting as a reverse proxy and load-balancing server, and some modules of the system are deployed as clusters to achieve high availability.

2. Uyghur information processing has no equivalent of the modern Chinese word segmentation specification and no publicly available corpus dataset to serve as a reference. Building on the latest results of the laboratory's Uyghur information group, the Uyghur word segmentation model is encapsulated in a software module that performs Uyghur word segmentation. Because of the syntactic characteristics of Uyghur, the segmenter could not be integrated directly into Lucene's word segmentation process the way Chinese segmenters are, so this thesis uses an SOA service to integrate the segmentation module with Lucene's word segmentation process. A word segmentation experiment module is also implemented. In this module, different segmenters and segmentation model files can be swapped easily, so the segmentation model can be updated quickly and its effect observed in the search engine's results.

3. An improved PageRank algorithm is implemented to produce better result ranking. For invalid links that appear frequently in search results, the web page is restored from a web snapshot.

4. User behavior data is collected and summarized as preliminary data collection for the research group's later work on personalized search, public opinion monitoring, topic tracking, and so on.

This thesis shows that the gap between Uyghur web information retrieval services and Chinese and English search services is very large. Baidu, China's largest search engine, does not segment Uyghur search keywords into words. Instead, it splits Uyghur words into single characters and searches with each character, so it fails to find the desired information. This thesis implements Uyghur word segmentation and applies it to Uyghur information retrieval. The izda search engine, which is widely used by Uyghur users, does not provide web page snapshots; this thesis implements that function. The izda search engine also does not highlight search keywords, and its page ranking is not ideal; this thesis implements an improved PageRank algorithm and highlights all search keywords. The ranking comparison appears in the test section of Chapter 5. Overall, this thesis makes considerable progress in Uyghur information retrieval.
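
The abstract does not say how the thesis modifies PageRank, so the following is only a minimal sketch of the standard, unimproved PageRank iteration that such a ranking component would presumably start from. The function name `pagerank`, the damping factor of 0.85, the handling of pages without out-links, and the toy link graph are all illustrative assumptions and are not taken from the thesis.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """Standard PageRank by power iteration (baseline sketch, not the thesis's variant).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages
        # (one common convention; an assumption here).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its damped rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page link graph, for illustration only.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.4f}")
```

In this toy graph, page `c` receives links from three pages and ends up ranked first, while `d`, which nothing links to, keeps only the baseline share. An improved variant such as the one the thesis reports would change how rank is distributed or combined with other signals, which the abstract leaves unspecified.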
Keywords: Search Engine, Uyghur, Word Segmentation