Design And Implementation Of Baidu Data Retrieval System Based On Big Data Platform

Posted on: 2016-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: M L Guan
Full Text: PDF
GTID: 2308330482481347
Subject: Software engineering
Abstract/Summary:
The 21st century is without doubt the age of the Internet. Information on the Internet grows explosively every day, and traditional centralized search engines are stretched thin by the storage and analysis of such massive data. Distributed search engines were proposed in response. A distributed file system based on cloud computing can make reasonable use of hardware resources and carry out efficient parallel retrieval. The vast amounts of data generated on the Internet every day are valuable, but without a search engine they are just a pile of unorganized data that takes a great deal of manual effort to mine. Traditional search engines match queries by keyword and cannot infer the user's intent, so users have difficulty obtaining exactly the information they want. Distributed, intelligent search engines are therefore the trend of future development. Consider an enterprise with hundreds of thousands of employees and branches around the world: its enterprise portal has to provide a unified search service for all staff, and the searchable content includes the business data generated by enterprise applications as well as employee information. Most enterprises cannot fully exploit the value of their data. For example, most enterprise data today is unstructured: Word documents, Excel spreadsheets, PDF files, scanned images, e-mail, phone records, voicemail, paper documents, photos, web pages, videos, and other forms of content. Because many companies lack technology that can understand and use this data effectively, resources that are highly valuable and strategically significant often go unused.
Enterprise data is varied and disorganized and lacks a unified management platform. Business staff lack technical support and are unfamiliar with the underlying data structures, so they can obtain data only through technical staff, which is very inefficient. An intelligent cloud search system based on natural language is therefore of great value to enterprises. The system is built on a big data platform. It uses a new thesaurus of mobile industry terminology, a self-learning dynamic semantic network analysis model, and a Lucene/Solr word segmentation server, so that users can retrieve data by entering natural-language queries. Through the dynamic semantic network analysis model, the system can automatically collect, analyze, and semantically enrich entries, continuously improving the thesaurus that maps natural language to technical language. A metadata repository and a unified computing framework provide access to and integration of heterogeneous data, including files, traditional databases, XML, MPP, and Hadoop, covering structured and unstructured data on multiple types of platforms, so that information query services are provided by a single platform. Intelligent task collaboration and distributed query processing give the query service fast response times. Using Spark Streaming stream processing with an in-memory index, the system establishes an incremental index update mechanism for background data and provides users with the latest data.
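The two ideas named above, a thesaurus that maps everyday words to technical terms and an in-memory index that is updated incrementally, can be illustrated with a minimal sketch. This is not the thesis's implementation, which relies on Lucene/Solr and Spark Streaming. It is a plain-Python stand-in, and the class name `InMemoryIndex`, its methods, and the sample thesaurus entries are all hypothetical.

```python
from collections import defaultdict


class InMemoryIndex:
    """Toy in-memory inverted index with incremental updates (illustrative only)."""

    def __init__(self, thesaurus=None):
        # term -> set of ids of the documents that contain the term
        self.postings = defaultdict(set)
        # document id -> its terms, kept so a document can be replaced later
        self.doc_terms = {}
        # hypothetical mapping from everyday words to technical terms
        self.thesaurus = thesaurus or {}

    @staticmethod
    def tokenize(text):
        # Whitespace splitting stands in for a real word segmentation server.
        return text.lower().split()

    def remove(self, doc_id):
        """Drop one document's postings; a no-op if the id is unknown."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def upsert(self, doc_id, text):
        """Add or replace one document without rebuilding the whole index."""
        self.remove(doc_id)
        terms = set(self.tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents matching every query word or one of its synonyms."""
        result = None
        for word in self.tokenize(query):
            candidates = {word} | set(self.thesaurus.get(word, ()))
            matched = set()
            for term in candidates:
                matched |= self.postings.get(term, set())
            result = matched if result is None else result & matched
        return result or set()


if __name__ == "__main__":
    index = InMemoryIndex(thesaurus={"income": ["revenue"]})
    index.upsert(1, "monthly revenue report")
    index.upsert(2, "employee directory")
    print(index.search("income report"))  # {1}: "income" expands to "revenue"
    index.upsert(1, "employee handbook")  # incremental replacement of document 1
    print(index.search("income report"))  # set(): the old text is no longer indexed
```

In a streaming setting, each incoming record would trigger one `upsert` call, so newly arrived data becomes searchable without a full rebuild.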
Keywords/Search Tags: Intelligent Search Cloud, Data Retrieval, Enterprise Search, Big Data