
Development Of Marine Science Popularization Website Based On Web Crawler Technology

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Zhai
GTID: 2428330602472231
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, the number of Internet users in China has grown quickly and the volume of online information has exploded, so many people now find the information they need only with low efficiency. To improve the efficiency of information access and to attract more people to learn about marine science, this thesis develops a marine science popularization website based on automatic web crawlers.

The website runs automatic crawlers against specific target sites, applies simple processing and classification to the crawled results, stores them in a database, and displays them on the page. It also provides a search function for Springer and other paper databases: users enter the constraints of their query on the page and receive the query results.

The crawler starts from the URL of the target website and uses regular expressions on each URL to determine whether the page is an article list page or an article content page. The page content is then parsed with XPath, and the required fields are extracted and stored in the corresponding class.

After the crawler obtains the articles, they are classified. Classification first requires segmenting each article into words. Chinese word segmentation is harder than English segmentation and requires either matching against a dictionary or semantic analysis. After segmentation, the text is converted to a bag-of-words representation and TF-IDF values are calculated, so that keywords can be extracted from the article according to fixed rules. The extracted keywords are then passed to a naive Bayes classifier to obtain the classification result.

At present, the major paper databases offer search through an API: the requirements are written into a URL, and the results are returned in JSON format. Because these URLs are hard to write by hand and raw JSON results are hard to read, searching this way is inefficient and cumbersome. This thesis therefore automates the writing of the URL and presents the available query constraints on the page for users to enter or select, which greatly reduces the difficulty of operation. After the user enters the constraints, the system back end automatically generates the query URL, parses the JSON results it receives, and displays the key content, which makes querying less difficult. Together, these two measures make the whole query process easier and more efficient.

In summary, the main purpose of the website is to improve the efficiency with which users acquire information. The website automatically obtains popular science articles, classifies them, and displays them to the user, and it provides a query page that reduces the difficulty of searching for papers. Improving the efficiency of information acquisition is, and will remain, a development trend for websites.
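The classification step described in the abstract (bag-of-words, TF-IDF keyword extraction, then naive Bayes) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It uses only the standard library and assumes the articles have already been segmented into tokens. The function name `tfidf_keywords`, the class `NaiveBayes`, the smoothed IDF formula, the `top_k` cutoff, and the sample documents and labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF tokens of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            # Term frequency times a smoothed inverse document frequency (assumed variant).
            tok: (cnt / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, cnt in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


class NaiveBayes:
    """Multinomial naive Bayes over keyword lists, with add-one smoothing."""

    def fit(self, keyword_lists, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for keywords, label in zip(keyword_lists, labels):
            self.token_counts[label].update(keywords)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-segmented articles and category labels.
docs = [
    ["coral", "reef", "fish", "coral", "ocean"],
    ["whale", "dolphin", "ocean", "whale"],
    ["tsunami", "wave", "earthquake", "ocean", "wave"],
    ["typhoon", "storm", "wave", "ocean"],
]
labels = ["biology", "biology", "hazard", "hazard"]

keywords = tfidf_keywords(docs)
model = NaiveBayes().fit(keywords, labels)
print(keywords)
print(model.predict(["storm", "wave"]))
```

In this sketch the token "ocean" appears in every sample document, so its TF-IDF score is low and it is rarely selected as a keyword, which is the behavior the keyword-extraction step relies on. For Chinese text, a segmentation step would have to produce the token lists first.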
Keywords/Search Tags:Web development, Crawlers, Text classification, Paper search