Research On Filtering Method About Garbage Webpages In The Agriculture Websites

Posted on:2012-07-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Zhang

Full Text:PDF

GTID:2178330335986020

Subject:Control theory and control engineering

Abstract/Summary:

Xinjiang lies in northwest China and has a vast territory and abundant resources. That very size, however, puts long distances between rural areas and between urban and rural areas, which indirectly leaves rural areas poorly informed and hinders the development of rural productivity. Building up information services in rural Xinjiang is therefore urgent: farmers need modern information channels to obtain real-time information and follow market trends. Among the available channels, agricultural websites are the most popular with farmers, because they provide professional agricultural information in real time, reflect agricultural market dynamics, and offer a large volume of content. Through agricultural websites, farmers can browse comprehensive agricultural information, follow national agricultural policies as they are announced, adjust their planting structure, and sell farm produce. Current agricultural websites, however, share a common problem: they contain many pages of invalid information. These pages mainly comprise non-agricultural pages, agricultural pages without main content, and navigation pages, which we call "garbage webpages in agricultural websites". Garbage webpages seriously impede farmers from obtaining accurate market information in time. To help farmers obtain accurate and useful agricultural information in time, we select suitable webpage identification models and improve them to remove garbage webpages from agricultural websites.
In this thesis, drawing on a large body of material from China and abroad, I study the strengths and weaknesses of three webpage identification methods: Multiple Linear Regression, Naive Bayes, and Fisher. On the basis of these three methods, I apply document frequency and the chi-square test together with the JE, IK, and Paoding Chinese word segmenters, then analyze and compare the test results. To distinguish agricultural pages whose main content is blank from normal agricultural pages, I use two pattern recognition methods, Naive Bayes and Fisher, with the same feature extraction model and the same Chinese segmentation software. During feature extraction, I improve the feature extraction model according to the characteristics of these webpages, selecting phrases rather than single words as webpage features. This approach better separates normal pages from garbage pages. The work described here forms the key technology of the agricultural search engine in 《Rural science and technology information service platform key technology research and application demonstration》, a key scientific research project of the Xinjiang Uygur Autonomous Region.
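
The pipeline outlined above (phrase features, document frequency filtering, chi-square feature scoring, Naive Bayes classification) can be illustrated with a minimal sketch. It is not the thesis implementation. The assumptions are: a "phrase" is taken to be a pair of adjacent tokens, the input is already segmented (the thesis uses Chinese word segmenters for that step), Laplace smoothing is used, and the function names, thresholds, and toy English data are all hypothetical.

```python
import math
from collections import Counter

def phrases(tokens, n=2):
    """Join each run of n adjacent tokens into one phrase feature (assumed definition)."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def chi_square(docs, labels, feature, positive):
    """Chi-square statistic of one feature against one class, from a 2x2 table."""
    a = b = c = d = 0
    for feats, label in zip(docs, labels):
        present = feature in feats
        if present and label == positive:
            a += 1          # feature present, class positive
        elif present:
            b += 1          # feature present, class negative
        elif label == positive:
            c += 1          # feature absent, class positive
        else:
            d += 1          # feature absent, class negative
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, positive, min_df=2, top_k=10):
    """Drop features below the document-frequency threshold, keep the top_k by chi-square."""
    df = Counter(f for feats in docs for f in set(feats))
    candidates = [f for f, count in df.items() if count >= min_df]
    candidates.sort(key=lambda f: (-chi_square(docs, labels, f, positive), f))
    return set(candidates[:top_k])

def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing over the selected features."""
    class_docs = Counter(labels)
    counts = {c: Counter() for c in class_docs}
    for feats, label in zip(docs, labels):
        counts[label].update(f for f in feats if f in vocab)
    model = {}
    for c in class_docs:
        total = sum(counts[c].values())
        log_prior = math.log(class_docs[c] / len(labels))
        log_like = {f: math.log((counts[c][f] + 1) / (total + len(vocab))) for f in vocab}
        model[c] = (log_prior, log_like)
    return model

def classify(model, feats):
    """Return the class with the highest log posterior; unselected features are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[f] for f in feats if f in log_like)
    return max(model, key=score)

# Toy, pre-segmented training pages (hypothetical data, for illustration only).
train = [
    (["wheat", "price", "rises", "in", "market"], "normal"),
    (["cotton", "planting", "structure", "adjusted"], "normal"),
    (["wheat", "price", "falls", "this", "week"], "normal"),
    (["home", "page", "site", "map"], "garbage"),
    (["site", "map", "contact", "us"], "garbage"),
    (["home", "page", "login", "register"], "garbage"),
]
docs = [phrases(tokens) for tokens, _ in train]
labels = [label for _, label in train]

vocab = select_features(docs, labels, positive="garbage", min_df=2, top_k=4)
model = train_nb(docs, labels, vocab)

print(classify(model, phrases(["site", "map", "home", "page"])))
print(classify(model, phrases(["wheat", "price", "today"])))
```

Using adjacent-token pairs keeps some word order that single-word features discard, which is one plausible reading of why phrase features would separate navigation pages from content pages more cleanly.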

Keywords/Search Tags:

Agriculture Websites, Garbage webpages, Pattern Recognition, Feature extraction model

Related items

1	Image Recognition Technology Applications In Agricultural Insurance
2	Feature Extraction And Pattern Classification Of Electromyographic Signals
3	Research On Removing Duplicated WebPages Algorithm Of Search Engine Based On Content
4	Key Algorithm Research And Application For The Statistics Pattern Recognition System
5	Design Of Visual Robot Garbage Sorting System Based On Deep Learning
6	The Research Of Feature Extraction Methods And Their Applications
7	Lung Sound Signal Feature Extraction And Pattern Recognition
8	Research On Dynamic Image Feature Extraction In Precision Agriculture
9	Study On Performance Of Myoelectric Pattern Recognition Based Movement Classification
10	Research On Feature Recognition And Model Establishment Based On Part File