
The Research On Key Technologies For Web Information Personalization Collection And Management

Posted on: 2012-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: D Q Fan
GTID: 2218330371953935
Subject: Computer application technology
Abstract/Summary:
The national Ministry of Information Industry has set specific requirements for promoting the software industry and the informatization of social services, and government bodies, enterprises, and institutions need personalized Web information collection and management. As a result, Web information services have become one of today's most popular fields. However, current general-purpose search engines have many shortcomings, and users are no longer satisfied with simply submitting keywords. How to let users take part in personalized Web information collection and management, and how to provide intelligent, personalized, and semantic information services, has therefore become an urgent need. To address it, this thesis studies the key technologies of personalized Web information collection and management. The main research work covers four aspects:

(1) Based on an analysis of the structure of information sources, we propose a Web information gathering method built on three search strategies: a network spider (crawler), a meta search engine, and Deep Web search. Then, based on an analysis of Web page structure and the user's personalized customization, we propose a dual-purification method for extracting a page's main subject content, thereby achieving personalized Web information collection.

(2) Based on an analysis of HTML features, we propose an algorithm for removing duplicate Web pages that combines Web page content analysis with classical logical reasoning. The method extracts phrases reflecting user preferences from the elements of a page to obtain its content, and then uses that content together with classical logical reasoning to infer the similarity between pages and judge whether they are duplicates.
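Item (1) does not spell out the steps of the dual-purification extraction. The sketch below shows one plausible two-pass reading, assuming that the first pass discards text inside noise tags (scripts, styles, navigation, footers) and the second pass discards text blocks that are too short or dominated by link text. The tag lists, the `min_chars` and `max_link_density` parameters, and the names `_BlockCollector`, `extract_main_text`, and `SAMPLE_PAGE` are hypothetical; the thesis's personalized customization step is not modeled.

```python
from html.parser import HTMLParser

# Hypothetical tag lists; a real system would tune these per site.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section", "h1", "h2", "h3"}


class _BlockCollector(HTMLParser):
    """Pass 1: skip text inside noise tags and split the rest into text blocks,
    recording how many characters of each block sit inside <a> links."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0     # > 0 while inside a noise tag
        self.link_depth = 0      # > 0 while inside an <a> tag
        self.blocks = [["", 0]]  # each entry: [block text, link characters]

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag == "a":
            self.link_depth += 1
        if tag in BLOCK_TAGS:
            self.blocks.append(["", 0])  # a block-level tag opens a new block

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        if tag in BLOCK_TAGS:
            self.blocks.append(["", 0])

    def handle_data(self, data):
        text = data.strip()
        if self.noise_depth or not text:
            return
        block = self.blocks[-1]
        block[0] = f"{block[0]} {text}" if block[0] else text
        if self.link_depth:
            block[1] += len(text)


def extract_main_text(html, min_chars=20, max_link_density=0.5):
    """Pass 2: keep blocks that are long enough and not dominated by link text."""
    collector = _BlockCollector()
    collector.feed(html)
    collector.close()
    kept = [
        text
        for text, link_chars in collector.blocks
        if text and len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n".join(kept)


SAMPLE_PAGE = (
    "<html><head><style>p {color: red}</style><script>var x = 1;</script></head>"
    "<body><nav><a href='/'>Home</a><a href='/news'>News</a></nav>"
    "<div><a href='/a'>A long link title that looks like navigation</a></div>"
    "<p>The thesis proposes a method for extracting the main content of a page.</p>"
    "<footer>Copyright 2012, all rights reserved here</footer></body></html>"
)

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

Link density (the share of a block's characters that sit inside links) is a common heuristic for telling navigation from body text; a personalized variant could additionally keep only blocks that mention the user's custom topic terms.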
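The duplicate-removal step in (2) infers page similarity through classical logical reasoning, which the abstract does not detail. As a stand-in, the following sketch swaps in a common near-duplicate test: character bigram shingles compared by Jaccard similarity. Character n-grams avoid the need for Chinese word segmentation. The function names and the 0.8 threshold are assumptions for illustration, not the thesis's method.

```python
def shingles(text, n=2):
    """Set of character n-grams; works on Chinese text without word segmentation."""
    chars = "".join(text.split())  # drop all whitespace
    if len(chars) < n:
        return {chars} if chars else set()
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(pages, threshold=0.8, n=2):
    """Return index pairs (i, j), i < j, whose shingle sets reach the threshold."""
    sets = [shingles(p, n) for p in pages]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    pages = ["网络信息个性化采集与管理", "网络信息个性化采集与管理系统", "股票市场今日大幅上涨"]
    print(find_duplicates(pages))
```

The pairwise loop is quadratic in the number of pages; large crawls usually add MinHash or SimHash signatures so that only likely candidates are compared.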
Experiments show that the duplicate-removal method in (2) handles duplicated Web pages with Chinese content and achieves high recall and precision.

(3) For the online comments that enterprises and institutions monitor, we propose a new sentiment polarity recognition model, the fixed sentiment terms model, based on the linguistic structure of emotional states. The method builds its recognition algorithm on three types of specific collocation patterns around fixed sentiment terms. These feature term sets are updated gradually through user relevance feedback based on an incremental tf-idf model. In comparisons with the traditional method, all tests showed that the proposed model gives the sentiment classifier higher efficiency and accuracy.

(4) To analyze user search behavior, we apply ARIMA time series analysis to search-tool transaction logs, that is, the behavior records of a particular user, to make predictions, and then use an SVM classifier with an RBF kernel to improve predictive performance. First, we treat a particular user's behavior records as a time-ordered random sequence and extract the user's behavioral features through feature selection and document representation. Next, we use ARIMA to make one-period-ahead predictions on the log data. Finally, we use the RBF-kernel SVM classifier to eliminate noise. Experiments indicate that the new method can correct the predicted direction of user behavior, and the results show that the ARIMA-SVM model improves predictive performance more than the ARIMA model alone.

Finally, we design and develop a personalized Web information collection and management system and use it to test and analyze the algorithms and models of this thesis experimentally. The experimental results indicate that the system achieves a high recall ratio.
The system also achieves good precision in personalized Web information collection, and its information management and analysis functions perform well.
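The incremental tf-idf feedback loop in (3) can be sketched as follows, assuming comments arrive already segmented into terms and that each user-confirmed comment adds its highest-weighted new term to the matching polarity set. The smoothed idf formula, the `top_k` expansion rule, and the names `IncrementalTfIdf` and `FeedbackSentimentClassifier` are hypothetical; the three collocation patterns of the fixed sentiment terms model are not reproduced here.

```python
import math
from collections import Counter


class IncrementalTfIdf:
    """Document-frequency statistics that grow one document at a time,
    so tf-idf weights never require a rescan of the whole corpus."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()

    def add_document(self, terms):
        self.n_docs += 1
        self.df.update(set(terms))  # each term counted once per document

    def weights(self, terms):
        if not terms:
            return {}
        tf = Counter(terms)
        total = len(terms)
        # Assumed smoothed idf variant: log((1 + N) / (1 + df)) + 1.
        return {t: (c / total) * (math.log((1 + self.n_docs) / (1 + self.df[t])) + 1)
                for t, c in tf.items()}


class FeedbackSentimentClassifier:
    """Scores a comment by the tf-idf weight of its positive minus negative terms."""

    def __init__(self, positive_terms, negative_terms):
        self.pos = set(positive_terms)
        self.neg = set(negative_terms)
        self.tfidf = IncrementalTfIdf()

    def score(self, terms):
        w = self.tfidf.weights(terms)
        return (sum(v for t, v in w.items() if t in self.pos)
                - sum(v for t, v in w.items() if t in self.neg))

    def classify(self, terms):
        s = self.score(terms)
        return "positive" if s > 0 else "negative" if s < 0 else "neutral"

    def feedback(self, terms, label, top_k=1):
        """The user confirms a comment's polarity: update document frequencies,
        then add the comment's top_k highest-weighted unseen terms to that polarity."""
        if label not in ("positive", "negative"):
            raise ValueError("label must be 'positive' or 'negative'")
        self.tfidf.add_document(terms)
        target = self.pos if label == "positive" else self.neg
        w = self.tfidf.weights(terms)
        unseen = [t for t in w if t not in self.pos and t not in self.neg]
        unseen.sort(key=lambda t: (-w[t], t))  # highest weight first, ties by term
        target.update(unseen[:top_k])


if __name__ == "__main__":
    clf = FeedbackSentimentClassifier(["好"], ["差"])
    print(clf.classify(["物流", "慢"]))
    clf.feedback(["慢", "慢", "物流"], "negative")
    print(clf.classify(["物流", "慢"]))
```

Because only `n_docs` and `df` are stored, each feedback round costs time roughly proportional to the length of the confirmed comment rather than the size of the corpus.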
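For the one-period-ahead prediction in (4), the abstract does not state the ARIMA order. The sketch below assumes ARIMA(1,1,0), that is, an AR(1) model on the first differences d[t] = y[t] - y[t-1], fitted by conditional least squares rather than maximum likelihood. The function names are hypothetical, and the subsequent RBF-SVM correction stage is not included.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) by conditional least squares:
    d[t] = c + phi * d[t-1] + e[t], where d[t] = y[t] - y[t-1]."""
    d = [b - a for a, b in zip(series, series[1:])]
    if len(d) < 3:
        raise ValueError("need at least 4 observations")
    x, y = d[:-1], d[1:]  # regress each difference on the previous one
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences carry no AR signal
    c = my - phi * mx
    return c, phi


def forecast_one_step(series):
    """One-period-ahead forecast: y[T+1] = y[T] + c + phi * d[T]."""
    c, phi = fit_arima_110(series)
    last_diff = series[-1] - series[-2]
    return series[-1] + c + phi * last_diff


if __name__ == "__main__":
    daily_queries = [12, 15, 14, 18, 21, 20, 24]  # hypothetical log counts
    print(forecast_one_step(daily_queries))
```

In practice a library such as statsmodels would select the order and fit by maximum likelihood; the hand-rolled version only makes the differencing and one-step recursion explicit.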
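The RBF-kernel SVM named in (4) has the standard kernel and decision function below, where the x_i are training feature vectors with labels y_i in {-1, +1}, the alpha_i are dual coefficients bounded by the penalty parameter C, and b is the bias. The abstract does not say how training examples are labeled; one plausible setup, offered only as an assumption, labels each ARIMA forecast by whether its predicted direction matched the observed one.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0

f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right), \qquad 0 \le \alpha_i \le C
```

A larger gamma makes the kernel more local, so the boundary can follow noise more closely; gamma and C are normally tuned together by cross-validation.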
Keywords/Search Tags: All-in-One Search Engine, Deep Web, Classical Logical Reasoning, Linguistic Structure, ARIMA time series analysis method, SVM