Design and Implementation of an Open-Domain System for Automatic Acquisition and Intelligent Indexing of Party-Construction Information

Posted on: 2020-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhou
GTID: 2428330596971765
Subject: Computer application technology
Abstract/Summary:
The volume of party-construction information has grown rapidly since the concept of “Internet and Party Construction” was proposed. A vertical retrieval system for party construction can provide users with professional, accurate, and less redundant content. Building such a system requires party-construction texts as its basic data. To improve retrieval quality, the key information contained in these texts must also be extracted and displayed to users alongside the retrieval results, so that the returned results are more intuitive. If the potential entity relationships in a user's query can be mined and combined with the entity relationships labeled in the party-construction data, the correlation between the query and the retrieval results can be improved.

To address the problem of collecting party-construction text data, this thesis designs an open-domain system for automatically obtaining party-construction information. An algorithm based on semantic relevancy and link structure is studied and used to predict the relevance of unvisited links to the topic of party construction. Based on this algorithm, a semantic-relevancy focused crawler for party construction is implemented and used to collect party-construction information from the open Internet. Using the selected keywords, the page description information, HowNet, and word embeddings trained on a Chinese Wikipedia corpus, the semantic relevance of a page is computed and combined with URL structure information to predict each URL's relevancy to party construction. The data collected by the system serves as the basic data for building the party-construction vertical search engine.

To improve the quality of the data retrieved by the vertical retrieval system, enhance the relevance of the retrieval results, and address the problem of batch data updates, this thesis designs a system architecture for querying and updating both the full historical data and the real-time incremental data, and implements an intelligent indexing system based on this architecture. The system extracts the relationships between keywords and entities in the collected party-construction corpus as the indexing results and applies them to the display of retrieval results in the party-construction field. Combining full data processing with stream data processing accounts for both the completeness and the timeliness of the labeled results. The open-domain system for automatic acquisition and intelligent indexing of party-construction information addresses the problem of acquiring and labeling party-construction information, and its performance and accuracy have been verified to some extent.
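
The abstract does not give the scoring formula used by the focused crawler, so the following is only a minimal sketch of the general idea of combining a semantic score with a URL-structure score to rank unvisited links. Everything specific here is an assumption made for illustration: the toy three-dimensional word vectors (stand-ins for embeddings trained on a Chinese Wikipedia corpus), the `URL_HINTS` tokens, the weight `alpha`, and the function names. HowNet-based similarity is not modeled.

```python
import heapq
import math
from urllib.parse import urlparse

# Hypothetical toy word vectors; a real system would load trained embeddings.
TOY_VECTORS = {
    "party": (0.9, 0.1, 0.0),
    "construction": (0.8, 0.2, 0.1),
    "member": (0.7, 0.3, 0.0),
    "football": (0.0, 0.1, 0.9),
    "score": (0.1, 0.0, 0.8),
}
TOPIC_KEYWORDS = ("party", "construction")  # assumed topic keywords
URL_HINTS = ("party", "dangjian")           # assumed topic-related URL tokens


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_score(description):
    """Similarity between the topic keywords and a link's description text."""
    topic = mean_vector(TOPIC_KEYWORDS)
    text = mean_vector(description.lower().split())
    return cosine(topic, text) if text is not None else 0.0


def url_structure_score(url):
    """Fraction of host/path tokens that contain a topic-related hint."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).lower().replace(".", "/")
    tokens = [t for t in raw.split("/") if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if any(h in t for h in URL_HINTS))
    return hits / len(tokens)


def link_relevance(url, description, alpha=0.7):
    """Weighted combination of the semantic and URL-structure scores."""
    return (alpha * semantic_score(description)
            + (1 - alpha) * url_structure_score(url))


def rank_links(links):
    """Return URLs ordered from most to least relevant (crawl frontier order)."""
    heap = [(-link_relevance(url, desc), url) for url, desc in links]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

In such a sketch, a crawler would call `rank_links` on the (URL, description) pairs extracted from each fetched page and visit the highest-scoring links first. The linear weighting is one simple design choice among many and is not taken from the thesis.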
Keywords/Search Tags: Information retrieval, Semantic relevancy, Focused crawler, Index data