
The Design And Implementation Of WEB Topic Tag System Based On WEB Mining

Posted on: 2018-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: S X Ren
GTID: 2348330518995693
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information online has exploded. This has greatly enriched the channels through which users obtain information. However, it has also made Web information complex and redundant, so users find it hard to locate the information they are interested in quickly and accurately. In the Web 2.0 era, tags have become a common way of organizing Internet information. Some researchers currently use text classification and automatic summarization to index Web pages in order to improve the efficiency and accuracy of user retrieval. However, such coarse-grained extraction and indexing of key information ignores the characteristics of the Web page itself and still does not meet users' information-seeking needs. In addition, applying one uniform method to different types of Web pages yields low output accuracy and cannot be adapted to specific application scenarios. The most pressing task, therefore, is to extract topic tags from Web pages, using suitable techniques and ways of organizing Web page information to help users obtain valuable information.

This paper analyzes and studies Web pages through natural language indexing, proposes a solution for building Web topic tags, and implements a corresponding Web topic tag system. The main research contents and achievements are as follows:

1) Extraction of Web topic tags. Using Web text mining techniques adapted to the characteristics of Web pages, this paper designs the Web topic tag extraction process and implements data preparation, Web page information extraction, text preprocessing, Web topic tag building, and related functions.
2) Building of Web topic tags. This paper studies keyword extraction and named entity recognition techniques separately. On this basis, it implements multi-feature fusion keyword extraction for Web pages with text information, named entity recognition for Web pages with special information, and TF-based keyword extraction for Web pages without text information. These techniques are applied to the corresponding page types when building Web topic tags.
3) Topic tag extraction solutions for different categories of Web pages. By analyzing and comparing the characteristics of news, video, and e-commerce Web pages, this paper proposes a suitable Web topic tag extraction solution for each category. First, the text that represents the central idea of the page is extracted. Next, a tag-building technique suited to the page's characteristics is applied to generate Web topic tags. Finally, the results are displayed visually.
4) An application scheme for the system. This paper uses Web topic tag extraction to give users data analysis capability and implements batch analysis of URLs. After a batch of URLs has been analyzed, users can view the analysis results directly, which helps them explore the underlying value and meaning of the data and understand it objectively.

Based on the above research, this paper implements a Web topic tag system based on Web text mining. The system can mine and analyze Web pages to generate topic tags with a certain degree of accuracy, organize and manage Web page information effectively, and help users acquire the knowledge they need.
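The abstract describes a pipeline of Web page information extraction, text preprocessing, and topic tag building, and names TF-based keyword extraction as one tag-building technique. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes English HTML input, uses a naive visible-text extractor (everything outside `<script>`/`<style>`) rather than the author's main-content extraction, and relies on a small hypothetical stop-word list. The names `STOP_WORDS`, `extract_text`, `preprocess`, and `tf_keywords` are illustrative only. Chinese pages would need a word segmenter instead of the regex tokenizer.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical, minimal English stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "from",
              "is", "are", "was", "with", "by", "at", "it", "this", "that"}


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Information extraction step: HTML -> plain visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def preprocess(text: str) -> list[str]:
    """Preprocessing step: lowercase, tokenize, drop stop words and 1-char tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def tf_keywords(html: str, k: int = 5) -> list[tuple[str, float]]:
    """Tag-building step: the k terms with the highest term frequency."""
    tokens = preprocess(extract_text(html))
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # Counter.most_common breaks ties by first occurrence in the text.
    return [(term, count / total) for term, count in counts.most_common(k)]


if __name__ == "__main__":
    page = ("<html><head><title>Web mining</title><style>p{color:red}</style></head>"
            "<body><p>Web mining extracts topic tags from Web pages.</p>"
            "<p>Topic tags help users find Web pages.</p>"
            "<script>var web = 1;</script></body></html>")
    print(tf_keywords(page, 3))
```

On the sample page this yields `web` (4 of 16 tokens) followed by `mining` and `topic` (2 each). Plain term frequency favors repeated words regardless of position, which is one reason a multi-feature approach, as the abstract describes for text-rich pages, would weight additional signals such as where a term appears.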
Keywords/Search Tags: Web pages, topic tag, Web text mining, keyword extraction