Crawling Data Of Electronic Business Platform Based On Scrapy And Construction Of Automatic Question-Answering System

Posted on: 2017-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: D H Shu
Full Text: PDF
GTID: 2348330488485691
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era and the rapid growth of information on the Internet, traditional search engine technology, which relies on keywords to find information, is increasingly unable to meet users' need to retrieve information quickly and accurately. As an advanced form of information retrieval, the automatic question answering system has become a research focus in recent years. Studying and building a Chinese automatic question answering system is therefore of great significance: it lets users ask questions directly in natural language and obtain answers quickly and accurately.

This paper uses web crawler technology to crawl commodity data from an e-commerce platform and builds a Chinese automatic question answering system for that platform, so that users can access the relevant product information accurately. The main work is as follows.

Firstly, this paper selects Scrapy, an open-source crawler framework written in Python, as the crawling tool. It studies how Scrapy is set up and used, and selects the Netease Koala overseas shopping platform as the research object. After analyzing the structure of Koala's commodity data, this paper writes a web crawler program based on the Scrapy framework and crawls all of Koala's goods data.

Secondly, this paper builds a web project and deploys it on Tomcat, an open-source middleware. Users only need to enter the corresponding page link in the browser to see a dynamic relationship map of Koala's commodity data and browse all kinds of goods information intuitively. In this way, the commodity display is visualized.

Thirdly, this paper builds the dictionary and speech database of Koala's commodity knowledge. It implements a word segmentation algorithm that combines the forward maximum matching algorithm with the reverse maximum matching algorithm. It also defines its own keyword extraction rules and implements similarity computation based on the edit distance algorithm. On this basis, this paper designs and implements KOALAASK, a Chinese automatic question answering system built on Koala's commodity knowledge base.

Finally, a series of functional tests was carried out on the KOALAASK system. The experimental results show that the system performs well and responds quickly. For the statistical analysis, 799 goods records from different fields were loaded into the system, questions were asked about them, and the accuracy of the returned answers was calculated. The results indicate that the system achieves high accuracy in this specific product knowledge area and has good practicability.
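The segmentation and similarity steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the forward and reverse results are combined, so the sketch assumes a common heuristic (prefer the segmentation with fewer tokens, then the one with fewer single-character tokens, then the reverse result). It also assumes similarity is the edit distance normalized by the longer string's length. All function names and the sample vocabulary are hypothetical.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_match(text, vocab):
    """Combine both scans (assumed heuristic, not taken from the thesis)."""
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    rev = reverse_max_match(text, vocab, max_len)
    if len(fwd) != len(rev):
        return fwd if len(fwd) < len(rev) else rev

    def single_count(tokens):
        return sum(1 for token in tokens if len(token) == 1)

    # On a tie in token count, prefer fewer single-character tokens,
    # falling back to the reverse result.
    return fwd if single_count(fwd) < single_count(rev) else rev


def edit_distance(a, b):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

With the hypothetical vocabulary `{"研究", "研究生", "生命", "起源"}`, `bidirectional_match("研究生命的起源", vocab)` returns `["研究", "生命", "的", "起源"]`: both scans yield four tokens, but the forward scan leaves two single characters while the reverse scan leaves one. A question answering system of this kind could then rank stored questions by `similarity` against the user's question, although the abstract does not describe the ranking step in detail.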
Keywords/Search Tags: Web Crawler, Scrapy, Visualization Technique, D3, Automatic Question-Answering System In Chinese