Excavation And Analysis Of Urban Complaints Text Based On

Posted on: 2016-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
Full Text: PDF
GTID: 2278330503960867
Subject: Computer technology
Abstract/Summary:
In recent years, with the rise of Micro-blog governance, a growing number of government departments have opened official Micro-blogs to interact with the public. For example, "Beijing 12345" was created as a government-affairs Micro-blog, which makes it much easier for citizens to report non-emergency service requests. Because Micro-blog is popular and easy to use, a huge number of complaints arrive through it every day. Automatically extracting the important complaints of citizens from Micro-blog therefore plays an important role in improving people's livelihood and urban development: it not only reduces the manual work of office clerks but also detects hot issues and hot spots promptly, so that the complaints can be submitted to the relevant departments for handling. Research on mining and analyzing city complaints from Micro-blog is thus of great value and practical significance.

The geographical entity in a city complaint is essential, because the information is invalid without a specific geographical location. However, online writing is informal, so complaint text is often non-standard; Micro-blog complaints in particular show colloquial expressions, new words, typos, and so on. Geographical locations in complaints are generally very specific and are mixed with the complaint content, which makes it difficult to extract them and to assign each location automatically to its area.

This thesis focuses on mining and analyzing city complaint information from Micro-blog, based on complaint data from the Beijing 12345 government-affairs Micro-blog.
Using information extraction techniques, events can be extracted automatically by converting unstructured data into structured data. We mainly study the recognition of geographical entities, the completion of incomplete (defect) geographical entities, and the application of a complaint analysis platform, with the aim of helping city management staff analyze the city's complaints more effectively and conveniently. The main research content is as follows:

(1) This thesis designs and implements a web crawler that automatically captures city complaints from Micro-blog. After analyzing existing Sina Micro-blog crawlers in terms of performance and operability, we design a crawler based on parsing Sina Micro-blog pages. The crawler collects city complaints by theme words and is not subject to the usage limits of the Sina Micro-blog API, so crawling is fully automatic.

(2) A method for recognizing geographical entities in the text of Micro-blog city complaints is proposed. First, features are marked using the Sogou thesaurus entries related to geographical locations in Beijing, together with part of speech, tail word, and tail phrase, and a conditional random field (CRF) model is used to identify geographical entities. Second, the CRF output is labeled a second time according to the characteristics of Micro-blog text and geographical entities. Third, a rule base is used to supplement the recognition results and correct the geographical entities, which completes the recognition.

(3) This thesis presents a method for automatically completing incomplete (defect) geographical entities based on the question answering community Baidu Zhidao. First, the incomplete geographical entity is transformed into a question of the form "in which area is it located", which is submitted to Baidu Zhidao for retrieval. Second, features are extracted from the retrieval results.
The score of the geographical entity belonging to each area is calculated, and a feature vector over the candidate areas of the incomplete (defect) geographical entity is constructed. Finally, rules are applied to complete the entity, giving a full representation of the geographical entity.

(4) This thesis designs and implements an analysis platform for city complaints from Micro-blog. The platform has three main functional modules: capturing city complaints from Micro-blog based on theme words, recognizing geographical entities, and completing incomplete geographical entities. The GUI of the client program, written in Java, consists of a data configuration interface and a presentation interface for data processing. The main functions of the client program are: automatically collecting complaints about municipal services from Micro-blog, filtering noisy data, recognizing geographical entities, and completing incomplete geographical entities through the question answering community Baidu Zhidao.
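
The completion step in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which features or rules are used, so plain mention frequency stands in for the features and a single score threshold stands in for the rules. The function names `area_scores` and `complete_entity`, the `min_score` parameter, the district subset, and the sample snippets are all assumptions made for illustration, and retrieval from Baidu Zhidao is replaced by a hard-coded list of answer texts.

```python
from collections import Counter

# Assumption: a small subset of Beijing districts used as candidate areas.
AREAS = ["东城区", "西城区", "朝阳区", "海淀区", "丰台区", "石景山区"]


def area_scores(snippets, areas=AREAS):
    """Return one score per area: its share of all area mentions in the snippets."""
    counts = Counter()
    for text in snippets:
        for area in areas:
            counts[area] += text.count(area)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(areas)
    return [counts[area] / total for area in areas]


def complete_entity(entity, snippets, areas=AREAS, min_score=0.5):
    """Prefix the entity with the top-scoring area, or leave it unchanged.

    The min_score threshold is a hypothetical rule, not taken from the thesis.
    """
    if any(entity.startswith(area) for area in areas):
        return entity  # the entity already names its area
    scores = area_scores(snippets, areas)
    best = max(range(len(areas)), key=scores.__getitem__)
    if scores[best] >= min_score:
        return areas[best] + entity
    return entity


# Made-up snippets standing in for answers retrieved from a Q&A community.
snippets = [
    "中关村大街在海淀区。",
    "属于海淀区，靠近北京大学。",
    "好像是朝阳区吧？",
]
print(area_scores(snippets))
print(complete_entity("中关村大街", snippets))
```

Normalizing the counts into a score vector keeps the decision rule independent of how many answers were retrieved; a real system would likely weight answers by quality and handle aliases of area names, which this sketch ignores.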
Keywords/Search Tags: City complaints of Micro-blog, Micro-blog crawler, Geographical entity, Geographical entity integrity, Question Answering Community (QAC), The analysis platform of city complaints of Micro-blog