
The Study On Text Categorization PSE

Posted on: 2009-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X B Zhang
GTID: 1118360245999289
Subject: Control theory and control engineering
Abstract/Summary:
Problem solving environments (PSEs) are widely used in computer applications. This dissertation proposes PSE-TC, a grid-enabled PSE that provides an MPSE environment for handling numerical and non-numerical data together and offers a range of techniques and methods for solving a class of problems through a uniform interface. Text categorization (TC) is a core technology of data mining, but existing TC methods differ widely in their interfaces. To improve efficiency, this dissertation puts forward the PSE-TC platform, which places these categorization methods in a uniform environment. Existing technology cannot fully meet the resource demands of TC, whereas grid technology can pool resources; PSE-TC therefore combines grid technology with TC to meet the needs of large-scale data processing.

A four-layer grid architecture is proposed that provides services through a uniform interface. It is obtained by adding a middleware layer, the Agent, to the three-layer grid architecture of PSE-TC: (1) Low-level individual grid services are encapsulated in this layer, which shields users from the differences between grid service providers. (2) Workflow tools in the Agent schedule the calling sequence of low-level services; the services are integrated and encapsulated so that many low-level calls can be replaced by a single service call, which simplifies invocation and improves efficiency. (3) The services in the Agent are exposed through a uniform interface, which makes the service-calling components reusable. (4) The workflow tools in the Agent let users define and schedule their tasks and create applications dynamically, so the system is highly reconfigurable.

Resource sharing, however, also weakens security.
Security measures such as authentication and authorization alone cannot fully guarantee users' privacy during data transfer and computation. Therefore, after analysing existing distributed SVM-based TC, we propose and implement data privacy preservation based on homomorphic encryption. A distributed SVM text categorizer, GSVC, is implemented, with the following advantages: (1) During training, the GSVC services located on distributed servers build local TC models without exchanging key vectors or training data, so the privacy of the original training data is preserved. (2) When the category of a text is computed, following the theory of homomorphic encryption, the words of the original text are shuffled, divided into several parts, and processed by different GSVC services, which preserves the privacy of the user's original data. (3) Interference vectors are added to the data transferred between distributed GSVC services, to prevent one GSVC from deducing how the original data was allocated from the vectors of other GSVCs. Experiments show that high categorization precision is maintained while the privacy of the original data is well preserved.

Grid service invocation is slow in grid applications. To overcome this, we improve on the original remote service invocation technique by adding a cache in the Agent to speed up remote invocation. Experiments show that this method is highly applicable. Finally, based on Portal technology, we provide a web interface for PSE-TC so that users can use it more conveniently. The grid functions are integrated into this interface transparently, giving users a simple and direct way to invoke services and to start multiple applications. The results are visualized using Java 3D technology.
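The idea in point (2), shuffling the words of a text and scoring the parts on different GSVC services, can be illustrated with a minimal sketch. It assumes a linear bag-of-words model, which the abstract does not state; under that assumption the decision score is a sum of per-word weights, so partial scores computed on separate shares add up to the score of the whole text. No actual encryption is performed here and the interference vectors of point (3) are omitted; the names `partial_score`, `split_shuffled`, and `classify` are hypothetical and not taken from the dissertation.

```python
import random

def partial_score(weights, words):
    """Score one share of words; stands in for a single GSVC-like worker."""
    # Assumption: a linear model, so each word contributes its weight once
    # per occurrence and unknown words contribute nothing.
    return sum(weights.get(w, 0.0) for w in words)

def split_shuffled(words, parts, rng):
    """Shuffle the words, then deal them round-robin into `parts` shares."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]

def classify(weights, bias, words, parts=3, seed=0):
    """Combine partial scores from all shares into one decision."""
    rng = random.Random(seed)
    shares = split_shuffled(words, parts, rng)
    total = sum(partial_score(weights, share) for share in shares) + bias
    label = 1 if total >= 0 else -1
    return label, total

# Illustrative weights and text (made-up values, not experimental data).
weights = {"grid": 0.5, "svm": 1.0, "cache": -0.75}
words = ["grid", "svm", "grid", "cache", "the"]
label, total = classify(weights, bias=-1.0, words=words)
```

No single worker sees the full text or its word order, yet the combined score matches what a single server would compute on the whole text.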
In summary, PSE-TC integrates many recent innovations in computer science, overcomes the resource limitations of existing PSEs, and provides a secure and efficient platform that can benefit research on MPSE and offers a new approach to MPSE integration.
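The role of the Agent layer described in the abstract (a uniform call interface, several low-level calls composed into one, and a cache for remote invocations) might look roughly like the sketch below. This is an illustration under stated assumptions, not the dissertation's implementation: the class `Agent`, its methods, and the two stand-in services are hypothetical, remote grid services are replaced by local functions, and the cache assumes that services are deterministic and free of side effects.

```python
from typing import Callable, Dict, Hashable, List, Tuple

class Agent:
    """Hypothetical middleware: uniform interface, workflows, result cache."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Hashable], Hashable]] = {}
        self._workflows: Dict[str, List[str]] = {}
        self._cache: Dict[Tuple[str, Hashable], Hashable] = {}
        self.remote_calls = 0  # counts invocations that reach a service

    def register(self, name: str, service: Callable[[Hashable], Hashable]) -> None:
        """Wrap a low-level service behind the Agent's uniform interface."""
        self._services[name] = service

    def define_workflow(self, name: str, steps: List[str]) -> None:
        """Compose several registered services into one callable service."""
        self._workflows[name] = list(steps)

    def call(self, name: str, payload: Hashable) -> Hashable:
        """Invoke a service or workflow, reusing a cached result if present."""
        key = (name, payload)
        if key in self._cache:
            return self._cache[key]
        if name in self._workflows:
            result = payload
            for step in self._workflows[name]:
                result = self.call(step, result)
        else:
            self.remote_calls += 1
            result = self._services[name](payload)
        self._cache[key] = result
        return result

# Stand-ins for remote grid services (assumed for illustration only).
def tokenize(text):
    return tuple(text.split())

def remove_stopwords(tokens):
    return tuple(t for t in tokens if t not in {"the", "is"})

agent = Agent()
agent.register("tokenize", tokenize)
agent.register("remove_stopwords", remove_stopwords)
agent.define_workflow("preprocess", ["tokenize", "remove_stopwords"])
tokens = agent.call("preprocess", "the grid is fast")
```

A caller issues one `preprocess` call instead of two low-level calls, and a repeated call with the same input is answered from the cache without reaching the underlying services.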
Keywords/Search Tags:PSE-TC, Grid, Web Services, Java 3D, Privacy Preserving, Text Categorization