The Research On Internet Traffic Identification Methods With Scale Adaptability

Posted on: 2009-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z X Chen
GTID: 1118360245494533
Subject: Computer software and theory
Abstract/Summary:
Internet traffic identification is one of the most active research topics in Internet traffic measurement. Identifying and managing Internet traffic, which is now dominated by P2P (Peer-to-Peer) traffic, has become a widely shared concern in academia, network engineering, and key government departments. Classifying Internet traffic efficiently and accurately is important in practice for analyzing network development, providing quality of service, dynamic access control, lawful management, and anomaly detection.

The popular identification methods based on port numbers and application payload signatures cannot cope with anti-monitoring trends such as port disguise, randomly configured ports, and encrypted application data. Methods based on dynamic flow and activity characteristics have recently attracted much research, but identification accuracy, real-time identification, self-learning, and the discovery of new applications all remain open challenges.

Facing the challenge of identifying Internet traffic in different networks and at different macroscopic levels and granularities, this thesis aims to research and develop effective methods and organizational schemes for identifying content transmitted over the Internet and analyzing behavior at different levels or locations. A new framework is also developed that combines intelligent features, online real-time processing, hybrid classification, and distributed processing at different scales.

This thesis studies Internet traffic identification and behavior analysis methods for a single point, for a finite-scale network, and for the Internet scale. Focusing on the key problems identified, the main contributions and innovations of this thesis are as follows:

(1) A novel semi-supervised learning model based on Data Gravitation and Further Division Recognition Space theory (DGFDRS-SSL) was proposed.
Based on this model, an Internet traffic identification method using the statistical features of flow information was investigated. In this model, the data points in the sample space are treated as mass points, and the Euclidean distance between two points is defined as the sample distance. By analogy with the law of gravity, a Data Gravitation (DG) theory is defined in the data sample space and applied to sample clustering. In addition, a novel Further Division Recognition Space (FDRS) theory was proposed, in which the class recognition space is divided at different dimensionalities and levels of fineness. The divided subspaces are assigned different colors to distinguish them from one another, producing a recognition space that supports further division. After clustering, the clustering result is mapped onto the divided recognition space, and the semi-supervised learning model is obtained by coloring the recognition space according to the class-labeled mapping result and the color rule. The model was applied to single-point Internet traffic identification based on the statistical features of flow information. It overcomes the need of traditional supervised machine learning methods for many class-labeled training samples, and it also showed high performance and a good ability to detect new applications.

(2) An Internet application community model was defined, and on this basis methods for Internet application detection and Internet traffic identification at finite network scale were proposed. Based on application features, host connection graphs for the behaviors of different applications were defined. Combining the concept of a social community with the aggregation, sharing, and connection characteristics among hosts in a finite-scale network, a method for generating and detecting Internet application communities based on behavior feature graphs was investigated.
Host behavior information in different directions was collected at different levels and locations and then applied to finite-scale behavior analysis. The macroscopic community features were used to assist microscopic-level Internet traffic identification and the detection of port disguise and cross-protocol transmission behavior. This model is suitable for analyzing the behavior of distributed, cooperative applications at finite scale.

(3) A self-organizing Internet traffic identification federation model at Internet scale was proposed. Taking advantage of the properties of DHT (Distributed Hash Table), virtual storage and same-prefix hashing techniques were investigated to store DHT index resources at their place of origin. The Chord overlay routing algorithm was improved to design a DHT-based, self-organizing federation model for Internet traffic identification and behavior analysis that has no management center. A node with traffic identification and behavior analysis capabilities can join the federation by following the designed protocol. It can then cooperate with other members on Internet traffic classification and behavior analysis, and share traffic features, data samples, and identification experience with them. This overcomes a defect of traditional methods, which support cooperation only between designated research organizations using fixed equipment and fixed protocols, and therefore offer poor extensibility and openness for further development.

(4) An innovative method for collecting real, class-labeled traffic samples was proposed and designed. Combining it with a Network Processor platform, an Internet traffic sample collection system was designed. A client with static filtering and monitoring mechanisms was designed and used to monitor the running status of local network programs through hooks. The application is identified from the network program or system process that generates the traffic.
The corresponding class tag is embedded in the TOS bits of each generated packet to mark the class of the traffic, and the client can also verify the traffic class. The TOS-tagged packets are captured by the Network Processor located at the network gateway, which offers high-performance hardware matching. After preprocessing, the collected samples can be published for use. This is a practical and effective approach.

(5) A novel Internet traffic identification method based on online machine learning was proposed. Combined with a Network Processor, it was designed to process traffic at line speeds of up to 1000 Mbps. The temporal correlation of Internet traffic was investigated. Information from earlier traffic between two hosts is collected to identify that traffic, and the identification result then guides the identification and analysis of subsequent traffic. The guidance is adjusted and corrected when the features of the collected samples change. Exploiting the high-speed parallel processing of the IXP-2400 Network Processor and the intelligent features of soft computing theory, a hybrid hardware and software Internet traffic identification platform based on a Network Processor and a server was designed, with a processing capacity of up to 1000 Mbps line speed. This research gives machine-learning-based Internet traffic identification a degree of online, real-time identification capability.
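
The TOS-bit tagging step in contribution (4) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names, the tag values in `CLASS_TAGS`, and the choice to place the tag in the upper six bits of the TOS byte (leaving the low two bits at zero) are all assumptions made for illustration, since the abstract does not specify the encoding.

```python
from typing import Optional

# Hypothetical class-tag table. The abstract does not list the actual
# traffic classes or tag values, so these are illustrative only.
CLASS_TAGS = {"web": 1, "p2p": 2, "mail": 3, "streaming": 4}
TAG_NAMES = {tag: name for name, tag in CLASS_TAGS.items()}


def encode_tos(class_name: str) -> int:
    """Return a TOS byte value carrying the tag for class_name.

    Assumption: the tag occupies the upper six bits of the 8-bit TOS
    field, and the low two bits are left at zero.
    """
    tag = CLASS_TAGS[class_name]
    if not 0 < tag < 64:
        raise ValueError("tag must fit in six bits and be non-zero")
    return tag << 2


def decode_tos(tos: int) -> Optional[str]:
    """Recover the class name from a TOS byte, or None if it is untagged."""
    return TAG_NAMES.get((tos & 0xFF) >> 2)
```

On the sending host, a monitoring client could apply such a value to a socket with `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, encode_tos("p2p"))` on platforms that support `IP_TOS`; a capture point at the gateway would then call `decode_tos` on the TOS byte of each packet to obtain the class label.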
Keywords/Search Tags: Internet, Traffic identification, Semi-supervised learning, Community, Network Processor