Network-wide monitoring by measuring the end-to-end performance data is important for many important functions,including network anomaly detection,network troubleshooting,and network service level agreement(SLA)tracking.However,the network measurement usually has a very high measurement cost.For a network consisting of n nodes,the cost of one time network-wide monitoring will be O(n2).The cumulative cost of measurement over time can be catastrophic.Moreover,by actively injecting probes,the measurement itself can impact the network performance.There exists a dilemma:how to reduce the measurement cost and the impact to the network while obtaining network-wide performance monitoring data?Because of the temporal and spatial correlation of network data,it usually has low rank when it is modeled as matrix or tensor,which provides conditions for the use of low rank decomposition technology.Low rank decomposition is widely used in the field of data measurement,because it can complete the missing data through a small number of sampled data.According to the dimension of data,low rank decomposition can be divided into matrix completion and tensor completion.Based on low rank decomposition model,this thesis studies the measurement of endto-end performance data of a large-scale network,the main research results are as follows:(1)To reduce the measurement cost of end-to-end performance data of the whole network,a low cost network monitoring based on block matrix completion is proposed.In this thesis,a block sampling principle satisfying the completion condition is proposed with block as the minimum scheduling unit.Based on the principle of matrix block sampling,a matrix measurement method based on block is proposed.According to the matrix measurement method,a matrix completion method based on block is proposed to recover missing data.Only O(nrln(r))samples are required to ensure that a N*T matrix with rank r can be recovered precisely,where n=max{N,T).This sampling bound is better than the sampling bound of other existing sampling algorithms based on matrix completion.A large number of experiments are carried out on three real network data sets.Under the same inference accuracy,the measurement costs of other sampling algorithms on three data sets are at least 3 times,3 times and 2 times of the proposed algorithm.(2)Aiming at the problem of the weak data mining ability of two-dimensional matrix,a low cost network measurement method based on block tensor completion is proposed.In this thesis,the principle of tensor block sampling satisfying the completion condition is proposed.Based on the principle of tensor block sampling,a tensor measurement method based on block is proposed.After measurement scheduling,each large tensors have one more sample block than the sampling principle.According to the tensor measurement method,a tensor completion method based on block is proposed,which can recover missing data quickly without iteration.A n × n × n tensor with rank R can be recovered precisely with only O(n2R ln(R))measurement data,which significantly reducing the measurement cost.Comparing experiments on 3 data sets show that the measurement costs of other algorithms are at least 4.5 times,2.3 times and 7.5 times of the proposed method under the same inference accuracy.(3)As redundant measurement data will be generated during measurement in work(2),a network measurement method based on block-bar tensor completion is proposed to further reduce the measurement cost.In this thesis,the block-bar sampling principle which satisfy the completion is proposed.According to the block-bar sampling principle,a tensor measurement method based on block-bar is proposed.Measure three bars such that all three bars share the same measurement block.Based on the characteristic information shared by three bars,a tensor completion method is proposed.Through theoretical analysis,the sampling bound based on the block-bar measurement method is only O(nR2 ln2(R)),which is better than the sampling bound of other existing sampling algorithms based on tensor completion.Through comparison experiments on 3 real data sets,the measurement cost of the proposed method is lower than that of any other similar algorithm,and the sampling rate is only 4.8%,0.2%and 0.8%on three data sets.(4)To locate the large entry in the network quickly with partial observed data,a low cost and efficient method to find large value elements in network based on discrete tensor completion is proposed.Three novel techniques are further proposed to speed up the inference process:a discrete problem solving method,fast tensor recovery method based on bit operation,acceleration algorithm based on binary code partition,and online discrete tensor completion method.The convergence of the discrete problem solving method is proved theoretically,and a large number of experiments show that the proposed algorithm can quickly locate large entry with lower time and space cost.(5)To efficiently locate and obtain the real value of the large entry in network data,we propose a low cost and efficient method to fast retrieval of large entries with incomplete measurement data based on LSH.Then we transform the tensor recovery problem to a cosine similarity searching problem,and propose three algorithms:Based on LSH forest representation,Low overhead hash table construction method,and fast similarity query method based on shift technique.We make a breakthrough in theory and give the correlation between the scale of hash table and similarity searching to guide the construction of hash table.Experimental results on four real data sets show that the proposed method is at least 60 times faster than the traditional tensor completion method to recover and retrieve large valued elements. |