| With the rise of telecommuting, online meetings have become a popular topic. Compared with traditional on-demand and live streaming scenarios, meeting scenarios must guarantee lower latency, and their media transmission usually uses the Real-time Transport Protocol (RTP), so application-layer congestion control is a necessary tool for coping with network conditions. Among the traditional algorithms based on heuristic rules, Google Congestion Control (GCC) is a key algorithm in web real-time communication and is widely used in browsers. Our experimental analysis of GCC reveals poor performance on the traffic model of conference scenarios; the root cause is that its internal delay threshold cannot adapt dynamically to different network scenarios. Against this background, this paper designs a quality evaluation network and integrates it into the delay-based bandwidth estimator of GCC, so that the delay sensitivity threshold adapts better to the traffic model of the conference scenario and network congestion is detected more accurately. The paper also contributes a network dataset for the video conference scenario. Experimental results show that, compared with GCC, the improved algorithm has less bandwidth fluctuation and significantly higher bandwidth utilization under strong network conditions. As network environments grow more complex, the continual learning capability of reinforcement learning can adapt better to changing conditions. In this context, we design and implement an online reinforcement learning model for video conferencing scenarios. First, we verify and analyze the bit rate difference between the network layer and the encoding layer caused by the bursty traffic of video frames, and add this difference to the model state for better learning. An action space based on a varying scale is designed to provide the flexible bit rate adjustment required by the
high real-time nature of video conferencing. A reward factor for the objective rating of video quality is added to the reward function, and a mixed subjective and objective evaluation process is designed. In addition, the fluctuation level of the bandwidth estimate is added as a penalty factor to improve the stability of the system. A robust hybrid congestion control algorithm is designed to address the loss of overall stability caused by the aggressive behavior of the model in the early stage of exploration. A Kalman filter-based safety determiner inside the algorithm checks the correctness of the reinforcement learning model; if the model is judged to be unstable, the algorithm switches to the rule-based algorithm and applies a new penalty factor, which improves the robustness of the system and allows the reinforcement learning model to keep learning. Experimental results show that the overall quality of experience score improves by 24.22% compared with the best results of other congestion control algorithms. |
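
The hybrid switching idea described above can be illustrated with a minimal Python sketch. The abstract does not specify what the Kalman filter tracks or how instability is judged, so everything here is an assumption made for illustration: the filter follows the measured receive rate with a random-walk model, the reinforcement learning output is treated as unsafe when it lies more than `gate` standard deviations from the filtered estimate, and the names `ScalarKalman`, `hybrid_step`, `gate`, and `penalty` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""
    estimate: float             # filtered rate, e.g. in kbps
    variance: float = 1e4       # assumed initial uncertainty of the estimate
    process_var: float = 1e3    # assumed process noise variance
    measure_var: float = 1e4    # assumed measurement noise variance

    def update(self, measurement: float) -> None:
        """Fold one rate measurement into the estimate."""
        prior_var = self.variance + self.process_var
        gain = prior_var / (prior_var + self.measure_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance = (1.0 - gain) * prior_var


def hybrid_step(kf, rl_rate, rule_rate, measured_rate, gate=3.0, penalty=1.0):
    """Pick the rate for one control interval.

    Returns (chosen_rate, extra_penalty). The extra penalty is meant to be
    subtracted from the learner's reward when its output was rejected.
    """
    kf.update(measured_rate)
    # Spread of a new observation around the filtered estimate.
    spread = (kf.variance + kf.measure_var) ** 0.5
    if abs(rl_rate - kf.estimate) > gate * spread:
        # Hypothetical safety rule: fall back to the rule-based estimate.
        return rule_rate, penalty
    return rl_rate, 0.0
```

In this sketch the learner keeps running after a fallback and only receives the extra penalty, which mirrors the stated goal of letting the model continue to learn while the rule-based algorithm protects stability.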