In recent years, the continued growth of new energy vehicles, on top of the existing fleet of gasoline cars, has steadily increased the total number of vehicles, which poses great challenges to traffic and personal safety. The Internet of Vehicles (IoV) emerged to address these safety problems. As IoV technology has matured, cellular vehicle-to-everything (C-V2X) has become the mainstream approach. Vehicle-to-vehicle (V2V) communication is the most important link in the IoV, and V2V mainly relies on device-to-device (D2D) technology for direct communication. The number of V2V links is huge, so assigning them a dedicated frequency band would strain limited spectrum resources. To conserve existing spectrum, this paper considers V2V communication that reuses the uplink of the cellular network. In an IoV environment, cellular users correspond to vehicle-to-infrastructure (V2I) links. Resource allocation is therefore required so that each V2V link selects the subband of a V2I link to reuse and an appropriate transmit power, satisfying the requirements of both V2I and V2V communication. Because vehicles are mobile, centralized resource allocation struggles to guarantee communication requirements. This paper therefore designs distributed resource allocation based on deep reinforcement learning: each vehicle is an agent that observes the environment and selects the subband and transmit power with the maximum reward. All V2I links are assumed to be pre-assigned, and both urban-block and highway V2V scenarios are considered.

In the urban block scenario, the resource allocation problem is first modeled, the vehicles are grouped by driving direction, and the resources are divided into four independent resource pools; resources are then allocated to V2V communication using the Deep
Q-Network (DQN) algorithm. For the experience replay of the DQN algorithm, a weighted replay mechanism is introduced, a four-layer fully connected neural network is designed, and the Adam optimizer is used. After DQN training, each V2V vehicle acts as an independent agent: by interacting with the environment, the agent chooses the action that maximizes the V2I rate while meeting the V2V communication requirements. The results of different algorithms are simulated and analyzed for different total numbers of vehicles and different speeds, with the DQN algorithm compared against the Q-learning algorithm and a random allocation algorithm. The results show that even at the largest total number of vehicles and the highest vehicle speed, the DQN algorithm better guarantees the rates of the V2I and V2V links and reduces the interference caused by V2V links reusing V2I links.

In the expressway scenario, vehicles move fast and the environment changes constantly, so V2V links are switched frequently, which places higher demands on V2V communication. As before, the V2I links are pre-allocated, and the vehicles are grouped by driving direction, with the resources divided into two independent resource pools. Here, however, the V2V action should be treated as continuous, so this paper allocates resources for V2V communication using the Deep Deterministic Policy Gradient (DDPG) algorithm, which follows an actor-critic architecture. The structures of the actor and critic networks are designed, the parameter settings are given, and Gaussian noise is added to the policy network. In the simulation, the convergence of the DDPG and DQN algorithms is compared, as are the results under different vehicle speeds and total numbers of vehicles. As vehicle speed increases, vehicle communication packets are lost. The successful delivery rate of V2V transmissions under different
payloads is compared. The results show that even when the vehicle speed reaches 120 km/h, the DDPG algorithm still maximizes the total rate of the V2I links, satisfies the transmission reliability requirements of the V2V links, and supports a much higher transmission payload than the other algorithms.
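
The weighted experience replay and the joint choice of subband and transmit power described in this abstract can be illustrated with a small sketch. It is not the scheme used in this paper: the abstract does not say how replay weights are computed, so the sketch assumes each transition is weighted by the magnitude of its temporal-difference error plus a small constant, and it assumes the discrete DQN action is a flat index over (subband, power level) pairs. The names `WeightedReplayBuffer`, `decode_action`, `eps`, and `num_power_levels` are hypothetical.

```python
import random
from collections import deque


class WeightedReplayBuffer:
    """Replay buffer that samples transitions in proportion to a weight.

    Assumption: weight = |TD error| + eps. The actual weighting rule
    is not given in the abstract.
    """

    def __init__(self, capacity, eps=1e-3, seed=None):
        # Bounded deques drop the oldest entry once capacity is reached.
        self.transitions = deque(maxlen=capacity)
        self.weights = deque(maxlen=capacity)
        self.eps = eps
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, td_error):
        """Store one transition together with its sampling weight."""
        self.transitions.append((state, action, reward, next_state))
        self.weights.append(abs(td_error) + self.eps)

    def sample(self, batch_size):
        """Draw a batch with replacement, probability proportional to weight."""
        return self.rng.choices(
            list(self.transitions), weights=list(self.weights), k=batch_size
        )

    def __len__(self):
        return len(self.transitions)


def decode_action(index, num_power_levels):
    """Map a flat action index to a (subband, power_level) pair.

    Assumption: actions are enumerated subband-major, so the quotient
    is the subband and the remainder is the power level.
    """
    return divmod(index, num_power_levels)


if __name__ == "__main__":
    buffer = WeightedReplayBuffer(capacity=3, seed=0)
    for step, td_error in enumerate([0.0, 0.0, 5.0, 0.1]):
        buffer.add(step, step, 0.0, step + 1, td_error)
    print(len(buffer), buffer.sample(batch_size=4))
    print(decode_action(7, num_power_levels=3))
```

Sampling with replacement keeps the sketch short. A practical implementation would typically also correct for the sampling bias that non-uniform replay introduces, which is omitted here.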