With the rapid development of the intelligent and connected automobile industry worldwide, the Internet of Vehicles (IoV), as a key communication technology, faces a shortage of spectrum resources. As the carrier of all kinds of wireless communication, limited spectrum resources are greatly wasted if they are not fully utilized. Machine learning, and reinforcement learning in particular, provides an effective approach to IoV spectrum resource allocation in highly dynamic and complex environments. The main work of this paper is as follows:

(1) For the spectrum configuration problem of a single-antenna vehicular communication network in which multiple Cellular Users (CUs) and Device-to-Device (D2D) users coexist, this paper proposes a service-aware spectrum access mechanism with delay constraints for complex dynamic environments. To achieve a dynamic balance between maximizing the total capacity of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) links and minimizing link interference, a priority-based spectrum allocation scheme built on lightweight deep reinforcement learning is proposed. The algorithm is trained with a Deep Q-Network (DQN) over a set of shared bandwidths. Simulation results show that the proposed scheme allocates spectrum resources quickly and effectively, improves the channel transmission rate, differentiates service priorities, and remains robust to communication noise in highly dynamic vehicular network environments.

(2) To further suit lightweight learning on network edge devices, this paper introduces binarization and an LSTM network: binarization further compresses the data, and the LSTM network predicts it. Together, this additional processing reduces the computational complexity of the network and improves the overall performance of the system. Among the four variants of the proposed scheme evaluated experimentally, the variant that binarizes the DNN output, forecasts the data with a distributed LSTM network, and feeds the prediction into the DQN for reinforcement learning achieves the best performance: it uses spectrum resources rationally and effectively while maintaining good robustness to communication noise.
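The priority-aware channel selection idea in contribution (1) can be sketched in miniature. The thesis trains a DQN; as a simplified stand-in, the sketch below uses tabular Q-learning with optimistic initialization, where the state is the service priority class and the action is the chosen sub-band. All numbers here (channel capacities, the interference penalty that discourages delay-tolerant traffic from occupying the best channel) are hypothetical toy values, not the thesis's system model.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 4      # shared sub-bands (assumed)
N_PRIORITIES = 2    # 0 = delay-tolerant, 1 = delay-sensitive (assumed)
EPISODES = 3000
ALPHA, EPS = 0.1, 0.1

# Hypothetical per-channel mean capacities (bit/s/Hz).
channel_capacity = np.array([1.0, 2.0, 3.0, 4.0])

# Q-table: rows = service priority class, cols = channel choice.
# Optimistic initialization encourages every arm to be tried.
Q = np.full((N_PRIORITIES, N_CHANNELS), 5.0)

def reward(priority, channel):
    """Noisy capacity reward; in this toy model, delay-tolerant
    traffic on the best channel incurs an interference penalty,
    leaving that channel to delay-sensitive services."""
    cap = channel_capacity[channel] + rng.normal(0.0, 0.1)
    if priority == 0 and channel == N_CHANNELS - 1:
        cap -= 2.0
    return cap

for _ in range(EPISODES):
    p = rng.integers(N_PRIORITIES)            # random arriving service
    if rng.random() < EPS:                    # epsilon-greedy exploration
        a = int(rng.integers(N_CHANNELS))
    else:
        a = int(np.argmax(Q[p]))
    # One-step (contextual-bandit style) Q update.
    Q[p, a] += ALPHA * (reward(p, a) - Q[p, a])

# Preferred channel per priority class after learning.
print(np.argmax(Q, axis=1))
```

Under these toy rewards, delay-tolerant traffic settles on the second-best channel and delay-sensitive traffic keeps the best one, which is the kind of service differentiation the scheme targets; the actual thesis replaces the table with a DQN so the policy generalizes over a continuous, high-dimensional channel state.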
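The best-performing pipeline in contribution (2) chains binarization, LSTM prediction, and the DQN. A minimal NumPy sketch of the first two stages is given below; the LSTM cell uses random, untrained weights purely to illustrate the data flow, and all dimensions (8 input features, 16 hidden units, 10 time steps) are assumptions rather than the thesis's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def binarize(x):
    """Sign binarization: compress real-valued features to {-1, +1},
    cutting storage and compute on edge devices (toy stand-in for
    the thesis's network binarization)."""
    return np.where(x >= 0, 1.0, -1.0)

class TinyLSTMCell:
    """Minimal NumPy LSTM cell, illustrating the prediction stage."""
    def __init__(self, n_in, n_hidden):
        s = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        i, f, o = sig(i), sig(f), sig(o)     # input/forget/output gates
        c = f * c + i * np.tanh(g)           # cell state update
        h = o * np.tanh(c)                   # hidden state (prediction)
        return h, c

# Pipeline: raw channel observations -> binarize -> LSTM prediction
# -> compact state vector that the DQN would consume.
obs_seq = rng.normal(size=(10, 8))           # 10 time steps, 8 features
cell = TinyLSTMCell(n_in=8, n_hidden=16)
h, c = np.zeros(16), np.zeros(16)
for t in range(10):
    h, c = cell.step(binarize(obs_seq[t]), h, c)

dqn_state = h                                # would feed the DQN input layer
print(dqn_state.shape)
```

The design intent sketched here is that binarization shrinks the per-step observation before it ever reaches the recurrent predictor, and the LSTM's hidden state summarizes the recent channel history, so the downstream DQN sees a compact, temporally smoothed input, which is what makes the combination attractive for resource-constrained edge equipment.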