With the increase of car ownership and the development of intelligent vehicular networking, a growing demand for computation-intensive applications, such as virtual/augmented reality, image processing, and face detection and recognition, is emerging to satisfy the infotainment experience of vehicular users (VUs). These computation-intensive applications generate explosive computation tasks for VUs. Vehicular edge computing (VEC) is envisioned as a promising approach to meet this demand. In a VEC system, VUs or Internet of Things devices (IoTDs) can allocate transmit power to offload computation tasks to the base station (BS) for processing, and the BS then sends the processed results back to the corresponding equipment or devices, thereby alleviating their local computing burden. To improve communication efficiency, a multi-antenna BS is adopted, together with multiple-input multiple-output (MIMO) and non-orthogonal multiple access (NOMA) technology. However, in a MIMO-NOMA VEC system, the energy of VUs and devices is limited and tasks must be processed in time, so it is necessary to study power allocation schemes that reduce both the time cost and the energy consumption of the system. Moreover, the MIMO-NOMA VEC environment is complex, characterized by uncertain channel states, random task arrivals, and high-speed vehicle movement. General optimization methods cannot effectively learn such an environment and are therefore unsuitable for studying power allocation in MIMO-NOMA VEC. Deep reinforcement learning (DRL) inherits the strong representation ability of deep learning and the superior decision-making ability of reinforcement learning, which makes it a highly promising solution. This paper focuses on the MIMO-NOMA VEC system and studies DRL-based power allocation schemes. The research contents of this paper are as follows:

(1) For the VEC scenario in which a single VU communicates with a multi-antenna BS, an offloading scheme is designed based on DRL. This part considers the uncertainty of the multi-antenna channel and stochastic task arrivals. The mobility, communication, and computation models are constructed first. Then the state space, the action space, and a reward function consisting of power and latency terms are defined. After that, the optimal power allocation policy is obtained by the DRL method. Finally, simulations are conducted, and the results show the superiority of the optimal policy.

(2) For the VEC scenario in which multiple VUs communicate with the BS through a MIMO-NOMA channel, a distributed offloading scheme is designed based on DRL. This part considers the channel uncertainty caused by VU mobility and the Doppler effect, the interference among the VUs' channels, and stochastic task arrivals; each VU makes offloading decisions based only on its local observation. The VEC system model, including VU mobility, the network, and computation, is built. Then the DRL framework, consisting of the state, action, and reward function, is defined. Deep deterministic policy gradient (DDPG) is adopted to obtain the optimal policy that maximizes the long-term discounted reward (a minimal DDPG sketch is given after this summary). Finally, experimental results show that the optimal policy outperforms the greedy policy.

(3) In the VEC system, a power allocation scheme is designed for the scenario in which IoTDs sample environmental information and transmit it to the BS through MIMO-NOMA channels. This part adopts the age of information (AoI) to measure the freshness of information. First, a network model is constructed in which successive interference cancellation (SIC) is used to decode the received signals. Then the AoI model and the energy consumption model are built. After that, the optimization problem is formulated, i.e., minimizing AoI and energy consumption by jointly optimizing the sampling decisions and the transmit power. Next, a theorem is derived proving that AoI and energy can be minimized by optimizing the transmit power alone, which makes the problem suitable to solve with DDPG (see the AoI sketch below). Finally, experimental simulations are carried out, and the test results show the effectiveness of the proposed method.
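To make the DDPG-based continuous power control of parts (2) and (3) concrete, the following is a minimal sketch in PyTorch. It is an illustration under stated assumptions, not the thesis's implementation: the observation size STATE_DIM, the power bound P_MAX, and the placeholder transition and reward in the toy loop are all hypothetical stand-ins for the actual mobility, channel, and task models.

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 6   # hypothetical observation size (channel gain, queue backlog, ...)
P_MAX = 1.0     # hypothetical maximum transmit power (watts)

class Actor(nn.Module):
    """Maps a local observation to a transmit power in [0, P_MAX]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, s):
        return P_MAX * self.net(s)

class Critic(nn.Module):
    """Estimates the action value Q(s, a) of a state-power pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + 1, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))

actor, critic = Actor(), Critic()
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = deque(maxlen=100_000)
GAMMA, TAU, BATCH = 0.99, 0.005, 64   # discount, soft-update rate, batch size

def soft_update(tgt, src):
    """Slowly track the online network: tgt <- (1 - TAU) * tgt + TAU * src."""
    for tp, sp in zip(tgt.parameters(), src.parameters()):
        tp.data.mul_(1 - TAU).add_(TAU * sp.data)

def train_step():
    if len(buffer) < BATCH:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(buffer, BATCH)))
    with torch.no_grad():                        # bootstrapped TD target
        y = r + GAMMA * critic_tgt(s2, actor_tgt(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_loss = -critic(s, actor(s)).mean()     # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    soft_update(actor_tgt, actor); soft_update(critic_tgt, critic)

# Toy interaction loop; the transition and reward below are placeholders for
# the thesis's models (the reward penalizes both power and latency cost).
s = torch.zeros(STATE_DIM)
for step in range(1000):
    with torch.no_grad():
        a = (actor(s.unsqueeze(0)).squeeze(0)
             + 0.1 * torch.randn(1)).clamp(0.0, P_MAX)   # exploration noise
    s2 = torch.randn(STATE_DIM)       # placeholder next observation
    r = -(a + torch.rand(1))          # placeholder power + latency cost
    buffer.append((s, a, r, s2))
    s = s2
    train_step()
```

The sigmoid output scaled by P_MAX keeps every action inside the feasible power range, and the slowly-updated target networks stabilize the bootstrapped target; these two properties are what make DDPG a natural fit for continuous power allocation, as opposed to value-based methods that require discretizing the power levels.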
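The AoI dynamics of part (3) can likewise be illustrated with a short simulation. The exponential success-probability model, the sample-every-slot policy, and all constants below are assumptions chosen for illustration; the thesis's SIC-based MIMO-NOMA decoding is not modeled here.

```python
import math
import random

def simulate(power, slots=10_000, noise=0.5, seed=0):
    """Mean AoI and mean energy per slot when the IoTD samples and transmits
    a fresh update in every slot with a fixed transmit power (assumed policy)."""
    rng = random.Random(seed)
    aoi, aoi_sum, energy_sum = 1, 0, 0.0
    for _ in range(slots):
        p_success = 1.0 - math.exp(-power / noise)   # assumed decoding model
        if rng.random() < p_success:
            aoi = 1        # fresh update decoded: age resets to one slot
        else:
            aoi += 1       # decoding failed: the BS's information keeps aging
        aoi_sum += aoi
        energy_sum += power   # per-slot energy grows with transmit power
    return aoi_sum / slots, energy_sum / slots

for p in (0.2, 0.5, 1.0, 2.0):
    mean_aoi, mean_energy = simulate(p)
    print(f"power={p:.1f}  mean AoI={mean_aoi:.2f}  mean energy={mean_energy:.2f}")
```

Sweeping the power exposes the trade-off the DDPG agent must balance: higher transmit power drives the mean AoI toward one slot but raises the energy cost, which is exactly the tension captured by the joint AoI-and-energy objective in part (3).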