Q-learning is a fundamental problem in machine learning, traditionally solved by Bellman value iteration. Recently, a new variational approach to Q-learning has attracted growing attention from the community; it casts optimal Q-functions as solutions of a constrained optimization problem. Despite the reported empirical gains, a number of different variational formulations have been used, which differ in the optimization objective, the optimization constraints, and the optimization direction. This paper provides a systematic investigation of the implications of these formulation choices for the tractability and optimality of the resulting variational solutions. We conduct a generalization study of variational methods in which parameters that were previously fixed by hand are treated as free. General conditions for strong duality, convexity, and optimality are identified under the different optimization directions, and we exhibit formulations that satisfy strong duality and optimality simultaneously. We also present variational Q-learning algorithms that exploit the strong duality property. Together, these results offer guidance on how to properly set up the variational formulation in this new approach to Q-learning.
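As one concrete illustration of what such a variational formulation can look like (this specific instance is an assumption of the sketch, not necessarily the formulation studied in the paper), the classical Bellman-inequality characterization casts the optimal Q-function as the solution of a constrained minimization:

```latex
% Minimize a weighted sum of Q-values subject to Bellman inequalities.
% Here \mu(s,a) > 0 is an (assumed) strictly positive weighting distribution.
\begin{align*}
\min_{Q} \quad & \sum_{s,a} \mu(s,a)\, Q(s,a) \\
\text{s.t.} \quad & Q(s,a) \;\ge\; r(s,a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
    \Big[ \max_{a'} Q(s',a') \Big],
  \qquad \forall\, (s,a).
\end{align*}
```

Since the pointwise maximum is a convex function of $Q$, the feasible set is convex, and for any strictly positive $\mu$ the minimizer coincides with the optimal Q-function $Q^*$; reversing the optimization direction or relaxing the constraints changes these guarantees, which is the kind of trade-off the formulation choices above concern.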