Actor Versus Critic
Project Link: DQN Agent
Project Writeup: Actor Versus Critic
This project compares the performance of value function approximation and policy optimization methods for solving reinforcement learning problems. The Deep Q-Network (DQN) architecture was chosen as a representative value function approximation method, and Proximal Policy Optimization (PPO) as a policy optimization method. I implemented the DQN architecture and used OpenAI's baseline implementations of DQN and PPO to evaluate the algorithms on Atari 2600 Pong from OpenAI Gym. The vanilla DQN architecture was not able to learn a successful policy, so an implementation of DQN with prioritized experience replay was used for the comparison with PPO. In the experiments, the DQN player learned a successful policy much more quickly than the PPO player, and with less variance, so DQN appears to be the better algorithm for the game of Pong.
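To illustrate the prioritized experience replay idea mentioned above, here is a minimal sketch of a proportional prioritized replay buffer. This is not the OpenAI baselines implementation (which uses a sum-tree for efficient sampling); the class name, list-based storage, and `alpha` default are illustrative assumptions. The core idea is the same: transitions are sampled with probability proportional to the magnitude of their TD error, so surprising transitions are replayed more often.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay sketch (list-based, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities bias sampling (0 = uniform)
        self.buffer = []
        self.priorities = []
        self.pos = 0

    def add(self, transition, priority=1.0):
        # New transitions get the current max priority so each is sampled at least once.
        p = max(self.priorities, default=priority)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(p)
        else:
            # Overwrite the oldest transition once the buffer is full.
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority**alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        weights = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # After a learning step, priority becomes |TD error| plus a small
        # constant so no transition's probability collapses to zero.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps


# Tiny usage demo with placeholder transitions.
buf = PrioritizedReplayBuffer(capacity=100)
for t in range(10):
    buf.add(("obs", t))
idxs, batch = buf.sample(4)
buf.update_priorities(idxs, [0.5] * len(idxs))
```

A production implementation would also apply importance-sampling weights to correct the bias that prioritized sampling introduces into the Q-learning update; that correction is omitted here for brevity.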