Actor Versus Critic

Figure: a frame of Pong played by the reinforcement learning agent.

:link: Project Link: DQN Agent

:book: Project Writeup: Actor Versus Critic

This project compares the performance of value-function approximation and policy-optimization methods for solving reinforcement learning problems. The Deep Q-Network (DQN) architecture was chosen as the representative value-function approximation method, and Proximal Policy Optimization (PPO) as the policy-optimization method. I implemented the DQN architecture myself and used OpenAI's Baselines implementations of DQN and PPO to evaluate the algorithms on Atari 2600 Pong from OpenAI Gym. The vanilla DQN architecture was unable to learn a successful policy, so a DQN implementation with prioritized experience replay was used for the comparison with PPO. In the experiments, the DQN player learned a successful policy much more quickly than the PPO player, and with less variance, so DQN seems to be the better-suited algorithm for Pong.
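For context, a minimal sketch of how such a DQN run might look with the Baselines API is below. The environment name, training budget, and save path are illustrative assumptions, not the exact configuration used in the writeup.

```python
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

# DeepMind-style Atari preprocessing: frame max-and-skip, 84x84 grayscale
# observations, and a stack of the 4 most recent frames.
env = wrap_deepmind(make_atari('PongNoFrameskip-v4'), frame_stack=True)

# Train a convolutional Q-network with prioritized experience replay,
# the DQN variant that learned a successful policy in the experiments.
act = deepq.learn(
    env,
    network='cnn',
    total_timesteps=int(1e6),   # illustrative budget, not the writeup's value
    prioritized_replay=True,
)
act.save('pong_dqn.pkl')        # hypothetical save path
env.close()
```

The PPO side of the comparison can be launched through the Baselines runner in the same spirit, e.g. `python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4`.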

Osmany Corteguera

Software Engineer
