https://drive.google.com/file/d/0B_wzP_JlVFcKS2dDWUZqTTZGalU/view
[1312.5602] Playing Atari with Deep Reinforcement Learning (and its Nature version)
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/prioritized-replay.pdf
[1511.06581] Dueling Network Architectures for Deep Reinforcement Learning
http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/thesis.pdf
Policy Gradient Methods for Reinforcement Learning with Function Approximation: https://webdocs.cs.ualberta.ca/~sutton/papers/SMSM-NIPS99.pdf

1. A policy-based approach can be preferable to a value-based one: a parameterized policy changes smoothly with its parameters, whereas a policy obtained by acting greedily on a value function is discontinuous, because an arbitrarily small change in the estimated values can flip which action is selected.
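
The contrast in point 1 can be illustrated with a small sketch. It is not taken from any of the papers listed; the function names (`greedy_policy`, `softmax_policy`, `max_change`) and the two-action setup are assumptions made purely for illustration. It perturbs the action values by a tiny amount and compares how much a greedy policy and a softmax policy change in response.

```python
import math

def greedy_policy(q_values):
    """One-hot action distribution that puts all mass on the highest value."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]

def softmax_policy(preferences, temperature=1.0):
    """Smooth action distribution proportional to exp(preference / temperature)."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp((p - m) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def max_change(p, q):
    """Largest absolute change in any action probability."""
    return max(abs(a - b) for a, b in zip(p, q))

# Two sets of estimates that differ by a tiny amount (illustrative numbers).
eps = 1e-6
before = [1.0, 1.0 + eps]  # action 1 is marginally better
after = [1.0 + eps, 1.0]   # action 0 is marginally better

# The greedy policy flips entirely from one action to the other.
greedy_jump = max_change(greedy_policy(before), greedy_policy(after))

# The softmax policy moves by an amount on the order of eps.
softmax_shift = max_change(softmax_policy(before), softmax_policy(after))

print("greedy change: ", greedy_jump)
print("softmax change:", softmax_shift)
```

Under these assumptions the greedy policy's action probabilities change by the maximum possible amount, while the softmax policy's probabilities change by roughly the size of the perturbation. This is the sense in which picking a policy from a value function is not continuous.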