DQN (Deep Q-Network) Basics
1 DQN’s architecture
【input】
The input to the neural network is an 84*84*4 image produced by the preprocessing map (the four most recent preprocessed 84*84 frames stacked together).
【hidden layer】
The first hidden layer convolves 32 filters of 8*8 with stride 4 over the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 filters of 4*4 with stride 2, again followed by a rectifier nonlinearity. This is followed by a third convolutional layer that convolves 64 filters of 3*3 with stride 1, followed by a rectifier. The final hidden layer is fully connected and consists of 512 rectifier units.
【output】
The output layer is a fully connected linear layer with a single output for each valid action. The number of valid actions varies between 4 and 18 across the Atari games considered in the paper.
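The same architecture can be written down directly. Below is a minimal PyTorch sketch (PyTorch, the class name, and the variable names are my own illustrative choices, not from the original paper); it reproduces the layer sizes listed above, and the comments show the resulting feature-map shapes.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Q-network with the layer sizes listed above (84x84x4 input, conv-conv-conv-fc-fc)."""
    def __init__(self, n_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 4x84x84 -> 32x20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # -> 64x9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # -> 64x7x7
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),                 # 64*7*7 = 3136 features
            nn.Linear(64 * 7 * 7, 512),   # final hidden layer: 512 rectifier units
            nn.ReLU(),
            nn.Linear(512, n_actions),    # one linear output (Q-value) per valid action
        )

    def forward(self, x):
        return self.head(self.features(x))

# One batch of 32 stacked-frame inputs -> Q-values for, e.g., 18 actions.
q_values = DQN(n_actions=18)(torch.zeros(32, 4, 84, 84))
print(q_values.shape)  # torch.Size([32, 18])
```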
【loss function】
The loss function (objective function) of DQN at iteration i is

$$ L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\!\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)^2\right] $$

where the expectation is over minibatches of transitions $(s, a, r, s')$ drawn uniformly at random from the replay memory $D$, $\gamma$ is the discount factor determining the agent's horizon, $\theta_i$ are the parameters of the Q-network at iteration i, and $\theta_i^-$ are the network parameters used to compute the target at iteration i. The target network parameters $\theta_i^-$ are only updated with the Q-network parameters $\theta_i$ every C steps and are held fixed between individual updates.
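As a sketch of how this loss is computed in practice, assuming the PyTorch network above and hypothetical names q_net / target_net / dqn_loss (gamma=0.99 and the done flag that zeroes the bootstrap term at episode ends are illustrative implementation details, not part of the formula):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Squared TD error from the formula above, on one sampled minibatch.

    q_net      : the online network Q(s, a; theta_i) being trained
    target_net : the frozen copy Q(s, a; theta_i^-) used to compute the target
    batch      : tensors (s, a, r, s_next, done)
    """
    s, a, r, s_next, done = batch
    # Q(s, a; theta_i) for the actions actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y = r + gamma * max_a' Q(s', a'; theta_i^-), computed with the frozen parameters
        max_q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_q_next
    return F.mse_loss(q_sa, target)

# Every C training steps: theta_i^- <- theta_i
# target_net.load_state_dict(q_net.state_dict())
```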
2 Algorithm
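A minimal sketch of the deep Q-learning training loop with experience replay, an epsilon-greedy behaviour policy, and a target network refreshed every C steps. It assumes the DQN and dqn_loss pieces sketched above, a classic Gym-style env whose step returns (obs, reward, done, info), and states already preprocessed into channel-first (4, 84, 84) frame stacks; all names and hyperparameter values are illustrative, not from the original text.

```python
import random
from collections import deque

import numpy as np
import torch

def train(env, q_net, target_net, optimizer, n_actions,
          steps=1_000_000, batch_size=32, epsilon=0.1, C=10_000):
    replay = deque(maxlen=100_000)                     # replay memory D
    target_net.load_state_dict(q_net.state_dict())     # theta^- <- theta
    s = env.reset()                                    # s: (4, 84, 84) frame stack
    for t in range(steps):
        # epsilon-greedy behaviour policy over the current Q estimates
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            with torch.no_grad():
                x = torch.as_tensor(s, dtype=torch.float32).unsqueeze(0)
                a = int(q_net(x).argmax())
        s_next, r, done, _ = env.step(a)
        replay.append((s, a, r, s_next, float(done)))  # store transition in D
        s = env.reset() if done else s_next

        if len(replay) >= batch_size:
            # uniform minibatch from D, then one gradient step on L_i(theta_i)
            s_b, a_b, r_b, s2_b, d_b = zip(*random.sample(replay, batch_size))
            batch = (torch.as_tensor(np.array(s_b), dtype=torch.float32),
                     torch.as_tensor(a_b),
                     torch.as_tensor(r_b, dtype=torch.float32),
                     torch.as_tensor(np.array(s2_b), dtype=torch.float32),
                     torch.as_tensor(d_b, dtype=torch.float32))
            loss = dqn_loss(q_net, target_net, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % C == 0:                                 # refresh target network every C steps
            target_net.load_state_dict(q_net.state_dict())
```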
3 Conclusion
DQN uses a DNN to approximate the action-value function Q(s, a); the policy, pi, which maps each state to an action, is then obtained by acting greedily with respect to the learned Q-values.
3.2 Why use a DNN
To handle the high-dimensional input: the raw-pixel state space is far too large to represent with a table, so a deep network is used as a function approximator.