[Reinforcement Learning] DQN (Deep Q-Network) Basics



1 DQN architecture

 

input

The input to the neural network is an 84*84*4 image (four stacked 84*84 preprocessed frames) produced by the preprocessing map φ.

hidden layer

The first hidden layer convolves 32 filters of 8*8 with stride 4 over the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 filters of 4*4 with stride 2, again followed by a rectifier nonlinearity. This is followed by a third convolutional layer that convolves 64 filters of 3*3 with stride 1, followed by a rectifier. The final hidden layer is fully connected and consists of 512 rectifier units.

output

The output layer is a fully-connected linear layer with a single output for each valid action. The number of valid actions varies between 4 and 18 across the Atari games considered in the paper.
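
This architecture is easy to write down directly. Below is a minimal PyTorch sketch of it (PyTorch, and the names DQNNetwork and n_actions, are my choices for illustration, not from the paper); the spatial sizes in the comments follow from the filter sizes and strides above.

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Nature-DQN convnet: 84*84*4 input -> one Q-value per valid action."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
            nn.ReLU(),                                   # rectifier nonlinearity
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # 9x9 -> 7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),                  # fully-connected, 512 rectifier units
            nn.ReLU(),
            nn.Linear(512, n_actions),                   # linear output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

q_net = DQNNetwork(n_actions=18)    # 4 to 18 valid actions, depending on the game
frames = torch.zeros(1, 4, 84, 84)  # a batch of one stacked, preprocessed observation
print(q_net(frames).shape)          # torch.Size([1, 18])
```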

 

loss function

The loss function (objective function) of DQN at iteration $i$ is

$$L_i(\theta_i)=\mathbb{E}_{(s,a,r,s')\sim U(D)}\left[\left(r+\gamma\max_{a'}Q(s',a';\theta_i^-)-Q(s,a;\theta_i)\right)^{2}\right]$$

in which $(s, a, r, s')$ is a transition sampled uniformly at random from the replay memory $D$, $\gamma$ is the discount factor determining the agent's horizon, $\theta_i$ are the parameters of the Q-network at iteration $i$, and $\theta_i^-$ are the network parameters used to compute the target at iteration $i$. The target network parameters $\theta_i^-$ are only updated with the Q-network parameters $\theta_i$ every C steps and are held fixed between individual updates.
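
As a concrete reading of this formula, here is a hedged PyTorch sketch of one loss evaluation (the batch layout and the name dqn_loss are my assumptions; the paper additionally clips the error term, which is omitted here):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Squared TD error on a minibatch sampled uniformly from the replay memory D."""
    s, a, r, s_next, done = batch                         # tensors; a is int64, done is 0/1 float
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; theta_i)
    with torch.no_grad():                                 # target uses frozen theta_i^-
        q_next = target_net(s_next).max(dim=1).values     # max_a' Q(s', a'; theta_i^-)
        target = r + gamma * (1.0 - done) * q_next        # no bootstrap at terminal states
    return F.mse_loss(q_sa, target)

# Every C steps, copy theta into theta^- and hold it fixed in between:
# if step % C == 0: target_net.load_state_dict(q_net.state_dict())
```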

 

 

2 Algorithm

The training algorithm is deep Q-learning with experience replay (Algorithm 1 in the Nature paper): the agent's transitions (s, a, r, s') are stored in the replay memory D, minibatches are sampled uniformly from D to perform gradient steps on the loss above, and the target network parameters $\theta^-$ are refreshed from $\theta$ every C steps.
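
A minimal sketch of that training loop, reusing DQNNetwork and dqn_loss from above. The gym-style env (old 4-tuple step API), the ReplayBuffer with push/sample methods, and the fixed exploration rate eps are all simplifying assumptions (the paper anneals epsilon and uses further tricks such as frame skipping):

```python
import copy
import random
import torch

def train_dqn(env, q_net, buffer, num_steps, batch_size=32,
              C=10_000, gamma=0.99, eps=0.1, lr=2.5e-4):
    """Deep Q-learning with experience replay (sketch of Algorithm 1)."""
    target_net = copy.deepcopy(q_net)  # theta^- starts equal to theta
    optimizer = torch.optim.RMSprop(q_net.parameters(), lr=lr)
    s = env.reset()
    for step in range(num_steps):
        # epsilon-greedy action selection from the current Q-network
        if random.random() < eps:
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                a = q_net(torch.as_tensor(s, dtype=torch.float32)
                          .unsqueeze(0)).argmax(dim=1).item()
        s_next, r, done, _ = env.step(a)
        buffer.push(s, a, r, s_next, done)  # store transition in D
        s = env.reset() if done else s_next
        if len(buffer) >= batch_size:       # uniform minibatch update
            loss = dqn_loss(q_net, target_net, buffer.sample(batch_size), gamma)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if step % C == 0:                   # refresh the target network
            target_net.load_state_dict(q_net.state_dict())
```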

3 Conclusion

DQN uses a DNN to represent the policy π, a mapping from states to actions, implicitly: the network approximates the action-value function Q(s, a), and the policy is obtained by acting greedily with respect to Q.
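
To make that relationship concrete, a small sketch (greedy_policy is an illustrative name, not from the paper; q_net is the network from section 1):

```python
import torch

def greedy_policy(q_net, s):
    """pi(s) = argmax_a Q(s, a; theta): the learned Q-network induces the policy."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
    return q_values.argmax(dim=1).item()
```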

 

3.1 Why use a DNN

It copes with the high-dimensional input: an 84*84*4 observation has 28,224 pixels, so a tabular Q-function over raw states is infeasible, whereas a DNN approximates Q directly from pixels and generalizes across similar states.

 

