{"id":2499,"date":"2019-01-05T14:12:21","date_gmt":"2019-01-05T14:12:21","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?page_id=2499"},"modified":"2019-01-07T15:27:10","modified_gmt":"2019-01-07T15:27:10","slug":"reinforcement-learning-dqn","status":"publish","type":"page","link":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","title":{"rendered":"Reinforcement Learning DQN"},"content":{"rendered":"<p>This is the python source code of RL_brainDQN.py for post <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2019\/01\/02\/reinforcement-learning-python-dqn-application-resource-allocation\/\" target=\"_blank\">Reinforcement Learning Python DQN Application for Resource Allocation<\/a> <\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n&quot;&quot;&quot;\r\nThis part of code is the DQN brain, which is a brain of the agent.\r\nAll decisions are made in here.\r\nUsing Tensorflow to build the neural network.\r\n(MIT license) \r\n&quot;&quot;&quot;\r\n\r\nimport numpy as np\r\nimport tensorflow as tf\r\n\r\nnp.random.seed(1)\r\ntf.set_random_seed(1)\r\n\r\n\r\n# Deep Q Network off-policy\r\nclass DeepQNetwork:\r\n    def __init__(\r\n            self,\r\n            n_actions,\r\n            n_features,\r\n            learning_rate=0.01,              \r\n            reward_decay=0.9,\r\n            e_greedy=0.9,                     \r\n            replace_target_iter=300,          \r\n            memory_size=500,                  \r\n            batch_size=32,                    \r\n            e_greedy_increment=None,\r\n            output_graph=True,     \r\n    ):\r\n        self.n_actions = n_actions\r\n        self.n_features = n_features\r\n        self.lr = learning_rate\r\n        self.gamma = reward_decay\r\n        self.epsilon_max = e_greedy\r\n        self.replace_target_iter = replace_target_iter\r\n        self.memory_size = memory_size\r\n        self.batch_size = batch_size\r\n        
self.epsilon_increment = e_greedy_increment\r\n        self.epsilon = 0 if e_greedy_increment is not None else self.epsilon_max\r\n\r\n        # total learning steps\r\n        self.learn_step_counter = 0\r\n\r\n        # initialize zero memory [s, a, r, s_]\r\n        self.memory = np.zeros((self.memory_size, n_features * 2 + 2))\r\n\r\n        # the model consists of [target_net, evaluate_net]\r\n        self._build_net()\r\n        t_params = tf.get_collection('target_net_params')\r\n        e_params = tf.get_collection('eval_net_params')\r\n        self.replace_target_op = [tf.assign(t, e) for t, e in zip(t_params, e_params)]\r\n\r\n        self.sess = tf.Session()\r\n\r\n        if output_graph:\r\n            # $ tensorboard --logdir=logs\r\n            # tf.train.SummaryWriter will soon be deprecated; use the following\r\n            tf.summary.FileWriter(&quot;logs\/&quot;, self.sess.graph)\r\n\r\n        self.sess.run(tf.global_variables_initializer())\r\n        self.cost_his = []\r\n\r\n    def _build_net(self):\r\n        # ------------------ build evaluate_net ------------------\r\n        self.s = tf.placeholder(tf.float32, [None, self.n_features], name='s')  # input\r\n        self.q_target = tf.placeholder(tf.float32, [None, self.n_actions], name='Q_target')  # for calculating loss\r\n        with tf.variable_scope('eval_net'):\r\n            # c_names (collection names) are the collections used to store variables\r\n            c_names, n_l1, w_initializer, b_initializer = \\\r\n                ['eval_net_params', tf.GraphKeys.GLOBAL_VARIABLES], 10, \\\r\n                tf.random_normal_initializer(0., 0.3), tf.constant_initializer(0.1)  # config of layers\r\n\r\n            # first layer. 
collections are used later when assigning to the target net\r\n            with tf.variable_scope('l1'):\r\n                w1 = tf.get_variable('w1', [self.n_features, n_l1], initializer=w_initializer, collections=c_names)\r\n                b1 = tf.get_variable('b1', [1, n_l1], initializer=b_initializer, collections=c_names)\r\n                l1 = tf.nn.relu(tf.matmul(self.s, w1) + b1)\r\n\r\n            # second layer. collections are used later when assigning to the target net\r\n            with tf.variable_scope('l2'):\r\n                w2 = tf.get_variable('w2', [n_l1, self.n_actions], initializer=w_initializer, collections=c_names)\r\n                b2 = tf.get_variable('b2', [1, self.n_actions], initializer=b_initializer, collections=c_names)\r\n                self.q_eval = tf.matmul(l1, w2) + b2\r\n\r\n        with tf.variable_scope('loss'):\r\n            self.loss = tf.reduce_mean(tf.squared_difference(self.q_target, self.q_eval))\r\n        with tf.variable_scope('train'):\r\n            self._train_op = tf.train.RMSPropOptimizer(self.lr).minimize(self.loss)\r\n\r\n        # ------------------ build target_net ------------------\r\n        self.s_ = tf.placeholder(tf.float32, [None, self.n_features], name='s_')    # input\r\n        with tf.variable_scope('target_net'):\r\n            # c_names (collection names) are the collections used to store variables\r\n            c_names = ['target_net_params', tf.GraphKeys.GLOBAL_VARIABLES]\r\n\r\n            # first layer. collections are used later when assigning to the target net\r\n            with tf.variable_scope('l1'):\r\n                w1 = tf.get_variable('w1', [self.n_features, n_l1], initializer=w_initializer, collections=c_names)\r\n                b1 = tf.get_variable('b1', [1, n_l1], initializer=b_initializer, collections=c_names)\r\n                l1 = tf.nn.relu(tf.matmul(self.s_, w1) + b1)\r\n\r\n            # second layer. 
collections are used later when assigning to the target net\r\n            with tf.variable_scope('l2'):\r\n                w2 = tf.get_variable('w2', [n_l1, self.n_actions], initializer=w_initializer, collections=c_names)\r\n                b2 = tf.get_variable('b2', [1, self.n_actions], initializer=b_initializer, collections=c_names)\r\n                self.q_next = tf.matmul(l1, w2) + b2\r\n\r\n    def store_transition(self, s, a, r, s_):\r\n        if not hasattr(self, 'memory_counter'):\r\n            self.memory_counter = 0\r\n\r\n        transition = np.hstack((s, [a, r], s_))\r\n\r\n        # replace the old memory with new memory\r\n        index = self.memory_counter % self.memory_size\r\n        self.memory[index, :] = transition\r\n        self.memory_counter += 1\r\n\r\n    def choose_action(self, observation):\r\n        # add a batch dimension before feeding into the tf placeholder\r\n        observation = np.array(observation)\r\n        observation = observation[np.newaxis, :]\r\n\r\n        if np.random.uniform() &lt; self.epsilon:\r\n            # feed the observation forward and get the Q value for every action\r\n            actions_value = self.sess.run(self.q_eval, feed_dict={self.s: observation})\r\n            action = np.argmax(actions_value)\r\n        else:\r\n            action = np.random.randint(0, self.n_actions)\r\n        return action\r\n\r\n    def learn(self):\r\n        # check whether to replace target-net parameters\r\n        if self.learn_step_counter % self.replace_target_iter == 0:\r\n            self.sess.run(self.replace_target_op)\r\n            print('\\ntarget_params_replaced\\n')\r\n\r\n        # sample a batch from memory\r\n        if self.memory_counter &gt; self.memory_size:\r\n            sample_index = np.random.choice(self.memory_size, size=self.batch_size)\r\n        else:\r\n            sample_index = np.random.choice(self.memory_counter, 
size=self.batch_size)\r\n        batch_memory = self.memory[sample_index, :]\r\n\r\n        q_next, q_eval = self.sess.run(\r\n            [self.q_next, self.q_eval],\r\n            feed_dict={\r\n                self.s_: batch_memory[:, -self.n_features:],  # next state (target net, fixed params)\r\n                self.s: batch_memory[:, :self.n_features],  # current state (eval net, newest params)\r\n            })\r\n\r\n        # change q_target w.r.t q_eval's action\r\n        q_target = q_eval.copy()\r\n\r\n        batch_index = np.arange(self.batch_size, dtype=np.int32)\r\n        eval_act_index = batch_memory[:, self.n_features].astype(int)\r\n        reward = batch_memory[:, self.n_features + 1]\r\n\r\n        q_target[batch_index, eval_act_index] = reward + self.gamma * np.max(q_next, axis=1)\r\n\r\n        # train the eval network\r\n        _, self.cost = self.sess.run([self._train_op, self.loss],\r\n                                     feed_dict={self.s: batch_memory[:, :self.n_features],\r\n                                                self.q_target: q_target})\r\n        self.cost_his.append(self.cost)\r\n\r\n        # increase epsilon\r\n        self.epsilon = self.epsilon + self.epsilon_increment if self.epsilon &lt; self.epsilon_max else self.epsilon_max\r\n        self.learn_step_counter += 1\r\n\r\n    def plot_cost(self):\r\n        import matplotlib.pyplot as plt\r\n        plt.figure(2)\r\n        plt.plot(np.arange(len(self.cost_his)), self.cost_his)\r\n        plt.ylabel('Cost')\r\n        plt.xlabel('training steps')\r\n        plt.show()\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This is the Python source code of RL_brainDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false},"yoast_head":"<!-- This site is optimized with the 
Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reinforcement Learning DQN - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reinforcement Learning DQN - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"This is the python source code of RL_brainDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation\" \/>\n<meta property=\"og:url\" content=\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta property=\"article:modified_time\" content=\"2019-01-07T15:27:10+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/\",\"name\":\"Reinforcement Learning DQN - Machine Learning Applications\",\"isPartOf\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2019-01-05T14:12:21+00:00\",\"dateModified\":\"2019-01-07T15:27:10+00:00\",\"breadcrumb\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reinforcement Learning DQN\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reinforcement Learning DQN - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","og_locale":"en_US","og_type":"article","og_title":"Reinforcement Learning DQN - Machine Learning Applications","og_description":"This is the python source code of RL_brainDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","og_url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","og_site_name":"Machine Learning Applications","article_modified_time":"2019-01-07T15:27:10+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","name":"Reinforcement Learning DQN - Machine Learning Applications","isPartOf":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2019-01-05T14:12:21+00:00","dateModified":"2019-01-07T15:27:10+00:00","breadcrumb":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Reinforcement Learning 
DQN"}]},{"@type":"WebSite","@id":"http:\/\/intelligentonlinetools.com\/blog\/#website","url":"http:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"}]}},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P7h1IJ-Ej","jetpack-related-posts":[{"id":2501,"url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn-run-planning\/","url_meta":{"origin":2499,"position":0},"title":"Reinforcement Learning DQN Run Planning","date":"January 5, 2019","format":false,"excerpt":"This is the python source code of run_planning_RL_DQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2495,"url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn-planning-environment\/","url_meta":{"origin":2499,"position":1},"title":"Reinforcement Learning DQN Planning Environment","date":"January 5, 2019","format":false,"excerpt":"This is the python source code of planning_envDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2366,"url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","url_meta":{"origin":2499,"position":2},"title":"Reinforcement Learning Dyna-Q","date":"November 3, 2018","format":false,"excerpt":"This is the python source code of RL_brain.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","rel":"","context":"Similar 
post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2368,"url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-planning-env\/","url_meta":{"origin":2499,"position":3},"title":"Reinforcement Learning Dyna-Q Planning Environment","date":"November 3, 2018","format":false,"excerpt":"This is the python source code of planning_env.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2364,"url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-run-planning-rl\/","url_meta":{"origin":2499,"position":4},"title":"Reinforcement Learning Dyna-Q Run Planning","date":"November 3, 2018","format":false,"excerpt":"This is the python source code of run_planning_RL.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2284,"url":"http:\/\/intelligentonlinetools.com\/blog\/neural-networks-applications-seismic-prospecting-neural-network-code\/","url_meta":{"origin":2499,"position":5},"title":"Neural Networks Applications in Seismic Prospecting  &#8211; Neural Network Code","date":"September 8, 2018","format":false,"excerpt":"Below you will find neural network code for blog post Artificial Intelligence - Neural Networks Applications in Seismic Prospecting. This post is showing how seismic process can be automated using deep neural network such as simple Multi Layer perceptron. References 1. 
Artificial Intelligence - Neural Networks Applications in Seismic Prospecting","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2499"}],"collection":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=2499"}],"version-history":[{"count":2,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2499\/revisions"}],"predecessor-version":[{"id":2516,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2499\/revisions\/2516"}],"wp:attachment":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=2499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}