{"id":2366,"date":"2018-11-03T17:50:12","date_gmt":"2018-11-03T17:50:12","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?page_id=2366"},"modified":"2018-11-07T01:14:50","modified_gmt":"2018-11-07T01:14:50","slug":"rl-dyna-q","status":"publish","type":"page","link":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","title":{"rendered":"Reinforcement Learning Dyna-Q"},"content":{"rendered":"<p>This is the Python source code of RL_brain.py for the post <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\" target=\"_blank\">Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q<\/a> <\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n&quot;&quot;&quot;\r\nThis part of the code is the Dyna-Q learning brain, which allows the agent to make decisions.\r\nAll decisions and learning processes are made in here.\r\n(MIT license) \r\n&quot;&quot;&quot;\r\n\r\nimport numpy as np\r\nimport pandas as pd\r\nfrom copy import deepcopy\r\n\r\n\r\nclass QLearningTable:\r\n    def __init__(self, actions, learning_rate=0.1, reward_decay=0.9, e_greedy=0.9, agent=&quot;&quot;):\r\n        self.actions = actions  # a list\r\n        self.lr = learning_rate\r\n        self.gamma = reward_decay  # discount factor\r\n        self.epsilon = e_greedy  # probability of taking the greedy action\r\n        self.agent = agent\r\n        self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64)\r\n\r\n    def choose_action(self, observation):\r\n        self.check_state_exist(observation)\r\n        # action selection\r\n        if self.agent == &quot;RANDOM_AGENT&quot;:\r\n            action = np.random.choice(self.actions)\r\n            return action\r\n        if np.random.uniform() &lt; self.epsilon:\r\n            # choose best action\r\n         \r\n            state_action = self.q_table.loc[observation, :]\r\n          \r\n            state_action = state_action.reindex(np.random.permutation(state_action.index))     \r\n          
\r\n            max_value = -np.inf  # start below any Q-value so negative values are handled\r\n            for act in list(self.q_table.columns.values):\r\n                if self.q_table.loc[observation, act] &gt;= max_value:\r\n                    max_action = act\r\n                    max_value = self.q_table.loc[observation, act]\r\n\r\n            action = max_action\r\n\r\n        else:\r\n            # choose random action\r\n            action = np.random.choice(self.actions)\r\n        return action\r\n\r\n    def learn(self, s, a, r, s_, dn):\r\n        self.check_state_exist(s_)\r\n        q_predict = self.q_table.loc[s, a]\r\n\r\n        if s_ != 'terminal' and dn != True:\r\n            q_target = r + self.gamma * self.q_table.loc[s_, :].max()  # next state is not terminal\r\n        else:\r\n            q_target = r  # next state is terminal\r\n\r\n        self.q_table.loc[s, a] += self.lr * (q_target - q_predict)  # update\r\n\r\n    def check_state_exist(self, state):\r\n        if state not in self.q_table.index:\r\n            # append new state to q table as a one-row frame\r\n            # (DataFrame.append was removed in pandas 2.0)\r\n            new_row = pd.Series(\r\n                [0.0] * len(self.actions),\r\n                index=self.q_table.columns,\r\n                name=state,\r\n            )\r\n            self.q_table = pd.concat([self.q_table, new_row.to_frame().T])\r\n\r\n\r\nclass EnvModel:\r\n    &quot;&quot;&quot;Similar to the memory buffer in DQN, you can store past experiences in here.\r\n    Alternatively, the model can generate next state and reward signal accurately.&quot;&quot;&quot;\r\n    def __init__(self, actions):\r\n        self.actions = actions\r\n        self.database = pd.DataFrame(columns=actions, dtype=object)\r\n\r\n    def _append_row(self, row):\r\n        # return the database with one extra row\r\n        # (stands in for DataFrame.append, removed in pandas 2.0)\r\n        return pd.concat([self.database, row.to_frame().T])\r\n\r\n    def store_transition(self, s, a, r, s_):\r\n        if s not in self.database.index:\r\n            self.database = self._append_row(\r\n                pd.Series(\r\n                    [None] * 
len(self.actions),\r\n                    index=self.database.columns,\r\n                    name=s,\r\n                ))\r\n   \r\n      \r\n        self.database.at[s, a] = deepcopy((r, s_))\r\n       \r\n        \r\n    def sample_s_a(self):\r\n        s = np.random.choice(self.database.index)\r\n        a = np.random.choice(self.database.loc[s].dropna().index)    # filter out the None value\r\n        return s, a\r\n\r\n    def get_r_s_(self, s, a):\r\n        r, s_ = self.database.loc[s, a]\r\n        return r, s_\r\n        \r\n    def get_env(self):\r\n        print(self.database)\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This is the python source code of RL_brain.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reinforcement Learning Dyna-Q - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reinforcement Learning Dyna-Q - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"This is the python source code of RL_brain.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q\" \/>\n<meta property=\"og:url\" content=\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta 
property=\"article:modified_time\" content=\"2018-11-07T01:14:50+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\",\"name\":\"Reinforcement Learning Dyna-Q - Machine Learning Applications\",\"isPartOf\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2018-11-03T17:50:12+00:00\",\"dateModified\":\"2018-11-07T01:14:50+00:00\",\"breadcrumb\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reinforcement Learning Dyna-Q\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reinforcement Learning Dyna-Q - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","og_locale":"en_US","og_type":"article","og_title":"Reinforcement Learning Dyna-Q - Machine Learning Applications","og_description":"This is the python source code of RL_brain.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","og_url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","og_site_name":"Machine Learning Applications","article_modified_time":"2018-11-07T01:14:50+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/","name":"Reinforcement Learning Dyna-Q - Machine Learning Applications","isPartOf":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2018-11-03T17:50:12+00:00","dateModified":"2018-11-07T01:14:50+00:00","breadcrumb":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Reinforcement Learning Dyna-Q"}]},{"@type":"WebSite","@id":"http:\/\/intelligentonlinetools.com\/blog\/#website","url":"http:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial 
intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"}]}},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P7h1IJ-Ca","jetpack-related-posts":[{"id":2368,"url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-planning-env\/","url_meta":{"origin":2366,"position":0},"title":"Reinforcement Learning Dyna-Q Planning Environment","date":"November 3, 2018","format":false,"excerpt":"This is the python source code of planning_env.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2364,"url":"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-run-planning-rl\/","url_meta":{"origin":2366,"position":1},"title":"Reinforcement Learning Dyna-Q Run Planning","date":"November 3, 2018","format":false,"excerpt":"This is the python source code of run_planning_RL.py for post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2499,"url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn\/","url_meta":{"origin":2366,"position":2},"title":"Reinforcement Learning DQN","date":"January 5, 2019","format":false,"excerpt":"This is the python source code of RL_brainDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2501,"url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn-run-planning\/","url_meta":{"origin":2366,"position":3},"title":"Reinforcement 
Learning DQN Run Planning","date":"January 5, 2019","format":false,"excerpt":"This is the python source code of run_planning_RL_DQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2495,"url":"http:\/\/intelligentonlinetools.com\/blog\/reinforcement-learning-dqn-planning-environment\/","url_meta":{"origin":2366,"position":4},"title":"Reinforcement Learning DQN Planning Environment","date":"January 5, 2019","format":false,"excerpt":"This is the python source code of planning_envDQN.py for post Reinforcement Learning Python DQN Application for Resource Allocation","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2284,"url":"http:\/\/intelligentonlinetools.com\/blog\/neural-networks-applications-seismic-prospecting-neural-network-code\/","url_meta":{"origin":2366,"position":5},"title":"Neural Networks Applications in Seismic Prospecting  &#8211; Neural Network Code","date":"September 8, 2018","format":false,"excerpt":"Below you will find neural network code for blog post Artificial Intelligence - Neural Networks Applications in Seismic Prospecting. This post is showing how seismic process can be automated using deep neural network such as simple Multi Layer perceptron. References 1. 
Artificial Intelligence - Neural Networks Applications in Seismic Prospecting","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2366"}],"collection":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=2366"}],"version-history":[{"count":5,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2366\/revisions"}],"predecessor-version":[{"id":2389,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/pages\/2366\/revisions\/2389"}],"wp:attachment":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=2366"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}