{"id":2327,"date":"2018-10-28T11:55:59","date_gmt":"2018-10-28T11:55:59","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?p=2327"},"modified":"2018-11-07T01:18:05","modified_gmt":"2018-11-07T01:18:05","slug":"reinforcement-learning-example-planning-using-q-learning-dyna","status":"publish","type":"post","link":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/","title":{"rendered":"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q"},"content":{"rendered":"<p><img data-attachment-id=\"2403\" data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/labyrinth-1015643_640\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg\" data-orig-size=\"640,640\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"labyrinth-1015643_640\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640-300x300.jpg\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg\" alt=\"\" width=\"640\" height=\"640\" class=\"alignnone size-full wp-image-2403\" 
srcset=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg 640w, https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640-150x150.jpg 150w, https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640-300x300.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/p>\n<h2>What Is the Planning Process?<\/h2>\n<p><b>Planning<\/b> is the process of finding a sequence of actions (steps) which, if executed by an agent, results in the achievement of a set of predefined goals. This sequence of actions is also referred to as a plan. Planning is studied within Reinforcement Learning and Automated Planning, which are subfields of Machine Learning and Artificial Intelligence. [1]<\/p>\n<p>Planning can be used in production. <a href=\"http:\/\/lup.lub.lu.se\/luur\/download?func=downloadFile&#038;recordOId=8936610&#038;fileOId=8936615\" target=\"_blank\">Here<\/a> [5] you can find a reinforcement learning example applied to learn an approximately optimal strategy for controlling the stations of a production line in order to meet demand. The goal of that thesis was to create a schedule for machines, such as a press and an oven, running in a production environment.<\/p>\n<p>In our day-to-day life we plan without using any knowledge of Reinforcement Learning or Artificial Intelligence, for example when we create a plan of actions for completing a project or a plan of tasks for the week or month. Using Reinforcement Learning for planning, we can potentially save time, find better strategies and eliminate human error.<\/p>\n<p>In this post we will look at a typical planning problem: finding the actions needed to complete some specific tasks. This is a very practical problem, as the same approach can be used for making our everyday schedule or for achieving our goals. 
<\/p>\n<h2>Combining Q Learning with Dyna<\/h2>\n<p>We will investigate how to apply Reinforcement Learning to planning the actions needed to complete tasks, using the <b>Dyna-Q<\/b> algorithm proposed by R. Sutton, which combines Dyna and Q learning.<\/p>\n<p><b>Dyna<\/b> is a common and widely used approach to speeding up learning in Reinforcement Learning. [2],[3] In our experiment we will see how it affects learning speed.<\/p>\n<p>Under Dyna, the action taken is computed rapidly as a function of the situation, but the mapping implemented by that system is continually adjusted by a planning process, and the planner is not restricted to planning about the current situation. [2]<\/p>\n<p><b>Q-learning<\/b> is a model-free method: it learns directly from experience, without a model of the environment. It also does not need to maintain separate structures for the value function and the policy, only the Q-value function. The Dyna-Q architecture is therefore simpler, as two data structures are replaced with one. [1]<\/p>\n<p>We will look at the Dyna-Q framework in more detail after we define our environment and problem.<\/p>\n<h2>Problem Description<\/h2>\n<p>As mentioned above, we will plan the actions that are needed to complete tasks. Given some goals and a set of actions, we are interested in knowing which action we need to take now in order to get the best result in the end.<\/p>\n<p>Let's say that by the end of the week I need to complete a project in Applied Machine Learning and a project in Reinforcement Learning. The rewards for completing each project are 3 and 10, respectively. This means that completing the Reinforcement Learning project is more important, for whatever reason.<\/p>\n<p>Let's assume I need to spend a specific amount of time, 2 and 3 time units respectively, to complete the end goal for each project. A time unit can simply be 1 hour in this example. I work only in the evening, and each day I can take only one action. I have only 5 time slots to pick. 
<\/p>\n<p>While I need to spend only 2 units of time to complete my weekly goal on the Machine Learning project, I can still work on this project after spending those 2 units, possibly doing something for next week or for extra credit. The reward is calculated only at the end of the week.<\/p>\n<p>A diagram of one possible path looks like this:<\/p>\n<figure id=\"attachment_2353\" aria-describedby=\"caption-attachment-2353\" style=\"width: 740px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"2353\" data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/rl_paths\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_paths-e1541117896388.png\" data-orig-size=\"750,219\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Planning Diagram\" data-image-description=\"&lt;p&gt;Planning Diagram&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Planning Diagram&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_paths-300x88.png\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_paths-1024x299.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_paths-e1541117896388.png\" alt=\"Planning Diagram\" width=\"750\" height=\"219\" class=\"size-full wp-image-2353\" \/><figcaption id=\"caption-attachment-2353\" class=\"wp-caption-text\">Planning 
Diagram<\/figcaption><\/figure>\n<p>In this diagram, green indicates the path that produces the maximum reward of 13 (3 + 10), as the agent was able to complete both goals.<\/p>\n<h2>Simplification<\/h2>\n<p>As this is the first post on reinforcement learning for planning, we picked a very simple problem. Even without calculations we can say that the optimal schedule allocates 2 units to the ML project and 3 units to the other project, and that the maximum reward is 13.<\/p>\n<p>Thus in this example we made a few <b>simplifications<\/b>:<br \/>\nThe number of actions is the same as the number of goals. This makes the programming a little easier for now.<br \/>\nThe number of time units needed to complete a task does not change. This is not always true. In real situations we often realize that something we planned will take longer, or may not be possible at all at the current moment.<\/p>\n<p>Despite the above simplifications, the program still has a lot to learn.<br \/>\nHow would it create an action plan for completing the given tasks by the end of a specific time period?<\/p>\n<h2>Solution<\/h2>\n<p>The code here is based on Dyna-Q for the maze problem [4]. It has 2 modules, one for the environment and one for the Reinforcement Learning algorithm. Additionally, it has a main module which runs the loop over episodes.<\/p>\n<p>Our solution consists of two parts:<br \/>\n1. Reinforcement Learning <b>Q learning<\/b>, where we use the observed values and update the table with state, action and reward. This is also where the action is chosen.<br \/>\n2. The Dyna part, where we run <b>simulations<\/b> and also update the state, action and reward after each simulation. Basically, we randomly choose a state and an action, determine the next state and reward, and update the table in the same way as in part 1.<\/p>\n<p>Our table is a pandas data frame, shown on the right side of the flowchart. 
<\/p>\n<figure id=\"attachment_2344\" aria-describedby=\"caption-attachment-2344\" style=\"width: 631px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"2344\" data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/rl_flow_chart_rev4\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4.png\" data-orig-size=\"641,541\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Reinforcement Learning and Planning\" data-image-description=\"&lt;p&gt;Reinforcement Learning and Planning &#8211; Dyna-Q Algorithm&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Reinforcement Learning and Planning &#8211; Dyna-Q Algorithm&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4-300x253.png\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4.png\" alt=\"\" width=\"641\" height=\"541\" class=\"size-full wp-image-2344\" srcset=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4.png 641w, https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_flow_chart_rev4-300x253.png 300w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><figcaption id=\"caption-attachment-2344\" 
class=\"wp-caption-text\">Reinforcement Learning and Planning &#8211; Dyna-Q Algorithm<\/figcaption><\/figure>\n<p>To run this reinforcement learning example, you can use the Python source code from the links below:<\/p>\n<p><a href=\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-planning-env\/\" target=\"_blank\">Reinforcement Learning Dyna-Q Planning Environment<\/a><br \/>\n<a href=\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q\/\" target=\"_blank\">Reinforcement Learning Dyna-Q<\/a><br \/>\n<a href=\"http:\/\/intelligentonlinetools.com\/blog\/rl-dyna-q-run-planning-rl\/\" target=\"_blank\">Reinforcement Learning Dyna-Q Run Planning<\/a> <\/p>\n<h2>Results<\/h2>\n<p>We ran 3 different agents:<\/p>\n<p>1. Random Agent &#8211; the action is always picked randomly.<br \/>\n2. RL Agent &#8211; we use only observed values and perform no simulations, so this is Q learning only.<br \/>\n3. Dyna-Q Agent &#8211; we use Q learning together with Dyna simulations.<\/p>\n<p>The results are shown in the charts below. 
Here we output the average reward over each 50 episodes.<\/p>\n<figure id=\"attachment_2347\" aria-describedby=\"caption-attachment-2347\" style=\"width: 422px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"2347\" data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/random-agen-dyno-q\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/random-agen-dyno-q-e1541117351448.png\" data-orig-size=\"432,339\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Random Agent Run Result\" data-image-description=\"&lt;p&gt;Random Agent Run Result &lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Random Agent Run Result &lt;\/p&gt;\n\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/random-agen-dyno-q-300x236.png\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/random-agen-dyno-q-e1541117351448.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/random-agen-dyno-q-e1541117351448.png\" alt=\"Random Agent Run Result\" width=\"432\" height=\"339\" class=\"size-full wp-image-2347\" \/><figcaption id=\"caption-attachment-2347\" class=\"wp-caption-text\">Random Agent Run Result<\/figcaption><\/figure>\n<figure id=\"attachment_2346\" aria-describedby=\"caption-attachment-2346\" style=\"width: 551px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"2346\" 
data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/only-rl-agent\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent.png\" data-orig-size=\"561,435\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Only RL Q Learning Agent Run Result\" data-image-description=\"&lt;p&gt;Only RL Q Learning Agent Run Result &lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Only RL Q Learning Agent Run Result &lt;\/p&gt;\n\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent-300x233.png\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent.png\" alt=\"Only RL Q Learning Agent Run Result\" width=\"561\" height=\"435\" class=\"size-full wp-image-2346\" srcset=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent.png 561w, https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/only-RL-agent-300x233.png 300w\" sizes=\"(max-width: 561px) 100vw, 561px\" \/><figcaption id=\"caption-attachment-2346\" class=\"wp-caption-text\">Only RL Q Learning Agent Run Result<\/figcaption><\/figure>\n<figure id=\"attachment_2345\" aria-describedby=\"caption-attachment-2345\" style=\"width: 432px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"2345\" 
data-permalink=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/rl-and-dyno-q-agent\/#main\" data-orig-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL-and-dyno-Q-agent-e1541117589547.png\" data-orig-size=\"442,339\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"RL Dyna-Q Agent Run Result\" data-image-description=\"&lt;p&gt;RL Dyna-Q Agent Run Result &lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;RL Dyna-Q Agent Run Result &lt;\/p&gt;\n\" data-medium-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL-and-dyno-Q-agent-300x230.png\" data-large-file=\"https:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL-and-dyno-Q-agent-e1541117589547.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL-and-dyno-Q-agent-e1541117589547.png\" alt=\"RL Dyna-Q Agent Run Result\" width=\"442\" height=\"339\" class=\"size-full wp-image-2345\" \/><figcaption id=\"caption-attachment-2345\" class=\"wp-caption-text\">RL Dyna-Q Agent Run Result<\/figcaption><\/figure>\n<p>The random agent was not able to learn that there is a better option with reward 13.<br \/>\nThe RL agent performed better than the random one and was able to reach reward 13, but it took a long time to get there.<br \/>\nThe Dyna-Q agent was able to reach reward 13 after only 100 episodes. 
Its average reward, however, is about 12.5, so there is some room for improvement.<br \/>\nStill, this is not bad considering that we did not do any specific tuning of parameters.<\/p>\n<h2>Next Steps<\/h2>\n<p>We looked at reinforcement learning algorithms, Q learning and Dyna-Q, that can be used for planning. By adding the <b>Dyna<\/b> part, <b>learning was significantly accelerated.<\/b><\/p>\n<p>Next steps would be to improve performance, use a deep learning network for reinforcement learning, and make the environment setup more general.<\/p>\n<p><b>References<\/b><br \/>\n1. <a href=\"https:\/\/pdfs.semanticscholar.org\/b905\/4205676223f064e73295c8a3f16d87f1277f.pdf\" target=\"_blank\">Reinforcement Learning and Automated Planning: A Survey<\/a><br \/>\n2. <a href=\"https:\/\/pdfs.semanticscholar.org\/87bf\/c4daa478433d8b18a47edf9112a25098cada.pdf\" target=\"_blank\">Planning by Incremental Dynamic Programming<\/a> R. S. Sutton<br \/>\n3. <a href=\"https:\/\/www.cs.cmu.edu\/afs\/cs\/project\/jair\/pub\/volume4\/kaelbling96a-html\/node29.html\" target=\"_blank\">Dyna<\/a><br \/>\n4. <a href=\"https:\/\/github.com\/MorvanZhou\/Reinforcement-learning-with-tensorflow\" target=\"_blank\">Reinforcement Learning Methods and Tutorials<\/a><br \/>\n5. <a href=\"http:\/\/lup.lub.lu.se\/luur\/download?func=downloadFile&#038;recordOId=8936610&#038;fileOId=8936615\" target=\"_blank\">Reinforcement learning for planning of a simulated production line<\/a> Gustaf Ehn, Hugo Werner  February 27, 2018<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Planning Process Planning is the process of finding a sequence of actions (steps), which if executed by an agent result in the achievement of a set of predefined goals. The sequence of actions mentioned above is also referred to as plan. 
Planning is studied within Reinforcement Learning and Automated Planning that are subfields &#8230; <a title=\"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q\" class=\"read-more\" href=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":[]},"categories":[97],"tags":[102,101,99,100,98],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"What is Planning Process Planning is the process of finding a sequence of actions (steps), which if executed by an agent result in the achievement of a set of predefined goals. The sequence of actions mentioned above is also referred to as plan. Planning is studied within Reinforcement Learning and Automated Planning that are subfields ... 
Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta property=\"article:published_time\" content=\"2018-10-28T11:55:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-07T01:18:05+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\",\"url\":\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\",\"name\":\"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning 
Applications\",\"isPartOf\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2018-10-28T11:55:59+00:00\",\"dateModified\":\"2018-11-07T01:18:05+00:00\",\"author\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\"},\"breadcrumb\":{\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/","og_locale":"en_US","og_type":"article","og_title":"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning Applications","og_description":"What is Planning Process Planning is the process of finding a sequence of actions (steps), which if executed by an agent result in the achievement of a set of predefined goals. The sequence of actions mentioned above is also referred to as plan. Planning is studied within Reinforcement Learning and Automated Planning that are subfields ... 
Read more","og_url":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/","og_site_name":"Machine Learning Applications","article_published_time":"2018-10-28T11:55:59+00:00","article_modified_time":"2018-11-07T01:18:05+00:00","og_image":[{"url":"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/labyrinth-1015643_640.jpg"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/","url":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/","name":"Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q - Machine Learning Applications","isPartOf":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2018-10-28T11:55:59+00:00","dateModified":"2018-11-07T01:18:05+00:00","author":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478"},"breadcrumb":{"@id":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/intelligentonlinetools.com\/blog\/2018\/10\/28\/reinforcement-learning-example-planning-using-q-learning-dyna\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Reinforcement Learning Example for 
Planning Tasks Using Q Learning and Dyna-Q"}]},{"@type":"WebSite","@id":"http:\/\/intelligentonlinetools.com\/blog\/#website","url":"http:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7h1IJ-Bx","jetpack-related-posts":[{"id":2481,"url":"https:\/\/intelligentonlinetools.com\/blog\/2019\/01\/02\/reinforcement-learning-python-dqn-application-resource-allocation\/","url_meta":{"origin":2327,"position":0},"title":"Reinforcement Learning Python DQN Application for Resource Allocation","date":"January 2, 2019","format":false,"excerpt":"In the previous post Reinforcement Learning Example for Planning Tasks Using Q Learning and Dyna-Q we applied Dyna-Q algorithm for planning of actions to complete tasks. This problem can be viewed as resource allocation task. 
In this post we will use reinforcement learning python DQN (Deep Q-network) for the same\u2026","rel":"","context":"In &quot;Python Scripts&quot;","img":{"alt_text":"Planning Diagram","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/10\/RL_paths-e1541117896388.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":586,"url":"https:\/\/intelligentonlinetools.com\/blog\/2016\/09\/15\/how-can-we-use-computer-technology-to-improve\/","url_meta":{"origin":2327,"position":1},"title":"How Can We Use Computer Programming to Increase Effective Thinking","date":"September 15, 2016","format":false,"excerpt":"Once a while we might find ourselves in situation when we think \"I wish I knew this before\" , \"Why I did not think about this before\" or \"Why it took so long to come to this decision or action\". Can computer programs be used to help us to avoid\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2253,"url":"https:\/\/intelligentonlinetools.com\/blog\/2018\/09\/06\/ml-applications\/","url_meta":{"origin":2327,"position":2},"title":"Everyday Examples of Machine Learning Applications","date":"September 6, 2018","format":false,"excerpt":"Artificial Intelligence and Machine Learning applications is one of the most hottest topics in the industry today. Robots, self driving cars, intelligent chatbots and many other innovations are coming to our work and life. 
In this post we will look at few machine learning less known applications that were covered\u2026","rel":"","context":"In &quot;Machine learning applications&quot;","img":{"alt_text":"Topic modeling with textacy","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2018\/09\/Topic-modeling-with-textacy-e1536508581929.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1257,"url":"https:\/\/intelligentonlinetools.com\/blog\/2017\/06\/17\/time-series-prediction-with-convolutional-neural-networks-and-keras\/","url_meta":{"origin":2327,"position":3},"title":"Time Series Prediction with Convolutional Neural Networks and Keras","date":"June 17, 2017","format":false,"excerpt":"A convolutional neural network (CNNs) is a type of network that has recently gained popularity due to its success in classification problems (e.g. image recognition or time series classification) [1]. One of the working examples how to use Keras CNN for time series can be found at this link[2]. This\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/06\/CNN22-300x212.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":227,"url":"https:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/","url_meta":{"origin":2327,"position":4},"title":"Using Python for Mining Data From Twitter","date":"May 28, 2016","format":false,"excerpt":"Twitter is increasingly being used for business or personal purposes. With Twitter API there is also an opportunity to do data mining of data (tweets) and find interesting information. 
In this post we will take a look how to get data from Twitter, prepare data for analysis and then do\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"Frequency of Hashtags","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/05\/Frequency-of-Hashtags-300x171.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":18,"url":"https:\/\/intelligentonlinetools.com\/blog\/2016\/03\/31\/data-mining-twitter-with-python\/","url_meta":{"origin":2327,"position":5},"title":"Data Mining Twitter Data with Python","date":"March 31, 2016","format":false,"excerpt":"Twitter is an online social networking service that enables users to send and read short 140-character messages called \"tweets\". [1] Twitter users are tweeting about different topics based on their interests and goals. A word, phrase or topic that is mentioned at a greater rate than others is said to\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/2327"}],"collection":[{"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=2327"}],"version-history":[{"count":43,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/2327\/revisions"}],"predecessor-version":[{"id":2405,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/2327\/revisions\/2405"}],"wp:attachment":[{"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=2327"}],"wp:term":[{"taxonomy":"category","embeddab
le":true,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/categories?post=2327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/tags?post=2327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}