{"id":993,"date":"2017-02-22T02:15:04","date_gmt":"2017-02-22T02:15:04","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?p=993"},"modified":"2017-03-07T02:31:15","modified_gmt":"2017-03-07T02:31:15","slug":"converting-categorical-text-variable-into-binary-variables","status":"publish","type":"post","link":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/","title":{"rendered":"Converting Categorical Text Variable into Binary Variables"},"content":{"rendered":"<p>Sometimes we need to convert a categorical feature into multiple binary features. This situation came up while I was implementing a decision tree with an independent categorical variable using the Python sklearn.tree module for the post <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/18\/building-decision-trees-in-python-handling-categorical-data\/\" target=\"_blank\">Building Decision Trees in Python &#8211; Handling Categorical Data<\/a>, and it turned out that sklearn.tree does not support a text independent variable.<\/p>\n<p>One solution is <strong>one-hot encoding<\/strong>, which turns a categorical variable into several binary (0\/1) variables. For example, the values [&#8216;red&#8217;,&#8217;green&#8217;,&#8217;blue&#8217;] are coded with 3 columns, one for each category. Each column holds 1 when the row matches that category and 0 otherwise. [1]<\/p>\n<p>Here we implement Python code that performs this encoding. The script scans a text data column and appends numerical columns with values 0 or 1 to the original data. If a category word appears in the text of a row, that row gets 1 in the column for this category, otherwise 0. Because the check is a substring match, a single row can get 1 in several category columns.<\/p>\n<p>The list of categories is initialized at the beginning of the script. We also initialize the data source file, the 0-based position of the column with the text data, and the position of the first empty column on the right side. 
The script adds the new binary columns starting from that first empty column.<\/p>\n<p>Next, for each category word, the script goes through every row, checks whether the word appears in the text column, and writes 1 or 0 into that category&#8217;s column.<\/p>\n<p>Below is an example of the binary columns added to the input data.<\/p>\n<p><img data-attachment-id=\"994\" data-permalink=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/additional_columns\/#main\" data-orig-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns.png\" data-orig-size=\"450,122\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"additional_columns\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns-300x81.png\" data-large-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns-300x81.png\" alt=\"\" width=\"300\" height=\"81\" class=\"alignnone size-medium wp-image-994\" srcset=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns-300x81.png 300w, http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns.png 450w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>Below is the full source code.<\/p>\n<pre><code>\r\n# -*- coding: utf-8 -*-\r\n\r\nimport pandas as pd\r\n\r\n# Category words; each one becomes a new 0\/1 column\r\nwords = 
[\"adwords\", \"adsense\", \"mortgage\", \"money\", \"loan\"]\r\ndata = pd.read_csv('adwords_data5.csv', sep=',', header=0)\r\n\r\ntotal_rows = len(data.index)\r\n\r\n# Position (0-based) of the column that holds the text data\r\ny_text_column_index = 7\r\n# Position of the first empty column on the right side,\r\n# i.e. where the first new binary column is placed\r\ny_column_index = len(data.columns)\r\n\r\nfor index, w in enumerate(words):\r\n    # Append a new column for this category, filled with 0\r\n    # (assumes each word is a new, unique column name)\r\n    data[w] = 0\r\n    col_index = y_column_index + index\r\n\r\n    for x in range(total_rows):\r\n        text = data.iloc[x, y_text_column_index]\r\n        # 1 if the category word occurs in the text, otherwise 0;\r\n        # non-string cells (e.g. NaN) count as no match\r\n        if isinstance(text, str) and w in text:\r\n            data.iloc[x, col_index] = 1\r\n        else:\r\n            data.iloc[x, col_index] = 0\r\n\r\nprint(data)\r\n<\/code><\/pre>\n<p><strong>References<\/strong><br \/>\n1. <a href=\"http:\/\/datascience.stackexchange.com\/questions\/5226\/strings-as-features-in-decision-tree-random-forest\" target=\"_blank\">strings as features in decision tree\/random forest<\/a><br \/>\n2. <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/18\/building-decision-trees-in-python\/\" target=\"_blank\">Building Decision Trees in Python<\/a><br \/>\n3. <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/18\/building-decision-trees-in-python-handling-categorical-data\/\" target=\"_blank\">Building Decision Trees in Python &#8211; Handling Categorical Data<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes we need to convert a categorical feature into multiple binary features. This situation came up while I was implementing a decision tree with an independent categorical variable using the Python sklearn.tree module for the post Building Decision Trees in Python &#8211; Handling Categorical Data, and it turned out that sklearn.tree does not support a text independent variable. 
One solution is &#8230; <a title=\"Converting Categorical Text Variable into Binary Variables\" class=\"read-more\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":[]},"categories":[5,2,9,10],"tags":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Converting Categorical Text Variable into Binary Variables - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Converting Categorical Text Variable into Binary Variables - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"Sometimes we need to convert a categorical feature into multiple binary features. This situation came up while I was implementing a decision tree with an independent categorical variable using the Python sklearn.tree module for the post Building Decision Trees in Python &#8211; Handling Categorical Data, and it turned out that sklearn.tree does not support a text independent variable. One solution is ... 
Read more\" \/>\n<meta property=\"og:url\" content=\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta property=\"article:published_time\" content=\"2017-02-22T02:15:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-03-07T02:31:15+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns-300x81.png\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\",\"name\":\"Converting Categorical Text Variable into Binary Variables - Machine Learning 
Applications\",\"isPartOf\":{\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2017-02-22T02:15:04+00:00\",\"dateModified\":\"2017-03-07T02:31:15+00:00\",\"author\":{\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\"},\"breadcrumb\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Converting Categorical Text Variable into Binary Variables\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"https:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Converting Categorical Text Variable into Binary Variables - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/","og_locale":"en_US","og_type":"article","og_title":"Converting Categorical Text Variable into Binary Variables - Machine Learning Applications","og_description":"Sometimes we need to convert a categorical feature into multiple binary features. This situation came up while I was implementing a decision tree with an independent categorical variable using the Python sklearn.tree module for the post Building Decision Trees in Python &#8211; Handling Categorical Data, and it turned out that sklearn.tree does not support a text independent variable. One solution is ... 
Read more","og_url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/","og_site_name":"Machine Learning Applications","article_published_time":"2017-02-22T02:15:04+00:00","article_modified_time":"2017-03-07T02:31:15+00:00","og_image":[{"url":"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/additional_columns-300x81.png"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/","url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/","name":"Converting Categorical Text Variable into Binary Variables - Machine Learning Applications","isPartOf":{"@id":"https:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2017-02-22T02:15:04+00:00","dateModified":"2017-03-07T02:31:15+00:00","author":{"@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478"},"breadcrumb":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/22\/converting-categorical-text-variable-into-binary-variables\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Converting Categorical Text Variable into Binary 
Variables"}]},{"@type":"WebSite","@id":"https:\/\/intelligentonlinetools.com\/blog\/#website","url":"https:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7h1IJ-g1","jetpack-related-posts":[{"id":975,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/18\/building-decision-trees-in-python-handling-categorical-data\/","url_meta":{"origin":993,"position":0},"title":"Building Decision Trees in Python &#8211; Handling Categorical Data","date":"February 18, 2017","format":false,"excerpt":"In the post Building Decision Trees in Python we looked at the decision tree with numerical continuous dependent variable. This type of decision trees can be called also regression tree. But what if we need to use categorical dependent variable? 
It is still possible to create decision tree and in\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/dt-235x300.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1516,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/11\/23\/regression-and-classification-decision-trees-building-with-python-and-running-online\/","url_meta":{"origin":993,"position":1},"title":"Regression and Classification Decision Trees &#8211; Building with Python and Running Online","date":"November 23, 2017","format":false,"excerpt":"According to survey [1] Decision Trees constitute one of the 10 most popular data mining algorithms. Decision trees used in data mining are of two main types: Classification tree analysis is when the predicted outcome is the class to which the data belongs. Regression tree analysis is when the predicted\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/11\/decision_tree_11_2017-300x283.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1470,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/11\/13\/getting-data-driven-insights-from-blog-data-analysis-with-feature-selection\/","url_meta":{"origin":993,"position":2},"title":"Getting Data-Driven Insights from Blog Data Analysis with Feature Selection","date":"November 13, 2017","format":false,"excerpt":"Machine learning algorithms are widely used in every business - object recognition, marketing analytics, analyzing data in numerous applications to get useful insights. 
In this post one of machine learning techniques is applied to analysis of blog post data to predict significant features for key metrics such as page views.\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/11\/feature_selection-300x253.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":966,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/02\/18\/building-decision-trees-in-python\/","url_meta":{"origin":993,"position":3},"title":"Building Decision Trees in Python","date":"February 18, 2017","format":false,"excerpt":"A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"Decision Tree","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/02\/dt_post1_N_CTQ_Cost_regr1-2-use-this-300x103.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":2167,"url":"http:\/\/intelligentonlinetools.com\/blog\/2018\/07\/28\/inferring-causes-effects-daily-data\/","url_meta":{"origin":993,"position":4},"title":"Inferring Causes and Effects from Daily Data","date":"July 28, 2018","format":false,"excerpt":"Doing different activities we often are interesting how they impact each other. For example, if we visit different links on Internet, we might want to know how this action impacts our motivation for doing some specific things. 
In other words we are interesting in inferring importance of causes for effects\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":852,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/01\/22\/data-visualization-visualizing-an-lda-model-using-python\/","url_meta":{"origin":993,"position":5},"title":"Data Visualization &#8211; Visualizing an LDA Model using Python","date":"January 22, 2017","format":false,"excerpt":"In the previous post Topic Extraction from Blog Posts with LSI , LDA and Python python code was created for text documents topic modeling using Latent Dirichlet allocation (LDA) method. The output was just an overview of the words with corresponding probability distribution for each topic and it was hard\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/01\/word_topic_dataframe-300x112.png?resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/993"}],"collection":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=993"}],"version-history":[{"count":13,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/993\/revisions"}],"predecessor-version":[{"id":1051,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/993\/revisions\/1051"}],"wp:attachment":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=993"}],"wp:term":[{"taxonomy":"category","embeddable":true,"hr
ef":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/categories?post=993"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/tags?post=993"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}