{"id":256,"date":"2016-06-05T13:05:05","date_gmt":"2016-06-05T13:05:05","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?p=256"},"modified":"2016-06-14T01:10:59","modified_gmt":"2016-06-14T01:10:59","slug":"using-python-for-mining-data-from-twitter-visualization-and-other-enchancements","status":"publish","type":"post","link":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/","title":{"rendered":"Using Python for Data Visualization of Clustering Results"},"content":{"rendered":"<p>In a <a href=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/\" target=\"_blank\">previous post<\/a>, Python source code for mining Twitter data was implemented. Clustering was applied to put tweets into different groups, using a bag-of-words representation of the text. The clustering results were obtained as a numerical matrix. Now we will look at visualizing the clustering results using Python. We will also do some additional data cleaning before clustering.<\/p>\n<p><strong>Data preprocessing<\/strong><br \/>\nThe following steps are added before clustering:<\/p>\n<ul>\n<li>Retweets start with text in the form &#8220;RT @name: &#8221;. Code is added to remove this prefix.<\/li>\n<li>Special characters such as # and ! are removed.<\/li>\n<li>URL links are removed.<\/li>\n<li>All digits are also removed.<\/li>\n<li>Duplicate tweets and retweets are removed, so only one copy of each tweet is kept.<\/li>\n<\/ul>\n<p>Below is the code for the above preprocessing steps. 
The helper functions right and remove_duplicates are defined in the full script at the end of the post.<\/p>\n<pre><code>\r\nfor counter, t in enumerate(texts):\r\n    if t.startswith(\"rt @\"):\r\n          pos= t.find(\": \")\r\n          texts[counter] = right(t, len(t) - (pos+2))\r\n          \r\nfor counter, t in enumerate(texts):\r\n    texts[counter] = re.sub(r'[?|$|.|!|#|\\-|\"|\\n|,|@|(|)]',r'',texts[counter])\r\n    texts[counter] = re.sub(r'https?:\\\/\\\/.*[\\r\\n]*', '', texts[counter], flags=re.MULTILINE)\r\n    texts[counter] = re.sub(r'[0|1|2|3|4|5|6|7|8|9|:]',r'',texts[counter]) \r\n    texts[counter] = re.sub(r'deeplearning',r'deep learning',texts[counter])      \r\n        \r\ntexts= remove_duplicates(texts)\r\n<\/code><\/pre>\n<p><strong>Plotting<\/strong><br \/>\nThe vector-space model chosen to represent the tweets in this example makes visualization a problem in a high-dimensional space: the number of distinct words is large even for a small data set. There is, however, a tool, t-SNE, for visualizing high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. The t-SNE cost function is not convex, so different initializations can give different results. [1] Below is the Python source code for plotting the clustering results. 
<\/p>\n<pre><code>\r\nfrom sklearn.manifold import TSNE\r\n\r\nmodel = TSNE(n_components=2, random_state=0)\r\nnp.set_printoptions(suppress=True)\r\nY=model.fit_transform(train_data_features)\r\n\r\nplt.scatter(Y[:, 0], Y[:, 1], c=clustering_result, s=290,alpha=.5)\r\nplt.show()\r\n<\/code><\/pre>\n<p>The resulting visualization is shown below<br \/>\n<figure id=\"attachment_297\" aria-describedby=\"caption-attachment-297\" style=\"width: 290px\" class=\"wp-caption alignnone\"><img data-attachment-id=\"297\" data-permalink=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/data-visualization1\/#main\" data-orig-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1.png\" data-orig-size=\"619,454\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"data visualization1\" data-image-description=\"&lt;p&gt;Data Visualization for Clustering Results&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Data Visualization for Clustering Results&lt;\/p&gt;\n\" data-medium-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1-300x220.png\" data-large-file=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1.png\" decoding=\"async\" loading=\"lazy\" src=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1-300x220.png\" alt=\"Data Visualization for Clustering Results\" width=\"350\" height=\"330\"  class=\"size-medium 
wp-image-297\"\/><figcaption id=\"caption-attachment-297\" class=\"wp-caption-text\">Data Visualization for Clustering Results<\/figcaption><\/figure><\/p>\n<p><strong>Analysis<\/strong><br \/>\nIn addition to the visualization, the silhouette_score was computed; the value obtained was around 0.2.<\/p>\n<pre><code>\r\nsilhouette_avg = silhouette_score(train_data_features, clustering_result)\r\n<\/code><\/pre>\n<p>The silhouette_score gives the average silhouette coefficient over all samples, which gives a perspective on the density and separation of the formed clusters.<br \/>\nSilhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster. [2] A score of around 0.2 therefore suggests that the clusters obtained here are only weakly separated.<\/p>\n<p>Thus, this post provided a Python script for visualizing clustering results. The clustering was applied to the results of a Twitter search for a specific phrase.<\/p>\n<p>It should be noted that clustering tweets is challenging because a tweet can be at most 140 characters long. This is a short-text clustering problem, and there are additional techniques that can be applied to get better results. [3]-[6]<br \/>\nBelow is the full script. <\/p>\n<pre><code>\r\nimport twitter\r\nimport json\r\n\r\nimport matplotlib.pyplot as plt\r\nimport numpy as np\r\n\r\nfrom sklearn.feature_extraction.text import CountVectorizer\r\nfrom sklearn.cluster import Birch\r\nfrom sklearn.manifold import TSNE\r\n\r\nimport re\r\n\r\nfrom sklearn.metrics import silhouette_score\r\n\r\n# below function is from\r\n# http:\/\/www.dotnetperls.com\/duplicates-python\r\ndef remove_duplicates(values):\r\n    output = []\r\n    seen = set()\r\n    for value in values:\r\n        # If value has not been encountered yet,\r\n        # ... 
add it to both list and set.\r\n        if value not in seen:\r\n            output.append(value)\r\n            seen.add(value)\r\n    return output\r\n\r\n# below 2 functions are from\r\n# http:\/\/stackoverflow.com\/questions\/22586286\/\r\n#         python-is-there-an-equivalent-of-mid-right-and-left-from-basic\r\ndef left(s, amount = 1, substring = \"\"):\r\n    if (substring == \"\"):\r\n        return s[:amount]\r\n    else:\r\n        if (len(substring) > amount):\r\n            substring = substring[:amount]\r\n        return substring + s[:-amount]\r\n\r\ndef right(s, amount = 1, substring = \"\"):\r\n    if (substring == \"\"):\r\n        return s[-amount:]\r\n    else:\r\n        if (len(substring) > amount):\r\n            substring = substring[:amount]\r\n        return s[:-amount] + substring\r\n\r\n\r\nCONSUMER_KEY =\"xxxxxxx\"\r\nCONSUMER_SECRET =\"xxxxxxx\"\r\nOAUTH_TOKEN = \"xxxxxx\"\r\nOAUTH_TOKEN_SECRET = \"xxxxxx\"\r\n\r\n\r\nauth = twitter.oauth.OAuth (OAUTH_TOKEN, OAUTH_TOKEN_SECRET, CONSUMER_KEY, CONSUMER_SECRET)\r\n\r\ntwitter_api= twitter.Twitter(auth=auth)\r\nq='#deep learning'\r\ncount=100\r\n\r\n# Do search for tweets containing '#deep learning'\r\nsearch_results = twitter_api.search.tweets (q=q, count=count)\r\n\r\nstatuses=search_results['statuses']\r\n\r\n# Iterate through 5 more batches of results by following the cursor\r\nfor _ in range(5):\r\n   print (\"Length of statuses\", len(statuses))\r\n   try:\r\n        next_results = search_results['search_metadata']['next_results']\r\n   except KeyError:   \r\n       break\r\n   # Create a dictionary from next_results\r\n   kwargs=dict( [kv.split('=') for kv in next_results[1:].split(\"&\") ])\r\n\r\n   search_results = twitter_api.search.tweets(**kwargs)\r\n   statuses += search_results['statuses']\r\n\r\n# Show one sample search result by slicing the list\r\nprint (json.dumps(statuses[0], indent=10))\r\n\r\n\r\n\r\n# Extracting data such as hashtags, urls, texts and created at 
date\r\nhashtags = [ hashtag['text'].lower()\r\n    for status in statuses\r\n       for hashtag in status['entities']['hashtags'] ]\r\n\r\n\r\nurls = [ urls['url']\r\n    for status in statuses\r\n       for urls in status['entities']['urls'] ]\r\n\r\n\r\ntexts = [ status['text'].lower()\r\n    for status in statuses\r\n        ]\r\n\r\ncreated_ats = [ status['created_at']\r\n    for status in statuses\r\n        ]\r\n\r\n# Preparing data for trending in the format: date word\r\ni=0\r\nprint (\"===============================\\n\")\r\nfor x in created_ats:\r\n     for w in texts[i].split(\" \"):\r\n        if len(w)>=2:\r\n              print (x[4:10], x[26:31] ,\" \", w)\r\n     i=i+1\r\n\r\n# Prepare tweets data for clustering\r\n# Converting text data into bag of words model\r\n\r\nvectorizer = CountVectorizer(analyzer = \"word\", \\\r\n                             tokenizer = None,  \\\r\n                             preprocessor = None,  \\\r\n                             stop_words='english', \\\r\n                             max_features = 5000) \r\n\r\n\r\n\r\nfor counter, t in enumerate(texts):\r\n    if t.startswith(\"rt @\"):\r\n          pos= t.find(\": \")\r\n          texts[counter] = right(t, len(t) - (pos+2))\r\n          \r\nfor counter, t in enumerate(texts):\r\n    texts[counter] = re.sub(r'[?|$|.|!|#|\\-|\"|\\n|,|@|(|)]',r'',texts[counter])\r\n    texts[counter] = re.sub(r'https?:\\\/\\\/.*[\\r\\n]*', '', texts[counter], flags=re.MULTILINE)\r\n    texts[counter] = re.sub(r'[0|1|2|3|4|5|6|7|8|9|:]',r'',texts[counter]) \r\n    texts[counter] = re.sub(r'deeplearning',r'deep learning',texts[counter])      \r\n        \r\ntexts= remove_duplicates(texts)  \r\n\r\ntrain_data_features = vectorizer.fit_transform(texts)\r\ntrain_data_features = train_data_features.toarray()\r\n\r\nprint (train_data_features.shape)\r\nprint (train_data_features)\r\n\r\nvocab = vectorizer.get_feature_names()\r\nprint (vocab)\r\n\r\ndist = np.sum(train_data_features, 
axis=0)\r\n\r\n# For each, print the vocabulary word and the number of times it \r\n# appears in the training set\r\nfor tag, count in zip(vocab, dist):\r\n    print (count, tag)\r\n\r\n\r\n# Clustering data\r\nn_clusters=7\r\nbrc = Birch(branching_factor=50, n_clusters=n_clusters, threshold=0.5,  compute_labels=True)\r\nbrc.fit(train_data_features)\r\n\r\nclustering_result=brc.predict(train_data_features)\r\nprint (\"\\nClustering_result:\\n\")\r\nprint (clustering_result)\r\n\r\n# Outputting some data\r\nprint (json.dumps(hashtags[0:50], indent=1))\r\nprint (json.dumps(urls[0:50], indent=1))\r\nprint (json.dumps(texts[0:50], indent=1))\r\nprint (json.dumps(created_ats[0:50], indent=1))\r\n\r\n\r\nwith open(\"data.txt\", \"a\") as myfile:\r\n     for w in hashtags: \r\n           myfile.write(str(w.encode('ascii', 'ignore')))\r\n           myfile.write(\"\\n\")\r\n\r\n\r\n\r\n# count of word frequencies\r\nwordcounts = {}\r\nfor term in hashtags:\r\n    wordcounts[term] = wordcounts.get(term, 0) + 1\r\n\r\n\r\nitems = [(v, k) for k, v in wordcounts.items()]\r\nprint (len(items))\r\n\r\nxnum=[i for i in range(len(items))]\r\nfor count, word in sorted(items, reverse=True):\r\n    print(\"%5d %s\" % (count, word))\r\n   \r\n\r\n\r\nfor x in created_ats:\r\n  print (x)\r\n  print (x[4:10])\r\n  print (x[26:31])\r\n  print (x[4:7])\r\n\r\n\r\n\r\nplt.figure(1)\r\nplt.title(\"Frequency of Hashtags\")\r\n\r\nmyarray = np.array(sorted(items, reverse=True))\r\n\r\nplt.xticks(xnum, myarray[:,1],rotation='vertical')\r\nplt.plot (xnum, myarray[:,0])\r\nplt.show()\r\n\r\n\r\nmodel = TSNE(n_components=2, random_state=0)\r\nnp.set_printoptions(suppress=True)\r\nY=model.fit_transform(train_data_features)\r\nprint (Y)\r\n\r\n\r\nplt.figure(2)\r\nplt.scatter(Y[:, 0], Y[:, 1], c=clustering_result, s=290,alpha=.5)\r\n\r\nfor j in range(len(texts)):    \r\n   plt.annotate(clustering_result[j],xy=(Y[j][0], Y[j][1]),xytext=(0,0),textcoords='offset points')\r\n   print (\"%s %s\" % 
(clustering_result[j],  texts[j]))\r\n            \r\nplt.show()\r\n\r\nsilhouette_avg = silhouette_score(train_data_features, clustering_result)\r\nprint(\"For n_clusters =\", n_clusters, \"The average silhouette_score is :\", silhouette_avg)\r\n<\/code><\/pre>\n<p><strong>References<\/strong><\/p>\n<p>1. <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.manifold.TSNE.html\" target=\"_blank\">sklearn.manifold.TSNE<\/a><br \/>\n2. <a href=\"http:\/\/scikit-learn.org\/stable\/auto_examples\/cluster\/plot_kmeans_silhouette_analysis.html\" target=\"_blank\">plot_kmeans_silhouette_analysis<\/a><br \/>\n3. <a href=\"http:\/\/users.dsic.upv.es\/~prosso\/resources\/ErrecaldeEtAl_JCST10.pdf\" target=\"_blank\">A new AntTree-based algorithm for clustering short-text corpora<\/a>  Marcelo Luis Errecalde, Diego Alejandro Ingaramo, Paolo Rosso, JCS&#038;T Vol. 10 No. 1<br \/>\n4. <a href=\"http:\/\/www.ntu.edu.sg\/home\/axsun\/paper\/sun_pakdd13.pdf\" target=\"_blank\">Crest: Cluster-based Representation<br \/>\nEnrichment for Short Text Classification<\/a> Zichao Dai, Aixin Sun, Xu-Ying Liu<br \/>\n5. <a href=\"http:\/\/dmml.asu.edu\/users\/xufei\/Papers\/FCS-11167-JLT.pdf\" target=\"_blank\">Enriching short text representation in microblog for clustering<\/a> Jiliang TANG , Xufei WANG, Huiji GAO, Xia HU, Huan LIU, Front. Comput. Sci., 2012, 6(1)<br \/>\n6. <a href=\"http:\/\/www.hpl.hp.com\/techreports\/2008\/HPL-2008-41.pdf\" target=\"_blank\">Clustering Short Texts using Wikipedia<\/a> Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta,  HPL-2008-41 <\/p>\n","protected":false},"excerpt":{"rendered":"<p>In one of the previous post http:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/ python source code for mining Twitter data was implemented. Clustering was applied to put tweets in different groups using bag of words representation for the text. The results of clustering were obtained via numerical matrix. 
Now we will look at visualization of clustering results using python. Also we &#8230; <a title=\"Using Python for Data Visualization of Clustering Results\" class=\"read-more\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":[]},"categories":[5,2,6,9,10],"tags":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Python for Data Visualization of Clustering Results - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Python for Data Visualization of Clustering Results - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"In one of the previous post http:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/ python source code for mining Twitter data was implemented. Clustering was applied to put tweets in different groups using bag of words representation for the text. The results of clustering were obtained via numerical matrix. 
Now we will look at visualization of clustering results using python. Also we ... Read more\" \/>\n<meta property=\"og:url\" content=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta property=\"article:published_time\" content=\"2016-06-05T13:05:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-06-14T01:10:59+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1-300x220.png\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\",\"name\":\"Using Python for Data Visualization of Clustering Results - Machine Learning 
Applications\",\"isPartOf\":{\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2016-06-05T13:05:05+00:00\",\"dateModified\":\"2016-06-14T01:10:59+00:00\",\"author\":{\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\"},\"breadcrumb\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Python for Data Visualization of Clustering Results\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"https:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Using Python for Data Visualization of Clustering Results - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/","og_locale":"en_US","og_type":"article","og_title":"Using Python for Data Visualization of Clustering Results - Machine Learning Applications","og_description":"In one of the previous post http:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/ python source code for mining Twitter data was implemented. Clustering was applied to put tweets in different groups using bag of words representation for the text. The results of clustering were obtained via numerical matrix. Now we will look at visualization of clustering results using python. Also we ... 
Read more","og_url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/","og_site_name":"Machine Learning Applications","article_published_time":"2016-06-05T13:05:05+00:00","article_modified_time":"2016-06-14T01:10:59+00:00","og_image":[{"url":"http:\/\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/06\/data-visualization1-300x220.png"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/","url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/","name":"Using Python for Data Visualization of Clustering Results - Machine Learning 
Applications","isPartOf":{"@id":"https:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2016-06-05T13:05:05+00:00","dateModified":"2016-06-14T01:10:59+00:00","author":{"@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478"},"breadcrumb":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/06\/05\/using-python-for-mining-data-from-twitter-visualization-and-other-enchancements\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Using Python for Data Visualization of Clustering Results"}]},{"@type":"WebSite","@id":"https:\/\/intelligentonlinetools.com\/blog\/#website","url":"https:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7h1IJ-48","jetpack-related-posts":[{"id":227,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/05\/28\/using-python-for-mining-data-from-twitter\/","url_meta":{"origin":256,"position":0},"title":"Using Python for Mining Data From Twitter","date":"May 28, 2016","format":false,"excerpt":"Twitter is increasingly being used for business or personal purposes. With Twitter API there is also an opportunity to do data mining of data (tweets) and find interesting information. In this post we will take a look how to get data from Twitter, prepare data for analysis and then do\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"Frequency of Hashtags","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/05\/Frequency-of-Hashtags-300x171.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":521,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/26\/bio-inspired-optimization-for-text-mining-4\/","url_meta":{"origin":256,"position":1},"title":"Bio-Inspired Optimization for Text Mining-4","date":"August 26, 2016","format":false,"excerpt":"Clustering Text Data In previous post Bio-Inspired Optimization was applied for clustering of numerical data. In this post text data will be used for clustering. So python source code will be modified for clustering of text data. 
This data will be initialized in the beginning of this python script with\u2026","rel":"","context":"In &quot;Machine Learning&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":450,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/03\/bio-inspired-optimization-for-text-mining-2\/","url_meta":{"origin":256,"position":2},"title":"Bio-Inspired Optimization for Text Mining-2","date":"August 3, 2016","format":false,"excerpt":"Numerical One Dimensional Example In the previous code Bio-Inspired Optimization for Text Mining-1 Motivation we implemented source code for optimization some function using bio-inspired algorithm. Now we need to put actual function for clustering. In clustering we want to group our clusters in such way that the distance from each\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1289,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/07\/03\/algorithms-metrics-and-online-tool-for-clustering\/","url_meta":{"origin":256,"position":3},"title":"Algorithms, Metrics and Online Tool for Clustering","date":"July 3, 2017","format":false,"excerpt":"One of the key techniques of exploratory data mining is clustering \u2013 separating instances into distinct groups based on some measure of similarity. [1] In this post we will review how we can do clustering, evaluate and visualize results using online ML Sandbox tool from this website. 
This tool allows\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2017\/07\/kmeans-clustering-iris-300x286.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":426,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/07\/29\/bio-inspired-optimization-for-text-mining-1\/","url_meta":{"origin":256,"position":4},"title":"Bio-Inspired Optimization for Text Mining-1","date":"July 29, 2016","format":false,"excerpt":"Motivation Optimization problem studies maximizing or minimizing some function y=f(x) with some range of choices available for x. Biologically inspired (bio-inspired) algorithms for optimization problems are now widely used. A few examples of such optimization are: particle swarm optimization (PSO) that is based on the swarming behavior of fish and\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":498,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/13\/bio-inspired-optimization-for-text-mining-3\/","url_meta":{"origin":256,"position":5},"title":"Bio-Inspired Optimization for Text Mining-3","date":"August 13, 2016","format":false,"excerpt":"Clustering Numerical Multidimensional Data In this post we will implement Bio Inspired Optimization for clustering multidimensional data. We will use two dimensional data array \"data\" however the code can be used for any reasonable size of array. To do this parameter num_dimensions should be set to data array dimension. 
We\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/256"}],"collection":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=256"}],"version-history":[{"count":37,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/256\/revisions"}],"predecessor-version":[{"id":311,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/256\/revisions\/311"}],"wp:attachment":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/categories?post=256"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/tags?post=256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}