Data Mining Twitter Data with Python

Twitter is an online social networking service that enables users to send and read short 140-character messages called “tweets”. [1]
Twitter users tweet about many different topics, depending on their interests and goals.
A word, phrase or topic that is mentioned at a greater rate than others is said to be a “trending topic”. Trending topics become popular either through a concerted effort by users, or because of an event that prompts people to talk about a specific topic. [1]
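The “greater rate” criterion can be made concrete by comparing how often a term is mentioned in a recent time window with how often it was mentioned in an earlier window of the same length. The sketch below is only an illustration of that idea, not Twitter's actual trending algorithm. The function name `find_trending`, the `min_ratio` and `min_count` thresholds and the add-one smoothing are hypothetical choices. Both windows are assumed to cover the same length of time, so raw counts stand in for rates.

```python
from collections import Counter


def find_trending(recent_terms, baseline_terms, min_ratio=3.0, min_count=5):
    """Return (term, ratio) pairs for terms mentioned at least `min_ratio`
    times more often in the recent window than in the baseline window.

    Hypothetical illustration only; not Twitter's trending algorithm.
    Assumes both windows cover the same length of time.
    """
    recent = Counter(recent_terms)
    baseline = Counter(baseline_terms)
    trending = []
    for term, n in recent.items():
        if n < min_count:
            continue  # ignore rarely mentioned terms
        # add-one smoothing: terms unseen in the baseline do not divide by zero
        ratio = n / (baseline[term] + 1)
        if ratio >= min_ratio:
            trending.append((term, ratio))
    # largest increase in mention rate first
    return sorted(trending, key=lambda tr: tr[1], reverse=True)


baseline = ["python"] * 10 + ["ai"] * 2
recent = ["python"] * 12 + ["ai"] * 9 + ["rust"]
print(find_trending(recent, baseline))  # [('ai', 3.0)]
```

Here “python” is mentioned often but at a steady rate, so it is not flagged, while “ai” jumps from 2 to 9 mentions.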
There is wide interest in analyzing trending data from Twitter.
In this post we will look at searching for and downloading tweets related to a specific hashtag, using Python and the Twitter API. As an example, we will search for tweets related to “deep learning”. After downloading the tweets, we will also look at some simple manipulations of the data.

The code for downloading Twitter data is based on the work in [2].
Below is the source code:

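import json
from urllib.parse import unquote

import twitter  # the "twitter" package (Python Twitter Tools)

# Credentials of your Twitter application
CONSUMER_KEY = "xxxxxx"
CONSUMER_SECRET = "xxxxxx"
OAUTH_TOKEN = "xxxxxx"
OAUTH_TOKEN_SECRET = "xxxxxx"

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)

q = "#deeplearning"  # a hashtag cannot contain a space
count = 100          # number of tweets per page

# First page of search results
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results["statuses"]

# Fetch up to 5 more pages
for _ in range(5):
    print("Length of statuses", len(statuses))
    try:
        next_results = search_results["search_metadata"]["next_results"]
    except KeyError:  # no more results when next_results does not exist
        break
    # next_results looks like "?max_id=...&q=...&include_entities=1";
    # decode it and turn it into keyword arguments for the next request
    kwargs = dict(kv.split("=", 1) for kv in unquote(next_results[1:]).split("&"))
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results["statuses"]

# Show one sample search result
print(json.dumps(statuses[0], indent=10))

# Extract hashtags, URLs, texts and creation times from the statuses
hashtags = [hashtag["text"]
            for status in statuses
            for hashtag in status["entities"]["hashtags"]]
urls = [url["url"]
        for status in statuses
        for url in status["entities"]["urls"]]
texts = [status["text"] for status in statuses]
# created_at is the date and time when the tweet was created
created_ats = [status["created_at"] for status in statuses]

print(json.dumps(hashtags[0:50], indent=1))
print(json.dumps(urls[0:50], indent=1))
print(json.dumps(texts[0:50], indent=1))
print(json.dumps(created_ats[0:50], indent=1))

# Append the hashtags to a file, one per line
with open("data.txt", "a", encoding="utf-8") as myfile:
    for w in hashtags:
        myfile.write(w + "\n")

# Count hashtag frequencies
wordcounts = {}
for term in hashtags:
    wordcounts[term] = wordcounts.get(term, 0) + 1
items = [(v, k) for k, v in wordcounts.items()]
for count, word in sorted(items, reverse=True):
    print("%5d %s" % (count, word))

# In case we need to extract the date, month or year
for x in created_ats:
    print(x)
    print(x[4:10])   # month and day, e.g. "Mar 30"
    print(x[26:31])  # year
    print(x[4:7])    # month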
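The paging step in the loop above turns the `next_results` string from the search metadata into keyword arguments by splitting on `&` and `=`. The standard library's `urllib.parse.parse_qsl` does the same split and also decodes percent-escapes, so a value such as `%23deeplearning` comes back as `#deeplearning` and, assuming the client library encodes arguments itself, is not encoded a second time in the next request. This is a minimal sketch; the helper name `next_page_kwargs` and the sample metadata are made up for illustration.

```python
from urllib.parse import parse_qsl


def next_page_kwargs(search_metadata):
    """Hypothetical helper: build keyword arguments for the next search request.

    Returns None when the metadata has no "next_results" entry,
    i.e. when there are no more pages.
    """
    next_results = search_metadata.get("next_results")
    if next_results is None:
        return None
    # drop the leading "?" and decode the percent-encoded key/value pairs
    return dict(parse_qsl(next_results.lstrip("?")))


# Made-up metadata in the shape of a search response
meta = {"next_results": "?max_id=12345&q=%23deeplearning&count=100&include_entities=1"}
print(next_page_kwargs(meta))
# {'max_id': '12345', 'q': '#deeplearning', 'count': '100', 'include_entities': '1'}
```

Returning `None` instead of raising `KeyError` lets the paging loop stop with a simple `if kwargs is None: break`.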
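The hashtag frequency count at the end of the code can also be written with `collections.Counter` from the standard library, whose `most_common` method returns (term, count) pairs sorted by count. The sketch runs on a small invented list of statuses that only mimics the nesting used above (`entities`, then `hashtags`, then `text`); the function name `hashtag_counts` is hypothetical.

```python
from collections import Counter


def hashtag_counts(statuses):
    """Hypothetical helper: count hashtag occurrences across status dicts."""
    return Counter(
        hashtag["text"]
        for status in statuses
        for hashtag in status["entities"]["hashtags"]
    )


# Invented sample data with the same nesting as the search results
sample_statuses = [
    {"entities": {"hashtags": [{"text": "deeplearning"}, {"text": "AI"}]}},
    {"entities": {"hashtags": [{"text": "deeplearning"}]}},
    {"entities": {"hashtags": []}},
]

for word, count in hashtag_counts(sample_statuses).most_common():
    print("%5d %s" % (count, word))
```

One difference from the sorted-tuples loop: when two hashtags have the same count, `most_common` keeps the order in which they were first seen instead of ordering them by reverse alphabetical order.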

Example output of the last for loop (one iteration):
Wed Mar 30 02:10:20 +0000 2016
Mar 30
2016
Mar
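Slicing by fixed positions works because `created_at` always has the same layout. Parsing the string into a `datetime` object with `datetime.strptime` is an alternative that exposes the month, year and UTC offset as fields. This is a minimal sketch, assuming the format shown in the output above and English day and month abbreviations; the helper name `parse_created_at` is hypothetical.

```python
from datetime import datetime


def parse_created_at(created_at):
    """Hypothetical helper: parse a string like "Wed Mar 30 02:10:20 +0000 2016".

    %a and %b expect English day and month abbreviations (default C locale).
    """
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")


dt = parse_created_at("Wed Mar 30 02:10:20 +0000 2016")
print(dt.strftime("%b %d"))  # Mar 30
print(dt.year)               # 2016
print(dt.month)              # 3
```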

Any comments or suggestions are welcome.

[1] Twitter, from Wikipedia.
[2] TWITTER, by Abhishanga Upadhyay, Luis Mao and Malavika Goda Krishna.
