Data Mining Twitter Data with Python

Twitter is an online social networking service that enables users to send and read short 140-character messages called “tweets”. [1]
Users tweet about many different topics, depending on their interests and goals.
A word, phrase or topic that is mentioned at a greater rate than others is said to be a “trending topic”. Trending topics become popular either through a concerted effort by users, or because of an event that prompts people to talk about a specific topic. [1]
There is wide interest in analyzing trending data from Twitter.
In this post we will look at searching for and downloading tweets related to a specific hashtag, using Python and the Twitter API. As an example, we will search for tweets related to “deep learning”. After downloading the tweets, we will also do some simple manipulations of the data.

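The code for downloading Twitter data is based on the work in [2].
Below is the source code. It uses the twitter package (Python Twitter Tools, installable with pip install twitter); replace the xxxxxx placeholders with your own application's keys and access tokens: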

import json

import twitter  # Python Twitter Tools package

# Replace the placeholders with your own application's credentials
CONSUMER_KEY = "xxxxxx"
CONSUMER_SECRET = "xxxxxx"
OAUTH_TOKEN = "xxxxxx"
OAUTH_TOKEN_SECRET = "xxxxxx"

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)

q = '#deep learning'
count = 100

# First page of search results
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# Fetch up to 5 more pages using the cursor in search_metadata
for _ in range(5):
    print("Length of statuses", len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # no more results
        break
    # next_results looks like "?max_id=...&q=...&count=..."; build keyword arguments from it
    kwargs = dict(kv.split('=', 1) for kv in next_results[1:].split('&'))
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

# Show one sample search result (the first tweet in the list)
print(json.dumps(statuses[0], indent=1))

# Pull individual fields out of the tweets
hashtags = [hashtag['text']
            for status in statuses
            for hashtag in status['entities']['hashtags']]
urls = [url['url']
        for status in statuses
        for url in status['entities']['urls']]
texts = [status['text'] for status in statuses]
# created_at is the date and time the tweet was created
created_ats = [status['created_at'] for status in statuses]

print(json.dumps(hashtags[0:50], indent=1))
print(json.dumps(urls[0:50], indent=1))
print(json.dumps(texts[0:50], indent=1))
print(json.dumps(created_ats[0:50], indent=1))

# Append the hashtags to a file, one per line
with open("data.txt", "a", encoding="utf-8") as myfile:
    for w in hashtags:
        myfile.write(w)
        myfile.write("\n")

# Count hashtag frequencies and print them, most frequent first
wordcounts = {}
for term in hashtags:
    wordcounts[term] = wordcounts.get(term, 0) + 1
items = [(v, k) for k, v in wordcounts.items()]
for freq, word in sorted(items, reverse=True):
    print("%5d %s" % (freq, word))

# In case we need to extract the day, month or year
for x in created_ats:
    print(x)
    print(x[4:10])   # month and day, e.g. "Mar 30"
    print(x[26:30])  # year, e.g. "2016"
    print(x[4:7])    # month, e.g. "Mar"

Example output of the last for loop (one iteration):
Wed Mar 30 02:10:20 +0000 2016
Mar 30
2016
Mar
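Slicing works because created_at has a fixed layout, but parsing it into a datetime object makes the parts easier to use, for example when sorting or grouping tweets by day. Below is a minimal sketch with datetime.strptime; the format string is inferred from the sample value above, and the %a and %b directives assume an English (C) locale. The helper name parse_created_at is hypothetical.

```python
from datetime import datetime, timezone

# Layout of created_at, inferred from a sample like "Wed Mar 30 02:10:20 +0000 2016"
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def parse_created_at(value):
    """Parse a tweet's created_at string into a timezone-aware datetime (hypothetical helper)."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


dt = parse_created_at('Wed Mar 30 02:10:20 +0000 2016')
print(dt.strftime('%b %d'))       # Mar 30
print(dt.year)                    # 2016
print(dt.strftime('%b'))          # Mar
print(dt.tzinfo == timezone.utc)  # True
```

Grouping tweets by day then comes down to using dt.date() as a dictionary key.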
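The pagination loop in the script turns the next_results string into keyword arguments by splitting on & and = by hand. Below is a sketch of the same step using the standard library's urllib.parse.parse_qsl, which also percent-decodes each value (for example %23 back to #). The helper name next_page_kwargs and the sample string are hypothetical; the sample is only shaped like the next_results value the loop reads.

```python
from urllib.parse import parse_qsl


def next_page_kwargs(next_results):
    """Turn a string such as "?max_id=1&q=%23deep" into a dict of keyword arguments.

    Hypothetical helper: parse_qsl splits on '&' and '=' and percent-decodes values.
    """
    # Drop the leading "?" before parsing the query string
    return dict(parse_qsl(next_results.lstrip('?')))


# Hypothetical sample value, shaped like search_metadata['next_results']
sample = '?max_id=714961393&q=%23deep%20learning&count=100&include_entities=1'
kwargs = next_page_kwargs(sample)
print(kwargs)
# {'max_id': '714961393', 'q': '#deep learning', 'count': '100', 'include_entities': '1'}
```

Passing decoded values is the safer choice if the library URL-encodes keyword arguments before sending the request; that behavior is an assumption worth checking against the twitter package documentation.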
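The hashtag counting in the script can also be written with collections.Counter from the standard library, which counts and sorts in one step. The sketch below runs offline on hand-made status dictionaries that mimic only the fields the script reads (entities, then hashtags, then text). The sample data, the function name hashtag_counts, and the option to lowercase tags (so that #DeepLearning and #deeplearning count together) are assumptions, not part of the original code.

```python
from collections import Counter


def hashtag_counts(statuses, lowercase=True):
    """Count hashtags across a list of tweet dicts (hypothetical helper)."""
    tags = (hashtag['text']
            for status in statuses
            for hashtag in status['entities']['hashtags'])
    # Optional normalization (an assumption): merge case variants of the same tag
    if lowercase:
        tags = (tag.lower() for tag in tags)
    return Counter(tags)


# Hypothetical sample statuses containing only the fields used above
statuses = [
    {'entities': {'hashtags': [{'text': 'DeepLearning'}, {'text': 'AI'}]}},
    {'entities': {'hashtags': [{'text': 'deeplearning'}]}},
    {'entities': {'hashtags': []}},
]

counts = hashtag_counts(statuses)
# most_common() returns (tag, count) pairs sorted by count, highest first
for word, n in counts.most_common(10):
    print("%5d %s" % (n, word))
```

One difference from the script: for tags with equal counts, sorted(items, reverse=True) breaks ties in reverse alphabetical order, while most_common keeps the order in which the tags were first seen.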
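The script appends only the hashtags to data.txt. If later steps need other fields as well, one option (an assumption, not part of the referenced work) is to store each full status as one JSON object per line, so the data can be reloaded and analyzed again without calling the API. The helper names save_statuses and load_statuses and the file name tweets.jsonl are hypothetical.

```python
import json


def save_statuses(statuses, path):
    """Append each status dict to path as one JSON object per line (hypothetical helper)."""
    with open(path, 'a', encoding='utf-8') as f:
        for status in statuses:
            f.write(json.dumps(status, ensure_ascii=False) + '\n')


def load_statuses(path):
    """Read back the statuses written by save_statuses, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage with a hypothetical minimal status; in the script this would be statuses
save_statuses([{'text': 'Learning about #DeepLearning', 'id': 1}], 'tweets.jsonl')
print(load_statuses('tweets.jsonl')[-1]['text'])
```

Because the file is opened in append mode, repeated runs accumulate tweets, so duplicates may need to be removed by id when reloading.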

Any comments or suggestions are welcome.

References
[1] Twitter. Wikipedia. https://en.wikipedia.org/wiki/Twitter
[2] Abhishanga Upadhyay, Luis Mao, Malavika Goda Krishna. Mining Data from Twitter. http://www-scf.usc.edu/~aupadhya/Mining.pdf
