Online Resources for Neural Networks with Python

The neural network field is now enjoying a resurgence of interest. New training techniques have made training deep networks feasible. With deeper networks, more training data, and powerful new hardware to make it all work, deep neural networks (or “deep learning” systems) suddenly began making rapid progress in areas such as speech recognition, image classification and language translation. [1]

As a result, there are many posts and websites across the web with source code and tutorials for neural networks of different types and complexity. Starting from a simple feedforward network with just one hidden layer, the authors of these blog posts and tutorials help us understand how to build a neural net, deep or shallow.

To help find Python source code for a neural network with the desired features, the website Neural Networks with Python on the Web was created.

Please feel free to add comments, suggestions, or links to neural network web pages (Python source code) via the comments box on this page.

References
1. Why artificial intelligence is enjoying a renaissance



Thinking Patterns and Computer Programs

This post is a continuation of the previous post [1], where we started to look at how computer programs can increase effective thinking. In this post we will look at some patterns of human thinking and how these patterns are implemented in computer programs.

Humans often follow others in their actions. When we think about something, we are often interested in how others are thinking about or handling the same or a similar subject. In computer science we can find different implementations of this approach. For example, recommender systems typically produce a list of recommendations in one of two ways: through collaborative and content-based filtering, or through the personality-based approach. Collaborative filtering builds a model from a user’s past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. [2] We can see something like this, for example, on the Amazon website.
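To make the mapping concrete, below is a minimal sketch of user-based collaborative filtering; the toy ratings matrix and the similarity-weighted prediction are illustrative assumptions, not a production recommender.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# toy data: rows = users, columns = items, values = ratings (0 = not rated)
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [1, 0, 5, 4],
                    [0, 1, 4, 5]])

# how similar each user is to every other user
user_similarity = cosine_similarity(ratings)

# predict scores for user 0 as a similarity-weighted sum of all users' ratings
weights = user_similarity[0]
predicted = weights.dot(ratings) / weights.sum()
print(predicted)  # higher score = stronger candidate for recommendation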

Thinking about a situation from many different views is another useful technique. For example, it can be useful to think not just about what happens now but also about what will happen next year or later. Or it might be useful to think about different groups of users. In a computer program we would need to add additional categories (attributes) to accomplish this.

According to Wikipedia, there are two types of thinking: convergent and divergent. Convergent thinking involves aiming for a single, correct solution to a problem, whereas divergent thinking involves the creative generation of multiple answers to a set problem. Divergent thinking is sometimes used as a synonym for creativity in the psychology literature. Other researchers have occasionally used the terms flexible thinking or fluid intelligence. [3]

As humans we might use systems thinking, which can be viewed as a set of habits or practices within a framework based on the belief that the component parts of a system can best be understood in the context of their relationships with each other and with other systems, rather than in isolation. [4], [9], [10] Systems concepts are used widely in computer science, for example when we represent a system as a black box, or when we use feedback control or finite state machines.
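As a small illustration of the last point, here is a toy finite state machine in Python; the states and events (a turnstile reacting to "coin" and "push") are just an assumed example.

# transition table: (current state, event) -> next state
transitions = {
    ('locked', 'coin'): 'unlocked',
    ('locked', 'push'): 'locked',
    ('unlocked', 'push'): 'locked',
    ('unlocked', 'coin'): 'unlocked',
}

state = 'locked'
for event in ['push', 'coin', 'push']:
    state = transitions[(state, event)]
    print(event, '->', state)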

Another thinking technique is structured thinking, which is the process of putting a framework around an unstructured problem. Having a structure not only helps an analyst understand the problem at a macro level, it also helps by identifying areas that require deeper understanding. Structured thinking allows us to map our ideas in a structured fashion, thereby enabling us to identify which areas need the most attention. Mind mapping tools can help to implement structured thinking. [5] In computer science we can use a decision tree to build structure from the data.
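For example, a short decision tree sketch with scikit-learn might look like the following; the features (hours studied, hours slept) and labels are made-up toy data.

from sklearn import tree

# toy data: features are [hours_studied, hours_slept], label 1 = passed, 0 = failed
X = [[8, 7], [1, 4], [6, 8], [2, 3], [7, 6], [0, 5]]
y = [1, 0, 1, 0, 1, 0]

clf = tree.DecisionTreeClassifier().fit(X, y)
print(clf.predict([[5, 7]]))  # predicted class for a new item
# print the learned structure as text
print(tree.export_text(clf, feature_names=['hours_studied', 'hours_slept']))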

Dividing a problem into smaller problems is also a useful technique, known as the divide-and-conquer paradigm. It gives a useful framework for thinking about problems. In mathematics and computer science it is used for solving problems recursively. [6]
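A classic illustration is merge sort, which divides the input, solves each half recursively, and combines the results.

def merge_sort(items):
    # base case: a list of 0 or 1 elements is already sorted
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # divide: solve each half recursively
    right = merge_sort(items[mid:])
    merged = []                       # combine: merge the two sorted halves
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]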

Comparative analysis is the item-by-item comparison of two or more comparable alternatives, processes, products, qualifications, sets of data, systems, or the like. In accounting, for example, changes in a financial statement’s items over several accounting periods may be presented together to detect emerging trends in the company’s operations and results. [7] In troubleshooting we often compare a working device with a non-working device to identify the difference, in the hope that it will help us understand why the device failed.

We also organize similar ideas or items into logical groupings. [8] By looking at differences or similarities between groups we can find new knowledge about items or groups of items. It also helps us generalize our ideas or knowledge. In computer programming we can use clustering to group different items, and if we want to assign a new item to the correct group we can use classification.
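A minimal sketch of this pairing with scikit-learn is shown below; the 2-D points and the number of clusters are illustrative assumptions.

from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# toy items described by two attributes
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]

# clustering: discover the groups
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # which group each existing item fell into

# classification: assign a new item to one of the discovered groups
clf = KNeighborsClassifier(n_neighbors=3).fit(points, kmeans.labels_)
print(clf.predict([[7.5, 8.3]]))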

Thus we have looked at different thinking patterns used by humans. Obviously computer programs have some limitations if we want to implement the above patterns. However, we saw that some patterns are already implemented and used in a wide range of programming applications.

References
1. How Can We Use Computer Programming to Increase Effective Thinking
2. Recommender system
3. Creativity
4. Systems Thinking
5. 12 Free Mind Mapping Tools For a Data Scientist To Enhance Structured Thinking
6. Divide-and-Conquer Algorithms
7. Comparative-analysis
8. Affinity Diagram
9. General Systems Concepts
10. An Introduction to General Systems Thinking



How Can We Use Computer Programming to Increase Effective Thinking

Once in a while we might find ourselves in a situation where we think “I wish I knew this before”, “Why did I not think about this before?” or “Why did it take so long to come to this decision or action?”. Can computer programs be used to help us avoid or minimize situations like this? Having a background in computer science, I decided to look at human thinking patterns and compare them with learning computer algorithms.

The situations mentioned above, as well as all our actions, are the result of our learning and thinking. Effective thinking and learning drive good decisions and actions.

As mentioned on Wikipedia [1] – “Learning is the act of acquiring new, or modifying and reinforcing, existing knowledge, behaviors, skills, values, or preferences and may involve synthesizing different types of information.”

Learning is very closely connected to thinking. New information can often lead to new thoughts or ideas, and during the thinking process we often find we need to learn something new, to extend our knowledge.

"Thinking is a process of response to external stimuli, and if thinking is effective it results in changes to or strengthening of world views, beliefs, opinions, attitudes, behaviours, skills, understanding, and knowledge. Thinking and learning have the same outcomes, so have to be very closely related." [2]

Current computer algorithms can be very intelligent thanks to the latest advances in computer science. Computer programs can learn information and use it to make intelligent decisions. There are a number of fields in computing associated with learning: for example, machine learning, deep learning, and reinforcement learning successfully provide algorithms for learning information in many different applications.

After learning, computers make decisions based on the learned information and the programming instructions created by programmers.
Computers cannot think (at least not right now). Human beings can think, and they are very flexible in the process of making decisions. For example, they can get new ideas or apply knowledge from a totally different domain.

While computers cannot think, computer programs can be very flexible: nothing stops us from combining several algorithms to cover all or most possibilities, nothing stops us from producing a more intelligent program.
A simple example: a program can sort apples from pears based on color, or it can use both color and shape. In the second case it will be more intelligent and more accurate. If needed, we could add even more attributes, such as weight or smell.
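A minimal sketch of that example with scikit-learn could look like this; all feature values (color and shape encoded as numbers) are made up for illustration.

from sklearn.neighbors import KNeighborsClassifier

# color: 0 = red ... 1 = green; shape: 0 = round ... 1 = elongated
color_only = [[0.1], [0.2], [0.8], [0.9]]
color_shape = [[0.1, 0.1], [0.2, 0.2], [0.8, 0.9], [0.9, 0.8]]
labels = ['apple', 'apple', 'pear', 'pear']

clf_color = KNeighborsClassifier(n_neighbors=1).fit(color_only, labels)
clf_both = KNeighborsClassifier(n_neighbors=1).fit(color_shape, labels)

# a greenish but round fruit: color alone suggests pear,
# while adding shape pulls the decision back towards apple
print(clf_color.predict([[0.7]]))
print(clf_both.predict([[0.7, 0.1]]))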

Humans have the ability to think and foresee some future situations, but they do not always use this ability. Often people act by following the same patterns, following other people, or just picking the easy or random option. That can work well, but not always. Here computers can help humans, as machines can access and process a lot of information, calculate different alternatives, and choose an optimal solution.

Computer programs use algorithms. Scientists create an algorithm, and then it is coded into a program. Can an algorithm be created for increasing effective thinking? Different people use different ways of thinking, even for the same problem. However, even for different problems we can see common thinking patterns, such as moving from simple to more complex, dividing something complex into smaller pieces, or using similarity. Some patterns are used often, some are not. Can we program those patterns? In the next post or posts we will take a look at learning and thinking patterns in the context of how they are programmed for computers.

References
1. Wikipedia – Learning
2. The Relationship Between Thinking and Learning



Web Scraping with BeautifulSoup with Python 3

Keeping up to date on your industry is very important, as it will help you make better decisions, spot threats and opportunities early on, and identify the changes that you need to think about. [1] There are many ways to stay informed, and automatically getting data from the web is one of them. In this post we will take a look at how to get useful information from the web using a web scraping Python script with BeautifulSoup.

I decided to use BeautifulSoup and found that I needed to modify a code example from the Internet because I have Python 3. So the code shown here is updated for Python 3. I also set the task of finding word collocations in the extracted text. Word collocations can be very useful, as they indicate new trends or the topics of web pages.

Below are the Python source code and references. In this example a Wikipedia web page is used for web scraping.

The first step in this code is to use BeautifulSoup to get the page text, page title, and links. The links can be used if we want to extract text from the pages they point to. We extract only the links that are inside the div mw-category-generated.

After we get the text from the web, we use the nltk and sklearn libraries to do text analysis of the extracted content. Using the sklearn library, we get n-grams in the range 1 to 5 using CountVectorizer. Range 1 means that we are looking at unigrams (only one word), range 2 means we are looking at bigrams (two words).

We also find word collocations in this script. Collocations are essentially just frequent bigrams, except that we want to pay more attention to the cases that involve rare words. In particular, we want to find bigrams that occur more often than we would expect based on the frequency of the individual words. [2]


import urllib.request
from bs4 import BeautifulSoup

from sklearn.feature_extraction.text import CountVectorizer 
import nltk
from nltk.collocations import *


wiki = "https://en.wikipedia.org/wiki/Category:Artificial_intelligence"

response = urllib.request.urlopen(wiki)
the_page = response.read()
response.close()



soup = BeautifulSoup(the_page, "html.parser")

print (soup.prettify())

print (soup.title.string)

# extract the links that are inside div class="mw-category-generated"
for div in soup.find_all('div', {'class': 'mw-category-generated'}):
    for a in div.find_all("a"):
        print (a)
        print (a.attrs['href'])
print(soup.get_text())

text = soup.get_text()

# Get all the n-grams in the range 1 to 5.
vectorizer = CountVectorizer(ngram_range=(1,5))
analyzer = vectorizer.build_analyzer()
print (analyzer(text))

bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

# tokenize the text and find bigram collocations
tokens = nltk.wordpunct_tokenize(text)
finder = BigramCollocationFinder.from_words(tokens)
# keep only bigrams that appear at least 2 times
finder.apply_freq_filter(2)
scored = finder.score_ngrams(bigram_measures.raw_freq)
print(sorted(bigram for bigram, score in scored))
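The scoring above uses raw frequency. Since reference [2] describes collocations as bigrams that occur more often than we would expect from the individual word frequencies, a pointwise mutual information (PMI) score can reflect that idea more directly; a minimal follow-up using the same finder and measures (nltk's built-in PMI measure) could be:

print(finder.nbest(bigram_measures.pmi, 20))  # top 20 bigrams ranked by PMI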

The provided script shows how to do web scraping with BeautifulSoup in Python 3 and how to apply text analytics to the extracted data. This is, however, just a starting point. Feel free to provide feedback, comments, or requests for updates.

References

1. Keeping Up-To-Date on Your Industry – Staying Informed
2. Language Processing and Python
3. Collocations



Using Python for Data Visualization of Clustering Results

In one of the previous posts, http://intelligentonlinetools.com/blog/2016/05/28/using-python-for-mining-data-from-twitter/, Python source code for mining Twitter data was implemented. Clustering was applied to put tweets into different groups, using a bag-of-words representation for the text. The results of clustering were obtained as a numerical matrix. Now we will look at visualization of the clustering results using Python. We will also do some additional data cleaning before clustering.

Data preprocessing
The following actions are added before clustering:

  • Retweets always start with text in the form “RT @name: ”. Code is added to remove this prefix.
  • Special characters like #, ! are removed.
  • URL links are removed.
  • All numbers are also removed.
  • Duplicate tweets (retweets) are removed; we keep only one copy of each tweet.

Below is the code for the above preprocessing steps. See the full source code at the end of this post for the helper functions right and remove_duplicates.


# remove the "rt @name: " prefix from retweets
for counter, t in enumerate(texts):
    if t.startswith("rt @"):
        pos = t.find(": ")
        texts[counter] = right(t, len(t) - (pos + 2))

for counter, t in enumerate(texts):
    # remove special characters, then URLs, then digits and colons
    texts[counter] = re.sub(r'[?|$|.|!|#|\-|"|\n|,|@|(|)]', r'', texts[counter])
    texts[counter] = re.sub(r'https?:\/\/.*[\r\n]*', '', texts[counter], flags=re.MULTILINE)
    texts[counter] = re.sub(r'[0|1|2|3|4|5|6|7|8|9|:]', r'', texts[counter])
    # normalize the hashtag form "deeplearning" to "deep learning"
    texts[counter] = re.sub(r'deeplearning', r'deep learning', texts[counter])

# keep only one copy of duplicate tweets
texts = remove_duplicates(texts)

Plotting
The vector-space model chosen for representing word meanings in this example is a problem in a multidimensional space: the number of different words is high even for a small set of data. There is, however, a tool, t-SNE, for visualizing high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. t-SNE has a cost function that is not convex, i.e. with different initializations we can get different results. [1] Below is the Python source code for building the plot to visualize the clustering results.


# train_data_features and clustering_result come from the clustering step
# in the full script shown at the end of this post
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# project the bag-of-words matrix to 2 dimensions with t-SNE
model = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
Y = model.fit_transform(train_data_features)

# color each point by its cluster label
plt.scatter(Y[:, 0], Y[:, 1], c=clustering_result, s=290, alpha=.5)
plt.show()

The resulting visualization is shown below.

Data Visualization for Clustering Results

Analysis
In addition to the visualization, the silhouette_score was computed; the obtained value was around 0.2.


silhouette_avg = silhouette_score(train_data_features, clustering_result)

The silhouette_score gives the average value for all the samples. This gives a perspective on the density and separation of the formed clusters.
Silhouette coefficients (as these values are referred to) near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster. [2]

Thus, in this post a Python script for visualization of clustering results was provided. The clustering was applied to the results of a Twitter search for a specific phrase.

It should be noted that clustering tweet data is challenging, as a tweet can be only 140 characters or less. Such problems are related to short text clustering, and there are additional techniques that can be applied to get better results. [3]-[6]
Below is the full script code.


import twitter
import json

import matplotlib.pyplot as plt
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import Birch
from sklearn.manifold import TSNE

import re

from sklearn.metrics import silhouette_score

# below function is from
# http://www.dotnetperls.com/duplicates-python
def remove_duplicates(values):
    output = []
    seen = set()
    for value in values:
        # If value has not been encountered yet,
        # ... add it to both list and set.
        if value not in seen:
            output.append(value)
            seen.add(value)
    return output

# below 2 functions are from
# http://stackoverflow.com/questions/22586286/
#         python-is-there-an-equivalent-of-mid-right-and-left-from-basic
def left(s, amount = 1, substring = ""):
    if (substring == ""):
        return s[:amount]
    else:
        if (len(substring) > amount):
            substring = substring[:amount]
        return substring + s[:-amount]

def right(s, amount = 1, substring = ""):
    if (substring == ""):
        return s[-amount:]
    else:
        if (len(substring) > amount):
            substring = substring[:amount]
        return s[:-amount] + substring


CONSUMER_KEY ="xxxxxxx"
CONSUMER_SECRET ="xxxxxxx"
OAUTH_TOKEN = "xxxxxx"
OAUTH_TOKEN_SECRET = "xxxxxx"


auth = twitter.oauth.OAuth (OAUTH_TOKEN, OAUTH_TOKEN_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

twitter_api= twitter.Twitter(auth=auth)
q='#deep learning'
count=100

# Do search for tweets containing '#deep learning'
search_results = twitter_api.search.tweets (q=q, count=count)

statuses=search_results['statuses']

# Iterate through 5 more batches of results by following the cursor
for _ in range(5):
    print ("Length of statuses", len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:
        break
    # Create a dictionary from next_results
    kwargs = dict([kv.split('=') for kv in next_results[1:].split("&")])

    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

# Show one sample search result by slicing the list
print (json.dumps(statuses[0], indent=10))



# Extracting data such as hashtags, urls, texts and created at date
hashtags = [ hashtag['text'].lower()
    for status in statuses
       for hashtag in status['entities']['hashtags'] ]


urls = [ urls['url']
    for status in statuses
       for urls in status['entities']['urls'] ]


texts = [ status['text'].lower()
    for status in statuses
        ]

created_ats = [ status['created_at']
    for status in statuses
        ]

# Preparing data for trending in the format: date word
i=0
print ("===============================\n")
for x in created_ats:
     for w in texts[i].split(" "):
        if len(w)>=2:
              print (x[4:10], x[26:31] ," ", w)
     i=i+1

# Prepare tweets data for clustering
# Converting text data into bag of words model

vectorizer = CountVectorizer(analyzer = "word", \
                             tokenizer = None,  \
                             preprocessor = None,  \
                             stop_words='english', \
                             max_features = 5000) 



for counter, t in enumerate(texts):
    if t.startswith("rt @"):
          pos= t.find(": ")
          texts[counter] = right(t, len(t) - (pos+2))
          
for counter, t in enumerate(texts):
    texts[counter] = re.sub(r'[?|$|.|!|#|\-|"|\n|,|@|(|)]',r'',texts[counter])
    texts[counter] = re.sub(r'https?:\/\/.*[\r\n]*', '', texts[counter], flags=re.MULTILINE)
    texts[counter] = re.sub(r'[0|1|2|3|4|5|6|7|8|9|:]',r'',texts[counter]) 
    texts[counter] = re.sub(r'deeplearning',r'deep learning',texts[counter])      
        
texts= remove_duplicates(texts)  

train_data_features = vectorizer.fit_transform(texts)
train_data_features = train_data_features.toarray()

print (train_data_features.shape)
print (train_data_features)

vocab = vectorizer.get_feature_names()  # in newer scikit-learn versions: get_feature_names_out()
print (vocab)

dist = np.sum(train_data_features, axis=0)

# For each, print the vocabulary word and the number of times it 
# appears in the training set
for tag, count in zip(vocab, dist):
    print (count, tag)


# Clustering data
n_clusters=7
brc = Birch(branching_factor=50, n_clusters=n_clusters, threshold=0.5,  compute_labels=True)
brc.fit(train_data_features)

clustering_result=brc.predict(train_data_features)
print ("\nClustering_result:\n")
print (clustering_result)

# Outputting some data
print (json.dumps(hashtags[0:50], indent=1))
print (json.dumps(urls[0:50], indent=1))
print (json.dumps(texts[0:50], indent=1))
print (json.dumps(created_ats[0:50], indent=1))


with open("data.txt", "a") as myfile:
     for w in hashtags: 
           myfile.write(str(w.encode('ascii', 'ignore')))
           myfile.write("\n")



# count of word frequencies
wordcounts = {}
for term in hashtags:
    wordcounts[term] = wordcounts.get(term, 0) + 1


items = [(v, k) for k, v in wordcounts.items()]
print (len(items))

xnum=[i for i in range(len(items))]
for count, word in sorted(items, reverse=True):
    print("%5d %s" % (count, word))
   


for x in created_ats:
  print (x)
  print (x[4:10])
  print (x[26:31])
  print (x[4:7])



plt.figure(1)
plt.title("Frequency of Hashtags")

myarray = np.array(sorted(items, reverse=True))

plt.xticks(xnum, myarray[:,1],rotation='vertical')
plt.plot (xnum, myarray[:,0])
plt.show()


model = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
Y=model.fit_transform(train_data_features)
print (Y)


plt.figure(2)
plt.scatter(Y[:, 0], Y[:, 1], c=clustering_result, s=290,alpha=.5)

for j in range(len(texts)):    
   plt.annotate(clustering_result[j],xy=(Y[j][0], Y[j][1]),xytext=(0,0),textcoords='offset points')
   print ("%s %s" % (clustering_result[j],  texts[j]))
            
plt.show()

silhouette_avg = silhouette_score(train_data_features, clustering_result)
print("For n_clusters =", n_clusters, "The average silhouette_score is :", silhouette_avg)

References

1. sklearn.manifold.TSNE
2. plot_kmeans_silhouette_analysis
3. A new AntTree-based algorithm for clustering short-text corpora, Marcelo Luis Errecalde, Diego Alejandro Ingaramo, Paolo Rosso, JCS&T Vol. 10 No. 1
4. Crest: Cluster-based Representation Enrichment for Short Text Classification, Zichao Dai, Aixin Sun, Xu-Ying Liu
5. Enriching short text representation in microblog for clustering, Jiliang TANG, Xufei WANG, Huiji GAO, Xia HU, Huan LIU, Front. Comput. Sci., 2012, 6(1)
6. Clustering Short Texts using Wikipedia, Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta, HPL-2008-41