API Archives - Machine Learning Applications

Recently I visited great website Pixabay [1] that offers a wide range of images from people all around the world. These images are free to use even for commercial use. And there is an API [2] for accessing images on Pixabay. This brings a lot of ideas for interesting web applications with using of machine learning technique.

For example, what if we want find and download 10 images that very closely match to current image or theme. Or maybe there is a need to automatically scan new images that match some theme. As the first step in this direction, in this post we will look how to download images from Pixabay, save and do some analysis of images like calculating similarity between images.

As usually with most of APIs, the first step is sign up and get API key. This is absolutely free on Pixabay.

We will use python library python_pixabay to get links to images from Pixabay site.
To download images to local folder the python library urllib.request is used in the script.

Once the images are saved on local folder, we can calculate similarity between any chosen two images.
The python code for similarity functions is taken from [4]. In this post image similarity histogram via PIL (python image library) and image similarity vectors via numpy are calculated.

An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image a viewer will be able to judge the entire tonal distribution at a glance.[5]

In the end of script, the image similarity via vectors between images in pair is calculated. They all are in the range between 0 and 1. The script is downloading only 8 images from Pixabay and is using default image search function.

Thus we learned how to find and download images from Pixabay website. Also few techniques for calculating image similarities were tested.

Here is the source code of the script.


# -*- coding: utf-8 -*-

import python_pixabay
import urllib

from PIL import Image
apikey="xxxxxxxxxxx"
pix = python_pixabay.Pixabay(apikey)

# default image search
img_search = pix.image_search()

# view the content of the searches
print(img_search)


hits=img_search.get("hits")

images=[]
for hit in hits:
    userImageURL = hit["userImageURL"]
    print (userImageURL)
    images.append (userImageURL)


print (images)    


from functools import reduce

local_filenames=[]
import urllib.request
image_directory = 'C:\\Users\\Owner\\Desktop\\A\\Python_2016_A\\images'
for i in range(8):
                
   
    local_filename, headers = urllib.request.urlretrieve(images[i])
    print (local_filename)
    local_filenames.append (local_filename)
    
          
   


def image_similarity_histogram_via_pil(filepath1, filepath2):
    from PIL import Image
    import math
    import operator
    
    image1 = Image.open(filepath1)
    image2 = Image.open(filepath2)
 
    image1 = get_thumbnail(image1)
    image2 = get_thumbnail(image2)
    
    h1 = image1.histogram()
    h2 = image2.histogram()
 
    rms = math.sqrt(reduce(operator.add,  list(map(lambda a,b: (a-b)**2, h1, h2)))/len(h1) )
    print (rms)
    return rms
 
def image_similarity_vectors_via_numpy(filepath1, filepath2):
    # source: http://www.syntacticbayleaves.com/2008/12/03/determining-image-similarity/
    # may throw: Value Error: matrices are not aligned . 
    
    from numpy import average, linalg, dot
   
    
    image1 = Image.open(filepath1)
    image2 = Image.open(filepath2)
 
    image1 = get_thumbnail(image1, stretch_to_fit=True)
    image2 = get_thumbnail(image2, stretch_to_fit=True)
    
    images = [image1, image2]
    vectors = []
    norms = []
    for image in images:
        vector = []
        for pixel_tuple in image.getdata():
            vector.append(average(pixel_tuple))
        vectors.append(vector)
        norms.append(linalg.norm(vector, 2))
    a, b = vectors
    a_norm, b_norm = norms
    # ValueError: matrices are not aligned !
    res = dot(a / a_norm, b / b_norm)
    print (res)
    return res



def get_thumbnail(image, size=(128,128), stretch_to_fit=False, greyscale=False):
    " get a smaller version of the image - makes comparison much faster/easier"
    if not stretch_to_fit:
        image.thumbnail(size, Image.ANTIALIAS)
    else:
        image = image.resize(size); # for faster computation
    if greyscale:
        image = image.convert("L")  # Convert it to grayscale.
    return image


image_similarity_histogram_via_pil(local_filenames[0], local_filenames[1])
image_similarity_vectors_via_numpy(local_filenames[0], local_filenames[1])
    


for i in range(7):
  print (local_filenames[i])  
  for j in range(i+1,8):
      print (local_filenames[j])
      image_similarity_vectors_via_numpy(local_filenames[i], local_filenames[j])

References
1. Pixabay
2. Pixabay API
3. Python 2 & 3 Pixabay API interface
4. Python – Image Similarity Comparison Using Several Techniques
5. Image histogram Wikipedia

Share this: