This page is part of the documentation for the Machine Learning Database.

It is a static snapshot of a Notebook which you can play with interactively by trying MLDB online now.

Transfer Learning on Images with Tensorflow

This demo shows how to use transfer learning to leverage the power of a deep convolutional neural network without having to train one yourself. Most people do not train such networks from scratch because of the large data and computational power requirements. The more common approach is to take a network that has already been trained on a large dataset (unrelated to our task) and leverage the representation it learnt in one of the following ways:

  • as the initialization of a new network which is then trained on our specific task (fine-tuning)
  • as a feature generator, passing our examples through the network so it can embed them in the abstract representation it learnt

This notebook will be doing the latter, using the Inception-v3 model that was trained on the ImageNet Large Scale Visual Recognition Challenge dataset of over 1 million images, where the task was to classify images into 1000 classes.

We will cover three topics in this demo:

  • Unsupervised learning: using the image embedding as input in a dimensionality reduction algorithm for visualization
  • Supervised learning: using the image embedding as features for a multi-class classification task
  • The DeepTeach plugin: showing a plugin that uses the techniques introduced to build a binary image classifier by doing similarity search through a web UI

The following video shows DeepTeach in action:

In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('7hZ3X37Qwc4')
Out[2]:

We recommend reading the Tensorflow Image Recognition Tutorial before going through this demo.

Initializing pymldb and other imports

The notebook cells below use pymldb's Connection class to make REST API calls. You can check out the Using pymldb Tutorial for more details.

In [2]:
from pymldb import Connection
mldb = Connection()
import urllib2, pandas as pd, numpy as np, matplotlib.pyplot as plt
from matplotlib.offsetbox import TextArea, DrawingArea, OffsetImage, AnnotationBbox
from matplotlib._png import read_png
%matplotlib inline
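
For reference, here is a quick, illustrative sketch of how the connection object is used throughout this notebook (the endpoint and query below are placeholders, not part of the demo): mldb.get, mldb.put and mldb.post issue the corresponding REST calls and return a Response object, while mldb.query runs an SQL query and returns a pandas DataFrame.

# Illustrative only: the REST verbs map directly to methods on the Connection object.
resp = mldb.get("/v1/datasets")        # GET request, returns a Response object
df   = mldb.query("SELECT 1 AS one")   # SQL query, returns a pandas DataFrame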

The dataset

In the Tensorflow Image Recognition Tutorial, you saw how to embed the image of Admiral Grace Hopper using the Inception model. To embed a whole dataset of images, we do the same thing but within a procedure of type transform. We will not cover this in detail in this notebook, but the detailed code is available in the Dataset Builder plugin's code.
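
Although we do not reproduce the plugin's code here, the general shape is easy to sketch. The following is a minimal, hypothetical illustration (not the plugin's actual code): it assumes an inception function of type tensorflow.graph configured to output the pool_3 layer, as in the Tensorflow Image Recognition Tutorial, and a hypothetical image_urls dataset with one row per image and a url column.

# Hypothetical sketch: embed every image of a dataset with a procedure of type 'transform'.
# Assumes an 'inception' function (tensorflow.graph, outputting pool_3) already exists,
# and an 'image_urls' dataset with a 'url' column pointing at each image.
print mldb.put("/v1/procedures/embed_all_images", {
    "type": "transform",
    "params": {
        "inputData": """
            SELECT inception({url: url}) AS *
            NAMED rowName()
            FROM image_urls
        """,
        "outputDataset": {"id": "embedded_images", "type": "embedding"},
        "runOnCreation": True
    }
})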

We ship the plugin with 4 pre-assembled datasets: real-estate, recipes, transportation and pets. We start by loading the embeddings of the images for the real-estate dataset:

In [3]:
prefix = "http://public.mldb.ai/datasets/dataset-builder"

print mldb.put("/v1/procedures/embedded_images", {
    "type": "import.text",
    "params": {
        "dataFileUrl": prefix + "/cache/dataset_creator_embedding_realestate.csv.gz",
        "outputDataset": {
                "id": "embedded_images_realestate",
                "type": "embedding"
            },
        "select": "* EXCLUDING(rowName)",
        "named": "rowName",
        "runOnCreation": True
    }
})
<Response [201]>

The dataset we just imported has one row per image, and the 2048 dense columns are the dimensions of the embedding. We used the second-to-last layer of the network, the pool_3 layer, which is less specialized than the final softmax layer. Since the Inception model was trained on the ImageNet task, the last layer has been trained to perform very well on that specific task, while the earlier layers hold more abstract representations and are better suited to transfer learning.

The following query shows the embedding values for 5 rows:

In [4]:
mldb.query("SELECT * FROM embedded_images_realestate ORDER BY rowHash() ASC LIMIT 5")
Out[4]:
"pool_3.0.0.0.0" "pool_3.0.0.0.1" "pool_3.0.0.0.2" "pool_3.0.0.0.3" "pool_3.0.0.0.4" "pool_3.0.0.0.5" "pool_3.0.0.0.6" "pool_3.0.0.0.7" "pool_3.0.0.0.8" "pool_3.0.0.0.9" ... "pool_3.0.0.0.2038" "pool_3.0.0.0.2039" "pool_3.0.0.0.2040" "pool_3.0.0.0.2041" "pool_3.0.0.0.2042" "pool_3.0.0.0.2043" "pool_3.0.0.0.2044" "pool_3.0.0.0.2045" "pool_3.0.0.0.2046" "pool_3.0.0.0.2047"
_rowName
condo-13 0.037069 0.290698 0.332847 0.322052 0.505879 0.187525 0.023030 0.654751 0.362244 0.589525 ... 0.460528 0.000000 0.002499 0.453131 0.492025 0.013759 0.787879 0.410502 0.135939 0.024021
office_building-11 0.003208 0.231195 0.231483 0.056335 0.183423 0.137407 0.432620 0.428537 0.028169 0.304555 ... 0.318491 0.104283 0.052675 0.357397 0.028384 0.172125 0.391694 0.240013 0.504228 0.097407
igloo-16 0.212397 0.224847 0.085570 0.018010 0.127204 0.020553 0.252013 0.056786 0.204582 0.305146 ... 0.134168 0.486442 0.535087 0.000807 0.008934 0.450499 0.242037 0.000000 0.264583 0.292052
sand_castle-9 0.113618 0.571280 0.026013 0.190822 0.059604 0.249640 0.296866 0.510331 0.029109 0.014127 ... 0.473897 0.445435 0.333632 0.028982 0.195716 0.019377 0.894663 0.061072 0.215425 0.406666
castle-18 0.140019 0.237392 0.059138 0.130825 0.143036 0.065903 0.073389 0.106741 0.206939 0.203547 ... 0.537889 0.012164 0.293059 0.061627 0.005444 0.000000 0.580052 0.626209 0.281134 0.014249

5 rows × 2048 columns
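
As a quick sanity check, we can confirm the shape of the embedding from Python. This is a small illustrative snippet, relying only on the fact that pymldb's query() returns a pandas DataFrame:

# one row per image, one column per dimension of the pool_3 embedding
df_emb = mldb.query("SELECT * FROM embedded_images_realestate")
print df_emb.shape   # expected: (number of images, 2048)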

The real-estate dataset contains images of different types of buildings. The following query shows the different categories:

In [5]:
mldb.query("""
    SELECT count(*) as count
    FROM embedded_images_realestate 
    GROUP BY regex_replace(rowName(), '-[\\d]+', '')
""")
Out[5]:
count
_rowName
"[""beach_house""]" 20
"[""cabin""]" 20
"[""castle""]" 20
"[""condo""]" 20
"[""condo_building""]" 20
"[""cottage""]" 20
"[""duplex""]" 20
"[""hut""]" 20
"[""igloo""]" 20
"[""monument""]" 20
"[""office_building""]" 20
"[""sand_castle""]" 20
"[""suburban_house""]" 20
"[""town_house""]" 20
"[""triplex""]" 20

Here are a few sample images:

[Sample images shown here: condo-13, sand_castle-10, office_building-11 and town_house-2]

Unsupervised learning

For our first transfer learning task, we use the rich abstract embedding of the images as input to an unsupervised dimensionality reduction algorithm in order to visualize the real-estate dataset.

In the following query, we use the t-SNE algorithm to do dimensionality reduction for our visualization:

In [6]:
print mldb.put("/v1/procedures/tsne", {
    "type": "tsne.train",
    "params": {
        "trainingData": "SELECT * FROM embedded_images_realestate",
        "rowOutputDataset": "tsne_embedding",
        "numOutputDimensions": 2,
        "runOnCreation": True
    }
})
<Response [201]>

The tsne_embedding dataset that the t-SNE procedure generated gives us x and y coordinates for all our images:

In [7]:
mldb.query("SELECT * from tsne_embedding limit 2")
Out[7]:
x y
_rowName
condo-13 -6.030137 237.249969
office_building-11 -8.222577 85.114670

We can now create a scatter plot of all the images in our dataset, positioning them at the coordinates provided by the t-SNE algorithm:

In [8]:
image_prefix = "http://public.mldb.ai/datasets/dataset-builder/images/realestate_png/"

df = mldb.query("SELECT * from tsne_embedding")
bounds = df.quantile([.05, .95]).T.values
fig = plt.figure(figsize=(18, 15), frameon=False)
ax = fig.add_subplot(111, xlim=bounds[0], ylim=bounds[1])
plt.axis('off')

for x in df.iterrows():
    imagebox = OffsetImage(read_png(urllib2.urlopen(image_prefix + "%s.png" % x[0])), zoom=0.35)
    ax.add_artist(AnnotationBbox(imagebox, xy=(x[1]["x"], x[1]["y"]), xycoords='data', frameon=False))
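
Note that the cell above relies on Python 2 (urllib2) and matplotlib's private _png module. On a more recent Python 3 / matplotlib stack, a roughly equivalent sketch of the plotting loop (our adaptation, not part of the original demo) would be:

# Rough Python 3 equivalent of the loop above (illustrative sketch).
# plt.imread reads each PNG straight from the opened URL; everything else is unchanged.
from urllib.request import urlopen

for name, row in df.iterrows():
    img = plt.imread(urlopen(image_prefix + "%s.png" % name), format='png')
    imagebox = OffsetImage(img, zoom=0.35)
    ax.add_artist(AnnotationBbox(imagebox, xy=(row["x"], row["y"]),
                                 xycoords='data', frameon=False))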