KDNuggets Transfer Learning Blog Post

This is the companion notebook to the guest blog post on KDNuggets.

Import some libraries

In [23]:
from pymldb import Connection
mldb = Connection()

import pandas as pd

Inception on MLDB

We start by creating the inception function that we can call to run the trained Inception-V3 TensorFlow model:

In [7]:
print mldb.put('/v1/functions/inception', {
    "type": 'tensorflow.graph',
    "params": {
        "modelFileUrl": 'archive+'+
        "inputs": 'fetcher(url)[content] AS "DecodeJpeg/contents"',
        "outputs": "pool_3"
    }
})
<Response [201]>

We can then use it to embed any image in the representation it learned. Let's try doing this to the KDNuggets logo:

In [11]:
kdNuggets = ""

mldb.query("SELECT inception({url: '%s'}) as *" % kdNuggets)
pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. ... pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3.
result 0.405393 0.073578 0.063868 0.133508 0.044338 0.002757 0.579667 0.012046 0.74275 0.862201 ... 0.570614 0.245445 0.192202 0.772916 0.002887 0.424597 0.018911 0.035651 0.114374 1.145283

1 rows × 2048 columns

Loading our dataset

We now load a CSV dataset that contains links to car images from three brands:

In [12]:
print mldb.post("/v1/procedures", {
    "type": "import.text",
    "params": {
        "dataFileUrl": "",
        "outputDataset": "images"
    }
})

mldb.query("SELECT * FROM images LIMIT 3")
<Response [201]>
brand url
2 audi
3 audi
4 audi

Let's get a sense of how many images we have in each class:

In [13]:
mldb.query("SELECT count(*) FROM images GROUP BY brand")
"[""audi""]" 69
"[""bmw""]" 72
"[""tesla""]" 72

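Before training, it helps to know what accuracy a trivial classifier would reach. The sketch below is plain Python and needs no MLDB connection; it copies the three counts from the output above and derives each brand's share of the dataset, plus the accuracy of always predicting the most frequent brand. The variable names are illustrative and are not part of the original notebook.

```python
# Class counts copied from the GROUP BY output above.
counts = {"audi": 69, "bmw": 72, "tesla": 72}

total = sum(counts.values())

# Share of the dataset taken by each brand.
shares = {brand: n / float(total) for brand, n in counts.items()}

# Accuracy of a classifier that always predicts the most frequent brand.
majority_baseline = max(shares.values())

for brand in sorted(shares):
    print("%s: %.3f" % (brand, shares[brand]))
print("majority baseline: %.3f" % majority_baseline)
```

With 213 images split almost evenly across the three brands, always guessing the largest class would be right about a third of the time, which is the bar the classifier has to clear.
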
We can easily run a few images through the network like this:

In [17]:
mldb.query("SELECT inception({url: url}) AS * FROM images LIMIT 3")
pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. ... pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3.
2 0.132836 0.267571 0.460861 0.073345 0.319553 0.245528 0.222645 0.076144 0.266402 0.269410 ... 0.517730 0.324239 0.182698 0.589910 0.296351 0.356238 0.185801 0.236495 0.853976 0.004721
3 0.371351 0.253051 0.222188 0.131357 0.352972 0.163278 0.205562 0.148676 1.125986 0.027997 ... 0.005888 0.139715 0.103083 0.718988 0.326615 0.118558 0.087323 0.117636 0.382889 0.024768
4 0.462379 0.433509 0.411765 0.157537 0.277709 0.221781 0.190680 0.045631 0.885003 0.105056 ... 0.046998 0.388228 0.573246 0.877034 0.491815 0.191424 0.259622 0.402484 0.741471 0.015490

3 rows × 2048 columns

To create our training dataset, we run a transform procedure to apply the TensorFlow model to all images:

In [18]:
print mldb.post("/v1/procedures", {
    "type": "transform",
    "params": {
        "inputData": """
            SELECT brand,
                   inception({url}) as *
            FROM images
        """,
        "outputDataset": "training_dataset"
    }
})
<Response [201]>

This gives us the following result:

In [19]:
mldb.query("SELECT * FROM training_dataset LIMIT 3")
brand pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. ... pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3. pool_3.
127 bmw 0.273366 0.488088 0.243253 0.293343 0.172615 0.168309 0.421383 0.327155 1.005386 ... 0.018846 0.089321 0.341528 1.002636 0.483638 0.030398 0.017882 0.212639 0.282181 0.225709
108 bmw 0.433248 0.732170 0.282846 0.142594 0.142779 0.172190 0.231635 0.026957 0.271389 ... 0.451374 0.189907 0.286528 0.394389 0.602661 0.053008 0.575223 0.344477 0.370394 0.031122
95 bmw 0.068673 0.441297 0.342519 0.020982 0.224962 0.301158 0.285375 0.053405 0.834576 ... 0.497007 0.361395 0.186176 0.748922 0.359960 0.053431 0.146289 0.509478 0.515487 0.044757

3 rows × 2049 columns

Training a classifier

Let's now train a model. We'll split the data 50/50 into training and testing sets and train a random forest, configured here as bagged, boosted decision trees:

In [20]:
rez = mldb.post("/v1/procedures", {
    "type": "classifier.experiment",
    "params": {
        "experimentName": "car_brand_cls",
        "inputData": """
            SELECT
                {* EXCLUDING(brand)} as features,
                brand as label
            FROM training_dataset
        """,
        "mode": "categorical",
        "modelFileUrlPattern": "file:///mldb_data/car_brand_cls.cls",
        "configuration": {
            "type": "bagging",
            "weak_learner": {
                "type": "boosting",
                "weak_learner": {
                    "type": "decision_tree",
                    "max_depth": 5,
                    "update_alg": "gentle",
                    "random_feature_propn": 0.6
                },
                "min_iter": 5,
                "max_iter": 30
            },
            "num_bags": 15
        }
    }
})

runResults = rez.json()["status"]["firstRun"]["status"]["folds"][0]["resultsTest"]
print rez
<Response [201]>

Let's look at our results on the test set:

In [21]:
    .pivot_table(index="actual", columns="predicted", fill_value=0)
predicted audi bmw tesla
audi 19 13 4
bmw 3 24 9
tesla 3 4 33
In [22]:
accuracy f1Score precision recall support
audi 0.794643 0.622951 0.760000 0.527778 36
bmw 0.741071 0.623377 0.585366 0.666667 36
tesla 0.821429 0.767442 0.717391 0.825000 40
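
The per-label statistics can be recomputed from the confusion matrix, which is a useful check on how each column is defined. The sketch below is plain Python with no MLDB connection; it copies the matrix from the output above and treats each brand one-vs-rest. The helper name `label_stats` is illustrative, and reading the `accuracy` column as one-vs-rest accuracy is an assumption, although it does reproduce the reported numbers.

```python
# Confusion matrix from the output above: rows are actual, columns are predicted.
labels = ["audi", "bmw", "tesla"]
confusion = [
    [19, 13, 4],   # actual audi
    [3, 24, 9],    # actual bmw
    [3, 4, 33],    # actual tesla
]

total = sum(sum(row) for row in confusion)


def label_stats(k):
    """One-vs-rest statistics for the class at index k."""
    tp = confusion[k][k]
    support = sum(confusion[k])                    # actual members of the class
    predicted = sum(row[k] for row in confusion)   # times the class was predicted
    fp = predicted - tp
    fn = support - tp
    tn = total - tp - fp - fn
    precision = tp / float(predicted)
    recall = tp / float(support)
    return {
        "accuracy": (tp + tn) / float(total),
        "f1Score": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        "support": support,
    }


stats = dict((label, label_stats(k)) for k, label in enumerate(labels))

# Fraction of test images whose predicted brand matches the actual brand.
overall_accuracy = sum(confusion[k][k] for k in range(len(labels))) / float(total)

for label in labels:
    s = stats[label]
    print("%s accuracy=%.6f f1=%.6f precision=%.6f recall=%.6f support=%d" % (
        label, s["accuracy"], s["f1Score"], s["precision"], s["recall"], s["support"]))
print("overall accuracy: %.6f" % overall_accuracy)
```

The diagonal of the matrix sums to 76 out of 112 test images, so the overall accuracy is about 0.68, while the per-label `accuracy` values are higher because they also count correct rejections of the other two brands.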

Creating a real-time endpoint

We create a function of type sql.expression, called brand_predictor, that represents our pipeline. It takes the URL of an image, passes it through the inception function to extract the features, and feeds those features to car_brand_cls_scorer_0, the function that represents our trained model and was created in the previous step.

In [25]:
print mldb.put("/v1/functions/brand_predictor", {
    "type": "sql.expression",
    "params": {
        "expression": """
            car_brand_cls_scorer_0(
                {
                    features: inception({url})
                }) as *
        """
    }
})
<Response [201]>

We can now call this endpoint on new images and get predictions back:

In [26]:
# good tesla:
# good tesla:
# good bmw:

               {'url': ''}})
GET http://localhost/v1/functions/brand_predictor/application
200 OK
  "output": {
    "scores": [

Where to next?

