This page is part of the documentation for the Machine Learning Database.

It is a static snapshot of a Notebook which you can play with interactively by trying MLDB online now.
It's free and takes 30 seconds to get going.

KDnuggets Transfer Learning Blog Post

This is the companion notebook to the MLDB.ai guest blog post on KDnuggets.

The post will soon be published. If you want to try an interactive version of this notebook, simply sign up for a free account.

Import some libraries

In [23]:
from pymldb import Connection
mldb = Connection()

import pandas as pd

Inception on MLDB

We start by creating the inception function that we can call to run the trained Inception-V3 TensorFlow model:

In [7]:
print mldb.put('/v1/functions/inception', {
    "type": 'tensorflow.graph',
    "params": {
        "modelFileUrl": 'archive+'+
            'http://public.mldb.ai/models/inception_dec_2015.zip'+
            '#tensorflow_inception_graph.pb',
        "inputs": 'fetcher(url)[content] AS "DecodeJpeg/contents"',
        "outputs": "pool_3"
    }
})
<Response [201]>

We can then use it to embed any image in the representation the network learned. Let's try it on the KDnuggets logo:

In [11]:
kdNuggets = "http://www.skytree.net/wp-content/uploads/2014/08/KDnuggets.jpg"

mldb.query("SELECT inception({url: '%s'}) as *" % kdNuggets)
Out[11]:
pool_3.0.0.0.0 pool_3.0.0.0.1 pool_3.0.0.0.2 pool_3.0.0.0.3 pool_3.0.0.0.4 pool_3.0.0.0.5 pool_3.0.0.0.6 pool_3.0.0.0.7 pool_3.0.0.0.8 pool_3.0.0.0.9 ... pool_3.0.0.0.2038 pool_3.0.0.0.2039 pool_3.0.0.0.2040 pool_3.0.0.0.2041 pool_3.0.0.0.2042 pool_3.0.0.0.2043 pool_3.0.0.0.2044 pool_3.0.0.0.2045 pool_3.0.0.0.2046 pool_3.0.0.0.2047
_rowName
result 0.405393 0.073578 0.063868 0.133508 0.044338 0.002757 0.579667 0.012046 0.74275 0.862201 ... 0.570614 0.245445 0.192202 0.772916 0.002887 0.424597 0.018911 0.035651 0.114374 1.145283

1 rows × 2048 columns
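The output row is just the pool_3 layer flattened into 2048 columns named pool_3.0.0.0.0 through pool_3.0.0.0.2047. If you want to work with the embedding as a plain vector outside MLDB, one way is to sort the columns by their trailing index. A minimal sketch, using a hypothetical 3-column frame in place of the real 2048-column query result:

```python
import pandas as pd
import numpy as np

# Toy stand-in for the query result above: the real row has 2048 columns
# named pool_3.0.0.0.0 .. pool_3.0.0.0.2047
df = pd.DataFrame([[0.405393, 0.073578, 0.063868]],
                  index=["result"],
                  columns=["pool_3.0.0.0.%d" % i for i in range(3)])

# Sort columns by their trailing numeric index so the vector keeps tensor order
# (lexicographic order would put pool_3.0.0.0.10 before pool_3.0.0.0.2)
cols = sorted(df.columns, key=lambda c: int(c.rsplit(".", 1)[1]))
embedding = df[cols].iloc[0].values.astype(float)
print(embedding.shape)  # (3,) here; (2048,) for the real pool_3 layer
```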

Loading our dataset

We now load a CSV dataset that contains links to car images from three brands:

In [12]:
print mldb.post("/v1/procedures", {
    "type": "import.text",
    "params": {
        "dataFileUrl": "https://public.mldb.ai/datasets/car_brand_images/cars_urls.csv",
        "outputDataset": "images"
    }
})

mldb.query("SELECT * FROM images LIMIT 3")
<Response [201]>
Out[12]:
brand url
_rowName
2 audi https://s3.amazonaws.com/public-mldb-ai/datase...
3 audi https://s3.amazonaws.com/public-mldb-ai/datase...
4 audi https://s3.amazonaws.com/public-mldb-ai/datase...

Let's get a sense of how many images we have in each class:

In [13]:
mldb.query("SELECT count(*) FROM images GROUP BY brand")
Out[13]:
count(*)
_rowName
"[""audi""]" 69
"[""bmw""]" 72
"[""tesla""]" 72

We can easily run a few images through the network like this:

In [17]:
mldb.query("SELECT inception({url: url}) AS * FROM images LIMIT 3")
Out[17]:
pool_3.0.0.0.0 pool_3.0.0.0.1 pool_3.0.0.0.2 pool_3.0.0.0.3 pool_3.0.0.0.4 pool_3.0.0.0.5 pool_3.0.0.0.6 pool_3.0.0.0.7 pool_3.0.0.0.8 pool_3.0.0.0.9 ... pool_3.0.0.0.2038 pool_3.0.0.0.2039 pool_3.0.0.0.2040 pool_3.0.0.0.2041 pool_3.0.0.0.2042 pool_3.0.0.0.2043 pool_3.0.0.0.2044 pool_3.0.0.0.2045 pool_3.0.0.0.2046 pool_3.0.0.0.2047
_rowName
2 0.132836 0.267571 0.460861 0.073345 0.319553 0.245528 0.222645 0.076144 0.266402 0.269410 ... 0.517730 0.324239 0.182698 0.589910 0.296351 0.356238 0.185801 0.236495 0.853976 0.004721
3 0.371351 0.253051 0.222188 0.131357 0.352972 0.163278 0.205562 0.148676 1.125986 0.027997 ... 0.005888 0.139715 0.103083 0.718988 0.326615 0.118558 0.087323 0.117636 0.382889 0.024768
4 0.462379 0.433509 0.411765 0.157537 0.277709 0.221781 0.190680 0.045631 0.885003 0.105056 ... 0.046998 0.388228 0.573246 0.877034 0.491815 0.191424 0.259622 0.402484 0.741471 0.015490

3 rows × 2048 columns

To create our training dataset, we run a transform procedure to apply the TensorFlow model to all images:

In [18]:
print mldb.post("/v1/procedures", {
    "type": "transform",
    "params": {
        "inputData": """
            SELECT brand,
                   inception({url}) as *
            FROM images
        """,
        "outputDataset": "training_dataset"
    }
})
<Response [201]>

This gives us the following result:

In [19]:
mldb.query("SELECT * FROM training_dataset LIMIT 3")
Out[19]:
brand pool_3.0.0.0.0 pool_3.0.0.0.1 pool_3.0.0.0.2 pool_3.0.0.0.3 pool_3.0.0.0.4 pool_3.0.0.0.5 pool_3.0.0.0.6 pool_3.0.0.0.7 pool_3.0.0.0.8 ... pool_3.0.0.0.2038 pool_3.0.0.0.2039 pool_3.0.0.0.2040 pool_3.0.0.0.2041 pool_3.0.0.0.2042 pool_3.0.0.0.2043 pool_3.0.0.0.2044 pool_3.0.0.0.2045 pool_3.0.0.0.2046 pool_3.0.0.0.2047
_rowName
127 bmw 0.273366 0.488088 0.243253 0.293343 0.172615 0.168309 0.421383 0.327155 1.005386 ... 0.018846 0.089321 0.341528 1.002636 0.483638 0.030398 0.017882 0.212639 0.282181 0.225709
108 bmw 0.433248 0.732170 0.282846 0.142594 0.142779 0.172190 0.231635 0.026957 0.271389 ... 0.451374 0.189907 0.286528 0.394389 0.602661 0.053008 0.575223 0.344477 0.370394 0.031122
95 bmw 0.068673 0.441297 0.342519 0.020982 0.224962 0.301158 0.285375 0.053405 0.834576 ... 0.497007 0.361395 0.186176 0.748922 0.359960 0.053431 0.146289 0.509478 0.515487 0.044757

3 rows × 2049 columns

Training a classifier

Let's now train a model. We'll use a 50/50 train/test split, with a random-forest-style classifier of bagged, boosted decision trees:

In [20]:
rez = mldb.post("/v1/procedures", {
    "type": "classifier.experiment",
    "params": {
        "experimentName": "car_brand_cls",
        "inputData": """        
            SELECT 
                {* EXCLUDING(brand)} as features,
                brand as label
            FROM training_dataset
        """,
        "mode": "categorical",
        "modelFileUrlPattern": "file:///mldb_data/car_brand_cls.cls",
         "configuration": {
            "type": "bagging",
            "weak_learner": {
                "type": "boosting",
                "weak_learner": {
                    "type": "decision_tree",
                    "max_depth": 5,
                    "update_alg": "gentle",
                    "random_feature_propn": 0.6
                },
                "min_iter": 5,
                "max_iter": 30
            },
            "num_bags": 15
        }
    }
})

runResults = rez.json()["status"]["firstRun"]["status"]["folds"][0]["resultsTest"]
print rez
<Response [201]>

Let's look at our results on the test set:

In [21]:
pd.DataFrame(runResults["confusionMatrix"])\
    .pivot_table(index="actual", columns="predicted", fill_value=0)
Out[21]:
count
predicted audi bmw tesla
actual
audi 19 13 4
bmw 3 24 9
tesla 3 4 33
In [22]:
pd.DataFrame.from_dict(runResults["labelStatistics"]).transpose()
Out[22]:
accuracy f1Score precision recall support
audi 0.794643 0.622951 0.760000 0.527778 36
bmw 0.741071 0.623377 0.585366 0.666667 36
tesla 0.821429 0.767442 0.717391 0.825000 40
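These per-label statistics follow directly from the confusion matrix: precision divides each diagonal entry by its column sum (everything predicted as that label), and recall divides it by its row sum (everything actually that label). A quick sanity check in plain numpy, using the test-set matrix above:

```python
import numpy as np

labels = ["audi", "bmw", "tesla"]
# Confusion matrix from the test set above: rows = actual, columns = predicted
cm = np.array([[19, 13,  4],
               [ 3, 24,  9],
               [ 3,  4, 33]], dtype=float)

stats = {}
for i, label in enumerate(labels):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()  # of everything predicted as `label`, fraction correct
    recall = tp / cm[i, :].sum()     # of everything actually `label`, fraction found
    f1 = 2 * precision * recall / (precision + recall)
    stats[label] = (precision, recall, f1)
    print("%s: precision=%.6f recall=%.6f f1=%.6f" % (label, precision, recall, f1))
```

The printed values match the precision, recall, and f1Score columns of the labelStatistics table.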

Creating a real-time endpoint

We now create a function of type sql.expression, called brand_predictor, to represent our full pipeline. It takes the URL of an image, passes it through the inception model to extract the features, and then feeds those features into the car_brand_cls_scorer_0 function, which wraps the model we trained in the previous step.

In [25]:
print mldb.put("/v1/functions/brand_predictor", {
    "type": "sql.expression",
    "params": {
        "expression": """
            car_brand_cls_scorer_0(
                {
                    features: inception({url})
                }) as *
        """
    }
})
<Response [201]>

We can now call this endpoint on new images and get predictions back:

In [26]:
# good tesla: http://www.automobile-propre.com/wp-content/uploads/2016/09/tesla-premiere-livraison-france-657x438.jpg
# good tesla: http://insideevs.com/wp-content/uploads/2016/03/JL82776-750x500.jpg
# good bmw: http://www.bmwhk.com/content/dam/bmw/common/all-models/1-series/5-door/2015/images-and-videos/bmw-1-series-wallpaper-1920x1200-03-R.jpg/jcr:content/renditions/cq5dam.resized.img.485.low.time1448014414633.jpg

mldb.get("/v1/functions/brand_predictor/application", 
         data={'input': 
               {'url': 'http://insideevs.com/wp-content/uploads/2016/03/JL82776-750x500.jpg'}})
Out[26]:
GET http://localhost/v1/functions/brand_predictor/application
200 OK
{
  "output": {
    "scores": [
      [
        "\"audi\"", 
        [
          -8.133334159851074, 
          "2016-05-05T04:18:03Z"
        ]
      ], 
      [
        "\"bmw\"", 
        [
          -7.200000286102295, 
          "2016-05-05T04:18:03Z"
        ]
      ], 
      [
        "\"tesla\"", 
        [
          1.0666667222976685, 
          "2016-05-05T04:18:03Z"
        ]
      ]
    ]
  }
}
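The scores are the classifier's raw per-class outputs (higher is better), each paired with a timestamp, and the predicted brand is simply the highest-scoring entry. A small sketch of extracting it from the parsed response; the dict below mirrors the JSON shown above:

```python
# Parsed response body, same shape as the JSON output above
response = {
    "output": {
        "scores": [
            ['"audi"',  [-8.133334159851074, "2016-05-05T04:18:03Z"]],
            ['"bmw"',   [-7.200000286102295, "2016-05-05T04:18:03Z"]],
            ['"tesla"', [1.0666667222976685, "2016-05-05T04:18:03Z"]],
        ]
    }
}

# Each entry is [label, [score, timestamp]]; pick the entry with the top score.
# Labels come back with embedded quotes, so strip them for display.
best = max(response["output"]["scores"], key=lambda entry: entry[1][0])
predicted_brand = best[0].strip('"')
print(predicted_brand)  # -> tesla
```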

Where to next?

You can now look at the full Transfer Learning with Tensorflow demo, or check out the other Tutorials and Demos.