This page is part of the documentation for the Machine Learning Database.

It is a static snapshot of a Notebook that you can run interactively in MLDB.

Executing JavaScript Code Directly in SQL Queries Using the `jseval` Function Tutorial

MLDB provides a complete implementation of the SQL SELECT statement. Most of the functions you are used to are available in your queries.

MLDB also supports additional functions that extend standard SQL in useful ways. One of those functions is jseval, which can be used to execute arbitrary JavaScript code inline in an SQL query.

In this tutorial, we will show some basic usage examples, followed by two different use cases for the jseval function:

Formatting data during the import step
Designing custom feature generators

Setting up

Before we begin, let's import the pymldb library so we can make REST API calls to MLDB. You can check out the Using pymldb Tutorial for more details.

from pymldb import Connection
mldb = Connection("http://localhost")

Basic usage examples

Let's start by writing a simple SQL query that will multiply an input number by 2 in JavaScript:

mldb.query("""
    SELECT
        jseval('
            return val * 2;
        ','val', 5) AS output
""")

The second argument, 'val', declares the name of the JavaScript variable. That variable takes the input value 5 (the third argument), and the code is then evaluated.
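Because the JavaScript body is passed inside an SQL string literal, it can help to assemble such expressions programmatically. The sketch below uses a hypothetical helper, `build_jseval`, which is not part of pymldb or MLDB. It assumes MLDB follows the standard SQL convention of escaping a single quote inside a string literal by doubling it, so JavaScript code containing `'` does not break the query.

```python
def build_jseval(body, params, *args, alias="output"):
    """Hypothetical helper: build the text of a jseval(...) SELECT expression.

    body   -- JavaScript source, placed inside an SQL string literal
    params -- list of JavaScript parameter names, bound in order to args
    args   -- SQL expressions (already written as SQL) passed as the values
    """
    def sql_quote(text):
        # Assumed standard SQL quoting: a ' inside a literal is written as ''
        return "'" + text.replace("'", "''") + "'"

    pieces = [sql_quote(body), sql_quote(",".join(params))]
    pieces.extend(args)
    return "jseval(" + ", ".join(pieces) + ") AS " + alias


print("SELECT " + build_jseval("return val * 2;", ["val"], "5"))
# SELECT jseval('return val * 2;', 'val', 5) AS output

# A single quote in the JavaScript is doubled inside the SQL literal
print(build_jseval("return name + ' says hi';", ["name"], "'Bob'"))
# jseval('return name + '' says hi'';', 'name', 'Bob') AS output
```

The resulting string could then be passed to `mldb.query` in place of the hand-written query above.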

The function can also take multiple parameters as input and return multiple output values:

mldb.query("""
    SELECT
        jseval('
            var output = {};
            output["mult"] = val * 2;
            output["name"] = str_val + " Hello!";
            return output;
        ','val,str_val', 5, 'Bonjour!') AS output
""")

In the above example, the string val,str_val means that the function takes two input variables, whose values will be 5 and the string Bonjour!. Since we return a JavaScript object, we essentially return a row whose column names are the object's keys and whose cell values are the object's values.
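To make the name binding and the shape of the returned row concrete, here is a small Python analogy. It is not how MLDB executes jseval: it simply pairs the comma-separated names with the values in order, runs a Python stand-in for the JavaScript body, and flattens the returned dict under the `output` alias. The helper names `run_like_jseval` and `flatten` are hypothetical; the resulting keys match the `output.mult` and `output.name` columns in the query output.

```python
def run_like_jseval(params, body, *values):
    # Bind each comma-separated name to the value in the same position,
    # e.g. 'val,str_val' with (5, 'Bonjour!') -> {'val': 5, 'str_val': 'Bonjour!'}
    names = [name.strip() for name in params.split(",")]
    if len(names) != len(values):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return body(**dict(zip(names, values)))


def flatten(alias, result):
    # A returned object becomes one column per key, prefixed by the alias;
    # a plain value becomes a single column named after the alias
    if isinstance(result, dict):
        return {alias + "." + key: value for key, value in result.items()}
    return {alias: result}


def body(val, str_val):
    # Python stand-in for the JavaScript body of the query above
    return {"mult": val * 2, "name": str_val + " Hello!"}


row = flatten("output", run_like_jseval("val,str_val", body, 5, "Bonjour!"))
print(row)  # {'output.mult': 10, 'output.name': 'Bonjour! Hello!'}
```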

Now that we have the basics in place, let's continue to a real use-case below.

Formatting data during the import step

In the Loading Data From An HTTP Server Tutorial, we loaded a specific file from an archive located on the Stanford Network Analysis Project (SNAP) website.

The dataset contains all the circles of friends that user no. 3980 is part of. Each row represents a circle of friends, and all the users that are part of that circle are listed on that line.

Let's check out the unformatted version of the data first by running the import.text procedure:

dataUrl = "http://snap.stanford.edu/data/facebook.tar.gz"

mldb.put("/v1/procedures/import_data", {
    "type": "import.text",
    "params": {
        "dataFileUrl": "archive+" + dataUrl + "#facebook/3980.circles",
        "delimiter": " ", 
        "quoteChar": "",
        "outputDataset": "import_URL2",
        "runOnCreation": True
    }
})

mldb.query("SELECT * NAMED rowName() FROM import_URL2 LIMIT 10")

We see that each line contains the circle name followed by user IDs. This type of data is an ideal candidate for MLDB, since we can store it as bags of words, or rather, bags of friends. A dataset of type sparse.mutable can store sparse representations like this one very efficiently.
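In a sparse representation, each row stores only the columns (here, the friends) it actually has. When such rows are displayed as a dense table, for example as the pandas DataFrame that pymldb returns from a query, the absent cells show up as NaN. The pure-Python sketch below illustrates that conversion with a hypothetical `to_dense` helper on three circles from this file; it is an illustration, not how MLDB stores data internally.

```python
def to_dense(rows):
    # rows: {row name: {column: value}} holding only the cells each row has.
    # Collect the union of columns in first-seen order, then fill gaps with NaN.
    columns = []
    for cells in rows.values():
        for column in cells:
            if column not in columns:
                columns.append(column)
    table = {
        name: [cells.get(column, float("nan")) for column in columns]
        for name, cells in rows.items()
    }
    return columns, table


sparse_rows = {
    "circle0": {"3989": 1, "4009": 1},
    "circle1": {"4010": 1, "4037": 1},
    "circle2": {"4013": 1},
}
columns, table = to_dense(sparse_rows)
print(columns)  # ['3989', '4009', '4010', '4037', '4013']
for name, values in table.items():
    print(name, values)
```

The sparse form holds 5 cells here, while the dense form needs 15, most of them NaN; the gap widens as more circles and friends are added.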

Normally, we could use the tokenize function to deal with data like this. However, since splitting each line on the <TAB> character yields a variable number of columns, the standard way of importing this data doesn't work well with the import.text procedure.

In the code below, we will use the jseval function to do the following in JavaScript:

create an empty object
split each line on the <TAB> character
store the first element of each line under the key rowName in the object (circle0, circle1, etc.)
store all remaining elements of the line (the user IDs) using the element itself as the key and the number 1 as the value
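For reference, the same per-line logic can be written in Python and checked against a sample line. This only illustrates what the JavaScript computes; `parse_circle_line` is a hypothetical name, and the sample line is the circle1 row of the unformatted import.

```python
def parse_circle_line(line):
    # Mirror of the JavaScript: split on tabs, the first field is the row name,
    # and every remaining field (a user ID) becomes a column with value 1
    fields = line.split("\t")
    row = {"rowName": fields[0]}
    for user_id in fields[1:]:
        row[user_id] = 1
    return row


print(parse_circle_line("circle1\t4010\t4037"))
# {'rowName': 'circle1', '4010': 1, '4037': 1}
```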

dataUrl = "http://snap.stanford.edu/data/facebook.tar.gz"

print(mldb.put("/v1/procedures/import_non_formated", {
    "type": "import.text",
    "params": {
        "dataFileUrl": "archive+" + dataUrl + "#facebook/3980.circles",
        # Name the single input column "circles"; each line is passed to jseval as val
        "headers": ["circles"],
        "select": """
            jseval('
                var row_val = val.split("\t");
                var rtn = {};
                rtn["rowName"] = row_val[0];
                for (var i = 1; i < row_val.length; i++) {
                    rtn[row_val[i]] = 1;
                }
                return rtn;
                ','val', circles) AS *
        """,
        # sparse.mutable accepts rows that each have a different set of columns
        "outputDataset": {
            "id": "import_non_formated",
            "type": "sparse.mutable"
        },
        "runOnCreation": True
    }
}))

<Response [201]>

We can now run a SELECT query on the resulting dataset and get a nice sparse representation:

mldb.query("""
    SELECT * EXCLUDING(rowName)
    NAMED rowName
    FROM import_non_formated 
    ORDER BY CAST(rowName() AS integer) 
    LIMIT 5
""")

We can now answer a simple question: is there any friend of user 3980 who appears in more than one of their circles of friends? It can be answered with the following query:

mldb.query("""
    SELECT *
    FROM transpose(
        (
            SELECT sum({* EXCLUDING(rowName)}) as * 
            NAMED 'result'
            FROM import_non_formated
        )
    )
    ORDER BY result DESC
    LIMIT 5
""")

Since the maximum value is 1, we now know that the answer to the above question is no.
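The `sum` over all circles followed by `transpose` amounts to counting, for each user ID, how many circles contain it. A plain-Python equivalent using `collections.Counter` is sketched below on the first four circles of the file; it only illustrates the computation, and the full dataset is needed to reproduce the actual answer.

```python
from collections import Counter

# Sparse rows as produced by the import: one column per friend, value 1
circles = {
    "circle0": {"3989": 1, "4009": 1},
    "circle1": {"4010": 1, "4037": 1},
    "circle2": {"4013": 1},
    "circle3": {"4024": 1, "3987": 1, "4015": 1},
}

# Like SUM({* EXCLUDING(rowName)}): add up each column across all rows
counts = Counter()
for cells in circles.values():
    counts.update(cells)

# Like transpose + ORDER BY result DESC LIMIT 5
top = counts.most_common(5)
print(top)
print("Shared friend found:", top[0][1] > 1)  # Shared friend found: False
```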

Although there are other ways to obtain the same result, using jseval and a dataset of type sparse.mutable let us transform our data in a single step, without knowing its characteristics (such as the number of columns) in advance. This illustrates the flexibility such a function adds.

Designing custom feature generators

Another very powerful way the jseval function can be used is as a feature generator. When trying to prototype and iterate quickly, this can be a very efficient way to try out new ideas.

Let's start by creating a toy dataset using descriptions of machine learning concepts from Wikipedia:

print(mldb.put('/v1/procedures/import_ML_concepts', {
    "type": "import.text",
    "params": {
        "dataFileUrl": "http://public.mldb.ai/datasets/MachineLearningConcepts.csv",
        "outputDataset": "ml_concepts",
        "named": "Concepts",
        "select": "Text",
        "runOnCreation": True
    }
}))

<Response [201]>

Taking a peek at our data, we see there is a single column called Text that contains a textual description of an ML concept:

mldb.query("SELECT * FROM ml_concepts")

Let's now create a function of type sql.expression containing a jseval function that calculates different statistics about the string it is given. It calculates things like the number of words in the string, the number of capital letters, etc.

Putting it in an sql.expression allows us to reuse it easily later on.

print(mldb.put("/v1/functions/getStats", {
    "type": "sql.expression",
    "params": {
        "expression": """
            jseval(' 
                var result = {};

                result["len"] = txt.length;
                result["numWords"] = txt.split(" ").length;
                result["numCapital"] = txt.replace(/[^A-Z]/g, "").length;
                result["numExpl"] = txt.replace(/[^!]/g, "").length;
                result["numQst"] = txt.replace(/[^?]/g, "").length;
                result["containsHashSign"] = txt.replace(/[^#]/g, "").length >= 1;
                result["numNumbers"] = txt.replace(/[^0-9]/g, "").length;

                result["capitalProportion"] = result["numCapital"] / result["len"];
                result["explProportion"] = result["numExpl"] / result["len"];
                result["qstProportion"] = result["numQst"] / result["len"];
                result["numberProportion"] = result["numNumbers"] / result["len"];

                return result;
            ', 'txt', text) as stats
        """
    }
}))

<Response [201]>

Now that we have created our getStats function, we can call it on a single string:

mldb.query("SELECT getStats({text: 'This is a test #hopethisworks #mldb'}) as *")
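To sanity-check the statistics, here is a Python port of the JavaScript in getStats, applied to the same test string. It mirrors the JavaScript expressions one by one, with `re.sub` playing the role of `replace` with a global regex. `get_stats` is a hypothetical name, and returning NaN for an empty string is a choice made here to match JavaScript's 0 / 0.

```python
import re


def get_stats(txt):
    # Port of the getStats JavaScript body; keys match the JavaScript result.
    # len() counts code points while JavaScript's length counts UTF-16 units;
    # the two agree for ASCII text such as the example below.
    result = {}
    result["len"] = len(txt)
    result["numWords"] = len(txt.split(" "))
    result["numCapital"] = len(re.sub(r"[^A-Z]", "", txt))
    result["numExpl"] = len(re.sub(r"[^!]", "", txt))
    result["numQst"] = len(re.sub(r"[^?]", "", txt))
    result["containsHashSign"] = len(re.sub(r"[^#]", "", txt)) >= 1
    result["numNumbers"] = len(re.sub(r"[^0-9]", "", txt))

    def proportion(count):
        # JavaScript yields NaN for 0 / 0; Python would raise instead
        return count / result["len"] if result["len"] else float("nan")

    result["capitalProportion"] = proportion(result["numCapital"])
    result["explProportion"] = proportion(result["numExpl"])
    result["qstProportion"] = proportion(result["numQst"])
    result["numberProportion"] = proportion(result["numNumbers"])
    return result


stats = get_stats("This is a test #hopethisworks #mldb")
print(stats["len"], stats["numWords"], stats["numCapital"])  # 35 6 1
print(stats["containsHashSign"])  # True
```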

Looks like it works! We can also call it on the Text column of our ml_concepts dataset to get the statistics for all the rows of our dataset:

mldb.query("SELECT getStats({text: Text}) as * FROM ml_concepts")

Most of this is possible in standard SQL, but the jseval implementation is simple, fast and compact. It is a great way to quickly experiment with ideas, and it gives maximum flexibility to manipulate data.

Where to next?

Check out the other Tutorials and Demos.

Output of the multiple-parameter jseval query:

_rowName	output.mult	output.name
result	10	Bonjour! Hello!

Output of SELECT * NAMED rowName() FROM import_URL2 LIMIT 10:

_rowName	"circle0 3989 4009"
2	circle1\t4010\t4037
3	circle2\t4013
4	circle3\t4024\t3987\t4015
5	circle4\t4006
6	circle5\t4035
7	circle6\t3999\t4028\t4005\t3996\t4031\t4018\t3...
8	circle7\t3984
9	circle8\t3988\t4030\t4026\t4021
10	circle9\t3983\t3992\t4033\t4017\t4000\t3986
11	circle10\t3990\t4007\t4016\t4025

Output of the SELECT query on import_non_formated (LIMIT 5):

_rowName	3989	4009	4010	4037	4013	3987	4015	4024	4006
circle0	1	1	NaN	NaN	NaN	NaN	NaN	NaN	NaN
circle1	NaN	NaN	1	1	NaN	NaN	NaN	NaN	NaN
circle2	NaN	NaN	NaN	NaN	1	NaN	NaN	NaN	NaN
circle3	NaN	NaN	NaN	NaN	NaN	1	1	1	NaN
circle4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1

Output of the transpose query (top 5 friends by number of circles):

_rowName	result
4030	1
4013	1
4020	1
4023	1
3999	1

Output of SELECT * FROM ml_concepts:

_rowName	Text
Artificial neural network	In machine learning and cognitive science, art...
Autoencoder	An autoencoder, autoassociator or Diabolo netw...
Hopfield network	A Hopfield network is a form of recurrent arti...
Boltzmann machine	Boltzmann machine is a type of stochastic recu...
Restricted boltzmann machines	A restricted Boltzmann machine (RBM) is a gene...
Deep belief network	In machine learning, a deep belief network (DB...
Logistic regression	In statistics, logistic regression, or logit r...
Naive bayes classifier	In machine learning, naive Bayes classifiers a...
Support vector machine	In machine learning, support vector machines (...