Melt Procedure

This procedure type allows you to perform a melt operation on a dataset. Melting a dataset takes a set of columns to keep fixed and a set of columns to melt and creates a new rows, one per column to melt, while copying as is the columns to keep fixed.

An example usage is importing JSON data where some fields contain an array of objects and the way we want to process it is one object per row.

Configuration

A new procedure of type melt named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "melt",
    "params": {
        "inputData": <InputQuery>,
        "outputDataset": <OutputDatasetSpec>,
        "keyColumnName": <string>,
        "valueColumnName": <string>,
        "runOnCreation": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, Default	Description
inputData InputQuery	Specification of the data for input to the melt procedure. The select expression must contain these two sub-expressions: one row expression called `to_fix` to identify the columns to keep fixed and another row expression called `to_melt` to identify the columns to melt.
outputDataset OutputDatasetSpec `{"type":"tabular"}`	Configuration for output dataset
keyColumnName string `"key"`	Column name for the key column
valueColumnName string `"value"`	Column name for the value column
runOnCreation bool `true`	If true, the procedure will be run immediately. The response will contain an extra field called `firstRun` pointing to the URL of the run.

Example

Example with a JSON array

Suppose the following dataset data with the friends column containing strings.

rowName	name	age	friends
row1	bill	25	[{"name": "mich", "age": 20}, {"name": "jean", "age": 18}]

We may want to perform operations on the contents of the JSON object in the friends column. To do so, we can perform a melt operation on the output of the parse_json() function.

Doing the query select parse_json(friends, {arrays: 'encode'}) from data will return:

rowName	0	1
row1	{"name": "mich", "age": 20}	{"name": "jean", "age": 18}

We can do the melt like this:

mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
                    SELECT {name, age} as to_fix,
                           {friends*} as to_melt
                    FROM (
                        SELECT name, age,
                               parse_json(friends, {arrays: 'encode'}) AS friends
                        FROM data
                    )""",
        "outputDataset": "melted_data"
    }
})

The melted_data dataset will look like this:

rowName	name	age	key	value
result.friends.0	bill	25	friends.0	{"name": "mich", "age": 20}
result.friends.1	bill	25	friends.1	{"name": "jean", "age": 18}

Example with bags of words

Suppose the following dataset data containing a sentence per row:

rowName	text
row1	hello my friend
row2	hello it's me

By running a melt procedure and using the tokenize function on the text, we can obtain a new dataset with one row per (rowName, word) pair:

mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
            SELECT {rowName() as rowName} as to_fix,
                   {tokenize(text, {splitChars: ' '}) as *} as to_melt
            FROM data
        """,
        "outputDataset": "melted_data"
    }
})

This gives us the following dataset:

rowName	row	key	count
row1.my	row1	my	1
row2.hello	row2	hello	1
row2.me	row2	me	1
row2.it's	row2	it's	1
row1.friend	row1	friend	1
row1.hello	row1	hello	1

Melt Procedure

Configuration

Example

Example with a JSON array

Example with bags of words

See also