Melt Procedure

This procedure type allows you to perform a melt operation on a dataset. Melting a dataset takes a set of columns to keep fixed and a set of columns to melt and creates a new rows, one per column to melt, while copying as is the columns to keep fixed.

An example usage is importing JSON data where some fields contain an array of objects and the way we want to process it is one object per row.

Configuration

A new procedure of type melt named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "melt",
    "params": {
        "inputData": <InputQuery>,
        "outputDataset": <OutputDatasetSpec>,
        "keyColumnName": <string>,
        "valueColumnName": <string>,
        "runOnCreation": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, DefaultDescription

inputData
InputQuery

Specification of the data for input to the melt procedure. The select expression must contain these two sub-expressions: one row expression called to_fix to identify the columns to keep fixed and another row expression called to_melt to identify the columns to melt.

outputDataset
OutputDatasetSpec
{"type":"tabular"}

Configuration for output dataset

keyColumnName
string
"key"

Column name for the key column

valueColumnName
string
"value"

Column name for the value column

runOnCreation
bool
true

If true, the procedure will be run immediately. The response will contain an extra field called firstRun pointing to the URL of the run.

Example

Example with a JSON array

Suppose the following dataset data with the friends column containing strings.

rowName name age friends
row1 bill 25 [{"name": "mich", "age": 20}, {"name": "jean", "age": 18}]

We may want to perform operations on the contents of the JSON object in the friends column. To do so, we can perform a melt operation on the output of the parse_json() function.

Doing the query select parse_json(friends, {arrays: 'encode'}) from data will return:

rowName 0 1
row1 {"name": "mich", "age": 20} {"name": "jean", "age": 18}

We can do the melt like this:

mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
                    SELECT {name, age} as to_fix,
                           {friends*} as to_melt
                    FROM (
                        SELECT name, age,
                               parse_json(friends, {arrays: 'encode'}) AS friends
                        FROM data
                    )""",
        "outputDataset": "melted_data"
    }
})

The melted_data dataset will look like this:

rowName name age key value
result.friends.0 bill 25 friends.0 {"name": "mich", "age": 20}
result.friends.1 bill 25 friends.1 {"name": "jean", "age": 18}

Example with bags of words

Suppose the following dataset data containing a sentence per row:

rowName text
row1 hello my friend
row2 hello it's me

By running a melt procedure and using the tokenize function on the text, we can obtain a new dataset with one row per (rowName, word) pair:

mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
            SELECT {rowName() as rowName} as to_fix,
                   {tokenize(text, {splitChars: ' '}) as *} as to_melt
            FROM data
        """,
        "outputDataset": "melted_data"
    }
})

This gives us the following dataset:

rowName row key count
row1.my row1 my 1
row2.hello row2 hello 1
row2.me row2 me 1
row2.it's row2 it's 1
row1.friend row1 friend 1
row1.hello row1 hello 1

See also