This procedure type allows you to perform a melt operation on a dataset. Melting a dataset takes a set of columns to keep fixed and a set of columns to melt and creates a new rows, one per column to melt, while copying as is the columns to keep fixed.
An example usage is importing JSON data where some fields contain an array of objects and the way we want to process it is one object per row.
A new procedure of type melt
named <id>
can be created as follows:
mldb.put("/v1/procedures/"+<id>, {
"type": "melt",
"params": {
"inputData": <InputQuery>,
"outputDataset": <OutputDatasetSpec>,
"keyColumnName": <string>,
"valueColumnName": <string>,
"runOnCreation": <bool>
}
})
with the following key-value definitions for params
:
Field, Type, Default | Description |
---|---|
inputData | Specification of the data for input to the melt procedure. The select expression must contain these two sub-expressions: one row expression called |
outputDataset | Configuration for output dataset |
keyColumnName | Column name for the key column |
valueColumnName | Column name for the value column |
runOnCreation | If true, the procedure will be run immediately. The response will contain an extra field called |
Suppose the following dataset data
with the friends
column containing strings.
rowName | name | age | friends |
---|---|---|---|
row1 | bill | 25 | [{"name": "mich", "age": 20}, {"name": "jean", "age": 18}] |
We may want to perform operations on the contents of the JSON object in the
friends
column. To do so, we can perform a melt
operation on the output
of the parse_json()
function.
Doing the query select parse_json(friends, {arrays: 'encode'}) from data
will return:
rowName | 0 | 1 |
---|---|---|
row1 | {"name": "mich", "age": 20} | {"name": "jean", "age": 18} |
We can do the melt like this:
mldb.post("/v1/procedures", {
"type": "melt",
"params": {
"inputData": """
SELECT {name, age} as to_fix,
{friends*} as to_melt
FROM (
SELECT name, age,
parse_json(friends, {arrays: 'encode'}) AS friends
FROM data
)""",
"outputDataset": "melted_data"
}
})
The melted_data
dataset will look like this:
rowName | name | age | key | value |
---|---|---|---|---|
result.friends.0 | bill | 25 | friends.0 | {"name": "mich", "age": 20} |
result.friends.1 | bill | 25 | friends.1 | {"name": "jean", "age": 18} |
Suppose the following dataset data
containing a sentence per row:
rowName | text |
---|---|
row1 | hello my friend |
row2 | hello it's me |
By running a melt
procedure and using the tokenize
function on the text,
we can obtain a new dataset with one row per (rowName, word) pair:
mldb.post("/v1/procedures", {
"type": "melt",
"params": {
"inputData": """
SELECT {rowName() as rowName} as to_fix,
{tokenize(text, {splitChars: ' '}) as *} as to_melt
FROM data
""",
"outputDataset": "melted_data"
}
})
This gives us the following dataset:
rowName | row | key | count |
---|---|---|---|
row1.my | row1 | my | 1 |
row2.hello | row2 | hello | 1 |
row2.me | row2 | me | 1 |
row2.it's | row2 | it's | 1 |
row1.friend | row1 | friend | 1 |
row1.hello | row1 | hello | 1 |