This procedure type allows you to perform a melt operation on a dataset. Melting a dataset takes a set of columns to keep fixed and a set of columns to melt and creates a new rows, one per column to melt, while copying as is the columns to keep fixed.
An example usage is importing JSON data where some fields contain an array of objects and the way we want to process it is one object per row.
A new procedure of type melt named <id> can be created as follows:
mldb.put("/v1/procedures/"+<id>, {
"type": "melt",
"params": {
"inputData": <InputQuery>,
"outputDataset": <OutputDatasetSpec>,
"keyColumnName": <string>,
"valueColumnName": <string>,
"runOnCreation": <bool>
}
})with the following key-value definitions for params:
| Field, Type, Default | Description |
|---|---|
inputData | Specification of the data for input to the melt procedure. The select expression must contain these two sub-expressions: one row expression called |
outputDataset | Configuration for output dataset |
keyColumnName | Column name for the key column |
valueColumnName | Column name for the value column |
runOnCreation | If true, the procedure will be run immediately. The response will contain an extra field called |
Suppose the following dataset data with the friends column containing strings.
| rowName | name | age | friends |
|---|---|---|---|
| row1 | bill | 25 | [{"name": "mich", "age": 20}, {"name": "jean", "age": 18}] |
We may want to perform operations on the contents of the JSON object in the
friends column. To do so, we can perform a melt operation on the output
of the parse_json() function.
Doing the query select parse_json(friends, {arrays: 'encode'}) from data will return:
| rowName | 0 | 1 |
|---|---|---|
| row1 | {"name": "mich", "age": 20} | {"name": "jean", "age": 18} |
We can do the melt like this:
mldb.post("/v1/procedures", {
"type": "melt",
"params": {
"inputData": """
SELECT {name, age} as to_fix,
{friends*} as to_melt
FROM (
SELECT name, age,
parse_json(friends, {arrays: 'encode'}) AS friends
FROM data
)""",
"outputDataset": "melted_data"
}
})
The melted_data dataset will look like this:
| rowName | name | age | key | value |
|---|---|---|---|---|
| result.friends.0 | bill | 25 | friends.0 | {"name": "mich", "age": 20} |
| result.friends.1 | bill | 25 | friends.1 | {"name": "jean", "age": 18} |
Suppose the following dataset data containing a sentence per row:
| rowName | text |
|---|---|
| row1 | hello my friend |
| row2 | hello it's me |
By running a melt procedure and using the tokenize function on the text,
we can obtain a new dataset with one row per (rowName, word) pair:
mldb.post("/v1/procedures", {
"type": "melt",
"params": {
"inputData": """
SELECT {rowName() as rowName} as to_fix,
{tokenize(text, {splitChars: ' '}) as *} as to_melt
FROM data
""",
"outputDataset": "melted_data"
}
})
This gives us the following dataset:
| rowName | row | key | count |
|---|---|---|---|
| row1.my | row1 | my | 1 |
| row2.hello | row2 | hello | 1 |
| row2.me | row2 | me | 1 |
| row2.it's | row2 | it's | 1 |
| row1.friend | row1 | friend | 1 |
| row1.hello | row1 | hello | 1 |