This procedure runs an SQL query on a dataset and records the output in another dataset. It is frequently used to reduce, reshape, and reindex datasets.
It is particularly useful for generating training datasets for machine learning algorithms, which require a pre-indexed dataset with all of the features in place.
A new procedure of type `transform` named `<id>` can be created as follows:
    mldb.put("/v1/procedures/"+<id>, {
        "type": "transform",
        "params": {
            "inputData": <InputQuery>,
            "outputDataset": <OutputDatasetSpec>,
            "skipEmptyRows": <bool>,
            "runOnCreation": <bool>
        }
    })
with the following key-value definitions for `params`:
Field | Description |
---|---|
inputData | An SQL statement that selects the rows of a dataset to transform. All of MLDB's SQL expressions are supported, including but not limited to where, when, order by and group by clauses, which can be used to refine the rows to transform. |
outputDataset | Output dataset configuration. This may refer either to an existing dataset or to a fully specified dataset that does not yet exist, which the procedure will create. |
skipEmptyRows | Skip rows from the input dataset where no values are selected. |
runOnCreation | If true, the procedure will be run immediately. The response will contain an extra field called |
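
To make the template above concrete, here is a minimal sketch that assembles a filled-in configuration as a plain Python dictionary. The helper `build_transform_config`, the dataset names `raw_customers` and `customers_clean`, the procedure name `clean_customers`, and the output dataset type `sparse.mutable` are all illustrative assumptions, not values taken from this page. The keyword defaults are choices made for the sketch and are not claimed to match MLDB's own defaults. The `mldb.put` call is left as a comment because it needs a live MLDB connection.

```python
import json


def build_transform_config(input_sql, output_dataset_id,
                           output_dataset_type="sparse.mutable",
                           skip_empty_rows=False, run_on_creation=True):
    """Assemble the JSON body for a transform procedure (hypothetical helper)."""
    return {
        "type": "transform",
        "params": {
            # SQL statement selecting the rows to transform
            "inputData": input_sql,
            # Assumed dataset spec shape: an id plus a dataset type
            "outputDataset": {
                "id": output_dataset_id,
                "type": output_dataset_type,
            },
            "skipEmptyRows": skip_empty_rows,
            "runOnCreation": run_on_creation,
        },
    }


# Hypothetical example: keep two columns and drop rows with no age value.
config = build_transform_config(
    "SELECT age, income FROM raw_customers WHERE age IS NOT NULL",
    "customers_clean",
)

# Show the body that would be sent to the server.
print(json.dumps(config, indent=2))

# With an MLDB client bound to `mldb` (not created in this sketch):
# mldb.put("/v1/procedures/clean_customers", config)
```

Building the body separately from the request makes it easy to inspect or log the configuration before the procedure is created.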