This procedure generates summary statistics for every column of the input dataset. Each column from the inputData will be represented as a row in the outputDataset.
Mixed or non-numeric columns are treated as categorical and the statistics are:
A new procedure of type summary.statistics named <id> can be created as follows:
```python
mldb.put("/v1/procedures/"+<id>, {
    "type": "summary.statistics",
    "params": {
        "inputData": <InputQuery>,
        "outputDataset": <OutputDatasetSpec>,
        "runOnCreation": <bool>
    }
})
```
with the following key-value definitions for params:
Field | Description |
---|---|
inputData | An SQL statement to select the input data. The query must not contain GROUP BY or HAVING clauses and, unlike most select expressions, this one can only select whole columns, not expressions involving columns. So X will work, but not X + 1. If you need derived values in the query, create a dataset with the derived columns as a previous step and use a query on that dataset instead. |
outputDataset | Output dataset configuration. This may refer either to an existing dataset, or a fully specified but non-existing dataset which will be created by the procedure. |
runOnCreation | If true, the procedure will be run immediately. The response will contain an extra field called |
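
As an illustration of the parameters above, the following is a minimal sketch of how the request body might be assembled in Python before being sent. The helper `summary_statistics_config`, the dataset names `my_data` and `my_data_stats`, the procedure name `my_stats`, and the `tabular` output dataset type are assumptions made for this example, not values taken from this page. The final `mldb.put` call is left as a comment because the `mldb` object only exists inside an MLDB Python environment.

```python
import json


def summary_statistics_config(input_query, output_dataset, run_on_creation=True):
    """Build the JSON body for a summary.statistics procedure (hypothetical helper)."""
    return {
        "type": "summary.statistics",
        "params": {
            # Must select whole columns only: "x" is accepted, "x + 1" is not.
            "inputData": input_query,
            # Either an existing dataset or a full spec for one to be created.
            "outputDataset": output_dataset,
            "runOnCreation": run_on_creation,
        },
    }


# "my_data" and "my_data_stats" are placeholder names; "tabular" is an
# assumed dataset type chosen only for illustration.
config = summary_statistics_config(
    "SELECT * FROM my_data",
    {"id": "my_data_stats", "type": "tabular"},
)
print(json.dumps(config, indent=2))

# Inside MLDB, where an `mldb` object is available, the body would be sent as:
# mldb.put("/v1/procedures/my_stats", config)
```

Building the body as a plain dictionary keeps the query and output specification easy to inspect before the procedure is created. If derived values are needed, they would have to be materialised in a separate dataset first, since the query here can only select whole columns.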