Summary Statistics Procedure

This procedure generates summary statistics for every column of the input dataset. Each column selected by inputData is represented as one row in the outputDataset.
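The column-to-row layout can be illustrated with a small pure-Python sketch. It is not MLDB's implementation: the statistic names (min, max, mean, num_unique, most_frequent), the handling of nulls, and the sample rows are all assumptions made for illustration. It only shows that each input column yields one output row, and that a column with mixed or non-numeric values is summarized as categorical.

```python
from collections import Counter
from numbers import Number


def summarize_columns(rows):
    """Return one summary entry per input column (column name -> stats dict)."""
    # Gather the non-null values of every column across all input rows.
    # Skipping nulls is an assumption of this sketch.
    columns = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                columns.setdefault(name, []).append(value)

    summary = {}
    for name, values in columns.items():
        # A column counts as numeric only if every value is a number;
        # mixed or non-numeric columns fall back to categorical.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            summary[name] = {  # hypothetical statistic names
                "kind": "numeric",
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
        else:
            summary[name] = {  # hypothetical statistic names
                "kind": "categorical",
                "num_unique": len(set(values)),
                "most_frequent": Counter(values).most_common(1)[0][0],
            }
    return summary


# Hypothetical input: three rows, three columns ("code" has mixed types).
rows = [
    {"age": 30, "city": "Montreal", "code": 1},
    {"age": 40, "city": "Montreal", "code": "A1"},
    {"age": 50, "city": "Quebec", "code": 2},
]
summary = summarize_columns(rows)
for column_name, stats in summary.items():
    print(column_name, stats)
```

Each key of summary plays the role of a row name in the output dataset, and each statistic plays the role of a column in that row.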

Statistics for numeric columns

Statistics for categorical columns

Mixed or non-numeric columns are treated as categorical, and the statistics are:

Configuration

A new procedure of type summary.statistics named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "summary.statistics",
    "params": {
        "inputData": <InputQuery>,
        "outputDataset": <OutputDatasetSpec>,
        "runOnCreation": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, Default

Description

inputData
InputQuery

An SQL statement that selects the input data. The query must not contain GROUP BY or HAVING clauses. Unlike most select expressions, it can only select whole columns, not expressions involving columns: X will work, but X + 1 will not. If you need derived values, first create a dataset containing the derived columns, then run this procedure on a query over that dataset.

outputDataset
OutputDatasetSpec
{"type":"sparse.mutable"}

Output dataset configuration. This may refer either to an existing dataset or to a fully specified dataset that does not yet exist, in which case the procedure creates it.

runOnCreation
bool
true

If true, the procedure will be run immediately. The response will contain an extra field called firstRun pointing to the URL of the run.
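The whole-column restriction on inputData means derived values need a preparatory step. The sketch below builds the two request payloads for such a pipeline. It assumes that MLDB's transform procedure takes inputData and outputDataset parameters in the same way as this procedure, and that an output dataset can be named with an "id" key. The dataset names (raw_data, derived_data, derived_summary), the procedure ids and the column names are hypothetical.

```python
def transform_config(input_query, output_dataset_id):
    """Payload for a step that materializes derived columns (assumed 'transform' type)."""
    return {
        "type": "transform",
        "params": {
            "inputData": input_query,
            "outputDataset": {"id": output_dataset_id, "type": "sparse.mutable"},
            "runOnCreation": True,
        },
    }


def summary_statistics_config(input_query, output_dataset_id):
    """Payload for the summary.statistics procedure described above."""
    return {
        "type": "summary.statistics",
        "params": {
            "inputData": input_query,
            "outputDataset": {"id": output_dataset_id, "type": "sparse.mutable"},
            "runOnCreation": True,
        },
    }


# Step 1 computes the expression and stores it as an ordinary column.
# Step 2 selects whole columns only, as inputData requires.
steps = [
    (
        "/v1/procedures/derive_columns",
        transform_config(
            "SELECT x, x + 1 AS x_plus_one FROM raw_data", "derived_data"
        ),
    ),
    (
        "/v1/procedures/summarize_derived",
        summary_statistics_config(
            "SELECT x, x_plus_one FROM derived_data", "derived_summary"
        ),
    ),
]

# With a live MLDB connection named mldb, the steps would be sent in order:
# for url, payload in steps:
#     mldb.put(url, payload)
```

Because both payloads set runOnCreation to true, each procedure would run as soon as it is created, so the derived dataset exists before the summary step queries it.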

Examples
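As a minimal sketch, the configuration template above can be filled in as follows. The dataset name my_data, the output dataset id my_data_summary and the procedure id summary_demo are hypothetical, and the "id" key in outputDataset is an assumption about how an output dataset is named. RecordingClient is a stand-in defined here so the snippet runs without a server; it only records the request, and a real MLDB connection exposing put(url, payload) would be passed in its place.

```python
def create_summary_procedure(mldb, proc_id, input_query, output_dataset_id):
    """Create a summary.statistics procedure through an mldb-like client."""
    payload = {
        "type": "summary.statistics",
        "params": {
            # Whole columns only: no expressions, GROUP BY or HAVING.
            "inputData": input_query,
            "outputDataset": {"id": output_dataset_id, "type": "sparse.mutable"},
            # Run immediately when the procedure is created.
            "runOnCreation": True,
        },
    }
    return mldb.put("/v1/procedures/" + proc_id, payload)


class RecordingClient:
    """Hypothetical stand-in for an MLDB connection; it records calls only."""

    def __init__(self):
        self.calls = []

    def put(self, url, payload):
        self.calls.append((url, payload))
        return payload


client = RecordingClient()
create_summary_procedure(
    client, "summary_demo", "SELECT * FROM my_data", "my_data_summary"
)
url, payload = client.calls[0]
print(url)
print(payload)
```

With runOnCreation set to true, the response from a real server would carry the firstRun field described above; the stub does not simulate that response.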