JSON Import Procedure

The JSON Import Procedure type is used to import a text file containing one JSON object per line into a dataset.

Each line is processed using the parse_json builtin function.
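To illustrate the expected input format, here is a minimal Python sketch of line-by-line JSON parsing. It uses Python's standard json module rather than MLDB's parse_json, so it is an approximation and not the procedure's implementation. The helper name read_json_lines and its ignore_bad_lines flag are hypothetical, chosen only to mirror the ignoreBadLines parameter described under Configuration. Numbering lines from 1 is also an assumption.

```python
import json


def read_json_lines(lines, ignore_bad_lines=False):
    """Parse an iterable of text lines, each expected to hold one JSON object.

    Hypothetical helper for illustration only; it is not part of MLDB.
    Returns a list of (line_number, parsed_object) pairs.
    """
    rows = []
    # Assumption: line numbers start at 1.
    for line_number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
            if not isinstance(value, dict):
                raise ValueError("line does not contain a JSON object")
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            if ignore_bad_lines:
                continue  # skip the offending line and keep going
            raise ValueError(f"bad JSON on line {line_number}: {exc}") from exc
        rows.append((line_number, value))
    return rows


sample = [
    '{"a": 1, "b": "x"}',
    'not json',
    '{"a": 2, "c": [1, 2]}',
]
rows = read_json_lines(sample, ignore_bad_lines=True)
```

With ignore_bad_lines=True the second line is dropped and the other two are kept; with the default of False the same input raises an error at line 2.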

Configuration

A new procedure of type import.json named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "import.json",
    "params": {
        "dataFileUrl": <Url>,
        "outputDataset": <OutputDatasetSpec>,
        "limit": <int>,
        "offset": <int>,
        "ignoreBadLines": <bool>,
        "select": <SqlSelectExpression>,
        "where": <string>,
        "named": <string>,
        "arrays": <JsonArrayHandling>,
        "runOnCreation": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, Default, Description

dataFileUrl
Url

URL to load text file from

outputDataset
OutputDatasetSpec
{"type":"tabular"}

Configuration for output dataset

limit
int
0

Maximum number of lines to process

offset
int
0

Skip the first n lines.

ignoreBadLines
bool
false

If true, any line causing an error will be skipped. Any line with an invalid JSON object will cause an error.

select
SqlSelectExpression
"*"

Which columns to use.

where
string
"true"

Which lines to use to create rows.

named
string
"lineNumber()"

Row name expression for output dataset. Note that each row must have a unique name and that names cannot be objects.

arrays
JsonArrayHandling
"parse"

Describes how JSON arrays are encoded in the output. For 'parse' (default), arrays become structured values. For 'encode', arrays containing atoms are sparsified, with the values becoming one-hot keys that each map to boolean true.

runOnCreation
bool
true

If true, the procedure will be run immediately. The response will contain an extra field called firstRun pointing to the URL of the run.
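As a concrete illustration of the parameters above, the following Python sketch builds a complete configuration dictionary. The helper build_import_json_config, the file URL, the dataset name and the procedure name are all hypothetical. The "id" key inside outputDataset is an assumption about OutputDatasetSpec and is not shown in the defaults above. The remaining values are simply the documented defaults written out explicitly.

```python
def build_import_json_config(data_file_url, dataset_id,
                             limit=0, ignore_bad_lines=False):
    """Return a configuration dict for a procedure of type import.json.

    Hypothetical helper for illustration; only the key names under
    "params" come from the documentation above.
    """
    return {
        "type": "import.json",
        "params": {
            "dataFileUrl": data_file_url,
            # Assumption: the output dataset is named through an "id" key.
            "outputDataset": {"id": dataset_id, "type": "tabular"},
            "limit": limit,
            "offset": 0,
            "ignoreBadLines": ignore_bad_lines,
            "select": "*",
            "where": "true",
            "named": "lineNumber()",
            "arrays": "parse",
            "runOnCreation": True,
        },
    }


config = build_import_json_config(
    "file:///path/to/data.json",  # placeholder URL
    "my_dataset",                 # placeholder dataset name
    ignore_bad_lines=True,
)

# In an environment where an `mldb` object is available, the procedure
# could then be created with:
# mldb.put("/v1/procedures/my_import", config)
```

Building the dictionary separately from the mldb.put call makes it easy to inspect or log the configuration before the procedure is created.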

Enumeration JsonArrayHandling

Value, Description
parse

Arrays will be parsed into nested values

encode

Arrays will be encoded as one-hot values
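The difference between the two values can be sketched in Python as follows. This is an illustration of the idea, not MLDB's implementation. The function flatten is hypothetical, the dotted column names (such as tags.0 or tags.red) are an assumption about how nested values are named, and falling back to index-based parsing for arrays that contain non-atoms is also an assumption.

```python
def _join(prefix, key):
    """Build a dotted column path (assumed naming scheme)."""
    return f"{prefix}.{key}" if prefix else str(key)


def flatten(value, arrays="parse", prefix=""):
    """Flatten a parsed JSON value into a {column_name: atom} dict.

    Hypothetical helper for illustration only.
    """
    columns = {}
    if isinstance(value, dict):
        for key, item in value.items():
            columns.update(flatten(item, arrays, _join(prefix, key)))
    elif isinstance(value, list):
        all_atoms = all(not isinstance(item, (dict, list)) for item in value)
        if arrays == "encode" and all_atoms:
            # One-hot: each array value becomes a key mapped to True.
            for item in value:
                columns[_join(prefix, item)] = True
        else:
            # Parse: each element is kept under its position in the array.
            for index, item in enumerate(value):
                columns.update(flatten(item, arrays, _join(prefix, index)))
    else:
        columns[prefix] = value
    return columns


record = {"id": 7, "tags": ["red", "blue"]}
parsed = flatten(record, arrays="parse")
encoded = flatten(record, arrays="encode")
```

Under these assumptions, parsed holds the columns id, tags.0 and tags.1, while encoded holds id, tags.red and tags.blue, with the last two set to true.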

See also