MLDB Type System

This page describes the type system MLDB uses to store and process data.

Atomic types

MLDB's atomic types are the following:

Data Point Timestamps

Every data point stored or manipulated by MLDB has an associated timestamp, even if the data point is itself of type timestamp. Data point timestamps are specified when rows are created, or are given as inputs to procedures that create datasets. Literals appearing in queries have a timestamp of -inf, but any value's timestamp can be changed in a query with the special @ operator, which takes the same right-hand value as the TIMESTAMP keyword. For example, 1 @ '2010-01-02T23:45:33Z' has the finite timestamp 2010-01-02T23:45:33Z.
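To make the "every value carries a timestamp" rule concrete, here is a minimal Python model. It is not MLDB's implementation. DataPoint, literal and at are hypothetical names, and timestamps are represented as float seconds since the Unix epoch so that -inf can be expressed directly.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DataPoint:
    """Hypothetical model: a value paired with its data point timestamp."""
    value: object
    ts: float  # seconds since the Unix epoch; -inf for query literals

def literal(value):
    # Literals appearing in a query carry a timestamp of -inf.
    return DataPoint(value, -math.inf)

def at(point, iso_timestamp):
    # Models `value @ '<timestamp>'`: keep the value, replace its timestamp.
    dt = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return DataPoint(point.value, dt.timestamp())

p = at(literal(1), "2010-01-02T23:45:33Z")
print(p.value, datetime.fromtimestamp(p.ts, tz=timezone.utc).isoformat())
# 1 2010-01-02T23:45:33+00:00
```

The value is untouched by @; only the timestamp attached to it changes.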

Complex types

Within SQL expressions and functions, the type system is more sophisticated. In addition to the types mentioned above, the following are permitted:

When comparing rows, MLDB first sorts each row's columns by name and then performs a lexicographical comparison of the columns' names and values. To illustrate this, consider these rows:

id unflattened_row
id_1 { python : 1, java : 1, c++ : 3 }
id_2 { scala : 4, java : 3, c++ : 1 }
id_3 { python : 1, ada : 2 }

When comparing these rows, MLDB orders them as follows (each row's columns are shown sorted by name):

id unflattened_row
id_3 { ada : 2, python : 1 }
id_2 { c++ : 1, java : 3, scala : 4 }
id_1 { c++ : 3, java : 1, python : 1 }
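The ordering above can be reproduced with a short Python sketch, assuming that column names compare as plain strings and that all values are of a single comparable type (here, integers). row_sort_key is a hypothetical helper, not an MLDB function. Each row becomes a tuple of (name, value) pairs sorted by name, and Python compares such tuples lexicographically: first names, then values, pair by pair.

```python
def row_sort_key(row):
    """Hypothetical helper: the row's (name, value) pairs, sorted by column name."""
    return tuple(sorted(row.items()))

rows = {
    "id_1": {"python": 1, "java": 1, "c++": 3},
    "id_2": {"scala": 4, "java": 3, "c++": 1},
    "id_3": {"python": 1, "ada": 2},
}

# id_3 comes first because "ada" < "c++"; id_2 and id_1 tie on the name
# "c++", so their values (1 < 3) decide.
ordered = sorted(rows, key=lambda row_id: row_sort_key(rows[row_id]))
print(ordered)  # ['id_3', 'id_2', 'id_1']
```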

Similarly, MLDB orders embeddings lexicographically by their values.
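Python lists compare lexicographically in the same element-by-element way, so they give a quick illustration. This sketch assumes embeddings of equal length containing numbers; it does not attempt to model how MLDB treats embeddings of different lengths.

```python
embeddings = [[1, 5, 0], [1, 2, 9], [0, 7, 7]]

# The first differing element decides: 0 < 1, then 2 < 5.
print(sorted(embeddings))  # [[0, 7, 7], [1, 2, 9], [1, 5, 0]]
```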

Complex type flattening

When a complex type is returned as part of an SQL query result or stored in a dataset, it is flattened into a set of columns with atomic values.

Rows are flattened column by column into the parent row, keeping their existing column names. The names are left unprefixed when using the query syntax as *, or prefixed with <prefix>. when using the query syntax as <prefix>.

For example, select {x: 1, y: 2} as output, {x: 3, y: 4} as * yields

output.x output.y x y
1 2 3 4
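The flattening rule for rows can be sketched in Python as follows. flatten_row is a hypothetical helper, and passing prefix=None stands in for the as * syntax; nested rows are not modelled.

```python
def flatten_row(row, prefix=None):
    """Hypothetical helper: copy a row's columns into the parent row.

    prefix=None models `as *`; otherwise each name becomes "<prefix>.<name>".
    """
    if prefix is None:
        return dict(row)
    return {f"{prefix}.{name}": value for name, value in row.items()}

# Models: select {x: 1, y: 2} as output, {x: 3, y: 4} as *
result = {}
result.update(flatten_row({"x": 1, "y": 2}, prefix="output"))
result.update(flatten_row({"x": 3, "y": 4}))
print(result)  # {'output.x': 1, 'output.y': 2, 'x': 3, 'y': 4}
```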

Embeddings are flattened by creating one column per value, named with an incrementing integer starting from 0. These names are prefixed with <prefix>. when using the query syntax as <prefix>; otherwise, a prefix is generated automatically.

For example, select [1,2] as x yields

x.0 x.1
1 2
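A matching Python sketch for embeddings, using a hypothetical flatten_embedding helper. It always takes an explicit prefix and does not model the automatically generated prefix.

```python
def flatten_embedding(values, prefix):
    """Hypothetical helper: one column per value, named "<prefix>.<index>"."""
    return {f"{prefix}.{i}": value for i, value in enumerate(values)}

# Models: select [1,2] as x
print(flatten_embedding([1, 2], "x"))  # {'x.0': 1, 'x.1': 2}
```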

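Because column names are compared as strings, sorting by column name for an embedding with more than 10 values (that is, with indices of 10 or above) may give unexpected results: x.10 sorts before x.2. Applications that need numeric order must handle this themselves, for example by interpreting the index part of the name as an integer.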
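The following Python sketch shows the string-ordering effect and one possible application-level fix. numeric_key is a hypothetical helper that assumes names of the form <prefix>.<integer>; it is not part of MLDB.

```python
names = [f"x.{i}" for i in range(12)]

# Plain string sort: "x.10" and "x.11" land between "x.1" and "x.2".
print(sorted(names)[:4])  # ['x.0', 'x.1', 'x.10', 'x.11']

def numeric_key(name):
    """Hypothetical helper: sort by prefix, then by the integer index."""
    prefix, _, index = name.rpartition(".")
    return (prefix, int(index))

print(sorted(names, key=numeric_key)[:4])  # ['x.0', 'x.1', 'x.2', 'x.3']
```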