The SQL query function type allows the creation of a function to perform an SQL query against a dataset. The query is parameterized on the function input values. SQL query function allows for joins to be implemented in MLDB.
Functions created this way executes the given query against the given dataset, and return a single row (the details of row construction is given below).
A new function of type sql.query named <id> can be created as follows:
mldb.put("/v1/functions/"+<id>, {
"type": "sql.query",
"params": {
"query": <InputQuery>,
"output": <SqlQueryOutput>
}
})with the following key-value definitions for params:
| Field, Type, Default | Description |
|---|---|
query | SQL query to run. The values in the dataset, as well as the input values, will be available for the expression calculation |
output | Controls how the query output is converted into a row. |
The output field has the following possible values:
SqlQueryOutput| Value | Description |
|---|---|
FIRST_ROW | Return only the first row of the query |
NAMED_COLUMNS | Output is a table with a 'value' and optional 'column' column. Output row will be constructed from all of the returned columns, assembled into a single row, with column names provided by the 'column' column, or if null, the row name. |
The function input values are declared in the SQL query by
prefixing their name with a $ character. For UTF-8 names
or those that are not simple identifiers, it is necessary to
put them inside '"' characters. For example,
$x refers to the input value named x.$"x and y" refers to the input value named x and y.The input values are available in the entire expression, including as part of table expressions or join expressions.
There are two possibilities for the output:
output field is set to FIRST_ROW (the default), then
it will outputs the SELECT expression applied to the first matching
row (ie, with an OFFSET of 0 and a LIMIT of 1). If multiple
rows are matched, all but the first will be ignored.output field is set to NAMED_COLUMNS, then the query
must return a two-column dataset with a column and a value
each column. Each row in the output will generate a single
column in the output row, with the column name equal to value of the
column column, and the column value equal to the value of the
value column. In this case the OFFSET and LIMIT will be
respected. This allows for more sophisticated rows to be returned
from queries, and is especially useful in conjunction with joins.As an example, if the table returned from the query is the following
| column | value |
|---|---|
| x | 1 |
| y | 2 |
then for a FIRST_ROW output, we will produce the row corresponding
to the first row of the table, viz
| column | value |
|---|---|
| x | 1 |
whereas for a NAMED_COLUMNS output, we will produce the row with one
column per output row:
| x | y |
|---|---|
| 1 | 2 |
Note that the FIRST_ROW could accept any output format from the query,
whereas the output for NAMED_COLUMNS must have exactly the two columns
given.
WHERE clause
so that it is not necessary to scan the whole table. Otherwise
the queries may be very slow.As an example, the following sql.query object would strip
out any numeric-valued columns and uppercase all names from a
passed in row:
mldb.put("/v1/functions/row_transform", {
"type": "sql.query",
"params": {
"query": """
SELECT upper(column) AS column, value
FROM row_dataset($input)
WHERE CAST (value AS NUMBER) IS NULL
""",
"output": "NAMED_COLUMNS"
}
})
The following SQL yields the following result:
SELECT row_transform({input: {x: 1, y: 2, z: "three"}})[output] AS *
| Z |
|---|
| "three" |
sql.expression function type calculates pure SQL
expressions, without a dataset involved.