The SQL query function type allows the creation of a function to perform an SQL query against a dataset. The query is parameterized on the function input values. SQL query function allows for joins to be implemented in MLDB.
Functions created this way executes the given query against the given dataset, and return a single row (the details of row construction is given below).
A new function of type sql.query
named <id>
can be created as follows:
mldb.put("/v1/functions/"+<id>, {
"type": "sql.query",
"params": {
"query": <InputQuery>,
"output": <SqlQueryOutput>
}
})
with the following key-value definitions for params
:
Field, Type, Default | Description |
---|---|
query | SQL query to run. The values in the dataset, as well as the input values, will be available for the expression calculation |
output | Controls how the query output is converted into a row. |
The output
field has the following possible values:
SqlQueryOutput
Value | Description |
---|---|
FIRST_ROW | Return only the first row of the query |
NAMED_COLUMNS | Output is a table with a 'value' and optional 'column' column. Output row will be constructed from all of the returned columns, assembled into a single row, with column names provided by the 'column' column, or if null, the row name. |
The function input values are declared in the SQL query by
prefixing their name with a $
character. For UTF-8 names
or those that are not simple identifiers, it is necessary to
put them inside '"' characters. For example,
$x
refers to the input value named x
.$"x and y"
refers to the input value named x and y
.The input values are available in the entire expression, including as part of table expressions or join expressions.
There are two possibilities for the output:
output
field is set to FIRST_ROW
(the default), then
it will outputs the SELECT
expression applied to the first matching
row (ie, with an OFFSET
of 0 and a LIMIT
of 1). If multiple
rows are matched, all but the first will be ignored.output
field is set to NAMED_COLUMNS
, then the query
must return a two-column dataset with a column
and a value
each column. Each row in the output will generate a single
column in the output row, with the column name equal to value of the
column
column, and the column value equal to the value of the
value
column. In this case the OFFSET
and LIMIT
will be
respected. This allows for more sophisticated rows to be returned
from queries, and is especially useful in conjunction with joins.As an example, if the table returned from the query is the following
column | value |
---|---|
x | 1 |
y | 2 |
then for a FIRST_ROW
output, we will produce the row corresponding
to the first row of the table, viz
column | value |
---|---|
x | 1 |
whereas for a NAMED_COLUMNS
output, we will produce the row with one
column per output row:
x | y |
---|---|
1 | 2 |
Note that the FIRST_ROW
could accept any output format from the query,
whereas the output for NAMED_COLUMNS
must have exactly the two columns
given.
WHERE
clause
so that it is not necessary to scan the whole table. Otherwise
the queries may be very slow.As an example, the following sql.query
object would strip
out any numeric-valued columns and uppercase all names from a
passed in row:
mldb.put("/v1/functions/row_transform", {
"type": "sql.query",
"params": {
"query": """
SELECT upper(column) AS column, value
FROM row_dataset($input)
WHERE CAST (value AS NUMBER) IS NULL
""",
"output": "NAMED_COLUMNS"
}
})
The following SQL yields the following result:
SELECT row_transform({input: {x: 1, y: 2, z: "three"}})[output] AS *
Z |
---|
"three" |
sql.expression
function type calculates pure SQL
expressions, without a dataset involved.