An SQL From expression specifies the dataset(s) on which a query will run.
The simplest type of From expression is simply a dataset name. In that
case, the query will run on the given dataset. The dataset name can be followed by as <alias>
, in which case the query will be run on the given dataset, as if it was named <alias>
. For example:
SELECT x.* FROM dataset AS x
A full query in parentheses (i.e. ( )
) can be substituted for a dataset name. For example:
SELECT * FROM (SELECT * FROM dataset WHERE column1 = 2) as subselect WHERE column2 = 4'
From Expressions can be combined together to perform joins with the pattern <FromExpression1> <JoinType> <FromExpression2> ON <ValueExpression>
. For example:
SELECT *
FROM (SELECT * FROM dataset WHERE column1 = 2) as subselect
JOIN dataset2 as x ON subselect.column2 = x.column3 + 1
<FromExpression1>
and <FromExpression2>
are called the "left side" and "right side" of the join, respectively, and the <ValueExpression>
is called the "join condition".
The following <JoinType>
s are supported:
left JOIN right
, left INNER JOIN right
The output contains a row for each combination of rows of left
and rows of right
that satisfies the join condition.
left LEFT JOIN right
, left LEFT OUTER JOIN right
First, an inner join is performed. Then, for each row in left
that does not satisfy the join condition with any row in right
, a joined row is added with null values in columns of right
. The output therefore always has at least one row for each row in left
.
left RIGHT JOIN right
, left RIGHT OUTER JOIN right
First, an inner join is performed. Then, for each row in right
that does not satisfy the join condition with any row in left
, a joined row is added with null values in columns of left
. This is the converse of a left join: the output will always have a row for each row in right
.
left OUTER JOIN right
, left FULL JOIN right
, left FULL OUTER JOIN right
First, an inner join is performed. Then, for each row in left
that does not satisfy the join condition with any row in right
, a joined row is added with null values in columns of right
. Also, for each row of right
that does not satisfy the join condition with any row in left
, a joined row with null values in the columns of left
is added. The output therefore always has at least one row for each row of both left
and right
.
Queries can be made to a sample of a dataset by using the sample
function in the FROM expression. For example:
SELECT x.* FROM sample(dataset, {rows: 25, withReplacement: FALSE}) AS x
The second argument, a row expression with the dataset's configuration, is optional. When it is not specified, it use the sampled dataset's default parameters.
See sampled
dataset type for more details.
Queries can be made to the transpose of a dataset by using the transpose() function in the FROM expression. For example:
SELECT x.* FROM transpose(dataset) AS x
See transposed
dataset type for more details.
Queries can be made to the union of several datasets by using the merge() function in the FROM expression. For example:
SELECT x.* FROM merge(dataset1, dataset2, dataset3) AS x
See merged
dataset type for more details.
In some circumstances, it may be useful to use a row as a dataset,
particularly when using the sql.query
function type or a
sub-select. This can be done using the syntax
SELECT ... FROM row_dataset(expression) ...
if a dataset with one row per column in the input row is required, or
SELECT ... FROM atom_dataset(expression) ...
if a dataset with one row per atom in the input row is required (in other words, the row is pre-flattened and a table is made of the scalar values at the leaf nodes).
When this construct is used, a dataset is constructed with one row
for each column or atom in the expression, with one column called value
containing the value of the column, and one column called column
with the column name. For example, the query
SELECT * FROM row_dataset({x: 1, y:2, z: 'three'})
would yield the following result for both row_dataset
and atom_dataset
:
column | value |
---|---|
x | 1 |
y | 2 |
z | "three" |
whereas the following query
SELECT * FROM row_dataset({w: {x: 1, y:1, y:2, z: 'three'}})
would yield the following:
column | value |
---|---|
w | {x: 1, y:1, y:2, z: 'three'} |
whereas replacing row_dataset
with atom_dataset
would yield
SELECT * FROM atom_dataset({w: {x: 1, y:1, y:2, z: 'three'}})
column | value |
---|---|
w.x | 1 |
w.y | 2 |
w.z | "three" |
The COLUMN EXPR
and STRUCTURED COLUMN EXPR
constructs can be used to
similar effect when the goal is to apply a function or transformation to
each element of a row, rather then convert it into a table.
The column
column of the row_dataset
and atom_dataset
functions will
be of type PATH
. This will not compare true with any literal string,
which can be surprising. If it is necessary to compare it with a string, it is
necessary to convert the string into a path with CAST ('stringtocomparewith' AS PATH)
or parse_path('stringtocomparewith')
.
sample
and transpose
join