The sampled dataset type exposes a random sample of the rows of another dataset. The sampling is virtual: no copy of the source dataset is made.
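To make the idea of a virtual sample concrete, here is a minimal conceptual sketch in plain Python. It is not MLDB's implementation: the `SampledView` class and its arguments are hypothetical names chosen for illustration. The view stores only row indices and reads rows from the source on demand, so no rows are copied.

```python
import random


class SampledView:
    """Hypothetical read-only view: keeps sampled row indices, not row copies."""

    def __init__(self, source, rows, with_replacement=False, seed=None):
        rng = random.Random(seed)  # a fixed seed makes the sample reproducible
        n = len(source)
        if with_replacement:
            # The same source row may be picked more than once.
            self._indices = [rng.randrange(n) for _ in range(rows)]
        else:
            # Each source row is picked at most once.
            self._indices = rng.sample(range(n), rows)
        self._source = source

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        # Rows are fetched from the source at access time.
        return self._source[self._indices[i]]


source = [{"id": i} for i in range(1000)]
view = SampledView(source, rows=5, seed=42)
print([view[i]["id"] for i in range(len(view))])
```

Because the view holds references into `source`, each element it returns is the same object as the corresponding source row.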
A new dataset of type sampled named <id> can be created as follows:
    mldb.put("/v1/datasets/"+<id>, {
        "type": "sampled",
        "params": {
            "rows": <int>,
            "fraction": <float>,
            "withReplacement": <bool>,
            "dataset": <SqlFromExpression>,
            "seed": <int>
        }
    })
with the following key-value definitions for params:
Field | Type | Description |
---|---|---|
rows | int | Number of rows to sample. |
fraction | float | Fraction of rows to sample. |
withReplacement | bool | Sample with or without replacement. Sampling with replacement means that the same input row can appear in the output more than once. |
dataset | SqlFromExpression | Dataset to sample. |
seed | int | Seed value for the random number generator, which makes random samples reproducible. This parameter is optional; by default, a seed is selected randomly for each sample. |
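As an illustration of how these parameters fit together, the sketch below builds a configuration for a sampled dataset. The helper `sampled_dataset_config`, the dataset names `my_source` and `my_sample`, and the `{"id": ...}` form used to refer to an existing dataset are assumptions made for this example, not part of the reference above. The final `mldb.put` call is left as a comment because the `mldb` object is only available inside an MLDB Python environment.

```python
import json


def sampled_dataset_config(dataset, rows=None, fraction=None,
                           with_replacement=False, seed=None):
    """Hypothetical helper: build the config body for a 'sampled' dataset."""
    params = {"dataset": dataset, "withReplacement": with_replacement}
    # Only include the optional fields the caller actually supplied.
    if rows is not None:
        params["rows"] = rows
    if fraction is not None:
        params["fraction"] = fraction
    if seed is not None:
        params["seed"] = seed
    return {"type": "sampled", "params": params}


# Sample 10% of the rows of an existing dataset, reproducibly.
config = sampled_dataset_config({"id": "my_source"}, fraction=0.1, seed=42)
print(json.dumps(config, indent=2))

# Inside MLDB, where `mldb` is defined:
# mldb.put("/v1/datasets/my_sample", config)
```

Passing the same `seed` again should yield the same sample, which is useful when a result needs to be reproduced.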
The sample function can also be used within From expressions.
The merged dataset type is another dataset transformation.
The transform procedure type can be used to modify a dataset ready for merging.