Stats Table Derived Columns Function Generator Procedure (Experimental)

This procedure is used with stats tables and makes the process of specifying the SQL expression representing the derived columns over the stats tables' counts easier. This is acheived by allowing the specification of an expression template that will be expanded for all stats tables.

Configuration

A new procedure of type experimental.statsTable.derivedColumnsGenerator named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "experimental.statsTable.derivedColumnsGenerator",
    "params": {
        "runOnCreation": <bool>,
        "functionId": <string>,
        "statsTableFileUrl": <Url>,
        "expression": <string>
    }
})

with the following key-value definitions for params:

Field, Type, DefaultDescription

runOnCreation
bool
true

If true, the procedure will be run immediately. The response will contain an extra field called firstRun pointing to the URL of the run.

functionId
string

ID to use for the instance of the sql.expression function type that will be created

statsTableFileUrl
Url

URL of the model file (with extension '.st') to load. This file is created by the statsTable.train procedure type.

expression
string

Expression to be expanded

Template rules

In the following, TBL represents the current stat table's name.

Example

We will reuse the same example as in the statsTable.train procedure type, which is using an online ad campaign. Below are the stats table counts:

rowName trial-host click-host purchase-host trial-region click-region purchase-region
br_1 0 0 0 0 0 0
br_2 0 0 0 0 0 0
br_3 1 0 1 1 1 0
br_4 1 0 1 2 1 1

One derived column that is useful is the ratio of clicks to impressions, or click-through ratio (CTR). The SQL expression to represent this for the host stats table is "click-host"/"trial-host". In real-life situations, we have many stats table to consider so we will provide a template for the procedure to expand.

The following expression:

"click"/"trial" as "ctr_$tbl", ln("purchase"+1) as "logPurchase_$tbl"

will produce this expanded expression for the dataset above:

"click-host"/"trial-host" as "ctr_host", 
log("purchase-host"+1) as "logPurchase_host", 
"click-region"/"trial-region" as "ctr_region",
log("purchase-host"+1) as "logPurchase_host"

As a best practice, it is recommended to always put quotes around columns when specifying expressions.

Applying the expanded expression to the br_4 row will produce the following output:

rowName ctr_host logPurchase_host ctr_region logPurchase_region
br_4 0 0.3 0.5 0.3

See also