Feature Hashing Function

The hashed column feature generator function is a vectorizer. It can be applied to large or variable-length rows, or to fixed-length rows with many categorical values, to produce a smaller, fixed-length numerical feature vector.

Configuration

A new function of type feature_hasher named <id> can be created as follows:

mldb.put("/v1/functions/"+<id>, {
    "type": "feature_hasher",
    "params": {
        "numBits": <int>,
        "mode": <HashingMode>
    }
})

with the following key-value definitions for params:

Field: numBits
Type: int
Default: 8
Description: Number of bits to use for the hash. The number of resulting buckets will be \(2^{\text{numBits}}\).

Field: mode
Type: HashingMode
Default: "columns"
Description: Hashing mode to use. Controls what gets hashed.
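
As a concrete illustration of the fields above, the following Python sketch builds the request body for a feature_hasher function with explicit values and computes the resulting bucket count. The helper names feature_hasher_config and num_buckets and the function id "my_hasher" are hypothetical and not part of MLDB. The final call is left commented out because it only works where the mldb object is available.

```python
def feature_hasher_config(num_bits=8, mode="columns"):
    """Build the body for a feature_hasher function (hypothetical helper).

    The defaults mirror the documented defaults: numBits=8, mode="columns".
    """
    return {
        "type": "feature_hasher",
        "params": {"numBits": num_bits, "mode": mode},
    }


def num_buckets(num_bits):
    """Number of hash buckets, i.e. 2 ** numBits."""
    return 1 << num_bits


config = feature_hasher_config(num_bits=10)
print(config)
print(num_buckets(10))  # 1024

# Inside an MLDB Python environment, where `mldb` is provided
# ("my_hasher" is an illustrative id):
# mldb.put("/v1/functions/my_hasher", config)
```

With the default of 8 bits the output has 256 buckets; each additional bit doubles that number.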

Hashing mode

The mode field controls what gets hashed:

Input and Output Value

Functions of this type have a single input value called columns, which is a row, and a single output value called hash, which is a row of size \(2^{\text{numBits}}\).
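
To make the shape of the input and output concrete, here is a minimal Python sketch of the general hashing trick: each column name is hashed to one of \(2^{\text{numBits}}\) buckets and its value is added to that bucket. This is a generic illustration, not MLDB's implementation. The choice of MD5, the way values are accumulated, and the name hash_columns are all assumptions made for the example.

```python
import hashlib


def hash_columns(row, num_bits=8):
    """Map a dict of column name -> numeric value to a fixed-length list.

    Generic hashing trick for illustration only; MLDB's actual hash
    function and accumulation rule may differ.
    """
    size = 1 << num_bits  # 2 ** num_bits buckets
    buckets = [0.0] * size
    for name, value in row.items():
        # MD5 is used here only because it is deterministic across runs.
        digest = hashlib.md5(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "little") % size
        buckets[index] += value  # colliding columns share a bucket
    return buckets


narrow = {"age": 42.0, "height": 1.8, "clicks": 7.0}
wide = {"col%d" % i: 1.0 for i in range(1000)}

# Both rows map to the same output length, regardless of input width.
print(len(hash_columns(narrow, num_bits=4)))  # 16
print(len(hash_columns(wide, num_bits=4)))    # 16
```

The output length depends only on numBits, which is why rows of very different widths become comparable feature vectors. The trade-off is that distinct columns can collide in the same bucket, and collisions become more frequent as numBits decreases.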

See also