SVD Row Embedding

Functions of this type embed a row of a dataset into an SVD space, producing a singular vector, using a model previously trained by a procedure of type svd.train.

Configuration

A new function of type svd.embedRow named <id> can be created as follows:

mldb.put("/v1/functions/"+<id>, {
    "type": "svd.embedRow",
    "params": {
        "modelFileUrl": <Url>,
        "maxSingularValues": <int>,
        "acceptUnknownValues": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, Default, Description

modelFileUrl
Url

URL of the model file (with extension '.svd') to load. This file is created by the svd.train procedure type.

maxSingularValues
int
-1

Maximum number of singular values to use (-1 = all)

acceptUnknownValues
bool
false

This parameter controls whether unknown values are accepted by the SVD. An unknown value occurs when a column that was always a number in training is presented with a string value, or vice versa, or when a string-valued column is presented with a value that was not seen in training. If the parameter is true, an unknown value is silently ignored. If it is false, an unknown value causes an error when the function is applied.
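As a concrete illustration, the sketch below fills in the placeholders with made-up values. The function name my_svd_embedder and the model URL are hypothetical, and the mldb object is assumed to be the one available inside MLDB's Python environment, so the call itself is left commented out.

```python
# Hypothetical values throughout: the function name and model URL are
# placeholders for illustration, not taken from a real deployment.
function_id = "my_svd_embedder"

config = {
    "type": "svd.embedRow",
    "params": {
        # Model file written earlier by an svd.train procedure.
        "modelFileUrl": "file://models/example.svd",
        # -1 uses every singular value stored in the model.
        "maxSingularValues": -1,
        # False (the default): an unknown value produces an error
        # when the function is applied.
        "acceptUnknownValues": False,
    },
}

path = "/v1/functions/" + function_id

# Where an `mldb` object is available, this would create the function:
# mldb.put(path, config)
```

Leaving acceptUnknownValues at false during development surfaces type mismatches early; switching it to true is a choice to tolerate them.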

Input and Output Value

Functions of this type have a single input value called row, which is a row. The columns expected in this row depend on the features that were trained into the SVD model. For example, if the training input was "select": "x,y", then the function will expect two columns called x and y.

These functions have a single output value called embedding, which contains a row. The columns of this row are named with the outputColumn parameter of the svd.train procedure that trained the model as a prefix, followed by a 4-digit number for each of the singular values. By default, the outputColumn parameter is svd, so the columns of the output row will be svd0001, svd0002, etc.
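The naming scheme can be sketched in a few lines of Python. The helper embedding_column_names is hypothetical and not part of MLDB; it assumes numbering starts at 1, as the svd0001 example suggests.

```python
def embedding_column_names(num_singular_values, output_column="svd"):
    """Build the expected output column names: prefix plus a 4-digit index.

    Assumption: the index starts at 1, inferred from the svd0001 example.
    """
    return [
        f"{output_column}{i:04d}"
        for i in range(1, num_singular_values + 1)
    ]


names = embedding_column_names(3)
print(names)  # ['svd0001', 'svd0002', 'svd0003']
```

A helper like this can be handy for selecting embedding columns by name in downstream code when the number of singular values is known.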

Validation

When an SVD procedure is trained, it infers the type of input values and does feature extraction based upon the types seen in training. If the type of an input value passed into the function doesn't match the input value type seen in training, then the SVD may not give sensible outputs. This happens in the following situations:

• A column was always a number in training, but a string value is passed in on the corresponding input value;
• A column had string values in training, but the string value passed in was not seen at all in training.

The acceptUnknownValues parameter controls what happens in these situations. If the value of that parameter is true, then a column with an unknown value will be ignored completely. If the value of the parameter is false, then the application of the function will return an error.
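The two behaviours can be mimicked with a small pure-Python simulation. This is only a sketch of the rule as described above, not MLDB's implementation: the check_row function, the UnknownValueError class, and the trained_columns format are all invented for illustration, and columns absent from training are not handled.

```python
class UnknownValueError(ValueError):
    """Raised by this sketch when a value does not match what training saw."""


def check_row(row, trained_columns, accept_unknown_values=False):
    """Keep the known values of `row`; drop or reject the unknown ones.

    trained_columns maps each column name to either the string "number"
    (column was numeric in training) or a set of the strings seen in training.
    """
    kept = {}
    for name, value in row.items():
        spec = trained_columns[name]
        if spec == "number":
            known = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            known = isinstance(value, str) and value in spec
        if known:
            kept[name] = value
        elif not accept_unknown_values:
            raise UnknownValueError(
                f"unknown value {value!r} for column {name!r}"
            )
        # accept_unknown_values is True: the column is silently dropped.
    return kept


trained = {"x": "number", "colour": {"red", "blue"}}
row = {"x": "12", "colour": "red"}  # "12" is a string, but x was numeric

lenient = check_row(row, trained, accept_unknown_values=True)
print(lenient)  # {'colour': 'red'}
```

With accept_unknown_values left at False, the same call raises UnknownValueError for column x instead of dropping it.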

The main use of that parameter is to catch errors in the development phase, for example accidentally encoding a parameter as a string when it should be an int, or mixing up column names.