Support Vector Machine Training Procedure

This procedure trains a Support Vector Machine (SVM) model and stores the model file to disk. It is a wrapper around the popular open-source LIBSVM library. For more information about LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Configuration

A new procedure of type svm.train named <id> can be created as follows:

mldb.put("/v1/procedures/"+<id>, {
    "type": "svm.train",
    "params": {
        "trainingData": <InputQuery>,
        "modelFileUrl": <Url>,
        "configuration": <JSON>,
        "functionName": <string>,
        "svmType": <SVMType>,
        "runOnCreation": <bool>
    }
})

with the following key-value definitions for params:

Field, Type, Default	Description
trainingData InputQuery	Specification of the data for input to the SVM procedure. This should be organized as an embedding, with each selected row containing the same set of columns with numeric values to be used as coordinates. The select statement does not support GROUP BY and HAVING clauses.
modelFileUrl Url	URL where the model file (with extension '.svm') should be saved. This file can be loaded by a function of type 'svm'.
configuration JSON	Configuration object to use for the SVM procedure. The available fields are listed under Configuration Contents. If none is passed, the configuration will be loaded from the ConfigurationFile parameter.
functionName string	If specified, an SVM function of this name will be created using the trained SVM.
svmType SVMType `"classification"`	Type of SVM to train. See Type of SVM below for the accepted values.
runOnCreation bool `true`	If true, the procedure will be run immediately. The response will contain an extra field called `firstRun` pointing to the URL of the run.
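
As an illustration, the parameters above might be assembled as in the sketch below. It only builds the route and payload. The procedure id, query, dataset name, model file URL and function name are hypothetical values, and `build_svm_train_request` is a helper invented for this example, not part of MLDB.

```python
import json


def build_svm_train_request(procedure_id, training_query, model_file_url,
                            svm_type="classification", function_name=None,
                            configuration=None, run_on_creation=True):
    """Return the (route, payload) pair for creating an svm.train procedure."""
    params = {
        "trainingData": training_query,
        "modelFileUrl": model_file_url,
        "svmType": svm_type,
        "runOnCreation": run_on_creation,
    }
    # Optional fields are only sent when the caller provides them.
    if function_name is not None:
        params["functionName"] = function_name
    if configuration is not None:
        params["configuration"] = configuration
    route = "/v1/procedures/" + procedure_id
    return route, {"type": "svm.train", "params": params}


# Hypothetical values, for illustration only.
route, payload = build_svm_train_request(
    "my_svm_trainer",
    "select * from my_training_set",
    "file://models/my_model.svm",
    function_name="my_svm_function",
)
print(route)
print(json.dumps(payload, indent=4))
# Inside MLDB, the pair would then be sent with: mldb.put(route, payload)
```

Leaving `configuration` out of the payload when it is not given keeps the request minimal and lets the procedure fall back to its own defaults.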

Type of SVM

There are 5 types of SVM that can be trained:

`classification` trains a regular SVM for multi-class classification.
`nu-classification` trains the nu version of multi-class SVM classification.
`one class` trains a one-class SVM that evaluates how similar a vector is to the training input.
`regression` trains an SVM for regression.
`nu-regression` trains the nu version of SVM for regression.

In the nu versions of the SVM, the nu parameter controls the number of support vectors: it is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.

You can choose the type of SVM with the svmType parameter of the training procedure.
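
A client might want to validate the svmType value before creating the procedure. The following is a minimal sketch of such a check, using the five names listed above. The helpers `check_svm_type` and `uses_nu` are hypothetical and not part of MLDB. The set of types that read the nu parameter follows the description of nu in this document.

```python
# The five svmType values listed in this document.
SVM_TYPES = ("classification", "nu-classification", "one class",
             "regression", "nu-regression")

# Types that read the nu parameter: the nu versions and the one-class SVM.
NU_TYPES = ("nu-classification", "nu-regression", "one class")


def check_svm_type(svm_type):
    """Return svm_type unchanged, or raise ValueError if it is not listed."""
    if svm_type not in SVM_TYPES:
        raise ValueError("unknown svmType %r; expected one of: %s"
                         % (svm_type, ", ".join(SVM_TYPES)))
    return svm_type


def uses_nu(svm_type):
    """Tell whether the nu parameter is relevant for this type of SVM."""
    return check_svm_type(svm_type) in NU_TYPES


print(uses_nu("nu-regression"))
print(uses_nu("classification"))
```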

Label

You must set the label parameter of the training procedure to specify which column in the input is to be used as the label for classification, or as the target value for regression. All other columns will be used as the feature vector.

Configuration Contents

Here are the fields that you can specify in configuration:

kernel specifies the type of SVM kernel to be used (see below). Default value is 'rbf'.
degree specifies the degree of the polynomial for polynomial kernels. Default value is 3.
coef0 specifies the constant coefficient used in the polynomial and sigmoid kernels. Default value is 0.
eps specifies the tolerance of the stopping criterion for SVM training. Default value is 1e-3.
C specifies the cost parameter C, which penalizes training errors. Default value is 1.
gamma specifies the gamma parameter of the kernels that use it. Default value is 1 divided by the number of features.
nu specifies the nu parameter for the nu versions of the SVM and for the one-class SVM. Default value is 0.5.
p specifies the p parameter for SVM regression, that is, the epsilon of the epsilon-insensitive loss function. Default value is 0.1.
shrinking specifies whether to use the shrinking heuristics (0 or 1). Default value is 1.
probability specifies whether to train the model for probability estimates (0 or 1). Default value is 0.
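
The defaults above can be written out as a configuration object. This is a sketch, assuming each field is passed under the name listed above with a plain JSON number or string as its value. `default_configuration` is a hypothetical helper, not part of MLDB, and the number of features is supplied by the caller because the default gamma depends on it.

```python
def default_configuration(num_features):
    """Return the documented default values as a plain dictionary."""
    if num_features < 1:
        raise ValueError("num_features must be at least 1")
    return {
        "kernel": "rbf",
        "degree": 3,
        "coef0": 0,
        "eps": 1e-3,
        "C": 1,
        "gamma": 1.0 / num_features,  # default: 1 / number of features
        "nu": 0.5,
        "p": 0.1,
        "shrinking": 1,
        "probability": 0,
    }


# Start from the defaults and override only what differs.
config = default_configuration(num_features=4)
config.update({"C": 10, "probability": 1})
print(config)
```

Only the overridden fields would need to be sent in practice. Writing the defaults out mainly makes explicit what an empty configuration amounts to.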

Kernels

The following types of kernels are supported, where x and y are feature vectors: