This procedure trains a classification model and stores the model file to disk.
A new procedure of type classifier.train named <id> can be created as follows:
mldb.put("/v1/procedures/"+<id>, {
    "type": "classifier.train",
    "params": {
        "mode": <ClassifierMode>,
        "multilabelStrategy": <MultilabelStrategy>,
        "trainingData": <InputQuery>,
        "algorithm": <string>,
        "configuration": <JSON>,
        "configurationFile": <string>,
        "equalizationFactor": <float>,
        "modelFileUrl": <Url>,
        "functionName": <string>,
        "runOnCreation": <bool>
    }
})
with the following key-value definitions for params:
Field | Description |
---|---|
mode | Model mode: boolean, regression, categorical or multilabel. |
multilabelStrategy | Multilabel strategy: controls how multilabel classification is handled. |
trainingData | SQL query which specifies the features, labels and optional weights for training. The select expression must contain two columns, one holding the features and one holding the label, and can contain an optional weight column. |
algorithm | Algorithm to use to train the classifier. This must point to an entry in the configuration or configurationFile parameters. See the classifier configuration documentation for details. |
configuration | Configuration object to use for the classifier. Each classifier has its own parameters. If none is passed, then the configuration will be loaded from the configurationFile parameter. See the classifier configuration documentation for details. |
configurationFile | File to load configuration from. This is a JSON file containing only objects, strings and numbers. If the configuration object is non-empty, then that will be used preferentially. See the classifier configuration documentation for details. |
equalizationFactor | Amount to adjust weights so that all classes have an equal total weight. A value of 0 will not equalize weights at all. A value of 1 will ensure that the total weight for both positive and negative examples is exactly identical. A value in between will choose a balanced tradeoff. Typically 0.5 (default) is a good number to use for unbalanced probabilities. See the classifier configuration documentation for details. |
modelFileUrl | URL where the model file (with extension '.cls') should be saved. This file can be loaded by the classifier function type. |
functionName | If specified, an instance of the classifier function type with this name will be created. |
runOnCreation | If true, the procedure will be run immediately. The response will then contain an extra field. |
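
The effect of equalizationFactor at its two endpoints can be illustrated with a small, self-contained sketch. The rescaling rule used here (multiply each class by `(mean_total / class_total) ** factor`) is an assumption chosen only because it leaves weights untouched at 0 and makes class totals identical at 1; it is not taken from MLDB and may differ from what the procedure actually does. The function names are hypothetical.

```python
from collections import defaultdict


def class_totals(labels, weights):
    """Sum the example weights per class label."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return dict(totals)


def equalize_weights(labels, weights, factor):
    """Rescale example weights toward equal per-class totals.

    Assumed rule (NOT MLDB's documented formula): every example of class c
    is multiplied by (mean_total / total_c) ** factor, so that
    factor == 0 changes nothing and factor == 1 equalizes the class totals.
    Expects at least one example.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    totals = class_totals(labels, weights)
    mean_total = sum(totals.values()) / len(totals)
    return [
        weight * (mean_total / totals[label]) ** factor
        for label, weight in zip(labels, weights)
    ]


# One positive example against four negatives, all with weight 1.
labels = [True, False, False, False, False]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

unchanged = equalize_weights(labels, weights, 0.0)
balanced = equalize_weights(labels, weights, 1.0)
halfway = equalize_weights(labels, weights, 0.5)
```

With a factor of 0.5 the positive class ends up with a total weight between its original total and the fully balanced one, which is the tradeoff the table describes.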
This procedure supports many training algorithms. The configuration is explained on the classifier configuration page.
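The precedence between configuration and configurationFile described in the parameter table can be sketched as follows. This is only an illustration of the stated rule (a non-empty configuration object wins, otherwise the file is read); the helper name is hypothetical, the placeholder entries are not real algorithm settings, and the file is treated as a local path for simplicity.

```python
import json
import os
import tempfile


def resolve_configuration(configuration=None, configuration_file=None):
    """Pick the configuration the way the parameter table describes.

    Hypothetical helper: a non-empty `configuration` object is used
    preferentially; otherwise the JSON file is loaded.
    """
    if configuration:
        return configuration
    if configuration_file:
        with open(configuration_file) as handle:
            return json.load(handle)
    raise ValueError("neither configuration nor configurationFile was given")


# Demonstration with a throwaway JSON file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classifiers.json")
    with open(path, "w") as handle:
        json.dump({"from_file": {"type": "placeholder"}}, handle)

    # Empty inline object: the file is used.
    loaded = resolve_configuration({}, path)
    # Non-empty inline object: it takes precedence over the file.
    inline = resolve_configuration({"inline": {"type": "placeholder"}}, path)
```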
Once training has run, the status of a classifier.train procedure returns a JSON representation of the model parameters of the trained classifier, to allow introspection.
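To make the parameter template more concrete, here is a minimal sketch that assembles a request body as a plain Python dictionary. Everything filled in is hypothetical: the helper name, the procedure id, the dataset and column names in the query, the algorithm entry and the model file location. The actual mldb.put call is left as a comment because it needs a running MLDB instance.

```python
# Modes listed in this document.
VALID_MODES = ("boolean", "regression", "categorical", "multilabel")


def build_train_payload(mode, training_data, algorithm, configuration,
                        model_file_url, function_name=None,
                        equalization_factor=0.5, run_on_creation=True):
    """Assemble the body for a classifier.train procedure (illustrative)."""
    if mode not in VALID_MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if algorithm not in configuration:
        # "algorithm" must point to an entry in the configuration.
        raise ValueError("algorithm %r is not in the configuration" % (algorithm,))
    params = {
        "mode": mode,
        "trainingData": training_data,
        "algorithm": algorithm,
        "configuration": configuration,
        "equalizationFactor": equalization_factor,
        "modelFileUrl": model_file_url,
        "runOnCreation": run_on_creation,
    }
    if function_name is not None:
        params["functionName"] = function_name
    return {"type": "classifier.train", "params": params}


payload = build_train_payload(
    mode="boolean",
    # Hypothetical dataset and column names.
    training_data="select {f1, f2} as features, x as label from ds",
    algorithm="my_algorithm",
    # Placeholder entry; real settings are on the classifier configuration page.
    configuration={"my_algorithm": {"type": "decision_tree"}},
    model_file_url="file://models/example.cls",
    function_name="my_classifier",
)

# With a running MLDB instance, this would create the procedure:
# mldb.put("/v1/procedures/my_trainer", payload)
```

Validating the mode and the algorithm entry before sending the request is a design choice of this sketch, not something the procedure requires of the caller.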
The mode field controls which mode the classifier will operate in:

- boolean mode will use a boolean label, and will predict the probability of the label being true as a single floating point number.
- regression mode will use a numeric label, and will predict the value of the label itself.
- categorical mode will use a categorical (multi-class) label, and will predict the probability of each of the categories independently. This style therefore produces multiple outputs.
- multilabel mode will do multi-label classification by using a set of categorical (multi-class) labels, and will predict the probability of each of the categories independently. This style therefore produces multiple outputs. The multilabelStrategy field controls how multilabel classification is handled.

In all operation modes but multilabel, the label is a single scalar value. The multilabel mode handles categorical classification problems where each example has a set of labels instead of a single one. To this end the label input must be a row. In this row each column with a non-null value will be a label value in the example's set. The column name is used to identify the label, while the value itself is disregarded. This makes multi-label classification easy to use with bag of words, for example.
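
The way a label row is read in multilabel mode can be sketched in a few lines of plain Python. A row is modelled here as a dictionary from column name to value, with None standing in for null; both helper names are hypothetical and this is not MLDB code, only an illustration of the rule that column names with non-null values form the label set.

```python
def label_set_from_row(row):
    """Return the labels encoded by a multilabel label row.

    Every column with a non-null value contributes its *name* as a label;
    the value itself is ignored.
    """
    return {column for column, value in row.items() if value is not None}


def bag_of_words(text):
    """Count word occurrences, giving one column per distinct word."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Null columns are skipped; a value of 0 is non-null, so "tech" still counts.
explicit = label_set_from_row({"sports": 1, "politics": None, "tech": 0})

# A bag of words can be used directly as a label row.
from_text = label_set_from_row(bag_of_words("Red red blue"))
```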

- The classifier.test procedure type allows the accuracy of a predictor to be tested against held-out data.
- The probabilizer.train procedure type trains a probabilizer.
- The classifier function type applies a classifier to a feature vector, producing a classification score.
- The classifier.explain function type explains how a classifier produced its output.
- The probabilizer function type works with classifier.apply to convert scores to probabilities.