Classifier configuration

MLDB supports many algorithms for supervised learning tasks, both classification and regression. Two procedure types are available to train a model: classifier.train and classifier.experiment.

Both of those procedures share the configuration keys algorithm, configuration, configurationFile and equalizationFactor. This document explains how to use them.

Methods of configuring classifier training

There are three ways of configuring which classifier will be trained:

  1. Leave configuration and configurationFile empty, and choose a standard algorithm configuration by name. See below for the contents of the default configurationFile.
  2. Put the configuration inline in the configuration parameter (JSON) and set algorithm either to empty (if the configuration is at the top level) or to the dot-separated path to it if it is not. See below for details on specifying your own configuration; an example is sketched just after this list.
  3. Put the configuration in an external resource identified by the configurationFile parameter, and set algorithm as in method 2. See below for details on specifying your own configurationFile.
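
For instance, with method 2, a small configuration could be passed inline and selected with a dot-separated path (the names trees and my_dt below are purely illustrative):

"configuration": {
    "trees": {
        "my_dt": {
            "type": "decision_tree",
            "max_depth": 8,
            "update_alg": "prob"
        }
    }
},
"algorithm": "trees.my_dt"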

Configuration file contents

A configuration JSON object, or the contents of a configurationFile, looks like this (see below for the contents of the default, overridable configurationFile):

{
    "algorithm_name": {
        "type": "classifier_type",
        "parameter": "value",
        ...
    },
    ...
}

The classifier training procedure includes support for the following types of classifiers.

These classifiers tend to be high-performance implementations of well-known algorithms which train and predict quickly, and they are often a good default choice when a generic classification step is required.

Algorithms

Decision Trees (type=decision_tree)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
trace | 0- | 0 | trace execution of training in a very fine-grained fashion
max_depth | 0- or -1 | -1 | maximum tree depth; -1 means grow until the data are separated
update_alg | normal, gentle, prob | prob | select the type of output that the tree gives
random_feature_propn | 0.0-1.0 | 1 | proportion of the features to enable (for random forests)

The update_alg parameter can take three different values: prob, normal and gentle. Using an example with a leaf node that contains 8 positive and 2 negative labels, they behave as follows:

  * prob outputs the empirical probability of the positive class at the leaf: \(8 / (8 + 2) = 0.8\).
  * normal outputs the half log-odds used by standard (real) boosting: \(\frac{1}{2} \ln \left( \frac{0.8}{0.2} \right) \approx 0.69\).
  * gentle outputs the difference of class proportions used by gentle boosting: \(0.8 - 0.2 = 0.6\).

For more details, please refer to Friedman, Hastie and Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting", The Annals of Statistics, 2000, Vol. 28, No. 2, 337–407.

Generalized Linear Models (type=glz)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
add_bias | true/false/1/0 | true | add a constant bias term to the classifier?
decode | true/false/1/0 | true | run the decoder (link function) after classification?
link_function | logit, probit, comp_log_log, linear, log | logit | which link function to use for the output function
regularization | none, l1, l2 | l2 | type of regularization on the weights (L1 is slower due to an iterative algorithm)
regularization_factor | -1 to infinity | 1e-05 | regularization factor to use; auto-determined if negative (slower); the bigger this value is, the more the weights are regularized
max_regularization_iteration | 1 to infinity | 1000 | maximum number of iterations for the L1 regularization
regularization_epsilon | positive number | 0.0001 | smallest weight update before assuming convergence of the L1 iterative algorithm
normalize | true/false/1/0 | true | normalize features to have zero mean and unit variance for greater numeric stability (slower training, but recommended with L1 regularization)
condition | true/false/1/0 | false | condition features to have no correlation for greater numeric stability (but much slower training)
feature_proportion | 0 to 1 | 1 | use only a (random) portion of available features when training the classifier

The different options for the link_function parameter are defined as follows:

Name | Link function | Activation function (inverse of the link function)
--- | --- | ---
logit | \(g(x)=\ln \left( \frac{x}{1-x} \right)\) | \(g^{-1}(x) = \frac{1}{1 + e^{-x}}\)
probit | \(g(x)=\Phi^{-1}(x)\), where \(\Phi\) is the normal distribution's CDF | \(g^{-1}(x) = \Phi (x)\)
comp_log_log | \(g(x)=\ln \left( - \ln \left( 1-x \right) \right)\) | \(g^{-1}(x) = 1 - e^{-e^x}\)
linear | \(g(x)=x\) | \(g^{-1}(x) = x\)
log | \(g(x)=\ln x\) | \(g^{-1}(x) = e^x\)
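
For example, with the default logit link, a raw linear score \(s\) produced by the model is mapped back to a probability by the activation function, so a score of \(s = 1.5\) would decode to approximately

\[ g^{-1}(1.5) = \frac{1}{1 + e^{-1.5}} \approx 0.82 \]

This final mapping is what the decode parameter above controls.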

Bagging (type=bagging)

The bagging algorithm, also known as bootstrap aggregating, is used in conjunction with another algorithm, for instance with decision trees to create bagged decision trees. There is an example of this in the default configuration file under the bdt key.

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 4 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
num_bags | N >= 1 | 10 | number of bags to divide the classifier into
validation_split | 0-1 | 0.35 | how much of the training data to hold out as validation data
weak_learner | perceptron, bagging, boosting, naive_bayes, stump, decision_tree, glz, boosted_stumps, null, onevsall, fasttext | | configuration of the weak learner to use in each bag
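
As a minimal sketch of the nesting involved, a bagged GLZ could be configured as follows (the name bglz_example is illustrative; the bglz entry in the default configuration file below is very similar):

"bglz_example": {
    "type": "bagging",
    "num_bags": 32,
    "validation_split": 0.1,
    "weak_learner": {
        "type": "glz",
        "feature_proportion": 1.0,
        "verbosity": 0
    }
}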

See also: Bagging on Wikipedia.

Boosting (type=boosting)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
validation_split | 0-1 | 0.3 | how much of the training data to hold out as validation data
min_iter | 1 to max_iter | 10 | minimum number of training iterations to run
max_iter | >= min_iter | 500 | maximum number of training iterations to run
cost_function | exponential, logistic | exponential | select the cost function for the boosting weight update
short_circuit_window | 0- | 0 | short-circuit (stop) training if there is no improvement for N iterations (0 = off)
trace_training_acc | true/false/1/0 | false | trace the accuracy on the training set as well as on the validation set
weak_learner | perceptron, bagging, boosting, naive_bayes, stump, decision_tree, glz, boosted_stumps, null, onevsall, fasttext | | configuration of the weak learner to boost

See also: Boosting on Wikipedia.

Neural Networks (type=perceptron)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
validation_split | 0-1 | 0.3 | how much of the training data to hold out as validation data
min_iter | 1 to max_iter | 10 | minimum number of training iterations to run
max_iter | >= min_iter | 100 | maximum number of training iterations to run
learning_rate | real | 0.01 | rate of learning relative to the dataset size if positive; absolute rate of learning if negative
arch | (see doc) | %i | hidden unit specification; %i = input vars, %o = output vars; e.g. 5_10
activation | logsig, tanh, tanhs, identity, softmax, nonstandard | tanh | activation function for the neurons
output_activation | logsig, tanh, tanhs, identity, softmax, nonstandard | tanh | activation function for the output layer of neurons
decorrelate | true/false/1/0 | true | decorrelate the features before training
normalize | true/false/1/0 | true | normalize to zero mean and unit standard deviation before training
batch_size | 0.0-1.0 or 1 to n vectors | 1024 | number of samples in each "mini batch" for stochastic updates
target_value | 0.0-1.0 | 0.8 | the output the network is asked to produce for a positive ("1") example

Naive Bayes (type=naive_bayes)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
trace | 0- | 0 | trace execution of training in a very fine-grained fashion
feature_prop | 0.0-1.0 | 1 | which proportion of the features to look at

Note that our version of the Naive Bayes Classifier only supports discrete features. Numerical-valued columns (types NUMBER and INTEGER) are accepted, but they will be discretized prior to training. To do so, we will simply split all the values in two, using the threshold that provides the best separation of classes. You can always do your own discretization, for instance using a CASE expression.

FastText (type=fasttext)

Parameter | Range | Default | Description
--- | --- | --- | ---
verbosity | 0-5 | 2 | verbosity of information from training
profile | true/false/1/0 | false | whether or not to profile
validate | true/false/1/0 | false | perform expensive internal validation
epoch | 1+ | 5 | number of iterations over the data
dims | 1+ | 100 | number of dimensions in the embedding
verbosity | 0+ | 0 | level of verbosity in standard output

Note that our version of the Fast Text Classifier only supports feature counts, and currently does not support regression.

See also: fastText on arXiv.

Default configurationFile

The default, overridable configurationFile contains the following predefined configurations, which can be accessed by name with the algorithm parameter:

{

    "nn": { 
        "_note": "Neural Network",
        
        "type": "perceptron",
        "arch": 50,
        "verbosity": 3,
        "max_iter": 100,
        "learning_rate": 0.01,
        "batch_size": 10
    },


    "bbdt": {
        "_note": "Bagged boosted decision trees",
        
        "type": "bagging",
        "verbosity": 3,
        "weak_learner": {
            "type": "boosting",
            "verbosity": 3,
            "weak_learner": {
                "type": "decision_tree",
                "max_depth": 3,
                "verbosity": 0,
                "update_alg": "gentle",
                "random_feature_propn": 0.5
            },
            "min_iter": 5,
            "max_iter": 30
        },
        "num_bags": 5
    },

    "bbdt2": {
        "_note": "Bagged boosted decision trees",
        
        "type": "bagging",
        "verbosity": 1,
        "weak_learner": {
            "type": "boosting",
            "verbosity": 3,
            "weak_learner": {
                "type": "decision_tree",
                "max_depth": 5,
                "verbosity": 0,
                "update_alg": "gentle",
                "random_feature_propn": 0.8
            },
            "min_iter": 5,
            "max_iter": 10,
            "verbosity": 0
        },
        "num_bags": 32
    },

    "bbdt_d2": {
        "_note": "Bagged boosted decision trees",
        
        "type": "bagging",
        "verbosity": 3,
        "weak_learner": {
            "type": "boosting",
            "verbosity": 3,
            "weak_learner": {
                "type": "decision_tree",
                "max_depth": 2,
                "verbosity": 0,
                "update_alg": "gentle",
                "random_feature_propn": 1
            },
            "min_iter": 5,
            "max_iter": 30
        },
        "num_bags": 5
    },

    "bbdt_d5": {
        "_note": "Bagged boosted decision trees",
        
        "type": "bagging",
        "verbosity": 3,
        "weak_learner": {
            "type": "boosting",
            "verbosity": 3,
            "weak_learner": {
                "type": "decision_tree",
                "max_depth": 5,
                "verbosity": 0,
                "update_alg": "gentle",
                "random_feature_propn": 1
            },
            "min_iter": 5,
            "max_iter": 30
        },
        "num_bags": 5
    },

    "bdt": {
        "_note": "Bagged decision trees",
        
        "type": "bagging",
        "verbosity": 3,
        "weak_learner": {
            "type": "decision_tree",
            "verbosity": 0,
            "max_depth": 5
        },
        "num_bags": 20
    },

    "dt": {
        "_note": "Plain decision tree",
        
        "type": "decision_tree",
        "max_depth": 8,
        "verbosity": 3,
        "update_alg": "prob"
    },

    "glz_linear": {
        "_note": "Generalized Linear Model, linear link function, to be used for 'regression' mode",

        "type": "glz",
        "link_function": "linear",
        "verbosity": 3,
        "normalize ": "true",
        "regularization" = "l2"
    },

    "glz": {
        "_note": "Generalized Linear Model.  Very smooth but needs very good features",

        "type": "glz",
        "verbosity": 3,
        "normalize ": " true",
        "regularization" = "l2"
    },

    "glz2": {
        "_note": "Generalized Linear Model.  Very smooth but needs very good features",

        "type": "glz",
        "verbosity": 3
    },

    "bglz": {
        "_note": "Bagged random GLZ",

        "type": "bagging",
        "verbosity": 1,
        "validation_split": 0.1,
        "weak_learner": {
            "type": "glz",
            "feature_proportion": 1.0,
            "verbosity": 0    
        },
        "num_bags": 32
    },


    "bs": {
        "_note": "Boosted stumps",

        "type": "boosted_stumps",
        "min_iter": 10,
        "max_iter": 200,
        "update_alg": "gentle",
        "verbosity": 3
    },

    "bs2": {
        "_note": "Boosted stumps",

        "type": "boosting",
        "verbosity": 3,
        "weak_learner": {
            "type": "decision_tree",
            "max_depth": 1,
            "verbosity": 0,
            "update_alg": "gentle"
        },
        "min_iter": 5,
        "max_iter": 300,
        "trace_training_acc": "true"
    },

    "bbs2": {
        "_note": "Bagged boosted stumps",

        "type": "bagging",
        "num_bags": 5,
        "weak_learner": {
            "type": "boosting",
            "verbosity": 3,
            "weak_learner": {
                "type": "decision_tree",
                "max_depth": 1,
                "verbosity": 0,
                "update_alg": "gentle"
            },
            "min_iter": 5,
            "max_iter": 300,
            "trace_training_acc": "true"
        }
    },

    "naive_bayes": {
        "_note": "Naive Bayes",

        "type": "naive_bayes",
        "feature_prop": "1",
        "verbosity": 3
    }
}
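
With these defaults in place, a predefined configuration such as bbdt can be selected by name alone, leaving configuration and configurationFile empty. Below is a minimal, illustrative sketch of a classifier.train procedure configuration that does this; the dataset name is hypothetical, and the modelFileUrl and mode parameters are taken from the procedure's own documentation rather than from this page:

{
    "type": "classifier.train",
    "params": {
        "trainingData": "SELECT {* EXCLUDING (label)} AS features, label FROM training_set",
        "modelFileUrl": "file://my_model.cls",
        "algorithm": "bbdt",
        "mode": "boolean"
    }
}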

Training Weighting

This section describes how you can set different weights for each example in your training set, either based upon the label or based upon a calculation over the row, to enable finer control over which examples the classifier makes the most effort to classify.

Equalizing class weights

The equalizationFactor parameter can be used to adjust an unbalanced training set to be more balanced for training, which frequently has the effect of requiring the classifier to focus more on separating the positive and negative classes rather than on getting very high scores for the dominant class.

Setting example weight explicitly

The optional weight expression in the trainingData parameter of the configuration must evaluate to a positive number indicating how many examples the row counts for. For example, a single row with a weight of 2 has the same effect as the same row appearing twice with a weight of 1.

Note that only the relative weights matter. Before the classifier is trained, the weights will be normalized so that they sum to 1 to avoid numerical issues in the classifier training process.

Combining the two

If the two weighting methods are combined, then the weight expression will be used to set the relative weight per example within its label class, and the equalizationFactor will adjust the relative weight of each class.
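
As a combined sketch (reusing the hypothetical classifier.train configuration from above, with an illustrative importance column providing the per-example weight):

{
    "type": "classifier.train",
    "params": {
        "trainingData": "SELECT {* EXCLUDING (label, importance)} AS features, label, importance AS weight FROM training_set",
        "modelFileUrl": "file://my_model.cls",
        "algorithm": "bbdt",
        "mode": "boolean",
        "equalizationFactor": 0.5
    }
}

Here the weight column sets each example's relative weight within its label class, while equalizationFactor rebalances the relative weight of the classes themselves.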
