Feature selection can be applied to dummy variables

What is the best 2-class classifier for your application? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.

Closed 3 years ago.


  • One classifier per answer
  • Vote up if you agree
  • Downvote / remove duplicates
  • Put your application in the comment


Random forest

  • Easily captures complicated structure / nonlinear relationships
  • Invariant to the scale of the variables
  • No dummy variables need to be created for categorical predictors
  • Not much variable selection is required
  • Relatively hard to overfit
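
A minimal sketch of why tree-based methods do not care about the scale of the variables, assuming a single decision stump that picks its split by weighted Gini impurity (a random forest is built from many such splits). The helper names `gini` and `best_split` and the toy data are made up for illustration; this is not any library's implementation.

```python
import math


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Return the left/right partition (True = left) of the best
    single-feature split, chosen by weighted Gini impurity."""
    order = sorted(set(xs))
    best_score, best_mask = float("inf"), None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2.0
        mask = tuple(x <= thr for x in xs)
        left = [y for y, m in zip(ys, mask) if m]
        right = [y for y, m in zip(ys, mask) if not m]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_score, best_mask = score, mask
    return best_mask


# Toy data (hypothetical): one feature, two classes.
xs = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
ys = [0, 0, 0, 1, 1, 1]

raw = best_split(xs, ys)
rescaled = best_split([1000.0 * x for x in xs], ys)
logged = best_split([math.log(x) for x in xs], ys)
```

The split only depends on the ordering of the values, so rescaling or any other increasing transformation of the feature leaves the partition of the samples unchanged.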

Logistic regression:

  • Fast and performs well on most data sets
  • Almost no parameters to tune
  • Handles both discrete and continuous features
  • The model is easy to interpret
  • (Not really limited to binary classifications)
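
To illustrate the "easy to interpret" point, here is a small sketch: in a logistic model, raising one feature by one unit multiplies the odds by exp(coefficient), whatever the other features are. The intercept, coefficients and inputs below are hypothetical numbers, not fitted values.

```python
import math


def predict_proba(intercept, coefs, x):
    """P(class = 1 | x) under a logistic model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Hypothetical fitted model with two features.
intercept = -1.0
coefs = [0.7, -0.3]

x = [2.0, 1.0]
x_plus = [3.0, 1.0]  # same sample, first feature increased by one unit

ratio = odds(predict_proba(intercept, coefs, x_plus)) / odds(
    predict_proba(intercept, coefs, x)
)
# ratio matches math.exp(coefs[0]) up to rounding
```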

Regularized discriminant for supervised problems with noisy data

  1. Computationally efficient
  2. Robust against noise and outliers in data
  3. Both linear discriminant (LD) and quadratic discriminant (QD) classifiers can be obtained from the same implementation by setting the regularization parameters '[lambda, r]' to '[1 0]' for the LD classifier and '[0 0]' for the QD classifier - very useful for reference purposes.
  4. Model is easy to interpret and export
  5. Works well for sparse and 'wide' data sets where the class covariance matrices may not be well defined.
  6. The posterior class probability can be estimated for each sample by applying the softmax function to the discriminant values for each class.

The original work is the 1989 paper by Friedman et al. Kuncheva also gives a very good explanation in her book "Combining Pattern Classifiers".
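
A rough sketch of how the two regularization parameters from the list above can act on the covariance estimate, plus the softmax step. It assumes `lam` shrinks each class covariance toward the pooled covariance and `r` shrinks the result toward a scaled identity, which is consistent with the '[1 0]' and '[0 0]' settings mentioned above. The blend here is a simplified, unweighted version, and the function names and numbers are made up for illustration.

```python
import math


def blend(class_cov, pooled_cov, lam, r):
    """Regularized class covariance (simplified, unweighted sketch).

    lam shrinks the class covariance toward the pooled covariance;
    r shrinks the result toward a scaled identity matrix."""
    p = len(class_cov)
    mixed = [[(1.0 - lam) * class_cov[i][j] + lam * pooled_cov[i][j]
              for j in range(p)] for i in range(p)]
    avg_var = sum(mixed[i][i] for i in range(p)) / p
    return [[(1.0 - r) * mixed[i][j] + (r * avg_var if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def softmax(scores):
    """Turn per-class discriminant values into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2x2 covariance estimates.
class_cov = [[2.0, 0.5], [0.5, 1.0]]
pooled_cov = [[1.5, 0.2], [0.2, 1.2]]

ld_cov = blend(class_cov, pooled_cov, 1.0, 0.0)  # pooled covariance: LD
qd_cov = blend(class_cov, pooled_cov, 0.0, 0.0)  # class covariance: QD
probs = softmax([-1.2, -3.4])  # hypothetical discriminant values
```

With values strictly between 0 and 1, the estimate sits between these extremes, which is what keeps it usable when a class covariance matrix is poorly determined.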

Gradient boosted trees.

  • At least as accurate as RF in many applications
  • Incorporates missing values seamlessly
  • Variable importance
  • Partial dependence plots
  • GBM versus randomForest in R: GBM handles MUCH larger data sets
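
The partial dependence idea from the list above can be sketched in a few lines: fix one feature at a grid value for every row, average the model output, and repeat over the grid. `toy_model` is a hypothetical stand-in for a fitted boosted model, and `partial_dependence` is an illustrative helper, not a function from the gbm package.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model output over `rows` with column `feature`
    forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            probe = list(row)       # copy so the data is not modified
            probe[feature] = value
            total += model(probe)
        curve.append(total / len(rows))
    return curve


def toy_model(x):
    """Hypothetical fitted model: linear in x[0], quadratic in x[1]."""
    return 2.0 * x[0] + x[1] * x[1]


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
curve = partial_dependence(toy_model, rows, 0, [0.0, 1.0, 2.0])
```

For this toy model the curve for feature 0 rises by 2 per unit, recovering its linear effect with the other feature averaged out.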

Gaussian process classifier - it gives probabilistic predictions (which is useful when your operational relative class frequencies differ from those in your training set, or when your false-positive / false-negative costs are unknown or variable). It also provides an indication of the uncertainty in model predictions due to the uncertainty in "estimating the model" from a finite data set. The covariance function is equivalent to the kernel function in an SVM, so it can also operate directly on non-vectorial data (e.g. strings or graphs). The mathematical framework is neat too (but don't use the Laplace approximation). Model selection is fully automated by maximizing the marginal likelihood.

Essentially combines good properties of logistic regression and SVM.
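
A small sketch of why probabilistic outputs help in the two situations mentioned above. It assumes the classifier returns P(class = 1 | x) and that only the class frequencies differ between training and operation; the function names are made up for illustration and apply to any probabilistic classifier, not only Gaussian processes.

```python
def adjust_for_prior(p, train_prior, deploy_prior):
    """Re-weight P(class = 1 | x), estimated under train_prior,
    to the class-1 frequency expected in operation."""
    pos = p * deploy_prior / train_prior
    neg = (1.0 - p) * (1.0 - deploy_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


def decision_threshold(cost_fp, cost_fn):
    """Predict positive when P(class = 1 | x) exceeds this value;
    this minimizes the expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)


# Trained on balanced classes, deployed where positives are rare.
p_deploy = adjust_for_prior(0.5, train_prior=0.5, deploy_prior=0.1)

# A missed positive is nine times as costly as a false alarm.
threshold = decision_threshold(cost_fp=1.0, cost_fn=9.0)
```

Neither step requires retraining: the costs or class frequencies can change after the model is fitted and only the post-processing needs updating.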

L1-regularized logistic regression.

  • It's computationally fast.
  • It has an intuitive interpretation.
  • There is only one easy-to-understand hyperparameter, and it can be tuned automatically by cross-validation, which is often a good way to go.
  • The coefficients are piecewise linear in the hyperparameter, and this relationship is immediately and easily visible in a simple plot.
  • This is one of the less dubious methods of variable selection.
  • It has a really cool name too.
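
A minimal sketch of how the L1 penalty performs variable selection, assuming a plain proximal gradient (ISTA) solver written from scratch; real packages use faster algorithms. The names `soft_threshold` and `fit_l1_logistic`, the step size, and the toy data are all illustrative assumptions. Feature 0 carries signal, feature 1 is pure noise by construction.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount, clipping at exactly zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, penalty, step=0.1, iters=500):
    """Proximal gradient descent for mean log loss + penalty * sum(|w|).
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then the L1 proximal step.
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: each (x0, label) pair appears with x1 = +1 and x1 = -1,
# so x1 carries no information about the label.
base = [(-2.0, 0), (-1.0, 0), (1.0, 0), (2.0, 1), (1.0, 1), (-1.0, 1)]
X = [[x0, x1] for x0, _ in base for x1 in (1.0, -1.0)]
y = [label for _, label in base for _ in (1.0, -1.0)]

w_sparse, _ = fit_l1_logistic(X, y, penalty=0.1)  # keeps only feature 0
w_zero, _ = fit_l1_logistic(X, y, penalty=1.0)    # penalty too large: all zero
```

The single hyperparameter `penalty` moves the fit between these two regimes, which is why scanning it (for example by cross-validation) doubles as a variable selection procedure.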
