Classification Algorithms with Python

Hüseyin Kaya
6 min read · Dec 30, 2021


In this article, I will talk about the most widely used classification algorithms. We'll also look at how to use each of them with Python.

Using classification algorithms allows us to divide data into specific classes. Suppose we have a dataset of pictures of lions and deer. Using a classification algorithm, we can place the lions in the pictures into one class and the deer into another. Let's expand the example a bit. Say we have a dataset with a large number of photos of people. By classifying the photos according to the attributes people share, we can group together the people who resemble each other.

The underlying problem every classification algorithm addresses is determining which class a new data point belongs to. Each of the algorithms below is an attempt to solve this problem.

Before we talk about algorithms, let’s take a look at the types of classification.

Types of Classification

Binary Classification

This is the type of classification to use when we want to divide data into exactly two classes. Gender classification, young-old classification, and lion-deer classification are all examples of binary classification.

Multiclass Classification

This is the type of classification used when the data is to be divided into more than two classes, where each data point can belong to only one class. So a person can be young or old, but not both at the same time.

Multi-Label Classification

This is a type of classification where each data point can belong to more than one class. For example, the subject of a book may be economics, politics, and sociology at the same time.

Now that we've covered the types of classification, we can take a look at the algorithms.

Classification Algorithms

Logistic Regression

Despite its name, logistic regression is a machine learning algorithm used in classification rather than regression. In this model, the probabilities that describe the possible outcomes of a single trial are modeled using the logistic function.

Its advantages include its probabilistic approach and the information it provides about the statistical significance of each feature.

Its disadvantages are its restrictive assumptions: in its basic form it only works when the target variable is binary, it assumes all predictors are independent of each other, and it assumes there are no missing values in the data.

Using logistic regression with Python
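Below is a minimal sketch with scikit-learn's LogisticRegression. The make_classification dataset is a synthetic stand-in for real data, so swap in your own features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary dataset as a stand-in for real data
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()   # L2 regularization by default
model.fit(X_train, y_train)

print("Accuracy:", model.score(X_test, y_test))
print("Probabilities:", model.predict_proba(X_test[:1]))  # probabilistic output
```

The predict_proba call is where the probabilistic approach shows: the model returns a probability for each class rather than only a hard label.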

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to fit linear classifiers and regressors under convex loss functions such as Support Vector Machines and Logistic Regression. Although SGD has been around for a long time in the machine learning community, it has recently received a significant amount of attention in the context of large-scale learning.

SGD has been successfully applied to large-scale machine learning problems frequently encountered in text classification and natural language processing.

Its efficiency and ease of implementation are among its advantages.

Its disadvantages are that it requires a number of hyperparameters, such as the regularization parameter and the number of iterations, and that it is sensitive to feature scaling.

Using SGD with Python
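Here is a minimal sketch with scikit-learn's SGDClassifier, again on synthetic stand-in data. Standardizing the features first is a deliberate choice, since SGD is sensitive to feature scale:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; SGD is sensitive to feature scale, so standardize
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The default loss="hinge" trains a linear SVM with SGD;
# max_iter is one of the hyperparameters mentioned above
clf = make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000, random_state=0))
clf.fit(X, y)

print("Training accuracy:", clf.score(X, y))
```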

Naive Bayes

Naive Bayes is a set of supervised learning algorithms based on the application of Bayes’ theorem with the assumption of “naive” conditional independence between each pair of features given the value of the class variable.

It works on nonlinear problems.

Its advantages are that it is efficient, it is not strongly biased by outliers, and it has a probabilistic approach.

Its major disadvantage is its core assumption: treating the features as independent and of equal statistical significance rarely holds in real data.

Using Naive Bayes with Python
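A minimal sketch with GaussianNB on the classic iris dataset. The Gaussian variant is an illustrative choice that fits continuous features; other variants suit other data types, as noted in the comments:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: continuous features, three classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB assumes normally distributed features;
# MultinomialNB or BernoulliNB suit count or binary features (e.g. text)
nb = GaussianNB()
nb.fit(X_train, y_train)

print("Accuracy:", nb.score(X_test, y_test))
```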

K-Nearest Neighbours

K-Nearest Neighbors does not attempt to create a general internal model, it only stores samples of the training data. The classification is calculated by a simple majority vote of the nearest neighbors of each point. A query point is assigned the data class that has the most representatives among the point’s nearest neighbors.

Its advantages are that it is simple to understand, fast and efficient.

The disadvantage is that the number of neighbors 'k' has to be chosen, and a poor choice can noticeably hurt accuracy.

Using KNN with Python
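A minimal sketch with KNeighborsClassifier on iris; the value k = 5 is only an illustrative starting point:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors is the 'k' that has to be chosen
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "fitting" only stores the training samples

print("Accuracy:", knn.score(X_test, y_test))
```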

Decision Tree

Given the attributes of labeled data, a decision tree learns a set of if-then rules and classifies new data by following those rules.

Its advantages are many. It is very easy to understand and visualize. It requires very little data preprocessing. It can work with both numerical and categorical data.

Its disadvantages are that it can grow overly complex trees that do not generalize well, and that it can be unstable: small changes in the data can produce an entirely different tree.

Using Decision Tree with Python
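A minimal sketch with DecisionTreeClassifier on iris. The max_depth=3 cap is an illustrative choice that curbs the overly complex trees mentioned above, and export_text makes the learned rules visible:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting max_depth helps prevent an overgrown, poorly generalizing tree
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Accuracy:", tree.score(X_test, y_test))
print(export_text(tree))  # the learned if-then rules as plain text
```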

Random Forest

A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve accuracy and control overfitting. The sub-sample size is controlled by a parameter; otherwise, the entire dataset is used to build each tree.

Its advantages include reduced overfitting and, in most cases, better accuracy than a single decision tree.

Its disadvantages are that it is slower, harder to interpret, and more complex than a single decision tree.

Using Random Forest with Python
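A minimal sketch with RandomForestClassifier on synthetic stand-in data; 100 trees is an illustrative default, not a tuned value:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number of trees whose votes are combined
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

print("Accuracy:", rf.score(X_test, y_test))
```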

Support Vector Machine

Support vector machines are a set of supervised learning methods used for classification, regression and outlier detection.

A support vector machine represents the training data as points in space, mapped so that the points of the different categories are separated by a gap that is as wide as possible. New samples are then mapped into the same space and assigned to a category based on which side of the gap they fall on.

It is effective in high-dimensional spaces. It uses a subset of training points in the decision function, so it is memory efficient as well. It is versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

If the number of features is much greater than the number of samples, avoiding overfitting when choosing the kernel function and regularization term is crucial. SVMs also do not provide direct probability estimates; these are calculated using an expensive five-fold cross-validation.

Using SVM with Python
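A minimal sketch with SVC on synthetic stand-in data, with the features scaled first since SVMs are sensitive to feature ranges. The kernel="rbf" and C=1.0 settings are just the defaults, shown explicitly:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" is the default; "linear", "poly", or a custom callable also work
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)

print("Accuracy:", svm.score(X_test, y_test))
```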
