
From "Natural" to "Semi-Natural": The Advanced Way to Bayesian Classifiers


In machine learning classification tasks, Naive Bayes is widely popular for its simplicity and efficiency, but the "naive" in its name also hints at its limitations.

To break through this limitation, Semi-Naive Bayes was developed.

This article introduces the principles and application scenarios of Naive Bayes and Semi-Naive Bayes, and shows how to implement them with the scikit-learn library.

1. Naive Bayes: Simple but "naive"

Naive Bayes is a simple probabilistic classifier based on Bayes' theorem. Its core idea is to use an independence assumption between features to simplify the computation.

Specifically, Naive Bayes assumes that the features are conditionally independent of one another: given a class label, the joint probability of all features factorizes into the product of each feature's conditional probability.

Expressed by mathematical formulas as: $ P(X|Y)=P(x_1|Y)\times P(x_2|Y)\times\cdots\times P(x_n|Y) $

where $X$ is the feature vector, $Y$ is the class label, and $x_1, x_2, \ldots, x_n$ are the individual features.

The advantage of Naive Bayes is that it is computationally efficient and suited to high-dimensional data (such as news classification).
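As a quick illustration of the basic workflow, here is a minimal sketch of training a Gaussian Naive Bayes classifier with scikit-learn; the iris dataset, the 0.7/0.3 split, and the variable names are illustrative choices, not part of the article's own example.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load a small example dataset (illustrative choice)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Gaussian Naive Bayes: each feature is modeled as an independent Gaussian per class
clf = GaussianNB()
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))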

2. Semi-Naive Bayes: Relaxing the Independence Assumption

Although Naive Bayes performs well in many scenarios, one of its key assumptions, feature independence, is often hard to satisfy in practice.

In the real world, there is usually a certain correlation between features.

For example, in text classification, the appearance of certain words may be closely related to the appearance of other words.

In such cases, the independence assumption of Naive Bayes degrades classifier performance.

To address this problem, Semi-Naive Bayes was introduced.

Semi-Naive Bayes relaxes the feature-independence assumption to a certain extent, allowing some correlations between features in order to improve classifier performance.

The core improvement of Semi-Naive Bayes is that it allows dependencies between some features; by capturing the dependencies between key features, it improves classification accuracy while keeping the computational complexity manageable.
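One standard way to make this precise (a common "one-dependence estimator" formalization, not spelled out in the original text) is to let each feature depend on the class and on at most one other feature, its "parent": $ P(X|Y)=\prod_{i=1}^{n} P(x_i|Y,\mathrm{pa}_i) $, where $ \mathrm{pa}_i $ denotes the parent feature of $ x_i $ (empty if $ x_i $ depends only on $ Y $). The example in the next section uses exactly this structure, with $ X_1 $ as the parent of $ X_2 $.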

3. Practical comparison

Here is a simple example comparing the accuracy of Naive and Semi-Naive Bayes when the attributes are dependent.

First, generate the test data:

  1. The class Y takes two values, 0 and 1
  2. Y determines the distribution of X1: when Y=0 its mean is 0, and when Y=1 its mean is 1
  3. X2 depends on X1 (X2 = X1 + noise), simulating a dependency between the attributes
import numpy as np
from sklearn.model_selection import train_test_split

# Generate simulated data: Y affects X1 and X2, and X2 depends on X1
np.random.seed(42)
n_samples = 1000
Y = np.random.randint(0, 2, n_samples)
X1 = np.zeros(n_samples)
X2 = np.zeros(n_samples)

for i in range(n_samples):
    if Y[i] == 0:
        x1 = np.random.normal(0, 1)
        x2 = x1 + np.random.normal(0, 0.5)  # X2 depends on X1
    else:
        x1 = np.random.normal(1, 1)
        x2 = x1 + np.random.normal(0, 0.5)  # X2 depends on X1
    X1[i] = x1
    X2[i] = x2

X = np.vstack((X1, X2)).T
X_train, X_test, y_train, y_test = train_test_split(
    X,
    Y,
    test_size=0.3,
    random_state=42,
)
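As a quick sanity check (a small addition, assuming the generation code above has just run), the dependency between X1 and X2 shows up as a clearly positive sample correlation:

# Sample correlation between X1 and X2; a value well above 0 confirms the built-in dependency
print("corr(X1, X2):", np.corrcoef(X1, X2)[0, 1])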

Next, train a Naive Bayes model and a Semi-Naive Bayes model on this data and compare their accuracy.

Note that scikit-learn does not provide Semi-Naive Bayes directly; the following example extends the Naive Bayes model by manually modeling the dependency between the features.

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LinearRegression

# Naive Bayes (assumes the attributes are independent)
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)
acc_nb = accuracy_score(y_test, y_pred_nb)

# Semi-naive Bayes (manual implementation, assuming X2 depends on X1)
# Training phase: estimate the parameters of each class
def train_semi_naive_bayes(X, y):
    params = {}
    for cls in [0, 1]:
        X_cls = X[y == cls]
        X1_cls = X_cls[:, 0]
        X2_cls = X_cls[:, 1]

        # Estimate the parameters of P(X1|Y) (Gaussian distribution)
        mu_X1 = np.mean(X1_cls)
        sigma_X1 = np.std(X1_cls)

        # Estimate the parameters of P(X2|Y,X1) (linear regression)
        lr = LinearRegression().fit(X1_cls.reshape(-1, 1), X2_cls)
        a, b = lr.coef_[0], lr.intercept_
        residuals = X2_cls - lr.predict(X1_cls.reshape(-1, 1))
        sigma_X2_given_X1 = np.std(residuals)

        params[cls] = {
            "prior": np.sum(y == cls) / len(y),
            "mu_X1": mu_X1,
            "sigma_X1": sigma_X1,
            "a": a,
            "b": b,
            "sigma_X2_given_X1": sigma_X2_given_X1,
        }
    return params


# Prediction phase: compute log probabilities
def predict_semi_naive_bayes(X, params):
    y_pred = []
    for x1, x2 in X:
        log_prob = {0: 0, 1: 0}
        for cls in [0, 1]:
            p = params[cls]
            # log P(Y)
            log_prob[cls] += np.log(p["prior"])
            # log P(X1|Y): Gaussian log-density
            log_prob[cls] += -0.5 * np.log(2 * np.pi * p["sigma_X1"] ** 2) - (
                x1 - p["mu_X1"]
            ) ** 2 / (2 * p["sigma_X1"] ** 2)
            # log P(X2|Y,X1): Gaussian log-density around the regression prediction
            mu_x2 = p["a"] * x1 + p["b"]
            log_prob[cls] += -0.5 * np.log(2 * np.pi * p["sigma_X2_given_X1"] ** 2) - (
                x2 - mu_x2
            ) ** 2 / (2 * p["sigma_X2_given_X1"] ** 2)
        y_pred.append(0 if log_prob[0] > log_prob[1] else 1)
    return np.array(y_pred)


params = train_semi_naive_bayes(X_train, y_train)
y_pred_semi = predict_semi_naive_bayes(X_test, params)
acc_semi = accuracy_score(y_test, y_pred_semi)

# Output results
print(f"Naive Bayes accuracy: {acc_nb:.4f}")
print(f"Semi-naive Bayes accuracy: {acc_semi:.4f}")

## Output:
'''
Naive Bayes accuracy: 0.6333
Semi-naive Bayes accuracy: 0.7000
'''

Because Naive Bayes assumes the attributes are independent, its performance drops somewhat when X1 and X2 are in fact dependent.

Semi-Naive Bayes, on the other hand, explicitly models the dependence of X2 on X1, so it estimates the joint probability more accurately and achieves higher accuracy.
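In terms of the factorizations involved (reading off what the code above actually computes), Naive Bayes uses $ P(X_1,X_2|Y)=P(X_1|Y)\,P(X_2|Y) $, while the semi-naive model uses $ P(X_1,X_2|Y)=P(X_1|Y)\,P(X_2|Y,X_1) $, with $ P(X_2|Y,X_1) $ modeled as a Gaussian centered on the linear-regression prediction $ a\,X_1+b $.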

This example briefly demonstrates the advantage of Semi-Naive Bayes when the attributes are dependent.

4. Summary

Both Naive Bayes and Semi-Naive Bayes are classification algorithms based on Bayes' theorem.

Naive Bayes assumes that features are independent of each other and is suited to scenarios such as:

  • Strong feature independence: when the features really are (approximately) independent, Naive Bayes can play to its strengths. For example, in spam classification, the words in an email are often treated as independent features (a minimal sketch follows this list).
  • Small amounts of data: because its computational complexity is low, Naive Bayes can be trained quickly even on little data.
  • Modest accuracy requirements: in scenarios where classification accuracy does not need to be very high, Naive Bayes serves as a fast and effective solution.
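For instance, a word-count spam filter along these lines could look like the following minimal sketch; the tiny corpus, the labels, and the expected output are made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus, invented for illustration: 1 = spam, 0 = not spam
texts = [
    "win a free prize now",
    "limited offer, claim your free gift",
    "meeting notes for tomorrow",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Bag-of-words counts + multinomial Naive Bayes: each word is treated as an independent feature
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))  # expected: [1] for this toy corpus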

Semi-Naive Bayes relaxes this assumption to some extent and is applicable when features are correlated, such as:

  • Correlated features: when there is some correlation between features, Semi-Naive Bayes can capture these relationships better and thus improve classification performance. For example, in medical diagnosis, certain symptoms may be associated with one another.
  • High accuracy requirements: in scenarios that demand high-precision classification, Semi-Naive Bayes can improve performance by taking feature correlations into account.
  • Large amounts of data: Semi-Naive Bayes works best when there is enough data to estimate the correlations between features.