Algorithm

Logistic Regression

Description

A generalized linear model used for binary classification. It estimates the parameters of a logistic model by iterative optimization. Our implementation uses a raw C++ full-batch Gradient Descent optimizer with an inlined sigmoid activation function for maximum throughput. It handles large datasets by processing the entire batch in memory (or in chunks, if extended) and updates the weights using the gradient of the negative log-likelihood, i.e. the cross-entropy cost:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^m [y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))] $$

Algorithm Workflow

START Initialize weight vector $\mathbf{w}$ and bias $b$ to zeros.
EPOCH LOOP Iterate for a fixed number of `epochs`.
LINEAR Compute linear response $z_i = \mathbf{w}^T \mathbf{x}_i + b$ for all samples.
ACTIVATE Apply Sigmoid function $p_i = \frac{1}{1 + e^{-z_i}}$.
ERROR Compute prediction error $e_i = p_i - y_i$.
GRADIENT Calculate $\nabla \mathbf{w} = \frac{1}{m} X^T \mathbf{e}$ and $\nabla b = \frac{1}{m} \sum_{i} e_i$, accumulating the errors $e_i$ over all $m$ samples.
UPDATE Apply the update rule $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla \mathbf{w}$ and $b \leftarrow b - \eta \nabla b$ (where $\eta$ is the learning rate).
CONVERGE Repeat until the maximum number of epochs is reached (no early-stopping criterion is applied).
PREDICT Return 1 if $p_i > 0.5$ else 0.

Implementation Details

Implemented in `LogisticRegression.cpp`; the core training loop, simplified:

for (int e = 0; e < epochs; e++) {            // EPOCH LOOP
    std::vector<double> grad(coef.size(), 0.0); double grad_b = 0.0;
    for (int i = 0; i < n; i++) {             // full batch over all n samples
        double z = dot_product(X[i], coef) + intercept;   // LINEAR
        double p = 1.0 / (1.0 + std::exp(-z));            // ACTIVATE (sigmoid)
        double err = p - y[i];                            // ERROR
        for (size_t j = 0; j < coef.size(); j++) grad[j] += err * X[i][j];
        grad_b += err;                                    // GRADIENT
    }
    for (size_t j = 0; j < coef.size(); j++) coef[j] -= learning_rate * grad[j] / n;
    intercept -= learning_rate * grad_b / n;              // UPDATE
}

Complexity & Optimization

Time Complexity

O(Epochs * N * P), where N is the number of samples and P the number of features; each epoch makes one full pass over the data.

Space Complexity

O(P) auxiliary space for the weight vector and gradient; the in-memory design matrix itself occupies O(N * P).

Optimizations

The sigmoid activation is inlined in the training loop, avoiding per-sample function-call overhead.

Limitations

The model can only learn a linear decision boundary; data that is not linearly separable (e.g. XOR-like patterns) requires feature engineering or a non-linear model.

Use Cases

Binary classification tasks such as spam filtering, churn prediction, and medical screening, particularly where calibrated probabilities and interpretable coefficients are desired.