Algorithm

Principal Component Analysis

Description

A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. Our implementation focuses on numerical accuracy, using the covariance method together with LAPACK's symmetric eigenvalue solver.

$$ C = \frac{1}{n-1}X^T X, \quad C = V \Lambda V^T $$

Algorithm Workflow

START Input data matrix $X$.
CENTER Subtract the mean of each column from the data.
COVARIANCE Compute the covariance matrix $C = \frac{1}{n-1} X^T X$.
EIGEN Solve the eigenvalue problem $C v = \lambda v$ using LAPACK `dsyev`.
SORT Order eigenvalues $\lambda$ descending and reorder eigenvectors $v$.
SELECT Choose the top $k$ eigenvectors corresponding to the largest eigenvalues.
TRANSFORM Project $X$ onto the new subspace: $X_{new} = X \cdot V_k$.
END Return the transformed data and components.

Implementation Details

Implemented in `PCA.cpp` using LAPACK's `dsyev` (symmetric eigenvalue solver), which is numerically robust for the symmetric covariance matrix. The covariance matrix itself is computed with explicit C++ loops.

// Compute covariance: C = X^T X / (n - 1), stored in cov (p x p, symmetric)
// ...
// Eigen decomposition; dsyev returns eigenvalues in ascending order
int info, lwork = 3 * p - 1;
std::vector<double> lambda(p), work(lwork);
F77_CALL(dsyev)("V", "U", &p, cov.data(), &p, lambda.data(),
                work.data(), &lwork, &info);
// Select top k vectors: the last k columns of cov after the call

Complexity & Optimization

Time Complexity

O(N P^2) to compute the covariance matrix plus O(P^3) for the eigendecomposition (`dsyev`), where N is the number of observations and P the number of variables. For N >> P the covariance step dominates.

Space Complexity

O(P^2) for the covariance matrix and its eigenvectors, plus O(N P) for the centered data itself.

Optimizations

LAPACK `dsyev` exploits the symmetry of the covariance matrix, making it faster and more numerically stable than a general eigensolver would be.

Limitations

PCA is a linear method: it captures only linear correlations between variables and can miss nonlinear structure in the data. In addition, the covariance method squares the condition number of $X$, so an SVD-based formulation can be more accurate for ill-conditioned inputs.

Use Cases

Dimensionality reduction, e.g. compressing high-dimensional features before further modeling, decorrelating variables, and projecting data to 2-D or 3-D for visualization.