
Introduction

The R package FactoMineR2 is designed for multivariate data analysis. It offers a wide array of statistical techniques for exploratory data analysis, including principal component analysis (PCA). FactoMineR2 is user-friendly and provides comprehensive tools for visualizing and interpreting complex data structures, making it a valuable resource for researchers and data analysts.

library(FactoMineR2)
# Use the four numeric measurements of the built-in iris dataset
X <- iris[, 1:4]
head(X)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4

Standardize the dataset

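Before performing a PCA, it is important to standardize the dataset when the variables are measured on different scales or have very different variances. Otherwise, the variables with the largest variance dominate the first principal components. In this example, each column is standardized by subtracting its mean and dividing by its standard deviation, so that every variable has mean 0 and variance 1. The PCA is then effectively computed on the correlation matrix of the original variables.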

$$ Z = \frac{X - \bar{X}}{\sigma} $$

where \bar{X} is the mean and \sigma the standard deviation of each column.
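As a worked instance, the Sepal.Length column of iris has mean 5.8433 and sample standard deviation (n - 1 denominator) 0.8281 over the 150 flowers, so the first flower's sepal length of 5.1 becomes

$$ \frac{X - \bar{X}}{\sigma} = \frac{5.1 - 5.8433}{0.8281} \approx -0.898 $$

which agrees, to three decimals, with the first entry of X_scaled printed below (-0.8976739).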

X_scaled <- standardize_norm(X, center = TRUE, scale = TRUE)
head(X_scaled)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]   -0.8976739  1.01560199    -1.335752   -1.311052
#> [2,]   -1.1392005 -0.13153881    -1.335752   -1.311052
#> [3,]   -1.3807271  0.32731751    -1.392399   -1.311052
#> [4,]   -1.5014904  0.09788935    -1.279104   -1.311052
#> [5,]   -1.0184372  1.24503015    -1.335752   -1.311052
#> [6,]   -0.5353840  1.93331463    -1.165809   -1.048667
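To make the formula concrete without relying on FactoMineR2, here is a base-R sketch of the same standardization. It assumes that standardize_norm() divides by the sample standard deviation (n - 1 denominator), as base R's scale() does; the values printed above appear consistent with that choice.

```r
# Base-R sketch of column standardization: (X - mean) / sd
# Assumption: the sample SD (n - 1 denominator) is used, as in scale()
X <- as.matrix(iris[, 1:4])

col_means <- colMeans(X)
col_sds   <- apply(X, 2, sd)   # sample standard deviation (n - 1 denominator)

# Subtract each column's mean, then divide by its standard deviation
Z <- sweep(sweep(X, 2, col_means, "-"), 2, col_sds, "/")

# scale() implements the same formula
stopifnot(isTRUE(all.equal(Z, scale(X), check.attributes = FALSE)))

# Every standardized column now has mean 0 and standard deviation 1
round(colMeans(Z), 10)
apply(Z, 2, sd)
```

The first entry of Z matches the first Sepal.Length value in the FactoMineR2 output above.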

Compute eigenvalues

The next step is to compute the eigenvalues and eigenvectors of the covariance matrix of the standardized data, which is the correlation matrix of the original variables. In FactoMineR2 this is done with the get_eigen() function. The eigenvalues give the variance explained by each principal component, and the eigenvectors give the directions of the principal components in the original variable space.

eigs <- get_eigen(X_scaled)
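The internals of get_eigen() are not shown here. As an independent cross-check, the following base-R sketch uses eigen() to decompose the covariance matrix of the standardized data. Because every standardized variable has variance 1, the eigenvalues sum to the number of variables (4 here), so each eigenvalue divided by that total is the component's share of the overall variance.

```r
# Base-R sketch: eigen-decomposition of the covariance matrix of standardized data
X_std <- scale(as.matrix(iris[, 1:4]))

C <- cov(X_std)   # same as cor(iris[, 1:4]), because X_std is standardized
e <- eigen(C, symmetric = TRUE)

e$values                      # variances of the principal components, largest first
e$values / sum(e$values)      # proportion of total variance per component
e$vectors                     # each column is the direction of one component
```

Roughly 73% of the total variance falls on the first component for these data, which is why the first one or two components are usually enough to summarize iris.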

Perform PCA

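Finally, we compute the coordinates of the individuals (the rows of the dataset) on the principal components, also called the principal component scores. Because the components are ordered by the variance they explain, keeping only the first few columns reduces the dimensionality of the data while retaining most of its variance.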

# Scale each column of U by the square root of the corresponding eigenvalue.
# You can also use the `pca_ind_coords()` function.
ind_coords <- t(t(as.matrix(eigs[["U"]])) * sqrt(eigs[["values"]]))
head(ind_coords)
#>           [,1]       [,2]        [,3]         [,4]
#> [1,] -2.257141 -0.4784238  0.12727962  0.024087508
#> [2,] -2.074013  0.6718827  0.23382552  0.102662845
#> [3,] -2.356335  0.3407664 -0.04405390  0.028282305
#> [4,] -2.291707  0.5953999 -0.09098530 -0.065735340
#> [5,] -2.381863 -0.6446757 -0.01568565 -0.035802870
#> [6,] -2.068701 -1.4842053 -0.02687825  0.006586116
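A base-R sketch of the same step, independent of FactoMineR2: with the singular value decomposition X = U D V^T of the standardized data, the scores are U D, which is the same as projecting the rows onto the component directions (X V). The singular values d relate to the eigenvalues of the correlation matrix by lambda = d^2 / (n - 1). The FactoMineR2 code above has the same form, scaling each column of U. Whether get_eigen() normalizes U and its values exactly as svd() does is an assumption not checked here. The sign of each component is arbitrary, so different routines may disagree by column signs; the rows printed above appear to match the scores from prcomp() up to sign.

```r
# Base-R sketch: principal component scores of the standardized iris data
X_std <- scale(as.matrix(iris[, 1:4]))

s <- svd(X_std)                   # X_std = U %*% diag(d) %*% t(V)
scores <- s$u %*% diag(s$d)       # scale each column of U by its singular value

# Equivalent: project the rows onto the component directions V
stopifnot(isTRUE(all.equal(scores, X_std %*% s$v, check.attributes = FALSE)))

# Singular values relate to correlation-matrix eigenvalues: lambda = d^2 / (n - 1)
lambda <- s$d^2 / (nrow(X_std) - 1)

# prcomp() gives the same scores, up to the (arbitrary) sign of each column
p <- prcomp(iris[, 1:4], scale. = TRUE)
stopifnot(isTRUE(all.equal(abs(scores), abs(p$x), check.attributes = FALSE)))

# Dimensionality reduction: keep only the first two components
head(scores[, 1:2])
```

Keeping two columns turns the 150 x 4 dataset into a 150 x 2 representation that can be plotted directly.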