pca – Row Coding

Plotting pca biplot with ggplot2

September 4, 2023 by Tarik

Python scikit learn pca.explained_variance_ratio_ cutoff

August 9, 2023 by Tarik

Yes, you are nearly right. The pca.explained_variance_ratio_ attribute holds a vector with the fraction of the total variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the fraction of variance explained solely by the (i+1)-th dimension. You probably want pca.explained_variance_ratio_.cumsum(), which returns a vector x such that x[i] is the cumulative fraction of variance explained by the first i+1 dimensions. import … Read more
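As a minimal sketch of the cumulative-sum idea above, the following picks the smallest number of components whose cumulative ratio reaches a cutoff. The random data, the 0.90 threshold, and the names `threshold` and `n_keep` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 10 features with decreasing scale
X = rng.normal(size=(200, 10)) * np.arange(10, 0, -1)

pca = PCA().fit(X)

# cumulative[i] is the fraction of variance explained by the first i+1 components
cumulative = pca.explained_variance_ratio_.cumsum()

threshold = 0.90  # assumed cutoff, chosen only for this example

# Index of the first entry >= threshold, plus one to turn an index into a count
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative.round(3))
print("components needed:", n_keep)
```

scikit-learn can also make this choice itself when `n_components` is given as a float between 0 and 1 (for example `PCA(n_components=0.90, svd_solver="full")`), which may be simpler if the cumulative curve is not needed.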

Selecting multiple odd or even columns/rows for dataframe

August 9, 2023 by Tarik

raise LinAlgError("SVD did not converge") LinAlgError: SVD did not converge in matplotlib pca determination

August 6, 2023 by Tarik

This can happen when there are inf or NaN values in the data. Use this to remove NaN values: ori_data.dropna(inplace=True). Note that dropna() on its own does not remove inf values.
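A short sketch of cleaning both kinds of bad values before running PCA. The small frame below is a hypothetical stand-in for `ori_data`; the column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for ori_data
ori_data = pd.DataFrame(
    {"a": [1.0, np.inf, 3.0, 4.0], "b": [0.5, 1.5, np.nan, 2.5]}
)

# dropna() only removes NaN, so convert +/-inf to NaN first
clean = ori_data.replace([np.inf, -np.inf], np.nan).dropna()

# Every remaining value is finite, which is what the SVD needs
print(clean)
print(np.isfinite(clean.to_numpy()).all())
```

This version returns a new frame rather than modifying `ori_data` in place, which makes it easy to compare how many rows were dropped.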

Obtain eigen values and vectors from sklearn PCA

June 5, 2023 by Tarik

Your implementation: you are computing the eigenvectors of the correlation matrix, that is, the covariance matrix of the normalized variables. The step data/=np.std(data, axis=0) is not part of classic PCA, where we only center the variables, so the sklearn PCA does not feature-scale the data beforehand. Apart from that you are on the right track, if … Read more
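To illustrate the point about centering without scaling, here is a hedged sketch that compares an eigen-decomposition of the covariance matrix with what sklearn's PCA reports. The data is random and hypothetical, and this is not the code from the truncated answer.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical correlated data: 100 samples, 4 features
data = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# Classic PCA: center only, do not divide by the standard deviation
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)  # uses n - 1 in the denominator

# eigh is meant for symmetric matrices; it returns ascending eigenvalues
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pca = PCA().fit(data)

# Eigenvalues correspond to explained_variance_
print(np.allclose(eigvals, pca.explained_variance_))
# Eigenvectors correspond to the rows of components_, up to sign
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_), atol=1e-6))
```

The comparison uses absolute values because an eigenvector is only defined up to its sign, so the two methods may legitimately return vectors pointing in opposite directions.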

Principal components analysis using pandas dataframe

May 11, 2023 by Tarik

Most sklearn objects work with pandas dataframes just fine. Would something like this work for you?

    import pandas as pd
    import numpy as np
    from sklearn.decomposition import PCA

    df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))
    pca = PCA(n_components=5)
    pca.fit(df)

You can access the components themselves with pca.components_
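If the projected data is wanted back as a dataframe, one possible follow-up is sketched below. The column labels `x0`..`x9` and `PC1`..`PC5` are naming assumptions for this example, not something PCA produces by itself.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical frame: 20 rows, 10 columns named x0..x9
df = pd.DataFrame(
    rng.normal(0, 1, (20, 10)),
    columns=[f"x{i}" for i in range(10)],
)

pca = PCA(n_components=5)
projected = pca.fit_transform(df)  # plain NumPy array, shape (20, 5)

# Wrap the scores in a DataFrame that keeps the original row index
pc_names = [f"PC{i + 1}" for i in range(projected.shape[1])]
scores = pd.DataFrame(projected, index=df.index, columns=pc_names)

print(scores.head())
print(pca.components_.shape)  # (n_components, n_features)
```

Keeping the original index makes it straightforward to join the scores back onto other columns of the source data.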

Feature/Variable importance after a PCA analysis

February 28, 2023 by Tarik

First of all, I assume that by features you mean the variables and not the samples/observations. In this case, you could do something like the following by creating a biplot function that shows everything in one plot. In this example, I am using the iris data. Before the example, please note that the basic idea when … Read more
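The excerpt is cut off before its biplot function, so the following is not the author's code. It is a small non-graphical sketch of one common way to read variable importance from a fitted PCA on the iris data: look at the absolute size of each feature's coefficient in each component. Standardizing the features first and keeping two components are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Assumption: standardize so that no feature dominates through its units
X = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2).fit(X)

# Rows of components_ are components, columns are the original features
loadings = np.abs(pca.components_)

# For each component, the feature with the largest absolute coefficient
top_feature_per_pc = [iris.feature_names[i] for i in loadings.argmax(axis=1)]

for pc, name in enumerate(top_feature_per_pc, start=1):
    print(f"PC{pc}: {name}")
```

A biplot conveys the same information visually by drawing each feature as an arrow whose coordinates are its coefficients on the two plotted components.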

Principal Component Analysis (PCA) in Python

February 15, 2023 by Tarik

I posted my answer even though another answer has already been accepted, because the accepted answer relies on a deprecated function. Additionally, this deprecated function is based on Singular Value Decomposition (SVD), which (although perfectly valid) is the much more memory- and processor-intensive of the two general techniques for calculating PCA. This is particularly relevant here … Read more

Recovering features names of explained_variance_ratio_ in PCA with sklearn

November 29, 2022 by Tarik

This information is included in the pca attribute components_. As described in the documentation, pca.components_ is an array of shape [n_components, n_features], so to see how the components are linearly related to the different features you have to: Note: each coefficient is the weight of a particular feature in a particular component; it is related to, but not generally equal to, their correlation. import pandas as pd import … Read more
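Because the code in this excerpt is truncated, here is an independent sketch of the idea: label the rows and columns of pca.components_ so that each coefficient can be traced to a component and a feature. The feature names and the random data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = ["height", "width", "depth", "mass"]  # hypothetical feature names
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=features)

pca = PCA(n_components=2).fit(X)

# components_ has shape (n_components, n_features):
# one row per component, one column per original feature
loadings = pd.DataFrame(
    pca.components_,
    columns=features,
    index=["PC1", "PC2"],
)

print(loadings)
print(pca.explained_variance_ratio_)
```

Each row lines up with the matching entry of explained_variance_ratio_, so the table shows which features contribute to the component behind each ratio.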

Principal component analysis in Python

November 11, 2022 by Tarik

Months later, here's a small class PCA, and a picture: #!/usr/bin/env python """ a small class for Principal Component Analysis Usage: p = PCA( A, fraction=0.90 ) In: A: an array of e.g. 1000 observations x 20 variables, 1000 rows x 20 columns fraction: use principal components that account for e.g. 90 % of the … Read more