one-hot-encoding – Row Coding

One hot encoding of string categorical features

November 29, 2023 by Tarik

If you are on sklearn>0.20.dev0:

    In [11]: from sklearn.preprocessing import OneHotEncoder
        ...: cat = OneHotEncoder()
        ...: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
        ...: cat.fit_transform(X).toarray()
    Out[11]:
    array([[1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1.],
           [1., 0., 0., 1., 0.],
           [0., 0., 1., 0., 1.]])

If you are on …
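As a complement to the excerpt, the sketch below inspects what the encoder learned and shows one way to cope with categories that were not present at fit time. The `handle_unknown="ignore"` setting and the unseen value `"z"` are illustrative choices made here, not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Two categorical columns: one of strings, one of integers.
X = np.array([["a", "b", "a", "c"], [0, 1, 0, 1]], dtype=object).T

# handle_unknown="ignore" encodes unseen categories as all zeros
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(X).toarray()

# categories_ holds the sorted categories learned for each column;
# their order determines the order of the output columns.
for column_index, categories in enumerate(encoder.categories_):
    print(column_index, list(categories))

# A row whose first-column value was never seen: its three slots stay zero.
unseen = np.array([["z", 1]], dtype=object)
unseen_encoded = encoder.transform(unseen).toarray()
print(unseen_encoded)
```

Whether silently zeroing unknown categories is acceptable depends on the model downstream; raising an error (the default) is the safer choice when unseen values indicate a data problem.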

How to one hot encode several categorical variables in R

September 11, 2023 by Tarik

Adding dummy columns to the original dataframe

July 20, 2023 by Tarik

    In [77]: df = pd.concat([df, pd.get_dummies(df['YEAR'])], axis=1); df
    Out[77]:
          JOINED_CO GENDER   EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  \
    5622        NaN   MALE  Ira A. Eichner   1004  1992  AAR CORP   19550101
    5622        NaN   MALE  Ira A. Eichner   1004  1993  AAR CORP   19550101
    5622        NaN   MALE  Ira A. Eichner   1004  1994  AAR CORP   19550101
    5622        NaN   MALE  Ira A. …
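The same pattern can be tried on a small frame. The data below is a hypothetical miniature of the frame in the excerpt, and `dtype=int` is an assumption added here so the indicator columns hold 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical miniature of the frame shown in the excerpt.
df = pd.DataFrame({
    "CONAME": ["AAR CORP", "AAR CORP", "AAR CORP"],
    "YEAR": [1992, 1993, 1994],
})

# One indicator column per distinct year, labelled by the year value itself.
year_dummies = pd.get_dummies(df["YEAR"], dtype=int)

# axis=1 appends the indicator columns side by side, aligned on the index.
df_with_dummies = pd.concat([df, year_dummies], axis=1)
print(df_with_dummies)
```

The original YEAR column is kept alongside the new indicator columns, which matches the excerpt's goal of adding dummies rather than replacing the source column.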

Feature names from OneHotEncoder

June 8, 2023 by Tarik

A list with the original column names can be passed to get_feature_names.

    >>> encoder.get_feature_names(['Sex', 'AgeGroup'])
    array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
           'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
          dtype=object)

DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead. As per sklearn.preprocessing.OneHotEncoder.

    >>> encoder.get_feature_names_out(['Sex', 'AgeGroup'])
    array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
           'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
          dtype=object)
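For code that has to run on both sides of the deprecation, one possible approach is to check which method the fitted encoder offers. This is only a sketch: the three-row dataset is hypothetical, and the `hasattr` fallback is a convenience chosen here, not something the scikit-learn documentation prescribes.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is Sex, column 1 is AgeGroup.
X = np.array(
    [["female", 0], ["male", 15], ["female", 30]],
    dtype=object,
)
encoder = OneHotEncoder().fit(X)

columns = ["Sex", "AgeGroup"]
# Prefer the newer method; fall back only on older scikit-learn releases.
if hasattr(encoder, "get_feature_names_out"):
    feature_names = encoder.get_feature_names_out(columns)
else:
    feature_names = encoder.get_feature_names(columns)

print(list(feature_names))
```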

One Hot Encoding using numpy [duplicate]

June 4, 2023 by Tarik

Usually, when you want to get a one-hot encoding for classification in machine learning, you have an array of indices.

    import numpy as np
    nb_classes = 6
    targets = np.array([[2, 3, 4, 0]]).reshape(-1)
    one_hot_targets = np.eye(nb_classes)[targets]

Because targets is flattened to one dimension, one_hot_targets is now a two-dimensional array:

    array([[ 0., 0., 1., 0., 0., 0.],
           [ 0., 0., 0., 1., 0., 0.],
           [ …
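The shape of the result follows the shape of the index array, which is easy to get wrong. The sketch below checks this and adds a round trip through `argmax`; the round trip and the `batched` variable are additions made here for illustration.

```python
import numpy as np

nb_classes = 6
targets = np.array([2, 3, 4, 0])

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing with the targets selects one row per target.
one_hot_targets = np.eye(nb_classes)[targets]
print(one_hot_targets.shape)

# argmax along the class axis recovers the original indices.
recovered = one_hot_targets.argmax(axis=1)
print(recovered)

# A 2-D index array yields a 3-D result: the class axis is appended last.
batched = np.eye(nb_classes)[targets.reshape(1, -1)]
print(batched.shape)
```

Flattening the targets first, as the excerpt does with `reshape(-1)`, is what keeps the output two-dimensional.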

Running get_dummies on several DataFrame columns?

March 2, 2023 by Tarik

With pandas 0.19, you can do that in a single line:

    pd.get_dummies(data=df, columns=['A', 'B'])

The columns argument specifies which columns to one-hot encode.

    >>> df
       A  B  C
    0  a  c  1
    1  b  c  2
    2  a  b  3
    >>> pd.get_dummies(data=df, columns=['A', 'B'])
       C  A_a  A_b  B_b  B_c
    0  1  1.0  0.0  0.0  …
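A self-contained version of the excerpt's example is sketched below, rebuilding the three-row frame it prints. The `dtype=int` argument is an assumption added here: the dtype of the indicator columns has changed between pandas releases, and fixing it keeps the output predictable.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# Only A and B are encoded; C passes through unchanged and comes first.
# Each new column is named <original column>_<category>.
encoded = pd.get_dummies(data=df, columns=["A", "B"], dtype=int)
print(encoded)
```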

Can sklearn random forest directly handle categorical features?

February 18, 2023 by Tarik

No. Somebody’s working on this and the patch might be merged into mainline some day, but right now scikit-learn has no support for categorical variables other than dummy (one-hot) encoding.
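The dummy-encoding route mentioned above might look like the following sketch. The toy data, column names, and forest settings are all hypothetical, and `pd.get_dummies` is just one of several ways to produce the indicator columns.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data with one string-valued categorical feature.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.0, 3.0, 2.5, 1.5, 3.5],
    "label": [0, 1, 1, 1, 0, 1],
})

# Expand the categorical column into indicator columns before fitting.
features = pd.get_dummies(df[["color", "size"]], columns=["color"], dtype=int)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(features, df["label"])
predictions = model.predict(features)
print(predictions)
```

When predicting on new data, the same indicator columns must be present in the same order, which is one reason a fitted OneHotEncoder is often preferred over calling get_dummies separately on each dataset.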

How can I one hot encode in Python?

September 28, 2022 by Tarik

Approach 1: You can use pandas’ pd.get_dummies.

Example 1:

    import pandas as pd
    s = pd.Series(list('abca'))
    pd.get_dummies(s)
    Out[]:
         a    b    c
    0  1.0  0.0  0.0
    1  0.0  1.0  0.0
    2  0.0  0.0  1.0
    3  1.0  0.0  0.0

Example 2: The following will transform a given column into one hot. Use prefix to have multiple dummies. …
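The excerpt's second example is cut off, so the sketch below is only a guess at the kind of situation where `prefix` helps: two columns that share category values. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with two categorical columns sharing the same values.
df = pd.DataFrame({"first": ["a", "b", "a"], "second": ["b", "a", "a"]})

# Without a prefix both calls would produce columns named "a" and "b";
# the prefix keeps them distinguishable after concatenation.
first_dummies = pd.get_dummies(df["first"], prefix="first", dtype=int)
second_dummies = pd.get_dummies(df["second"], prefix="second", dtype=int)

combined = pd.concat([first_dummies, second_dummies], axis=1)
print(combined.columns.tolist())
```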

Convert array of indices to one-hot encoded array in NumPy

September 16, 2022 by Tarik

Create a zeroed array b with enough columns, i.e. a.max() + 1. Then, for each row i, set the a[i]th column to 1.

    >>> a = np.array([1, 0, 3])
    >>> b = np.zeros((a.size, a.max() + 1))
    >>> b[np.arange(a.size), a] = 1
    >>> b
    array([[ 0., 1., 0., 0.],
           [ 1., 0., 0., 0.],
           [ 0., …
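The two steps above can be wrapped in a small function. The name `one_hot` and its optional `num_classes` parameter are hypothetical additions; the parameter is useful when the largest class does not appear in a given batch. The sketch assumes a one-dimensional array of non-negative integers.

```python
import numpy as np

def one_hot(indices, num_classes=None):
    """Return a (len(indices), num_classes) one-hot array (hypothetical helper)."""
    indices = np.asarray(indices)
    if num_classes is None:
        # Enough columns to hold the largest index seen.
        num_classes = indices.max() + 1
    encoded = np.zeros((indices.size, num_classes))
    # Pair row i with column indices[i] and set that cell to 1.
    encoded[np.arange(indices.size), indices] = 1
    return encoded

print(one_hot([1, 0, 3]))
print(one_hot([1, 0, 3], num_classes=5).shape)
```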