Short Notes: Encoding columns in a dataframe
There are already many blogs about one-hot encoding, label encoding and other encoding techniques. This one focuses on something narrower: encoding one or more columns of a dataframe in a single operation, using scikit-learn's `ColumnTransformer` API.
Let’s begin 😀

Importing Libraries
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
```
The `sparse_output` parameter and the `set_output` API used later require scikit-learn 1.2 or newer.
Let’s create a dummy dataframe
We will create a dataframe named `employees_df` with the columns `field`, `salary`, `avg_years_of_exp` and `gender_category`. For each field, `gender_category` holds whichever gender (Male or Female) makes up the larger share of that field.
```python
employees_df = pd.DataFrame({
    'field': ['Tech', 'Finance', 'HR', 'Marketing', 'Sales', 'BioTech'],
    'salary': ['high', 'high', 'low', 'medium', 'medium', 'high'],
    'avg_years_of_exp': [4, 6, 5, 8, 8, 10],
    'gender_category': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],  # majority gender for each field
})
```
- `field` and `gender_category` are non-ordinal (nominal) categorical features.
- `salary` is an ordinal categorical feature.
- `avg_years_of_exp` also looks categorical here. In a realistic dataset, though, it would have thousands of records and possibly floating-point values, so it would not be treated as a categorical feature. We could derive a `year_experience_range` column that holds experience ranges (e.g., 0-3, 4-6, etc.) and treat that as categorical, but we will skip that for now.
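If you do want the `year_experience_range` column mentioned above, `pd.cut` is one way to build it. This is a minimal sketch that assumes whole-year values. The bin edges and labels are hypothetical choices and do not come from the original post.

```python
import pandas as pd

# Same dummy data as above, limited to the columns needed here
employees_df = pd.DataFrame({
    'field': ['Tech', 'Finance', 'HR', 'Marketing', 'Sales', 'BioTech'],
    'avg_years_of_exp': [4, 6, 5, 8, 8, 10],
})

# Hypothetical bins: (0, 3], (3, 6], (6, 9], (9, inf)
bins = [0, 3, 6, 9, float('inf')]
labels = ['0-3', '4-6', '7-9', '10+']  # labels assume whole years

# include_lowest=True also puts a value of exactly 0 into the first bin
employees_df['year_experience_range'] = pd.cut(
    employees_df['avg_years_of_exp'], bins=bins, labels=labels, include_lowest=True
)
print(employees_df)
```

`pd.cut` returns an ordered categorical column. That makes it a natural candidate for the ordinal branch of the transformer built below.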
Creating Ordinal Feature and OrdinalEncoder
An ordinal feature is a categorical column whose values follow a natural order or hierarchy. Examples include (1) rank as 1, 2, or 3; (2) salary as low, medium, or high; and (3) height as tall, taller, or tallest.
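The order that `OrdinalEncoder` uses is easiest to see on the `salary` column alone. This minimal sketch compares the encoder's default, which sorts categories alphabetically, with an explicit low < medium < high order passed through the `categories` parameter. The variable names are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

salary_df = pd.DataFrame({'salary': ['high', 'high', 'low', 'medium', 'medium', 'high']})

# Default: categories are sorted, so the learned order is alphabetical
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(salary_df)
print(default_encoder.categories_)  # [array(['high', 'low', 'medium'], dtype=object)]

# Explicit order: one list per encoded column, from lowest to highest
ordered_encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
ordered_codes = ordered_encoder.fit_transform(salary_df)
print(ordered_codes.ravel())  # [2. 2. 0. 1. 1. 2.]
```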
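```python
ordinal_feature = ['salary']
ordinal_transformer = OrdinalEncoder()
```
When called with no arguments, `OrdinalEncoder` assigns codes in sorted (alphabetical) order of the categories. As the outputs below show, `salary` is therefore encoded as high → 0, low → 1, medium → 2. To make the codes follow the real hierarchy, pass the order explicitly, e.g. `OrdinalEncoder(categories=[['low', 'medium', 'high']])`.

Creating Non Ordinal Feature and OneHotEncoder
```python
non_ordinal_categorical_features = ['field', 'gender_category']
non_ordinal_categorical_transformer = OneHotEncoder(handle_unknown="ignore")
```
Creating Column Transformer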
We now pass both transformers to a single `ColumnTransformer`, each paired with the columns it should encode. Each entry is a `(name, transformer, columns)` tuple.
```python
column_transformer = ColumnTransformer(transformers=[
    ('ordinal', ordinal_transformer, ordinal_feature),
    ('non_ordinal_category', non_ordinal_categorical_transformer, non_ordinal_categorical_features)],
    remainder='drop')
```
`remainder='drop'` drops every column that is not listed in any transformer (here, `avg_years_of_exp`). To keep those columns unchanged, pass `remainder='passthrough'` instead.
```python
pd.DataFrame(column_transformer.fit_transform(employees_df))
```
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
As you can see, the output does not tell us which column corresponds to which value in the original dataframe. A couple of tweaks fix that.
Creating the final Transformer with Columns intact and understandable
```python
non_ordinal_categorical_transformer = OneHotEncoder(sparse_output=False, handle_unknown="ignore")  # New code added
# Note: sparse_output=False is required because pandas output (set_output below)
# cannot hold sparse matrices. The column-name prefixes come from ColumnTransformer itself.
column_transformer = ColumnTransformer(transformers=[
    ('ordinal', ordinal_transformer, ordinal_feature),
    ('non_ordinal_category', non_ordinal_categorical_transformer, non_ordinal_categorical_features)],
    remainder='drop')  # unchanged
column_transformer.set_output(transform='pandas')  # New code added
```
```
ColumnTransformer(transformers=[('ordinal', OrdinalEncoder(), ['salary']),
                                ('non_ordinal_category',
                                 OneHotEncoder(handle_unknown='ignore',
                                               sparse_output=False),
                                 ['field', 'gender_category'])])
```
With `set_output(transform='pandas')`, `fit_transform` and `transform` return a DataFrame. Its column names take the form `<transformer name>__<column>`, or `<transformer name>__<column>_<category>` for one-hot columns. This prefixing is controlled by `ColumnTransformer`'s `verbose_feature_names_out`, which defaults to `True`.
```python
df_pandas = column_transformer.fit_transform(employees_df)
df_pandas
```
|   | ordinal__salary | non_ordinal_category__field_BioTech | non_ordinal_category__field_Finance | non_ordinal_category__field_HR | non_ordinal_category__field_Marketing | non_ordinal_category__field_Sales | non_ordinal_category__field_Tech | non_ordinal_category__gender_category_Female | non_ordinal_category__gender_category_Male |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |