category_encoders

A library of sklearn-compatible categorical variable encoders

Python BSD-3-Clause scikit-learn-contrib
2,472 Stars
406 Forks
40 Issues

Languages

Python 99.3%

Project Info

Created
November 29, 2015
Last Updated
December 6, 2025
License
BSD-3-Clause
Default Branch
master

About This Project

Category Encoders is a scikit-learn-contrib library providing a comprehensive set of encoders for categorical variables. I authored this project but am no longer the day-to-day maintainer. It has grown into one of the most widely used categorical encoding libraries in the Python ecosystem.

Features

  • 15+ Encoding Methods: OneHot, Target, Binary, Hashing, Leave-One-Out, and more
  • Scikit-learn Compatible: Full pipeline and transformer API support
  • Handles Missing Values: Built-in strategies for missing data
  • Feature Engineering: Advanced encodings for high-cardinality features
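To show why binary encoding suits high-cardinality features, here is a minimal pure-Python sketch of the idea: give each category an integer code, then write that code in base 2 across a small number of columns. This is an illustration, not the library's implementation. The function name `binary_encode`, the first-appearance ordering and the choice to start codes at 1 are assumptions.

```python
def binary_encode(values):
    """Encode a sequence of categories as lists of 0/1 bits.

    Returns (width, rows): width is the number of output columns and
    rows[i] holds the bits for values[i], most significant bit first.
    """
    # Assumption: ordinal codes follow first appearance and start at 1.
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    # Enough bits to represent the largest ordinal code.
    width = len(mapping).bit_length()
    rows = [
        [(mapping[v] >> shift) & 1 for shift in range(width - 1, -1, -1)]
        for v in values
    ]
    return width, rows


width, rows = binary_encode(['a', 'b', 'c', 'a', 'd'])
print(width, rows)
```

With 1,000 distinct categories this scheme needs only 10 columns, where one-hot encoding would need 1,000.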

Installation

pip install category-encoders

Quick Start

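import pandas as pd
import category_encoders as ce
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Toy data: X must contain the columns named in cols=[...]
X = pd.DataFrame({
    'category_column': ['a', 'b', 'a', 'c'],
    'high_card_column': ['u1', 'u2', 'u3', 'u4'],
})
y = pd.Series([1, 0, 1, 0])

# Target encoding (supervised: fit needs the target y)
encoder = ce.TargetEncoder(cols=['category_column'])
X_encoded = encoder.fit_transform(X, y)

# Binary encoding for high-cardinality columns
encoder = ce.BinaryEncoder(cols=['high_card_column'])
X_encoded = encoder.fit_transform(X)

# Use in sklearn pipelines
pipe = Pipeline([
    ('encoder', ce.OrdinalEncoder()),
    ('classifier', RandomForestClassifier()),
])
pipe.fit(X, y)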
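TargetEncoder replaces each category with a statistic of the target. The sketch below shows the general idea in plain Python using a simple m-estimate blend of the per-category mean and the global mean. TargetEncoder's own smoothing formula differs, and the function name `target_encode` and the parameter `m` here are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def target_encode(categories, targets, m=1.0):
    """Map each category to a smoothed mean of the target.

    m controls smoothing: m=0 gives the raw per-category mean, larger m
    pulls categories with few rows toward the global mean.
    """
    prior = sum(targets) / len(targets)  # global mean of the target
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # m-estimate blend: (sum_c + m * prior) / (n_c + m)
    return {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}


encoding = target_encode(['a', 'a', 'b', 'b', 'b', 'c'], [1, 0, 1, 1, 1, 0])
print(encoding)
```

Rare categories end up close to the global mean, which limits overfitting on levels with only a handful of rows. A category never seen during fitting has no entry in the mapping; falling back to the global mean is one reasonable choice.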

Academic Citation

Category Encoders was published in the Journal of Open Source Software (JOSS) and has been cited in numerous academic papers.