:probabl. · by the scikit-learn founders

scikit-learn
Community
Survey
2024–25

How 536 practitioners work, what they need, and where the ecosystem is heading — with actionable benchmarks for consulting teams.

536 respondents
6 languages
35 questions
79% — open source is crucial
52% — write custom estimators
40% — run 5+ models in prod
Free Report
Get the Full Survey Results
Includes maturity model framework, client profile matcher, and consulting-ready benchmarks.
What's inside
Project direction & priorities
ML task priorities & feature demand
Ecosystem tools & libraries
Production & deployment patterns
GPU criticality assessment
ML maturity model framework
Client profile matcher table
Six strategic takeaways
:probabl. · François Goupil

536 total responses · 35 questions asked · 4 topic areas · survey years 2024–25
About This Survey

This report presents findings from the scikit-learn Community Survey 2024–25, conducted among 536 scikit-learn users across 6 languages. Respondents are practitioners who actively use scikit-learn — ranging from individual researchers to engineers running models in production. This is a community survey, not a broad industry study: the findings reflect the scikit-learn user base specifically.

The survey covers four areas: library priorities, ML task usage, ecosystem tooling, and deployment patterns. For technology consultants advising clients who use scikit-learn in production, these results offer a useful reference point for what is common practice, what is lacking, and where friction tends to occur.

Published by Probabl, the company behind scikit-learn's continued development, and authored by François Goupil.

Survey scope
  • 536 scikit-learn users
    Active practitioners — from individual researchers to teams running models in production
  • 6 languages
    English, French, Spanish, Portuguese, Mandarin, Japanese
  • 35 questions, 4 topic areas
    Library priorities, ML task usage, ecosystem tooling, and deployment patterns
A note on methodology & scope

Respondents self-selected via scikit-learn community channels across 6 languages. This means findings are representative of active, engaged scikit-learn users — not the ML practitioner population broadly. Priority questions used a ranked scale (1–8). Multi-select questions reflect the share of respondents who chose each option. Cross-language responses were aligned to canonical English question keys before analysis. Where findings are used for client benchmarking, apply within scope: they describe scikit-learn teams specifically.
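
As an illustration of how ranked responses of this kind are typically aggregated (the data and column names below are invented, not the survey's): each theme's weighted average is its mean score across respondents, and the priority ranking is that average sorted in descending order.

```python
import pandas as pd

# Invented example responses -- NOT survey data. One row per respondent,
# one column per improvement theme, scores on the survey's 1-8 scale.
responses = pd.DataFrame({
    "performance":   [8, 7, 8, 6, 8],
    "new_features":  [7, 8, 5, 7, 6],
    "documentation": [5, 6, 7, 5, 4],
})

# Weighted average per theme; sorting descending yields the priority ranking
avg = responses.mean().sort_values(ascending=False)
print(avg)
```

On real survey data the same two lines apply to any DataFrame with one column per theme and one row per respondent.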

Key Findings
79% — strongly agree open-source ML is crucial for AI transparency and reproducibility
95.4% — use pandas as their primary DataFrame library
73% — regularly use RandomForest, the most widely used estimator in the survey
52% — have written their own scikit-learn estimator, a sign of deep library engagement
What the Community Wants Improved
Priority rankings — weighted avg., scale 1–8
1. Performance — highest weighted average
2. New Features — strong high-end skew
3. Technical Documentation — consistently mid-high
4. Reliability — solid across all scores
5. Educational Materials — high-score concentration
6. Packaging — broadly distributed
7. Website Redesign — lower overall priority
Key Takeaway

Performance is the clearest priority in this survey, with a strong concentration of top scores. For scikit-learn users, computational speed is a felt bottleneck — not an abstract wish. This is consistent with the growing dataset sizes and longer training times reported in the deployment section.

"Better support for larger datasets and GPU acceleration would be transformative for production workflows." — Survey respondent
"More algorithms for time series and anomaly detection, with the same clean API." — Survey respondent
What this means in practice

For teams working with scikit-learn clients, performance and new features rank far above documentation or website improvements. When assessing bottlenecks, start with compute and iteration speed before recommending new tooling or retraining workflows.

Questions to ask your client
  • How long does a typical model training run take today — and how often does that block iteration?
  • Which features or algorithms are missing that would change what you can deliver?
  • What's your current process for testing a new scikit-learn version before upgrading in production?
ML Task Priorities & Feature Demand
ML Tasks by Priority — avg. rank, scale 1–7
1. Classification
2. Regression
3. Forecasting
4. Outlier / Anomaly Detection
5. Dimensionality Reduction
6. Clustering
Important ML Features — % of 503 respondents
Feature importances — 68.0%
Uncertainty estimates — 67.0%
Probability calibration — 56.9%
Sample weights — 40.2%
Regressor calibration — 39.4%
Cost-sensitive — 29.6%
Non-Euclidean — 17.9%
Metadata routing — 11.5%
What this tells us

Classification and regression lead by a clear margin — these remain the bread-and-butter of scikit-learn usage. The near-parity between feature importances (68.0%) and uncertainty estimates (67.0%) as required features is notable: within this community, explainability is already a baseline expectation, not a premium requirement.
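
Both expectations are met by tooling that already ships with scikit-learn. A minimal sketch (synthetic data; model and settings are illustrative): model-agnostic importances via permutation_importance, and per-prediction class probabilities as a first uncertainty signal.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic feature importances, measured on held-out data
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
print(imp.importances_mean)

# Class probabilities as a simple per-prediction uncertainty signal
proba = clf.predict_proba(X_te[:3])
print(proba)
```

For calibrated probabilities — the 56.9% "probability calibration" item above — scikit-learn's CalibratedClassifierCV can wrap a classifier in the same pipeline-friendly way.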

A note for consulting teams

For clients using scikit-learn in regulated or decision-critical contexts, this data suggests their own engineering teams already expect explainability tooling. If that expectation isn't met by current infrastructure, it's worth surfacing — particularly as AI governance requirements tighten.

Questions to ask your client
  • Can your models currently produce feature importances or confidence intervals on demand?
  • Do you have forecasting or anomaly detection workloads that are handled outside scikit-learn? Why?
  • Who in the business consumes model outputs — and do they ever ask "why did the model do that?"
Tools & Libraries in Use
DataFrame Libraries — 527 respondents
pandas — 95.4%
Polars — 33.4%
Spark — 20.5%
Dask — 11.2%
DuckDB — 9.7%
cuDF — 8.5%
Modin — 2.1%
Most-Used Estimators — 520 respondents
RandomForest — 73.3%
Pipeline — 62.5%
LogisticRegression — 58.1%
ColumnTransformer — 41.0%
HistGradientBoosting — 31.3%
Evaluation Visualizations — 524 respondents
Confusion Matrix — 83.0%
Feature Importance — 70.4%
ROC Curve — 67.4%
Precision-Recall — 56.3%
Learning Curves — 48.7%
Residual Plots — 38.2%
Reliability Diagram — 12.4%
Analysis

Within this community, the data layer is highly standardised: pandas at 95.4% is effectively universal among scikit-learn users. Polars at 33.4% is a notable second — its growth reflects the same performance pressure that topped the priorities list. On the modelling side, RandomForest (73.3%) and Pipeline (62.5%) confirm that the classic composable scikit-learn workflow is the norm in production, not a pattern people are moving away from.
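
The composable workflow the analysis describes looks like this in practice — a sketch with toy data (the column names and values are invented for illustration):

```python
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy mixed-type data -- invented for illustration
df = pd.DataFrame({
    "age":  [25, 32, 47, 51, 38, 29],
    "city": ["paris", "lyon", "paris", "nice", "lyon", "nice"],
    "churned": [0, 1, 0, 1, 0, 1],
})

# Per-column preprocessing and the model, combined into one fit/predict object
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("pre", pre), ("clf", RandomForestClassifier(random_state=0))])

model.fit(df[["age", "city"]], df["churned"])
print(model.predict(df[["age", "city"]]))
```

Because preprocessing lives inside the fitted object, the same artifact can be versioned, deployed, and applied to new data without any risk of train/serve skew.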

Questions to ask your client
  • Are there any data processing steps that currently don't fit into a scikit-learn Pipeline? What are they?
  • Has anyone on the team started using Polars? If so, where does that create friction with existing tooling?
  • How are your ML and data engineering teams currently sharing preprocessing code?
Production Usage Patterns
Typical Model Training Duration — 516 respondents
< 10 seconds — 8.6%
< 1 minute — 16.6%
< 10 minutes — 33.6%
< 1 hour — 38.1%
< 1 day — 33.3%
> 1 day — 12.7%
Deployed Models Currently Maintained
1 model — 98 responses (21.4%)
2 models — 82 responses (17.9%)
3 models — 57 responses (12.5%)
4 models — 24 responses (5.3%)
5 models — 13 responses (2.8%)
More than 5 — 183 responses (40.0%)
Experiment Tracking Tools — 360 respondents
MLflow — 62.2%
Weights & Biases — 29.2%
Custom tool — 23.1%
DVC — 10.8%
Neptune — 3.1%
Scheduling Tools — 277 respondents
Airflow — 47.3%
Custom tool — 25.3%
Kubeflow — 20.9%
Argo — 8.3%
Dagster — 5.1%
Metaflow — 4.7%
A production-grade community

40% of respondents maintain more than 5 models in production. This is a community that has moved well past experimentation — model versioning, monitoring, and governance are live concerns, not future plans. Most training runs complete within an hour, suggesting manageable compute footprints that nonetheless require proper tracking and reproducibility tooling.

Open-source MLOps is the norm

MLflow leads experiment tracking at 62.2%, and Airflow leads scheduling at 47.3%. Both are open-source. The 25.3% using custom scheduling tools points to significant internal engineering investment. For teams advising scikit-learn users, this context matters: these practitioners have already built their own tooling layer and have clear opinions about what works.

Questions to ask your client
  • How do you currently know when a deployed model's performance has degraded?
  • If you had to retrain all your models tomorrow, how long would it take and who would do it?
  • What does your custom scheduling/tracking code do that MLflow or Airflow doesn't?
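
The first question above — detecting degradation — can start as a few lines of code long before a full monitoring platform exists. A deliberately minimal sketch (the baseline, tolerance, and accuracy-only check are simplifying assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def performance_alert(y_true, y_pred, baseline, tolerance=0.05):
    """Flag a model when live accuracy drops below baseline - tolerance.

    `baseline` and `tolerance` are illustrative assumptions; a real system
    would also watch feature and label drift, not accuracy alone.
    """
    score = float(accuracy_score(y_true, y_pred))
    return score < baseline - tolerance, score

# Invented ground truth and predictions for a recent scoring batch
degraded, score = performance_alert(
    y_true=np.array([1, 0, 1, 1, 0, 1, 0, 0]),
    y_pred=np.array([1, 0, 0, 1, 0, 0, 0, 1]),
    baseline=0.90,
)
print(degraded, score)  # accuracy is 5/8 = 0.625, so the alert fires
```

Even a check this simple, run on every scoring batch, turns "how do you know?" from a gap into a process.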
How Critical is GPU Support?

Respondents rated the importance of GPU capabilities within scikit-learn on a scale of 1 (not critical) to 5 (very critical).

Rating distribution — 518 respondents
1 (low) — 13.9%
2 — 19.3%
3 (mid) — 23.2%
4 — 23.4%
5 (high) — 20.3%
2.97 — average rating
Just below the midpoint — GPU support is a real concern for a meaningful share of users
43.7% — rate it 4 or 5
Nearly half of respondents consider GPU capabilities highly or very critical
33.2% — rate it 1 or 2
A third of users work at scales where CPU is sufficient — reflecting a heterogeneous user base
On Open Source & Reproducibility
"Open-source ML & AI is crucial for AI transparency" — 516 respondents
79% strongly agree
Strongly agree — 79.1%
Agree — 20.7%
Neither — 2.5%
Disagree / Strongly disagree — <1%
Custom Estimator Authorship — 507 respondents
52.3% wrote own estimator
Yes — 52.3%
No — 47.7%
What this signals

52% of respondents have written custom estimators — meaning they've gone beyond using scikit-learn out of the box and built on top of its API. Combined with the near-universal endorsement of open-source ML as critical infrastructure, this reflects a community with deep, deliberate investment in the ecosystem. For teams advising organisations that use scikit-learn, custom estimator codebases often represent years of accumulated domain logic — worth understanding before any modernisation or migration work begins.
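
For reference, the estimator authorship the survey measures usually means implementing scikit-learn's fit/transform contract on top of BaseEstimator. A minimal, hypothetical example (the ClipOutliers transformer and its quantile logic are invented for illustration):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted

class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clips each feature to quantiles learned in fit.

    Invented for illustration -- the point is the estimator contract:
    __init__ only stores parameters, fit learns state into attributes with
    trailing underscores, and transform verifies the fitted state.
    get_params/set_params come for free from BaseEstimator.
    """

    def __init__(self, low=0.01, high=0.99):
        self.low = low
        self.high = high

    def fit(self, X, y=None):
        X = check_array(X)
        self.low_, self.high_ = np.quantile(X, [self.low, self.high], axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self)
        return np.clip(check_array(X), self.low_, self.high_)

X = np.array([[0.0], [1.0], [2.0], [100.0]])
print(ClipOutliers(low=0.0, high=0.75).fit_transform(X))
```

Classes like this compose directly into Pipeline objects, and sklearn.utils.estimator_checks.check_estimator can verify them against the full API contract — a useful first step when auditing a client's custom estimator codebase.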

ML Maturity Model for scikit-learn Teams

Derived from survey data on deployment scale, tooling adoption, and custom code usage. Use this to orient a client assessment conversation.

Level 1
Experimenting
  • Models trained and evaluated in notebooks; no formal deployment
  • pandas only; no MLflow or experiment tracking
  • Training completes in seconds or minutes
  • No custom estimators; using scikit-learn out of the box
  • 1–2 models in use, none formally "deployed"
~21% of survey respondents maintain only 1 model
Engagement type: capability assessment, training
Level 2
Productionising
  • Models deployed but manually updated; limited monitoring
  • MLflow in use for tracking; Airflow or cron for scheduling
  • Training runs take minutes to an hour
  • Some custom estimators written; Pipeline and ColumnTransformer in use
  • 2–5 models maintained; informal review process
~36% of respondents maintain 2–5 models
Engagement type: MLOps review, pipeline standardisation
Level 3
Operating at Scale
  • 5+ models in production; automated retraining pipelines
  • MLflow + Airflow/Kubeflow stack; custom scheduling tooling
  • Training may take hours to days; GPU demand relevant
  • Significant custom estimator codebase; internal ML platform
  • Formal model governance and performance monitoring in place
40% of respondents maintain more than 5 models
Engagement type: governance, performance, scale architecture
Questions to locate a client on this model
  • How many scikit-learn models are currently running in production — and how are they versioned?
  • What triggers a model retrain — schedule, drift detection, or manual decision?
  • Do you have custom estimator code? Is it tested, documented, and understood by more than one person?
  • Who owns model performance post-deployment — the data science team, engineering, or no-one clearly?
Client Profile Matcher

Survey-derived profiles mapped to typical consulting engagement types. Use as a starting point for scoping conversations.

  • 5+ models in production — benchmark: 40% of teams
    Often indicates: model portfolio without formal governance; monitoring gaps likely
    Typical engagement: MLOps maturity review; model governance framework
    Ask: Who owns each model? How is degradation detected?
  • Custom estimator code in use — benchmark: 52% of users
    Often indicates: accumulated domain logic embedded in the scikit-learn API; undocumented debt
    Typical engagement: technical debt audit; migration risk assessment; documentation sprint
    Ask: Who wrote it? Is it tested? What breaks if you change it?
  • No feature importance or uncertainty tooling — benchmark: 67–68% need it
    Often indicates: explainability gap; potential compliance risk in regulated sectors
    Typical engagement: explainability assessment; governance readiness review
    Ask: Do stakeholders ever ask why the model predicted X?
  • Training takes >1 hour or >1 day — benchmark: 40% of teams
    Often indicates: compute bottleneck limiting iteration speed and experimentation velocity
    Typical engagement: infrastructure review; hardware/cloud optimisation; profiling engagement
    Ask: How many experiments can the team run per week?
  • Custom scheduling code (not Airflow/Kubeflow) — benchmark: 25% of teams
    Often indicates: significant internal engineering investment; brittle pipelines; bus-factor risk
    Typical engagement: pipeline standardisation; MLOps platform consolidation
    Ask: What does the custom code do that standard tools don't?
  • Forecasting or anomaly detection as a top-3 task — benchmark: ranked #3 and #4
    Often indicates: bespoke or poorly maintained solutions; maintenance burden
    Typical engagement: tooling assessment; bespoke-to-library migration; capability build
    Ask: What library or code handles these tasks today?
Six Takeaways for Teams Working with scikit-learn Users
Performance is the #1 Priority

Across all languages and geographies, performance ranked as the top unmet need. When working with scikit-learn teams, bottlenecks in compute and iteration speed are likely to be the most felt pain — start there.

🔍
Explainability is Already Expected

Feature importances and uncertainty estimates are needed by 68% and 67% of respondents respectively. Within this community, these are standard requirements — not advanced features. Teams without them should know they're behind their peers.

📦
Polars is Growing Fast

Polars has reached 33.4% adoption among scikit-learn users — driven by the same performance concerns that top the priorities list. pandas remains dominant at 95.4%, but the two increasingly coexist in the same teams.

🛠
Custom Estimator Code is Widespread

52% of respondents have extended scikit-learn with custom estimators. This code often encodes years of domain-specific logic. Before recommending any platform change, it's worth understanding how much of it exists and how well it's documented.

📈
Forecasting & Anomaly Detection Are Underserved

Ranked #3 and #4 in task priority, these are areas where scikit-learn's native support is limited. Teams with these use cases often rely on bespoke solutions — a common source of maintenance burden.

🔗
Open MLOps Tooling Dominates

MLflow (62%) and Airflow (47%) are the clear standards for tracking and scheduling within this community. Proprietary alternatives face a well-informed, already-equipped audience — adoption arguments need to be specific and practical.

From this report to action

Does your team look like this survey?

If you're advising a team that uses scikit-learn in production, Probabl can run a structured benchmarking session — mapping your client against the survey data across deployment scale, tooling maturity, explainability coverage, and custom code risk. It takes 90 minutes and produces a concrete gap analysis.

We are the team behind scikit-learn. We built the maturity model in this report from our own community data — and we can apply it directly to your client's stack.

Try Skore

Open-source Python library. Automated evaluation reports, cross-validation insights, and model comparison — built for scikit-learn, in one line of code.

Try Skore free →
Book a benchmarking session

90 minutes. We map your client against this survey data and deliver a gap analysis. No commitment required — just a structured conversation grounded in real practitioner data.

Request a session →