:probabl. · by the scikit-learn founders

scikit-learn
Community
Survey
2024–25

How 536 practitioners work, what they need, and where the ecosystem is heading — with actionable benchmarks for consulting teams.

536 respondents
6 languages
35 questions
79% — open source is crucial
52% — write custom estimators
40% — run 5+ models in prod
Free Report
Get the Full Survey Results
Includes maturity model framework, client profile matcher, and consulting-ready benchmarks.
What's inside
Project direction & priorities
ML task priorities & feature demand
Ecosystem tools & libraries
Production & deployment patterns
GPU criticality assessment
ML maturity model framework
Client profile matcher table
Six strategic takeaways
:probabl. · François Goupil

536 total responses · 35 questions asked · 4 topic areas · survey years 2024–25
About This Survey

This report presents findings from the scikit-learn Community Survey 2024–25, conducted among 536 scikit-learn users across 6 languages. Respondents are practitioners who actively use scikit-learn — ranging from individual researchers to engineers running models in production. This is a community survey, not a broad industry study: the findings reflect the scikit-learn user base specifically.

The survey covers four areas: library priorities, ML task usage, ecosystem tooling, and deployment patterns. For technology consultants advising clients who use scikit-learn in production, these results offer a useful reference point for what is common practice, what is lacking, and where friction tends to occur.

Published by Probabl, the company behind scikit-learn's continued development, and authored by François Goupil.

Survey scope
  • 536 scikit-learn users
    Active practitioners — from individual researchers to teams running models in production
  • 6 languages
    English, French, Spanish, Portuguese, Mandarin, Japanese
  • 35 questions, 4 topic areas
    Library priorities, ML task usage, ecosystem tooling, and deployment patterns
A note on methodology & scope

Respondents self-selected via scikit-learn community channels across 6 languages. This means findings are representative of active, engaged scikit-learn users — not the ML practitioner population broadly. Priority questions used a ranked scale (1–8). Multi-select questions reflect the share of respondents who chose each option. Cross-language responses were aligned to canonical English question keys before analysis. Where findings are used for client benchmarking, apply within scope: they describe scikit-learn teams specifically.
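
As an illustration of how ranked responses of this kind are typically aggregated (the data and column names below are invented, not the survey's): each theme's weighted average is its mean score across respondents, and the priority ranking is that average sorted in descending order.

```python
import pandas as pd

# Invented example responses -- NOT survey data. One row per respondent,
# one column per improvement theme, scores on the survey's 1-8 scale.
responses = pd.DataFrame({
    "performance":   [8, 7, 8, 6, 8],
    "new_features":  [7, 8, 5, 7, 6],
    "documentation": [5, 6, 7, 5, 4],
})

# Weighted average per theme; sorting descending yields the priority ranking
avg = responses.mean().sort_values(ascending=False)
print(avg)
```

On real survey data the same two lines apply to any DataFrame with one column per theme and one row per respondent.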

Key Findings
79% — strongly agree open-source ML is crucial for AI transparency and reproducibility
95.4% — use pandas as their primary DataFrame library
73% — regularly use RandomForest, the most widely used estimator in the survey
52% — have written their own scikit-learn estimator, a sign of deep library engagement
What the Community Wants Improved
Priority rankings — weighted avg., scale 1–8
1. Performance — highest weighted average
2. New Features — strong high-end skew
3. Technical Documentation — consistently mid-high
4. Reliability — solid across all scores
5. Educational Materials — high-score concentration
6. Packaging — broadly distributed
7. Website Redesign — lower overall priority
Key Takeaway

Performance is the clearest priority in this survey, with a strong concentration of top scores. For scikit-learn users, computational speed is a felt bottleneck — not an abstract wish. This is consistent with the growing dataset sizes and longer training times reported in the deployment section.

"Better support for larger datasets and GPU acceleration would be transformative for production workflows." — Survey respondent
"More algorithms for time series and anomaly detection, with the same clean API." — Survey respondent
What this means in practice

For teams working with scikit-learn clients, performance and new features rank far above documentation or website improvements. When assessing bottlenecks, start with compute and iteration speed before recommending new tooling or retraining workflows.

Questions to ask your client
  • How long does a typical model training run take today — and how often does that block iteration?
  • Which features or algorithms are missing that would change what you can deliver?
  • What's your current process for testing a new scikit-learn version before upgrading in production?
ML Task Priorities & Feature Demand
ML Tasks by Priority — avg. rank, scale 1–7
1. Classification
2. Regression
3. Forecasting
4. Outlier / Anomaly Detection
5. Dimensionality Reduction
6. Clustering
Important ML Features — % of 503 respondents
Feature importances — 68.0%
Uncertainty estimates — 67.0%
Probability calibration — 56.9%
Sample weights — 40.2%
Regressor calibration — 39.4%
Cost-sensitive — 29.6%
Non-Euclidean — 17.9%
Metadata routing — 11.5%
What this tells us

Classification and regression lead by a clear margin — these remain the bread-and-butter of scikit-learn usage. The near-parity between feature importances (68.0%) and uncertainty estimates (67.0%) as required features is notable: within this community, explainability is already a baseline expectation, not a premium requirement.
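
Both expectations are met by tooling that already ships with scikit-learn. A minimal sketch (synthetic data; model and settings are illustrative): model-agnostic importances via permutation_importance, and per-prediction class probabilities as a first uncertainty signal.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic feature importances, measured on held-out data
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
print(imp.importances_mean)

# Class probabilities as a simple per-prediction uncertainty signal
proba = clf.predict_proba(X_te[:3])
print(proba)
```

For calibrated probabilities — the 56.9% "probability calibration" item above — scikit-learn's CalibratedClassifierCV can wrap a classifier in the same pipeline-friendly way.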

A note for consulting teams

For clients using scikit-learn in regulated or decision-critical contexts, this data suggests their own engineering teams already expect explainability tooling. If that expectation isn't met by current infrastructure, it's worth surfacing — particularly as AI governance requirements tighten.

Questions to ask your client
  • Can your models currently produce feature importances or confidence intervals on demand?
  • Do you have forecasting or anomaly detection workloads that are handled outside scikit-learn? Why?
  • Who in the business consumes model outputs — and do they ever ask "why did the model do that?"
Tools & Libraries in Use
DataFrame Libraries — 527 respondents
pandas — 95.4%
Polars — 33.4%
Spark — 20.5%
Dask — 11.2%
DuckDB — 9.7%
cuDF — 8.5%
Modin — 2.1%
Most-Used Estimators — 520 respondents
RandomForest — 73.3%
Pipeline — 62.5%
LogisticRegression — 58.1%
ColumnTransformer — 41.0%
HistGradientBoosting — 31.3%
Evaluation Visualizations — 524 respondents
Confusion Matrix — 83.0%
Feature Importance — 70.4%
ROC Curve — 67.4%
Precision-Recall — 56.3%
Learning Curves — 48.7%
Residual Plots — 38.2%
Reliability Diagram — 12.4%
Analysis

Within this community, the data layer is highly standardised: pandas at 95.4% is effectively universal among scikit-learn users. Polars at 33.4% is a notable second — its growth reflects the same performance pressure that topped the priorities list. On the modelling side, RandomForest (73.3%) and Pipeline (62.5%) confirm that the classic composable scikit-learn workflow is the norm in production, not a pattern people are moving away from.
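
The composable workflow the analysis describes looks like this in practice — a sketch with toy data (the column names and values are invented for illustration):

```python
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy mixed-type data -- invented for illustration
df = pd.DataFrame({
    "age":  [25, 32, 47, 51, 38, 29],
    "city": ["paris", "lyon", "paris", "nice", "lyon", "nice"],
    "churned": [0, 1, 0, 1, 0, 1],
})

# Per-column preprocessing and the model, combined into one fit/predict object
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("pre", pre), ("clf", RandomForestClassifier(random_state=0))])

model.fit(df[["age", "city"]], df["churned"])
print(model.predict(df[["age", "city"]]))
```

Because preprocessing lives inside the fitted object, the same artifact can be versioned, deployed, and applied to new data without any risk of train/serve skew.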

Questions to ask your client
  • Are there any data processing steps that currently don't fit into a scikit-learn Pipeline? What are they?
  • Has anyone on the team started using Polars? If so, where does that create friction with existing tooling?
  • How are your ML and data engineering teams currently sharing preprocessing code?
Production Usage Patterns
Typical Model Training Duration — 516 respondents
< 10 seconds — 8.6%
< 1 minute — 16.6%
< 10 minutes — 33.6%
< 1 hour — 38.1%
< 1 day — 33.3%
> 1 day — 12.7%
Deployed Models Currently Maintained
1 model — 98 responses (21.4%)
2 models — 82 responses (17.9%)
3 models — 57 responses (12.5%)
4 models — 24 responses (5.3%)
5 models — 13 responses (2.8%)
More than 5 — 183 responses (40.0%)
Experiment Tracking Tools — 360 respondents
MLflow — 62.2%
Weights & Biases — 29.2%
Custom tool — 23.1%
DVC — 10.8%
Neptune — 3.1%
Scheduling Tools — 277 respondents
Airflow — 47.3%
Custom tool — 25.3%
Kubeflow — 20.9%
Argo — 8.3%
Dagster — 5.1%
Metaflow — 4.7%
A production-grade community

40% of respondents maintain more than 5 models in production. This is a community that has moved well past experimentation — model versioning, monitoring, and governance are live concerns, not future plans. Most training runs complete within an hour, suggesting manageable compute footprints that nonetheless require proper tracking and reproducibility tooling.

Open-source MLOps is the norm

MLflow leads experiment tracking at 62.2%, and Airflow leads scheduling at 47.3%. Both are open-source. The 25.3% using custom scheduling tools points to significant internal engineering investment. For teams advising scikit-learn users, this context matters: these practitioners have already built their own tooling layer and have clear opinions about what works.

Questions to ask your client
  • How do you currently know when a deployed model's performance has degraded?
  • If you had to retrain all your models tomorrow, how long would it take and who would do it?
  • What does your custom scheduling/tracking code do that MLflow or Airflow doesn't?
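
The first question above — detecting degradation — can start as a few lines of code long before a full monitoring platform exists. A deliberately minimal sketch (the baseline, tolerance, and accuracy-only check are simplifying assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def performance_alert(y_true, y_pred, baseline, tolerance=0.05):
    """Flag a model when live accuracy drops below baseline - tolerance.

    `baseline` and `tolerance` are illustrative assumptions; a real system
    would also watch feature and label drift, not accuracy alone.
    """
    score = float(accuracy_score(y_true, y_pred))
    return score < baseline - tolerance, score

# Invented ground truth and predictions for a recent scoring batch
degraded, score = performance_alert(
    y_true=np.array([1, 0, 1, 1, 0, 1, 0, 0]),
    y_pred=np.array([1, 0, 0, 1, 0, 0, 0, 1]),
    baseline=0.90,
)
print(degraded, score)  # accuracy is 5/8 = 0.625, so the alert fires
```

Even a check this simple, run on every scoring batch, turns "how do you know?" from a gap into a process.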
How Critical is GPU Support?

Respondents rated the importance of GPU capabilities within scikit-learn on a scale of 1 (not critical) to 5 (very critical).

Rating distribution — 518 respondents
1 (low) — 13.9%
2 — 19.3%
3 (mid) — 23.2%
4 — 23.4%
5 (high) — 20.3%
2.97 — average rating
Just below the midpoint — GPU support is a real concern for a meaningful share of users
43.7% — rate it 4 or 5
Nearly half of respondents consider GPU capabilities highly or very critical
33.2% — rate it 1 or 2
A third of users work at scales where CPU is sufficient — reflecting a heterogeneous user base
On Open Source & Reproducibility
"Open-source ML & AI is crucial for AI transparency" — 516 respondents
79% strongly agree
Strongly agree — 79.1%
Agree — 20.7%
Neither — 2.5%
Disagree / Strongly disagree — <1%
Custom Estimator Authorship — 507 respondents
52.3% wrote own estimator
Yes — 52.3%
No — 47.7%
What this signals

52% of respondents have written custom estimators — meaning they've gone beyond using scikit-learn out of the box and built on top of its API. Combined with the near-universal endorsement of open-source ML as critical infrastructure, this reflects a community with deep, deliberate investment in the ecosystem. For teams advising organisations that use scikit-learn, custom estimator codebases often represent years of accumulated domain logic — worth understanding before any modernisation or migration work begins.
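
For reference, the estimator authorship the survey measures usually means implementing scikit-learn's fit/transform contract on top of BaseEstimator. A minimal, hypothetical example (the ClipOutliers transformer and its quantile logic are invented for illustration):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted

class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clips each feature to quantiles learned in fit.

    Invented for illustration -- the point is the estimator contract:
    __init__ only stores parameters, fit learns state into attributes with
    trailing underscores, and transform verifies the fitted state.
    get_params/set_params come for free from BaseEstimator.
    """

    def __init__(self, low=0.01, high=0.99):
        self.low = low
        self.high = high

    def fit(self, X, y=None):
        X = check_array(X)
        self.low_, self.high_ = np.quantile(X, [self.low, self.high], axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self)
        return np.clip(check_array(X), self.low_, self.high_)

X = np.array([[0.0], [1.0], [2.0], [100.0]])
print(ClipOutliers(low=0.0, high=0.75).fit_transform(X))
```

Classes like this compose directly into Pipeline objects, and sklearn.utils.estimator_checks.check_estimator can verify them against the full API contract — a useful first step when auditing a client's custom estimator codebase.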

ML Maturity Model for scikit-learn Teams

Derived from survey data on deployment scale, tooling adoption, and custom code usage. Use this to orient a client assessment conversation.

Level 1
Experimenting
  • Models trained and evaluated in notebooks; no formal deployment
  • pandas only; no MLflow or experiment tracking
  • Training completes in seconds or minutes
  • No custom estimators; using scikit-learn out of the box
  • 1–2 models in use, none formally "deployed"
~21% of survey respondents maintain only 1 model
Engagement type: capability assessment, training
Level 2
Productionising
  • Models deployed but manually updated; limited monitoring
  • MLflow in use for tracking; Airflow or cron for scheduling
  • Training runs take minutes to an hour
  • Some custom estimators written; Pipeline and ColumnTransformer in use
  • 2–5 models maintained; informal review process
~36% of respondents maintain 2–5 models
Engagement type: MLOps review, pipeline standardisation
Level 3
Operating at Scale
  • 5+ models in production; automated retraining pipelines
  • MLflow + Airflow/Kubeflow stack; custom scheduling tooling
  • Training may take hours to days; GPU demand relevant
  • Significant custom estimator codebase; internal ML platform
  • Formal model governance and performance monitoring in place
40% of respondents maintain more than 5 models
Engagement type: governance, performance, scale architecture
Questions to locate a client on this model
  • How many scikit-learn models are currently running in production — and how are they versioned?
  • What triggers a model retrain — schedule, drift detection, or manual decision?
  • Do you have custom estimator code? Is it tested, documented, and understood by more than one person?
  • Who owns model performance post-deployment — the data science team, engineering, or no-one clearly?
Client Profile Matcher

Survey-derived profiles mapped to typical consulting engagement types. Use as a starting point for scoping conversations.

  • 5+ models in production — benchmark: 40% of teams
    Often indicates: model portfolio without formal governance; monitoring gaps likely
    Typical engagement: MLOps maturity review; model governance framework
    Ask: Who owns each model? How is degradation detected?
  • Custom estimator code in use — benchmark: 52% of users
    Often indicates: accumulated domain logic embedded in the scikit-learn API; undocumented debt
    Typical engagement: technical debt audit; migration risk assessment; documentation sprint
    Ask: Who wrote it? Is it tested? What breaks if you change it?
  • No feature importance or uncertainty tooling — benchmark: 67–68% need it
    Often indicates: explainability gap; potential compliance risk in regulated sectors
    Typical engagement: explainability assessment; governance readiness review
    Ask: Do stakeholders ever ask why the model predicted X?
  • Training takes >1 hour or >1 day — benchmark: 40% of teams
    Often indicates: compute bottleneck limiting iteration speed and experimentation velocity
    Typical engagement: infrastructure review; hardware/cloud optimisation; profiling engagement
    Ask: How many experiments can the team run per week?
  • Custom scheduling code (not Airflow/Kubeflow) — benchmark: 25% of teams
    Often indicates: significant internal engineering investment; brittle pipelines; bus-factor risk
    Typical engagement: pipeline standardisation; MLOps platform consolidation
    Ask: What does the custom code do that standard tools don't?
  • Forecasting or anomaly detection as a top-3 task — benchmark: ranked #3 and #4
    Often indicates: bespoke or poorly maintained solutions; maintenance burden
    Typical engagement: tooling assessment; bespoke-to-library migration; capability build
    Ask: What library or code handles these tasks today?
Six Takeaways for Teams Working with scikit-learn Users
Performance is the #1 Priority

Across all languages and geographies, performance ranked as the top unmet need. When working with scikit-learn teams, bottlenecks in compute and iteration speed are likely to be the most felt pain — start there.

🔍
Explainability is Already Expected

Feature importances and uncertainty estimates are needed by 68% and 67% of respondents respectively. Within this community, these are standard requirements — not advanced features. Teams without them should know they're behind their peers.

📦
Polars is Growing Fast

Polars has reached 33.4% adoption among scikit-learn users — driven by the same performance concerns that top the priorities list. pandas remains dominant at 95.4%, but the two increasingly coexist in the same teams.

🛠
Custom Estimator Code is Widespread

52% of respondents have extended scikit-learn with custom estimators. This code often encodes years of domain-specific logic. Before recommending any platform change, it's worth understanding how much of it exists and how well it's documented.

📈
Forecasting & Anomaly Detection Are Underserved

Ranked #3 and #4 in task priority, these are areas where scikit-learn's native support is limited. Teams with these use cases often rely on bespoke solutions — a common source of maintenance burden.

🔗
Open MLOps Tooling Dominates

MLflow (62%) and Airflow (47%) are the clear standards for tracking and scheduling within this community. Proprietary alternatives face a well-informed, already-equipped audience — adoption arguments need to be specific and practical.

From this report to action

Does your team look like this survey?

If you're advising a team that uses scikit-learn in production, Probabl can run a structured benchmarking session — mapping your client against the survey data across deployment scale, tooling maturity, explainability coverage, and custom code risk. It takes 90 minutes and produces a concrete gap analysis.

We are the team behind scikit-learn. We built the maturity model in this report from our own community data — and we can apply it directly to your client's stack.

Try Skore

Open-source Python library. Automated evaluation reports, cross-validation insights, and model comparison — built for scikit-learn, in one line of code.

Try Skore free →
Book a benchmarking session

90 minutes. We map your client against this survey data and deliver a gap analysis. No commitment required — just a structured conversation grounded in real practitioner data.

Request a session →