How 536 practitioners work, what they need, and where the ecosystem is heading — with actionable benchmarks for consulting teams.
This report presents findings from the scikit-learn Community Survey 2024–25, conducted among 536 scikit-learn users across 6 languages. Respondents are practitioners who actively use scikit-learn — ranging from individual researchers to engineers running models in production. This is a community survey, not a broad industry study: the findings reflect the scikit-learn user base specifically.
The survey covers four areas: library priorities, ML task usage, ecosystem tooling, and deployment patterns. For technology consultants advising clients who use scikit-learn in production, these results offer a useful reference point for what is common practice, what is lacking, and where friction tends to occur.
Published by Probabl, the company behind scikit-learn's continued development, and authored by François Goupil.
Respondents self-selected via scikit-learn community channels across 6 languages. This means findings are representative of active, engaged scikit-learn users — not the ML practitioner population broadly. Priority questions used a ranked scale (1–8). Multi-select questions reflect the share of respondents who chose each option. Cross-language responses were aligned to canonical English question keys before analysis. Where findings are used for client benchmarking, apply them within that scope: they describe scikit-learn teams specifically.
Performance is the clearest priority in this survey, with a strong concentration of top scores. For scikit-learn users, computational speed is a felt bottleneck — not an abstract wish. This is consistent with the growing dataset sizes and longer training times reported in the deployment section.
For teams working with scikit-learn clients, performance and new features rank far above documentation or website improvements. When assessing bottlenecks, start with compute and iteration speed before recommending new tooling or retraining workflows.
Classification and regression lead by a clear margin — these remain the bread-and-butter of scikit-learn usage. The near-parity between feature importances (68.0%) and uncertainty estimates (67.0%) as required features is notable: within this community, explainability is already a baseline expectation, not a premium requirement.
For clients using scikit-learn in regulated or decision-critical contexts, this data suggests their own engineering teams already expect explainability tooling. If that expectation isn't met by current infrastructure, it's worth surfacing — particularly as AI governance requirements tighten.
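Both expectations can be met with standard scikit-learn APIs. The sketch below (dataset and model choices are illustrative, not drawn from the survey) shows the two features respondents cited: per-feature importances and per-prediction class probabilities as an uncertainty signal.

```python
# Sketch: the two "baseline expectation" features from the survey,
# using only standard scikit-learn APIs. The dataset and model are
# illustrative choices, not survey-prescribed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Feature importances: impurity-based, one score per input feature,
# summing to 1 across features.
importances = clf.feature_importances_
print(dict(enumerate(importances.round(3))))

# Uncertainty estimates: per-class probabilities for each prediction.
proba = clf.predict_proba(X[:3])
print(proba.round(2))
```

Tree ensembles expose importances for free; for models without `predict_proba`, calibration wrappers such as `CalibratedClassifierCV` are the usual route to comparable probability outputs.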
Within this community, the data layer is highly standardised: pandas at 95.4% is effectively universal among scikit-learn users. Polars at 33.4% is a notable second — its growth reflects the same performance pressure that topped the priorities list. On the modelling side, RandomForest (73.3%) and Pipeline (62.5%) confirm that the classic composable scikit-learn workflow is the norm in production, not a pattern people are moving away from.
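The "classic composable workflow" the survey points to looks roughly like the following: preprocessing and a RandomForest model chained in a single `Pipeline` object that can be fit, scored, and deployed as one unit. The data and preprocessing step here are illustrative assumptions, not survey findings.

```python
# Minimal sketch of the composable scikit-learn workflow the survey
# describes: a Pipeline chaining a preprocessing step and a
# RandomForest model. Data and step choices are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # preprocessing, fit only on training data
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipe.fit(X, y)

# The whole chain behaves like a single estimator.
print(round(pipe.score(X, y), 2))
```

Because the pipeline is one object, the same artifact can be cross-validated, grid-searched, and serialized for production, which is a large part of why the pattern persists.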
| Models in production | Responses | Share |
|---|---|---|
| 1 model | 98 | 21.4% |
| 2 models | 82 | 17.9% |
| 3 models | 57 | 12.5% |
| 4 models | 24 | 5.3% |
| 5 models | 13 | 2.8% |
| More than 5 | 183 | 40.0% |
40% of respondents maintain more than 5 models in production. This is a community that has moved well past experimentation — model versioning, monitoring, and governance are live concerns, not future plans. Most training runs complete within an hour, suggesting manageable compute footprints that nonetheless require proper tracking and reproducibility tooling.
MLflow leads experiment tracking at 62.2%, and Airflow leads scheduling at 47.3%. Both are open-source. The 25.3% using custom scheduling tools points to significant internal engineering investment. For teams advising scikit-learn users, this context matters: these practitioners have already built their own tooling layer and have clear opinions about what works.
Respondents rated the importance of GPU capabilities within scikit-learn on a scale of 1 (not critical) to 5 (very critical).
52% of respondents have written custom estimators — meaning they've gone beyond using scikit-learn out of the box and built on top of its API. Combined with the near-universal endorsement of open-source ML as critical infrastructure, this reflects a community with deep, deliberate investment in the ecosystem. For teams advising organisations that use scikit-learn, custom estimator codebases often represent years of accumulated domain logic — worth understanding before any modernisation or migration work begins.
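"Writing a custom estimator" typically means implementing scikit-learn's fit/transform (or fit/predict) contract so the component plugs into pipelines and cross-validation like any built-in. A minimal toy illustration, not taken from the survey, might look like this:

```python
# Toy illustration of a custom estimator: a transformer following
# scikit-learn's fit/transform contract. The clipping logic is an
# invented example, not from the survey.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ClipOutliers(BaseEstimator, TransformerMixin):
    """Clip each feature to the [q_low, q_high] quantiles seen at fit time."""

    def __init__(self, q_low=0.05, q_high=0.95):
        self.q_low = q_low
        self.q_high = q_high

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Learned state uses trailing underscores, per sklearn convention.
        self.low_ = np.quantile(X, self.q_low, axis=0)
        self.high_ = np.quantile(X, self.q_high, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X), self.low_, self.high_)

X = np.array([[0.0], [1.0], [2.0], [100.0]])
clipped = ClipOutliers().fit_transform(X)
print(clipped.ravel())
```

Real custom estimators in client codebases are usually far more domain-specific than this, which is exactly why the report flags them as accumulated, often undocumented, logic.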
Derived from survey data on deployment scale, tooling adoption, and custom code usage. Use this to orient a client assessment conversation.
Survey-derived profiles mapped to typical consulting engagement types. Use as a starting point for scoping conversations.
| Signal from survey | Benchmark | What it often indicates | Typical engagement | Questions to ask |
|---|---|---|---|---|
| 5+ models in production | 40% of teams | Model portfolio without formal governance; monitoring gaps likely | MLOps maturity review; model governance framework | Who owns each model? How is degradation detected? |
| Custom estimator code in use | 52% of users | Accumulated domain logic embedded in scikit-learn API; undocumented debt | Technical debt audit; migration risk assessment; documentation sprint | Who wrote it? Is it tested? What breaks if you change it? |
| No feature importance or uncertainty tooling | 67–68% need it | Explainability gap; potential compliance risk in regulated sectors | Explainability assessment; governance readiness review | Do stakeholders ever ask why the model predicted X? |
| Training takes >1 hour or >1 day | 40% of teams | Compute bottleneck limiting iteration speed and experimentation velocity | Infrastructure review; hardware/cloud optimisation; profiling engagement | How many experiments can the team run per week? |
| Custom scheduling code (not Airflow/Kubeflow) | 25% of teams | Significant internal engineering investment; brittle pipelines; bus-factor risk | Pipeline standardisation; MLOps platform consolidation | What does the custom code do that standard tools don't? |
| Forecasting or anomaly detection as top-3 task | Ranked #3 & #4 | Likely using bespoke or poorly-maintained solutions; maintenance burden | Tooling assessment; bespoke-to-library migration; capability build | What library or code handles these tasks today? |
Across all languages and geographies, performance ranked as the top unmet need. When working with scikit-learn teams, bottlenecks in compute and iteration speed are likely to be the most felt pain — start there.
Feature importances and uncertainty estimates are needed by 68% and 67% of respondents, respectively. Within this community, these are standard requirements — not advanced features. Teams without them should know they're behind their peers.
Polars has reached 33.4% adoption among scikit-learn users — driven by the same performance concerns that top the priorities list. pandas remains dominant at 95.4%, but the two increasingly coexist in the same teams.
52% of respondents have extended scikit-learn with custom estimators. This code often encodes years of domain-specific logic. Before recommending any platform change, it's worth understanding how much of it exists and how well it's documented.
Ranked #3 and #4 in task priority, these are areas where scikit-learn's native support is limited. Teams with these use cases often rely on bespoke solutions — a common source of maintenance burden.
MLflow (62%) and Airflow (47%) are the clear standards for tracking and scheduling within this community. Proprietary alternatives face a well-informed, already-equipped audience — adoption arguments need to be specific and practical.
If you're advising a team that uses scikit-learn in production, Probabl can run a structured benchmarking session — mapping your client against the survey data across deployment scale, tooling maturity, explainability coverage, and custom code risk. It takes 90 minutes and produces a concrete gap analysis.
We are the team behind scikit-learn. We built the maturity model in this report from our own community data — and we can apply it directly to your client's stack.
Open-source Python library. Automated evaluation reports, cross-validation insights, and model comparison — built for scikit-learn, in one line of code.
Try Skore free →
90 minutes. We map your client against this survey data and deliver a gap analysis. No commitment required — just a structured conversation grounded in real practitioner data.
Request a session →