probabl-ai/skills — methodology for agents

For AI coding agents

Methodology for agents, by the scikit-learn maintainers.

probabl-ai/skills is a collection of thirteen skills that bring scikit-learn, skrub, and Skore methodology into any agentic coding tool. The agent doesn't just write a pipeline: it builds, evaluates, and iterates with the rigor your team would otherwise enforce by hand.

Browse the skills on GitHub → · See the skills

By the scikit-learn core maintainers · Open source · BSD-3-Clause · Any agent

Install the skill pack

npx skills add github.com/probabl-ai/skills

The skill pack

Thirteen skills, organized the way the best data science teams already work, improved daily.

01 / ML pipeline lifecycle

From data source to an evaluated, tested, audited learner.

The five skills that bound a single experiment end to end: declare it, evaluate it, prove the score is real, then read the report back — no leaky shortcuts.

build-ml-pipeline→

Declare the pipeline from data source to predictor as a skrub DataOps graph. Stops at the declared object — no fit, split, tuning, or persistence.

evaluate-ml-pipeline→

Evaluate a single sklearn-compatible learner: pick the right entry point (skore.evaluate first), the right cross-validator, and consume report metadata.
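
Picking the right cross-validator usually comes down to how the rows relate to each other. The sketch below is not the skill's actual logic, which this page does not spell out. It is a hypothetical helper, `suggest_cross_validator`, that maps three data traits to the names of standard scikit-learn splitter classes; the rule order is an assumption.

```python
def suggest_cross_validator(
    *, has_groups: bool, time_ordered: bool, is_classification: bool
) -> str:
    """Return the name of a scikit-learn splitter suited to the data layout.

    Hypothetical helper for illustration only; the priority of the rules
    (time first, then groups, then stratification) is an assumption.
    """
    if time_ordered:
        # Folds must respect chronology so the model never trains on the future.
        return "TimeSeriesSplit"
    if has_groups and is_classification:
        # Keep each group in a single fold and preserve class proportions.
        return "StratifiedGroupKFold"
    if has_groups:
        # Rows from the same group (customer, patient, ...) stay together.
        return "GroupKFold"
    if is_classification:
        # Preserve class proportions in every fold.
        return "StratifiedKFold"
    return "KFold"


choice = suggest_cross_validator(
    has_groups=True, time_ordered=False, is_classification=False
)
```

The helper returns a class name rather than an instance so the sketch stays dependency-free; in a real session the agent would instantiate the splitter from `sklearn.model_selection`.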

test-ml-pipeline→

Router that owns the tests/ folder of an ML workspace and the experiment ↔ test pairing rule. Dispatches to a per-category subskill.

smoke-test-ml-pipeline→

Diagnostic-by-construction pytest that catches the "load → featurize → split" anti-pattern by predicting on a disjoint, no-buffer slice of the real data source.
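
To make the "load → featurize → split" anti-pattern concrete, here is a minimal, dependency-free sketch. It assumes "disjoint, no-buffer" means the holdout slice starts right where the fit slice ends; the helper names (`disjoint_slices`, `center`) and the data are made up for illustration and are not taken from the skill itself.

```python
from statistics import fmean


def disjoint_slices(n_rows: int, n_fit: int) -> tuple[range, range]:
    """Split row indices into a fit slice and an adjacent, disjoint holdout slice."""
    if not 0 < n_fit < n_rows:
        raise ValueError("n_fit must leave at least one holdout row")
    return range(0, n_fit), range(n_fit, n_rows)


def center(values: list[float], fitted_mean: float) -> list[float]:
    """Toy featurizer: subtract a mean that was learned elsewhere."""
    return [value - fitted_mean for value in values]


rows = [1.0, 2.0, 3.0, 4.0, 100.0, 110.0]  # made-up data source
fit_idx, holdout_idx = disjoint_slices(len(rows), n_fit=4)

# Anti-pattern: the statistic is learned on every row, holdout included.
leaky_mean = fmean(rows)
# Safe: the statistic is learned on the fit slice only.
clean_mean = fmean(rows[i] for i in fit_idx)

holdout = [rows[i] for i in holdout_idx]
leaky_features = center(holdout, leaky_mean)
clean_features = center(holdout, clean_mean)
```

The two feature lists differ because the leaky mean has already "seen" the holdout rows. A smoke test that predicts on rows the pipeline never touched during fitting is one way to surface that difference.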

audit-ml-pipeline→

Owns the audit/ folder: one # %% file per experiment that loads its skore report read-only and streams a markdown digest. Read-only — never calls evaluate or put.

02 / Iteration loop

Source the next experiment, two ways.

A driver skill that owns journal/JOURNAL.md, plus two sourcing strategies. Pick where the next idea comes from — the audit digest, or the user.

iterate-ml-experiment→

Drives the iteration loop on top of an ML workspace — owns journal/JOURNAL.md and per-experiment design notes, and dispatches to a sourcing strategy below.

iterate-from-skore→

Source the next experiment by reading the audit digest at scratch/audit/<stem>/audit.md — every issue / tip row drives a backlog item, following the row’s documentation link for the mitigation.
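
The row-to-backlog idea can be sketched as follows. The layout of audit.md is not documented on this page, so the three-column table (`kind`, `message`, `docs`), the `BacklogItem` class, and the sample digest are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    kind: str  # "issue" or "tip"
    message: str
    docs_url: str  # link to follow for the mitigation


def backlog_from_digest(markdown: str) -> list[BacklogItem]:
    """Turn every issue / tip table row of a digest into a backlog item."""
    items = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if len(cells) != 3 or cells[0].lower() not in {"issue", "tip"}:
            continue  # header, separator, or unrelated row
        items.append(BacklogItem(cells[0].lower(), cells[1], cells[2]))
    return items


# Hypothetical digest content; a real audit.md may look different.
sample_digest = """\
# Audit: baseline

| kind  | message                       | docs                          |
|-------|-------------------------------|-------------------------------|
| issue | train/test score gap is large | https://example.com/overfit   |
| tip   | try a stratified splitter     | https://example.com/splitters |
"""

backlog = backlog_from_digest(sample_digest)
```

Each item keeps its documentation link, so the next experiment's design note can cite where the mitigation came from.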

iterate-from-user→

Source the next experiment from the user directly — free text, a scientific article URL, or a resource link (GitHub issue, spec, or reference repo).

03 / Workspace & tooling

Keep the repo clean and organized while the agent runs.

Where files live, how they're styled, which env manager is used, and the curated stack the maintainers actually reach for. The boring discipline that makes the rest possible.

organize-ml-workspace→

Decide where files live: reusable code, per-experiment scripts (jupytext-style # %%), reports. One file per experiment.
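
For readers who have not met the `# %%` convention: a plain .py file is divided into notebook-like cells by marker comments. The sketch below splits such a file into cells; `split_percent_cells` and the sample experiment are illustrative and not part of the skill.

```python
def split_percent_cells(source: str) -> list[str]:
    """Split a script written in the '# %%' percent format into its cells."""
    cells: list[str] = []
    current: list[str] = []
    for line in source.splitlines():
        # A marker line closes the previous cell and opens a new one.
        if line.startswith("# %%") and current:
            cells.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        cells.append("\n".join(current))
    return cells


# Hypothetical experiment file content.
experiment = """\
# %% [markdown]
# # Baseline: ridge regression

# %%
alpha = 1.0

# %%
print(f"alpha={alpha}")
"""

cells = split_percent_cells(experiment)
```

Because the file stays valid Python, it can be linted, diffed, and run as a script, while editors that understand the markers can still execute it cell by cell.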

python-code-style→

Place the project's ruff.toml template and run ruff (lint + format) on touched files. numpydoc for docstrings.

python-env-manager→

Detect the project's env manager (pixi / uv / poetry / hatch / conda / pip+venv) and issue the right install command. Defaults to pixi when bootstrapping.
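
Detection of this kind can be approximated by looking for each tool's usual marker file. The sketch below is an assumption about how such a check could work, not the skill's implementation: the marker list, its order, and the function name are hypothetical, while the pixi default follows the description above.

```python
import tempfile
from pathlib import Path

# Marker files checked in order; both the choice and the order are assumptions.
MARKERS = [
    ("pixi.toml", "pixi"),
    ("uv.lock", "uv"),
    ("poetry.lock", "poetry"),
    ("hatch.toml", "hatch"),
    ("environment.yml", "conda"),
    ("requirements.txt", "pip+venv"),
]

# Commands that add a dependency. hatch is left out of this sketch because
# its dependencies are normally declared by editing pyproject.toml.
ADD_COMMAND = {
    "pixi": "pixi add",
    "uv": "uv add",
    "poetry": "poetry add",
    "conda": "conda install",
    "pip+venv": "python -m pip install",
}


def detect_env_manager(project_root: Path, default: str = "pixi") -> str:
    """Return the first env manager whose marker file exists, else the default."""
    for filename, manager in MARKERS:
        if (project_root / filename).exists():
            return manager
    return default


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fresh = detect_env_manager(root)  # empty project: fall back to the default
    (root / "uv.lock").touch()
    detected = detect_env_manager(root)
    install = f"{ADD_COMMAND[detected]} scikit-learn"
```

Checking the filesystem instead of asking the user keeps the agent from mixing two managers in one repository.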

data-science-python-stack→

Opinionated one-library-per-job Python stack, organized into mandatory / user-choice / optional / transitive tiers.

04 / API references

Any library, indexed on demand.

One skill that discovers the public API of any installed package — so the agent reads real signatures instead of hallucinating them.

python-api→

Discover the public API of any installed Python package — inspect.signature + pydoc for a symbol, dir / pkgutil for a module, versioned-docs search with cache for the narrative. Carries conceptual orientation for sklearn / skrub / skore.
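
The standard-library calls named above are enough to read real signatures instead of guessing them. The sketch below applies them to the stdlib `json` package so it runs anywhere; the helper names are illustrative, and the versioned-docs search and cache are out of scope here.

```python
import inspect
import json
import pkgutil
import pydoc


def describe_symbol(obj) -> dict:
    """Signature plus the first docstring line for a callable."""
    doc = pydoc.getdoc(obj)
    return {
        "signature": str(inspect.signature(obj)),
        "summary": doc.splitlines()[0] if doc else "",
    }


def public_names(module) -> list[str]:
    """Names a module exports: __all__ if defined, else non-underscore dir() entries."""
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in dir(module) if not name.startswith("_")]
    return sorted(names)


def submodules(package) -> list[str]:
    """Direct submodules of a package, discovered without importing them."""
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


dumps_info = describe_symbol(json.dumps)
exported = public_names(json)
children = submodules(json)
```

The same three helpers work on any installed package, for example by passing `sklearn.linear_model.Ridge` to `describe_symbol` when scikit-learn is available.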

See it in practice

Install the pack, then prompt the way you already do.

The skills are plain markdown files. The agent reads them as part of its session and reaches for the right one when the task fits. You don't have to invoke them by name.

01 / Install the pack
One command into your agent's skills directory: npx skills add github.com/probabl-ai/skills. BSD-3-Clause; fork it if you want.

02 / Prompt the workflow, not the tool
"Build a churn model on this CSV and tell me what's weakest." The agent routes through the right skills.

03 / Get a Skore report, not a notebook
Structured evaluation, fold-level diagnostics, per-slice metrics. The same report object whether you ran it or the agent did.

04 / Iterate, audit, ship
Use the iterate-from-* skills to source the next experiment. Sync to Skore Hub or MLflow when it's ready for production.

agent_session.py

# Skill: build-ml-pipeline
import skrub
from sklearn.linear_model import Ridge
model = skrub.tabular_pipeline(Ridge())

# Skill: evaluate-ml-pipeline
# df (feature table) and y (target) are assumed to be loaded earlier in the session
import skore
report = skore.evaluate(model, df, y, splitter=5)
report

# Skill: iterate-from-skore
# The agent inspects the report:
#   → calibration drifts on the over-65 slice
#   → next experiment: per-slice isotonic calibration

# Skill: python-api (skore orientation) → push when you're ready
project = skore.Project("churn", mode="hub")
project.put("baseline", report)
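
The session above flags a weak slice. As a rough illustration of what a per-slice metric is, the sketch below computes plain accuracy per slice with the standard library only. It deliberately swaps calibration for accuracy to stay short, it is not how Skore computes its report, and the labels, predictions, and age bands are made up.

```python
from collections import defaultdict


def accuracy_by_slice(y_true, y_pred, slices) -> dict:
    """Accuracy computed separately for each slice label."""
    if not (len(y_true) == len(y_pred) == len(slices)):
        raise ValueError("y_true, y_pred and slices must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, name in zip(y_true, y_pred, slices):
        totals[name] += 1
        hits[name] += int(truth == pred)
    return {name: hits[name] / totals[name] for name in totals}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
age_band = ["under-65", "under-65", "under-65", "over-65", "over-65", "over-65"]

scores = accuracy_by_slice(y_true, y_pred, age_band)
```

A single overall score would average the two bands together; splitting by slice is what makes the weaker one visible.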

Why we built this

Agentic AI is fast. Methodology is what makes it trustworthy.

AI assistants ship scikit-learn pipelines in seconds: monthly downloads doubled from 100M to 200M in nine months. The bottleneck isn't compute; it's the absence of shared standards. Skills are how we put the maintainers' methodology in the loop, by default, every time the agent writes code.

Pitfall warnings before training

Task-appropriate metrics

Fold-level CV diagnostics

Reports the team can review

Hands-on support

Want this wired into your team's workflow?

Probabl runs Forward Deployed Engineering engagements for teams putting agentic ML on rails. We'll audit your pipeline, integrate the skill pack alongside your existing stack, and pair with your team on the first reports that go to production.

Talk to the team → · Read the FDE brief
