from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
import optuna
from skore import Project, evaluate

# Tune across 100 trials, 5-fold CV, minutes of compute
def objective(trial):
    params = {
        "learning_rate": trial.suggest_float(...),
        "max_depth": trial.suggest_int(...),
        "max_leaf_nodes": trial.suggest_int(...),
        # ... five more
    }
    return cross_val_score(
        HistGradientBoostingClassifier(**params),
        X, y, scoring="roc_auc", cv=5,
    ).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

model = HistGradientBoostingClassifier(**study.best_params)
report = evaluate(model, X, y, splitter=5)

project = Project("baseline")
project.put("hgbt_tuned", report)
A tabular foundation model,
pretrained. No tuning.
The evidence to ship it.
TabICL is the open tabular foundation model from soda-inria, pretrained on millions of synthetic tables and competitive with heavily tuned XGBoost, CatBoost, and LightGBM out of the box. Skore turns its predictions into per-fold reports, calibration plots, drift monitors, and audit-ready model cards. One open stack, end to end.
Tabular foundation models are not replacing classical ML. They build on the scikit-learn API.
TabICL is the natural endpoint of a twenty-year program. Scikit-learn democratized tabular ML with fit and predict. skrub absorbs the mess of real-world data: strings, dates, high-cardinality categoricals. HistGradientBoosting brought XGBoost-grade performance into scikit-learn itself. TabICL extends the same API surface with a pretrained transformer: millions of synthetic tables become priors, predictions arrive in a single forward pass, and no per-task tuning is required.
TabICL gives you a pretrained model. Skore gives you the evidence to ship it.
A pretrained foundation model shifts the burden. The question is no longer “did my hyperparameter search converge?”; it is “does this pretrained behaviour hold up on my data, for my regulator, at my edge cases?” Skore is built for that question: per-fold reports, calibration diagnostics, methodological warnings, comparison views, and audit-ready artifacts, all from one project.put().
The same problem. Two pipelines. Two visions of tabular ML.
On the left, the careful 2024 baseline: tune your gradient boosting, persist what worked. On the right, what 2026 looks like: call a foundation model, persist the evidence.
from tabicl import TabICLClassifier
from skore import Project, evaluate

# No tuning. No grid. Pretrained priors do the work.
model = TabICLClassifier()  # pretrained TFM
report = evaluate(model, X, y, splitter=5)

project = Project("tabicl")
project.put("tabicl_default", report)
One narrative arc. Five places where Skore proves the model.
Each demo is a runnable notebook benchmarking TabICL against scikit-learn’s HistGradientBoostingClassifier, with Skore as the comparison layer. Together they form one sentence:
out-of-the-box → handles dirt → calibrated → distributional → small-data robust
The five-minute model
“Hyperparameter tuning is a tax. Foundation models pay it once, for everyone.”
HGBT with defaults vs. HGBT + 100 Optuna trials vs. TabICL with defaults. Skore’s ComparisonReport renders accuracy against wall-clock time: TabICL lands at the same accuracy as the tuned baseline, in a fraction of the total time.
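A sketch of the benchmark loop behind that chart, reusing the evaluate/Project conventions from the snippets above; the study object comes from the Optuna search on the left, and whether ComparisonReport accepts a dict of named reports is an assumption worth checking against the Skore docs.

import time
from sklearn.ensemble import HistGradientBoostingClassifier
from tabicl import TabICLClassifier
from skore import ComparisonReport, evaluate

candidates = {
    "hgbt_default": HistGradientBoostingClassifier(),
    "hgbt_tuned": HistGradientBoostingClassifier(**study.best_params),  # Optuna search above
    "tabicl_default": TabICLClassifier(),
}

reports, wall_clock = {}, {}
for name, model in candidates.items():
    start = time.perf_counter()
    reports[name] = evaluate(model, X, y, splitter=5)   # per-fold report, as in the snippets above
    wall_clock[name] = time.perf_counter() - start      # the time axis of the comparison

comparison = ComparisonReport(reports)  # accuracy side by side across the three candidates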
Dirty data, clean prediction
“Real data is messy. Tools should hide the mess. skrub absorbs the dirt; TabICL ingests what’s left.”
Mixed strings, dates, high-cardinality categoricals. Three pipelines: HGBT with manual preprocessing, TableVectorizer + HGBT, TableVectorizer + TabICL. Skore breaks down the per-column contribution: preprocessing vs. model.
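The second and third pipelines are each a one-liner; a minimal sketch, assuming X_train and y_train hold the raw, messy table (TableVectorizer is skrub’s general-purpose encoder):

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from skrub import TableVectorizer
from tabicl import TabICLClassifier

# TableVectorizer turns strings, dates, and high-cardinality categoricals
# into numeric features any downstream estimator can consume.
vectorized_hgbt = make_pipeline(TableVectorizer(), HistGradientBoostingClassifier())
vectorized_tabicl = make_pipeline(TableVectorizer(), TabICLClassifier())

vectorized_hgbt.fit(X_train, y_train)
vectorized_tabicl.fit(X_train, y_train)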
Calibrated probabilities for risk
“In high-stakes decisions, calibration is the product.”
Probabilistic classification on imbalanced data. Skore’s calibration view: reliability diagrams, ECE/MCE, and 2D probability surfaces side by side. Position sizing depends on P(y|x), not on the argmax.
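What that view reports can be sketched by hand with scikit-learn; model, X_test, and y_test are assumed to come from a held-out split, and Skore computes the per-fold version for you.

import numpy as np
from sklearn.calibration import CalibrationDisplay

proba = model.predict_proba(X_test)[:, 1]      # P(y=1 | x), the quantity position sizing uses
CalibrationDisplay.from_predictions(y_test, proba, n_bins=10)   # reliability diagram

# Expected calibration error: occupancy-weighted gap between predicted and observed frequency.
edges = np.linspace(0.0, 1.0, 11)
bin_ids = np.digitize(proba, edges[1:-1])      # 10 equal-width probability bins
y_arr = np.asarray(y_test)
ece = 0.0
for b in range(10):
    in_bin = bin_ids == b
    if in_bin.any():
        gap = abs(y_arr[in_bin].mean() - proba[in_bin].mean())
        ece += in_bin.mean() * gap             # weight each bin by its occupancy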
Quantile regression, single fit
“Distributions, not point predictions, are the language of risk.”
Heteroscedastic regression target with regime-dependent variance. HGBT trains N models for N quantiles. TabICL emits the full predictive distribution from a single fit. Skore renders coverage and pinball loss.
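To make the “N models for N quantiles” cost concrete, here is a sketch of the baseline side with illustrative quantile levels; the quantile loss of HistGradientBoostingRegressor and mean_pinball_loss are standard scikit-learn, and X_train/X_test/y_train/y_test are assumed splits.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
hgbt_quantile_models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X_train, y_train)
    for q in quantiles
}  # N separate fits for N quantiles

pinball = {
    q: mean_pinball_loss(y_test, m.predict(X_test), alpha=q)
    for q, m in hgbt_quantile_models.items()
}

# Coverage of the central 90% interval: fraction of y_test inside [q05, q95].
lo, hi = hgbt_quantile_models[0.05].predict(X_test), hgbt_quantile_models[0.95].predict(X_test)
y_arr = np.asarray(y_test)
coverage = ((y_arr >= lo) & (y_arr <= hi)).mean()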
Small data, strong prior
“When data is scarce, the prior is everything. Foundation models bring a free one.”
Sweep training sizes from 50 to 5,000. HGBT collapses below ~300; TabICL’s pre-training acts as a free strong prior. Skore plots learning curves with confidence bands. Opens up clinical trials, manufacturing QA, rare events.
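A sketch of that sweep; the sizes follow the demo, the ROC AUC scoring is an assumption, and X, y are assumed to be arrays with at least 5,000 rows (use .iloc for DataFrames). Skore’s learning-curve view adds the confidence bands.

import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from tabicl import TabICLClassifier

rng = np.random.default_rng(0)
scores = {"hgbt": {}, "tabicl": {}}
for n in [50, 100, 300, 1000, 5000]:
    idx = rng.choice(len(X), size=n, replace=False)     # subsample the training pool
    X_n, y_n = X[idx], y[idx]
    scores["hgbt"][n] = cross_val_score(
        HistGradientBoostingClassifier(), X_n, y_n, scoring="roc_auc", cv=5
    ).mean()
    scores["tabicl"][n] = cross_val_score(
        TabICLClassifier(), X_n, y_n, scoring="roc_auc", cv=5
    ).mean()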
Three pip installs. One scikit-learn API.
Install the open stack from PyPI, drop it into any scikit-learn pipeline, and start with the worked notebooks.
# the foundation model
$ pip install -U tabicl

# the evaluation and reporting layer
$ pip install -U skore

# skrub for real-world data ingestion
$ pip install -U skrub
Deploying TabICL in a regulated environment?
:probabl. offers Forward Deployed Engineering engagements for teams structuring tabular ML around foundation models. The design partner program is now in its second cohort, with priority access to TabICLv2 features and Skore Enterprise.
Talk to Forward Deployed Engineering