Skore × TabICL

A tabular foundation model,
pretrained. No tuning.
The evidence to ship it.

TabICL is the open tabular foundation model from soda-inria, pretrained on millions of synthetic tables and competitive with heavily tuned XGBoost, CatBoost, and LightGBM out of the box. Skore turns its predictions into per-fold reports, calibration plots, drift monitors, and audit-ready model cards. One open stack, end to end.

By the scikit-learn founders · TabICL by soda-inria, ICML 2025 · Open source, MIT license · Local-first, Skore Hub optional
01 · The thesis

Tabular foundation models are not replacing classical ML. They build on the scikit-learn API.

TabICL is the natural endpoint of a twenty-year program. Scikit-learn democratized tabular ML with fit and predict. skrub absorbs the mess of real-world data (strings, dates, high-cardinality categoricals). HistGradientBoosting brought XGBoost-grade performance into the platform. TabICL extends the same API surface with a pretrained transformer: millions of synthetic tables become priors, predictions arrive in a single forward pass, no per-task tuning required.

TabICL gives you a pretrained model. Skore gives you the evidence to ship it.

A pretrained foundation model shifts the burden. The question is no longer “did my hyperparameter search converge?” but “does this pretrained behaviour hold on my data, for my regulator, at my edge cases?” Skore is built for that question: per-fold reports, calibration diagnostics, methodological warnings, comparison views, and audit-ready artifacts, all from one project.put().

02 · One screen, two paradigms

The same problem. Two pipelines. Two visions of tabular ML.

On the left, the careful 2024 baseline: tune your gradient boosting, persist what worked. On the right, what 2026 looks like: call a foundation model, persist the evidence.

hgbt_tuned.py 2024 baseline
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
import optuna
from skore import Project, evaluate

# X, y: your feature matrix and target, loaded upstream
# Tune across 100 trials, 5-fold CV, minutes of compute
def objective(trial):
    params = {
        "learning_rate":  trial.suggest_float(...),
        "max_depth":      trial.suggest_int(...),
        "max_leaf_nodes": trial.suggest_int(...),
        # ... five more
    }
    return cross_val_score(
        HistGradientBoostingClassifier(**params),
        X, y, scoring="roc_auc", cv=5
    ).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

model = HistGradientBoostingClassifier(**study.best_params)
report = evaluate(model, X, y, splitter=5)
project = Project("baseline")
project.put("hgbt_tuned", report)
Wall-clock: minutes of search, then a single fit. You ship: a tuned model and a report.
tabicl.py 2026 with TFM
from tabicl import TabICLClassifier
from skore import Project, evaluate

# X, y: the same data as the baseline
# No tuning. No grid. Pretrained priors do the work.
model = TabICLClassifier()  # pretrained TFM

report = evaluate(model, X, y, splitter=5)
project = Project("tabicl")
project.put("tabicl_default", report)
Wall-clock: seconds. You ship: the same report; only Skore knows the difference.
03 · The demos

One narrative arc. Five places where Skore proves the model.

Each demo is a runnable notebook benchmarking TabICL against scikit-learn’s HistGradientBoostingClassifier, with Skore as the comparison layer. Together they form one sentence:

out-of-the-box · handles dirt · calibrated · distributional · small-data robust

Demo 01 · out-of-the-box

The five-minute model

“Hyperparameter tuning is a tax. Foundation models pay it once, for everyone.”

HGBT default vs HGBT + 100 Optuna trials vs TabICL default. Skore’s ComparisonReport renders accuracy versus wall-clock time. The orange bar lands at the same accuracy as the tuned baseline, in a fraction of the total time.

Demo 02 · handles dirt

Dirty data, clean prediction

“Real data is messy. Tools should hide the mess. skrub absorbs the dirt; TabICL ingests what’s left.”

Mixed strings, dates, high-cardinality categoricals. Three pipelines: HGBT manual, TableVectorizer + HGBT, TableVectorizer + TabICL. Skore breaks down per-column contribution: preprocessing vs. model.

Demo 03 · calibrated

Calibrated probabilities for risk

“In high-stakes decisions, calibration is the product.”

Probabilistic classification on imbalanced data. Skore’s calibration view: reliability diagrams, ECE/MCE, and 2D probability surfaces side by side. Position sizing depends on P(y|x), not on the argmax.

Demo 04 · distributional

Quantile regression, single fit

“Distributions, not point predictions, are the language of risk.”

Heteroscedastic regression target with regime-dependent variance. HGBT trains N models for N quantiles. TabICL emits the full predictive distribution from a single fit. Skore renders coverage and pinball loss.

Demo 05 · small-data robust

Small data, strong prior

“When data is scarce, the prior is everything. Foundation models bring a free one.”

Sweep training sizes from 50 to 5,000. HGBT collapses below ~300; TabICL’s pre-training acts as a free strong prior. Skore plots learning curves with confidence bands. Opens up clinical trials, manufacturing QA, rare events.

04 · Get started

Three pip installs. One scikit-learn API.

Install the open stack from PyPI, drop it into any scikit-learn pipeline, and start with the worked notebooks.

# the foundation model
$ pip install -U tabicl

# the evaluation and reporting layer
$ pip install -U skore

# skrub for real-world data ingestion
$ pip install -U skrub

Deploying TabICL in a regulated environment?

:probabl. offers Forward Deployed Engineering engagements for teams structuring tabular ML around foundation models. The design partner program is now in its second cohort, with priority access to TabICLv2 features and Skore Enterprise.

Talk to Forward Deployed Engineering