from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
import optuna
from skore import Project, evaluate

# Tune across 100 trials, 5-fold CV, minutes of compute
def objective(trial):
    params = {
        "learning_rate": trial.suggest_float(...),
        "max_depth": trial.suggest_int(...),
        "max_leaf_nodes": trial.suggest_int(...),
        # ... five more
    }
    return cross_val_score(
        HistGradientBoostingClassifier(**params),
        X, y, scoring="roc_auc", cv=5,
    ).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

model = HistGradientBoostingClassifier(**study.best_params)
report = evaluate(model, X, y, splitter=5)

project = Project("baseline")
project.put("hgbt_tuned", report)
A tabular foundation model,
pretrained. No tuning.
The evidence to ship it.
TabICL is the open tabular foundation model from soda-inria, pretrained on millions of synthetic tables and competitive with heavily tuned XGBoost, CatBoost, and LightGBM out of the box. Skore turns its predictions into per-fold reports, calibration plots, drift monitors, and audit-ready model cards. One open stack, end to end.
Tabular foundation models are not replacing classical ML. They build on the scikit-learn API.
TabICL is the natural endpoint of a twenty-year program. Scikit-learn democratized tabular ML with fit and predict. skrub absorbs the mess of real-world data: strings, dates, high-cardinality categoricals. HistGradientBoosting brought XGBoost-grade performance into scikit-learn itself. TabICL extends the same API surface with a pretrained transformer: millions of synthetic tables become priors, predictions arrive in a single forward pass, and no per-task tuning is required.
TabICL gives you a pretrained model. Skore gives you the evidence to ship it.
A pretrained foundation model shifts the burden. The question is no longer “did my hyperparameter search converge?”; it is “does this pretrained behaviour hold up on my data, for my regulator, at my edge cases?” Skore is built for that question: per-fold reports, calibration diagnostics, methodological warnings, comparison views, and audit-ready artifacts, all from one project.put().
The same problem. Two pipelines. Two visions of tabular ML.
On the left, the careful 2024 baseline: tune your gradient boosting, persist what worked. On the right, what 2026 looks like: call a foundation model, persist the evidence.
from tabicl import TabICLClassifier
from skore import Project, evaluate

# No tuning. No grid. Pretrained priors do the work.
model = TabICLClassifier()  # pretrained TFM
report = evaluate(model, X, y, splitter=5)

project = Project("tabicl")
project.put("tabicl_default", report)
One narrative arc. Five places where Skore proves the model.
Each demo is a runnable notebook benchmarking TabICL against scikit-learn’s HistGradientBoostingClassifier, with Skore as the comparison layer. Together they form one sentence:
out-of-the-box → handles dirt → calibrated → distributional → small-data robust
The five-minute model
“Hyperparameter tuning is a tax. Foundation models pay it once, for everyone.”
HGBT with defaults vs. HGBT + 100 Optuna trials vs. TabICL with defaults. Skore’s ComparisonReport renders accuracy against wall-clock time: TabICL lands at the same accuracy as the tuned baseline, in a fraction of the total time.
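A sketch of the benchmark loop behind that chart, reusing the evaluate/Project conventions from the snippets above; the study object comes from the Optuna search on the left, and whether ComparisonReport accepts a dict of named reports is an assumption worth checking against the Skore docs.

import time
from sklearn.ensemble import HistGradientBoostingClassifier
from tabicl import TabICLClassifier
from skore import ComparisonReport, evaluate

candidates = {
    "hgbt_default": HistGradientBoostingClassifier(),
    "hgbt_tuned": HistGradientBoostingClassifier(**study.best_params),  # Optuna search above
    "tabicl_default": TabICLClassifier(),
}

reports, wall_clock = {}, {}
for name, model in candidates.items():
    start = time.perf_counter()
    reports[name] = evaluate(model, X, y, splitter=5)   # per-fold report, as in the snippets above
    wall_clock[name] = time.perf_counter() - start      # the time axis of the comparison

comparison = ComparisonReport(reports)  # accuracy side by side across the three candidates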
Dirty data, clean prediction
“Real data is messy. Tools should hide the mess. skrub absorbs the dirt; TabICL ingests what’s left.”
Mixed strings, dates, high-cardinality categoricals. Three pipelines: HGBT with manual preprocessing, TableVectorizer + HGBT, TableVectorizer + TabICL. Skore breaks down the per-column contribution: preprocessing vs. model.
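The second and third pipelines are each a one-liner; a minimal sketch, assuming X_train and y_train hold the raw, messy table (TableVectorizer is skrub’s general-purpose encoder):

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from skrub import TableVectorizer
from tabicl import TabICLClassifier

# TableVectorizer turns strings, dates, and high-cardinality categoricals
# into numeric features any downstream estimator can consume.
vectorized_hgbt = make_pipeline(TableVectorizer(), HistGradientBoostingClassifier())
vectorized_tabicl = make_pipeline(TableVectorizer(), TabICLClassifier())

vectorized_hgbt.fit(X_train, y_train)
vectorized_tabicl.fit(X_train, y_train)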
Calibrated probabilities for risk
“In high-stakes decisions, calibration is the product.”
Probabilistic classification on imbalanced data. Skore’s calibration view: reliability diagrams, ECE/MCE, and 2D probability surfaces side by side. Position sizing depends on P(y|x), not on the argmax.
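What that view reports can be sketched by hand with scikit-learn; model, X_test, and y_test are assumed to come from a held-out split, and Skore computes the per-fold version for you.

import numpy as np
from sklearn.calibration import CalibrationDisplay

proba = model.predict_proba(X_test)[:, 1]      # P(y=1 | x), the quantity position sizing uses
CalibrationDisplay.from_predictions(y_test, proba, n_bins=10)   # reliability diagram

# Expected calibration error: occupancy-weighted gap between predicted and observed frequency.
edges = np.linspace(0.0, 1.0, 11)
bin_ids = np.digitize(proba, edges[1:-1])      # 10 equal-width probability bins
y_arr = np.asarray(y_test)
ece = 0.0
for b in range(10):
    in_bin = bin_ids == b
    if in_bin.any():
        gap = abs(y_arr[in_bin].mean() - proba[in_bin].mean())
        ece += in_bin.mean() * gap             # weight each bin by its occupancy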
Quantile regression, single fit
“Distributions, not point predictions, are the language of risk.”
Heteroscedastic regression target with regime-dependent variance. HGBT trains N models for N quantiles. TabICL emits the full predictive distribution from a single fit. Skore renders coverage and pinball loss.
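To make the “N models for N quantiles” cost concrete, here is a sketch of the baseline side with illustrative quantile levels; the quantile loss of HistGradientBoostingRegressor and mean_pinball_loss are standard scikit-learn, and X_train/X_test/y_train/y_test are assumed splits.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
hgbt_quantile_models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X_train, y_train)
    for q in quantiles
}  # N separate fits for N quantiles

pinball = {
    q: mean_pinball_loss(y_test, m.predict(X_test), alpha=q)
    for q, m in hgbt_quantile_models.items()
}

# Coverage of the central 90% interval: fraction of y_test inside [q05, q95].
lo, hi = hgbt_quantile_models[0.05].predict(X_test), hgbt_quantile_models[0.95].predict(X_test)
y_arr = np.asarray(y_test)
coverage = ((y_arr >= lo) & (y_arr <= hi)).mean()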
Small data, strong prior
“When data is scarce, the prior is everything. Foundation models bring a free one.”
Sweep training sizes from 50 to 5,000. HGBT collapses below ~300; TabICL’s pre-training acts as a free strong prior. Skore plots learning curves with confidence bands. Opens up clinical trials, manufacturing QA, rare events.
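A sketch of that sweep; the sizes follow the demo, the ROC AUC scoring is an assumption, and X, y are assumed to be arrays with at least 5,000 rows (use .iloc for DataFrames). Skore’s learning-curve view adds the confidence bands.

import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from tabicl import TabICLClassifier

rng = np.random.default_rng(0)
scores = {"hgbt": {}, "tabicl": {}}
for n in [50, 100, 300, 1000, 5000]:
    idx = rng.choice(len(X), size=n, replace=False)     # subsample the training pool
    X_n, y_n = X[idx], y[idx]
    scores["hgbt"][n] = cross_val_score(
        HistGradientBoostingClassifier(), X_n, y_n, scoring="roc_auc", cv=5
    ).mean()
    scores["tabicl"][n] = cross_val_score(
        TabICLClassifier(), X_n, y_n, scoring="roc_auc", cv=5
    ).mean()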
Three pip installs. One scikit-learn API.
Install the open stack from PyPI, drop it into any scikit-learn pipeline, and start with the worked notebooks.
# the foundation model
$ pip install -U tabicl

# the evaluation and reporting layer
$ pip install -U skore

# skrub for real-world data ingestion
$ pip install -U skrub
Deploying TabICL in a regulated environment?
:probabl. offers Forward Deployed Engineering engagements for teams structuring tabular ML around foundation models. The design partner program is now in its second cohort, with priority access to TabICLv2 features and Skore Enterprise.
Talk to Forward Deployed Engineering