Scikit-learn Expert Practitioner Certification

What will be evaluated?

The Scikit-learn Expert Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and the practical skills of a senior data scientist. When applying, you should be proficient with a broad range of scikit-learn’s tools and functions and possess skills in the following areas:

Expert-level machine learning

In-depth knowledge of machine learning algorithms, including emerging trends and best practices.

Algorithm development

Ability to develop and implement custom machine learning algorithms tailored to specific problems.

Model deployment

Expertise in deploying machine learning models into production environments, including knowledge of MLOps.

Research & innovation

Ability to conduct independent research and contribute to the development of new methods or tools.

Strategic planning

Involvement in long-term planning and strategy development for data science initiatives within the organization.

Strategic vision

Strong understanding of the broader industry and market trends to shape the strategic direction of machine learning efforts.

Model diagnostics

Identify, troubleshoot, and resolve potential problems within the machine learning pipeline of other team members.

What do I need to know?

Machine learning concepts

- Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
- Types of model families (tree-based, linear, ensemble, neighbors)
- Loss functions and surrogate loss
- Splitting criteria in Decision Trees
- Filter, wrapper and embedded methods for feature selection
- Calibration (expected calibration error) vs ranking power (ROC AUC / GINI)
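
The difference between calibration and ranking power in the last item can be illustrated with a small sketch. The synthetic data, the cubic distortion, and the helper `expected_calibration_error` are assumptions made for illustration (scikit-learn does not ship an ECE function); the point is only that a strictly increasing transform of the scores leaves ROC AUC and Gini unchanged while degrading calibration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Hypothetical helper: equal-width binned ECE for binary labels."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in 0..n_bins-1
    bin_idx = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            # |observed frequency - mean predicted probability|, weighted by bin size
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap
    return ece


rng = np.random.default_rng(0)
# Synthetic scores that are calibrated by construction: y ~ Bernoulli(p)
p = rng.uniform(0.0, 1.0, size=5000)
y = rng.binomial(1, p)

# A strictly increasing distortion keeps the ordering but breaks calibration
p_distorted = p ** 3

auc = roc_auc_score(y, p)
auc_distorted = roc_auc_score(y, p_distorted)
gini = 2 * auc - 1  # Gini coefficient expressed through ROC AUC

ece_calibrated = expected_calibration_error(y, p)
ece_distorted = expected_calibration_error(y, p_distorted)

print(f"AUC: {auc:.3f} vs {auc_distorted:.3f} (Gini {gini:.3f})")
print(f"ECE: {ece_calibrated:.3f} vs {ece_distorted:.3f}")
```

Both score vectors rank the samples identically, so a ranking metric cannot tell them apart; only a calibration measure exposes the distortion.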

Model building and evaluation

- Create your own estimator
◦ NearestCentroid
◦ Recommender systems
◦ Transformers
- Metadata routing
- Calibration plots with CalibrationDisplay and post-calibration with CalibratedClassifierCV
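
As an illustration of the "create your own estimator" item, here is a minimal sketch of a nearest-centroid classifier that follows the usual scikit-learn conventions (hyperparameters stored unchanged in `__init__`, learned attributes with a trailing underscore, `fit` returning `self`). The class name `SimpleNearestCentroid` and the toy data are hypothetical; this is not scikit-learn's own `NearestCentroid` implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import pairwise_distances
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class SimpleNearestCentroid(ClassifierMixin, BaseEstimator):
    """Toy classifier: predict the class whose centroid is closest."""

    def __init__(self, metric="euclidean"):
        # Store hyperparameters as-is; no validation or logic in __init__
        self.metric = metric

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        # One centroid (feature-wise mean) per class
        self.centroids_ = np.vstack(
            [X[y_idx == k].mean(axis=0) for k in range(len(self.classes_))]
        )
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        distances = pairwise_distances(X, self.centroids_, metric=self.metric)
        return self.classes_[np.argmin(distances, axis=1)]


# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])

clf = SimpleNearestCentroid().fit(X, y)
pred = clf.predict(np.array([[0.1, 0.0], [4.8, 5.2]]))
print(pred)
```

Because the class inherits from `BaseEstimator`, it gets `get_params` and `set_params` for free, which is what allows it to be cloned and used inside pipelines and grid searches.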

Interpretation of results & communication

- Explainability and interpretability
◦ partial dependence plots: is a feature's impact on the target non-linear?
◦ permutation importance
- Debugging the methodology
◦ given a plot, give a diagnostic for the model
◦ identify pitfalls in the modeling process (e.g. feature selection techniques inside or outside the pipeline)
◦ code comprehension and good practices
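
The pitfall about feature selection inside or outside the pipeline can be sketched as follows. The data are pure noise generated for illustration (an assumption, not part of the syllabus), so an honest evaluation should land near chance level; selecting features on the full dataset before cross-validation leaks label information and inflates the score.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure noise: the features carry no information about the labels
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Pitfall: the selector sees the labels of every sample, including
# those later used as validation folds
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# Good practice: the selector is refit on the training part of each fold
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest_score = cross_val_score(pipe, X, y, cv=5).mean()

print(f"selection outside the pipeline: {leaky_score:.2f}")
print(f"selection inside the pipeline:  {honest_score:.2f}")
```

The same reasoning applies to any step that learns from the data (scaling, imputation, target encoding): keeping it inside the pipeline ensures it only ever sees training folds.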

Data preprocessing

- Loading parquet datasets
- Extract information from plots, e.g. decide on which family of models may be the best fit
- Data wrangling
◦ Combining data from multiple sources
◦ Adding new features or derived attributes (e.g. lagged features for time-based data)
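
The data wrangling items can be sketched with pandas. The two small frames `sales` and `promos` and their column names are hypothetical stand-ins for separate data sources; in practice they might come from `pd.read_parquet`, which needs a parquet engine such as pyarrow installed.

```python
import pandas as pd

# Hypothetical toy data standing in for two separate sources
sales = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=5, freq="D"),
        "demand": [10, 12, 9, 15, 14],
    }
)
promos = pd.DataFrame(
    {"date": pd.date_range("2024-01-03", periods=2, freq="D"), "promo": [1, 1]}
)

# Combine the sources; days without a promotion get promo = 0
df = sales.merge(promos, on="date", how="left")
df["promo"] = df["promo"].fillna(0).astype(int)
df = df.sort_values("date").set_index("date")

# Lagged features only use past values, so they are available at prediction time
df["demand_lag_1"] = df["demand"].shift(1)
df["demand_lag_2"] = df["demand"].shift(2)
# Shift before rolling so the window excludes the current day's target
df["demand_roll_mean_2"] = df["demand"].shift(1).rolling(2).mean()

# The first rows have no history and are dropped
features = df.dropna()
print(features)
```

Shifting before computing the rolling mean is the design choice that matters here: without it, the window would include the value being predicted.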

Model selection and validation

Performing hyperparameter tuning with proper scoring rules (calibration)
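
A minimal sketch of tuning with a proper scoring rule, assuming a synthetic dataset and a logistic regression pipeline chosen only for illustration. The built-in scorer `"neg_log_loss"` is used here; `"neg_brier_score"` is another proper scoring rule available as a scorer string.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption for the example)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# make_pipeline names each step after its lowercased class name
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Log loss rewards well-calibrated probabilities, unlike accuracy,
# which only looks at the thresholded predictions
search = GridSearchCV(pipe, param_grid, scoring="neg_log_loss", cv=5)
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
print(best_C, search.best_score_)
```

Scores are negated so that higher is better, which is why `best_score_` is at most zero with this scorer.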

Model deployment

Understanding how to save and load trained models using joblib, pickle or skops.
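
Saving and loading can be sketched with joblib as below; the model, the synthetic data, and the file name `model.joblib` are assumptions for the example. Note that joblib and pickle files can execute arbitrary code when loaded, so they should only be loaded from trusted sources; skops is aimed at that security concern.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small model on synthetic data (assumption for the example)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"  # hypothetical file name
    joblib.dump(model, path)       # serialize the fitted estimator
    restored = joblib.load(path)   # only load files you trust

# The restored model should reproduce the original predictions
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

A persisted model is generally expected to be loaded with the same scikit-learn version that produced it, so recording the library versions alongside the file is a sensible habit.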

Get training with Skolar

Prepare for the certifications with three online courses on Skolar, each matching a certification level and reflecting a data scientist’s typical career path.