The Scikit-learn Expert Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and the practical skills of a senior data scientist. When applying, you should be proficient in a broad range of scikit-learn’s tools and functions, and possess skills in the following areas:
In-depth knowledge of machine learning algorithms, including emerging trends and best practices.
Ability to develop and implement custom machine learning algorithms tailored to specific problems.
Expertise in deploying machine learning models into production environments, including knowledge of MLOps.
Ability to conduct independent research and contribute to the development of new methods or tools.
Involvement in long-term planning and strategy development for data science initiatives within the organization.
Strong understanding of the broader industry and market trends to shape the strategic direction of machine learning efforts.
Ability to identify, troubleshoot, and resolve potential problems in other team members' machine learning pipelines.
- Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
- Types of model families (tree-based, linear, ensemble, neighbors)
- Loss functions and surrogate loss
- Splitting criteria in Decision Trees
- Filter, wrapper and embedded methods for feature selection
- Calibration (expected calibration error) vs. ranking power (ROC AUC / Gini)
- Create your own estimator
◦ NearestCentroid
◦ Recommender systems
◦ Transformers
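As an illustration of what "create your own estimator" can involve, here is a minimal sketch of a nearest-centroid classifier built on scikit-learn's base classes. The class name `SimpleNearestCentroid` and the toy data are assumptions made for this example; it is not scikit-learn's own `NearestCentroid` and it skips most input validation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_is_fitted


class SimpleNearestCentroid(ClassifierMixin, BaseEstimator):
    """Toy nearest-centroid classifier (hypothetical, Euclidean distance only)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Learned attributes end with an underscore, per scikit-learn convention.
        self.classes_ = np.unique(y)
        self.centroids_ = np.vstack(
            [X[y == label].mean(axis=0) for label in self.classes_]
        )
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = np.asarray(X, dtype=float)
        # Distance from every sample to every centroid: shape (n_samples, n_classes).
        distances = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[distances.argmin(axis=1)]


# Toy data, invented for the example.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array(["a", "a", "b", "b"])

model = SimpleNearestCentroid().fit(X_train, y_train)
predictions = model.predict(np.array([[0.2, 0.4], [4.8, 5.5]]))
```

Inheriting from `BaseEstimator` provides `get_params` and `set_params`, and `ClassifierMixin` provides `score`, so an estimator like this can be used in pipelines and cross-validation.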
- Metadata routing
- Calibration plots with CalibrationDisplay and post-calibration with CalibratedClassifierCV
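The difference between calibration and ranking power can be sketched as follows. This is a minimal example on synthetic data, assuming a Gaussian naive Bayes model as the uncalibrated baseline; the dataset parameters are invented for illustration. It uses `calibration_curve` to get the numbers that `CalibrationDisplay` would plot.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features (an assumption for this sketch).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=2, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GaussianNB().fit(X_train, y_train)
# Post-calibration: the calibrator is fitted on held-out folds of the training set.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

scores = {}
for name, clf in [("raw", raw), ("calibrated", calibrated)]:
    proba = clf.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, proba)
    scores[name] = {
        "brier": brier_score_loss(y_test, proba),  # sensitive to calibration
        "roc_auc": auc,                            # ranking power only
        "gini": 2 * auc - 1,                       # Gini is a rescaled ROC AUC
    }

# Points of the reliability diagram for the calibrated model.
cal_proba = calibrated.predict_proba(X_test)[:, 1]
frac_positives, mean_predicted = calibration_curve(y_test, cal_proba, n_bins=10)
```

Comparing the two entries of `scores` shows whether post-calibration changed the Brier score, the ROC AUC, or both.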
- Explainability and interpretability
◦ partial dependence plots: is a feature's impact on the target non-linear?
◦ permutation importance
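The two inspection tools above can be tried on a small synthetic problem. In this sketch the target depends linearly on feature 0 and quadratically on feature 1, while features 2 and 3 are pure noise; the data-generating formula and model choice are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
# Assumed ground truth: linear in feature 0, quadratic in feature 1.
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Permutation importance: drop in test score when one column is shuffled.
importance = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)
top_two = set(np.argsort(importance.importances_mean)[-2:].tolist())

# Partial dependence of the prediction on feature 1, averaged over the data.
pd_result = partial_dependence(model, X_test, features=[1], kind="average")
curve = pd_result["average"][0]
```

A U-shaped `curve` for feature 1 indicates a non-linear effect on the target, which a single linear coefficient could not express.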
- Debugging the methodology
◦ given a plot, give a diagnostic for the model
◦ identify pitfalls in the modeling process (e.g. feature selection techniques inside or outside the pipeline)
◦ code comprehension and good practices
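The feature selection pitfall mentioned above can be reproduced with a short sketch. The data here is pure noise (labels are independent of the features, an assumption chosen so that any score well above chance must come from leakage), and the sizes are arbitrary.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # many noise features, few samples
y = rng.integers(0, 2, size=100)   # labels unrelated to X

# Pitfall: selecting features on the full dataset before cross-validation
# lets information from the test folds leak into the selection step.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=5
).mean()

# Safer: put the selector inside the pipeline so it is refitted on each
# training fold only.
pipeline = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()
```

Since the labels are random, the pipeline version is expected to score close to chance, while the leaky version typically looks far better than it should.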
- Loading parquet datasets
- Extract information from plots, e.g. decide on which family of models may be the best fit
- Data wrangling
◦ Combining data from multiple sources
◦ Adding new features or derived attributes (e.g. lagged features for time based data)
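Both data wrangling tasks can be sketched with pandas. The tables, column names (`units`, `promo`), and values below are invented for the example.

```python
import pandas as pd

# Two hypothetical sources: daily sales and a promotions calendar.
sales = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "units": [10, 12, 9, 15, 14, 18],
    }
)
promotions = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-04", "2024-01-06"]), "promo": [1, 1]}
)

# Combine sources: a left join keeps every sales row, days without promo get 0.
data = sales.merge(promotions, on="date", how="left").sort_values("date")
data["promo"] = data["promo"].fillna(0).astype(int)

# Lagged features: shift(1) ensures each row only sees past values.
data["units_lag_1"] = data["units"].shift(1)
data["units_rolling_mean_3"] = data["units"].shift(1).rolling(window=3).mean()

# The first rows have no complete history, so they are dropped.
features = data.dropna().reset_index(drop=True)
```

Shifting before computing the rolling mean avoids including the current day's value in its own feature, which would leak the target.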
- Performing hyperparameter tuning with proper scoring rules (calibration)
- Understanding how to save and load trained models using joblib, pickle or skops.
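The last two items can be combined in one short sketch: a grid search scored with log loss (a proper scoring rule), followed by a save-and-reload round trip with joblib. The dataset, the parameter grid, and the file name `model.joblib` are assumptions made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# "neg_log_loss" rewards well-calibrated probabilities, unlike accuracy,
# which only looks at the thresholded predictions.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X, y)

# Persist the refitted best model and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(search.best_estimator_, path)
    restored = joblib.load(path)

same_predictions = bool((restored.predict(X) == search.predict(X)).all())
```

Files written by joblib or pickle can execute arbitrary code when loaded, so they should only be loaded from trusted sources; skops is designed as a more restrictive alternative.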
Prepare for the certifications with three online courses on Skolar, each matching a certification level and reflecting a data scientist’s typical career path.