Professional certification

What will be evaluated?

The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals have both the conceptual understanding and the practical skills of a mid-level data scientist. Before applying, you should be proficient with scikit-learn’s tools and functions and have skills in the following areas:

Advanced machine learning knowledge

Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.

Programming expertise

Strong coding skills in Python, with experience in optimizing code for performance and scalability.

Data handling and engineering

Ability to handle large datasets, including data extraction, transformation, and loading processes.

Feature engineering

Experience in creating and selecting features to improve model performance.

Model tuning and optimization

Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.

Critical thinking

Ability to approach complex problems systematically and evaluate multiple solutions. This includes being able to diagnose possible issues in a model pipeline.

Business expertise

Understanding of how machine learning projects align with business goals and the ability to translate technical results into actionable business insights.

What do I need to know?

Machine learning concepts

- Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
- Types of model families (tree-based, linear, ensemble, neighbors)
- Regularization (L1, L2, Elastic Net)
- Hard and soft predictions in classification (predict vs predict_proba)
- Model overfitting and underfitting impact on soft predictions
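
As an illustration of the hard versus soft prediction distinction listed above, here is a minimal sketch. The synthetic dataset and the choice of LogisticRegression are assumptions made for the example, not part of the certification material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard predictions: one class label per sample, shape (n_samples,).
hard = clf.predict(X_test)

# Soft predictions: one probability per class, shape (n_samples, n_classes).
soft = clf.predict_proba(X_test)

# For this estimator, the hard prediction is the most probable class.
hard_from_soft = clf.classes_[np.argmax(soft, axis=1)]
```

Soft predictions carry the model's confidence, which is what tends to degrade first when a model overfits or underfits, so they are worth inspecting alongside the hard labels.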

Model building and evaluation

- Linear models as baselines
- Handling correlation with regularization and feature selection
- Understanding of bagging and boosting ensemble methods
- Correct choice of metrics (presence of outliers, imbalanced settings, etc.)
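
To show why metric choice matters in an imbalanced setting, the sketch below compares a majority-class predictor with a linear baseline. The 95/5 class split, the synthetic data, and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 95% negatives and 5% positives (illustrative only).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# A predictor that always outputs the majority class.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# A linear model used as a simple baseline.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

dummy_pred = dummy.predict(X_test)
baseline_pred = baseline.predict(X_test)

# Accuracy looks high for the dummy model because negatives dominate.
dummy_acc = accuracy_score(y_test, dummy_pred)
# Balanced accuracy averages per-class recall, exposing the dummy model.
dummy_bal = balanced_accuracy_score(y_test, dummy_pred)
baseline_bal = balanced_accuracy_score(y_test, baseline_pred)
```

Here plain accuracy rewards a model that never predicts the minority class, while balanced accuracy scores it at chance level.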

Interpretation of results & communication

- Visualizing model results using intermediate plotting techniques (matplotlib, seaborn)
- Interpreting and communicating model outputs and performance metrics to non-technical stakeholders

Data preprocessing

- Loading parquet datasets
- Visualizing data with intermediate plotting techniques (heatmaps, PCA)
- Identifying strongly correlated features
- Handling missing values in the target by using label propagation
- Feature engineering using PolynomialFeatures, SplineTransformer, etc.
- Combining features with FeatureUnion

Model selection and validation

- Broader understanding of cross-validation techniques (group structure, non-i.i.d. data, etc.)
- Performing hyperparameter tuning using GridSearchCV, RandomizedSearchCV
- Stability of optimal hyperparameters across splits with nested cross-validation
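
One way to examine hyperparameter stability with nested cross-validation is sketched below: an inner GridSearchCV is wrapped in an outer cross_validate call, and the selected value is collected per outer fold. The dataset, the grid over C, and the fold counts are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Synthetic classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop selects hyperparameters; outer loop estimates generalization.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid=param_grid, cv=inner_cv
)

# return_estimator=True keeps the fitted search object from each outer fold.
results = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)

# The value of C chosen in each outer fold; large variation suggests instability.
best_C_per_fold = [est.best_params_["C"] for est in results["estimator"]]
outer_scores = results["test_score"]
```

RandomizedSearchCV can be substituted for GridSearchCV in the same structure by passing param_distributions and n_iter instead of param_grid.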
