The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals have both the conceptual understanding and the practical skills of a mid-level data scientist. Before applying, you should be proficient with scikit-learn’s tools and functions and have skills in the following areas:
Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.
Strong coding skills in Python, with experience in optimizing code for performance and scalability.
Ability to handle large datasets, including data extraction, transformation, and loading processes.
Experience in creating and selecting features to improve model performance.
Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.
Ability to approach complex problems systematically and evaluate multiple solutions. This includes being able to diagnose possible issues in a model pipeline.
Understanding of how machine learning projects align with business goals and the ability to translate technical results into actionable business insights.
- Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
- Types of model families (tree-based, linear, ensemble, neighbors)
- Regularization (L1, L2, Elastic Net)
- Hard and soft predictions in classification (predict vs predict_proba)
- Model overfitting and underfitting impact on soft predictions
- Linear models as baselines
- Handling correlation with regularization and feature selection
- Understanding of bagging and boosting ensemble methods
- Correct choice of metrics (presence of outliers, imbalanced settings, etc.)
- Visualizing model results using intermediate plotting techniques (matplotlib, seaborn)
- Interpreting and communicating model outputs and performance metrics to non-technical stakeholders
- Loading parquet datasets
- Visualizing data with intermediate plotting techniques (heatmaps, PCA)
- Identifying strongly correlated features
- Handling missing values in the target by using label propagation
- Feature engineering using PolynomialFeatures, SplineTransformer, etc.
- Combining features with FeatureUnion
- Broader understanding of cross-validation techniques (group structure, non-i.i.d. data, etc.)
- Performing hyperparameter tuning using GridSearchCV, RandomizedSearchCV
- Stability of optimal hyperparameters across splits with nested cross-validation
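
To make the last few topics concrete, here is a minimal sketch of nested cross-validation with GridSearchCV, used to check whether the selected hyperparameter is stable across outer splits. It is not taken from the certification material: the synthetic dataset, the logistic regression pipeline, the grid of `C` values, and the fold counts are all assumptions chosen for illustration. The final lines also contrast hard and soft predictions (`predict` vs `predict_proba`).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A regularized linear baseline; C is the inverse regularization strength.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Inner loop tunes C; outer loop estimates generalization performance.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(model, param_grid, cv=inner_cv)

# return_estimator=True keeps the fitted search object of each outer fold.
results = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)

# If the selected C varies a lot between outer folds, the tuning is unstable.
best_C_per_fold = [
    est.best_params_["logisticregression__C"] for est in results["estimator"]
]
print("Outer-fold accuracy:", results["test_score"].round(3))
print("Best C per outer fold:", best_C_per_fold)

# Hard vs soft predictions from one of the tuned models.
fitted = results["estimator"][0]
hard = fitted.predict(X[:5])        # class labels
soft = fitted.predict_proba(X[:5])  # one probability per class, rows sum to 1
print("Hard predictions:", hard)
print("Soft predictions:", soft.round(3))
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with `param_distributions` and `n_iter`) follows the same pattern when the search space is too large to enumerate.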
Prepare for the certifications with three online courses on Skolar, each matching a certification level and reflecting a data scientist’s typical career path.