Scikit-learn Professional Practitioner Certification

What will be evaluated?

The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and the practical skills of a mid-level data scientist. Before applying, you should be proficient with scikit-learn’s tools and functions and have skills in the following areas:

Advanced machine learning knowledge

Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.

Programming skills

Proficiency in Python, particularly in using libraries such as scikit-learn, Pandas, and NumPy.

Data manipulation

Ability to clean, manipulate, and preprocess data using Python libraries.

Data visualization

Leveraging Python plotting tools and interpreting results effectively to create robust data-driven solutions.

Model tuning and optimization

Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.

Model evaluation

Familiarity with techniques for evaluating model performance, such as cross-validation, confusion matrices, and ROC curves.

Attention to detail

Strong attention to detail to ensure data accuracy and model reliability.

Problem solving

Basic problem-solving skills and a logical approach to analyzing and addressing issues. This includes making design choices for data pipelines and for how they are evaluated.

What do I need to know?

Machine learning concepts

- Types of Machine Learning: Supervised, Unsupervised, and Semi-supervised learning.
- Model Families: Tree-based, Linear, Ensemble, Neighbors.
- Key concepts: features, labels, training and test sets
- Model overfitting and underfitting
- Bias/variance trade-off
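
To make overfitting and underfitting concrete, here is a minimal sketch, assuming a decision tree on scikit-learn's built-in breast cancer dataset (the dataset, the depths, and the `scores` dictionary are illustrative choices, not part of the exam material). It compares training and test accuracy as the tree is allowed to grow deeper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any labelled tabular dataset would do.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Map each max_depth to (train accuracy, test accuracy).
scores = {}
for depth in (1, 3, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in scores.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between training and test accuracy suggests overfitting (high variance), while low accuracy on both suggests underfitting (high bias).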

Model building and evaluation

- Splitting datasets into training and testing sets using train_test_split
- Training ML models using the fit() method
- Making predictions using the predict() method
- Evaluating model performance with the most common metrics (accuracy, precision, recall, F1 score, confusion matrix, mean squared error, R-squared)
- Interpreting scores relative to dummy (baseline) models
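
The workflow above could look roughly like the following sketch. The dataset, the logistic regression model, and the variable names are assumptions chosen for illustration; the point is the split / fit / predict / score sequence and the comparison against a `DummyClassifier` baseline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows for testing, keeping the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit on the training set only, then predict on unseen data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)

model_accuracy = accuracy_score(y_test, y_pred)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
model_f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

print(f"model accuracy:    {model_accuracy:.3f}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"model F1:          {model_f1:.3f}")
print(cm)
```

A score only means something relative to the baseline: on an imbalanced dataset, a dummy model can already reach a high accuracy.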

Interpretation of results & communication

- Visualizing model results using basic plotting techniques (matplotlib, seaborn)
- Interpreting and communicating model outputs and performance metrics to non-technical stakeholders

Data preprocessing

- Loading parquet datasets
- Visualizing data with basic plotting techniques (scatterplot, boxplot)
- Identifying wrongly encoded predictive columns (e.g. floats stored as strings)
- Handling missing values by imputation with SimpleImputer
- Correct choice of feature scaling using StandardScaler, MinMaxScaler, etc.
- Encoding categorical data using OrdinalEncoder and OneHotEncoder
- Combining preprocessing steps with ColumnTransformer
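
As a rough illustration of how these steps fit together, the sketch below builds a tiny made-up DataFrame (the column names and values are hypothetical) with a numeric column stored as strings and some missing values, then imputes, scales, and encodes it through a single `ColumnTransformer`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: "income" is numeric but was loaded as strings.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 47.0, 31.0],
        "income": ["32000.5", "41000", "not available", "58000.0"],
        "city": ["Paris", "Lyon", np.nan, "Paris"],
    }
)

# Fix the wrong encoding: unparseable entries become NaN.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

numeric_features = ["age", "income"]
categorical_features = ["city"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipeline = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipeline = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_features),
        ("cat", categorical_pipeline, categorical_features),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

X_processed = preprocessor.fit_transform(df)
print(X_processed.shape)
```

In practice the preprocessor would usually be placed in a pipeline with the final estimator, so that imputation and scaling statistics are learned from the training split only.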

Model selection and validation

- Understanding and implementing cross-validation techniques (KFold, ShuffleSplit, etc.)
- Learning and validation curves
- Performing hyperparameter tuning using GridSearchCV and RandomizedSearchCV
- Stability of learned coefficients across splits
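
The following sketch shows one possible way to combine these ideas, assuming a ridge regression on scikit-learn's built-in diabetes dataset (the model, the `alpha` grid, and the variable names are illustrative assumptions). It tunes a hyperparameter with `GridSearchCV`, then refits the chosen model on each `KFold` split to see how much the coefficients vary.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative regression dataset with 10 features.
X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation with shuffling and a fixed seed for reproducibility.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Step names are derived from class names, hence the "ridge__alpha" key.
pipeline = make_pipeline(StandardScaler(), Ridge())
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(pipeline, param_grid={"ridge__alpha": alphas}, cv=cv)
search.fit(X, y)
best_alpha = search.best_params_["ridge__alpha"]

# Refit the selected model on every split and keep the fitted estimators.
cv_results = cross_validate(
    search.best_estimator_, X, y, cv=cv, return_estimator=True
)

# One row of coefficients per split; the last pipeline step is the Ridge model.
coefs = np.array([est[-1].coef_ for est in cv_results["estimator"]])
coef_std = coefs.std(axis=0)

print(f"best alpha: {best_alpha}")
print(f"mean R^2 across folds: {cv_results['test_score'].mean():.3f}")
print("coefficient std across folds:", np.round(coef_std, 2))
```

Coefficients whose values swing widely from one split to the next should be interpreted with caution. Note that reusing the same splits for tuning and for scoring gives a slightly optimistic estimate; nested cross-validation avoids this.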

Get training with Skolar

Prepare for the certifications with three online courses on Skolar, each matching a certification level and reflecting a data scientist’s typical career path.
