Updates for 2024 #63
-
Pandas notebook updates
Specifying numeric data types
String replacement
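# Remove the "$", "," and "." symbols from the Price column
# regex=True tells pandas to treat the pattern as a regular expression
# (a raw string avoids Python's invalid escape sequence warning for "\$")
car_sales["Price"] = car_sales["Price"].str.replace(r'[\$\,\.]', '', regex=True)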
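A small, self-contained sketch of what the regex replacement does, using a made-up Series in the same "$4,000.00" format as the car sales data (the sample prices are hypothetical). Since pandas 2.0 the default for `regex` in `str.replace` is `False`, which is why `regex=True` has to be passed explicitly. Note that stripping "." keeps the two cents digits, so "$4,000.00" becomes "400000"; the sketch drops them with `.str[:-2]`, which assumes every price ends in ".00".

```python
import pandas as pd

# Hypothetical prices in the same "$4,000.00" string format as the car_sales data
prices = pd.Series(["$4,000.00", "$22,000.00", "$13,500.00"])

# With regex=True, "[\$\,\.]" is a character class matching any of "$", "," or "."
cleaned = prices.str.replace(r"[\$\,\.]", "", regex=True)
print(cleaned.tolist())  # → ['400000', '2200000', '1350000']

# With regex=False the pattern is matched literally, so nothing is removed
literal = prices.str.replace(r"[\$\,\.]", "", regex=False)
print(literal.equals(prices))  # → True

# Removing "." leaves the cents attached, so drop the last two characters
# (assumes every price ends in ".00") before converting to integers
as_int = cleaned.str[:-2].astype(int)
print(as_int.tolist())  # → [4000, 22000, 13500]
```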
-
Matplotlib notebook updates
General workflow
Trying to plot non-numeric columns
# Note: In previous versions of matplotlib and pandas, having the "Price" column as a string would
# return an error
car_sales["Price"] = car_sales["Price"].astype(str)
# car_sales["Price"] = car_sales["Price"].astype(int) # Turning the Price column into an integer makes the plot look better
# Plot a scatter plot (does not look as good as with .astype(int))
car_sales.plot(x="Odometer (KM)", y="Price", kind="scatter");
Seaborn plotting styles namespace change
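From matplotlib 3.6 onward, the bundled styles previously named "seaborn", "seaborn-whitegrid" and so on were renamed with a "seaborn-v0_8" prefix and the old names were deprecated. A minimal sketch that picks whichever name the installed matplotlib provides; the fallback logic and the toy data are illustrative assumptions, not code from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# matplotlib >= 3.6 ships the seaborn styles as "seaborn-v0_8*";
# older versions only have the original "seaborn*" names
new_name = "seaborn-v0_8-whitegrid"
style_name = new_name if new_name in plt.style.available else "seaborn-whitegrid"
plt.style.use(style_name)

# Toy scatter plot to check the style applies
fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [4, 5, 6])
ax.set_title(f"Plotted with the {style_name!r} style")
```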
-
Scikit-Learn notebook updates
RandomForestClassifier
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
# Hyperparameter grid RandomizedSearchCV will search over
param_distributions = {"n_estimators": [10, 100, 200, 500, 1000, 1200],
                       "max_depth": [None, 5, 10, 20, 30],
                       "max_features": ["sqrt", "log2", None],
                       "min_samples_split": [2, 4, 6],
                       "min_samples_leaf": [1, 2, 4]}
np.random.seed(42)
# Read in the data
heart_disease = pd.read_csv("../data/heart-disease.csv")
# Split into X & y
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Set n_jobs to -1 to use all available cores on your machine (if this causes errors, try n_jobs=1)
clf = RandomForestClassifier(n_jobs=-1)
# Set up RandomizedSearchCV
rs_clf = RandomizedSearchCV(estimator=clf,
                            param_distributions=param_distributions,
                            n_iter=20, # try 20 models total
                            cv=5, # 5-fold cross-validation
                            verbose=2) # print out results
# Fit the RandomizedSearchCV version of clf
rs_clf.fit(X_train, y_train);
Creation of train/validation/test set
Changed the creation of the train/validation/test sets from indexing to random splitting. I find this cleaner and less prone to error.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Set the seed
np.random.seed(42)
# Read in the data
heart_disease = pd.read_csv("../data/heart-disease.csv")
# Split into X (features) & y (labels)
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]
# Training and test split (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Create validation and test splits by splitting the test data in half (30% test -> 15% validation, 15% test)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
# Make predictions on the validation set
y_preds = clf.predict(X_valid)
# Evaluate the classifier (evaluate_preds() is a helper defined elsewhere in the notebook, not shown here)
baseline_metrics = evaluate_preds(y_valid, y_preds)
baseline_metrics
Pipeline upgrades
pipe_grid = {
"preprocessor__num__imputer__strategy": ["mean", "median"], # note the double underscore after each prefix "preprocessor__"
"model__n_estimators": [100, 1000],
"model__max_depth": [None, 5],
"model__max_features": ["sqrt"],
"model__min_samples_split": [2, 4]
}
4.2.1 Classification model evaluation metrics - ROC Curve
from sklearn.metrics import RocCurveDisplay
roc_curve_display = RocCurveDisplay.from_estimator(estimator=clf,
X=X_test,
                                                   y=y_test)
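`RocCurveDisplay.from_estimator` plots the curve that `roc_curve` computes from the model's predicted probabilities. A tiny sketch with hand-picked labels and scores (made up for illustration, not from the heart disease data) showing the numbers behind the plot:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False positive rate and true positive rate at each probability threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr, thresholds)

# Area under the ROC curve (1.0 = perfect ranking, 0.5 = random guessing)
auc = roc_auc_score(y_true, y_score)
print(auc)  # → 0.75
```

If you already have predicted probabilities rather than a fitted estimator, `RocCurveDisplay.from_predictions(y_true, y_score)` draws the same curve.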
-
In the lecture Hyperparameter tuning with RandomizedSearchCV
Remove
Read more in the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
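A minimal, self-contained sketch of `RandomizedSearchCV` with `RandomForestRegressor` (the estimator the linked docs page describes), assuming synthetic data from `make_regression` in place of the course dataset. The grid is deliberately small so it runs in seconds, and its `max_features` values ("sqrt", "log2", None) are all accepted by current scikit-learn versions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data (stand-in for the course dataset)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# Small illustrative grid: 2 * 2 * 3 * 2 = 24 possible combinations
param_distributions = {"n_estimators": [10, 50],
                       "max_depth": [None, 5],
                       "max_features": ["sqrt", "log2", None],
                       "min_samples_split": [2, 4]}

rs_reg = RandomizedSearchCV(estimator=RandomForestRegressor(random_state=42),
                            param_distributions=param_distributions,
                            n_iter=5,  # sample 5 of the 24 combinations
                            cv=3,  # 3-fold cross-validation
                            random_state=42)
rs_reg.fit(X, y)

# Best combination found and its mean cross-validated R^2 score
print(rs_reg.best_params_)
print(rs_reg.best_score_)
```

Sampling a fixed `n_iter` from the grid is what makes the randomized search cheaper than trying all 24 combinations with `GridSearchCV`.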
-
Getting
-
TensorFlow Notebook Updates
Due to changes in workflow and TensorFlow library updates, I'm going to remake the Dog Vision project. It will use the latest version of TensorFlow (2.14.0, as of October 2023). Currently the notebook will be under the
-
There's been an increase in students encountering errors when installing jupyter or creating a conda env with jupyter, with Python 3.12 somehow already installed. The error message is always something like this:
You can install jupyter with pip:
However, since Python 3.12 is still very new and the chance of running into compatibility issues is still high, I recommend the following:
i.e., go to the folder you want to create the env in and execute:
Then activate the env you just created.
Then install the libraries you want in that env, i.e.
-
Working on updates for 2024
Main goals:
See the branch (work in progress): https://github.com/mrdbourke/zero-to-mastery-ml/tree/updates-2023 (this branch will get merged into master once the changes are finished)
TODO
Working on
tf.keras
here: https://github.com/keras-team/keras-core/issues/223
Done