🌽 Crop Classification Model

Choosing a classification model



This Kaggle dataset contains soil characteristics, which a machine learning classification model can use to recommend which farm crop to plant in that soil.

I created a baseline by comparing the accuracy of five different classification models.

First, set up a dictionary of models.

# Import the model classes from Scikit-Learn
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

# Put models in a dictionary
models = {"KNN": KNeighborsClassifier(),
          "Logistic Regression": LogisticRegression(),
          "Random Forest": RandomForestClassifier(),
          "GradientBoost": GradientBoostingClassifier(),
          "GaussianNB": GaussianNB()}
import numpy as np

# Create a function to fit and score models
def fit_and_score(models, X_train, X_test, y_train, y_test):
    """
    Fits and evaluates the given machine learning models.
    models : a dict of different Scikit-Learn machine learning models
    X_train : training data
    X_test : testing data
    y_train : labels associated with training data
    y_test : labels associated with test data
    """
    # Random seed for reproducible results
    np.random.seed(42)
    # Make a dictionary to keep model scores
    model_scores = {}
    # Loop through models
    for name, model in models.items():
        # Fit the model to the data
        model.fit(X_train, y_train)
        # Evaluate the model and store its score in model_scores
        model_scores[name] = model.score(X_test, y_test)
    return model_scores
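The function is called on a held-out train/test split to produce the scores below. A minimal sketch of creating such a split; the rows and column names here are stand-ins, since the Kaggle CSV's exact schema is not shown in this post:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in rows for the soil data -- the real notebook reads the Kaggle CSV,
# and these column names are illustrative, not the dataset's actual schema
df = pd.DataFrame({"N":     [90, 85, 60, 74, 78, 69],
                   "ph":    [6.5, 7.0, 7.5, 6.9, 5.8, 5.5],
                   "label": ["rice", "rice", "maize",
                             "maize", "coffee", "coffee"]})

X = df.drop("label", axis=1)   # soil features
y = df["label"]                # crop to recommend

# Hold out 20% of the rows for scoring the baseline models
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```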
{'KNN': 0.9568181818181818,
 'Logistic Regression': 0.9636363636363636,
 'Random Forest': 0.9931818181818182,
 'GradientBoost': 0.9818181818181818,
 'GaussianNB': 0.9954545454545455}

The baseline scores show the Gaussian Naive Bayes (GaussianNB) and Random Forest models performing the best.

Model Comparison
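A comparison chart can be generated from the scores dictionary with pandas' plotting API. A sketch, with the scores copied (rounded) from the baseline output above and an arbitrary output file name:

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Baseline accuracies, copied (rounded) from the fit_and_score output above
model_scores = {"KNN": 0.9568,
                "Logistic Regression": 0.9636,
                "Random Forest": 0.9932,
                "GradientBoost": 0.9818,
                "GaussianNB": 0.9955}

# One bar per model, titled to match the comparison figure
model_compare = pd.DataFrame(model_scores, index=["accuracy"]).T
model_compare.plot.bar(title="Model Comparison", legend=False)
plt.ylabel("accuracy")
plt.tight_layout()
plt.savefig("model_comparison.png")
```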

The next step will be selecting either the Gaussian Naive Bayes or Random Forest model and tuning its hyperparameters with a cross-validated grid search.
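As a sketch of that next step, a cross-validated grid search over Random Forest hyperparameters might look like this. The parameter grid and the synthetic stand-in data are assumptions for illustration, not the project's actual choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the soil features (the real data comes from Kaggle)
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=42)

# Hypothetical grid -- the values to search would be chosen per model
param_grid = {"n_estimators": [50, 100],
              "max_depth": [None, 10]}

# 5-fold cross-validated grid search over the Random Forest hyperparameters
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5)
grid.fit(X, y)

grid.best_params_   # best hyperparameter combination found
grid.best_score_    # mean cross-validated accuracy of that combination
```

GaussianNB has few hyperparameters to tune (mainly `var_smoothing`), so a grid search like this matters mostly for the Random Forest.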