# 🌽 Crop Classification Model

This project uses a Kaggle [dataset](https://www.kaggle.com/datasets/varshitanalluri/crop-recommendation-dataset) of soil characteristics to train a machine learning classification model that recommends which crop to plant in a given soil.

I created a baseline by comparing the accuracy of five classification models, each with its default hyperparameters.

First, set up a dictionary of models.

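```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Put models in a dictionary, keyed by a display name
models = {"KNN": KNeighborsClassifier(),
          "Logistic Regression": LogisticRegression(),
          "Random Forest": RandomForestClassifier(),
          "GradientBoost": GradientBoostingClassifier(),
          "GaussianNB": GaussianNB(),
          }
```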

```python
import numpy as np

# Create function to fit and score models
def fit_and_score(models, X_train, X_test, y_train, y_test):
    """
    Fits and evaluates given machine learning models.
    models : a dict of different Scikit-Learn machine learning models
    X_train : training data
    X_test : testing data
    y_train : labels associated with training data
    y_test : labels associated with test data
    Returns a dict mapping each model name to its test accuracy.
    """
    # Random seed for reproducible results
    np.random.seed(42)
    # Make a dict to keep model scores
    model_scores = {}
    # Loop through models
    for name, model in models.items():
        # Fit the model to the training data
        model.fit(X_train, y_train)
        # Evaluate the model on the test data and store its accuracy
        model_scores[name] = model.score(X_test, y_test)
    return model_scores
```
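The function needs a train/test split before it can be called, and the split used for the scores below is not shown. The following is a minimal sketch of that step under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (not the Kaggle dataset), and `test_size=0.2`, `stratify=y`, and the name `demo_models` are illustrative choices. The reported accuracies are all multiples of 1/440, which is consistent with a 440-row test set (for example a 20% hold-out of 2,200 rows), but that is an inference rather than something the post states.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the soil features and crop labels (assumption:
# the real data would be loaded from the Kaggle CSV instead).
X, y = make_classification(n_samples=2200, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4, random_state=42)

# Hold out 20% of the rows for testing; stratify keeps class proportions
# similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Hypothetical two-model subset, to keep the sketch quick to run
demo_models = {"GaussianNB": GaussianNB(),
               "Random Forest": RandomForestClassifier(random_state=42)}

# Same fit-then-score step that fit_and_score performs for each model
demo_scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
               for name, model in demo_models.items()}
print(demo_scores)
```

With the real data, the equivalent call would be `fit_and_score(models, X_train, X_test, y_train, y_test)`, which returns a dictionary of accuracies.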

```python
{'KNN': 0.9568181818181818,
 'Logistic Regression': 0.9636363636363636,
 'Random Forest': 0.9931818181818182,
 'GradientBoost': 0.9818181818181818,
 'GaussianNB': 0.9954545454545455}
```

The baseline scores show the Gaussian Naive Bayes (99.5% accuracy) and Random Forest (99.3% accuracy) models performing best.

![Model Comparison](https://cdn.hashnode.com/res/hashnode/image/upload/v1717769682064/ece9cb4a-7fef-4a58-82a5-ce5a79fa7896.png align="center")

The next step is to select either the Gaussian Naive Bayes or the Random Forest model and tune its hyperparameters with a cross-validated grid search.
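As a rough idea of what that tuning step could look like, here is a minimal sketch using scikit-learn's `GridSearchCV` with a Random Forest. Everything specific in it is an assumption for illustration: the data is synthetic (`make_classification`), and the parameter grid, `cv=5`, and the 20% hold-out are placeholder choices, not settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: replace with the real soil features)
X, y = make_classification(n_samples=600, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Small illustrative grid; a real search would likely cover more values
param_grid = {"n_estimators": [50, 100],
              "max_depth": [None, 10]}

# 5-fold cross-validation on the training set for every grid combination
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)
# By default the best model is refit on the full training set,
# so it can be scored on the held-out test data.
print("Test accuracy:", grid.score(X_test, y_test))
```

The same pattern would apply to `GaussianNB`, where the main parameter to search over is `var_smoothing`.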