Spaceship Titanic classification model

Photo by NASA on Unsplash

Random Forest and XGBoost classification models in Python, predicting which passengers will be transported to an alternate dimension.

Using the Kaggle Spaceship Titanic dataset, I built machine learning models in Python to predict which passengers would be transported to an alternate dimension.

Check out the Kaggle notebook

Data Exploration

The first step was loading the data and exploring it.
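As a rough sketch of what a first look at the data can involve, the snippet below checks the shape, the missing values per column, and the balance of the target. The rows are made up for illustration and only mimic a few of the dataset's columns; with the real data you would read the competition CSV instead (the file name in the comment is an assumption).

```python
import io
import pandas as pd

# Made-up rows that mimic a few Spaceship Titanic columns. With the real
# data this would be something like pd.read_csv("train.csv") (assumed name).
sample_csv = io.StringIO(
    "PassengerId,HomePlanet,CryoSleep,Destination,Age,VIP,Transported\n"
    "0001_01,Europa,False,TRAPPIST-1e,39.0,False,False\n"
    "0002_01,Earth,False,TRAPPIST-1e,24.0,False,True\n"
    "0003_01,Europa,,55 Cancri e,,True,False\n"
    "0004_01,Mars,True,PSO J318.5-22,16.0,False,True\n"
)
df_data = pd.read_csv(sample_csv)

# Size of the table, missing values per column, and target class balance.
n_rows, n_cols = df_data.shape
missing_per_column = df_data.isna().sum()
target_balance = df_data["Transported"].value_counts(normalize=True)

print(n_rows, n_cols)
print(missing_per_column)
print(target_balance)
```

Checking missing values early is useful here because both of the models used later need a decision about how gaps are filled or handled.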

A pair plot is a helpful visual to spot correlations.

Here is a visual of the ages of the passengers who were transported.
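A numeric companion to an age plot is the share of passengers transported within each age bucket. The sketch below uses hypothetical ages and outcomes, and the bucket edges are arbitrary choices for illustration, not values taken from the notebook.

```python
import pandas as pd

# Hypothetical ages and outcomes, only to show the shape of the summary.
df_age = pd.DataFrame({
    "Age": [2, 9, 17, 25, 31, 44, 58, 63],
    "Transported": [True, True, False, False, True, False, True, False],
})

# Bucket the ages (edges are illustrative), then compute the share of
# passengers transported within each bucket.
age_bins = [0, 12, 18, 40, 100]
age_labels = ["0-12", "13-18", "19-40", "41+"]
df_age["AgeGroup"] = pd.cut(df_age["Age"], bins=age_bins, labels=age_labels)
transported_rate = df_age.groupby("AgeGroup", observed=True)["Transported"].mean()

print(transported_rate)
```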

During data exploration I used the pandas get_dummies function to one-hot encode the 'HomePlanet' and 'Destination' columns. I also applied other cleaning and transformation steps to the data.

df_data_final = pd.get_dummies(df_data_final, columns=['HomePlanet', 'Destination'], drop_first=False)
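To make the effect of that call concrete, here is a small sketch on a toy frame. The rows and the `df_toy` name are made up; only the two encoded column names come from the project.

```python
import pandas as pd

# Toy frame with made-up rows, standing in for df_data_final.
df_toy = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", "Mars"],
    "Destination": ["TRAPPIST-1e", "55 Cancri e", "TRAPPIST-1e"],
    "Age": [24.0, 39.0, 16.0],
})

# Each listed column is replaced by one indicator column per category.
df_encoded = pd.get_dummies(
    df_toy, columns=["HomePlanet", "Destination"], drop_first=False
)

print(list(df_encoded.columns))
```

With drop_first=False every category keeps its own indicator column. That is generally fine for tree-based models like the ones used here; drop_first=True is mainly useful for linear models, where the redundant column introduces collinearity.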

I created a Tableau dashboard to help visualize the data. This wasn't necessary, but it helped me see and interact with the data.

Creating the model

The target variable is the 'Transported' column; the remaining columns are the features.

# Define the y (target) variable.
y = df_data_final['Transported']

# Define the X (predictor) variables.
X = df_data_final.drop(['Transported'], axis = 1)
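From X and y, one common way to get an accuracy figure is to hold out part of the data, fit on the rest, and score on the held-out part. The sketch below shows that flow for a Random Forest on synthetic data. The 25% split, the hyperparameters, the random seeds, and the `df_demo` frame are all assumptions for illustration, not the settings from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_data_final: made-up features and target.
rng = np.random.default_rng(0)
n = 400
df_demo = pd.DataFrame({
    "Age": rng.integers(0, 80, size=n),
    "CryoSleep": rng.integers(0, 2, size=n),
    "Spa": rng.exponential(300.0, size=n),
})
# Make the toy target depend on CryoSleep so there is something to learn.
df_demo["Transported"] = (df_demo["CryoSleep"] + rng.normal(0, 0.4, size=n)) > 0.5

y = df_demo["Transported"]
X = df_demo.drop(columns=["Transported"])

# Hold out 25% of the rows for evaluation (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit on the training portion and score on the held-out portion.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_accuracy = accuracy_score(y_test, rf.predict(X_test))

print(f"Random Forest accuracy: {rf_accuracy:.3f}")
```

XGBoost's XGBClassifier follows the same fit and predict pattern, so it can be swapped in for the Random Forest and scored with the same accuracy_score call.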

Results

Random Forest Model Accuracy: 0.781048758049678

XGBoost Model Accuracy: 0.7828886844526219

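The XGBoost model performed slightly better on the training data: about 0.783 accuracy versus 0.781 for Random Forest, a difference of roughly 0.2 percentage points.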

If you are interested, the Kaggle notebook can be found here