# Visualizing text data

While exploring the Kaggle [dataset](https://www.kaggle.com/competitions/nlp-getting-started/data) of tweets used to train a model that predicts whether a tweet is about a real disaster, I tried a couple of ways of visualizing the text data.

### Python Bar Chart

Part of the dataset is a column for the tweet keyword. I created a sorted bar chart to display the top keywords. 'Fatalities' is the top keyword by count, with a number of other keywords very close behind.

```python
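import matplotlib.pyplot as plt

# Plot the 20 most frequent tweet keywords (keyword_df is sorted by count, descending)
plt.bar(keyword_df['keyword'].head(20), keyword_df['count'].head(20), color='green')
plt.xticks(rotation=90)  # rotate labels so the keywords stay readable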
plt.ylabel('Count')
plt.title('Top keywords')
plt.show()
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1705519570059/5cdca87a-0526-4924-a00d-f4f3561b1775.png align="center")
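The chart relies on a `keyword_df` with `keyword` and `count` columns, which isn't built in this post. Here is a minimal sketch of one way such a frame could be put together with pandas. The `tweets` frame and its rows are made up for illustration; in practice it would be the competition's training data.

```python
import pandas as pd

# Hypothetical stand-in for the competition's training data
tweets = pd.DataFrame({
    "keyword": ["fatalities", "deluge", "fatalities", None,
                "armageddon", "deluge", "fatalities"],
})

# value_counts() drops missing keywords and sorts by count, descending.
# rename_axis/reset_index turn the result into 'keyword' and 'count' columns.
keyword_df = (
    tweets["keyword"]
    .value_counts()
    .rename_axis("keyword")
    .reset_index(name="count")
)

print(keyword_df)
```

Because `value_counts()` already sorts from most to least frequent, `head(20)` on the result gives the top 20 keywords.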

### Python WordCloud

In Python, I created a word cloud of the keywords.

```python
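# Create word cloud visual of keywords
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# keyword_df has one row per keyword with its count (as in the bar chart),
# so map each keyword to its count directly
word_frequencies = dict(zip(keyword_df['keyword'], keyword_df['count']))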

# Generate the word cloud with frequencies
wordcloud = WordCloud(width=800, height=400, background_color='white')
wordcloud.generate_from_frequencies(word_frequencies)

# Display the word cloud using matplotlib
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1705519993725/9e034693-ef13-4363-a02e-319e3717d622.png align="center")

This created a neat visual where each word's size reflects how often that keyword appears.

### Tableau Word Cloud and TreeMap

Finally, I explored visualizing the keywords in [Tableau](https://public.tableau.com/app/profile/donald.tucker4155/viz/Visualizingkeywords/TreeMap). Because the top keywords have counts very close together, the word cloud looked more like a list of words.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1705520396488/1878a689-e6e2-4b33-8a0d-2e3c506f7ea3.png align="center")

I followed this video to create the visual.

%[https://www.youtube.com/watch?v=UHOMH5DTq14] 

Changing the Tableau chart into a TreeMap made the differences a little easier to see, but it still isn't as helpful as I would like.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1705520620718/0f9a0d7d-2088-4127-9055-491160e55d24.png align="center")
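Size-based charts like word clouds and treemaps only show contrast when the counts differ noticeably. A rough way to check this before building one is to compare the largest and smallest counts among the top keywords. The sketch below uses a hypothetical helper, `count_spread`, and made-up counts rather than the real competition numbers.

```python
def count_spread(counts, top_n=20):
    """Return the ratio of the largest to the smallest count among the top_n values."""
    top = sorted(counts.values(), reverse=True)[:top_n]
    return top[0] / top[-1]


# Illustrative counts only, not taken from the dataset
close_counts = {"fatalities": 45, "deluge": 42, "armageddon": 42, "sinking": 41}
skewed_counts = {"fire": 200, "flood": 50, "storm": 10, "quake": 5}

print(f"close:  {count_spread(close_counts):.2f}")
print(f"skewed: {count_spread(skewed_counts):.2f}")
```

A ratio close to 1 suggests the words or tiles will come out at nearly the same size, in which case a sorted bar chart is likely to be easier to read.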

You can view the Kaggle [notebook](https://www.kaggle.com/code/glenn23/nlp-tweets-classification-use-sequential-api) where the Python charts are used in an NLP classification model.
