Analysis of Heavy Metal Lyrics - Part 3: Word Clouds
This article is the third part of the lyrical analysis of heavy metal lyrics.
If you’re interested in seeing the full code, check out the
original notebook.
In the next article we’ll use clustering and graph methods to visualize
the genre and lyrical data as a network.
Word clouds are a fun and oftentimes helpful technique for visualizing natural language data.
They can show words scaled by any metric, although term frequency (TF) and
term frequency-inverse document frequency (TF-IDF)
are the most common metrics.
For a multi-class or multi-label classification problem,
word clouds can highlight the similarities and differences
between separate classes by treating each class as its own document to compare with all others.
The word clouds seen here were made with the WordCloud generator by
amueller,
with pre-processing done via gensim and nltk.
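To make the "each class is its own document" idea concrete, here is a minimal pure-Python sketch of TF-IDF computed across class-level documents. It is not the notebook's code: the function name `tfidf_by_class`, the toy tokens, and the unsmoothed `log(N / df)` weighting are all assumptions for illustration, and library implementations may smooth or normalize differently.

```python
import math
from collections import Counter


def tfidf_by_class(class_tokens):
    """Return {class: {term: tf-idf weight}}, treating each class as one document.

    Hypothetical helper: uses relative term frequency and unsmoothed log(N / df).
    """
    n_docs = len(class_tokens)

    # Document frequency: in how many class-documents does each term appear?
    doc_freq = Counter()
    for tokens in class_tokens.values():
        doc_freq.update(set(tokens))

    weights = {}
    for label, tokens in class_tokens.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        weights[label] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Toy tokens, invented for illustration only.
genre_tokens = {
    "black": "night death forest satan night".split(),
    "death": "death blood flesh blood".split(),
    "power": "death dragon sword glory dragon".split(),
}
weights = tfidf_by_class(genre_tokens)
top_black = max(weights["black"], key=weights["black"].get)
```

In this toy example "death" appears in every class, so its inverse document frequency is zero and it drops out, while a term used heavily by only one class rises to the top.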
In the case of heavy metal genre classification, term frequency alone would not be very illuminating:
the genres visualized here share a lot of common themes.
TF-IDF does much better at picking out the words that are unique to a genre:
black metal lyrics deal with topics like the occult, religion, and nature;
death metal focuses on the obscene and horrifying;
heavy metal revolves around themes more familiar to rock and pop;
power metal adopts the vocabulary of fantasies and histories;
and thrash metal sings of violence and war.
The full corpus word cloud shows themes common to all heavy metal genres.
Imports
Data
Functions for creating and visualizing word clouds
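As a rough idea of what such helpers might look like, here is a minimal sketch; it is not the notebook's actual code. The names `top_terms` and `show_cloud`, the image size, and the `max_words` default are assumptions. The sketch relies on `WordCloud.generate_from_frequencies` from the wordcloud package and on matplotlib for display, both imported lazily inside the plotting function.

```python
def top_terms(weights, max_words=50):
    """Keep the max_words highest-weighted terms that have a positive weight."""
    positive = {term: w for term, w in weights.items() if w > 0}
    ranked = sorted(positive.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_words])


def show_cloud(weights, title="", max_words=50):
    """Render one word cloud from a {term: weight} mapping (hypothetical helper)."""
    # Imported lazily so top_terms can be used without the plotting stack.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="black")
    cloud.generate_from_frequencies(top_terms(weights, max_words))

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```

Filtering to positive weights before plotting keeps terms with a TF-IDF score of zero, such as words shared by every class, out of the cloud.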
Word clouds for genres
Here we split the full dataframe by genre, so each document consists of all the lyrics for that genre.
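The split can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's implementation: the record fields `genres` and `lyrics` are hypothetical, lowercase whitespace splitting stands in for the gensim and nltk pre-processing, and a band tagged with several genres contributes its lyrics to each of them.

```python
from collections import defaultdict


def documents_by_genre(records):
    """Pool lyric tokens into one document per genre (hypothetical helper)."""
    genre_docs = defaultdict(list)
    for record in records:
        # Placeholder tokenization; real pre-processing would be more thorough.
        tokens = record["lyrics"].lower().split()
        for genre in record["genres"]:
            genre_docs[genre].extend(tokens)
    return dict(genre_docs)


# Toy records, invented for illustration only.
records = [
    {"band": "A", "genres": ["black", "death"], "lyrics": "Frozen night"},
    {"band": "B", "genres": ["death"], "lyrics": "Rotting flesh"},
]
docs = documents_by_genre(records)
```

Each value in `docs` is then a single token list that can be scored against the other genres.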
Word clouds for bands
We can likewise build word clouds for individual bands.
Here are word clouds for the top ten bands by number of album reviews.
You can see more artist-specific word clouds by clicking on any of the bands included in the
lyrics dataset dashboard.
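Picking the most-reviewed bands could look something like the sketch below. It assumes, hypothetically, that the data can be reduced to a list with one band name per album review; the function name `top_bands` and the sample names are invented for illustration.

```python
from collections import Counter


def top_bands(review_bands, n=10):
    """Return the n bands with the most reviews, most-reviewed first."""
    return [band for band, _ in Counter(review_bands).most_common(n)]


# One entry per album review; toy data for illustration only.
review_bands = ["X", "Y", "X", "Z", "X", "Y"]
selected = top_bands(review_bands, n=2)
```

The lyrics of each selected band can then be pooled into one document per band, in the same way as for genres.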