Analysis of Heavy Metal Lyrics and Reviews
Introduction
In this project, I take a look at heavy metal artists and their lyrical content.
The core data set combines artist information, including genre labels, and album reviews from
The Metal-Archives (MA) and song lyrics from DarkLyrics (DL). The data collection begins with the
metallum_ids.py script, which reads through the complete list of album reviews sorted by artist in
order to build a csv table of artist names and id numbers for artists with at least one album
review (/data/ids.csv). Artist information and full-text album reviews are then scraped by
metallum.py and saved into json files (/data/bands.zip). The DL scraping tool darklyrics_fast.py
searches DL for the corresponding album lyrics and adds them to the json files. Finally, the data
set is split by create_dataframes.py into a csv table of album reviews and a csv table of song
lyrics (/data/data.zip).
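The final split step can be pictured with a short sketch. It is only an illustration: the record layout (the keys name, id, albums, reviews, songs, and lyrics), the output column names, and the helpers split_band and to_csv are all assumptions, since the actual json schema is not described here and create_dataframes.py may work differently.

```python
import csv
import io

# Hypothetical band record; the real json schema may differ.
band = {
    "name": "Example Band",
    "id": 1,
    "albums": [
        {
            "name": "First Album",
            "reviews": [
                {"title": "Solid debut", "score": 85, "text": "Placeholder review text."}
            ],
            "songs": [
                {"name": "Opener", "lyrics": "first line\nsecond line"},
                {"name": "Closer", "lyrics": "last words"},
            ],
        }
    ],
}


def split_band(band):
    """Flatten one band record into review rows and lyrics rows."""
    review_rows, lyrics_rows = [], []
    for album in band.get("albums", []):
        # Columns shared by both output tables.
        common = {
            "band_id": band["id"],
            "band_name": band["name"],
            "album_name": album["name"],
        }
        for review in album.get("reviews", []):
            review_rows.append({
                **common,
                "review_title": review["title"],
                "review_score": review["score"],
                "review_text": review["text"],
            })
        for song in album.get("songs", []):
            lyrics_rows.append({
                **common,
                "song_name": song["name"],
                "song_lyrics": song["lyrics"],
            })
    return review_rows, lyrics_rows


def to_csv(rows):
    """Serialize a list of uniform dicts to CSV text (empty string if no rows)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


reviews, lyrics = split_band(band)
reviews_csv = to_csv(reviews)
lyrics_csv = to_csv(lyrics)
```

In this sketch, each review and each song becomes its own row, with the band and album columns repeated so that the two tables can be analyzed independently.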
Analyses
The articles below provide insights into the history of heavy metal albums and the linguistic properties of metal lyrics.
Exploration of artists and album reviews
A data-driven discussion of the history and global demographics of the heavy metal music industry and its many genres. This notebook also provides statistical insights on the sentiments of MA users as expressed through online album reviews.
Neural network album review score prediction
Predicting review scores from text using a convolutional neural network and GloVe word embeddings.
Brief overview of the lyrics data set.
Comparison of lexical diversity measures and what they tell us about artists and genres.
Concise visualizations of song lyrics from different genres.
Processing data for generating network graphs with Gephi.
This notebook presents the multi-label problem of genre classification based on lyrics. Different approaches and preprocessing steps are discussed, and various machine learning models are compared via cross-validation to demonstrate possible solutions.
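To make the multi-label framing concrete: an artist can carry several genre tags at once, so each target is a vector of 0/1 flags rather than a single class. The sketch below is an illustration under stated assumptions, not the notebook's code. The genre vocabulary, the example labels, and the helpers split_genres and encode are hypothetical.

```python
import re

# Hypothetical genre vocabulary; the notebook's actual label set may differ.
GENRES = ["black", "death", "doom", "power", "thrash"]


def split_genres(label):
    """Return the set of known genre tags found in a free-text genre label."""
    tokens = re.findall(r"[a-z]+", label.lower())
    return {token for token in tokens if token in GENRES}


def encode(label):
    """Encode a genre label as a multi-hot vector aligned with GENRES."""
    tags = split_genres(label)
    return [1 if genre in tags else 0 for genre in GENRES]


# Made-up example labels: one artist can belong to more than one genre.
labels = ["Death/Thrash Metal", "Power Metal", "Black/Doom Metal"]
targets = [encode(label) for label in labels]
```

With targets in this form, a classifier can predict each genre flag independently, which is one common way to approach a multi-label problem.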
Machine learning scripts
For the genre classifier tool (see link at the bottom of page), a number of machine learning models were tuned and
trained to assign genre tags to text inputs of arbitrary length. As discussed in the machine learning notebook above,
these models are incorporated into pipelines that also vectorize (and oversample, when training) the data. The
relevant scripts are located in analyses/lyrics/scripts and are configured by the corresponding .yaml files in
analyses/lyrics. The genre_classification_tuning.py script tunes the models using cross-validation to determine
optimal hyperparameters. The genre_classification_train.py script is used to train the model, given those optimal
hyperparameters, and genre_classification_test.py can be used to test the pipeline for functionality before
deploying it to the genre classifier tool.
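The point about oversampling only when training can be illustrated with a small sketch. This is not the repository's pipeline: it is a simplified, single-label random oversampler written with the standard library, and the function name oversample and the toy data are made up for illustration. Resampling is applied to the training split only, so that evaluation data keeps its natural class balance.

```python
import random
from collections import Counter


def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    # Group the samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy training split with a 3:1 class imbalance (made-up data).
train_texts = ["a", "b", "c", "d"]
train_labels = ["death", "death", "death", "doom"]

balanced_texts, balanced_labels = oversample(train_texts, train_labels)
counts = Counter(balanced_labels)
```

The multi-label case is more involved, because duplicating a sample raises the count of every genre attached to it, so this sketch should be read as the single-label idea only.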
Interactive webpages
Source code for these webpages can be found in the pdqnguyen/metallyrics-web repository.
Explore the lyrics and album reviews data sets through interactive scatter plots and swarm plots.
Network graph of heavy metal bands
See how genre associations and lyrical similarity connect the disparate world of heavy metal artists.
Global and U.S. maps of heavy metal bands
Explore the world of heavy metal through choropleth maps.
Interactive genre classifier tool
Enter any text you want and see what heavy metal genres it fits in best.