Why The Black Dahlia Murder?

In honor of their upcoming album Verminous, releasing April 17, 2020, and because they are one of my favorite death metal bands of all time, I decided to analyze The Black Dahlia Murder's (BDM) lyrics to see how they have evolved over time. Back in 2006, my buddy burned a CD of his favorite metal songs, and "Miasma" from BDM's album Miasma was on it. As soon as I heard that song, I was hooked for life, and I have seen them perform live five times. Formed in 2001, BDM quickly rose to become one of the most popular American death metal bands. Their last seven albums charted on the Billboard 200, with their fifth album, Ritual, peaking at 31 in 2011!



The subject matter is dark and morose, and the music feels like the embodiment of rage and depression, but when I was an angsty teenager in the early 2000s, the energy and fury of death metal struck a chord with me. I could relate to it. To this day, I find comfort in melodic riffs, double kicks, and growling vocals because they put me in a trance-like state, calming me and focusing my mind. I find death metal to be the perfect music for coding and working out, and I like to start my day headbanging to blast beats.


Getting the Data

I am extracting the lyrics from Genius.com. Because of the blood-curdling screams and unearthly growls used in heavy metal, lyrics can sometimes be quite difficult to understand. That is why I use Genius to decipher them. Genius.com is a platform for annotating lyrics and collecting trivia about music, albums and artists.

Although the site doesn't allow users to extract lyrics using the API, I am using a Genius API wrapper that parses the Genius HTML with BeautifulSoup. The process results in a pandas DataFrame that contains the song title, URL, artist, album and lyrics. Find the wrapper on my GitHub in a file named getLyrics.py.

GitHub repository: bendgame/lyrics-analysis (lyrics analysis for Medium).

    #import dependencies
    from getLyrics import GeniusLyricCollector

    #pass the api token and artist name into the lyric collector
    g = GeniusLyricCollector(token, 'The Black Dahlia Murder')

    #load the dataframe with the data
    songs_df = g.get_artist_songs()

    #display first 5 rows
    songs_df.head()

To use the wrapper, pass your Genius API token and the artist into the GeniusLyricCollector. Then call get_artist_songs() to populate the DataFrame.



songs_df DataFrame

Exploring the Albums

Before analyzing data, it is always best to explore it and clean it as needed. Right off the bat, I notice row 2 has no album listed, so I know I have to clean the data, and I want to add some features too. To preserve the original dataframe in case I need to go back to it, I'll copy it and work on the copy. I check the included albums:

    lyrics = songs_df[["Lyrics", 'Album', "Title"]].copy()
    lyrics.Album.unique()



I only want songs from their 8 studio albums. Verminous hasn't been released yet and has no lyrics, Black on Black is a tribute album, and A Cold-Blooded Epitaph is an EP, so I'll exclude all of those.
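The filtering step isn't shown, so here is a minimal sketch of one way to do it in plain Python. The rows and song titles are hypothetical placeholders, and the album strings are an assumption: they need to match the names Genius actually returns.

```python
# Assumed names of the 8 studio albums; adjust to match the strings Genius returns.
STUDIO_ALBUMS = {
    "Unhallowed", "Miasma", "Nocturnal", "Deflorate",
    "Ritual", "Everblack", "Abysmal", "Nightbringers",
}

# Hypothetical rows standing in for the scraped data (titles are placeholders).
songs = [
    {"Title": "song A", "Album": "Miasma"},
    {"Title": "song B", "Album": "Black on Black"},          # tribute album
    {"Title": "song C", "Album": "A Cold-Blooded Epitaph"},  # EP
    {"Title": "song D", "Album": "Verminous"},               # not released yet
    {"Title": "song E", "Album": None},                      # no album listed
    {"Title": "song F", "Album": "Ritual"},
]

# Keep only rows whose album is one of the studio albums.
studio_songs = [song for song in songs if song["Album"] in STUDIO_ALBUMS]

for song in studio_songs:
    print(song["Title"], "-", song["Album"])
```

On the DataFrame itself, the same filter is usually written with isin, along the lines of lyrics = lyrics[lyrics['Album'].isin(STUDIO_ALBUMS)], which also drops rows with a missing album.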

Feature Engineering

Feature engineering is the process of using data mining techniques and domain knowledge to extract features from raw data. I want to engineer some features that will give me insight into the lexical richness of the albums. I'll define lexical richness using a few different factors: a word count, a unique word count, and unique words divided by word count (lexical diversity). While it can be a useful measure and is a simple calculation, the basic problem with calculating lexical diversity this way is its sensitivity to text length. Luckily, the length of each album is fairly similar. Several different approaches have been devised to overcome the problems with the method, but I'll use the simple calculation.
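To make the length sensitivity concrete, here is a small sketch of the simple calculation. The function name and the sample text are made up for illustration; they are not real lyrics or part of the wrapper.

```python
def lexical_diversity(text: str) -> float:
    """Unique words divided by total words, splitting on whitespace."""
    words = text.split()
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    return len(set(words)) / len(words)


# Made-up sample line, not real lyrics.
short_text = "the night is cold and the dead are rising"
# The same line repeated four times: no new vocabulary, four times the length.
long_text = " ".join([short_text] * 4)

print(round(lexical_diversity(short_text), 3))
print(round(lexical_diversity(long_text), 3))
```

The vocabulary is identical in both texts, yet the longer one scores much lower. That is why the measure is only safe to compare across texts of similar length.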

    import pandas as pd

    songs_group = lyrics.groupby('Album')
    album_stats = pd.DataFrame(columns=('Album', 'word_count', 'unique_count', 'lexical_diversity', 'tokens'))
    i = 0
    for name, album in songs_group:
        # create a list of words for every word that is in an album
        album_tokens = []
        for lyric in album['Lyrics'].items():
            if isinstance(lyric[1], str):
                words = lyric[1].replace('\n', ' ')
                words = words.split(' ')
                album_tokens.extend(words)
        # find how many words and unique words are in the album
        unique_count = len(set(album_tokens))
        word_count = len(album_tokens)
        # calculate the lexical richness of the album, which is the amount of unique words relative to
        # the total amount of words in the album
        album_stats.loc[i] = (name, word_count, unique_count, (unique_count / float(word_count)), album_tokens)
        i += 1

I also add each album's year of release and sort by that column.

    album_stats["released"] = [2015, 2009, 2013, 2005, 2017, 2007, 2011, 2003]
    album_stats = album_stats.sort_values('released').reset_index().drop(columns=('index'))
    album_stats


During a 2019 interview for kerrang.com, lyricist, vocalist and band frontman Trevor Strnad discussed the albums, ranking them worst to best. In last place, he put Miasma, the band's second studio album. Having grown up with the album, I love it, but he said the album feels unfocused, in part because they were riding a high from their early success with Unhallowed. He wasn't able to give the album the attention he thought it deserved. The stats show that Miasma has the fewest words and the fewest unique words, which suggests that Trevor is right about the themes and lyrics being less defined than on other albums.

Visualizing the Stats using Plotly Express

To visualize the data, I'm using Plotly Express because it produces interactive visualizations in just a few lines of code! Below is a histogram showing the distribution of word count by album.

    #import dependencies
    import plotly.express as px

    #create plot
    fig = px.histogram(lyrics, x="word_count", color='Album')

    #show fig
    fig.show()

Box plots are a great way to visualize the quartile statistics. Using pandas' describe(), it is easy to generate the raw quartiles too. Using Plotly Express, it is possible to set the quartile algorithm used in the box plots. It defaults to linear.
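The quartile methods differ in how they treat the median when the number of samples is odd. Below is a rough sketch of the "exclusive" and "inclusive" ideas in plain Python. It is a simplified illustration, not Plotly's implementation, and the word counts are hypothetical.

```python
from statistics import median


def quartiles(values, method="exclusive"):
    """Return (Q1, median, Q3) by splitting the sorted data into two halves.

    "exclusive": with an odd number of samples, the median is left out of both halves.
    "inclusive": with an odd number of samples, the median is included in both halves.
    With an even number of samples the two methods agree. Needs at least 2 values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if method == "inclusive" and n % 2 == 1:
        lower, upper = data[: half + 1], data[half:]
    else:
        lower, upper = data[:half], data[half + (n % 2):]
    return median(lower), median(data), median(upper)


# Hypothetical per-song word counts for one album.
word_counts = [120, 150, 180, 200, 230, 260, 300]

print(quartiles(word_counts, "exclusive"))
print(quartiles(word_counts, "inclusive"))
```

With only a handful of songs per album, the choice of method can visibly move the edges of a box, so it is worth stating which one a chart uses.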

    #drop songs with no lyrics (filter condition reconstructed; the original expression is garbled)
    lyrics = lyrics.loc[lyrics['Lyrics'].str.len() > 0]
    #count words for each song
    lyrics['word_count'] = lyrics['Lyrics'].apply(lambda x: len(x.split()))
    #count unique words
    lyrics['unique_count'] = lyrics['Lyrics'].apply(lambda x: len(set(w for w in x.split())))
    #calculate lexical diversity
    lyrics['lexical_diversity'] = lyrics['unique_count'] / lyrics['word_count']
    #display the quartile numeric data
    lyrics.describe()

    fig = px.box(lyrics, x="Album", y="word_count", color="Album")
    fig2 = px.box(lyrics, x="Album", y="unique_count", color="Album")
    fig3 = px.box(lyrics, x="Album", y="lexical_diversity", color="Album")
    fig.update_traces(quartilemethod="linear")  # or "inclusive", or "exclusive"
    fig2.update_traces(quartilemethod="linear")  # or "inclusive", or "exclusive"
    fig3.update_traces(quartilemethod="linear")  # or "inclusive", or "exclusive"
    fig.show()
    fig2.show()
    fig3.show()

Once again, we can see that the album Miasma doesn't have quite the range and diversity of the other albums. Looking at the albums that have lower lexical diversity, I notice that those are all of the albums with well-defined themes: Nocturnal, Ritual, Abysmal, and Unhallowed. Ritual, for example, is their highest-charting album, but has a lower lexical diversity than other albums. This is likely due to songs like "Oh Great Burning Nullifier" that have repeating patterns, like chants, which are thematic to the album.

Exploring the Lyrics using SpaCy

SpaCy is an "industrial-strength" Natural Language Processing library for Python. It was developed using Cython under the hood, making it fast and efficient for large-scale text processing tasks like part-of-speech (PoS) tagging and named entity recognition (NER). I'm going to use the part-of-speech tagger to get a deeper look into the lyrics and themes used in the albums.

If you're new to spaCy, I recommend checking out their installation page because they have a lot of installation options. At the time of writing, spaCy is compatible with Python 2.7 / 3.5+ and runs on Unix/Linux, macOS/OS X and Windows. The latest spaCy releases are available via pip and conda. Otherwise, these commands will get you up and running.

    pip install -U spacy
    pip install -U spacy-lookups-data
    python -m spacy download en_core_web_sm

Using spaCy, I can split the text into tokens and use the part-of-speech tagger to identify all of the nouns, verbs and adjectives. I'm going to take a look at the top 15 adjectives per album to see what patterns emerge. It is important to note that spaCy is powerful, but not perfect. It uses statistical models to predict parts of speech or named entities.

    import spacy

    # load a small English language model
    nlp = spacy.load("en_core_web_sm")
    Unhallowed = album_stats['tokens'].loc[album_stats['Album'] == 'Unhallowed']
    a1 = nlp(str(Unhallowed[0]))

Instead of analyzing the raw words, I'm going to analyze the lemmas. Lemmatization is a technique that normalizes the language by reducing a word to a root form while ensuring that the result is a real word. Another commonly used normalization technique is called stemming.
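The difference between the two techniques is easier to see side by side. This is a toy illustration only: the suffix list, the lookup table and both helper functions are hypothetical, and real stemmers and spaCy's lemmatizer are far more sophisticated.

```python
# Suffixes the toy stemmer strips, checked in this order (hypothetical list).
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical lookup table mapping inflected forms to dictionary forms.
LEMMA_TABLE = {
    "bodies": "body",
    "buried": "bury",
    "dying": "die",
    "graves": "grave",
}


def crude_stem(word: str) -> str:
    """Chop off the first matching suffix without checking the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if the word is in the table, else the word itself."""
    return LEMMA_TABLE.get(word, word)


for word in ("bodies", "buried", "dying", "graves"):
    print(word, "| stem:", crude_stem(word), "| lemma:", lookup_lemma(word))
```

The stems ("bod", "buri", "dy", "grav") are not words, while the lemmas are, which makes lemma counts easier to read in a frequency table.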

    #create a new dataframe
    unhallowed = pd.DataFrame(columns=("token", "pos", "lemma"))
    # map with frequency count
    pos_count = {}
    i = 0
    for token in a1:
        unhallowed.loc[i] = (token, token.pos_, token.lemma_)
        i += 1
    #locate the parts of speech to keep and reset the index
    unhallowed = unhallowed.loc[unhallowed['pos'].isin(['PROPN', 'NOUN', 'VERB', 'AUX', 'DET', 'ADJ', 'ADP'])].reset_index().drop(columns=('index'))

Notice that in the above code snippet, I create a dataframe and populate it with the word token, the part-of-speech tag, and the lemma. When I calculate the counts for ADJ, I'll use the lemma instead of the token.
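To see why counting lemmas rather than tokens matters, consider this small sketch. The rows are hand-written for illustration and only mimic the (token, pos, lemma) layout of the dataframe; they are not spaCy output.

```python
from collections import Counter

# Hypothetical (token, pos, lemma) rows, hand-written for illustration.
rows = [
    ("dark", "ADJ", "dark"),
    ("darker", "ADJ", "dark"),
    ("darkest", "ADJ", "dark"),
    ("dead", "ADJ", "dead"),
    ("dead", "ADJ", "dead"),
    ("graves", "NOUN", "grave"),
]

# Keep only the adjectives, then count them two ways.
adjectives = [row for row in rows if row[1] == "ADJ"]
by_token = Counter(token for token, _, _ in adjectives)
by_lemma = Counter(lemma for _, _, lemma in adjectives)

print(by_token.most_common(2))
print(by_lemma.most_common(2))
```

Counting by token splits "dark", "darker" and "darkest" into three separate entries, so "dead" looks like the most frequent adjective. Counting by lemma merges them, and "dark" comes out on top.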

    from collections import Counter

    unhallowed_freq = Counter(unhallowed['lemma'].loc[unhallowed['pos'] == 'ADJ'])
    #call the 15 most frequent
    unhallowed_adjs = unhallowed_freq.most_common(15)
    unhallowed_adjs

15 Most Common Adjectives

Unsurprisingly, it is clear that death, darkness and human fragility are major themes in BDM's music. The word "dead" tops the list for all albums except Unhallowed and Nocturnal, two of their early hit albums. Although a lot of the same words show up in every album, there are a few thematic differences. For example, in Nocturnal I can see the words "unholy," "wretched," and "necromantic" appear multiple times, and they are thematically representative of a world cast into eternal darkness, which is what the album conveys. Using a Plotly Express bar chart, the data is easy to visualize.


Wrapping up

Although the lyrics are only a small part of what I enjoy about death metal, they are fun to analyze. Even without running techniques like K-means or LDA, spaCy can help identify themes in the text using features like the tagger and named entity recognition. The Black Dahlia Murder's lyrics represent anger and depravity, depression and insanity. While some might shy away or feel putrefied by the descriptions of malice, I embrace it and the music, and I am forever a fan of the fast double kicks, the hypnotic melodic riffs, and the incredible vocal range of The Black Dahlia Murder. I can't wait to hear what they bring forth in Verminous.