Python Project - Music Genre Classification
Posted by Superadmin on August 22 2020 11:00:44

Updated: August 6, 2020

Music Genre Classification – Automatically classify different musical genres

In this tutorial we are going to develop a machine learning project that automatically classifies audio files by musical genre. We will classify the files using low-level features from the frequency and time domains.

For this project we need a dataset of audio tracks of similar length and similar frequency range. The GTZAN genre collection is a widely used dataset for music genre classification, and it was collected specifically for this task.

About the dataset:

The GTZAN genre collection dataset was collected in 2000-2001. It consists of 1000 audio files, each 30 seconds long. There are 10 classes (10 music genres), each containing 100 audio tracks in .wav format. The 10 genres are blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock.
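
Before extracting features it can help to confirm that the download is complete. The sketch below assumes the archive unpacks into one folder per genre with the .wav files directly inside; the function name count_tracks is our own and not part of any library.

```python
import os

def count_tracks(root):
    """Return {genre_folder: number_of_.wav_files} for a dataset root."""
    counts = {}
    for genre in sorted(os.listdir(root)):
        genre_dir = os.path.join(root, genre)
        if not os.path.isdir(genre_dir):
            continue  # skip stray files next to the genre folders
        wavs = [f for f in os.listdir(genre_dir) if f.lower().endswith(".wav")]
        counts[genre] = len(wavs)
    return counts
```

With a complete copy of the dataset, calling count_tracks on the dataset folder should report 10 genre folders with 100 tracks each.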

Music Genre Classification approach:

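There are various methods to perform classification on this dataset. Commonly used approaches include multiclass support vector machines, K-means clustering, K-nearest neighbors, and convolutional neural networks.

We will use the K-nearest neighbors algorithm because it is simple to implement and has been reported to give good results on this problem.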

K-Nearest Neighbors is a popular machine learning algorithm for regression and classification. It predicts the label of a data point from the labels of the points most similar to it, where similarity is measured by the distance between points.
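
To make the idea concrete, here is a minimal sketch of K-nearest neighbors on a toy 2-D dataset. It uses plain Euclidean distance and made-up points purely for illustration; the project itself compares tracks with a different distance measure, and knn_predict is a name we chose for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (feature_vector, label); query: feature_vector."""
    # Sort training points by Euclidean distance to the query
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote among the k closest labels
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data (invented for this example): two well-separated clusters
train = [((1.0, 1.0), "rock"), ((1.2, 0.8), "rock"), ((0.9, 1.3), "rock"),
         ((5.0, 5.0), "jazz"), ((5.2, 4.9), "jazz"), ((4.8, 5.3), "jazz")]

print(knn_predict(train, (1.1, 1.0), k=3))  # rock
print(knn_predict(train, (5.1, 5.1), k=3))  # jazz
```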

Feature Extraction:

The first step in a music genre classification project is to extract features and components from the audio files. This involves identifying the informative content of the signal and discarding noise.

Mel Frequency Cepstral Coefficients:

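These are widely used features in automatic speech and speaker recognition studies. They are generated in a series of steps: the signal is split into short frames (typically 20-40 ms long), the power spectrum of each frame is computed, a mel-scale filterbank is applied to the spectrum, the logarithm of the filterbank energies is taken, and a discrete cosine transform (DCT) of the log energies gives the coefficients.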
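
The usual MFCC pipeline can be sketched in pure Python as follows. This is a simplified illustration, not the implementation inside python_speech_features: it uses a slow naive DFT, skips pre-emphasis, windowing and liftering, and all function names and default sizes are our own choices.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one frame."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec

def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / rate) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for b in range(left, center):      # rising edge of the triangle
            bank[j][b] = (b - left) / (center - left)
        for b in range(center, right):     # falling edge of the triangle
            bank[j][b] = (right - b) / (right - center)
    return bank

def dct2(values, n_out):
    """First n_out coefficients of the (unnormalised) DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_out)]

def mfcc_sketch(signal, rate, frame_len, hop, n_filters=10, n_ceps=5):
    bank = mel_filterbank(n_filters, frame_len, rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]                # 1. framing
        spec = power_spectrum(frame)                           # 2. power spectrum
        energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]  # 3. mel filterbank
        log_e = [math.log(e + 1e-10) for e in energies]        # 4. log energies
        feats.append(dct2(log_e, n_ceps))                      # 5. DCT
    return feats

# A 440 Hz test tone sampled at 8 kHz (values chosen only for the demo)
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
feats = mfcc_sketch(tone, 8000, frame_len=64, hop=32)
print(len(feats), len(feats[0]))  # 7 5
```

Each row of the result describes one frame, which is why a whole track is later summarised by the mean and covariance of its rows.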

Steps to build Music Genre Classification:

Download the GTZAN dataset from the following link:

GTZAN dataset

Create a new Python file “music_genre.py” and paste the code described in the steps below:


1. Imports:

    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    import numpy as np

    import os
    import pickle
    import random
    import operator

2. Define a function to get the distance between feature vectors and find neighbors:

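    def distance(instance1, instance2, k):
        # Each instance is a (mean vector, covariance matrix, label) tuple
        mm1, cm1 = instance1[0], instance1[1]
        mm2, cm2 = instance2[0], instance2[1]
        inv_cm2 = np.linalg.inv(cm2)
        dist = np.trace(np.dot(inv_cm2, cm1))
        dist += np.dot(np.dot((mm2 - mm1).transpose(), inv_cm2), mm2 - mm1)
        dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
        dist -= k  # constant offset, does not change the ranking
        return dist

    def getNeighbors(trainingSet, instance, k):
        distances = []
        for x in range(len(trainingSet)):
            # Add both directions so that the measure is symmetric
            dist = distance(trainingSet[x], instance, k) + distance(instance, trainingSet[x], k)
            distances.append((trainingSet[x][2], dist))
        distances.sort(key=operator.itemgetter(1))
        # Labels of the k closest training items
        return [distances[x][0] for x in range(k)]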
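
For readers who want the math behind the distance function that getNeighbors calls, the block below writes out what the code computes. Here each track i is summarised by its MFCC mean vector and covariance matrix, and k is the argument passed in the code. This is our reading of the code rather than a formula stated by the tutorial.

```latex
d(1,2) = \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
       + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
       + \ln\frac{\det\Sigma_2}{\det\Sigma_1} - k

d(1,2) + d(2,1) = \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
       + \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_2\right)
       + (\mu_2-\mu_1)^{\top}\left(\Sigma_1^{-1}+\Sigma_2^{-1}\right)(\mu_2-\mu_1) - 2k
```

If k were replaced by the number of MFCC coefficients, d(1,2) would be twice the Kullback-Leibler divergence between two Gaussians. Because k is only a constant offset, it does not affect which neighbors are closest. In the symmetric sum the log-determinant terms cancel.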

3. Identify the class of the nearest neighbors by majority vote:

    def nearestClass(neighbors):
        classVote = {}
        for x in range(len(neighbors)):
            response = neighbors[x]
            if response in classVote:
                classVote[response] += 1
            else:
                classVote[response] = 1
        # Sort the labels by vote count, highest first
        sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
        return sorter[0][0]
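
The same vote can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not a replacement the rest of the script depends on, and the name nearest_class_counter is our own.

```python
from collections import Counter

def nearest_class_counter(neighbors):
    """Return the most frequent label among the neighbour labels."""
    return Counter(neighbors).most_common(1)[0][0]

print(nearest_class_counter([3, 7, 3, 1, 3]))  # 3
```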

4. Define a function for model evaluation:

    def getAccuracy(testSet, predictions):
        correct = 0
        for x in range(len(testSet)):
            # The last element of each item is its true label
            if testSet[x][-1] == predictions[x]:
                correct += 1
        return 1.0 * correct / len(testSet)
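
Overall accuracy can hide genres that the classifier confuses often. As an optional extra, the sketch below breaks accuracy down by true label; per_class_accuracy is a hypothetical helper and the label lists are invented for the example.

```python
from collections import defaultdict

def per_class_accuracy(true_labels, predictions):
    """Return {label: fraction of items with that true label predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(true_labels, predictions):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Invented labels: genre 1 is right once out of twice, genre 2 twice out of three
true_labels = [1, 1, 2, 2, 2, 3]
predictions = [1, 2, 2, 2, 1, 3]
print(per_class_accuracy(true_labels, predictions))
```

With the main script, the true labels would be the last element of each item in testSet.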

5. Extract features from the dataset and dump these features into a binary .dat file “my.dat”:

    directory = "__path_to_dataset__"
    with open("my.dat", "wb") as f:
        i = 0
        for folder in os.listdir(directory):
            i += 1  # the folder index is used as the genre label
            if i == 11:
                break
            for file in os.listdir(os.path.join(directory, folder)):
                (rate, sig) = wav.read(os.path.join(directory, folder, file))
                mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
                # Summarise the track by the mean and covariance of its MFCC frames
                covariance = np.cov(mfcc_feat.T)
                mean_matrix = mfcc_feat.mean(0)
                feature = (mean_matrix, covariance, i)
                pickle.dump(feature, f)
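
Each track is stored as the mean vector and covariance matrix of its MFCC frames (13 coefficients per frame with the python_speech_features defaults). To show what those two summaries contain, here is a pure-Python sketch on a tiny made-up matrix. It uses the n-1 estimator, which is also the default of np.cov; the function name summarize is our own.

```python
def summarize(frames):
    """frames: list of equal-length feature rows, one row per audio frame.
    Returns (mean_vector, covariance_matrix)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(row[j] for row in frames) / n for j in range(d)]
    cov = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in frames) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov

# Three invented frames with two features each
frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mean, cov = summarize(frames)
print(mean)  # [3.0, 6.0]
print(cov)   # [[4.0, 8.0], [8.0, 16.0]]
```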

6. Split the dataset into training and test sets:

    dataset = []

    def loadDataset(filename, split, trSet, teSet):
        # Read every pickled feature tuple until the end of the file
        with open(filename, "rb") as f:
            while True:
                try:
                    dataset.append(pickle.load(f))
                except EOFError:
                    break
        # Send each item to the training set with probability `split`
        for x in range(len(dataset)):
            if random.random() < split:
                trSet.append(dataset[x])
            else:
                teSet.append(dataset[x])

    trainingSet = []
    testSet = []
    loadDataset("my.dat", 0.66, trainingSet, testSet)
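
Because each item is assigned with random.random(), the sizes of the two sets and the reported accuracy change from run to run. If reproducible numbers are needed, one option is a seeded shuffle with a fixed cut point. The sketch below is an alternative we suggest, not part of the original script; split_dataset and its seed parameter are our own names.

```python
import random

def split_dataset(items, train_fraction, seed=0):
    """Shuffle a copy of items with a fixed seed and cut it at train_fraction."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(range(1000), 0.66)
print(len(train), len(test))  # 660 340
```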

7. Make prediction using KNN and get the accuracy on test data:

    leng = len(testSet)
    predictions = []
    for x in range(leng):
        predictions.append(nearestClass(getNeighbors(trainingSet, testSet[x], 5)))
    accuracy1 = getAccuracy(testSet, predictions)
    print(accuracy1)

 

Test the classifier with a new audio file

Save the new audio file in the current directory. Create a new file “test.py” and paste the script below:

    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    import numpy as np
    import os
    import pickle
    import operator
    from collections import defaultdict

    dataset = []

    def loadDataset(filename):
        # Read every pickled (mean, covariance, label) tuple until the end of the file
        with open(filename, "rb") as f:
            while True:
                try:
                    dataset.append(pickle.load(f))
                except EOFError:
                    break

    loadDataset("my.dat")

    def distance(instance1, instance2, k):
        mm1 = instance1[0]
        cm1 = instance1[1]
        mm2 = instance2[0]
        cm2 = instance2[1]
        dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
        dist += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
        dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
        dist -= k
        return dist

    def getNeighbors(trainingSet, instance, k):
        distances = []
        for x in range(len(trainingSet)):
            dist = distance(trainingSet[x], instance, k) + distance(instance, trainingSet[x], k)
            distances.append((trainingSet[x][2], dist))
        distances.sort(key=operator.itemgetter(1))
        neighbors = []
        for x in range(k):
            neighbors.append(distances[x][0])
        return neighbors

    def nearestClass(neighbors):
        classVote = {}
        for x in range(len(neighbors)):
            response = neighbors[x]
            if response in classVote:
                classVote[response] += 1
            else:
                classVote[response] = 1
        sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
        return sorter[0][0]

    # Map the numeric labels (1, 2, ...) back to the genre folder names.
    # This must list the same dataset folder that was used to build my.dat.
    results = defaultdict(int)
    i = 1
    for folder in os.listdir("./musics/wav_genres/"):
        results[i] = folder
        i += 1

    # Extract the same features from the new file and classify it
    (rate, sig) = wav.read("__path_to_new_audio_file_")
    mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
    covariance = np.cov(mfcc_feat.T)
    mean_matrix = mfcc_feat.mean(0)
    feature = (mean_matrix, covariance, 0)
    pred = nearestClass(getNeighbors(dataset, feature, 5))
    print(results[pred])

Now, run this script to get the prediction:

    python3 test.py

Summary:

In this music genre classification project, we developed a classifier that predicts the genre of an audio file. We worked with the GTZAN music genre classification dataset and showed how to extract MFCC features from audio files. The classifier is a K-nearest neighbors model with K set to 5.