Training a WaveNet Model with Spell.run and Tensorflow.js

To create the audio for my project “Audiographs from Unreachable Places,” I used Spell.run to train a WaveNet model. Here is a step-by-step tutorial on how you can do the same.

Gathering Data and Setting Up

Collect your audio files as 30-second samples. Ideally, you want between 500 and 5,000 of them. I used 260, so my output was of lower quality, but it was still usable. Save these files in a folder called soundFiles.
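If your source material is one long recording rather than ready-made clips, a small script can cut it into 30-second samples. The following is a minimal sketch, not part of the original workflow: it assumes uncompressed PCM .wav input (the only kind Python's built-in wave module reads), and the function name split_wav and the file name long_recording.wav are hypothetical.

```python
import wave
from pathlib import Path


def split_wav(source, out_dir, chunk_seconds=30):
    """Split one WAV file into consecutive chunks of chunk_seconds each.

    Returns the list of chunk paths written. A trailing remainder shorter
    than chunk_seconds is dropped so every sample has the same length.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(source), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        n_chunks = src.getnframes() // frames_per_chunk
        for i in range(n_chunks):
            frames = src.readframes(frames_per_chunk)
            target = out_dir / f"{Path(source).stem}_{i:04d}.wav"
            with wave.open(str(target), "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(target)
    return written
```

For example, split_wav("long_recording.wav", "soundFiles") would write long_recording_0000.wav, long_recording_0001.wav, and so on into soundFiles.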

If you don’t have a Spell account, create one here. Otherwise, install Spell via your command line and log in with the following commands:

$ pip install spell
$ spell
$ spell login

Next, clone the TensorFlow implementation of WaveNet. Run the following commands:

$ git clone https://github.com/ibab/tensorflow-wavenet.git
$ cd tensorflow-wavenet

Move the soundFiles folder into the tensorflow-wavenet folder and commit the changes.

$ git add soundFiles
$ git commit -m "adding files"

You can check to make sure your sound files are all there:

$ cd soundFiles
$ ls

You should see a long list of .wav files. Return to the tensorflow-wavenet folder with $ cd .. and then use $ git status to confirm that the sound files are committed and there is nothing left to commit.
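Beyond eyeballing the file list, you can count the samples and total up their duration before paying for a training run. This is a rough sketch under the same assumption of PCM .wav files; summarize_wavs is a hypothetical helper name, not something provided by Spell or tensorflow-wavenet.

```python
import wave
from pathlib import Path


def summarize_wavs(folder):
    """Return (file_count, total_seconds) for the .wav files in folder."""
    count = 0
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total_seconds
```

Calling summarize_wavs("soundFiles") from the tensorflow-wavenet folder returns the number of samples and their combined length in seconds; with 260 samples of 30 seconds each, you would expect about 7,800 seconds.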

Mounting the Spell Run

Once everything is set up, run the following command from the tensorflow-wavenet folder to train the WaveNet model:

$ spell run --machine-type K80 --framework tensorflow --pip librosa --python2 "python train.py --data_dir=soundFiles" --apt libsndfile1

If all goes well, you should see output like this in Terminal:

[Screenshot: Terminal output after launching the training run]

Now you can log in to the Spell web interface and click on the Runs tab in the left-hand column. You’ll be able to see your run details and how long your model has been running.

[Screenshot: the Runs tab in the Spell web interface]

Once your model has run for as long as you want, you can stop it. I stopped my training run after 15 hours and 8 minutes because of resource constraints. Click on your run number, scroll down to Outputs, and navigate to the following path:

Outputs/runs/YOURRUNNUMBER/logdir/train/DATEOFYOURRUN/
[Screenshot: the Outputs folder of the run in the Spell web interface]

You will need to copy this path for the next step.

Next, generate your audio with the following command:

$ spell run --machine-type K80 --pip librosa --apt libsndfile1 --mount runs/276/logdir/train/2019-12-04T02-12-36:checkpoints --python2 'python generate.py --wav_out_path=test.wav --samples 160000 checkpoints/model.ckpt-22900'

In this command, --mount runs/276/logdir/train/2019-12-04T02-12-36:checkpoints is the path you copied in the previous step with :checkpoints appended, which makes that folder available inside the run as checkpoints. --samples 160000 sets the length of the generated audio in samples; at 16,000 samples per second, this command generates 10 seconds of audio. checkpoints/model.ckpt-22900 is the highest-numbered checkpoint listed in the Outputs/runs/YOURRUNNUMBER/logdir/train/DATEOFYOURRUN/ folder; checkpoints take the form model.ckpt-SOMENUMBER.
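The two values you have to work out by hand, the sample count and the highest checkpoint, can be computed with a few lines of Python. This is only an illustrative sketch: samples_for and latest_checkpoint are hypothetical helper names, and it assumes the checkpoint files in the Outputs folder start with model.ckpt-SOMENUMBER, possibly followed by a suffix such as .index or .meta.

```python
import re

SAMPLE_RATE = 16000  # samples per second, as used in the command above


def samples_for(seconds, sample_rate=SAMPLE_RATE):
    """Value to pass to --samples for a clip of the given length."""
    return int(round(seconds * sample_rate))


def latest_checkpoint(filenames):
    """Return 'model.ckpt-N' with the largest N found in filenames, or None."""
    steps = set()
    for name in filenames:
        # Match the step number; any suffix after it is ignored.
        match = re.match(r"model\.ckpt-(\d+)", name)
        if match:
            steps.add(int(match.group(1)))
    return f"model.ckpt-{max(steps)}" if steps else None
```

For instance, samples_for(10) gives the 160000 used above, and passing the file names you see in the Outputs folder to latest_checkpoint gives the name to put after checkpoints/ in the command.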

Let this run finish. Generating 10 seconds of audio took roughly 1 hour and 28 minutes. Once it has completed, click on the run number and scroll down to the Outputs section. Your audio file, called “test.wav,” will be there, and you can download it and open it in your audio player of choice.
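If you plan to generate a longer clip, you can get a ballpark for the wait by scaling my measured time. The sketch below assumes that generation time grows linearly with clip length and that you use the same machine type; both are assumptions on my part, so treat the result as a rough guide only. The function name is hypothetical.

```python
def estimate_generation_minutes(seconds_of_audio,
                                measured_minutes=88.0,
                                measured_seconds=10.0):
    """Scale the measured run (10 s of audio in 88 min) linearly."""
    return seconds_of_audio * measured_minutes / measured_seconds
```

Under that assumption, a 30-second clip would take about 264 minutes, or roughly four and a half hours.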

Congrats, you’ve just trained a WaveNet model with Spell!