Training a WaveNet Model with Spell.run and Tensorflow.js
In order to create the audio for my project “Audiographs from Unreachable Places,” I used Spell.run to train a WaveNet model. Here is a step-by-step tutorial on how you can do the same.
Gathering Data and Setting Up
Collect your audio as 30-second .wav samples. Ideally, you want between 500 and 5,000 of them. I used 260, so my output was of lower quality, but it still produced something usable. Save these files into a folder called soundFiles.
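If your source material is a few long recordings rather than many short ones, a script can cut them into 30-second pieces. The sketch below is a minimal example under stated assumptions: it uses Python's standard wave module, so it only handles uncompressed PCM WAV input, and the helper name split_wav is hypothetical. Any leftover audio shorter than 30 seconds at the end of a file is discarded.

```python
import os
import wave


def split_wav(src_path, out_dir, clip_seconds=30):
    """Split one PCM WAV file into consecutive clips of clip_seconds each.

    Hypothetical helper: a trailing remainder shorter than clip_seconds is
    dropped. Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_clip = params.framerate * clip_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            # readframes returns raw bytes; convert the byte count back to frames.
            n_read = len(frames) // (params.sampwidth * params.nchannels)
            if n_read < frames_per_clip:
                break  # not enough audio left for a full clip
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy the source format so every clip matches the original.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

For example, split_wav("long_recording.wav", "soundFiles") would write long_recording_0000.wav, long_recording_0001.wav, and so on into soundFiles. At 30 seconds per clip, 500 clips add up to a little over four hours of audio.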
If you don’t have a Spell account, create one on the Spell website. Then install the Spell command-line tool and log in with the following commands:
$ pip install spell
$ spell
$ spell login
Next, clone the TensorFlow implementation of WaveNet and move into its folder:
$ git clone https://github.com/ibab/tensorflow-wavenet.git
$ cd tensorflow-wavenet
Move the soundFiles folder into the tensorflow-wavenet folder and commit the changes.
$ git add soundFiles
$ git commit -m "adding files"
To check that your sound files are all there, run:
$ cd soundFiles
$ ls
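You should see a long list of .wav files. Then return to the tensorflow-wavenet folder with $ cd .. and run $ git status to confirm that there is nothing left to commit. Your branch will show as one commit ahead of origin/master because of the commit you just made; that is expected.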
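Beyond eyeballing the ls output, a short script can count the clips, add up their total length, and flag anything the standard-library wave module cannot read. This is a hypothetical helper, not part of tensorflow-wavenet. Because wave only parses PCM WAV, a file it rejects may still be valid audio in another encoding, so treat flagged files as ones to inspect rather than delete.

```python
import os
import wave


def summarize_wavs(folder):
    """Return (count, total_seconds, sample_rates, bad) for .wav files in folder.

    Hypothetical helper. Files the wave module cannot parse are collected in
    the `bad` list instead of raising; non-.wav files are ignored.
    """
    count, total_seconds, rates, bad = 0, 0.0, set(), []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue
        path = os.path.join(folder, name)
        try:
            with wave.open(path, "rb") as w:
                total_seconds += w.getnframes() / w.getframerate()
                rates.add(w.getframerate())
                count += 1
        except (wave.Error, EOFError):
            bad.append(name)
    return count, total_seconds, rates, bad
```

Run from the tensorflow-wavenet folder, summarize_wavs("soundFiles") should report about 7,800 seconds for 260 clips of 30 seconds. The set of sample rates is worth a look too, since the generation step later in this tutorial counts 16,000 samples as one second.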
Training the Model on Spell
Once everything is set up, run the following command from the tensorflow-wavenet folder to train the WaveNet model:
$ spell run --machine-type K80 --framework tensorflow --pip librosa --apt libsndfile1 --python2 "python train.py --data_dir=soundFiles"
If all goes well, you should see the run start and its log output appear in Terminal.
Now you can log in to the Spell website and click the Runs tab in the left-hand column. There you can see your run’s details and how long it has been running.
Once your model has trained for as long as you want, stop the run. I stopped my training run after 15 hours and 8 minutes because of resource constraints. Click on your run number, scroll down to Outputs, and then navigate to the following path:
Outputs/runs/YOURRUNNUMBER/logdir/train/DATEOFYOURRUN/
You will need to copy this path for the next step.
Next, generate your audio with the following command:
$ spell run --machine-type K80 --pip librosa --apt libsndfile1 --mount runs/276/logdir/train/2019-12-04T02-12-36:checkpoints --python2 'python generate.py --wav_out_path=test.wav --samples 160000 checkpoints/model.ckpt-22900'
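Because the run number, date folder, checkpoint step, and sample count all change from run to run, it can help to assemble this command programmatically. The sketch below is a hypothetical helper that rebuilds the command above from its variable parts, using this tutorial's figure of 16,000 samples per second. It only builds the string; it does not call Spell.

```python
import shlex

SAMPLES_PER_SECOND = 16000  # 16,000 samples = 1 second, per this tutorial


def build_generate_command(run_number, run_date, checkpoint_step, seconds,
                           wav_out_path="test.wav"):
    """Assemble the Spell generation command from its variable parts.

    Hypothetical helper: run_number, run_date and checkpoint_step come from
    the Outputs folder of your training run; seconds is the audio length.
    """
    samples = int(seconds * SAMPLES_PER_SECOND)
    # Mount the training run's log folder inside the new run as "checkpoints".
    mount = f"runs/{run_number}/logdir/train/{run_date}:checkpoints"
    generate = (f"python generate.py --wav_out_path={wav_out_path} "
                f"--samples {samples} checkpoints/model.ckpt-{checkpoint_step}")
    args = ["spell", "run", "--machine-type", "K80", "--pip", "librosa",
            "--apt", "libsndfile1", "--mount", mount, "--python2", generate]
    # shlex.join quotes the generate command so the shell passes it as one argument.
    return shlex.join(args)
```

Calling build_generate_command(276, "2019-12-04T02-12-36", 22900, 10) returns the command above, minus the leading $.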
Adjust the following parts of this command for your own run. In --mount runs/276/logdir/train/2019-12-04T02-12-36:checkpoints, the part before the colon is the path you copied in the previous step (without the Outputs/ prefix), and :checkpoints makes that folder available inside the run under the name checkpoints. --samples 160000 sets the length of the output: 16,000 samples is one second, so this command generates 10 seconds of audio. Finally, checkpoints/model.ckpt-22900 should point to the highest-numbered checkpoint in your Outputs/runs/YOURRUNNUMBER/logdir/train/DATEOFYOURRUN/ folder; checkpoint files there take the form model.ckpt-SOMENUMBER.
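Picking the highest checkpoint by eye is error-prone because the names sort as text: in an alphabetical listing, model.ckpt-9950 appears after model.ckpt-22900. A minimal sketch that compares the step numbers numerically, assuming you have the list of file names from the Outputs folder (the helper name is hypothetical):

```python
import re

# Matches names such as "model.ckpt-22900.index" and captures the step number.
CHECKPOINT_RE = re.compile(r"^model\.ckpt-(\d+)")


def latest_checkpoint(filenames):
    """Return the checkpoint prefix with the highest step, e.g. 'model.ckpt-22900'.

    TensorFlow writes several files per checkpoint (for example .index, .meta
    and .data-* files), so names are reduced to their numeric step first.
    Returns None if no checkpoint files are present.
    """
    steps = set()
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match:
            steps.add(int(match.group(1)))
    if not steps:
        return None
    return f"model.ckpt-{max(steps)}"
```

The returned prefix, with checkpoints/ in front, is what goes at the end of the generate.py command.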
Let this run go until it completes. Generating 10 seconds of audio took roughly 1 hour and 28 minutes. Once it has finished, click the run number and scroll down to the Outputs section. Your audio file, test.wav, will be there; download it and open it in your audio player of choice.
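To budget time for longer pieces, you can scale the measurement above. This sketch assumes generation time grows roughly linearly with the number of samples. That is a reasonable first guess because WaveNet generates audio one sample at a time, but it is an estimate, not a guarantee; the machine type and model size will also matter. The function name is hypothetical.

```python
def estimate_generation_minutes(seconds_of_audio,
                                reference_seconds=10,
                                reference_minutes=88):
    """Estimate generation time by scaling this tutorial's single measurement.

    The defaults encode 10 seconds of audio in about 1 hour 28 minutes (88
    minutes). Assumes time scales linearly with output length.
    """
    return seconds_of_audio * reference_minutes / reference_seconds
```

For example, estimate_generation_minutes(60) gives 528 minutes, or about 8.8 hours, for one minute of audio.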