Audiographs from Unreachable Places
TASK: Use machine learning models to create novel audiovisual content.
DURATION: Six weeks
MY ROLE: Design, conceptualization and implementation
TEAM: Self
SKILLS: Generative audio, data collection and preparation for machine learning, working with StyleGAN models, p5.js, RunwayML, adaptation of a TensorFlow WaveNet model for spell.run.
Overview:
An immersive art piece that uses generative sound and latent space wandering to allow visitors to experience nearly lifelike places that exist outside of our own reality.
Process:
I used two machine learning models to generate the audio and video: StyleGAN (NVIDIA) and WaveNet (DeepMind). StyleGAN generates high-quality images in whatever domain its checkpoint was trained on, from human faces to landscapes. WaveNet is a generative model for raw audio, often used for human-like speech synthesis.
To create the latent space wandering video, I loaded the forests.ckpt checkpoint into the StyleGAN model in RunwayML and wrote a p5.js sketch that drives a Perlin noise walk through the latent space, generating the imagery and animation.
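Stripped down, the walk is just one noise curve per latent dimension. The sketch below is a minimal illustration rather than my exact code: the 512-dimensional vector matches StyleGAN's latent size, but the step size is a tunable assumption and sendToRunway is a hypothetical stand-in for however the vector gets handed to RunwayML.

```javascript
// Minimal p5.js sketch of a Perlin noise walk through StyleGAN's latent space.
// Each of the 512 dimensions follows its own 1D noise curve, so the vector
// drifts smoothly and consecutive decoded frames morph instead of jumping.
const DIM = 512;     // StyleGAN latent vector size
const STEP = 0.005;  // noise increment per frame; smaller = slower drift (assumption)
let t = 0;

function setup() {
  createCanvas(DIM, 64);
}

function draw() {
  const z = [];
  for (let i = 0; i < DIM; i++) {
    // Offset dimensions far apart in noise space so they decorrelate,
    // and map noise() from [0, 1] to roughly [-2, 2].
    z.push(map(noise(i * 100, t), 0, 1, -2, 2));
  }
  t += STEP;

  // Draw the vector as a strip of gray bars, standing in for the model call.
  background(255);
  for (let i = 0; i < DIM; i++) {
    stroke(map(z[i], -2, 2, 0, 255));
    line(i, 0, i, height);
  }

  // sendToRunway(z); // hypothetical: hand z to RunwayML's local endpoint
}
```

Because Perlin noise is continuous, nearby frames get nearby latent vectors, which is what makes the camera feel like it is wandering through a place rather than teleporting between images.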
To create the audio, I gathered recordings and trained a WaveNet model on them. Using sound files from BBC Sound Effects and FreeSound, I split each recording into 30-second WAV files with the SoX (Sound eXchange) command-line utility, which left me with 500 usable clips.
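The splitting itself is a single SoX invocation per source file, and a small batch helper along these lines handles the whole set. The trim 0 30 : newfile : restart idiom is standard SoX; the folder names here are illustrative.

```javascript
// Batch-split long recordings into 30-second WAV chunks with SoX.
// Run with Node; assumes `sox` is on the PATH. Folder names are illustrative.
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const IN_DIR = 'raw';      // downloaded BBC Sound Effects / FreeSound files
const OUT_DIR = 'chunks';  // 30-second clips for WaveNet training

fs.mkdirSync(OUT_DIR, { recursive: true });

for (const file of fs.readdirSync(IN_DIR)) {
  if (!file.endsWith('.wav')) continue;
  const input = path.join(IN_DIR, file);
  const output = path.join(OUT_DIR, file.replace(/\.wav$/, '_.wav'));
  // "trim 0 30 : newfile : restart" keeps cutting 30-second segments,
  // writing numbered files (_001.wav, _002.wav, ...) until the input ends.
  execSync(`sox "${input}" "${output}" trim 0 30 : newfile : restart`);
}
```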
I used the TensorFlow implementation of WaveNet, with spell.run as my remote GPU. Due to resource constraints I could only train the model for a total of 15 hours, but that was enough to successfully generate an audio file.
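For reference, launching a run like this looks roughly like the helper below. The spell CLI flags and the train.py / generate.py arguments follow the ibab/tensorflow-wavenet README and Spell's docs as I recall them, so treat the exact names and the checkpoint path as assumptions rather than a transcript of the commands I ran.

```javascript
// Hypothetical launcher for remote WaveNet training on spell.run.
// CLI flags and script arguments are assumptions based on the
// ibab/tensorflow-wavenet README, not the exact commands used here.
const { execSync } = require('child_process');

const run = (cmd) => {
  console.log(`$ ${cmd}`);
  execSync(cmd, { stdio: 'inherit' });
};

// Kick off training on a remote GPU; corpus/ holds the 30-second clips.
run('spell run --machine-type K80 "python train.py --data_dir=corpus"');

// Once a checkpoint exists, sample ~5 seconds of 16 kHz audio from it
// (the checkpoint path is illustrative).
run('spell run --machine-type K80 ' +
    '"python generate.py --samples 80000 --wav_out_path=generated.wav ' +
    'logdir/train/model.ckpt-99999"');
```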
My project was featured by spell.run: