Audiographs from Unreachable Places
TASK: Use machine learning models to create novel audiovisual content.
DURATION: Six weeks
MY ROLE: Design, conceptualization and implementation
TEAM: Self
SKILLS: Generative audio, data collection and preparation for machine learning, working with StyleGAN models, p5.js, RunwayML, adaptation of a TensorFlow WaveNet model for spell.run.
Overview:
An immersive art piece that uses generative sound and latent space wandering to allow visitors to experience nearly lifelike places that exist outside of our own reality.
Process:
I used two machine learning models to generate the audio and video: StyleGAN (NVIDIA) and WaveNet (DeepMind). StyleGAN generates high-quality images in whatever domain its checkpoint was trained on, from human faces to landscapes. WaveNet is a generative model for raw audio, often used for human-like speech synthesis.
To create the latent space wandering video, I loaded the forests.ckpt checkpoint into the StyleGAN model in RunwayML and wrote a p5.js sketch that drives a Perlin noise walk through the latent space, generating the imagery and animation.
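Stripped down, the walk is just one noise curve per latent dimension. The sketch below is a minimal illustration rather than my exact code: the 512-dimensional vector matches StyleGAN's latent size, but the step size is a tunable assumption and sendToRunway is a hypothetical stand-in for however the vector gets handed to RunwayML.

```javascript
// Minimal p5.js sketch of a Perlin noise walk through StyleGAN's latent space.
// Each of the 512 dimensions follows its own 1D noise curve, so the vector
// drifts smoothly and consecutive decoded frames morph instead of jumping.
const DIM = 512;     // StyleGAN latent vector size
const STEP = 0.005;  // noise increment per frame; smaller = slower drift (assumption)
let t = 0;

function setup() {
  createCanvas(DIM, 64);
}

function draw() {
  const z = [];
  for (let i = 0; i < DIM; i++) {
    // Offset dimensions far apart in noise space so they decorrelate,
    // and map noise() from [0, 1] to roughly [-2, 2].
    z.push(map(noise(i * 100, t), 0, 1, -2, 2));
  }
  t += STEP;

  // Draw the vector as a strip of gray bars, standing in for the model call.
  background(255);
  for (let i = 0; i < DIM; i++) {
    stroke(map(z[i], -2, 2, 0, 255));
    line(i, 0, i, height);
  }

  // sendToRunway(z); // hypothetical: hand z to RunwayML's local endpoint
}
```

Because Perlin noise is continuous, nearby frames get nearby latent vectors, which is what makes the camera feel like it is wandering through a place rather than teleporting between images.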
To create the audio, I gathered recordings and trained a WaveNet model on them. Using sound files from BBC Sound Effects and FreeSound, I split each recording into 30-second WAV files with the SoX (Sound eXchange) command-line utility, which left me with 500 usable clips.
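The splitting itself is a single SoX invocation per source file, and a small batch helper along these lines handles the whole set. The trim 0 30 : newfile : restart idiom is standard SoX; the folder names here are illustrative.

```javascript
// Batch-split long recordings into 30-second WAV chunks with SoX.
// Run with Node; assumes `sox` is on the PATH. Folder names are illustrative.
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const IN_DIR = 'raw';      // downloaded BBC Sound Effects / FreeSound files
const OUT_DIR = 'chunks';  // 30-second clips for WaveNet training

fs.mkdirSync(OUT_DIR, { recursive: true });

for (const file of fs.readdirSync(IN_DIR)) {
  if (!file.endsWith('.wav')) continue;
  const input = path.join(IN_DIR, file);
  const output = path.join(OUT_DIR, file.replace(/\.wav$/, '_.wav'));
  // "trim 0 30 : newfile : restart" keeps cutting 30-second segments,
  // writing numbered files (_001.wav, _002.wav, ...) until the input ends.
  execSync(`sox "${input}" "${output}" trim 0 30 : newfile : restart`);
}
```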
I used the TensorFlow implementation of WaveNet, with spell.run as my remote GPU. Due to resource constraints I could only train the model for a total of 15 hours, but that was enough to successfully generate an audio file.
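For reference, launching a run like this looks roughly like the helper below. The spell CLI flags and the train.py / generate.py arguments follow the ibab/tensorflow-wavenet README and Spell's docs as I recall them, so treat the exact names and the checkpoint path as assumptions rather than a transcript of the commands I ran.

```javascript
// Hypothetical launcher for remote WaveNet training on spell.run.
// CLI flags and script arguments are assumptions based on the
// ibab/tensorflow-wavenet README, not the exact commands used here.
const { execSync } = require('child_process');

const run = (cmd) => {
  console.log(`$ ${cmd}`);
  execSync(cmd, { stdio: 'inherit' });
};

// Kick off training on a remote GPU; corpus/ holds the 30-second clips.
run('spell run --machine-type K80 "python train.py --data_dir=corpus"');

// Once a checkpoint exists, sample ~5 seconds of 16 kHz audio from it
// (the checkpoint path is illustrative).
run('spell run --machine-type K80 ' +
    '"python generate.py --samples 80000 --wav_out_path=generated.wav ' +
    'logdir/train/model.ckpt-99999"');
```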
My project was featured by spell.run: