At the tail end of 2018, Henry and I were having an idle conversation about smart speakers and generative sound. Henry had been working on the BBC’s first experiments at producing a skill for the Amazon Alexa, and we’d been talking for ages about how the rather transactional pattern of interaction that Alexa affords is a bit underwhelming. As people with an interest in sound and computational art, we were pretty excited about the idea of a fairly ubiquitous, programmable, connected speaker, and the ways that it could be used creatively, so we decided to investigate how far the device could be pushed in the service of sonic experimentation.

Henry, seizing this inspiration, immediately went away and converted “In B Flat” by Darren Solomon into an Alexa skill, and we (assisted by our colleague Lei He) assembled eight Amazon Echoes for a performance.

This generated a surprising amount of interest online, so we continued this line of enquiry with added enthusiasm! I became especially interested in how creative applications of new technologies could be used critically: to expose and illuminate the hidden workings of these technologies and their entanglements with society more generally. Around this time I’d revisited Alvin Lucier’s piece “I am sitting in a room”, which uses feedback and repeated overdubbing to slowly transform a spoken-word piece into a resonant drone, revealing the material properties of the tape and the acoustic environment along the way. I wondered if the same approach could be used to better understand the ‘resonances’ or biases of a voice assistant, by repeatedly applying its speech synthesis and speech recognition in turn, and observing how the text deteriorates at each step.
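The loop can be sketched roughly as follows. This is an illustrative sketch only, not the original implementation: `synthesize` and `recognize` are hypothetical stand-ins for a real text-to-speech and speech-recognition service, and the toy versions here just simulate degradation by clipping the last letter from longer words.

```python
from typing import Callable, List


def feedback_loop(
    text: str,
    synthesize: Callable[[str], bytes],
    recognize: Callable[[bytes], str],
    max_rounds: int = 10,
) -> List[str]:
    """Alternate speech synthesis and recognition, recording the text each round."""
    history = [text]
    for _ in range(max_rounds):
        audio = synthesize(history[-1])  # text -> "speech"
        heard = recognize(audio)         # "speech" -> text
        history.append(heard)
        if heard == history[-2]:
            break  # the text has stopped changing
    return history


# Hypothetical stand-ins (assumptions for illustration, not a real API).
def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def toy_recognize(audio: bytes) -> str:
    words = audio.decode("utf-8").split()
    # Simulate recognition errors: clip the last letter of words longer than 3.
    return " ".join(w[:-1] if len(w) > 3 else w for w in words)


if __name__ == "__main__":
    for step, line in enumerate(
        feedback_loop("I am sitting in a room", toy_synthesize, toy_recognize)
    ):
        print(step, line)
```

With a real voice assistant in place of the toy functions, the interesting part is which words survive and which drift, since that is where the system’s biases show up.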

The results were interesting, and Henry then riffed further on this, producing a pair of Alexa skills allowing two Echoes to carry out a dialogue which slowly ‘disintegrates’. He christened this “I am Running in the Cloud”, and it again piqued the curiosity of quite a few people, including the folks at Language Log, which I found particularly pleasing (see Henry’s write-up of this for more).

One thing we realised over the course of these investigations is that these sorts of playful experiments often have huge practical value, allowing us to discover the ‘edges’ of a technology’s capabilities more easily than when we just use it for the sorts of things for which it was designed.

(Photo by Ric Leeson, © BBC 2018)

We were interested in whether this technique could be extended, and in particular, whether working with creative people who were unfamiliar with the technology itself could aid in this process. To investigate, we enlisted sound artists and musicians Graham Dunning, Natalie Sharp, Lia Mice and Wesley Goatley to spend a day with us devising new works using smart speaker technology, before giving a public performance of what we’d produced.

The day was a great success, leading to ideas we’d never have thought of ourselves, as well as illuminating the affordances and limitations of the technology in ways we wouldn’t necessarily have discovered through our day-to-day work. The principle of “working with artists to explore the affordances of new technologies” became central to our unofficial IRFS manifesto. Since then, Jakub and I have travelled to Sonar+D in Barcelona to talk about our work and to give a workshop for musicians and artists on design and prototyping with smart speakers. We have also continued our investigations of musical uses of conversational interfaces, supervising PhD students Lizzie Wilson and Jorge del Bosque on an internship in which they developed a speech-controlled algorithmic music system called Unspoken Word, which was exhibited at Ars Electronica 2018.