A semantic approach to autonomous mixing #arp13

Shakingthrough was the source of the audio stems used in Brecht's detailed experiments in autonomous, semantically assisted mixing.

DE MAN, BRECHT (Queen Mary University of London)

A semantic approach to autonomous mixing

[abstract] There is a clear need for systems that take care of the mixing stage of music production in live and recording situations. The democratisation of music technology has allowed musicians to produce and distribute their own content at very little cost, but delivering high-quality material still requires, among other experts, a skilled mixing engineer. Mixing multichannel audio comprises many expert but non-artistic tasks that, once accurately described, can be implemented in software or hardware. By producing a high-quality mix quickly and autonomously, studio or home recording becomes more affordable for musicians, smaller music venues are freed from the need for expert operators of their front-of-house and monitor systems, and both audio engineers and musicians can increase their productivity and focus on the creative aspects of music production. Current automatic mixing systems already show adequate performance using basic extracted audio features or machine learning techniques, and sometimes outperform amateur mixing engineers.

However, few intelligent systems seem to take semantic, high-level information into account. The applied processing depends on low-level signal features, but no information is given (or extracted) about, for example, the band, the recording conditions, or the playback conditions. This information, which an amateur end user can provide at little cost, could significantly increase the performance of such a semi-autonomous mixing system. Moreover, using feature extraction for instrument and even genre recognition, a fully autonomous system could be designed.
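As a rough illustration of that combination, the sketch below pairs a few low-level features with user-supplied semantic metadata for a single stem. librosa is assumed for the feature extraction; the file path, the metadata fields, and the choice of features are hypothetical, not taken from the paper.

```python
# Minimal sketch: describe one stem by combining extracted low-level
# features with semantic metadata supplied by the user (or, eventually,
# by instrument/genre classifiers). Paths and fields are illustrative.
import librosa
import numpy as np

def describe_track(path, metadata):
    """Return a combined semantic / low-level description of one stem."""
    y, sr = librosa.load(path, sr=None, mono=True)
    rms = float(np.sqrt(np.mean(y ** 2)))
    features = {
        "rms": rms,                                   # overall energy
        "spectral_centroid": float(                   # rough brightness
            np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
        ),
        "crest_factor": float(np.max(np.abs(y)) / (rms + 1e-12)),
    }
    return {**metadata, **features}

description = describe_track(
    "stems/kick.wav",
    {"instrument": "kick drum", "genre": "rock", "playback": "club PA"},
)
```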

Many sources, among them numerous audio engineering books and websites, report standard processor settings. These settings depend on the engineer's style and taste, the band's and song's characteristics, and to some extent the characteristics of the signals. They include preferred values for relative level, panning, equalisation, dynamic range compression, and time-based effects.
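Such settings could be encoded as a rule table keyed on semantic information such as the instrument. The sketch below shows one possible shape for such a table; the instruments and numbers are illustrative placeholders, not the rule set from the paper.

```python
# Hypothetical encoding of 'best practice' settings per instrument.
# Each entry holds a relative level, a pan position, an EQ recipe,
# and dynamic range compression parameters. Values are illustrative.
MIX_RULES = {
    "kick drum": {
        "level_db": 0.0,          # reference level in the mix
        "pan": 0.0,               # centre
        "eq": [("high_pass", 40), ("peak", 60, +3.0), ("peak", 400, -4.0)],
        "compression": {"ratio": 4.0, "threshold_db": -12.0},
    },
    "lead vocal": {
        "level_db": -1.0,
        "pan": 0.0,
        "eq": [("high_pass", 100), ("peak", 3000, +2.0)],
        "compression": {"ratio": 3.0, "threshold_db": -15.0},
    },
    "rhythm guitar": {
        "level_db": -6.0,
        "pan": -0.5,              # partly left; a double-track goes right
        "eq": [("high_pass", 80)],
        "compression": None,      # often left uncompressed
    },
}
```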

In this paper a synthesis is made of the 'best practices' derived from a broad selection of audio engineering literature and expert interviews, to constitute a set of rules that define, to the greatest possible extent, the actions and choices audio engineers make given a song with certain characteristics. Rule-based processing is then applied to reference material (raw tracks) to validate the semantic approach. A formal comparison with state-of-the-art automatic mixing systems, human mixes, and an unprocessed version is conducted, and future directions are identified.
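As a sketch of the simplest such rule-based stage, the snippet below applies a gain and a constant-power pan, drawn from a rule of the shape above, to one mono stem. This is a minimal illustration under assumed rule values; the EQ, compression, and time-based stages a real system would need are omitted.

```python
# Minimal rule-based processing of one stem: convert the rule's level
# from dB to linear gain and apply a constant-power pan, returning a
# (2, n_samples) stereo array. Rule values here are invented.
import numpy as np

def apply_rules(mono: np.ndarray, rule: dict) -> np.ndarray:
    gain = 10.0 ** (rule["level_db"] / 20.0)      # dB to linear gain
    # Constant-power panning: pan in [-1, 1] maps to an angle in
    # [0, pi/2], keeping summed power constant across the stereo field.
    theta = (rule["pan"] + 1.0) * np.pi / 4.0
    left = mono * gain * np.cos(theta)
    right = mono * gain * np.sin(theta)
    return np.stack([left, right])

# Example with a dummy signal and an illustrative rule:
rule = {"level_db": -6.0, "pan": -0.5}
stereo = apply_rules(np.random.randn(44100).astype(np.float32), rule)
```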

[JB comment – there was so much detail (and detailed audio) in Brecht's presentation that I confess I didn't manage to blog it in real time. The work is excellent and could, after further development, provide an obvious commercial/end-user benefit. You can listen to the results of the automixing experiments, and find out more about Brecht's work, on his own website – http://brechtdeman.com/research.html]