Making A Computer Sing

Making singing sounds via processed sampling or re-synthesis is good and effective, but I have - for years - wanted to do it from first principles.

Now I have managed it. At least to some extent. The aim (at the moment, and since I started about 18 months ago) is to make a convincing, sung vowel sound. This has proved very, very difficult indeed. I have made a few spooky whispers using white noise and formant filtering. The effect is used here:


The problem always came when trying to move over to the intensity of a true sung note. Formant filtering of a sawtooth, for example, just sounds like a filtered sawtooth, and not a very nice one either. I have spent a long time looking at the spectra of human speech and started to notice that the frequency bands of the overtones, even from a single voice, are much wider than for a normal musical instrument. Even setting vibrato aside, the human voice does not produce a sharp, well-defined fundamental and overtones. Yes, the structure is there, but each partial is 'fatter'.
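If you want to see this for yourself, a quick way is to load a recording of a sung vowel and look at how wide the band of energy around each partial is. The sketch below uses plain NumPy/SciPy (nothing to do with Sonic Field); the file name and the 220 Hz fundamental are placeholders.

import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("sung_vowel.wav")     # placeholder file name
samples = samples.astype(float)
if samples.ndim > 1:
    samples = samples[:, 0]                        # use one channel if stereo

spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
freqs = np.fft.rfftfreq(len(samples), 1.0 / rate)

f0 = 220.0                                         # assumed fundamental of the note
centre = 3 * f0                                    # look at the 3rd partial
band = (freqs > centre - f0 / 2.0) & (freqs < centre + f0 / 2.0)
peak = spectrum[band].max()
# Count bins within a tenth (-20 dB) of the peak: a voice spreads its energy
# over far more bins here than, say, a clarinet playing the same note.
print(np.sum(spectrum[band] > peak / 10.0))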

A few more experiments showed that convolving a sawtooth with white noise and then formant filtering did not do the job either. Trying to 'widen' the partials with frequency modulation also failed. Then I started to consider what a real human voice does. The vocal system is not a rigid resonator like a clarinet or organ pipe. The vocal folds are 'floppy', and so there needs to be a band of frequencies around each partial. The power of human singing is all in the filtering done by the air column that starts in the lungs and ends at the nose and mouth.

This thought process led me to use additive synthesis to produce a rich set of sounds around the base frequency and each of its partials. Then I upped the filtering a lot: not just formant filtering, but band-pass filtering around the formants. That is, not just resonant peaks but also cutting between the formants.
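To make that concrete, here is a rough sketch of the idea in plain NumPy/SciPy rather than Sonic Field itself; the formant frequencies are just plausible values for an 'ah' vowel and the 1% spread is a guess. A small cluster of slightly detuned sine waves fattens each partial, then band-pass filters centred on the formants cut away the energy between them.

import numpy as np
from scipy.signal import butter, lfilter

rate = 44100
seconds = 2.0
t = np.arange(int(rate * seconds)) / float(rate)
f0 = 220.0                                # base frequency of the note
formants = [600.0, 1040.0, 2250.0]        # rough 'ah' formants - illustrative values

# Additive source: a cluster of detuned sines around each partial gives
# the 'fat' partials seen in voice spectra.
source = np.zeros_like(t)
for n in range(1, 20):
    centre = n * f0
    for spread in (-0.01, 0.0, 0.01):     # about 1% detune either side
        f = centre * (1.0 + spread)
        phase = np.random.uniform(0.0, 2.0 * np.pi)
        source += np.sin(2.0 * np.pi * f * t + phase) / n

# Band-pass around each formant and sum, so energy between the formants is cut.
voice = np.zeros_like(source)
for fc in formants:
    lo, hi = fc * 0.8, fc * 1.25
    b, a = butter(2, [lo / (rate / 2.0), hi / (rate / 2.0)], btype='band')
    voice += lfilter(b, a, source)

voice /= np.max(np.abs(voice))            # normalise before writing to a file

The real patch does far more shaping than this, but the basic structure - widen the partials, then filter hard around the formants - is what the paragraph above describes.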

Listening to the results of this was really interesting. Some notes sounded quite like they were sung; others totally failed and sounded just like string instruments. Careful inspection of the spectra of each case showed that where partials lined up with formant frequencies the result sounded like singing; where the formants lay between partials, it sounded like a string instrument. I realised that there is a huge difference between a bad singer (say, me) and a great singer. Maybe great singers are doing something subtle with the formants to get that pure sound.

This is where I have got to with the technique. Each vowel has three formants. I leave the bottom one untouched; however, I align the upper two to the nearest harmonic of the note. Synthesis done this way produces a reliable, consistent sound somewhere between a human singing and a bowed string instrument.
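The alignment itself is simple arithmetic; here is a minimal sketch of the kind of calculation involved (illustrative Python, not the Sonic Field code, and the formant values are again just plausible 'ah' figures):

def align_formants(formants, f0):
    # Keep the first (lowest) formant as given; snap the upper two to the
    # nearest harmonic of the note's fundamental f0.
    aligned = [formants[0]]
    for fc in formants[1:]:
        harmonic = max(1, int(round(fc / f0)))
        aligned.append(harmonic * f0)
    return aligned

print(align_formants([600.0, 1040.0, 2250.0], 220.0))
# [600.0, 1100.0, 2200.0] for a note at A3

Here is an example: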



Next I want to try using notch filters to knock out specific harmonics to see if I can get rid of some of that string sound.
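As a sketch of what I have in mind (again plain SciPy, not Sonic Field, and the choice of which harmonic to remove is purely illustrative), a notch filter centred on one harmonic of the note would look like this:

from scipy.signal import iirnotch, lfilter

rate = 44100
f0 = 220.0                      # fundamental of the note being sung
harmonic = 6                    # which harmonic to knock out - purely illustrative
b, a = iirnotch(harmonic * f0, Q=30.0, fs=rate)
# voice is the synthesised vowel from the sketch above
# voice = lfilter(b, a, voice)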

Making Music With Java - 30 Months Of Joy And Pain

Often I sit before my MacBook and start to tweak a sound here, cut and paste a volume there, load a MIDI file and listen to the result.

Sometimes I start to create music from scratch, carefully encoding each note's pitch and shape into the strange, unmusical cypher which is Python. I expect the result, at the very least, to make some sort of sound; occasionally it makes a sound so beautiful it brings me to the edge of tears.

It is not an English thing to talk with pride about one's creations or the emotions they induce. Still, I am immensely proud of Sonic Field. It is not perfect; reluctantly, I have accepted that I might be the only person who will ever use it. Nevertheless, I am proud. It has formed a large part of my life over the last three years. It has helped my career and my sanity. It has shaped my view of myself, caused arguments and impressed colleagues whilst simultaneously baffling them as to my motivations.

So why now? Why pick this moment to look back? 30 months is hardly an anniversary (literally). No, there is a much more musical reason behind my feeling that an inflection point has been passed. Sonic Field has, in my view, successfully played (rendered, if you will) Bach. Not some modernised pastiche of Toccata And Fugue, but transcriptions to MIDI of the original scores of Art Of Fugue and Passacaglia And Fugue. These are tremendously challenging pieces to synthesise in a way that reproduces the beauty inherent in Bach's creations.

Oddly, I never intended to synthesise music with Sonic Field; my original goal was to process sounds to place them in a sensory space - using modelled reflections and other post-processing techniques. I rapidly realised that I would need music to feed Sonic Field and that I would need to make that music myself, so I started on synthesis. I had to learn a lot very quickly. My basic understanding of the physics of sound was nothing like enough to make anything other than squelchy, yucky sounds. I guess you have to start somewhere.

The above piece uses the original concepts of Sonic Field (but much enhanced) to place a pipe organ in a realistic space.

By the summer of 2013 Sonic Field was in danger of ending as a project. It could well have become a footnote in my developer history. However, this did not happen, and for two reasons. Firstly, I started to commute by train to London and no longer worked from home. This happened alongside a realisation that working from a shed in the garden no longer appealed to me.

Consequently, I no longer had a den of wires in which to make sound with hardware. Suddenly, I had the need to make some sound but no medium; I could still have played my sax, I guess - but that is rather intrusive and not acceptable on a train!

So I restarted the Sonic Field project and even placed the code on GitHub. Which leads me on to the second reason I restarted the project: Python. Python is a programming language of which I knew little. Until just a year ago I took a rather poor view of Python because it is interpreted, and a lot of very bad things have happened to computing (and to the development of technology in general) because of interpreters. However, Morgan Stanley (a customer at the time, and now as it happens) were, and are, a big user of Python. I started to see why as well: it has many merits for creative coding. I needed a project to help me learn Python, so I ripped Sonic Field apart and replaced its bespoke implementation language with Python.
The above piece is the first I created in Sonic Field Python (sython).

My fear when using any general-purpose interpreter is that it will become the main processing engine of a project. They have a habit of doing that, and it is very destructive. However, by being quite disciplined I think I have managed to avoid this happening. The upside is that Python is rather good at expressing some of the nuanced behaviour required to create music.

Above, I use the power of Python to add detail to the attack of notes, to bring out the sound and open it up.

One cannot just design a sound in a purely engineering way. Trying to be all scientific and mathematical about sound produces a pompous, non-artistic effect. The only way, I believe, is to create a sound and then tweak it to reflect human emotion. Whilst my breath does not drive the sound from Sonic Field and my fingers do not dance over its keys as they do over my clarinet or sax, I do add my artistic input to each note. Often this is not done directly, one note at a time. In the recent work I am doing with "Art Of Fugue" I will listen to a passage and then tweak an algorithm to reshape details of the way Sonic Field plays that passage. Each note is subtly changed until I get what I want.

For good or ill, I never do truly get what I want. No sound is ever finished. It would be too easy to create a soft, smooth synth pad sound and say "hey, isn't that cool", but I want to constantly push into new ideas. Whether that be a new generative music engine (like the work on Brownian motion I did earlier this year), rendering extremely complex music like Bach, or maybe very slow and gentle sounds like Black Knight, I always want to improve on what I have already done or create something new.

Is it an indulgence? Utterly! However, consider this: before the invention of the telephone exchange in the 19th century, the pipe organ had been the most complex machine invented by humans. We invested the peak of our abilities into creating music, into creating art. Is it so strange that I and so many others wish to continue to do so using computers? After all, they are the most complex machines we have ever created, by a very, very long way.

Interpretation Of Midi

The image for my new 'Beautiful Bach Volume 1' project. It is an ironic poke at the idea that nowadays we worship money in these modern cathedrals of glass and steel.

MIDI is a lot like digital sheet music; we interpret sheet music when we play it, so why not do the same with MIDI?

In this case I am talking about programmatic interpretation. The idea is that the style of playing is changed by algorithms which interpret the notes with respect to their environment.

I have found with Sonic Field that the best technique is to start with something very simple and add complexity. It is very difficult to disentangle what makes which musical effect; much better not to get them entangled in the first place. Illuminated by this experience, my first effort has been to join separate notes. Consider that we have a sequence of long notes at the same volume and pitch. These notes are effectively tied, in that they have no significant gap between them. One could play each note separately, or one could just 'hold down the key'. The difference is apparent between playing a piano and an organ, or a guitar and a woodwind instrument. In both cases the former makes a note which dies away, whilst the latter can sustain a note indefinitely (or at least until the player's breath runs out).

My second attempt has been nearly the opposite. I make long notes which are close together slightly staccato. This has the effect of increasing the articulation of the notes. It can help give a lighter, bouncier feel to the music.

Here is the Python for implementing these two effects:

def unpackMidi(tup,beat):
    # base (the frequency assigned to key 0) and sf (the Sonic Field module)
    # are assumed to be defined elsewhere in the patch.
    tickOn,tickOff,note,key,velocity = tup
    at  = tickOn*beat
    llen = (tickOff-tickOn)*beat
    if key==0:
        pitch=base
    else:
        pitch= (sf.Semitone(0)**float(key)) * base
    return tickOn,tickOff,note,key,velocity,at,llen,pitch

def interpretMidiBombast(midi,beat):
    # Repeatedly merge adjacent notes of the same pitch and velocity into one
    # longer note, passing over the list until no more merges can be made.
    change=True
    while change:
        change=False
        print "Interpretation Pass"
        endAt=len(midi)-1
        index=0
        midiOut=[]
        while index<endAt:
            this=midi[index]
            next=midi[index+1]
            ttickOn,ttickOff,tnote,tkey,tvelocity,tAt,tLen,tPitch = unpackMidi(this,beat)
            ntickOn,ntickOff,nnote,nkey,nvelocity,nAt,nLen,nPitch = unpackMidi(next,beat)

            # Merge interpretation
            finished=False
            if tLen>512:
                # Same pitch and velocity with no significant gap (and at most
                # a slight overlap) between the two notes
                if tPitch==nPitch and tvelocity==nvelocity and (ntickOn-ttickOff)<128 and (ntickOn-ttickOff)>-16:
                    # Go for a low pitch merge: only long notes below
                    # (roughly) middle C are joined
                    if nLen>256 and tPitch<256:
                        finished=True
                        midiOut.append([ttickOn,ntickOff,tnote,tkey,tvelocity])
                        print "Merging: ",this," & ",next
                        index+=1
                        change=True

            if not finished:
                midiOut.append(this)
                if index==endAt-1:
                    midiOut.append(next)

            # iterate the loop
            index+=1
        midi=midiOut
    return midi

def interpretMidiStaccato(midi,beat):
    # Shorten long notes which run straight into the next note, giving a
    # slightly detached, staccato feel.
    print "Interpretation Pass"
    endAt=len(midi)-1
    index=0
    midiOut=[]
    while index<endAt:
        this=midi[index]
        next=midi[index+1]
        ttickOn,ttickOff,tnote,tkey,tvelocity,tAt,tLen,tPitch = unpackMidi(this,beat)
        ntickOn,ntickOff,nnote,nkey,nvelocity,nAt,nLen,nPitch = unpackMidi(next,beat)

        # Staccato interpretation
        finished=False
        if tLen>512:
            if (ntickOn-ttickOff)<128:
                # Pull the end of this note back by 128 ticks to detach it
                # from the note which follows
                ttickOff-=128
                midiOut.append([ttickOn,ttickOff,tnote,tkey,tvelocity])
                print "Staccato: ",this," & ",next
                finished=True

        if not finished:
            midiOut.append(this)
        # Always keep the final note of the sequence
        if index==endAt-1:
            midiOut.append(next)

        # iterate the loop
        index+=1
    return midiOut

I call the joining effect 'bombast' as I use it for more bombastic music. For comparison, I have rendered BWV 645 (a chorale by JS Bach) with a rich set of powerful reed pipes and bombast processing. I then redid the same piece, from the same input MIDI, using soft flute sounds and the staccato processing. Note that strings come in halfway through. I edit MIDI in Aria Maestosa, which is a genuinely excellent piece of software.

The difference between the two renditions is absolutely enormous; in some ways it is hard to believe they are the same piece of music:

Bombastic


More subtle and authentic