Adara vs D.I.Y Audiobook Production ~ Sep 25
In this video I tell you if I completed my ‘audiobook in 6 weeks’ challenge, and how I pinned down my process for recording and editing Drawing Red.
Video Description: Shows a desk, with a laptop and monitor on a stand. There is an anime avatar representing Adara. Adara has purple hair and light blue bangs. Sometimes she has white wings, or wears a grey hoodie or a blue jacket. Intermittent generic backgrounds of libraries and cosy work spaces. The ending screen shows a banner advertising Drawing Red, with a link to www.adaraspence.com/books. The paperback and ebook cover of Drawing Red shows a teenage girl drawing in a notebook with a large red and white wolf standing protectively behind her. It's set in a moonlit forest. The hardback cover is black with a grey outline of a tree on it.
Video Transcript
At the start of the UK summer school holidays, I set myself the challenge to record and produce an audiobook within 6 weeks. So, before the start of the new academic year, did I do it?
I'm an indie author with a single debut title to my name and very little budget, but with levels of perfectionism off the charts, I decided very quickly that my high standards and overly critical attitude would push fresh, bright-eyed, potential, experience-seeking narrators away. But for some reason, I thought it would be valid to apply that to myself. So, I began to painstakingly learn how to perform, sound-treat my space, record, invest in studio-grade headphones, and how to listen to my work critically, and repair audio waves down to 0.03 of a second.
That is to say, my perfectionism stepped in, then immediately butted heads with my chronic illnesses. My delivery and volume control were all over the place. I was out of breath. I would lose my voice because I'm quite a silent person normally. I would feel faint thanks to my POTS. I would have brain fog thanks to the POTS and chronic fatigue. And I had the onset of migraines from looking at screens too long. I had to admit defeat and take a step back for my own mental health. That was tough to come to terms with. But after 2 days of non-stop editing from morning to night, I had only 11 minutes of clean audio to show for it because I'd miss out a word or say a word wrong. And that would mean pulling together my duvet fort again and hoping I could just slot in the word without it sounding like I'd run a marathon. Other people may have the temperament and physical constitution to do it and be much better suited to it than me. But I decided that in my particular circumstances, I needed to cut down on errors and the post-production editing. I didn't have the funds to hire an editor, and I was still having issues with losing my voice from all the speaking. But I wanted to be the one to record the book. I'd heard it in my head for 5 years already, and I knew how I wanted the lines to be delivered.
My next avenue of investigation was producing my very own voice clone. You heard that right. I had my own Frankenstein, “It's alive!” moment. I wouldn't have human error creeping in, and there would be little to no mistakes like mouth noises or sibilance or high background room tone to contend with. My mom was fooled over the phone and genuinely thought it was me speaking. Despite me telling her what she was about to hear was the clone and then the clone telling her it was the clone, she was still waiting for it to start speaking afterwards. It also opens up a whole host of ethical and security questions which could be an entire video on its own. And in terms of production time, I had turned from an editor into an accent coach because every couple of sentences it would get stuck on a word, either lapsing randomly into American or very posh Received Pronunciation British. I even at one point tried to retrain it from scratch to see if that would fix the accent issues, but no, the technology isn't there yet.
So, I abandoned that and instead I tried to use licensed voice clones from other people. These are their own voice clones, trained only by them, and they license them in exchange for pay. So, for every 1,000 characters generated, they would receive a payment. I figured that if I chose someone else's voice, then it would remove my accent issues, and that would be the way to go. But they were so incredibly monotone. I tried at one point to liven it up with a full cast version, but then I found that my testers got distracted whenever the narrator would speak again and would only listen whenever a character spoke. Good for a radio drama, not good for a book where the main character has issues speaking. Now, at this point, it is text-to-speech.
However, I discovered something that I'd previously overlooked, which is advanced voice changing, otherwise known as speech-to-speech. Imagine talking into a megaphone, and with the press of a button, you can change the voice that's put out at the other end. It's called actors mode. So imagine I find the voice of an 80-year-old American man and I decide to use that voice. All of a sudden that 80-year-old American man would be sounding like 80-year-old northern English. I got excited and thought, hey, I am going to apply this to my own voice clone. Is this once again a way for me to be using my own voice? So I tried it. The sound was crunchy. I can't put it any other way.
It wasn't as professional as the others, who had professional studios and audio engineers to help them with their training data. Every inflection and breath was mine, but still, the accent was better but not by much. So I went back to using the other voice and that was eerie. It transformed into a strange sort of hybrid posh northern. It was kind of scary. Mr. Spence didn't really know what to make of that. The volume control was all over the place. That had been reintroduced, as had some of the mouth sounds.
So, despite being promised the result would be exported to ACX standards with speech-to-speech, it's kind of too good and it kind of defeated the purpose for what I was needing it to do. And it would have had me once again re-recording things for volume and trying to edit. I downloaded a chapter to test because it promised to export to ACX standards that would pass. ACX is the store that you go through mainly to get your book onto Audible. I ran it through an ACX check plugin. It failed.
This brought me back to a thought I'd had at the very start of the process, which was, why can't I use the same audio setup I use for streaming? Every single tutorial I'd found told me not to use noise reducers, noise gates, de-clicker programs, etc., or de-essing for sibilance, because they would introduce more problems later, and that every single issue had to be fixed manually. But this sounds perfectly fine to me. The best result, actually, that I've gotten, even with my professional studio monitoring headphones. Add in that at this point I also have a PC to buy, or, well, all the parts to build one, and I realized I could be putting my money towards something else other than the software.
I'm happy that I explored what I explored, but I've come full circle, which is how we got to where we are now. I'm no longer going to fork out money to a website and put pressure on myself to finish something before the next payment is due when I could be routing my funds to something else, namely the parts I need to build a PC. So, I'm dropping the idea of selling my audiobook on audiobook stores and the standards that are binding me. Instead, I can bring you what I believe is the cleanest delivery in my power and have me, the VTuber model, read it to you.
One plus about this is my stamina at reading and speaking is so much better. I've recorded the first chapter and it looks and sounds pretty good. My plan is to make the first chapter free to listen to. Then the rest of the audiobook will be behind a very small paywall to access the unlisted playlist. My want for an audio version has always been about accessibility. And this seems like a good compromise all round. I'm weighing up also having one long video with all the chapters combined.
As for platforms, I'm currently investigating Payhip and Ko-fi. If you have any experience with those, or any recommendations, please drop them below.
I started out by looking at the standards and thinking, why can't I do it my way? Then I remembered I'm an indie author. I can do it my way.
My six-week challenge, I failed. I didn't produce an audiobook in that time, but it gave me the chance to explore and learn, to see the pros and cons of various different avenues, and to choose a path that I feel comfortable with. This has been my eighth attempt at recording and editing an audiobook video because each time I changed my mind about something in the process within 24 hours.
The software I had been using was ElevenLabs. And I always made sure to choose voice clones that I knew were solely created and trained by someone with consent to be licensed, and where I knew they would be paid from my usage. It's a pretty nifty form of passive income if you can get your voice clone clean enough. And my voice clone was not used in the recording of this video. You weren't hoodwinked. Bye!
Credits - Art
Chibi PNGtuber model: Created by vtuber_graphics on Fiverr
VTuber model: @hayukiiii on X
Images: Adara Spence or canva.com
Credits - Music
- Ending theme – Adara Spence
- Beloved - Sakura Girl https://soundcloud.com/sakuragirl_official
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: https://bit.ly/3ji1zZc
Music promoted by Audio Library https://youtu.be/omTgn4GQcKA
- Soon We'll Fly by Ghostrifter Official https://bit.ly/ghostrifter-sc
Creative Commons — Attribution-ShareAlike 3.0 Unported — CC BY-SA 3.0
Free Download / Stream: https://bit.ly/35reep7
Music promoted by Audio Library https://youtu.be/q2vomZqSJuE
- Piano and Guitar music - Adara Spence and Mr Spence