Record the audio

Overview

Our goal in this step is to capture spoken audio for each slide.

While you can certainly record this audio with your own voice, that process can be time-consuming, finicky, and physically taxing, and it requires a perfectly quiet recording environment. Re-recording a clip later is also challenging, because any environmental change between the two takes can show up in the audio. A text-to-speech approach is easier and more scalable.

First, we’ll record a separate text-to-speech audio clip for each slide. We’ll add these audio clips to the presentation in a later step.

To convert text to audio…

First open your text-to-speech application and your presentation storyboard document.

(Screenshot: storyboard document and text-to-speech application)

Copy the text for one slide from your presentation storyboard and paste it into your text-to-speech application.

(Screenshot: copying text from the storyboard document and pasting it into the text-to-speech application)

Start the text-to-speech process in your text-to-speech application.

Then save the resulting audio clip to a local folder for later use.

Then repeat this process for each slide in your presentation.
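
To keep the per-slide clips organized while you repeat these steps, it can help to decide on file names up front. The following is a minimal sketch, not part of any tool on this page. It assumes the narration for each slide is already available as a list of strings (how you pull that out of your storyboard is up to you), and it borrows the Audio - Slide XX naming used later on this page. The `build_manifest` function, the `audio` folder, and the `.wav` extension are illustrative assumptions.

```python
from pathlib import Path


def build_manifest(slide_texts, folder="audio", extension=".wav"):
    """Pair each slide's narration with the file its audio clip should be saved to."""
    manifest = []
    for number, text in enumerate(slide_texts, start=1):
        # Two-digit slide numbers keep the files sorted in slide order.
        filename = f"Audio - Slide {number:02d}{extension}"
        manifest.append((Path(folder) / filename, text.strip()))
    return manifest


if __name__ == "__main__":
    # Hypothetical narration text, for illustration only.
    narration = ["Welcome to the demo.", "Here is how the feature works."]
    for path, text in build_manifest(narration):
        print(f"{path}: {text}")
```

Each entry tells you which text to paste into the text-to-speech application and what to name the clip you save afterward.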

Tools

You will need to select a text-to-speech application to use for your video.

Depending on that choice, you may also need to record your system audio and/or convert the clip from MP4 format to an audio format like M4A or WAV.

Text-to-speech applications

You can use any text-to-speech application for this step, including:

Google Cloud Text-to-Speech
- We are currently evaluating Google Cloud Text-to-Speech as our primary text-to-speech recommendation. 🏆
- This is a high-quality tool that Jack Henry employees should be able to use for free.
- Lets you download the audio clip directly, so you don’t have to record system audio and then convert the recording to an audio format.
- The Google Cloud environment has not been fully set up for Jack Henry employees yet, so we’re currently using a free trial on a personal account to evaluate it. We’ll update this page once we know more.
Microsoft Word’s Immersive Reader
- Free for Jack Henry employees to use.
- No direct download. You have to record your system audio, save the recording as an MP4, then convert it to an audio format.
- Oddly enough, the browser-based version of Word does better text-to-speech than the desktop version.
Amazon Polly, Speechify, etc.
- Require a paid license.
- We have not confirmed whether these tools offer a direct download feature.

Capturing system audio

If your text-to-speech application lets you download the audio clip directly, as Google Cloud Text-to-Speech does, you can skip this step.

However, some text-to-speech applications, like the Immersive Reader in Microsoft Word, do not let you download an audio clip. If you use one of those tools, you’ll have to record your system audio to capture the text-to-speech audio clip.

If you’re going this route, then use SnagIt to record the text-to-speech as system audio. We all have licenses for SnagIt at Jack Henry.

That process looks something like this:

Cue up the text-to-speech application to read your text aloud, but don’t start it yet.
Open SnagIt and switch to the video capture tab.
- Make sure that the Record System Audio box is checked. We want to capture your system audio here.
- Make sure that the Record Microphone box is not checked. You don’t want to hear your environment’s ambient sound in this clip.
- Make sure that the Record Webcam box is not checked. We’re only interested in audio here, not video.
Start recording in SnagIt.
Switch to your text-to-speech application and start it reading your text.
Once the text has been spoken aloud, switch back to SnagIt and stop the recording.
In the SnagIt capture window, press the Share button at the top-right of the screen.
Select MP4 as the export format and save the file locally.

Converting from MP4 to audio format

When we add our audio clips to the presentation in a later step, those clips need to be in an audio format like M4A or WAV. Google Cloud Text-to-Speech lets you download the audio clips in WAV format, so no conversion is needed.

SnagIt, however, captures the system video along with the system audio, so its only export options are MP4 and animated GIF.

This means that any clip recorded with SnagIt has to be converted from an MP4 video+audio capture to an audio format like M4A or WAV.

If you’re on Windows, VLC is a good option. You will need to download and install it on your Windows machine.
If you’re on macOS, QuickTime Player is a good option. It is included with macOS.

With either tool, open the MP4 file, then look for an option to export it in an audio-only format, such as M4A or WAV.
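
If you would rather script the conversion than click through a player, a command-line converter can do the same job. The sketch below uses ffmpeg, which is not one of the tools recommended on this page. It is swapped in here purely as an assumption, and you would need to install it separately. The `build_ffmpeg_command` and `convert` helpers are hypothetical names for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp4_path, audio_extension=".wav"):
    """Return the ffmpeg argument list that extracts the audio track from an MP4."""
    source = Path(mp4_path)
    target = source.with_suffix(audio_extension)
    # -i names the input file; -vn drops the video stream so only audio is written.
    # ffmpeg chooses the output format from the target file's extension.
    return ["ffmpeg", "-i", str(source), "-vn", str(target)]


def convert(mp4_path, audio_extension=".wav"):
    """Run ffmpeg (assumed to be installed and on the PATH); raises if it fails."""
    subprocess.run(build_ffmpeg_command(mp4_path, audio_extension), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ffmpeg.
    print(" ".join(build_ffmpeg_command("Audio - Slide 01.mp4")))
```

Passing `audio_extension=".m4a"` would request an M4A file instead of WAV.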

Examples

Google Cloud Text-to-Speech

We are evaluating Google Cloud Text-to-Speech and are waiting for Jack Henry Ground Control to set up the GCP environment so that all of us can use it. So far it looks very promising, and it will likely be our primary TTS recommendation. These instructions will change once our GCP environment is fully available.

Using Google Cloud Text-to-Speech

Go to the Text-to-Speech page in the Google Cloud console.
Copy the text for a slide and paste it into the box in the Enter text or ssml section.
- You can have it read your text as-is, or you can use a text-to-speech markup language called SSML if you want more control over how your text is read, including pitch, speed, and emphasis.
Under Setup configuration, select a voice. The Studio voices are in beta, but they give the best results.
Press the Synthesize button. This converts your text to an audio clip.
Once the synthesis completes, press the DOWNLOAD button to download your audio clip in WAV format.
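
The console steps above can also be approximated in code. The sketch below is an assumption-laden illustration, not the team’s process. It wraps slide text in minimal SSML, builds a JSON body in the shape used by the Cloud Text-to-Speech `text:synthesize` REST method, and decodes the base64 `audioContent` field of a response. Sending the request and authenticating are deliberately left out, the helper names are hypothetical, and the voice name is a placeholder you would replace with a real one. SSML support can vary by voice, so check the voice you pick.

```python
import base64
import json
from xml.sax.saxutils import escape


def to_ssml(text, rate="medium"):
    """Wrap plain text in minimal SSML, with one prosody element for speaking rate."""
    # escape() protects characters such as & and < that would break the markup.
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'


def build_request_body(ssml, voice_name, language_code="en-US"):
    """Build the JSON body for a text:synthesize request."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code, "name": voice_name},
        # LINEAR16 asks for uncompressed PCM audio, which is returned as WAV data.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def decode_audio(response_json):
    """Extract the audio bytes from a synthesize response (base64 in audioContent)."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


if __name__ == "__main__":
    # "VOICE_NAME" is a placeholder, not a real voice.
    body = build_request_body(to_ssml("Welcome to the demo.", rate="slow"), "VOICE_NAME")
    print(json.dumps(body, indent=2))
```

The bytes returned by `decode_audio` could then be written to a file such as Audio - Slide 01.wav.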

Using Microsoft Word’s Immersive Reader for text-to-speech

First, create a new document in the browser-based version of Microsoft Word.
- Oddly enough, the browser-based version of Word does better text-to-speech than the desktop version.
Generate the spoken text audio for each slide:
- Copy the text for the current slide from your presentation storyboard and paste it into the Word document, overwriting any text left over from the previous slide.
- In Word, switch to View > Immersive Reader.
- In the Immersive Reader, click the Voice Settings icon at the bottom of the page (the icon that looks like a speaker with a gear).
- In the popup, select either a Male or Female voice. Most jhVids videos tend to use the female voice.
- Before you play the audio clip, first open SnagIt and begin the process for recording a video.
  - As you record video from SnagIt, be sure that the Record System Audio toggle is on and the Record Microphone toggle is off. We want it to record the sound from Word (system audio) but not the ambient sounds in your workspace (microphone).
- Begin recording video in SnagIt. It gives you a 3-second countdown.
- Quickly switch to Word and press play to have it start reading your script using the selected voice.
- After the clip has been fully read, stop the SnagIt recording.
- Save the SnagIt video recording as an MP4 file named Audio - Slide XX.mp4, where XX is the 2-digit slide number.
- Repeat this process until you have an MP4 for each slide’s audio.
Convert each MP4 file to an audio format. In the previous step we used SnagIt to export an MP4 file with the audio for each slide, but before we can import this audio into the presentation (in a later step), we need to convert it to an audio format that PowerPoint supports, such as M4A.
- The following instructions are for QuickTime. If you’re using a different tool, look up that tool’s process for opening an MP4 file and exporting it in M4A format.
- Open QuickTime Player.
- Open all of the Audio - Slide XX.mp4 files that you created in the previous step.
- For each open file:
  - Select File > Export As, then select Audio Only….
  - Save the M4A version of the file into the same folder as your MP4 files.
  - Close that file.
  - Repeat this process for each audio MP4.
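
With many slides it is easy to lose track of which MP4 files have already been exported. The sketch below is a small helper under stated assumptions: it expects the Audio - Slide XX.mp4 naming from the steps above and M4A files saved in the same folder. The `pending_conversions` function name and the example folder name are hypothetical.

```python
from pathlib import Path


def pending_conversions(folder):
    """Return the slide MP4 files that do not yet have an M4A file beside them."""
    pending = []
    for mp4 in sorted(Path(folder).glob("Audio - Slide *.mp4")):
        # The M4A is expected in the same folder, with the same base name.
        if not mp4.with_suffix(".m4a").exists():
            pending.append(mp4.name)
    return pending


if __name__ == "__main__":
    # "audio" is an assumed folder name; point this at your own clip folder.
    for name in pending_conversions("audio"):
        print(f"Still needs an M4A export: {name}")
```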

Next step: Create a poster image

Last updated Mon Dec 29 2025