Introduction

The idea is to use voice-to-text engines combined with old radio technology. So, for example, instead of listening to a shortwave radio channel for 24 hours to see if there is ever anything interesting happening there, I could set up my “Listener” to run for a day, then I review a date & time stamped transcript of what that station broadcast, with links to go listen to the audio of the interesting parts.

That’s it. That’s the idea.

I’ve always wanted to explore old radio hardware, but I don’t have the patience to sit and listen for hours and hours to hear what’s there.

Plan

Google has a speech-to-text service with a free tier that will take care of that part of the project at the start. If I need to add other capabilities or providers later, that’s fine.

Step 1

The first thing I need is a process that monitors a channel (I’ll start with the microphone on my headset). If there’s no sound, nothing happens. If there’s sound, I get a 15-second file with the date and time as filename. If there’s continuous sound, like if I start reading out of a book out loud, the process creates 15-second files every 15 seconds as long as there’s sound.

Step 2

I need a script to take a directory with some 15-second sound files and send them to the Google speech-to-text API, throttled, if necessary. I’ll save the return payload in the same directory with the same date and time stamp filename as the audio file, but with a txt extension. If there’s no text returned, I’ll trust that the sound clip is static or inaudible and flag the sound file for deletion.

Step 3

Instead of monitoring my headset, I need to plug an external source like a radio into the sound card of my computer and monitor that. I’d like to get a scanner so that I can monitor police/fire/ambulance channels as well as listen to unclaimed frequencies to see if anyone ever transmits on them. Later, I’d like to get a shortwave receiver and pipe that into the listener, because generating English transcripts for foreign-language content sounds interesting.

Step 4

I need a database, probably Postgres, to store the audio files and their related text transcripts. This will let me full text index the transcripts, compare content between channels, and keep track of stats like “word length by channel”, or “channel traffic volume over time”. The long-term goal is to get the machine to listen to long hours of static or silence, then point me to the interesting parts.

Step 5

Scale up. Instead of using a single machine to monitor one channel, get an array of receivers to monitor multiple channels. Pipe all the text into the database and look for relationships between channels.

Maybe Later

Keep a channel-specific concordance for a while, then alert on cases where a word is used for the first time on a channel.
Instead of running this all locally, put all of this on hosted online services.
Build small Arduino / Raspberry Pi receivers that record and store local point source radio.
Instead of discarding sound files without transcript content, listen to them to look for patterns.