I have something similar that uses Whisper and ESP32s to control things in my self-hosted, local-only "smart home" automations.
Archive: https://archive.today/BtEL9
From the post:
>If you’re not worried about corporate surveillance bots scraping your shopping list and manipulating you through marketing, you can buy any number of off-the-shelf smart speakers for your home. Alternatively, you can roll your own like [arpy8] did, and keep your life a little more private.
>The build is based around an ESP32 microcontroller. It connects to the ‘net via its inbuilt Wi-Fi connection, and listens out for your voice with an INMP441 omnidirectional microphone module. The audio data is trucked off to a backend server running a Whisper speech-to-text model. The text is then passed to Google’s Gemini 2.5 Flash large language model. The response generated is passed to the Piper Neural Voice text-to-speech engine, sent back to the ESP32, and spat out via the device’s DAC output and a speaker attached to an LM386 amplifier. Basically, anything you could ask Gemini, you can do with this device.
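
For anyone who wants to picture the backend side, here is a rough sketch of the flow the post describes: audio in, speech-to-text, LLM, text-to-speech, audio out. This is not [arpy8]'s code. `VoicePipeline` and the three `fake_*` functions are hypothetical names I made up so the example runs without any models installed; in a real setup you would swap in Whisper, Gemini (or a local model), and Piper behind the same three callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three backend stages; each stage is a plain callable."""
    transcribe: Callable[[bytes], str]   # speech-to-text (Whisper in the build)
    respond: Callable[[str], str]        # LLM (Gemini 2.5 Flash in the build)
    synthesize: Callable[[str], bytes]   # text-to-speech (Piper in the build)

    def handle(self, mic_audio: bytes) -> bytes:
        """Take audio recorded by the ESP32, return audio for it to play."""
        text = self.transcribe(mic_audio).strip()
        if not text:
            return b""  # nothing recognized, so send nothing back
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages (assumptions, not real model calls) so the sketch runs as-is.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8", errors="ignore")


def fake_respond(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_respond, fake_synthesize)
```

Keeping each stage as a separate callable makes it easy to replace the cloud LLM with something local, which is roughly how my own setup stays off the internet.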