[Solved] Connecting WIFI and Recording with ESP32

So, are there any demos or examples of how to connect to WiFi and/or record audio using just the ESP32? In my project I’d like to record frames of PCM data and transmit the audio over MQTT, essentially using it as a remote microphone, but I don’t see anything about actually reading the PCM data from the mic array.

You can find an example reading the mics from the ESP32 here.

Also, here is the guide on Hackster to set up your development environment and program the ESP32 on the MATRIX Voice.

Espressif has further examples on how to connect to WiFi / Bluetooth and some example protocols; one is for aws-iot and it seems to use MQTT.

To be properly self-sufficient in programming the ESP32, one needs to see how the ESP32 is connected to the Voice, not just the Espressif pinouts. Where can that be found?

With my limited knowledge of the Voice and ESP32 programming, I think the optional ESP32 only communicates with the Voice over SPI. So to use the Voice components from the ESP32, the hal-esp should be enough to start with. Bluetooth and WiFi are built into the ESP32 chip, so this should be standard.

But I agree, a more detailed hardware diagram for both the Voice and Creator would be nice.


Hi there,

I am trying to achieve this as well, but I am struggling a bit with samples and such.
Have you finished a project with this?

I can send sound over to the MQTT broker, but I want to use it as an Audio Server for Snips.
I have got the buffer size and everything else correct; the only thing left is that it’s not good sound but just noise…

Here’s the main loop; I am using this library: https://github.com/256dpi/esp-mqtt.
I know the code is not really well written now, but I am first trying to make it work.
The header for the wave is hardcoded, as are other values, but running this publishes exactly the same as the Snips Audio Server to the MQTT broker. Only, it’s noise, so I am doing something terribly wrong with the sampling rate (set to 16000) or other stuff. If needed I can paste the whole code, including all the functions. A lot is copied from the example applications for now.

void cpp_loop(void) {

    ESP_ERROR_CHECK(esp_event_loop_init(event_handler, NULL));

    esp_mqtt_init(mqtt_status_cb, mqtt_message_cb, 1000, 2000);
    hal::WishboneBus wb;


    //setup everloop
    hal::Everloop everloop;
    hal::EverloopImage image1d;
    //setup mics
    mics.CalculateDelays(0, 0, 1000, 320 * 1000);

    //hardcoded 44-byte WAV header: PCM, mono, 16000 Hz, 16 bit, 512 data bytes
    char header[44] = {82,73,70,70,36,2,0,0,87,65,86,69,102,109,116,32,16,0,0,0,1,0,1,0,128,62,0,0,0,125,0,0,2,0,16,0,100,97,116,97,0,2,0,0};

    char input[556];

    //copy the header to the start of the message
    for (int i=0;i<44;++i){
        input[i] = header[i];
    }
    //Set last_read to the current time
    auto last_read = esp_timer_get_time();
    bool send = true;
    while (send) {
      if (connected) {
        //Get the current time
        auto now = esp_timer_get_time();
        //If the current time - last_read is greater than RATE/CHUNK microseconds we need to take a sample
        //multiplied by 10 to be sure we have a whole number NumberOfSamples = 256
//        if ( (now - last_read)*10 > (RATE / mics.NumberOfSamples())*10) {
            last_read = now;
            //NumberOfSamples() = kMicarrayBufferSize / kMicrophoneChannels = 4096 / 8 = 512
            //After every Read, we have 512 samples
            uint32_t index = 0;
            //send first part
            for (uint32_t s = 0; s < 512; s++) {
                //0 .. 511
                index = 44+s; //44+0 .. 44+511
                input[index] = mics.Beam(s);
            }
            esp_mqtt_publish("hermes/audioServer/voice/audioFrame", (uint8_t *)input, 556,0, false);
//        }
      }
    }
}

Seeing this, I realize it still has some buggy leftovers, but I rewrote so much that I got a bit lost :wink:

Small victory update!

I have managed to record a correct soundfile from the mqtt broker!
Below is the small piece of code which sends the raw data to the broker:

    uint16_t buffer[256];

    while (true) {
      if (connected) {
        //NumberOfSamples() = kMicarrayBufferSize / kMicrophoneChannels = 2048 / 8 = 256
        for (uint32_t s = 0; s < mics.NumberOfSamples(); s++) {
            buffer[s] = mics.Beam(s);
        }
        //publish the 256 samples as 512 raw bytes
        esp_mqtt_publish("hermes/audioServer/voice/audioFrame", (uint8_t *)buffer, sizeof(buffer),0, false);
      }
    }

The buffer that is sent holds 512 bytes of raw data.
I record it with an MQTT client: on each message I put the payload in a buffer, and after a test of 3 seconds the data is saved as a WAV. This WAV file is good quality!
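
For reference, the frame size above works out as 256 samples × 2 bytes = 512 bytes. The sketch below is a host-side illustration, not the poster's code. It packs 16-bit samples into a byte buffer in little-endian order (the byte order WAV PCM data uses) and contrasts that with storing each sample in a single byte, as the earlier loop with `char input[556]` did. That keeps only the low byte of every sample and is one plausible cause of the noise. The helper names `pack_le16` and `truncate_to_bytes` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: serialize 16-bit PCM samples as little-endian bytes.
std::vector<uint8_t> pack_le16(const std::vector<int16_t>& samples) {
    std::vector<uint8_t> out;
    out.reserve(samples.size() * 2);
    for (int16_t s : samples) {
        const uint16_t u = static_cast<uint16_t>(s);
        out.push_back(static_cast<uint8_t>(u & 0xFF));         // low byte first
        out.push_back(static_cast<uint8_t>((u >> 8) & 0xFF));  // then high byte
    }
    return out;
}

// Hypothetical helper: what happens when each sample is squeezed into one byte.
std::vector<uint8_t> truncate_to_bytes(const std::vector<int16_t>& samples) {
    std::vector<uint8_t> out;
    out.reserve(samples.size());
    for (int16_t s : samples) {
        out.push_back(static_cast<uint8_t>(s));  // the high byte is discarded
    }
    return out;
}
```

On a little-endian target such as the ESP32, casting a `uint16_t` buffer to `uint8_t *` yields the same byte layout as `pack_le16`, which is why the raw buffer in the post can be published directly.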

The only thing left now is to put a wave header on every message, so every message is a small wave file 556 bytes long. Must be doable.
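
The 556 bytes are a 44-byte header plus 512 bytes of samples. As a rough sketch of how such a message could be assembled without hardcoding the header, the code below builds the canonical 44-byte PCM WAV header from its parameters and prepends it to one frame. It assumes 16000 Hz, mono, 16-bit audio and 256 samples per frame, as in the thread. The names `make_wav_header`, `make_wav_frame`, `put_le16` and `put_le32` are hypothetical and not part of the MATRIX HAL or esp-mqtt.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kHeaderSize = 44;
constexpr std::size_t kSamplesPerFrame = 256;
constexpr std::size_t kFrameSize = kHeaderSize + kSamplesPerFrame * 2;  // 556 bytes

// Write 16-bit and 32-bit values in little-endian order, as WAV requires.
void put_le16(uint8_t* p, uint16_t v) {
    p[0] = static_cast<uint8_t>(v & 0xFF);
    p[1] = static_cast<uint8_t>(v >> 8);
}
void put_le32(uint8_t* p, uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
}

// Hypothetical helper: canonical 44-byte header for uncompressed PCM.
std::array<uint8_t, kHeaderSize> make_wav_header(uint32_t sample_rate, uint16_t channels,
                                                 uint16_t bits, uint32_t data_bytes) {
    std::array<uint8_t, kHeaderSize> h{};
    const uint16_t block_align = static_cast<uint16_t>(channels * bits / 8);
    std::memcpy(&h[0], "RIFF", 4);
    put_le32(&h[4], 36 + data_bytes);             // RIFF chunk size
    std::memcpy(&h[8], "WAVEfmt ", 8);
    put_le32(&h[16], 16);                         // fmt chunk size
    put_le16(&h[20], 1);                          // audio format 1 = PCM
    put_le16(&h[22], channels);
    put_le32(&h[24], sample_rate);
    put_le32(&h[28], sample_rate * block_align);  // byte rate
    put_le16(&h[32], block_align);
    put_le16(&h[34], bits);
    std::memcpy(&h[36], "data", 4);
    put_le32(&h[40], data_bytes);                 // data chunk size
    return h;
}

// Hypothetical helper: header followed by one frame of little-endian samples.
std::array<uint8_t, kFrameSize> make_wav_frame(
        const std::array<int16_t, kSamplesPerFrame>& samples) {
    std::array<uint8_t, kFrameSize> frame{};
    const auto header = make_wav_header(16000, 1, 16,
                                        static_cast<uint32_t>(kSamplesPerFrame * 2));
    std::memcpy(frame.data(), header.data(), kHeaderSize);
    for (std::size_t i = 0; i < kSamplesPerFrame; ++i) {
        put_le16(&frame[kHeaderSize + 2 * i], static_cast<uint16_t>(samples[i]));
    }
    return frame;
}
```

With these parameters `make_wav_header(16000, 1, 16, 512)` yields the same 44 bytes as the hardcoded `header` array posted earlier in the thread, so the resulting frame could be handed to the publish call in place of the raw buffer.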


Great job! Maybe, for performance, it would make more sense to convert or add the headers on the server side, or you could write your own MQTT component that does this.

I think I will create an option in the config: raw data or with a wave header.
With a wave header, you need no extra configuration for Snips.
Without it, the output is more versatile and you can do with the raw audio as you please.
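
A minimal sketch of what such an option could look like, assuming the header is built once elsewhere and each frame arrives as raw PCM bytes. The `PayloadFormat` enum and `build_payload` function are hypothetical names for illustration, not taken from the poster's code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical config switch: publish raw PCM or a small WAV file per message.
enum class PayloadFormat { Raw, Wav };

// Build the bytes to publish for one frame.
// wav_header is only used when format == PayloadFormat::Wav.
std::vector<uint8_t> build_payload(PayloadFormat format,
                                   const std::array<uint8_t, 44>& wav_header,
                                   const std::vector<uint8_t>& pcm_bytes) {
    std::vector<uint8_t> payload;
    payload.reserve(wav_header.size() + pcm_bytes.size());
    if (format == PayloadFormat::Wav) {
        // Prepend the header so the message is a self-contained WAV file.
        payload.insert(payload.end(), wav_header.begin(), wav_header.end());
    }
    payload.insert(payload.end(), pcm_bytes.begin(), pcm_bytes.end());
    return payload;
}
```

For a 512-byte frame this gives a 512-byte payload in raw mode and a 556-byte payload in WAV mode, matching the two sizes mentioned in the thread.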

I have successfully created the audio server now, see my video.

I need to work on my english pronunciation …


Fantastic! Let us know when you get some code up. Hopefully my matrix will accept a flash again…

Yes, the code is a big mess right now.
I will post it on my repo when I’ve done my housekeeping. I am not a C++ developer, so things can probably be improved a lot :slight_smile:

Here is the first version of the code: