Matrix Voice ESP32 MQTT Audio Streamer

I don’t think you are using my repo, right? Mine has no updater.cpp.

But the main issue is that the guide has some partition settings which are too small for the Streamer.
That is because the Streamer has more functionality, so the bin file is bigger.
I suggest resetting the Matrix Voice and using the partitions from the Streamer repo.
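
As an illustration of the kind of table that gives the app enough room, this is roughly the Arduino min_spiffs layout for 4 MB flash, with two ~1.9 MB OTA slots (just an example; the exact partitions file from the Streamer repo takes precedence):

# Name,   Type, SubType, Offset,   Size
nvs,      data, nvs,     0x9000,   0x5000
otadata,  data, ota,     0xe000,   0x2000
app0,     app,  ota_0,   0x10000,  0x1E0000
app1,     app,  ota_1,   0x1F0000, 0x1E0000
spiffs,   data, spiffs,  0x3D0000, 0x30000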

Let me know if you need some guidance

Thank you for your reply.

You are correct. Per your suggestion, I reset the Matrix and tried the steps outlined here; however, I am having problems with the setup directions (I am new to VS Code and PlatformIO, which may be the biggest problem).

Setup:

VS Code (Windows 10) with PlatformIO

RPI-3b with Raspbian Buster Lite

When I run deploy.sh from a VS Code terminal I get the following error:

VS Code Explorer does not show the OTABuilder folder (the OTABuilder folder exists, but the build folder does not). To open the project, I have to select the PlatformIO folder where platformio.ini resides.

Thank you for your assistance and patience.

Ah, I see: you did not build the OTA, but the same bootloader.bin can be found in the MatrixVoiceAudioServer folder.
You can change the path in the deploy script, or create the folders and copy the bootloader bin as specified in the script.
That is a bit sloppy, sorry for that.

I have the Matrix set up, and everything is working except the sound output (I have a pair of speakers plugged into the jack).

I have tried both {"amp_output":"0"} and {"amp_output":"1"} with the same results.

The serial monitor outputs “Play Audio” for each playBytes received.

MQTT monitor:

Thanks!

Well, I see the playFinished message as well, so the audio is being played somewhere.
It corresponds with the last playBytes message in this case.

The problem here is that you have attached the Matrix Voice to your Pi; I can derive that from the fact that you mention the serial monitor outputs “Play Audio”.
When the Matrix Voice is attached, the jack will not work (I do not know why).

Please try removing it from the Pi and powering it via a USB cable.

For the jack, use {"amp_output":"1"}. (Actually, anything other than "0" will do.)
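
If you want to send that setting from code instead of a manual MQTT client, a minimal Node.js sketch with the mqtt package could look like this. The broker address and topic are placeholders here; check the repo’s README for the exact topic the streamer subscribes to:

var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://192.168.1.10'); // your broker

client.on('connect', function() {
    // "1" (or anything other than "0") routes playback to the jack
    client.publish('matrixvoice/audio', JSON.stringify({ amp_output: '1' }));
    client.end();
});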

Removing it from the Pi fixed it.

Thank you very much!

The Audio Streamer performs very well with Rhasspy, except that the jack audio output is very low and has a crackling noise. I have tried several different speakers, with the same noise on all of them. Any suggestions?
Thanks.

You mean a crackling noise when playing audio or a constant noise?

Crackling noise when playing audio.

Ah OK, it kind of depends on your network connection as well.
Also, if the audio is not 44100 Hz 16-bit, the code needs to resample it. That also leads to some quality issues.
It is not the fastest hardware and software for high-quality audio streaming.

Hi skorol,

I was busy implementing websockets.
But I am not sure how to proceed: I can send audio to the server, but is there some other tool to capture the results?
The Matrix Voice only handles the output of the audio; should it somehow respond to the result as well? What was your idea on this?

Hi @Romkabouter,

You can test it with the following web socket server: https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py It’s based on Kaldi and requires Python 3. Models for different languages can be downloaded here: https://alphacephei.com/vosk/models. Just unzip the archive and specify the path to your model via an env var or directly here: https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L21.

There’s a Docker image as well. By default it’s packed with a Russian model, so if you want to use your own, you need to rebuild it from their Dockerfiles.

The Python code is small and pretty straightforward, but let me know if you have any questions.

Regarding the output: vosk-server returns JSON with intermediate and final results. You can see the detailed output by running the different code snippets from https://github.com/alphacep/vosk-api/tree/master/python/example (yes, they can be executed without a server; just install the vosk PyPI package).

Anyway, they have good documentation: https://alphacephei.com/vosk/

Also note that it’s fully open source.
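
If it helps, here is a rough Node.js equivalent of their test.py client, using the ws package (npm install ws). It streams a raw PCM file and prints the JSON results; the file name is a placeholder, and the audio must match the model’s expected format (16 kHz, 16-bit, mono for most models):

var WebSocket = require('ws');
var fs = require('fs');

var ws = new WebSocket('ws://localhost:2700');

ws.on('open', function() {
    var audio = fs.readFileSync('test16k.raw');
    var chunkSize = 8000; // ~0.25 s of 16 kHz 16-bit mono audio
    for (var i = 0; i < audio.length; i += chunkSize) {
        ws.send(audio.slice(i, i + chunkSize));
    }
    ws.send('{"eof" : 1}'); // ask the server to flush the final result
});

ws.on('message', function(data) {
    console.log(data.toString()); // JSON with "partial" or final "text"
});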

Yes, I have been testing with the Docker image, because pip3 install vosk did not work (no package).
The server accepts data and then sends results back. This worked OK with the Docker image (kaldi-en) and test.py; I got output, so I know it works.

So, if the Matrix Voice sends data to the socket, it should also accept the response coming from the vosk-server.
My question to you: what should the Matrix Voice do with the response coming from the server?
It cannot just be a remote microphone, unless you change the vosk-server to broadcast the results.

I have tried with a small webpage connected to the same server, but running test.py only returned results to the test.py socket and not to the “standalone” socket.
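
One way around that could be a small relay that sits between the Matrix Voice and the vosk-server and fans every result out to all connected sockets. A rough sketch with the ws package; the ports are arbitrary and it skips reconnection handling:

var WebSocket = require('ws');

var vosk = new WebSocket('ws://localhost:2700');   // upstream vosk-server
var relay = new WebSocket.Server({ port: 2701 });  // clients connect here

relay.on('connection', function(client) {
    // Binary frames from any client (e.g. the Matrix Voice) go to vosk
    client.on('message', function(data, isBinary) {
        if (isBinary && vosk.readyState === WebSocket.OPEN) {
            vosk.send(data);
        }
    });
});

// Every result from the vosk-server is broadcast to all connected clients
vosk.on('message', function(result) {
    relay.clients.forEach(function(client) {
        if (client.readyState === WebSocket.OPEN) {
            client.send(result.toString());
        }
    });
});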

I use the Matrix Voice to control IoT devices. If we receive the transcription directly in the response from the web socket server, we can remove the additional MQTT logic and immediately take control of sensors connected to the Matrix Voice, e.g. RF/IR modules.

Regarding the vosk package: if you tried it on an ARM architecture, then yes, it’s missing and has to be built manually.

I am running it on amd64, but the Docker image is fine :slight_smile:

So, regarding the response, what is your suggestion? Do you have the Matrix Voice running standalone, controlling IoT devices? If so, how?
Also, I do not know how to control those sensors via the ESP32 and GPIO pins.

I will first try to find out how to stream audio over a websocket to a simple server; that way I can make sure that part works.
We can move on from there.

I used the following official pinout to connect the RF sensor to the ESP32:

IO12, 5V, and GND work fine with the following modules:

But I’d also buy or make an antenna for the transmitter to increase the range.

As an RF control library, I used https://github.com/sui77/rc-switch

For lights control, I bought the following RF switch and controller: https://aliexpress.com/item/33004861185.html?spm=a2g0s.9042311.0.0.19a84c4dwmywdV

This setup allows me to control the lights both by hand and by voice.

Looks nice, but I will leave the coding for that to you, since I do not have those devices.

What I want to do is send audio and capture the response. I will try to find a way to do that which can be adapted to fit your needs.

Hi Romkabouter,

Thanks for creating this, it’s so cool!

I posted here; basically, I think my question is: does the MQTT Audio Streamer code allow a platform like Home Assistant to manipulate the Everloop LEDs through MQTT or something similar?

Thanks.
Richard

Sure it can; the code already accepts a variety of messages (one of them on the topic /everloop).
In there, you can add code to handle messages for updating the LEDs. Currently you can change some colors.

There is already an everloopTask, which probably needs adjustments too.
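
As a starting point, publishing to that topic from the Home Assistant side could look like the sketch below (Node.js mqtt package). The topic prefix and the payload fields are assumptions on my part; check the everloop handling in the code for what it actually parses:

var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://homeassistant.local'); // your broker

client.on('connect', function() {
    client.publish('matrixvoice/everloop', JSON.stringify({
        brightness: 15,
        idle: [0, 0, 255, 0] // hypothetical RGBW color for the idle state
    }));
    client.end();
});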

Hi there,

I have committed some changes to https://github.com/Romkabouter/Matrix-Voice-ESP32-MQTT-Audio-Streamer/tree/websockets

It is not finished or anything, but I have successfully made a connection with this server:

running on my local Mac. I have changed the index.js to use this library:

This is the code:

#!/usr/bin/env node
// Minimal test server: accepts websocket connections from the Matrix Voice
// and plays incoming binary audio on the local speaker.
// Requires: npm install websocket speaker
var WebSocketServer = require('websocket').server;
var http = require('http');
const Speaker = require('speaker');
 
var server = http.createServer(function(request, response) {
    console.log((new Date()) + ' Received request for ' + request.url);
    response.writeHead(404);
    response.end();
});
server.listen(2700, function() {
    console.log((new Date()) + ' Server is listening on port 2700');
});
 
var wsServer = new WebSocketServer({
    httpServer: server,
    // You should not use autoAcceptConnections for production 
    // applications, as it defeats all standard cross-origin protection 
    // facilities built into the protocol and the browser.  You should 
    // *always* verify the connection's origin and decide whether or not 
    // to accept it. 
    autoAcceptConnections: false
});

// Create the Speaker instance
const speaker = new Speaker({
    channels: 1,          // mono
    bitDepth: 16,         // 16-bit samples
    sampleRate: 16000     // 16 kHz sample rate
});
 
function originIsAllowed(origin) {
  // put logic here to detect whether the specified origin is allowed. 
  return true;
}
 
wsServer.on('request', function(request) {

    if (!originIsAllowed(request.origin)) {
      // Make sure we only accept requests from an allowed origin 
      request.reject();
      console.log((new Date()) + ' Connection from origin ' + request.origin + ' rejected.');
      return;
    }
    
    var connection = request.accept('arduino', request.origin);
    console.log((new Date()) + ' Connection accepted.');
    
    connection.on('message', function(message) {
        if (message.type === 'utf8') {
            console.log('Received Message: ' + message.utf8Data);
           // connection.sendUTF(message.utf8Data);
        }
        else if (message.type === 'binary') {
            console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');
           //connection.sendBytes(message.binaryData);
           speaker.write(message.binaryData);
        }
    });
    
    connection.on('close', function(reasonCode, description) {
        console.log((new Date()) + ' Peer ' + connection.remoteAddress + ' disconnected.');
    });

    connection.sendUTF("Hallo Client!");
});

It outputs the binary messages to the speaker of my Mac. That works, because I hear the sounds echoing :slight_smile:

After a short while, however, it stops sending data. No idea why yet.
I get tons of “…/deps/mpg123/src/output/coreaudio.c:81] warning: Didn’t have any audio data in callback (buffer underflow)” errors.
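
If that underflow means the ESP32 delivers audio more slowly than the Mac plays it, pre-buffering before the first write might smooth things out. A sketch only, reusing the speaker instance from the code above; the threshold is a guess:

// Collect chunks until we have ~0.5 s of 16 kHz 16-bit mono audio,
// then start writing, so the speaker never starts out starved.
var pending = [];
var started = false;
var PREBUFFER_BYTES = 16000;

function onAudioChunk(chunk) {
    if (started) {
        speaker.write(chunk);
        return;
    }
    pending.push(chunk);
    var total = pending.reduce(function(n, b) { return n + b.length; }, 0);
    if (total >= PREBUFFER_BYTES) {
        speaker.write(Buffer.concat(pending));
        pending = [];
        started = true;
    }
}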

But still, it is a start :slight_smile:
