Matrix Voice ESP32 MQTT Audio Streamer

Thank you. This was very helpful.

I was able to get the Matrix Voice working with Rhasspy + Home Assistant via MQTT, using either the Rhasspy wake word or the on-device wake word; both work. However, there is a very long delay between when I speak and when the action occurs. The wake word triggers almost immediately and the lights on the Matrix Voice turn green, but they stay green for 15-20 seconds (and I can see a constant stream of voice data passing over MQTT) before the action finally happens in Home Assistant and the lights go back to blue. In your demo it appeared to happen very quickly.

Anyway, this may be a Rhasspy problem or a problem with my setup, and I will continue to troubleshoot. If anyone has any ideas, let me know; I'll post back if I'm able to solve the issue.

Check your Rhasspy logs. I had something similar, and it appeared to be caused by a timeout in the silence detection.

After I restarted Rhasspy, however, the issue never came back.

Hi @Romkabouter,

Does your solution support sending the audio stream via websockets? If not, would it be complicated to add such support? I have a Docker image with a Kaldi + websockets server, so I'm wondering if I can reuse your code for this.

Thanks

Hi there,

The streamer currently only sends audio messages over MQTT. Your idea is a good one, so I will go and investigate it :slight_smile:
I have little experience with websockets :wink:

Hi Romkabouter,
I have successfully installed the example from this guide using PlatformIO.
However, when I try to upload the MQTT Audio Streamer it fails. The serial monitor outputs the following:


Any suggestions?
Thanks

I don’t think you are using my repo, right?
My repo has no updater.cpp.

But the main issue is that the guide uses partition settings that are too small for the streamer.
The streamer has more functionality, so its bin file is bigger.
I suggest resetting the Matrix Voice and using the partition table from the Streamer repo.
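For reference, PlatformIO picks up a custom partition table through `board_build.partitions = partitions.csv` in platformio.ini, and a layout along the lines of the Arduino `min_spiffs` scheme leaves the two OTA app partitions large enough for a bigger bin. Treat the values below as a sketch only; the authoritative file is the one shipped with the Streamer repo.

```csv
# partitions.csv -- roughly the Arduino "min_spiffs" layout for 4 MB flash (assumption,
# use the file from the Streamer repo where it differs)
# Name,   Type, SubType, Offset,   Size
nvs,      data, nvs,     0x9000,   0x5000
otadata,  data, ota,     0xe000,   0x2000
app0,     app,  ota_0,   0x10000,  0x1E0000
app1,     app,  ota_1,   0x1F0000, 0x1E0000
spiffs,   data, spiffs,  0x3D0000, 0x30000
```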

Let me know if you need some guidance

Thank you for your reply.

You are correct. Per your suggestion, I reset the Matrix and tried the steps outlined here, but I am having problems with the setup directions (I am new to VS Code and PlatformIO, which may be the biggest problem).

Setup:

VS Code (Windows 10) with PlatformIO

RPI-3b with Raspbian Buster Lite

When I run deploy.sh from a VS Code terminal I get the following error:

VS Code Explorer does not show the OTABuilder folder (the OTABuilder folder exists, but the build folder does not). To open the project, I have to select the PlatformIO folder where platformio.ini resides.

Thank you for your assistance and patience.

Ah, I see you did not build the OTA version, but the same bootloader.bin can be found in the MatrixVoiceAudioServer folder.
You can change the path in the deploy script, or create the folders and copy the bootloader bin to the location the script expects.
That is a bit sloppy, sorry about that.

I have the Matrix set up, and everything is working except the sound output (I have a pair of speakers plugged into the jack).

I have tried both {"amp_output":"0"} and {"amp_output":"1"} with the same results.

The serial monitor outputs “Play Audio” for each playBytes received.

MQTT monitor:

Thanks!

Well, I see the playFinished message as well, so the audio is being played somewhere.
It corresponds with the last playBytes message in this case.

The problem here is that you have the Matrix Voice attached to your Pi; I can tell because you mention that the serial monitor outputs “Play Audio”.
When the Matrix Voice is attached, the jack will not work (I do not know why).

Please try removing it from the Pi and powering it via a USB cable.

For jack, use {"amp_output":"1"}. (Actually, anything other than "0" will do.)
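If you want to send that setting from a script rather than an MQTT GUI client, something like the following works with paho-mqtt. The topic and broker address are placeholders, not the streamer's real values; check the repo for the config topic your build subscribes to.

```python
# Sketch: publish the amp_output setting over MQTT using paho-mqtt.
# CONFIG_TOPIC and the broker address are placeholders (assumptions), not the
# actual topic from the Streamer repo.
import json
import paho.mqtt.publish as publish

CONFIG_TOPIC = "matrixvoice/config"  # placeholder topic

publish.single(
    CONFIG_TOPIC,
    payload=json.dumps({"amp_output": "1"}),  # "1" routes playback to the jack (see above)
    hostname="192.168.1.10",                  # your MQTT broker
)
```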

Removing it from the Pi fixed it.

Thank you very much!

The Audio Streamer performs very well with Rhasspy, except that the jack audio output is very quiet and has a crackling noise. I have tried several different speakers, with the same noise on all of them. Any suggestions?
Thanks.

Do you mean a crackling noise when playing audio, or a constant noise?

Crackling noise when playing audio.

Ah OK, it also depends somewhat on your network connection.
Also, if the audio is not 44100 Hz / 16-bit, the code needs to resample it, which causes some quality loss as well.
It is not the fastest hardware and software for high-quality audio streaming.
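If the crackle is mostly down to the resampling, one thing to try is feeding the streamer audio that is already 44100 Hz / 16-bit so it can skip the conversion. A rough sketch using only the Python standard library (audioop ships with Python up to 3.12; file names are placeholders):

```python
# Sketch: convert a WAV to 44100 Hz / 16-bit before it is sent to the device.
import audioop
import wave

with wave.open("input.wav", "rb") as src:
    params = src.getparams()
    frames = src.readframes(params.nframes)

# Force 16-bit samples (2 bytes) if the source uses another width.
if params.sampwidth != 2:
    frames = audioop.lin2lin(frames, params.sampwidth, 2)

# Resample to 44100 Hz if needed.
if params.framerate != 44100:
    frames, _ = audioop.ratecv(frames, 2, params.nchannels, params.framerate, 44100, None)

with wave.open("output_44100.wav", "wb") as dst:
    dst.setnchannels(params.nchannels)
    dst.setsampwidth(2)
    dst.setframerate(44100)
    dst.writeframes(frames)
```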

Hi skorol,

I have been busy implementing websockets.
But I am not sure how to proceed: I can send audio to the server, but is there some other tool to capture the results?
The Matrix Voice only handles the output of the audio; should it somehow respond to the result as well? What was your idea on this?

Hi @Romkabouter,

You can test it with the following websocket server: https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py It’s based on Kaldi and requires Python 3. Models for different languages can be downloaded here: https://alphacephei.com/vosk/models. Just unzip the archive and specify the path to your model via an environment variable or directly here: https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L21.

There’s a Docker image as well. By default it’s packed with a Russian model, so if you want to use your own, you need to rebuild it from their Dockerfiles.

The Python code is small and pretty straightforward, but let me know if you have any questions.

Regarding the output: vosk-server returns JSON with intermediate and final results. You can see the detailed output by running the code snippets from https://github.com/alphacep/vosk-api/tree/master/python/example (yes, they can be run without a server; just install the vosk PyPI package).

Anyway, they have good documentation: https://alphacephei.com/vosk/

Also note that it’s fully open source.
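In case it helps while experimenting: the protocol the test scripts use is simply "send binary PCM chunks, read a JSON result after each one, send an eof message, read the final result". A minimal client along those lines (file name, chunk size, and server address are examples; asr_server.py listens on port 2700 by default as far as I recall):

```python
# Sketch: stream a 16-bit mono WAV to a vosk-server instance and print the JSON results.
import asyncio
import wave

import websockets


async def transcribe(path: str, uri: str = "ws://localhost:2700"):
    async with websockets.connect(uri) as ws:
        with wave.open(path, "rb") as wf:
            while True:
                chunk = wf.readframes(4000)   # raw PCM frames
                if not chunk:
                    break
                await ws.send(chunk)          # binary audio chunk
                print(await ws.recv())        # partial/intermediate result (JSON)
        await ws.send('{"eof" : 1}')          # tell the server the stream is finished
        print(await ws.recv())                # final result (JSON)


asyncio.run(transcribe("test.wav"))
```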

Yes, I have been testing with the Docker image, because pip3 install vosk did not work (no package).
The server accepts data and then sends results back. This worked fine with the Docker image (kaldi-en) and test.py; I got output, so I know it works.

So, if the Matrix Voice sends data to the socket, it should also accept the response coming from the vosk-server.
My question to you: what should the Matrix Voice do with the response coming from the server?
It cannot just be a remote microphone, unless you change the vosk-server to broadcast the results.

I have tried with a small webpage connected to the same server, but running test.py only returned results to the test.py socket and not to the “standalone” socket.

I use the Matrix Voice to control IoT devices. If we receive the transcription directly in the response from the websocket server, we can remove the additional MQTT logic and immediately take control of the sensors connected to the Matrix Voice, e.g. RF/IR modules.

Regarding the vosk package: if you tried it on an ARM architecture, then yes, it’s missing and has to be built manually.

I am running it on amd64, but the Docker image works fine :slight_smile:

So, regarding the response: what is your suggestion? Do you have the Matrix Voice running standalone, controlling IoT devices? If so, how?
Also, I do not know how to control those sensors via the ESP32 and its GPIO pins.

I will first try to find out how to stream audio over a websocket to a simple server; that way I can make sure that part works.
We can move on from there.
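For that first step, a bare-bones sink to point the ESP32 at, just to confirm the audio chunks arrive, could look like this (assumes a recent version of the Python websockets package; the port is arbitrary):

```python
# Sketch: a websocket "sink" that counts incoming binary audio bytes and acknowledges them.
import asyncio

import websockets


async def handle(ws):
    total = 0
    async for message in ws:
        if isinstance(message, bytes):
            total += len(message)
            await ws.send('{"received": %d}' % total)  # simple acknowledgement
        else:
            print("text frame:", message)
    print("connection closed, %d bytes of audio received" % total)


async def main():
    async with websockets.serve(handle, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled


asyncio.run(main())
```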