Matrix PCM system crash

For days now I’ve been struggling with a crashing program and I think I’ve narrowed it down to the matrix PCM. I’m hoping someone might be able to help a bit.

System: Raspberry Pi 3B+
Matrix Creator
OS: Latest Raspbian Stretch (fully updated and upgraded)
Matrix Kernel and HAL installed.

I’m running a combination of Google Assistant and Snowboy (merged together, both C++), using the ALSA implementation as Google uses it in their C++ example. So no Portaudio or anything.
I have disabled all my threads in my attempt to narrow it down. The only thread I have left is one thread (a lambda function) that gets triggered whenever audio data is available. Normally, this thread would pass the data to Snowboy, but I’ve disabled that to narrow the problem down.

The problem I face is that the system will freeze after “x” time of running. Sometimes hours, sometimes minutes. The last time it froze, I was able to get a trace from /var/log/kern.log:

Aug 20 17:25:00 UnityMain kernel: [  300.263784] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 20 17:25:00 UnityMain kernel: [  300.263804] 	3-...: (1 GPs behind) idle=84e/140000000000000/0 softirq=4728/4759 fqs=1050 
Aug 20 17:25:00 UnityMain kernel: [  300.263806] 	(detected by 0, t=2102 jiffies, g=1229, c=1228, q=87)
Aug 20 17:25:00 UnityMain kernel: [  300.263817] Sending NMI from CPU 0 to CPUs 3:
Aug 20 17:25:00 UnityMain kernel: [  316.001766] brcmfmac: brcmf_do_escan: error (-110)
Aug 20 17:25:00 UnityMain kernel: [  316.001773] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Aug 20 17:25:00 UnityMain kernel: [  363.308412] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 20 17:25:00 UnityMain kernel: [  363.308430] 	3-...: (1 GPs behind) idle=84e/140000000000000/0 softirq=4728/4759 fqs=3704 
Aug 20 17:25:00 UnityMain kernel: [  363.308432] 	(detected by 2, t=8407 jiffies, g=1229, c=1228, q=1269)
Aug 20 17:25:00 UnityMain kernel: [  363.308442] Sending NMI from CPU 2 to CPUs 3:
Aug 20 17:25:00 UnityMain kernel: [  376.007936] brcmfmac: brcmf_do_escan: error (-110)
Aug 20 17:25:00 UnityMain kernel: [  376.007945] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Aug 20 17:25:00 UnityMain kernel: [  394.297605] NMI backtrace for cpu 3
Aug 20 17:25:00 UnityMain kernel: [  394.297615] CPU: 3 PID: 162 Comm: kworker/u8:2 Tainted: G         C O    4.14.62-v7+ #1134
Aug 20 17:25:00 UnityMain kernel: [  394.297616] Hardware name: BCM2835
Aug 20 17:25:00 UnityMain kernel: [  394.297633] Workqueue: matrixio_pcm matrixio_pcm_capture_work [matrixio_pcm]
Aug 20 17:25:00 UnityMain kernel: [  394.297637] task: b7b58f00 task.stack: b7b7a000
Aug 20 17:25:00 UnityMain kernel: [  394.297645] PC is at __schedule+0x7c0/0x968
Aug 20 17:25:00 UnityMain kernel: [  394.297648] LR is at __schedule+0x7bc/0x968
Aug 20 17:25:00 UnityMain kernel: [  394.297650] pc : [<8079d6c8>]    lr : [<8079d6c4>]    psr: 40000013
Aug 20 17:25:00 UnityMain kernel: [  394.297652] sp : b7b7bd68  ip : 80c04174  fp : b7b7bdc4
Aug 20 17:25:00 UnityMain kernel: [  394.297654] r10: 80b8ed40  r9 : 80b8ed40  r8 : b7b59364
Aug 20 17:25:00 UnityMain kernel: [  394.297656] r7 : 00000001  r6 : b7b58f00  r5 : b7b58f00  r4 : ba382d40
Aug 20 17:25:00 UnityMain kernel: [  394.297659] r3 : 00000000  r2 : 397f4000  r1 : b7b7a000  r0 : 00000000
Aug 20 17:25:00 UnityMain kernel: [  394.297663] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Aug 20 17:25:00 UnityMain kernel: [  394.297665] Control: 10c5383d  Table: 2cde406a  DAC: 00000055
Aug 20 17:25:00 UnityMain kernel: [  394.297668] CPU: 3 PID: 162 Comm: kworker/u8:2 Tainted: G         C O    4.14.62-v7+ #1134
Aug 20 17:25:00 UnityMain kernel: [  394.297670] Hardware name: BCM2835
Aug 20 17:25:00 UnityMain kernel: [  394.297675] Workqueue: matrixio_pcm matrixio_pcm_capture_work [matrixio_pcm]
Aug 20 17:25:00 UnityMain kernel: [  394.297692] [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
Aug 20 17:25:00 UnityMain kernel: [  394.297698] [<8010c240>] (show_stack) from [<80787284>] (dump_stack+0xd4/0x118)
Aug 20 17:25:00 UnityMain kernel: [  394.297708] [<80787284>] (dump_stack) from [<80108cd8>] (show_regs+0x1c/0x20)
Aug 20 17:25:00 UnityMain kernel: [  394.297716] [<80108cd8>] (show_regs) from [<8078d0c0>] (nmi_cpu_backtrace+0xc0/0x128)
Aug 20 17:25:00 UnityMain kernel: [  394.297723] [<8078d0c0>] (nmi_cpu_backtrace) from [<8010e2c8>] (handle_IPI+0x12c/0x33c)
Aug 20 17:25:00 UnityMain kernel: [  394.297730] [<8010e2c8>] (handle_IPI) from [<801014d8>] (bcm2836_arm_irqchip_handle_irq+0x7c/0xac)
Aug 20 17:25:00 UnityMain kernel: [  394.297738] [<801014d8>] (bcm2836_arm_irqchip_handle_irq) from [<807a2e7c>] (__irq_svc+0x5c/0x7c)
Aug 20 17:25:00 UnityMain kernel: [  394.297740] Exception stack(0xb7b7bd18 to 0xb7b7bd60)
Aug 20 17:25:00 UnityMain kernel: [  394.297743] bd00:                                                       00000000 b7b7a000
Aug 20 17:25:00 UnityMain kernel: [  394.297747] bd20: 397f4000 00000000 ba382d40 b7b58f00 b7b58f00 00000001 b7b59364 80b8ed40
Aug 20 17:25:00 UnityMain kernel: [  394.297752] bd40: 80b8ed40 b7b7bdc4 80c04174 b7b7bd68 8079d6c4 8079d6c8 40000013 ffffffff
Aug 20 17:25:00 UnityMain kernel: [  394.297759] [<807a2e7c>] (__irq_svc) from [<8079d6c8>] (__schedule+0x7c0/0x968)
Aug 20 17:25:00 UnityMain kernel: [  394.297765] [<8079d6c8>] (__schedule) from [<8079dbbc>] (preempt_schedule_common+0x20/0x30)
Aug 20 17:25:00 UnityMain kernel: [  394.297771] [<8079dbbc>] (preempt_schedule_common) from [<8079dc14>] (_cond_resched+0x48/0x50)
Aug 20 17:25:00 UnityMain kernel: [  394.297777] [<8079dc14>] (_cond_resched) from [<8079fb78>] (mutex_lock+0x1c/0x60)
Aug 20 17:25:00 UnityMain kernel: [  394.297785] [<8079fb78>] (mutex_lock) from [<7f5871a4>] (matrixio_pcm_pointer+0x2c/0x58 [matrixio_pcm])
Aug 20 17:25:00 UnityMain kernel: [  394.297870] [<7f5871a4>] (matrixio_pcm_pointer [matrixio_pcm]) from [<7f4ffc8c>] (soc_pcm_pointer+0x48/0xec [snd_soc_core])
Aug 20 17:25:00 UnityMain kernel: [  394.297945] [<7f4ffc8c>] (soc_pcm_pointer [snd_soc_core]) from [<7f0d3b0c>] (snd_pcm_update_hw_ptr0+0x38/0x3b0 [snd_pcm])
Aug 20 17:25:00 UnityMain kernel: [  394.297993] [<7f0d3b0c>] (snd_pcm_update_hw_ptr0 [snd_pcm]) from [<7f0d3f08>] (snd_pcm_period_elapsed+0x84/0xac [snd_pcm])
Aug 20 17:25:00 UnityMain kernel: [  394.298022] [<7f0d3f08>] (snd_pcm_period_elapsed [snd_pcm]) from [<7f58747c>] (matrixio_pcm_capture_work+0x88/0x90 [matrixio_pcm])
Aug 20 17:25:00 UnityMain kernel: [  394.298032] [<7f58747c>] (matrixio_pcm_capture_work [matrixio_pcm]) from [<801375b8>] (process_one_work+0x158/0x454)
Aug 20 17:25:00 UnityMain kernel: [  394.298040] [<801375b8>] (process_one_work) from [<80137918>] (worker_thread+0x64/0x5b8)
Aug 20 17:25:00 UnityMain kernel: [  394.298048] [<80137918>] (worker_thread) from [<8013d99c>] (kthread+0x13c/0x16c)
Aug 20 17:25:00 UnityMain kernel: [  394.298055] [<8013d99c>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)

It looks like the PCM driver causes a crash here? Any clue to what might cause this? The code for the ALSA handling is just what Google uses in its example:

Interesting @IronClaw, thanks for also sharing the log.

Will try to reproduce and will let you know the results.

Thanks again.

Thanks @yoelrc88. Any help would be much appreciated. If you need more of my code, let me know.

I basically use the google system to add a datalistener:

audio_input.reset(new AudioInputALSA());

  audio_input->AddDataListener(
    [stream, &request](std::shared_ptr<std::vector<unsigned char>> data) {
      request.set_audio_in(&((*data)[0]), data->size());
      stream->Write(request);
  });
  audio_input->AddStopListener([stream]() {
    stream->WritesDone();
  });
  audio_input->Start();

AudioOutputALSA audio_output;
audio_output.Start();

The inside of that lambda-function I replaced with a function to call snowboy (also replaying the unsigned char with an int16_t, since that is what snowboy expects). However, even without calling on snowboy, the crashes occur.
The lambda-funtion gets triggerd via the addlistener, so every time data comes through the ALSA driver, the lambda is triggered.

OK thank , I will try to reproduce it just by used the regular google assistant code.

Did you ever get the crash with the Google Assistant code without your additions?

If I remember correctly, that also crashed after a while… But that is a good one to try again, just to narrow down a bit more.

Tested it with Google, but just for a bit, since continuously sending data to google can get costly…
Crashed after 12 minutes.

It seems totally independent of what I do with the actual data fróm the mic. Just having the mic open for a prolonged time seems to crash.

OK, good info, Thanks @IronClaw.

I will get back to you on this.

Update to make it easier to reproduce…
I’ve made a clean install and a small test program. With this, I think you should be able to recreate my setup.

  • Clean Rasbian Stretch install (Tested on Pi 3B and 3B+ )
  • update & upgrade
  • install Matrix Kernel Modules
  • Install Matrix HAL (maybe not needed for this test, but I need it in the real version, so would like to test with it)
  • Install: sudo apt-get install libasound2-dev (needed for ALSA)

Here is my asound.conf:

pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}

pcm.mic {
   type softvol
   slave {
      pcm "array"
   }
   control {
      name "MicArray Master"
          card 0
   }
}

pcm.speaker {
    type plug
    slave {
       pcm "hw:0,0"
    }
}

pcm.array {
    type plug
    slave {
      pcm "hw:2,0"
    }
}

Here is the test code:

/* Test Code*/
#include <alsa/asoundlib.h>
#include <chrono>
#include <iostream>
#include <vector>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

int main(){

// For 16000Hz, it's about 0.1 second.
static constexpr int kFramesPerPacket = 1600;
// 1 channel, S16LE, so 2 bytes each frame.
static constexpr int kBytesPerFrame = 2;

// Initialize.
snd_pcm_t* pcm_handle;
int pcm_open_ret = snd_pcm_open(&pcm_handle, "default", SND_PCM_STREAM_CAPTURE, 0);
if (pcm_open_ret < 0) {
  std::cerr << "snd_pcm_open returned " << pcm_open_ret << std::endl;
  return 0;
}
int pcm_nonblock_ret = snd_pcm_nonblock(pcm_handle, SND_PCM_NONBLOCK);
if (pcm_nonblock_ret < 0) {
  std::cerr << "snd_pcm_nonblock returned " << pcm_nonblock_ret << std::endl;
  return 1;
}
snd_pcm_hw_params_t* pcm_params;
int malloc_param_ret = snd_pcm_hw_params_malloc(&pcm_params);
if (malloc_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params_malloc returned " << malloc_param_ret
      << std::endl;
  return 2;
}
snd_pcm_hw_params_any(pcm_handle, pcm_params);
int set_param_ret =
    snd_pcm_hw_params_set_access(pcm_handle, pcm_params, SND_PCM_ACCESS_RW_INTERLEAVED);
if (set_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params_set_access returned " << set_param_ret
      << std::endl;
  return 3;
}
set_param_ret =
    snd_pcm_hw_params_set_format(pcm_handle, pcm_params, SND_PCM_FORMAT_S16_LE);
if (set_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params_set_format returned " << set_param_ret
      << std::endl;
  return 4;
}
set_param_ret =
    snd_pcm_hw_params_set_channels(pcm_handle, pcm_params, 1);
if (set_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params_set_channels returned " << set_param_ret
      << std::endl;
  return 5;
}
unsigned int rate = 16000;
set_param_ret =
    snd_pcm_hw_params_set_rate_near(pcm_handle, pcm_params, &rate, nullptr);
if (set_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params_set_rate_near returned " << set_param_ret
      << std::endl;
  return 6;
}
set_param_ret = snd_pcm_hw_params(pcm_handle, pcm_params);
if (set_param_ret < 0) {
  std::cerr << "snd_pcm_hw_params returned " << set_param_ret << std::endl;
  return 7;
}
snd_pcm_hw_params_free(pcm_params);


// MAIN LOOP
int keeprun = 1;
while (keeprun == 1) {
  std::shared_ptr<std::vector<int16_t>> audio_data(
        new std::vector<int16_t>(kFramesPerPacket * kBytesPerFrame));
  int pcm_read_ret = snd_pcm_readi(pcm_handle, &(*audio_data.get())[0], kFramesPerPacket);
  
  if (pcm_read_ret == -EAGAIN) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
  } else if (pcm_read_ret < 0) {
    std::cerr << "snd_pcm_readi returned " << pcm_read_ret << std::endl;
    break;
  } else if (pcm_read_ret > 0) {
    audio_data->resize(kBytesPerFrame * pcm_read_ret);
    std::cout << "Audio data: " << audio_data->at(0) << " size: " << audio_data->size() << std::endl;
    }
  }

// Finalize.
snd_pcm_close(pcm_handle);
}

And my makefile (use gcc / g++ to compile)

# Test Makefile for PCM test 

PROTOC ?= protoc
PROTOC_PLUGINS_PATH ?= /usr/local/bin

HOST_SYSTEM = $(shell uname | cut -f 1 -d_)
SYSTEM ?= $(HOST_SYSTEM)

PKG_CONFIG ?= pkg-config
HAS_PKG_CONFIG ?= $(shell command -v $(PKG_CONFIG) >/dev/null 2>&1 && echo true || echo false)
ifeq ($(HAS_PKG_CONFIG),true)
ALSA_CFLAGS=`pkg-config --cflags alsa`
ALSA_LDFLAGS=`pkg-config --libs alsa`
else
ALSA_CFLAGS ?=
ALSA_LDFLAGS ?=
endif

CXXFLAGS += -std=c++11 

LDFLAGS += -lpthread -lasound -lmatrix_creator_hal

AUDIO_SRCS =
ifeq ($(SYSTEM),Linux)
CXXFLAGS += $(ALSA_CFLAGS)
LDFLAGS += $(ALSA_LDFLAGS)
endif

LDFLAGS +=  -L/usr/lib/matrix_creator_hal

CORE_SRCS = ./test.cc

TEST_O = $(CORE_SRCS:.cc=.o)

.PHONY: all
all: test

.PHONY: test
test: test

test: $(TEST_O)
	$(CXX) $^ $(LDFLAGS) -o $@

.PHONY: clean
clean:
	rm -f test 

You might have to install some dependencies:

sudo apt-get install autoconf automake libtool build-essential curl unzip
sudo apt-get install libasound2-dev
sudo apt-get install libcurl4-openssl-dev

Then simply run with ./test
You will see the raw audio (int) data on screen, just to see the mic is working. (Values will get bigger when more sound comes in)

In my case, the whole system will freeze anywhere between a few minutes to a few hours, but I’ve never had it run for more than 10 hours without crashing.

1 Like

@yoelrc88, @andres.calderon, any updates on this? Would be good to know if you were at least able to reproduce this, or if I’m possibly looking at faulty hardware or something wrong at my end :slight_smile:

Still no updates on this, which is a bit sad to be honest.

I’ve ordered a Respeaker (Mic Array V2). Plug and play, had it up and running in half an hour and it runs fine with my test code. Just ran it for 48 hours, no crashes.

So, this is the end of the matrix-road for me. At least for now. I’ve been trying to get it to work for nearly a year now. And yes, much has improved and I still think it’s a product with great potential. If I have to replace all the stuff the Creator offers (Zwave, Zigbee, etc), I will be off more expensive and it will also take me time. But a year down the road and still no bugfree kernel modules on the only platform the product is supposed to work on? That’s just too much.

The mic array is one of the most important parts of the system. It baffles me that I hunt down a bug, even provide the means to reproduce it and I get no reply here. It would have taken less than 20 minutes to test it and at least confirm to me that the bug can be acknowledged and that it would be fixed.

I’m a developer myself. I know stuff takes time, I understand that you’ve built quite a complicated product. But your product pages should clearly state that this is a beta-product in an unfinished state, because at the moment it’s a lot of money for something that is for many people hardly useable.

So. My creator will go back in its box. Quietly awaiting if it ever becomes usefull. My plan was to build 5 more devices with creators. I hope you can bring proper updates before the main system is finished, else it’ll be 5 Respeakers instead.