How can I expand the pocketsphinx demo?

Hi there,

I just got my MATRIX Creator board today and got all the demo applications to work, including the pocketsphinx demo.

However, even after looking at some of the pocketsphinx tutorials out there, I’m still having trouble understanding how it actually works. Specifically, I’d like to understand how the keywords in the text file are matched to the actual commands that are run.

I want to be able to control my home automation system with the MATRIX Creator, but I first need to understand how pocketsphinx works and how the spoken/recognized words get matched to the commands that are executed.

If anyone can shed some light on that, I would appreciate it.

Thanks

Is there nobody who can help me out here?

Hi Takeshi,

You can check the source code of the PocketSphinx demo to get an idea of how it actually works: https://github.com/matrix-io/matrix-creator-hal/blob/av/pocketsphinx_demo/demos/pocketsphinx_demo.cpp

Here are the key parts to look at if you want to modify it to implement your own commands:

1- Initialization of PocketSphinx with the proper dictionary and keywords

int main(int argc, char *argv[]) {
  char const *cfg;
  config = cmd_ln_parse_r(NULL, cont_args_def, argc, argv, TRUE);
  /* Handle argument file as -argfile. */
  if (config && (cfg = cmd_ln_str_r(config, "-argfile")) != NULL) {
    config = cmd_ln_parse_file_r(config, cont_args_def, cfg, FALSE);
  }
  if (config == NULL || (cmd_ln_str_r(config, "-infile") == NULL &&
                         cmd_ln_boolean_r(config, "-inmic") == FALSE)) {
    E_INFO("Specify '-infile <file.wav>' to recognize from file or '-inmic "
           "yes' to recognize from microphone.\n");
    cmd_ln_free_r(config);
    return 1;
  }
  ps_default_search_args(config);
  ps = ps_init(config);
  if (ps == NULL) {
    cmd_ln_free_r(config);
    return 1;
  }

2- Perform actual Voice Recognition

  • If you are doing recognition from the microphone, the dedicated code is in the procedure recognize_from_microphone()
  • You are unlikely to need to change this part.
  • The detection of an utterance is done in the code below:
  hyp = ps_get_hyp(ps, NULL);
  if (hyp != NULL) {
    process_rules(hyp);
    fflush(stdout);

3- The recognized utterance is compared against known phrases and a specific command is started

  • You can check the code in the procedure: process_rules(char const *hyp)
  • For instance, to start everloop when you say “MATRIX EVERLOOP”, you can find the matching entry in the long list of if statements:
cmd_ever = "./everloop_demo &";
...
if (std::strcmp(hyp, "MATRIX EVERLOOP") == 0)
system(cmd_ever.c_str());

You will definitely want to replace the contents of this procedure with your own commands after you change the dictionary and language model in PocketSphinx.
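
To make that more concrete, here is a rough sketch of how process_rules() could be restructured as a lookup table instead of a long chain of if statements. This is not the code from the demo, only one possible way to do it. The "MATRIX EVERLOOP" entry is taken from the demo; the "LIGHTS ON" / "LIGHTS OFF" phrases, the ./lights.sh script, and the helper name lookup_command are made up for illustration, and any phrase you add here also has to exist in the dictionary and language model that PocketSphinx loads, otherwise it will never be recognized.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <unordered_map>

// Phrase-to-command table. Keys must match the hypothesis string returned by
// ps_get_hyp() exactly (the demo compares uppercase phrases with strcmp).
static const std::unordered_map<std::string, std::string> kCommands = {
    {"MATRIX EVERLOOP", "./everloop_demo &"},  // taken from the demo
    {"LIGHTS ON", "./lights.sh on &"},         // hypothetical phrase and script
    {"LIGHTS OFF", "./lights.sh off &"},       // hypothetical phrase and script
};

// Hypothetical helper: returns the shell command for a recognized phrase,
// or an empty string when the phrase has no command attached.
std::string lookup_command(const std::string &hyp) {
  auto it = kCommands.find(hyp);
  return it == kCommands.end() ? std::string() : it->second;
}

// Same signature as the demo's process_rules(), so it can be called from the
// place where the demo calls process_rules(hyp).
void process_rules(char const *hyp) {
  if (hyp == nullptr) return;  // nothing was recognized
  const std::string cmd = lookup_command(hyp);
  if (cmd.empty()) {
    std::cout << "No command for: " << hyp << std::endl;
    return;
  }
  std::system(cmd.c_str());  // run the command, as the demo does with system()
}
```

With this layout, adding a new voice command means adding one line to the table (plus the matching words in the dictionary and language model). For home automation, the command string could be whatever script or client program talks to your system.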

Good luck!