Re: [dev] What would a well-designed voice assistant program look like?

From: Laslo Hunhold <>
Date: Thu, 22 Mar 2018 13:59:17 +0100

On Thu, 22 Mar 2018 08:33:07 -0400
LM <> wrote:

Hey Laura,

> Was at a conference where they were discussing the new voice assistant
> applications from large corporations like Amazon and Google. I began
> thinking there had to be some Open Source alternatives available. I
> found several projects such as the ones listed below. However, most
> of the Open Source projects I found were written in Python or
> JavaScript or Lua with a few C and/or C++ libraries thrown in for good
> measure.
> So what would a voice assistant program using style
> principles look like? Would love to hear some opinions. Are there
> any projects out there trying to implement one or any interest in
> doing so?

To begin with, it's not about writing a "voice assistant program".
When reflecting on a problem, the priority should be to think about
how it can be split up. In this case, we can talk about two problems
which, combined, solve the "voice assistant" problem.

   1) voice recognition
   2) assistant

Nowadays, people throw neural networks at anything that moves, and for
voice recognition, it works pretty well (see Google). It's less about
the implementation details than about the big datasets you can train
your networks with. Google uses YouTube videos as training data and
it's almost creepy to see how well they do it now. From what I've
heard, they reached superhuman capabilities in understanding voice in
99% of the cases a few months ago.
We can assume this problem to be more or less solved, and the problem
domain is easy to sketch. We can expect this module to output a string
that reliably matches the spoken request.
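
To make the split concrete, here is a minimal sketch in C of the
"assistant" half as a plain text filter. It assumes (my assumption, not
something an existing recognizer guarantees) that the recognizer emits
one request per line. The keywords, the handler names and the replies
are all hypothetical placeholders, and the naive substring matching is
only a stand-in for whatever the assistant really does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: takes the recognized text, returns a reply. */
typedef const char *(*handler_fn)(const char *request);

/* Placeholder handlers; a real assistant would do actual work here. */
static const char *handle_time(const char *request)
{
	(void)request;
	return "time: not implemented";
}

static const char *handle_mail(const char *request)
{
	(void)request;
	return "mail: not implemented";
}

/* Keyword table (made up for illustration); first match wins. */
static const struct {
	const char *keyword;
	handler_fn handler;
} rules[] = {
	{ "time", handle_time },
	{ "mail", handle_mail },
};

/* Map one line of recognized text to a reply via substring matching. */
const char *assist(const char *request)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strstr(request, rules[i].keyword))
			return rules[i].handler(request);
	return "unknown request";
}

/* Read one request per line from in, write one reply per line to out. */
void run(FILE *in, FILE *out)
{
	char line[512];

	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0'; /* strip trailing newline */
		fprintf(out, "%s\n", assist(line));
	}
}
```

A main() that just calls run(stdin, stdout) turns this into a filter
that can sit behind any recognizer writing text to a pipe, so the two
halves can be developed and replaced independently.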

A much bigger problem is the assistant itself, because the complexity
is nearly unlimited. Do you want it to be an assistant for normal
people, for truck drivers, for jet pilots, for astronauts, or
something completely different?
Everyone has different criteria for what they'd like to see in an
assistant. I'd probably never tell my assistant to write a dictated
e-mail, as it would be much faster to type it by hand. Come to think
of it, I might be the wrong person to ask about the need for
assistants.

The big thing to break here is the A.I. problem, which is in my opinion
equivalent to the problem 2) stated above. Google's assistant manages
to be so good because it is fed a lot of data to make up for the lack
of intelligence. And again, as with voice recognition, the more
training data you have the better your A.I. behaves, except when it
doesn't (think of adversarial counterexamples, which are easy to
construct).
The appearance of intelligence rests largely on the fact that we, as
"end-users", neither have access to the structure of the neural
network (which would help in constructing counterexamples) nor
"stress" the network enough to reach the edge cases where it fails.
Companies like Google sidestep the second point, more or less, by
scaling up the training data and making the network more complex.
There is no general mathematical theory of intelligence, and neural
networks are very unstable for multiple reasons I will not explain
here.
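
The remark about counterexamples and access to the network structure
can be illustrated on a toy model. The sketch below is not a real
network: it is a linear classifier with made-up dimension and names,
and the perturbation follows the idea of the fast gradient sign
method, which is my choice of illustration and not something named
above. Knowing the weights w, every input component is moved by eps
in the direction that lowers the score.

```c
#define DIM 4 /* arbitrary toy dimension */

/* Score of a toy linear model: w.x + b. */
static double score(const double w[DIM], double b, const double x[DIM])
{
	double s = b;
	int i;

	for (i = 0; i < DIM; i++)
		s += w[i] * x[i];
	return s;
}

/* Class 1 if the score is positive, class 0 otherwise. */
int classify(const double w[DIM], double b, const double x[DIM])
{
	return score(w, b, x) > 0.0;
}

/*
 * White-box perturbation: the gradient of the score with respect to x
 * is simply w, so stepping each component by eps against sign(w[i])
 * lowers the score by eps times the sum of |w[i]|.
 */
void perturb(const double w[DIM], const double x[DIM], double eps,
             double out[DIM])
{
	int i;

	for (i = 0; i < DIM; i++) {
		if (w[i] > 0.0)
			out[i] = x[i] - eps;
		else if (w[i] < 0.0)
			out[i] = x[i] + eps;
		else
			out[i] = x[i];
	}
}
```

With w = (1, -2, 0.5, 1.5), b = 0 and x = (0.5, 0.1, 0.2, 0.1), the
score is 0.55, so x lands in class 1. A step of eps = 0.2 lowers the
score by 0.2 * 5 = 1.0 to -0.45 and flips the class, although no
component changed by more than 0.2. Without the weights, one would
have to find such a direction by probing from the outside.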

Developing more robust methods for neural networks is in fact my
current field of research. If you ask me, the current approach is a
dead end, and if we don't deploy more robust mathematical methods to
model such networks, we'll have big problems in the long run.

> A few things I'd personally like to see in such a project are as
> follows: Written primarily in C for easy portability and efficiency.
> When possible, made up of simple components (libraries, programs,
> etc.) that do one thing well.
> Can be built from source by one person. (Meaning you don't need to
> rely on a bunch of precompiled libraries from another source just to
> get it to build.)
> Should have the option to be able to use it without Internet
> connection if there are privacy or connectivity concerns.

You won't get around relying on a pre-trained neural network unless
you manage to formalize speech (partially done) or intelligence
(currently not done and probably impossible).
In this regard, personal assistants will by definition be bound to
centralized services, and if you ever think about solving this in open
source, it will have to be distributed, like Diaspora or something
comparable. If you design it as a "single instance" from the get-go,
it will fail in the long run.

Also, as much of a C fan as I am, the language doesn't matter here,
and for a distributed network project, a language like Go makes more
sense.

I hope this could give you some insight.

With best regards

Laslo Hunhold

