Discussion:
[Kde-accessibility] Simon Question
Jessica Horst
2014-03-20 13:51:09 UTC
Dear Simon team,

I am a faculty member at the University of Sussex and planning a research
study. I would like to know if Simon is the right software to use for my
study.

I study child word learning. I am planning to record an adult and child
talking about objects. (The adult will be a lab member and know not to speak
at the same time as the child and there will be as little background noise
as possible.) The child will be about 3 years old, but I can go up to 4
years if that will be better for the software. I would like to train the
software on the adult input to the child (what the adult said) and then give
it the child speech. I would like an index of how similar the child speech
was to the adult speech. For example, if the adult is teaching the child the
word "apple" and says "apple" 12 times, when the child finally says "apple",
how similar is that word to the adult speech the child heard?

My colleague told me that speech recognition software works by having a
threshold of similarity. For example, when I tell my mobile phone "call
home", the software compares what I said to what I have said before, and if
it is similar enough (above threshold) it will recognise my speech. I'm
hopeful that I could use the same kind of principle here (how similar is the
child's speech to the adult speech, i.e. what was said before), but I would
want a numerical value instead of just knowing if it was above or below
threshold.

Can Simon handle child and adult speech in this way?

Thank you for your time.
~Jessica

********************************
Dr. Jessica S. Horst
Senior Lecturer in Psychology

University of Sussex
School of Psychology
Brighton BN1 9QH
United Kingdom

Email: ***@sussex.ac.uk
Tel: +44 (0)1273 87 3084
Lab: http://www.sussex.ac.uk/wordlab
Peter Grasch
2014-03-23 13:48:07 UTC
Hello Jessica,

nice to meet you.
Post by Jessica Horst
My colleague told me that speech recognition software works by having a
threshold of similarity. For example, when I tell my mobile phone "call
home", the software compares what I said to what I have said before, and if
it is similar enough (above threshold) it will recognise my speech. I'm
hopeful that I could use the same kind of principle here (how similar is
the child's speech to the adult speech, i.e. what was said before), but I
would want a numerical value instead of just knowing if it was above or
below threshold.
I am sorry to say, but you have been slightly misinformed. In practice, the
process works a little differently.
(Disclaimer: the following explanation contains a few simplifications.)
The decoding produces the most likely path through the space of alternatives
(allowed sentences, if you are doing grammar-based decoding). The question
answered by the decoding is: Given the observations (recording), which of the
possibilities (sentences) is the most likely?
To determine the most likely candidate, there is an internal scoring process,
but these scores are entirely relative to each other and not compared to a
fixed threshold. Most decoders implement some form of confidence scoring,
telling you how confident the system is in its results, but these scores will
likely not be what you want, because differences that appear substantial to
the human ear will not necessarily have a big impact on the confidence score,
and vice versa.
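
To make the "relative scores" point concrete, here is a rough sketch in Python. It is not how Simon or any particular decoder is implemented. The function name decode, the candidate words and the log-scores are all made up for illustration, and the softmax-style normalisation is just one simple way a confidence value could be derived from such scores.

```python
import math

def decode(scores):
    """Return the best-scoring candidate and a relative confidence in [0, 1].

    `scores` maps each allowed candidate (hypothetical here) to a
    log-likelihood of the recording under that candidate.
    """
    # Decoding: pick whichever candidate scores highest. No threshold involved.
    best = max(scores, key=scores.get)
    top = scores[best]
    # Confidence: the winner's share of the total probability mass,
    # computed with the log-sum-exp trick for numerical stability.
    log_total = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return best, math.exp(top - log_total)

# Made-up log-likelihoods for one recording of a word.
scores_a = {"apple": -120.0, "April": -123.0, "able": -125.0}

# A much poorer acoustic match: every candidate scores 200 lower.
scores_b = {word: s - 200.0 for word, s in scores_a.items()}

print(decode(scores_a))
print(decode(scores_b))
```

Both calls pick the same word with the same confidence, even though the second recording fits every candidate far worse in absolute terms. Only the differences between candidates matter, which is why such a value is a poor measure of how closely one utterance resembles another.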

Depending on your use case, a dedicated classifier will probably yield better
results. What exactly do you want to do?

Best regards,
Peter
