Discussion:
[Kde-accessibility] KSpeech
Jeremy Whiting
2014-03-06 06:04:12 UTC
I took a quick read through it just now, and it looks pretty promising
from what I saw. I guess I don't know my way around Gerrit very well,
because I couldn't find a place to comment on the code the way
ReviewBoard offers.
Really, the only differences between Jovie and that class are the following:
1. Jovie has some old code and UI to control jobs at a fine grain that
spd doesn't really expose, so I left it out when I ported KTTS to
spd.
2. user-defined filters with some sane/useful defaults (if we were to
use QtSpeech for KDE notifications and set Konvi to speak all messages,
there's no way to let the user, say, change "<jpwhiting> fregl: you
rock" into "jpwhiting says fregl you rock")
3. user configurability (as a user, I can't set up which voice I would
like all speech-using applications to use)
4. D-Bus, though this isn't as important if each application that uses
speech links to the library, and speech-dispatcher or the system APIs
already do the async for us anyway, as you said.

Items 1 and 4 will be irrelevant in a KF5 world, but I'm wondering how
2 and 3 could be added, either to QtSpeech itself or as a KSpeech
library that wraps QtSpeech for KDE applications to use.

Any thoughts on that? I would be pretty interested in helping with
QtSpeech if it greatly simplifies or even deprecates Jovie, as it
looks like it possibly could.
Jeremy
Hello all, I realized a while ago that KSpeech was not included in
the kdelibs split (probably because it was in staging at the time and
didn't conform to the other framework policies yet). I've cleaned it
up a bit and put it in my scratch space, but I have some architectural
questions about it before I make it a proper framework.
1. The KSpeech D-Bus interface is old and showing its age. Many of the
methods are no longer implemented in the application itself since it
was ported to speech-dispatcher. One thing I would definitely like to
do is clean up/remove methods that aren't currently implemented (and
possibly re-add some later on if speech-dispatcher gains better/more
support for job control, etc.). So the question is: is the KF5
transition a good time to drop/clean up the D-Bus interface?
2. The KSpeech interface that was in kdelibs/interfaces is just that, a
D-Bus interface only. I would like to make it a proper
library/framework with a QObject-based class for talking to Jovie (the
application that implements the KSpeech D-Bus interface), and I wonder
whether other things, such as what's currently in jovie/libkttsd,
should be in the kspeech library also. If I move code from Jovie into
libkspeech (or merge the kspeech interface into libkttsd and make
libkttsd a framework, likely renamed to libkspeech, since libkttsd
isn't a public library anyway and has the old ktts name), what's the
best way to preserve the history of both the kspeech interface and
libkttsd sources? Didn't the plasma or kde-workspaces split do
something fancy with git where old history pointed to the old git repo
somehow?
Along with this, if libkspeech defines the kspeech D-Bus interface
and has a class to talk to that interface, does the interface still
need to be in servicetypes, like the dbustexttospeech.desktop file
that was installed in /usr/share/kde4/servicetypes in KDE4 times?
https://codereview.qt-project.org/#admin,project,qt/qtspeech,info
It's a regular Qt module providing a library that currently consists of one
class.
It is still quite incomplete, because it lacks voice/language
configuration.
On the up side, I implemented basic backends for win/mac/android/linux.
The Linux backend uses speech-dispatcher, but I was quite dissatisfied
with spd's API. For example, it lacks proper free functions for the
structs it allocates, so one basically has to leak them.
I didn't dare look at Jovie/kttsd, since I used the Qt license.
Greetings,
Frederik
thanks,
Jeremy
_______________________________________________
Kde-frameworks-devel mailing list
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel
Frederik Gladhorn
2014-03-06 13:43:00 UTC
Post by Jeremy Whiting
Took a quick read through that just now and it looks pretty promising
from what I saw. I guess I don't know my way around gerrit very well
because I couldn't see a place to comment on the code like
reviewboard.
1. jovie has some old code and ui to control jobs at a fine grain that
spd doesn't expose really well, so I left it out when I ported ktts to
spd.
I would like to expose "voices" and "languages" in a sensible fashion. This is
tricky to get right cross-platform. I started with something on Linux but
decided to implement other backends first before attempting to implement voice
selection.
For language/locale I think qtspeech should default to the system locale and
let the user select a different one.
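A minimal sketch of that fallback policy, in plain C++ with hypothetical names (pickLocale is not QtSpeech API; in Qt terms the system locale would come from QLocale::system()):

#include <optional>
#include <string>

// Hypothetical sketch of the proposed policy: use the user's explicit
// locale choice when one is set, otherwise fall back to the system locale.
std::string pickLocale(const std::optional<std::string> &userChoice,
                       const std::string &systemLocale)
{
    return userChoice.value_or(systemLocale);
}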
Post by Jeremy Whiting
2. user defined filters with some sane/useful defaults (if we were to
use QtSpeech for kde notifications, set konvi to speak all messages,
there's not a way to let the user say change "<jpwhiting> fregl: you
rock" into "jpwhiting says fregl you rock")
Maybe. I'd rather keep qtspeech very simple. My goals were to make it a tiny
library that is lean, fast, and async by using signals and slots.
I want it to be good enough to be used in apps that use voice navigation, but
also when writing a screen reader. Some level of configuration is required in
any case. Let's come up with a good API that makes sense across platforms;
then I'm in.
Post by Jeremy Whiting
3. user configurability (As a user I can't set up which voice I would
like all speech-using applications to use)
As with other Qt libs, this is more for the platform to set up. Currently
qtspeech uses whatever voice is selected system wide (aka the default). I
think that is the right approach - follow what we get from the platform.
For KDE I'd thus suggest creating a configuration module which lets the user
choose the platform defaults.
Post by Jeremy Whiting
4. dbus, though this isn't as important if each application that uses
speech links to the library and speech-dispatcher or the system apis
do the async for us already anyway as you said.
Indeed, I don't see a point in adding D-Bus into the mix. One thing that is
interesting, though, is what effect you get when opening the speech
backend from two apps at the same time.
Post by Jeremy Whiting
Items 1 and 4 will be irrelevant in a KF5 world but I'm wondering how
2 and 3 could be added either to qtspeech itself or as a kspeech
library that wraps qtspeech for kde applications to use.
Any thoughts on that? I would be pretty interested in helping with
qtspeech if it greatly simplifies or even deprecates jovie as it looks
like it could do possibly.
I'd be more than happy to get contributions, of course. I cannot promise much
from my side, but I'd like to continue working on this project as time
permits (so far it really is a spare-time thing).

Greetings,
Frederik
Jeremy Whiting
2014-03-06 16:13:19 UTC
Post by Frederik Gladhorn
Post by Jeremy Whiting
Took a quick read through that just now and it looks pretty promising
from what I saw. I guess I don't know my way around gerrit very well
because I couldn't see a place to comment on the code like
reviewboard.
1. jovie has some old code and ui to control jobs at a fine grain that
spd doesn't expose really well, so I left it out when I ported ktts to
spd.
I would like to expose "voices" and "languages" in a sensible fashion. This is
tricky to get right cross-platform. I started with something on Linux but
decided to implement other backends first before attempting to implement voice
selection.
For language/locale I think qtspeech should default to the system locale and
let the user select a different one.
Using the system locale as the default makes sense. What do you mean by
"voices"? Do you mean something like spd's voice types (male1, male2,
female1, etc.)?
KTTS had a complex system of specifying a voice with XML, with
language, voice type, speed, pitch, etc. attributes; if an
attribute was empty, it meant any voice matching the other attributes
was acceptable. I think that's a bit too fine-grained for most cases,
though; most uses I can think of just want to choose the voice type,
or even just the gender, and let the user/defaults choose the rest.
If a more complex specification is wanted, applications could always use
SSML to change the voice as part of the text they send to qtspeech.
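For illustration, the SSML route could look roughly like this (plain C++, no Qt; wrapInSsmlVoice is a hypothetical helper, and engines vary in how much SSML they honor):

#include <string>

// Sketch of an application injecting a voice request into the text it
// sends to the TTS engine, instead of using a dedicated voice API.
std::string wrapInSsmlVoice(const std::string &text,
                            const std::string &gender, // e.g. "female"
                            const std::string &lang)   // e.g. "en-US"
{
    return "<speak xml:lang=\"" + lang + "\">"
           "<voice gender=\"" + gender + "\">" + text + "</voice>"
           "</speak>";
}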
Post by Frederik Gladhorn
Post by Jeremy Whiting
2. user defined filters with some sane/useful defaults (if we were to
use QtSpeech for kde notifications, set konvi to speak all messages,
there's not a way to let the user say change "<jpwhiting> fregl: you
rock" into "jpwhiting says fregl you rock")
Maybe. I'd rather keep qtspeech very simple. My goals were to make it a tiny
library that is lean, fast and async by using signals and slots.
I want it to be good enough to be used in apps that use voice navigation, but
also when writing a screen reader. Some level of configuration is required in
any case. Let's come up with a good api that makes sense across platforms,
then I'm in.
Right, simple is definitely good. I'm just wondering if it could
accept plugins that implement some filtering method to filter the
text. Then filters could be as simple as a regex to convert
XML/HTML/etc. text into something that makes sense audibly, like that
example from IRC, or a complex filter plugin could inject SSML into
the text to change the voice. Maybe something like:

class QAbstractSpeechFilter
{
public:
    virtual ~QAbstractSpeechFilter() {}
    // Return the filtered text, e.g. with markup stripped or SSML injected.
    virtual QString filterText(const QString &text) = 0;
};

Then a simple filter manager (or even part of the existing class) loads
the plugins, and when say() is called, it passes the text through all
the plugins' filterText() methods.

Is there some other Qt library or class that takes plugins for
specific functionality that we could use as inspiration for making
this work and look clean?
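To illustrate the plugin idea, here is a minimal plain-C++ sketch of a filter chain (SpeechFilter, applyFilters, and ircFilter are all hypothetical names, not proposed API):

#include <functional>
#include <regex>
#include <string>
#include <vector>

// Each filter rewrites the text; say() would run the text through all
// registered filters in order before handing it to the synthesizer.
using SpeechFilter = std::function<std::string(const std::string &)>;

std::string applyFilters(std::string text,
                         const std::vector<SpeechFilter> &filters)
{
    for (const auto &filter : filters)
        text = filter(text);
    return text;
}

// Example filter: turn an IRC line like "<jpwhiting> fregl: you rock"
// into the more listenable "jpwhiting says fregl you rock".
std::string ircFilter(const std::string &text)
{
    static const std::regex ircLine("<([^>]+)>\\s*([^:]+):\\s*(.*)");
    return std::regex_replace(text, ircLine, "$1 says $2 $3");
}

In a real plugin system, each filter would instead come from a QPluginLoader-loaded object implementing the abstract interface above.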
Post by Frederik Gladhorn
Post by Jeremy Whiting
3. user configurability (As a user I can't set up which voice I would
like all speech-using applications to use)
As with other Qt libs, this is more for the platform to set up. Currently
qtspeech uses whatever voice is selected system wide (aka the default). I
think that is the right approach - follow what we get from the platform.
For KDE I'd thus suggest creating a configuration module which lets the user
choose the platform defaults.
Yeah, each platform could have its own configuration of the defaults,
sure; the only part missing is real-time configuration changes. For
example, if Jovie is reduced to a KCM that configures
speech-dispatcher's default voice, and I start listening to a PDF
from Okular or something and decide I need the pitch to be lower,
changing the default voice won't change the voice speech-dispatcher
is already using to read the PDF. Maybe that could be fixed with a
patch to speech-dispatcher to accept immediate default changes,
though; I'll have to think about that.
Post by Frederik Gladhorn
Post by Jeremy Whiting
4. dbus, though this isn't as important if each application that uses
speech links to the library and speech-dispatcher or the system apis
do the async for us already anyway as you said.
I don't see a point in adding dbus into the mix indeed. One thing that is
interesting though is what kind of effect you get when opening the speech
backend from two apps at the same time.
Post by Jeremy Whiting
Items 1 and 4 will be irrelevant in a KF5 world but I'm wondering how
2 and 3 could be added either to qtspeech itself or as a kspeech
library that wraps qtspeech for kde applications to use.
Any thoughts on that? I would be pretty interested in helping with
qtspeech if it greatly simplifies or even deprecates jovie as it looks
like it could do possibly.
I'd be more than happy to get contributions of course. I cannot promise much
from my side, of course I'd like to continue working on this project as time
permits (so far it really is a spare time thing).
Yep, that's completely understandable, np.

thanks,
Jeremy
Christoph Feck
2014-03-06 19:34:05 UTC
Post by Jeremy Whiting
Post by Frederik Gladhorn
Post by Jeremy Whiting
3. user configurability (As a user I can't set up which voice I
would like all speech-using applications to use)
As with other Qt libs, this is more for the platform to set up.
Currently qtspeech uses whatever voice is selected system wide
(aka the default). I think that is the right approach - follow
what we get from the platform. For KDE I'd thus suggest creating
a configuration module which lets the user choose the platform
defaults.
Yeah, each platform could have its own configuration of the
defaults sure, the only part missing is a real-time configuration
change. For example if Jovie is reduced to a kcm to configure
speech-dispatcher's default voice and I start listening to a pdf
from okular or something and decide I need the pitch to be lower,
changing the default voice won't change the voice that
speech-dispatcher is already using to read the pdf. Maybe that
could be fixed with a patch to speech-dispatcher to accept
immediate default changes though, I'll have to think about that.
Let me refer to http://www.w3.org/TR/2011/WD-css3-speech-20110419/
which defines attributes a web page can use to influence speech. It
would be nice if we had an API supporting web speech.

Regarding voice selection, it would be very useful to allow the
application to specify a female/male/child voice via the API (in
addition to the ability to let the user reconfigure the actual
voices). Similar to letting the application request a Sans, Serif,
or Monospace font.

For example, when generating different voices while reading out e-book
stories.
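A coarse voice hint like that could be sketched as follows (hypothetical names throughout; the speech-dispatcher-style strings are only illustrative, not real spd or QtSpeech identifiers):

#include <string>

// The application asks for a broad category, and the backend maps it
// to whatever its engine supports, much like generic font families.
enum class VoiceHint { Default, Female, Male, Child };

std::string toSpdVoiceType(VoiceHint hint)
{
    switch (hint) {
    case VoiceHint::Female: return "FEMALE1";
    case VoiceHint::Male:   return "MALE1";
    case VoiceHint::Child:  return "CHILD_FEMALE";
    default:                return ""; // keep the platform default
    }
}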

Christoph Feck (kdepepo)
Frederik Gladhorn
2014-03-10 11:34:57 UTC
Post by Christoph Feck
Post by Jeremy Whiting
Post by Frederik Gladhorn
Post by Jeremy Whiting
3. user configurability (As a user I can't set up which voice I
would like all speech-using applications to use)
As with other Qt libs, this is more for the platform to set up.
Currently qtspeech uses whatever voice is selected system wide
(aka the default). I think that is the right approach - follow
what we get from the platform. For KDE I'd thus suggest creating
a configuration module which lets the user choose the platform
defaults.
Yeah, each platform could have its own configuration of the
defaults sure, the only part missing is a real-time configuration
change. For example if Jovie is reduced to a kcm to configure
speech-dispatcher's default voice and I start listening to a pdf
from okular or something and decide I need the pitch to be lower,
changing the default voice won't change the voice that
speech-dispatcher is already using to read the pdf. Maybe that
could be fixed with a patch to speech-dispatcher to accept
immediate default changes though, I'll have to think about that.
Let me refer to http://www.w3.org/TR/2011/WD-css3-speech-20110419/
which defines attributes a web page can use to influence speech. Would
be nice if we had API supporting web speech.
This is interesting, in that different synths already have some sort of
support for this.
Post by Christoph Feck
Regarding voice selection, it would be very useful to allow the
application to specify female/male/child voice via API (in addition to
the ability to let the user reconfigure actual voices). Similar to
letting the application request Sans, Sans Serif, and Monospaced font.
For example, when generating different voices while reading out e-book
stories.
Makes sense. I'd like to get an overview of the native APIs first.
Let's collect their capabilities and then try to come up with a sensible
compromise.
Feel free to gather data here:
http://qt-project.org/wiki/QtSpeech

Cheers,
Frederik
guenter
2014-03-09 17:35:55 UTC
Post by Jeremy Whiting
Post by Frederik Gladhorn
Post by Jeremy Whiting
Took a quick read through that just now and it looks pretty promising
from what I saw. I guess I don't know my way around gerrit very well
because I couldn't see a place to comment on the code like
reviewboard.
1. jovie has some old code and ui to control jobs at a fine grain that
spd doesn't expose really well, so I left it out when I ported ktts to
spd.
I would like to expose "voices" and "languages" in a sensible fashion. This is
tricky to get right cross-platform. I started with something on Linux but
decided to implement other backends first before attempting to implement voice
selection.
For language/locale I think qtspeech should default to the system locale and
let the user select a different one.
Using the system locale as default makes sense. What do you mean by
"voices" you mean something like spd's voice type (male1, male2,
female1, etc.)
Ktts had a complex system of specifying a voice with xml with
language, voice type, speed, pitch, etc. attributes and if an
attribute was empty it meant any voice with the other attributes was
acceptable. I think that's a bit too fine-grained for most cases
though, most uses I can think of just want to choose the voice type,
or even just the gender, and let the user/defaults choose the rest.
If more complex specification is wanted applications could always use
ssml to change the voice as part of the text they send to qtspeech.
Hi folks,

although my locale is de_DE, I often read English and other
languages too. So I would vote for an easy way for users/applications
to switch the language.

Greetings,
Günter