(lisppad speech)
Library (lisppad speech) provides a speech synthesis API which parses text and converts it into audible speech. The conversion depends on the language, the voice, and a range of further parameters, all of which are bundled by speaker objects.
Speech synthesis
Speaks the string text using the speaker object speaker, which provides all speech synthesis parameters. If speaker is not provided, the value of parameter object current-speaker is used.
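A minimal sketch of speaking text with the default and with an explicit speaker. It assumes the procedure described above is exported as speak with the optional speaker as its last argument, and that make-speaker is the speaker constructor described under Speakers. Both names are assumptions, since this section does not spell them out.

```scheme
(import (scheme base) (lisppad speech))

;; Speak with the default speaker, i.e. the value of current-speaker.
;; Assumption: the procedure is named speak.
(speak "Hello, world.")

;; Speak with an explicitly created speaker passed as the last argument.
;; Assumption: make-speaker creates a speaker for the default voice.
(define s (make-speaker))
(speak "Hello again." s)
```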
Converts the given natural language string text into a string of phonemes using the given speaker. If speaker is not provided, the value of parameter object current-speaker is used.
Speakers can be configured to speak phonemes instead of natural language via procedure set-speaker-interpret-phonemes!.
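The sketch below converts a sentence into its phoneme representation and prints it. It assumes the conversion procedure is exported as phonemes with the optional speaker as its last argument; the name is an assumption.

```scheme
(import (scheme base) (scheme write) (lisppad speech))

;; Convert natural language into a phoneme string using current-speaker.
;; Assumption: the procedure is named phonemes.
(define p (phonemes "Good morning"))

(display p)
(newline)
```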
Speakers
A speaker is an object defining speech synthesis parameters. There is a current speaker which is used by default whenever a procedure with an optional speaker argument is called without an explicit speaker.
A speaker object has the following components:
- an immutable voice,
- a mutable speaking rate,
- a mutable speaking volume,
- a flag determining whether the speaker interprets text or phonemes,
- a flag determining how numbers are interpreted, and
- a speaking pitch.
Defines the current speaker, which is used as the default by all procedures whose speaker argument is optional. If there is no current speaker, this parameter is set to #f.
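Since current-speaker is a parameter object, it can be rebound temporarily with parameterize. This sketch assumes make-speaker as the speaker constructor, speak as the speaking procedure, and that set-speaker-volume! takes the new volume first and the optional speaker last; these names and the argument order are assumptions.

```scheme
(import (scheme base) (lisppad speech))

;; A quiet speaker. Assumption: make-speaker and the argument order
;; (set-speaker-volume! volume speaker).
(define quiet (make-speaker))
(set-speaker-volume! 0.3 quiet)

(parameterize ((current-speaker quiet))
  ;; Inside this body, procedures with an optional speaker argument
  ;; default to quiet. Assumption: the speaking procedure is named speak.
  (speak "This sentence is spoken quietly."))

;; Outside the body, the previous current speaker is in effect again.
(speak "Back to the previous speaker.")
```

Using parameterize instead of replacing the current speaker globally keeps the change confined to the dynamic extent of the body.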
Returns #t if obj is a speaker object; otherwise #f is returned.
Returns a new speaker for the given voice. If voice is not provided, the default voice specified at the operating system level is used. Speakers are stateful objects which can be configured with the procedures set-speaker-rate!, set-speaker-volume!, set-speaker-interpret-phonemes!, set-speaker-interpret-numbers!, and set-speaker-pitch!.
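A sketch of creating and configuring a speaker. It assumes the constructor and predicate are named make-speaker and speaker?, that the speaking procedure is speak, and that every setter takes the new value first and the optional speaker last. These names and the argument order are assumptions.

```scheme
(import (scheme base) (scheme write) (lisppad speech))

;; A speaker for the operating system's default voice.
(define s (make-speaker))
(display (speaker? s))
(newline)

;; Configure the stateful speaker.
(set-speaker-volume! 0.8 s)            ; flonum between 0.0 and 1.0
(set-speaker-interpret-numbers! #t s)  ; read numbers as whole numbers
(set-speaker-interpret-phonemes! #f s) ; read natural language text
(speak "The speaker is configured." s)
```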
Returns the voice of speaker. If speaker is not provided, the value of parameter object current-speaker is used.
Returns the speaking rate of speaker. If speaker is not provided, the value of parameter object current-speaker is used.
Sets the speaking rate of speaker to the number rate. If speaker is not provided, the value of parameter object current-speaker is used.
Returns the volume of speaker as a flonum ranging from 0.0 to 1.0. If speaker is not provided, the value of parameter object current-speaker is used.
Sets the volume of speaker to volume, a flonum between 0.0 and 1.0. If speaker is not provided, the value of parameter object current-speaker is used.
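The accessors and setters can be combined to adjust a speaker relative to its current settings. This sketch assumes the accessors are named speaker-rate and speaker-volume, the constructor make-speaker and the speaking procedure speak, and that setters take the value first and the optional speaker last; all of these are assumptions.

```scheme
(import (scheme base) (lisppad speech))

(define s (make-speaker))

;; Slow the speaker down to 80% of its current rate.
(set-speaker-rate! (* 0.8 (speaker-rate s)) s)

;; Halve the volume; half of a value in [0.0, 1.0] stays in that range.
(set-speaker-volume! (/ (speaker-volume s) 2.0) s)

(speak "Slower and quieter." s)
```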
Returns #t if speaker interprets phonemes instead of natural language text; otherwise #f is returned. If speaker is not provided, the value of parameter object current-speaker is used.
If phoneme? is #f, speaker is configured to interpret natural language. For any other value of phoneme?, speaker interprets phonemes instead. If speaker is not provided, the value of parameter object current-speaker is used.
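A hypothetical helper, speak-phonetically, that converts text into phonemes and speaks them by temporarily switching a speaker into phoneme mode. It assumes the procedures phonemes, speak, speaker-interpret-phonemes? and make-speaker exist under these names and that set-speaker-interpret-phonemes! takes the flag first and the optional speaker last.

```scheme
(import (scheme base) (lisppad speech))

;; Hypothetical helper: speaks text via its phoneme representation and
;; restores the speaker's previous mode afterwards.
(define (speak-phonetically text speaker)
  (let ((ph (phonemes text speaker))                 ; text -> phonemes
        (old (speaker-interpret-phonemes? speaker))) ; remember the mode
    (dynamic-wind
      (lambda () (set-speaker-interpret-phonemes! #t speaker))
      (lambda () (speak ph speaker))
      (lambda () (set-speaker-interpret-phonemes! old speaker)))))

(define s (make-speaker))
(speak-phonetically "Good morning" s)
```

dynamic-wind restores the previous mode even if control leaves the body non-locally, for example through an exception raised while speaking.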
Returns #t if speaker interprets numbers as a natural language speaker would ("100" is spoken as "one hundred"). If it returns #f, speaker decomposes numbers into a sequence of digits and speaks them individually ("100" is spoken as "one zero zero"). If speaker is not provided, the value of parameter object current-speaker is used.
Sets the number interpretation of speaker to the boolean natural?. If natural? is #t, speaker interprets numbers as a natural language speaker would ("100" is spoken as "one hundred"). If natural? is #f, speaker decomposes numbers into a sequence of digits and speaks them individually ("100" is spoken as "one zero zero"). If speaker is not provided, the value of parameter object current-speaker is used.
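The two number modes can be compared by speaking the same text twice. The sketch assumes make-speaker, speak and speaker-interpret-numbers? exist under these names and that set-speaker-interpret-numbers! takes the flag first and the optional speaker last.

```scheme
(import (scheme base) (lisppad speech))

(define s (make-speaker))

;; Natural interpretation: the number is read as a whole.
(set-speaker-interpret-numbers! #t s)
(speak "Room 100" s)

;; Digit interpretation: the number is read digit by digit.
(set-speaker-interpret-numbers! #f s)
(speak "Room 100" s)
```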
Returns the pitch of speaker as a pair of two flonums: the car is the base of the pitch and the cdr is its modulation. If speaker is not provided, the value of parameter object current-speaker is used.
Sets the pitch of speaker to pitch, a pair of flonums whose car is the base of the pitch and whose cdr is its modulation. If speaker is not provided, the value of parameter object current-speaker is used.
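Because the pitch is a pair, it can be adjusted with car, cdr and cons. This sketch assumes the accessor is named speaker-pitch, that make-speaker and speak exist under these names, and that set-speaker-pitch! takes the pitch pair first and the optional speaker last.

```scheme
(import (scheme base) (lisppad speech))

(define s (make-speaker))

(let* ((pitch (speaker-pitch s))   ; (base . modulation)
       (base (car pitch))
       (modulation (cdr pitch)))
  ;; Raise the base by 10% and keep the modulation unchanged.
  (set-speaker-pitch! (cons (* 1.1 base) modulation) s))

(speak "A slightly higher voice." s)
```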
Voices
Voices are provided by the operating system; library (lisppad speech) does not represent them explicitly as objects. Instead, symbols are used as voice identifiers. For example, com.apple.speech.synthesis.voice.Alex refers to the default US voice.
A voice has the following characteristics:
- Name (string)
- Age (fixnum)
- Gender (male or female)
- Locale (symbol, e.g. en_US)
Library (lispkit system) provides means to handle locales, including language and country codes.
Returns a symbol identifying the voice specified by the arguments of voice. If no argument is provided, an identifier for the default voice is returned. If a string name is provided, an identifier for the voice whose name is name is returned, or #f if no such voice exists. If a symbol id is provided, an identifier for the voice whose identifier matches id is returned, or #f if no such voice exists.
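A sketch of looking up voices by identifier and by name, falling back to the default voice if a lookup returns #f. It assumes the procedure is named voice, as the description suggests, that make-speaker accepts a voice identifier, and that speak exists under this name. The voice Alex may not be installed on every system.

```scheme
(import (scheme base) (lisppad speech))

;; Identifier of the operating system's default voice.
(define default-voice (voice))

;; Lookup by identifier (symbol) and by name (string); both yield #f
;; if no matching voice exists.
(define by-id (voice 'com.apple.speech.synthesis.voice.Alex))
(define by-name (voice "Alex"))

;; Use whichever lookup succeeded, otherwise the default voice.
(define s (make-speaker (or by-id by-name default-voice)))
(speak "Hello from the selected voice." s)
```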
Returns a list of symbols identifying the voices matching the language filter lang and the gender filter gender. Both lang and gender are symbols. lang should be either a language or a locale identifier; it can also be set to #f if only a gender filter is needed. gender should be either the symbol male or the symbol female.
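The filters can be combined or used individually. This sketch assumes the procedure is named available-voices, takes lang before gender with gender being optional, and that make-speaker and speak exist under these names.

```scheme
(import (scheme base) (lisppad speech))

;; All English voices, regardless of gender (language filter only).
(define english (available-voices 'en))

;; All female voices, regardless of language (lang set to #f).
(define female (available-voices #f 'female))

;; British English male voices (locale and gender filters).
(define british-male (available-voices 'en_GB 'male))

;; Let every British English male voice say a greeting.
(for-each (lambda (v) (speak "Good afternoon." (make-speaker v)))
          british-male)
```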
Returns #t if obj is a symbol identifying an available voice; otherwise #f is returned. This procedure fails if obj is neither a symbol nor #f.
Returns the name of the voice identified by symbol voice.
Returns the age of the voice identified by symbol voice.
Returns the gender of the voice identified by symbol voice.
Returns the locale of the voice identified by symbol voice.
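A small hypothetical helper, describe-voice, that prints all characteristics of a voice. It assumes the accessors are named voice-name, voice-age, voice-gender and voice-locale, and uses voice without arguments to obtain the default voice.

```scheme
(import (scheme base) (scheme write) (lisppad speech))

;; Hypothetical helper: prints name, age, gender and locale of voice v.
(define (describe-voice v)
  (display (voice-name v))
  (display ", age ")
  (display (voice-age v))
  (display ", ")
  (display (voice-gender v))
  (display ", ")
  (display (voice-locale v))
  (newline))

(describe-voice (voice))
```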