# (lisppad speech)

Library `(lisppad speech)` provides a speech synthesis API which parses text and converts it into audible speech. The conversion is based on factors like the language, the *voice*, and a range of parameters which are all aggregated by *speaker* objects.

## Speech synthesis

**(speak *text*)**\
**(speak *text speaker*)**

Speaks the given string *text* using the *speaker* object, which provides all speech synthesis parameters. If *speaker* is not provided, the value of parameter object `current-speaker` is used.
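
A minimal usage sketch. It assumes a LispPad environment where `(lisppad speech)` and `(lispkit base)` can be imported, and that `current-speaker` holds a speaker (not `#f`) for the first call. The name `greeter` is purely illustrative.

```scheme
(import (lispkit base) (lisppad speech))

;; Speak with the current speaker, i.e. the value of `current-speaker`
(speak "Hello, world!")

;; Speak with a freshly created speaker that uses the default system voice
(define greeter (make-speaker))
(speak "Hello from a dedicated speaker." greeter)
```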

**(phonemes *text*)**\
**(phonemes *text speaker*)**

Converts the given natural language string *text* into a string of phonemes using the given *speaker*. If *speaker* is not provided, the value of parameter object `current-speaker` is used.

A speaker can be configured to interpret phonemes instead of natural language text via procedure `set-speaker-interpret-phonemes!`.
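
The following sketch combines `phonemes` with a phoneme-interpreting speaker. It first converts text into phonemes and then speaks that phoneme string directly. The sample sentence and the variable names are illustrative only, and the actual phoneme string depends on the voice.

```scheme
(import (lispkit base) (lisppad speech))

(define s (make-speaker))

;; Convert natural language text into a phoneme string using speaker s
(define hello-phonemes (phonemes "Hello world" s))
(display hello-phonemes)
(newline)

;; Switch s to phoneme input and speak the phoneme string directly
(set-speaker-interpret-phonemes! #t s)
(speak hello-phonemes s)
```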

## Speakers

A *speaker* is an object defining speech synthesis parameters. The *current speaker* is used by default by all procedures that take an optional speaker argument, unless a speaker is passed explicitly.

A speaker object has the following components:

* an immutable voice,
* a mutable speaking rate,
* a mutable speaking volume,
* a mutable flag determining whether the speaker interprets natural language text or phonemes,
* a mutable flag determining how numbers are interpreted, and
* a mutable speaking pitch.

**current-speaker**

Defines the *current speaker*, which is used as a default by all procedures for which the speaker argument is optional. If there is no current speaker, the value of this parameter is `#f`.
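
This sketch assumes that `current-speaker` behaves like a standard R7RS parameter object and can therefore be rebound with `parameterize`. Within the dynamic extent of the `parameterize` form, calls that omit the speaker argument would use the temporary speaker. The name `quiet` and the volume value are illustrative.

```scheme
(import (lispkit base) (lisppad speech))

(define quiet (make-speaker))
(set-speaker-volume! 0.3 quiet)

;; Procedures called without a speaker argument use `quiet` here
(parameterize ((current-speaker quiet))
  (speak "This sentence is spoken quietly."))
```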

**(speaker? *obj*)**

Returns `#t` if *obj* is a speaker object; otherwise `#f` is returned.

**(make-speaker)**\
**(make-speaker *voice*)**

Returns a new speaker for the given *voice*. If *voice* is not provided, the default voice specified at the operating system level is used. Speakers are stateful objects which can be configured with the procedures `set-speaker-rate!`, `set-speaker-volume!`, `set-speaker-interpret-phonemes!`, `set-speaker-interpret-numbers!`, and `set-speaker-pitch!`.
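
A sketch of creating a speaker for a specific voice, with a fallback to the default voice. The voice name `"Daniel"` is only an assumption about which voices are installed on a given system. `voice` returns `#f` when no voice with that name exists.

```scheme
(import (lispkit base) (lisppad speech))

;; Look up a voice by name; the result is #f if it is not installed
(define daniel (voice "Daniel"))

;; Use that voice if available, otherwise the default system voice
(define narrator (if daniel (make-speaker daniel) (make-speaker)))
(speaker-voice narrator)
```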

**(speaker-voice)**\
**(speaker-voice *speaker*)**

Returns the voice of *speaker*. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(speaker-rate)**\
**(speaker-rate *speaker*)**

Returns the speaking rate of *speaker*. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(set-speaker-rate! *rate*)**\
**(set-speaker-rate! *rate speaker*)**

Sets the speaking rate of *speaker* to number *rate*. If *speaker* is not provided, the parameter object `current-speaker` is used.
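
Because this page does not specify the unit of the speaking rate, the following sketch scales the current rate instead of setting an absolute value. The factor 1.25 is arbitrary.

```scheme
(import (lispkit base) (lisppad speech))

(define s (make-speaker))

;; Speak 25% faster than the initial rate of s
(set-speaker-rate! (* 1.25 (speaker-rate s)) s)
(speak "Now a little faster." s)
```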

**(speaker-volume)**\
**(speaker-volume *speaker*)**

Returns the volume of *speaker* as a flonum ranging from 0.0 to 1.0. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(set-speaker-volume! *volume*)**\
**(set-speaker-volume! *volume speaker*)**

Sets the volume of *speaker* to *volume*, a flonum between 0.0 and 1.0. If *speaker* is not provided, the parameter object `current-speaker` is used.
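
A short sketch that halves the volume of a speaker. Starting from a value between 0.0 and 1.0 keeps the result inside the documented range. The printed value depends on the system default.

```scheme
(import (lispkit base) (lisppad speech))

(define s (make-speaker))

;; Halve the current volume and print the new value
(set-speaker-volume! (/ (speaker-volume s) 2.0) s)
(display (speaker-volume s))
(newline)
```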

**(speaker-interpret-phonemes)**\
**(speaker-interpret-phonemes *speaker*)**

Returns `#t` if *speaker* interprets phonemes instead of natural language text; otherwise `#f` is returned. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(set-speaker-interpret-phonemes! *phoneme?*)**\
**(set-speaker-interpret-phonemes! *phoneme? speaker*)**

If *phoneme?* is `#f`, *speaker* is configured to interpret natural language text. For any other value of *phoneme?*, *speaker* interprets phonemes instead. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(speaker-interpret-numbers)**\
**(speaker-interpret-numbers *speaker*)**

Returns `#t` if *speaker* interprets numbers the way a natural language speaker would ("100" is spoken as "one hundred"). If it returns `#f`, *speaker* decomposes numbers into a sequence of digits and speaks them individually ("100" is spoken as "one zero zero"). If *speaker* is not provided, the parameter object `current-speaker` is used.

**(set-speaker-interpret-numbers! *natural?*)**\
**(set-speaker-interpret-numbers! *natural? speaker*)**

Sets the number interpretation of *speaker* to boolean *natural?*. If *natural?* is `#t`, *speaker* interprets numbers the way a natural language speaker would ("100" is spoken as "one hundred"). If *natural?* is `#f`, *speaker* decomposes numbers into a sequence of digits and speaks them individually ("100" is spoken as "one zero zero"). If *speaker* is not provided, the parameter object `current-speaker` is used.
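
The sketch below speaks the same sentence twice. The first call uses digit-by-digit number interpretation and the second uses natural interpretation, following the behavior described above. The sentence itself is arbitrary.

```scheme
(import (lispkit base) (lisppad speech))

(define s (make-speaker))

;; Digits are spoken individually: "one zero zero"
(set-speaker-interpret-numbers! #f s)
(speak "Room 100" s)

;; Numbers are read naturally: "one hundred"
(set-speaker-interpret-numbers! #t s)
(speak "Room 100" s)
```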

**(speaker-pitch)**\
**(speaker-pitch *speaker*)**

Returns the pitch of *speaker* as a pair of two flonums: the car is the base of the pitch, and the cdr is the modulation of the pitch. If *speaker* is not provided, the parameter object `current-speaker` is used.

**(set-speaker-pitch! *pitch*)**\
**(set-speaker-pitch! *pitch speaker*)**

Sets the pitch of *speaker* to *pitch*, a pair of flonums whose car is the base of the pitch and whose cdr is the modulation of the pitch. If *speaker* is not provided, the parameter object `current-speaker` is used.
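
Because the pitch is represented as a pair, one component can be adjusted while the other is kept unchanged. This sketch raises the pitch base by an arbitrary 10% and leaves the modulation as it is.

```scheme
(import (lispkit base) (lisppad speech))

(define s (make-speaker))

;; Raise the pitch base (car) by 10%, keep the modulation (cdr)
(let ((pitch (speaker-pitch s)))
  (set-speaker-pitch! (cons (* 1.1 (car pitch)) (cdr pitch)) s))
(speak "A slightly higher voice." s)
```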

## Voices

Voices are provided by the operating system, and library `(lisppad speech)` does not represent them explicitly as objects. Instead, symbols are used as voice identifiers. For example, `com.apple.speech.synthesis.voice.Alex` refers to the default US voice.

A voice has the following characteristics:

* Name (string)
* Age (fixnum)
* Gender (`male` or `female`)
* Locale (symbol, e.g. `en_US`)

Library `(lispkit system)` provides means to handle *locales*, including language and country codes.

**(voice)**\
**(voice *name*)**\
**(voice *id*)**

Returns a symbol identifying the voice specified by the arguments of `voice`. If no argument is provided, an identifier for the default voice is returned. If a *name* string is provided, the identifier of the voice whose name is *name* is returned, or `#f` if no such voice exists. If an *id* symbol is provided, the identifier of the voice matching *id* is returned, or `#f` if no such voice exists.
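
The three forms of `voice` side by side. The name `"Alex"` is an assumption about the display name of the voice `com.apple.speech.synthesis.voice.Alex`, and either lookup may return `#f` on systems where that voice is not installed.

```scheme
(import (lispkit base) (lisppad speech))

;; Identifier of the default voice
(voice)

;; Lookup by name; #f if no voice is called "Alex"
(voice "Alex")

;; Lookup by identifier; #f if the identifier is unknown
(voice 'com.apple.speech.synthesis.voice.Alex)
```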

**(available-voices)**\
**(available-voices *lang*)**\
**(available-voices *lang gender*)**

Returns a list of symbols identifying the available voices that match the language filter *lang* and the gender filter *gender*. If no filter is provided, all available voices are returned. *lang* is a symbol denoting either a language or a locale identifier; it can be set to `#f` if only a gender filter is needed. *gender* is either the symbol `male` or `female`.

```scheme
(available-voices 'en)
⇒ (com.apple.speech.synthesis.voice.Alex com.apple.speech.synthesis.voice.daniel com.apple.speech.synthesis.voice.fiona com.apple.speech.synthesis.voice.Fred com.apple.speech.synthesis.voice.karen com.apple.speech.synthesis.voice.moira com.apple.speech.synthesis.voice.rishi com.apple.speech.synthesis.voice.samantha com.apple.speech.synthesis.voice.tessa com.apple.speech.synthesis.voice.veena)
(available-voices (locale "en" "GB"))
⇒ (com.apple.speech.synthesis.voice.daniel)
```
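
The gender filter can be used on its own by passing `#f` as the language filter, or combined with a language. The returned lists depend on the voices installed on the system.

```scheme
(import (lispkit base) (lisppad speech))

;; All female voices, regardless of language
(available-voices #f 'female)

;; Male voices for English
(available-voices 'en 'male)
```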

**(available-voice? *obj*)**

Returns `#t` if *obj* is a symbol identifying an available voice, otherwise `#f` is returned. This procedure fails if *obj* is neither a symbol nor the value `#f`.
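
A sketch that uses `available-voice?` to guard a preferred voice and falls back to the default voice otherwise. The preferred identifier is taken from the `available-voices` output shown earlier and might not exist on every system.

```scheme
(import (lispkit base) (lisppad speech))

(define preferred 'com.apple.speech.synthesis.voice.samantha)

;; Use the preferred voice only if it is installed
(define s (make-speaker (if (available-voice? preferred) preferred (voice))))
```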

**(voice-name *voice*)**

Returns the name of the voice identified by symbol *voice* as a string.

**(voice-age *voice*)**

Returns the age of the voice identified by symbol *voice* as a fixnum.

**(voice-gender *voice*)**

Returns the gender of the voice identified by symbol *voice*, either the symbol `male` or `female`.

**(voice-locale *voice*)**

Returns the locale of the voice identified by symbol *voice* as a symbol, e.g. `en_US`.
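
Putting the accessors together, this sketch prints the name, age, gender, and locale of every English voice. The output depends on the installed voices, so none is shown here.

```scheme
(import (lispkit base) (lisppad speech))

;; Print a one-line description of each English voice
(for-each
  (lambda (v)
    (display (voice-name v))
    (display ": age ")
    (display (voice-age v))
    (display ", ")
    (display (voice-gender v))
    (display ", ")
    (display (voice-locale v))
    (newline))
  (available-voices 'en))
```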


