# (lispkit regexp)

Library `(lispkit regexp)` provides an API for defining regular expressions and applying them to strings. Both matching and search/replace are supported.

## Regular expressions

The regular expression syntax supported by this library corresponds to that of [`NSRegularExpression`](https://developer.apple.com/documentation/foundation/nsregularexpression) from Apple's *Foundation* framework. The documentation in this section is also based on Apple's documentation of `NSRegularExpression`.

### Meta-characters

<table><thead><tr><th width="161">Character</th><th>Description</th></tr></thead><tbody><tr><td><code>\a</code></td><td>Match a <em>bell</em> (<code>\u0007</code>).</td></tr><tr><td><code>\A</code></td><td>Match at the beginning of the input. Differs from <code>^</code> in that <code>\A</code> will not match after a new line within the input.</td></tr><tr><td><code>\b</code></td><td>Outside of a [Set], match if the current position is a word boundary. Boundaries occur at the transitions between word (<code>\w</code>) and non-word (<code>\W</code>) characters, with combining marks ignored. Inside of a [Set], match a <em>backspace</em> (<code>\u0008</code>).</td></tr><tr><td><code>\B</code></td><td>Match if the current position is not a word boundary.</td></tr><tr><td><code>\cX</code></td><td>Match a control-X character.</td></tr><tr><td><code>\d</code></td><td>Match any character with the Unicode general category <code>Nd</code> (Number, Decimal Digit).</td></tr><tr><td><code>\D</code></td><td>Match any character that is not a decimal digit.</td></tr><tr><td><code>\e</code></td><td>Match an <em>escape</em> (<code>\u001B</code>).</td></tr><tr><td><code>\E</code></td><td>Terminates a <code>\Q ... \E</code> quoted sequence.</td></tr><tr><td><code>\f</code></td><td>Match a <em>form feed</em> (<code>\u000C</code>).</td></tr><tr><td><code>\G</code></td><td>Match if the current position is at the end of the previous match.</td></tr><tr><td><code>\n</code></td><td>Match a <em>line feed</em> (<code>\u000A</code>).</td></tr><tr><td><code>\N{unicode character}</code></td><td>Match the named character.</td></tr><tr><td><code>\p{unicode property}</code></td><td>Match any character with the specified Unicode property.</td></tr><tr><td><code>\P{unicode property}</code></td><td>Match any character not having the specified Unicode property.</td></tr><tr><td><code>\Q</code></td><td>Quotes all following characters until <code>\E</code>.</td></tr><tr><td><code>\r</code></td><td>Match a <em>carriage return</em> (<code>\u000D</code>).</td></tr><tr><td><code>\s</code></td><td>Match a whitespace character. Whitespace is defined as <code>[\t\n\f\r\p{Z}]</code>.</td></tr><tr><td><code>\S</code></td><td>Match a non-whitespace character.</td></tr><tr><td><code>\t</code></td><td>Match a <em>horizontal tabulation</em> (<code>\u0009</code>).</td></tr><tr><td><code>\uhhhh</code></td><td>Match the character with the hex value <code>hhhh</code>.</td></tr><tr><td><code>\Uhhhhhhhh</code></td><td>Match the character with the hex value <code>hhhhhhhh</code>. Exactly eight hex digits must be provided, even though the largest Unicode code point is <code>\U0010ffff</code>.</td></tr><tr><td><code>\w</code></td><td>Match a word character. Word characters are <code>[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]</code>.</td></tr><tr><td><code>\W</code></td><td>Match a non-word character.</td></tr><tr><td><code>\x{hhhh}</code></td><td>Match the character with hex value <code>hhhh</code>. From one to six hex digits may be supplied.</td></tr><tr><td><code>\xhh</code></td><td>Match the character with the two-digit hex value <code>hh</code>.</td></tr><tr><td><code>\X</code></td><td>Match a grapheme cluster.</td></tr><tr><td><code>\Z</code></td><td>Match if the current position is at the end of input, but before the final line terminator, if one exists.</td></tr><tr><td><code>\z</code></td><td>Match if the current position is at the end of input.</td></tr><tr><td><code>\</code><em>n</em></td><td>Back reference. Match whatever the <em>n</em>-th capturing group matched. <em>n</em> must be a number ≥ 1 and ≤ the total number of capture groups in the pattern.</td></tr><tr><td><code>\0ooo</code></td><td>Match an octal character. <em>ooo</em> consists of one to three octal digits. <code>0377</code> is the largest allowed octal character. The leading zero is required and distinguishes octal constants from back references.</td></tr><tr><td><code>[pattern]</code></td><td>Match any one character from the pattern.</td></tr><tr><td><code>.</code></td><td>Match any character.</td></tr><tr><td><code>^</code></td><td>Match at the beginning of a line.</td></tr><tr><td><code>$</code></td><td>Match at the end of a line.</td></tr><tr><td><code>\</code></td><td>Quotes the following character. Characters that must be quoted to be treated as literals are <code>* ? + [ ( ) { } ^ $ | \ .</code></td></tr></tbody></table>
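Because patterns are written as Scheme string literals, every backslash of a meta-character has to be doubled. The following sketch assumes that `(lispkit base)` and `(lispkit regexp)` can be imported, and it uses `regexp-extract` from the API section. It combines `\b` and `\d` so that only standalone numbers are extracted:

```scheme
(import (lispkit base) (lispkit regexp))

;; The pattern \b\d+\b is written as "\\b\\d+\\b" in a Scheme string.
;; Digits are word characters, so "a1" and "b33" have no word boundary
;; before their digits and are not matched.
(define whole-number (regexp "\\b\\d+\\b"))

(regexp-extract whole-number "a1 22 b33 4")
; ⟹ ("22" "4")
```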

### Regular expression operators

<table><thead><tr><th width="172">Character</th><th>Description</th></tr></thead><tbody><tr><td><code>|</code></td><td>Alternation. <code>A|B</code> matches either <code>A</code> or <code>B</code>.</td></tr><tr><td><code>*</code></td><td>Match 0 or more times, as many times as possible.</td></tr><tr><td><code>+</code></td><td>Match 1 or more times, as many times as possible.</td></tr><tr><td><code>?</code></td><td>Match zero or one times, preferring one time if possible.</td></tr><tr><td><code>{n}</code></td><td>Match exactly <code>n</code> times.</td></tr><tr><td><code>{n,}</code></td><td>Match at least <code>n</code> times, as many times as possible.</td></tr><tr><td><code>{n,m}</code></td><td>Match between <code>n</code> and <code>m</code> times, as many times as possible, but not more than <code>m</code> times.</td></tr><tr><td><code>*?</code></td><td>Match zero or more times, as few times as possible.</td></tr><tr><td><code>+?</code></td><td>Match one or more times, as few times as possible.</td></tr><tr><td><code>??</code></td><td>Match zero or one times, preferring zero.</td></tr><tr><td><code>{n}?</code></td><td>Match exactly <code>n</code> times.</td></tr><tr><td><code>{n,}?</code></td><td>Match at least <code>n</code> times, but no more than required for an overall pattern match.</td></tr><tr><td><code>{n,m}?</code></td><td>Match between <code>n</code> and <code>m</code> times, as few times as possible, but not less than <code>n</code>.</td></tr><tr><td><code>*+</code></td><td>Match zero or more times, as many times as possible when first encountered; do not retry with fewer even if the overall match fails (possessive match).</td></tr><tr><td><code>++</code></td><td>Match one or more times (possessive match).</td></tr><tr><td><code>?+</code></td><td>Match zero or one times (possessive match).</td></tr><tr><td><code>{n}+</code></td><td>Match exactly <code>n</code> times.</td></tr><tr><td><code>{n,}+</code></td><td>Match at least <code>n</code> times (possessive match).</td></tr><tr><td><code>{n,m}+</code></td><td>Match between <code>n</code> and <code>m</code> times (possessive match).</td></tr><tr><td><code>(...)</code></td><td>Capturing parentheses; the range of input that matched the parenthesized subexpression is available after the match.</td></tr><tr><td><code>(?:...)</code></td><td>Non-capturing parentheses; groups the included pattern, but does not capture the matching text (more efficient than capturing parentheses).</td></tr><tr><td><code>(?>...)</code></td><td>Atomic-match parentheses; the first match of the parenthesized subexpression is the only one tried. If it does not lead to an overall pattern match, back up the search for a match to a position before the <code>"(?>"</code>.</td></tr><tr><td><code>(?# ... )</code></td><td>Free-format comment, e.g. <code>(?# comment)</code>.</td></tr><tr><td><code>(?= ... )</code></td><td>Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.</td></tr><tr><td><code>(?! ... )</code></td><td>Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.</td></tr><tr><td><code>(?&#x3C;= ... )</code></td><td>Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no <code>*</code> or <code>+</code> operators).</td></tr><tr><td><code>(?&#x3C;! ... )</code></td><td>Negative <em>look-behind assertion</em>. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no <code>*</code> or <code>+</code> operators).</td></tr><tr><td><code>(?ismwx-ismwx: ... )</code></td><td>Flag settings. Evaluate the parenthesized expression with the specified flags enabled or disabled.</td></tr><tr><td><code>(?ismwx-ismwx)</code></td><td>Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, <code>(?i)</code> changes to a case-insensitive match.</td></tr></tbody></table>
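Greedy and lazy quantifiers can produce quite different matches. The sketch below contrasts `+` with `+?`, using `regexp-extract` from the API section and assuming `(lispkit base)` and `(lispkit regexp)` are importable:

```scheme
(import (lispkit base) (lispkit regexp))

;; Greedy: ".+" consumes as much as possible, so one match spans both tags.
(regexp-extract (regexp "<.+>") "<a><b>")   ; ⟹ ("<a><b>")

;; Lazy: ".+?" stops at the first ">", so each tag is matched separately.
(regexp-extract (regexp "<.+?>") "<a><b>")  ; ⟹ ("<a>" "<b>")
```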

### Template Matching

<table><thead><tr><th width="162">Character</th><th>Description</th></tr></thead><tbody><tr><td><code>$n</code></td><td>The text of capture group <code>n</code> will be substituted for <code>$n</code>. <code>n</code> must be ≥ 0 and not greater than the number of capture groups. A <code>$</code> not followed by a digit has no special meaning, and will appear in the substitution text as itself, i.e. <code>$</code>.</td></tr><tr><td>\</td><td>Treat the following character as a literal, suppressing any special meaning. Backslash escaping in substitution text is only required for <code>$</code> and <code>\</code>, but may be used on any other character.</td></tr></tbody></table>
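A minimal sketch of template substitution. It assumes, as this section implies but the API reference does not state explicitly, that the *subst* argument of `regexp-replace` is interpreted as such a template:

```scheme
(import (lispkit base) (lispkit regexp))

;; Assumption: subst is a template, so $1 and $2 refer to capture groups.
(define full-name (regexp "(\\w+) (\\w+)"))
(regexp-replace full-name "Ada Lovelace" "$2, $1")
; ⟹ "Lovelace, Ada"

;; "\\$$0" is the template \$$0: a literal "$" followed by the whole match.
(regexp-replace (regexp "\\d+") "costs 5" "\\$$0")
; ⟹ "costs $5"
```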

### Flag options

The following flags control various aspects of regular expression matching. These flags are specified within the pattern using the `(?ismwx-ismwx)` pattern options.

<table><thead><tr><th width="160">Flag</th><th>Description</th></tr></thead><tbody><tr><td><code>i</code></td><td>If set, matching will take place in a case-insensitive manner.</td></tr><tr><td><code>x</code></td><td>If set, allow use of white space and <code>#</code> comments within patterns.</td></tr><tr><td><code>s</code></td><td>If set, a <code>.</code> in a pattern will match a line terminator in the input text. By default, it will not. Note that a carriage-return/line-feed pair in text behaves as a single line terminator, and will match a single <code>.</code> in a regular expression pattern.</td></tr><tr><td><code>m</code></td><td>Controls the behavior of <code>^</code> and <code>$</code> in a pattern. By default these will only match at the start and end, respectively, of the input text. If this flag is set, <code>^</code> and <code>$</code> will also match at the start and end of each line within the input text.</td></tr><tr><td><code>w</code></td><td>Controls the behavior of <code>\b</code> in a pattern. If set, word boundaries are found according to the definitions of word found in <em>Unicode UAX 29, Text Boundaries</em>. By default, word boundaries are identified by means of a simple classification of characters as either <em>word</em> or <em>non-word</em>, which approximates traditional regular expression behavior.</td></tr></tbody></table>
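A sketch of inline flag settings, using `regexp-matches?` and `regexp-extract` from the API section. It contrasts a flag applied to the rest of the pattern, `(?i)`, with one scoped to a group, `(?i: ... )`, and shows the effect of `(?m)`:

```scheme
(import (lispkit base) (lispkit regexp))

(regexp-matches? (regexp "hello") "HeLLo")       ; ⟹ #f
(regexp-matches? (regexp "(?i)hello") "HeLLo")   ; ⟹ #t

;; Only "hel" is matched case-insensitively; "lo" stays case-sensitive.
(regexp-matches? (regexp "(?i:hel)lo") "HELlo")  ; ⟹ #t
(regexp-matches? (regexp "(?i:hel)lo") "HELLO")  ; ⟹ #f

;; Without m, ^ only matches at the start of the input.
(regexp-extract (regexp "^\\w+") "one\ntwo")     ; ⟹ ("one")
(regexp-extract (regexp "(?m)^\\w+") "one\ntwo") ; ⟹ ("one" "two")
```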

## API

**(regexp?&#x20;*****obj*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">

Returns `#t` if *obj* is a regular expression object; otherwise `#f` is returned.

**(regexp&#x20;*****str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp&#x20;*****str opt ...*****)**

Returns a new regular expression object for the regular expression pattern *str* and the matching options *opt* .... *str* is a string and each matching option *opt* is a symbol. The following matching options are supported:

* `case-insensitive`: Match letters in the regular expression independent of their case.
* `allow-comments`: Ignore whitespace and `#`-prefixed comments in the regular expression pattern.
* `ignore-meta`: Treat the entire regular expression pattern as a literal string.
* `dot-matches-line-separator`: Allow `.` to match any character, including line separators.
* `anchors-match-lines`: Allow `^` and `$` to match the start and end of lines.
* `unix-only-line-separators`: Treat only `\n` as a line separator; otherwise, all standard line separators are used.
* `unicode-words`: Use Unicode TR#29 to determine word boundaries; otherwise, traditional regular expression word boundaries are used.
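The matching options are passed to `regexp` as additional symbol arguments. A minimal sketch; the variable names are illustrative only:

```scheme
(import (lispkit base) (lispkit regexp))

(define greeting (regexp "hello" 'case-insensitive))
(regexp-matches? greeting "HeLLo")  ; ⟹ #t

;; ignore-meta: "." is a literal dot, not "any character".
(define literal (regexp "a.b" 'ignore-meta))
(regexp-matches? literal "a.b")     ; ⟹ #t
(regexp-matches? literal "axb")     ; ⟹ #f

;; Several options can be combined.
(define whole-line-word (regexp "^\\w+$" 'anchors-match-lines 'case-insensitive))
(regexp-extract whole-line-word "One\ntwo")  ; ⟹ ("One" "two")
```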

**(regexp-pattern&#x20;*****regexp*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">

Returns the regular expression pattern of the given regular expression object *regexp*. A regular expression pattern is a string conforming to the regular expression syntax supported by library `(lispkit regexp)`.

**(regexp-capture-groups&#x20;*****regexp*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">

Returns the number of capture groups of the given regular expression object *regexp*.
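A short sketch of the inspection procedures `regexp?`, `regexp-pattern` and `regexp-capture-groups`, using a pattern with two capture groups:

```scheme
(import (lispkit base) (lispkit regexp))

(define series (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))

(regexp? series)                ; ⟹ #t
(regexp? "Season")              ; ⟹ #f (a string is not a regexp object)
(regexp-pattern series)         ; ⟹ "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"
(regexp-capture-groups series)  ; ⟹ 2
```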

**(escape-regexp-pattern&#x20;*****str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">

Returns a regular expression pattern string that matches *str* literally, by adding backslash escapes to *str* as necessary to protect any characters that would otherwise be interpreted as pattern meta-characters.

```scheme
(escape-regexp-pattern "(home/objecthub)")
⟹ "\\(home\\/objecthub\\)"
```
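Escaping is useful when a pattern has to match arbitrary text literally. The sketch below contrasts an escaped and an unescaped pattern for the text `1+1=2`; without escaping, `+` acts as a quantifier and the pattern requires at least two consecutive `1` characters before `=2`:

```scheme
(import (lispkit base) (lispkit regexp))

(define literal (regexp (escape-regexp-pattern "1+1=2")))
(regexp-search literal "is 1+1=2?")           ; ⟹ ((3 . 8))

;; Unescaped, "1+1=2" means "one or more 1s, then 1=2".
(regexp-search (regexp "1+1=2") "is 1+1=2?")  ; ⟹ #f
```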

**(escape-regexp-template&#x20;*****str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">

Returns a regular expression template string by adding backslash escapes to the template *str* as necessary to protect any characters that have a special meaning in substitution templates (i.e. `$` and `\`), so that *str* is inserted literally.
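In a substitution template, `$5` would be read as a reference to capture group 5. The following sketch escapes it so that it is inserted literally. It assumes, as the Template Matching section suggests, that `regexp-replace` interprets its *subst* argument as a template:

```scheme
(import (lispkit base) (lispkit regexp))

;; "\\$5" is the Scheme notation for the 3-character string \$5.
(escape-regexp-template "$5")  ; ⟹ "\\$5"

;; Assumption: subst is a template, so "$" must be escaped to appear literally.
(regexp-replace (regexp "price") "the price" (escape-regexp-template "$5"))
; ⟹ "the $5"
```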

**(regexp-matches&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-matches&#x20;*****regexp str start*****)**\
**(regexp-matches&#x20;*****regexp str start end*****)**

Returns a *matching spec* if the regular expression object *regexp* successfully matches the entire string *str* from position *start* (inclusive) to *end* (exclusive); otherwise, `#f` is returned. The default for *start* is 0; the default for *end* is the length of the string.

A *matching spec* returned by `regexp-matches` consists of pairs of fixnum positions *(startpos . endpos)* in *str*. The first pair always represents the full match (i.e. *startpos* is *start* and *endpos* is *end*, which by default are 0 and the length of *str*); all other pairs represent the positions of the matching capture groups of *regexp*.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-matches email "matthias@objecthub.net")
⟹ ((0 . 22))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-matches series "Season 3  Episode 12")
⟹ ((0 . 20) (7 . 8) (18 . 20))
```
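A matching spec only contains positions. The following sketch turns it into the matched substrings. `matching-spec->strings` is a hypothetical helper defined here, not part of `(lispkit regexp)`, and it assumes that every element of the spec is a `(startpos . endpos)` pair:

```scheme
(import (lispkit base) (lispkit regexp))

;; Hypothetical helper: maps each (startpos . endpos) pair to a substring.
(define (matching-spec->strings spec str)
  (map (lambda (pos) (substring str (car pos) (cdr pos))) spec))

(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(define title "Season 3  Episode 12")

(matching-spec->strings (regexp-matches series title) title)
; ⟹ ("Season 3  Episode 12" "3" "12")
```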

**(regexp-matches?&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-matches?&#x20;*****regexp str start*****)**\
**(regexp-matches?&#x20;*****regexp str start end*****)**

Returns `#t` if the regular expression object *regexp* successfully matches the entire string *str* from position *start* (inclusive) to *end* (exclusive); otherwise, `#f` is returned. The default for *start* is 0; the default for *end* is the length of the string.
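Unlike a search, `regexp-matches?` only succeeds if the whole range from *start* to *end* matches. A sketch with the e-mail pattern used in the examples of this section, where *start* = 8 skips the prefix `"Contact "`:

```scheme
(import (lispkit base) (lispkit regexp))

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))

(regexp-matches? email "matthias@objecthub.net")            ; ⟹ #t
;; The prefix "Contact " is not part of an e-mail address.
(regexp-matches? email "Contact matthias@objecthub.net")    ; ⟹ #f
;; Starting at position 8, the remaining range is a full match.
(regexp-matches? email "Contact matthias@objecthub.net" 8)  ; ⟹ #t
```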

**(regexp-search&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-search&#x20;*****regexp str start*****)**\
**(regexp-search&#x20;*****regexp str start end*****)**

Returns a *matching spec* for the first match of the regular expression *regexp* with a part of string *str* between position *start* (inclusive) and *end* (exclusive). If *regexp* does not match any part of *str* between *start* and *end*, `#f` is returned. The default for *start* is 0; the default for *end* is the length of the string.

A *matching spec* returned by `regexp-search` consists of pairs of fixnum positions *(startpos . endpos)* in *str*. The first pair always represents the full match of the pattern; all other pairs represent the positions of the matching capture groups of *regexp*.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-search email "Contact matthias@objecthub.net or foo@bar.org")
⟹ ((8 . 30))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-search series "New Season 3 Episode 12: Pilot")
⟹ ((4 . 23) (11 . 12) (21 . 23))
```
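The first pair of a matching spec ends where the match ends, so its `cdr` can serve as the *start* argument of the next search. A minimal sketch:

```scheme
(import (lispkit base) (lispkit regexp))

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(define text "Contact matthias@objecthub.net or foo@bar.org")

(define first-match (regexp-search email text))  ; ⟹ ((8 . 30))
;; Continue searching right after the end of the first match.
(regexp-search email text (cdar first-match))    ; ⟹ ((34 . 45))
```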

**(regexp-search-all&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-search-all&#x20;*****regexp str start*****)**\
**(regexp-search-all&#x20;*****regexp str start end*****)**

Returns a list of all *matching specs* for matches of the regular expression *regexp* with parts of string *str* between position *start* (inclusive) and *end* (exclusive). If *regexp* does not match any part of *str* between *start* and *end*, the empty list is returned. The default for *start* is 0; the default for *end* is the length of the string.

Each *matching spec* returned by `regexp-search-all` consists of pairs of fixnum positions *(startpos . endpos)* in *str*. The first pair always represents the full match of the pattern; all other pairs represent the positions of the matching capture groups of *regexp*.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-search-all email "Contact matthias@objecthub.net or foo@bar.org")
⟹ (((8 . 30)) ((34 . 45)))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-search-all series "New Season 3 Episode 12: Pilot")
⟹ (((4 . 23) (11 . 12) (21 . 23)))
```
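Often only the capture groups are of interest. The sketch below collects the numeric value of capture group 1 for every match; the `episode` pattern and the input string are made up for illustration:

```scheme
(import (lispkit base) (lispkit regexp))

(define episode (regexp "E(\\d+)"))
(define listing "E1, E22, E333")

(map (lambda (spec)
       ;; (cadr spec) is the position pair of capture group 1.
       (string->number
         (substring listing (car (cadr spec)) (cdr (cadr spec)))))
     (regexp-search-all episode listing))
; ⟹ (1 22 333)
```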

**(regexp-extract&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-extract&#x20;*****regexp str start*****)**\
**(regexp-extract&#x20;*****regexp str start end*****)**

Returns a list of all substrings of *str* between position *start* (inclusive) and *end* (exclusive) that are full matches of the regular expression *regexp*. If *regexp* does not match any part of *str* between *start* and *end*, the empty list is returned. The default for *start* is 0; the default for *end* is the length of the string.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-extract email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("tthias@objecthub.net" "foo@bar.org")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-extract series "New Season 3 Episode 12: Pilot")
⟹ ("Season 3 Episode 12")
```
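Combined with `map`, `regexp-extract` makes it easy to pull all numbers out of a text. A minimal sketch:

```scheme
(import (lispkit base) (lispkit regexp))

(define digits (regexp "\\d+"))

;; Extract each run of digits and convert it to a number.
(map string->number
     (regexp-extract digits "3 apples, 12 pears and 7 plums"))
; ⟹ (3 12 7)
```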

**(regexp-split&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-split&#x20;*****regexp str start*****)**\
**(regexp-split&#x20;*****regexp str start end*****)**

Splits string *str* into a list of possibly empty substrings separated by non-empty matches of regular expression *regexp* between positions *start* (inclusive) and *end* (exclusive). If *regexp* does not match any part of *str* between *start* and *end*, a list with *str* as its only element is returned. The default for *start* is 0; the default for *end* is the length of the string.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-split email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("Contact ma" " or " "")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-split series "New Season 3 Episode 12: Pilot")
⟹ ("New " ": Pilot")
```
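A sketch that splits a comma-separated list whose separators may be followed by whitespace. As described above, adjacent or trailing separators produce empty substrings:

```scheme
(import (lispkit base) (lispkit regexp))

;; A separator is a comma followed by optional whitespace.
(define comma (regexp ",\\s*"))

(regexp-split comma "a, b,c,  d")  ; ⟹ ("a" "b" "c" "d")
(regexp-split comma "a, b,,c,")    ; ⟹ ("a" "b" "" "c" "")
```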

**(regexp-partition&#x20;*****regexp str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-partition&#x20;*****regexp str start*****)**\
**(regexp-partition&#x20;*****regexp str start end*****)**

Partitions string *str* into a list of non-empty substrings matching regular expression *regexp* between positions *start* (inclusive) and *end* (exclusive), interspersed with the unmatched portions of the whole string. The 1st, 3rd, 5th, ... elements are unmatched substrings; such an element is the empty string if *regexp* matches at the beginning of the string or directly at the end of the previous match. The 2nd, 4th, ... elements are substrings fully matching *regexp*. If *str* is the empty string or if there is no match at all, the result is a list with *str* as its only element.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-partition email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("Contact ma" "tthias@objecthub.net" " or " "foo@bar.org" "")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-partition series "New Season 3 Episode 12: Pilot")
⟹ ("New " "Season 3 Episode 12" ": Pilot")
```
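Since the result interleaves unmatched and matched parts of the whole string, concatenating it should reproduce the original string. A minimal sketch:

```scheme
(import (lispkit base) (lispkit regexp))

(define digits (regexp "\\d+"))
(define parts (regexp-partition digits "a1b22"))

parts                        ; ⟹ ("a" "1" "b" "22" "")
(apply string-append parts)  ; ⟹ "a1b22"
```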

**(regexp-replace&#x20;*****regexp str subst*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-replace&#x20;*****regexp str subst start*****)**\
**(regexp-replace&#x20;*****regexp str subst start end*****)**

Returns a new string in which all matches of regular expression *regexp* in string *str* between positions *start* (inclusive) and *end* (exclusive) are replaced with string *subst*. `regexp-replace` always returns a new string, even if there are no matches and thus no replacements.

The optional parameters *start* and *end* restrict both the matching and the substitution to the given positions. Characters of *str* outside of this range are copied unchanged into the result, as the first example below shows.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-replace email "Contact matthias@objecthub.net or foo@bar.org" "<omitted>" 10)
⟹ "Contact ma<omitted> or <omitted>"
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-replace series "New Season 3 Episode 12: Pilot" "Series")
⟹ "New Series: Pilot"
```
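A sketch that normalizes runs of whitespace to a single space:

```scheme
(import (lispkit base) (lispkit regexp))

(define spaces (regexp "\\s+"))

;; Every run of one or more whitespace characters becomes one space.
(regexp-replace spaces "to  be   or  not" " ")
; ⟹ "to be or not"
```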

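**(regexp-replace!&#x20;*****regexp str subst*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-replace!&#x20;*****regexp str subst start*****)**\
**(regexp-replace!&#x20;*****regexp str subst start end*****)**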

Mutates string *str* by replacing all matches of regular expression *regexp* between positions *start* (inclusive) and *end* (exclusive) with string *subst*. The optional parameters *start* and *end* restrict both the matching and the substitution. `regexp-replace!` returns the number of replacements that were applied.

```scheme
(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
;; string-copy yields a mutable string; string literals may be immutable
(define str (string-copy "Contact matthias@objecthub.net or foo@bar.org"))
(regexp-replace! email str "<omitted>" 10) ⟹ 2
str ⟹ "Contact ma<omitted> or <omitted>"
```

**(regexp-fold&#x20;*****regexp kons knil str*****)** <img src="https://1467949168-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fna2foeoaXHYkSD3fhs0t%2Fuploads%2Fgit-blob-d20368c588cfbb523beb2fae4f8be0f8ef011884%2Fproc.png?alt=media" alt="" data-size="line">\
**(regexp-fold&#x20;*****regexp kons knil str finish*****)**\
**(regexp-fold&#x20;*****regexp kons knil str finish start*****)**\
**(regexp-fold&#x20;*****regexp kons knil str finish start end*****)**

`regexp-fold` is the most fundamental and generic regular expression matching iterator. It repeatedly searches string *str* for the regular expression *regexp* as long as a match can be found. On each successful match, it applies `(kons` *i regexp-match str acc*`)`, where *i* is the position in *str* at which the previous match ended (initially *start*), *regexp-match* is the *matching spec* of the current match, and *acc* is the result of the previous *kons* application, beginning with *knil*. When no more matches can be found, `regexp-fold` calls *finish* with the same arguments, except that *regexp-match* is `#f`. By default, *finish* just returns *acc*.

```scheme
(regexp-fold (regexp "(\\w+)")
             (lambda (i m str acc)
               (let ((s (substring str (caar m) (cdar m))))
                 (if (zero? i) s (string-append acc "-" s))))
             ""
             "to  be  or  not  to  be")
⟹ "to-be-or-not-to-be"
```
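The optional *finish* procedure is called once after the last match, which makes it a natural place to post-process the accumulated value. A sketch that collects all words into a list in reverse order and restores the original order in *finish*:

```scheme
(import (lispkit base) (lispkit regexp))

(regexp-fold (regexp "\\w+")
             ;; kons: prepend the full match (first pair of the spec).
             (lambda (i m str acc)
               (cons (substring str (caar m) (cdar m)) acc))
             '()
             "to  be  or  not"
             ;; finish: called with m = #f; reverse the accumulated list.
             (lambda (i m str acc) (reverse acc)))
; ⟹ ("to" "be" "or" "not")
```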
