(lispkit regexp)

Library (lispkit regexp) provides an API for defining regular expressions and applying them to strings. It supports both matching and search/replace.

Regular expressions

The regular expression syntax supported by this library corresponds to that of NSRegularExpression in Apple's Foundation framework, which is also the source of the documentation in this section.

Meta-characters

Regular expression operators

Template Matching

Flag options

The following flags control various aspects of regular expression matching. They are specified within the pattern using the (?ismx-ismx) pattern options.

API

Returns #t if obj is a regular expression object; otherwise #f is returned.

Returns a new regular expression object for the given regular expression pattern str and matching options opt, ... . str is a string; the matching options opt are symbols. The following matching options are supported:

case-insensitive: Match letters in the regular expression independent of their case.
allow-comments: Ignore whitespace and #-prefixed comments in the regular expression pattern.
ignore-meta: Treat the entire regular expression pattern as a literal string.
dot-matches-line-separator: Allow . to match any character, including line separators.
anchors-match-lines: Allow ^ and $ to match the start and end of lines.
unix-only-line-separators: Treat only \n as a line separator; otherwise, all standard line separators are used.
unicode-words: Use Unicode TR#29 to specify word boundaries; otherwise, traditional regular expression word boundaries are used.
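
As an illustration of how options change matching, the following sketch compiles the same pattern with and without options and checks it with regexp-matches (described further down). The patterns and variable names are made up for this example; the expected results follow from the option descriptions above.

```scheme
(import (lispkit base) (lispkit regexp))

;; Without options, matching is case-sensitive.
(define plain (regexp "hello world"))
(regexp-matches plain "Hello World")      ; ⟹ #f

;; With case-insensitive, letter case is ignored.
(define relaxed (regexp "hello world" 'case-insensitive))
(regexp-matches relaxed "Hello World")    ; ⟹ ((0 . 11))

;; With ignore-meta, the dot is a literal character rather than a wildcard.
(define literal (regexp "a.c" 'ignore-meta))
(regexp-matches literal "a.c")            ; ⟹ ((0 . 3))
(regexp-matches literal "abc")            ; ⟹ #f
```

Options are plain symbols, so several of them can be passed after the pattern string in one call.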

Returns the regular expression pattern for the given regular expression object regexp. A regular expression pattern is a string matching the regular expression syntax supported by library (lispkit regexp).

Returns the number of capture groups of the given regular expression object regexp.

Returns a regular expression pattern string by adding backslash escapes to pattern str as necessary to protect any characters that would match as pattern meta-characters.

(escape-regexp-pattern "(home/objecthub)")
⟹ "\\(home\\/objecthub\\)"
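
A typical use is searching for text that comes from outside the program and may contain meta-characters. The sketch below assumes a made-up input string "1+1" and uses regexp-search (described further down) to compare the unescaped and the escaped pattern.

```scheme
(import (lispkit base) (lispkit regexp))

;; Illustrative literal text containing the meta-character +
(define needle "1+1")

;; Used directly as a pattern, "1+1" means "one or more 1s followed by a 1",
;; which does not occur in the string below.
(regexp-search (regexp needle) "1+1=2")
;; ⟹ #f

;; Escaped first, the pattern matches the three literal characters.
(regexp-search (regexp (escape-regexp-pattern needle)) "1+1=2")
;; ⟹ ((0 . 3))
```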

Returns a regular expression pattern template string by adding backslash escapes to pattern template str as necessary to protect any characters that would match as pattern meta-characters.

Returns a matching spec if the regular expression object regexp successfully matches the entire string str from position start (inclusive) to end (exclusive); otherwise, #f is returned. The default for start is 0; the default for end is the length of the string.

A matching spec returned by regexp-matches consists of pairs of fixnum positions (startpos . endpos) in str. The first pair always represents the full match (i.e. with the default start and end, startpos is 0 and endpos is the length of str); all other pairs represent the positions of the matching capture groups of regexp.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-matches email "matthias@objecthub.net")
⟹ ((0 . 22))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-matches series "Season 3  Episode 12")
⟹ ((0 . 20) (7 . 8) (18 . 20))
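
Since a matching spec only carries positions, the matched text has to be recovered with substring. The following is a minimal sketch; spec->strings is a hypothetical helper, not part of the library, and it assumes that every capture group took part in the match.

```scheme
(import (lispkit base) (lispkit regexp))

;; Hypothetical helper: turn each (startpos . endpos) pair into the
;; substring of str that it denotes.
(define (spec->strings spec str)
  (map (lambda (pos) (substring str (car pos) (cdr pos))) spec))

(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(define title "Season 3  Episode 12")

;; regexp-matches returns #f when there is no match, so guard the call.
(let ((spec (regexp-matches series title)))
  (if spec
      (spec->strings spec title)
      '()))
;; ⟹ ("Season 3  Episode 12" "3" "12")
```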

Returns #t if the regular expression object regexp successfully matches the entire string str from position start (inclusive) to end (exclusive); otherwise, #f is returned. The default for start is 0; the default for end is the length of the string.

Returns a matching spec for the first match of the regular expression regexp with a part of string str between position start (inclusive) and end (exclusive). If regexp does not match any part of str between start and end, #f is returned. The default for start is 0; the default for end is the length of the string.

A matching spec returned by regexp-search consists of pairs of fixnum positions (startpos . endpos) in str. The first pair always represents the full match of the pattern; all other pairs represent the positions of the matching capture groups of regexp.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-search email "Contact matthias@objecthub.net or foo@bar.org")
⟹ ((8 . 30))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-search series "New Season 3 Episode 12: Pilot")
⟹ ((4 . 23) (11 . 12) (21 . 23))

Returns a list of all matching specs for matches of the regular expression regexp with parts of string str between position start (inclusive) and end (exclusive). If regexp does not match any part of str between start and end, the empty list is returned. The default for start is 0; the default for end is the length of the string.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-search-all email "Contact matthias@objecthub.net or foo@bar.org")
⟹ (((8 . 30)) ((34 . 45)))
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-search-all series "New Season 3 Episode 12: Pilot")
⟹ (((4 . 23) (11 . 12) (21 . 23)))
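
Because each element of the result is a complete matching spec, capture groups can be read per match. The sketch below uses a made-up key=value pattern and input string and turns every match into a pair of key and number.

```scheme
(import (lispkit base) (lispkit regexp))

;; Illustrative pattern: a word, an equals sign, and a run of digits.
(define setting (regexp "(\\w+)=(\\d+)"))
(define input "a=1, b=22")

;; For each matching spec, skip the full-match pair at index 0 and
;; read the positions of the two capture groups.
(map (lambda (spec)
       (let ((key (list-ref spec 1))
             (val (list-ref spec 2)))
         (cons (substring input (car key) (cdr key))
               (string->number (substring input (car val) (cdr val))))))
     (regexp-search-all setting input))
;; ⟹ (("a" . 1) ("b" . 22))
```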

Returns a list of substrings from str which all represent full matches of the regular expression regexp with parts of string str between position start (inclusive) and end (exclusive). If regexp does not match any part of str between start and end, the empty list is returned. The default for start is 0; the default for end is the length of the string.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-extract email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("tthias@objecthub.net" "foo@bar.org")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-extract series "New Season 3 Episode 12: Pilot")
⟹ ("Season 3 Episode 12")

Splits string str into a list of possibly empty substrings separated by non-empty matches of regular expression regexp within position start (inclusive) and end (exclusive). If regexp does not match any part of str between start and end, a list with str as its only element is returned. The default for start is 0; the default for end is the length of the string.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-split email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("Contact ma" " or " "")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-split series "New Season 3 Episode 12: Pilot")
⟹ ("New " ": Pilot")

Partitions string str into a list of non-empty strings matching regular expression regexp within position start (inclusive) and end (exclusive), interspersed with the unmatched portions of the whole string. The first and every odd element is an unmatched substring, which will be the empty string if regexp matches at the beginning of the string or end of the previous match. The second and every even element will be a substring fully matching regexp. If str is the empty string or if there is no match at all, the result is a list with str as its only element.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-partition email "Contact matthias@objecthub.net or foo@bar.org" 10)
⟹ ("Contact ma" "tthias@objecthub.net" " or " "foo@bar.org" "")
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-partition series "New Season 3 Episode 12: Pilot")
⟹ ("New " "Season 3 Episode 12" ": Pilot")
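
In the examples of this section, the odd elements of a partition coincide with the result of regexp-split and the even elements with the result of regexp-extract. The following sketch separates the two; every-other is a hypothetical helper written for this illustration.

```scheme
(import (lispkit base) (lispkit regexp))

;; Hypothetical helper: keep every second element of a list,
;; starting with the first one.
(define (every-other lst)
  (cond ((null? lst) '())
        ((null? (cdr lst)) (list (car lst)))
        (else (cons (car lst) (every-other (cddr lst))))))

(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(define title "New Season 3 Episode 12: Pilot")
(define parts (regexp-partition series title))

;; Unmatched pieces (first, third, ... element)
(every-other parts)
;; ⟹ ("New " ": Pilot")

;; Matched pieces (second, fourth, ... element)
(every-other (cdr parts))
;; ⟹ ("Season 3 Episode 12")
```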

Returns a new string replacing all matches of regular expression regexp in string str within position start (inclusive) and end (exclusive) with string subst. regexp-replace will always return a new string, even if there are no matches and replacements.

The optional parameters start and end restrict both the matching and the substitution to the given positions, such that the result is equivalent to omitting these parameters and replacing on (substring str start end).

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(regexp-replace email "Contact matthias@objecthub.net or foo@bar.org" "<omitted>" 10)
⟹ "Contact ma<omitted> or <omitted>"
(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))
(regexp-replace series "New Season 3 Episode 12: Pilot" "Series")
⟹ "New Series: Pilot"
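
The sketch below is an assumption rather than documented behavior: it supposes that subst is interpreted as a replacement template in the NSRegularExpression sense, where $1 and $2 refer to the first and second capture group. If that holds, the season and episode numbers are carried over into the replacement; otherwise the text S$1E$2 is inserted literally.

```scheme
(import (lispkit base) (lispkit regexp))

(define series
  (regexp "Season\\s+(\\d+)\\s+Episode\\s+(\\d+)"))

;; Assumption: $1 and $2 in subst refer to the capture groups of the match.
(define short-title
  (regexp-replace series "New Season 3 Episode 12: Pilot" "S$1E$2"))

(display short-title)
(newline)
```

A literal dollar sign in a replacement string would then need escaping, which is presumably what the template escaping procedure described earlier is for.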

Mutates string str by replacing all matches of regular expression regexp within position start (inclusive) and end (exclusive) with string subst. The optional parameters start and end restrict both the matching and the substitution. regexp-replace! returns the number of replacements that were applied.

(define email
  (regexp "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"))
(define str "Contact matthias@objecthub.net or foo@bar.org")
(regexp-replace! email str "<omitted>" 10) ⟹ 2
str ⟹ "Contact ma<omitted> or <omitted>"

regexp-fold is the most fundamental and generic regular expression matching iterator. It repeatedly searches string str for the regular expression regexp for as long as a match can be found. On each successful match, it applies (kons i regexp-match str acc), where i is the index at which the search for that match started (the end of the previous match, or start for the first match), regexp-match is the resulting matching spec, and acc is the result of the previous kons application, beginning with knil. When no more matches can be found, regexp-fold calls finish with the same arguments, except that regexp-match is #f. By default, finish just returns acc.

(regexp-fold (regexp "(\\w+)")
             (lambda (i m str acc)
               (let ((s (substring str (caar m) (cdar m))))
                 (if (zero? i) s (string-append acc "-" s))))
             ""
             "to  be  or  not  to  be")
⟹ "to-be-or-not-to-be"
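
Two further sketches of the same iteration scheme, using only the four required arguments shown above: one counts the matches, the other collects the matched substrings. The pattern and the variable names are illustrative.

```scheme
(import (lispkit base) (lispkit regexp))

(define word (regexp "\\w+"))
(define text "to  be  or  not  to  be")

;; Count the matches: ignore i, m and str, and increment the accumulator.
(regexp-fold word
             (lambda (i m str acc) (+ acc 1))
             0
             text)
;; ⟹ 6

;; Collect the matched words. kons prepends each word, so the
;; accumulated list is reversed once the fold is done.
(reverse
  (regexp-fold word
               (lambda (i m str acc)
                 (cons (substring str (caar m) (cdar m)) acc))
               '()
               text))
;; ⟹ ("to" "be" "or" "not" "to" "be")
```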
