(lispkit string)

Strings are sequences of characters. In LispKit, characters are UTF-16 code units. Strings are written as sequences of characters enclosed within quotation marks ("). Within a string literal, various escape sequences represent characters other than themselves. Escape sequences always start with a backslash \:

  • \a: alarm (U+0007)

  • \b: backspace (U+0008)

  • \t: character tabulation (U+0009)

  • \n: linefeed (U+000A)

  • \r: return (U+000D)

  • \": double quote (U+0022)

  • \\: backslash (U+005C)

  • \|: vertical line (U+007C)

  • \line-end: used for encoding multi-line string literals

  • \xhex-scalar-value;: specified character

The result is unspecified if any other character in a string occurs after a backslash.

Except for a line ending, any character outside of an escape sequence stands for itself in the string literal. A line ending which is preceded by a backslash expands to nothing and can be used to encode multi-line string literals.

(display "The word \"recursion\" has many meanings.")  ⇒
The word "recursion" has many meanings.
(display "Another example:\ntwo lines of text.")  ⇒
Another example:
two lines of text.
(display "\x03B1; is named GREEK SMALL LETTER ALPHA.")  ⇒
α is named GREEK SMALL LETTER ALPHA.
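
As a further small sketch of the hex escape and the backslash line ending described above (the variable names are only illustrative, and the code assumes it is entered at a LispKit REPL):

```scheme
;; \x41; is the hex escape for the character A (U+0041).
(define hex-escaped "\x41;bc")

;; A backslash directly before the line ending expands to nothing,
;; so the literal continues on the next source line.
(define joined "one, \
two")

(display hex-escaped) (newline)   ; prints: Abc
(display joined) (newline)        ; prints: one, two
```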

The length of a string is the number of characters, i.e. UTF-16 code units, that it contains. This number is an exact, non-negative integer that is fixed when the string is created. The valid indexes of a string are the exact non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.

Some of the procedures that operate on strings ignore the difference between upper and lower case. The names of the versions that ignore case end with -ci (for “case insensitive”).

LispKit only supports mutable strings.

Basic constructors and procedures

The make-string procedure returns a newly allocated string of length k. If char is given, then all the characters of the string are initialized to char, otherwise the contents of the string are unspecified.

The string procedure returns a newly allocated string composed of its character arguments. It is analogous to procedure list.

The list->string procedure returns a newly allocated string composed of the characters contained in list.

The string-ref procedure returns character k of string str using zero-origin indexing. It is an error if k is not a valid index of string str.

The string-set! procedure stores char in element k of string str. It is an error if k is not a valid index of string str.

The string-length procedure returns the number of characters in the given string str.
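
The constructors and accessors above can be combined as in the following minimal sketch. The variable names are illustrative only; the mutated string is created with make-string so that it is certainly a fresh, mutable string.

```scheme
(define s (make-string 3 #\a))              ; s is "aaa"
(string-set! s 1 #\b)                       ; s is now "aba"
(define middle (string-ref s 1))            ; #\b (zero-origin index 1)
(define len (string-length s))              ; 3
(define abc (string #\a #\b #\c))           ; "abc"
(define xy (list->string (list #\x #\y)))   ; "xy"
```

Valid indexes for s are 0, 1, and 2; passing 3 to string-ref or string-set! would be an error.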

Predicates

Returns #t if obj is a string; otherwise returns #f.

Returns #t if str is an empty string, i.e. a string of length 0. Otherwise, string-empty? returns #f.

Returns #t if all the strings have the same length and contain exactly the same characters in the same positions; otherwise string=? returns #f.

The procedures string<?, string>?, string<=?, and string>=? return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing. These predicates are transitive.

These procedures compare strings in a lexicographic fashion; i.e. string<? implements the lexicographic ordering on strings induced by the ordering char<? on characters. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be lexicographically less than the longer string.

A pair of strings satisfies exactly one of string<?, string=?, and string>?. A pair of strings satisfies string<=? if and only if they do not satisfy string>?. A pair of strings satisfies string>=? if and only if they do not satisfy string<?.
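
A few illustrative calls, restricted to plain ASCII strings, show how the ordering rules above play out:

```scheme
(string=? "abc" "abc" "abc")   ; ⇒ #t
(string=? "abc" "abd")         ; ⇒ #f
(string<? "abc" "abd")         ; ⇒ #t
(string<? "ab" "abc")          ; ⇒ #t, "ab" is a proper prefix of "abc"
(string<? "a" "b" "c")         ; ⇒ #t, monotonically increasing
(string>=? "b" "b" "a")        ; ⇒ #t, monotonically non-increasing
(string<=? "b" "a")            ; ⇒ #f, because (string>? "b" "a") holds
```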

Returns #t if, after case-folding, all the strings have the same length and contain the same characters in the same positions; otherwise string-ci=? returns #f.

These procedures compare strings in a case-insensitive fashion. The "-ci" procedures behave as if they applied string-foldcase to their arguments before invoking the corresponding procedures without "-ci".
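
The difference between the case-sensitive and the case-insensitive predicates can be sketched as follows, again using ASCII-only strings:

```scheme
(string=? "LispKit" "LISPKIT")      ; ⇒ #f
(string-ci=? "LispKit" "LISPKIT")   ; ⇒ #t

;; #\a sorts after #\B, so the case-sensitive comparison fails ...
(string<? "apple" "BANANA")         ; ⇒ #f
;; ... while the case-insensitive one compares "apple" with "banana".
(string-ci<? "apple" "BANANA")      ; ⇒ #t

;; Explicit folding followed by string=? gives the same answer as string-ci=?.
(string=? (string-foldcase "LispKit") (string-foldcase "LISPKIT"))   ; ⇒ #t
```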

Returns #t if string str contains string sub; returns #f otherwise.

Returns #t if string str has string sub as a prefix; returns #f otherwise.

Returns #t if string str has string sub as a suffix; returns #f otherwise.

Composing and extracting strings

Many of the following procedures accept optional start and end arguments as their last two arguments. If start is omitted, it defaults to 0; if end is omitted, it defaults to the length of the corresponding string.

This procedure checks whether string sub is contained in string str within the index range start to end. It returns the first index into str at which sub is fully contained within start and end. If sub is not contained in the substring of str, then #f is returned.

The substring procedure returns a newly allocated string formed from the characters of string str beginning with index start and ending with index end. This is equivalent to calling string-copy with the same arguments, but is provided for backward compatibility and stylistic flexibility.

Returns a newly allocated string whose characters are the concatenation of the characters in the given strings str ....
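
The following sketch extracts and concatenates strings using substring, string-copy, and string-append. The variable names are illustrative. Because the results are newly allocated, mutating one of them leaves the source string untouched.

```scheme
(define greeting "hello world")
(define tail (substring greeting 6 11))          ; "world"
(define same (string-copy greeting 6 11))        ; "world", equivalent call
(define joined (string-append "foo" "-" "bar"))  ; "foo-bar"

;; tail is a fresh string, so this does not affect greeting.
(string-set! tail 0 #\W)                         ; tail is now "World"
```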

Returns a newly allocated string whose characters are the concatenation of the characters in the strings contained in list. sep is either a character or string, which, if provided, is used as a separator between two strings that get concatenated. It is an error if list is not a proper list containing only strings as elements.

These procedures apply the Unicode full string uppercasing, lowercasing, titlecasing, and case-folding algorithms to their argument string str and return the result as a newly allocated string. It is not guaranteed that the resulting string has the same length as str. Language-sensitive string mappings and foldings are not used.
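
A short sketch of the case conversions, using the procedure names string-upcase, string-downcase, and string-foldcase. The last definition assumes that full Unicode uppercasing is applied as stated above, in which case ß expands to SS and the result is one character longer than the input.

```scheme
(define upper (string-upcase "LispKit"))     ; "LISPKIT"
(define lower (string-downcase "LispKit"))   ; "lispkit"
(define folded (string-foldcase "LispKit"))  ; "lispkit"

;; Full uppercasing can change the length: 6 characters in, 7 out.
(define street (string-upcase "Straße"))     ; "STRASSE"
```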

Procedure string-normalize-diacritics transforms the given string str by normalizing diacritics and returning the result as a newly allocated string.

(string-normalize-diacritics "Meet Chloë at São Paulo Café")
"Meet Chloe at Sao Paulo Cafe"

Procedure string-normalize-separators normalizes string str by replacing sequences of separation characters from character set cset with string or character sep. If sep is not provided, " " is used as a default. If cset is not provided, all Unicode newline and whitespace characters are used as a default for cset. cset is either a string of separation characters or a character set as defined by library (lispkit char-set).

Procedure string-encode-named-chars returns a new string in which characters of string str are replaced with their corresponding named XML entities. If parameter required-only? is #f, all characters with corresponding named XML entities are replaced; otherwise, only the required characters are replaced.

(string-encode-named-chars "<one> & two = 3")
"&LT;one&gt; &AMP; two &equals; 3"
(string-encode-named-chars "<one> & two = 3" #t)
"&lt;one&gt; &amp; two = 3"

Procedure string-decode-named-chars returns a new string, replacing named XML entities with their corresponding character.

(string-decode-named-chars "2&Hat;&lcub;3&rcub; &equals; 8")
"2^{3} = 8"

Returns a newly allocated copy of the part of the given string str between start and end. The default for start is 0, for end it is the length of str. Calling string-copy is equivalent to calling substring with the same arguments. substring is provided primarily for backward compatibility.

Procedure string-split splits string str using the separator sep and returns a list of the component strings, in order. sep is either a string or a character. Boolean argument allow-empty? determines whether empty component strings are dropped. allow-empty? is #t by default.

(string-split "name-|-street-|-zip-|-city-|-" "-|-")  ⇒  ("name" "street" "zip" "city" "")
(string-split "name-|-street-|-zip-|-city-|-" "-|-" #f)  ⇒  ("name" "street" "zip" "city")

Returns a newly allocated string by removing all characters from the beginning and end of string str that are contained in chars. chars is either a string or a character set. If chars is not provided, whitespace and newline characters are removed.

(string-trim "  lispkit is fun ")                             ⇒  "lispkit is fun"
(string-trim "________" "_")                                  ⇒  ""
(string-trim "712+72=784" (char-set->string char-set:digit))  ⇒  "+72="
(string-trim "712+72=784" char-set:digit)                     ⇒  "+72="

Procedure string-pad-right returns a newly allocated string created by padding string str at the end of the string with character char until it is of length k. If k is less than the length of string str, the resulting string gets truncated at length k if boolean argument force-length? is #t; otherwise, the string str gets returned as is.

(string-pad-right "scheme" #\space 8)    ⇒  "scheme  "
(string-pad-right "scheme" #\x 4)        ⇒  "scheme"
(string-pad-right "scheme" #\x 4 #t)     ⇒  "sche"
(string-pad-right "scheme" "_" 10)       ⇒  "scheme____"

Procedure string-pad-left returns a newly allocated string created by padding string str at the beginning of the string with character char until it is of length k. If k is less than the length of string str, the resulting string gets truncated at length k if boolean argument force-length? is #t; otherwise, the string str gets returned as is.

(string-pad-left "scheme" #\space 8)    ⇒  "  scheme"
(string-pad-left "scheme" #\x 4)        ⇒  "scheme"
(string-pad-left "scheme" #\x 4 #t)     ⇒  "heme"
(string-pad-left "scheme" "_" 10)       ⇒  "____scheme"

Procedure string-pad-center returns a newly allocated string created by padding string str at the beginning and end with character char until it is of length k, such that str is centered in the middle. If k is less than the length of string str, the resulting string gets truncated at length k if boolean argument force-length? is #t; otherwise, the string str gets returned as is.

(string-pad-center "scheme" #\space 8)  ⇒  " scheme "
(string-pad-center "scheme" #\x 4)      ⇒  "scheme"
(string-pad-center "scheme" #\x 4 #t)   ⇒  "heme"
(string-pad-center "scheme" "_" 10)     ⇒  "__scheme__"

Manipulating strings

Replaces all occurrences of string sub in string str between indices start and end with string repl and returns the number of occurrences of sub that were replaced.

Replaces the first occurrence of string sub in string str between indices start and end with string repl and returns the index at which the first occurrence of sub was replaced.

Replaces the part of string str between indices start and end with string repl. The default for start is 0; the default for end is start (i.e. if not provided, end is equal to start). If neither start nor end is provided, string-insert! inserts repl at the beginning of str. If start is provided alone (without end), string-insert! inserts repl at position start.

(define s "Zenger is my name")
(string-insert! s "Matthias ")
s  ⇒  "Matthias Zenger is my name"
(string-insert! s "has always been" 16 18)
s  ⇒  "Matthias Zenger has always been my name"

Appends the strings other, ... to mutable string str in the given order.

Copies the characters of string from between index start and end to string to, starting at index at. If the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. It is an error if at is less than zero or greater than the length of string to. It is also an error if (- (string-length to) at) is less than (- end start).
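
A minimal sketch of this copying procedure, assuming the argument order (string-copy! to at from start end) implied by the description above. The variable names are illustrative, and the strings are created with string-copy so that they are fresh.

```scheme
(define a (string-copy "12345"))
(define b (string-copy "abcde"))

;; Copy characters 0..2 of a ("12") into b, starting at index 1.
(string-copy! b 1 a 0 2)      ; b is now "a12de"

;; Overlapping source and destination: the source range "abcd" is
;; read as if it were copied to a temporary string first.
(define s (string-copy "abcdef"))
(string-copy! s 2 s 0 4)      ; s is now "ababcd"
```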

The string-fill! procedure stores fill in the elements of string str between index start and end. It is an error if fill is not a character.
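
For instance, the following sketch fills first a sub-range and then the whole string (the variable name is illustrative):

```scheme
(define line (make-string 5 #\-))   ; "-----"
(string-fill! line #\* 1 3)         ; line is now "-**--"
(define partial (string-copy line)) ; keep a snapshot of the partial fill
(string-fill! line #\.)             ; line is now "....."
```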

Iterating over strings

The string-map procedure applies procedure proc element-wise to the characters of the strings str ... and returns a string of the results, in order. If more than one string str is given and not all strings have the same length, string-map terminates when the shortest string runs out. It is an error if proc does not accept as many arguments as there are strings and returns a single character.

(string-map char-foldcase "AbdEgH")  ⇒  "abdegh"
(string-map (lambda (c) (integer->char (+ 1 (char->integer c)))) "HAL")  ⇒  "IBM"

The arguments to string-for-each are like the arguments to string-map, but string-for-each calls proc for its side effects rather than for its values. Unlike string-map, string-for-each is guaranteed to call proc on the characters of the strings in order from the first character to the last. If more than one string str is given and not all strings have the same length, string-for-each terminates when the shortest string runs out. It is an error for proc to mutate any of the strings. It is an error if proc does not accept as many arguments as there are strings.
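
The following sketch shows string-map with two strings of different lengths and string-for-each used purely for its side effects. The variable names are illustrative.

```scheme
;; With two strings, proc receives one character from each. Mapping stops
;; after three characters because "adx" is the shorter string.
(define smaller
  (string-map (lambda (a b) (if (char<? a b) a b)) "adx" "bcyz"))   ; "acx"

;; string-for-each visits the characters from first to last, so the code
;; values are accumulated in reverse order and flipped at the end.
(define codes '())
(string-for-each
  (lambda (c) (set! codes (cons (char->integer c) codes)))
  "abc")
(define ordered-codes (reverse codes))   ; (97 98 99)
```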

Converting strings

The string->list procedure returns a list of the characters of string str between start and end preserving the order of the characters.
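
A brief sketch of string->list with and without the optional range arguments, followed by a round trip through list->string (the variable names are illustrative):

```scheme
(define all (string->list "apple"))         ; (#\a #\p #\p #\l #\e)
(define from-2 (string->list "apple" 2))    ; (#\p #\l #\e)
(define range (string->list "apple" 1 3))   ; (#\p #\p)

;; Reversing the character list and converting back yields a reversed string.
(define reversed (list->string (reverse (string->list "abc"))))   ; "cba"
```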

Input/Output

The read-file procedure reads the text file at path and returns its content as a newly allocated string.

Writes the characters of string str into a new text file at path. write-file returns #t if the file could be written successfully; otherwise #f is returned.
