(lispkit format)

Library (lispkit format) provides an implementation of Common Lisp's format procedure for LispKit. Procedure format creates formatted text from a format string, similar to printf. The formatting formalism, though, is significantly more expressive: it allows users to display numbers in various formats (e.g. hex, binary, octal, roman numerals, natural language), apply conditional formatting, output text in a tabular format, iterate over data structures, and even apply format recursively to handle data that includes its own preferred formatting strings.

Usage overview

In its simplest form, procedure format gets invoked with a control string followed by an arbitrary number of arguments. The control string consists of characters that are copied verbatim into the output as well as formatting directives. All formatting directives start with a tilde (~) and end with a single character identifying the type of the directive. Directives may also take prefix parameters, written immediately after the tilde character and separated by commas, as well as modifiers (see below for details).

For example, the call of format below injects two integer arguments into the control string via directive ~D and returns the resulting string:

(format "There are ~D warnings and ~D errors." 12 7)
⇒ "There are 12 warnings and 7 errors."

Simple Directives

Here is a simple control string which injects a readable description of an argument via the directive ~A: "I received ~A as a response". Directive ~A refers to the next argument provided to format when compiling the formatted output:

(format "I received ~A as a response" "nothing")
⇒ "I received nothing as a response"
(format "I received ~A as a response" "a long email")
⇒ "I received a long email as a response"

Directive ~A may be given parameters to influence the formatted output. The first parameter of ~A-directives defines the minimal length. If the length of the textual representation of the next argument is smaller than the minimal length, padding characters are inserted:

(format "|Name: ~10A|Location: ~13A|" "Smith" "New York")
⇒ "|Name: Smith     |Location: New York     |"
(format "|Name: ~10A|Location: ~13A|" "Williams" "San Francisco")
⇒ "|Name: Williams  |Location: San Francisco|"
(format "|Name: ~10,,,'_@A|Location: ~13,,,'-A|" "Garcia" "Los Angeles")
⇒ "|Name: ____Garcia|Location: Los Angeles--|"

The third example above utilizes more than one parameter and, in one case, includes an @ modifier. The directive ~13,,,'-A defines the first and the fourth parameter. The second and third parameters are omitted and thus defaults are used. The fourth parameter defines the padding character. If character literals are used in the parameter list, they are prefixed with a quote '. The directive ~10,,,'_@A includes an @ modifier which will result in padding of the output on the left.
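
The effect of each piece can be seen in isolation. The following sketch reuses the directives described above on a single argument; the expected results are derived from the padding rules stated in this section and assume the default padding character is a space.

```scheme
(import (lispkit base) (lispkit format))

;; First parameter only: pad on the right with spaces up to 10 characters.
(format "|~10A|" "Smith")        ; ⇒ "|Smith     |"

;; @ modifier: pad on the left instead.
(format "|~10@A|" "Smith")       ; ⇒ "|     Smith|"

;; Fourth parameter: use the character literal '. for padding.
(format "|~10,,,'.A|" "Smith")   ; ⇒ "|Smith.....|"
```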

It is possible to inject a parameter from the list of arguments. The following examples show how parameter v is used to do this for formatting a floating-point number with a configurable number of fractional digits.

(format "length = ~,vF" 2 pi)
⇒ "length = 3.14"
(format "length = ~,vF" 4 pi)
⇒ "length = 3.1416"

Here v is used as the second parameter of the fixed floating-point directive ~F, indicating the number of fractional digits. It refers to the next provided argument (which is either 2 or 4 in the examples above).
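
The same mechanism should apply to other directives that take numeric parameters. The sketch below assumes that v is accepted as the first parameter of ~A, in the same way it is accepted by ~F; pad-right is a hypothetical helper introduced only for illustration.

```scheme
(import (lispkit base) (lispkit format))

;; Hypothetical helper: left-align str in a column whose width is
;; chosen at run time. v consumes width, then ~A consumes str.
(define (pad-right width str)
  (format "|~vA|" width str))

(pad-right 8 "abc")   ; ⇒ "|abc     |"
(pad-right 3 "abc")   ; ⇒ "|abc|"
```

Note that the argument supplying the parameter value must precede the argument that is being formatted.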

Composite Directives

The next example shows how one can refer to the total number of arguments that are not yet consumed in the formatting process by using # as a parameter value.

(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments" "eins" 2)
⇒ "Arguments left for formatting: two."
(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments")
⇒ "Arguments left for formatting: none."
(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments" "eins" 2 "drei" "vier")
⇒ "Arguments left for formatting: many."

In these examples, the conditional directive ~[ is used. It is followed by clauses separated by directive ~; until ~] is reached. Thus, there are four clauses in the example above: none, one, two, and many. The parameter in front of the ~[ directive determines which of the clauses is being output. All other clauses will be discarded. For instance, ~1[zero~;one~;two~:;many~] will output one as clause 1 is chosen (which is the second one, given that numbering starts with zero). The last clause is special because it is introduced by the ~; directive with a : modifier (~:;): this is a default clause which is chosen when none of the others are applicable. Thus, ~8[zero~;one~;two~:;many~] outputs many. This also explains how the example above works: here # refers to the number of arguments that are still available and this number drives what is being returned in this directive: ~#[...~].
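
The two inline control strings above can be tried directly. The last definition is a sketch only: count-word is a hypothetical helper, and it assumes that the v parameter may be used in front of ~[ to pick the clause from an argument.

```scheme
(import (lispkit base) (lispkit format))

;; A literal prefix parameter selects the clause; no arguments are consumed.
(format "~1[zero~;one~;two~:;many~]")   ; ⇒ "one"
(format "~8[zero~;one~;two~:;many~]")   ; ⇒ "many" (default clause)

;; Hypothetical helper: select the clause via an argument (assumes v works here).
(define (count-word n)
  (format "~v[zero~;one~;two~:;many~]" n))

(count-word 0)
(count-word 2)
```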

Another powerful composite directive is the iteration directive ~{. With this directive it is possible to iterate over all elements of a sequence. The control string between ~{ and ~} gets repeated as long as there are still elements left in the sequence which is provided as an argument. For instance, Numbers:~{ ~A~} applied to argument ("one" "two" "three") results in the output Numbers: one two three. The control string between ~{ and ~} can also consume more than one element of the sequence. Thus, Numbers:~{ ~A=>~A~} applied to argument ("one" 1 "two" 2) outputs Numbers: one=>1 two=>2.
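
Written out as calls to format, the two iteration examples from this paragraph look as follows. The sequences are passed as quoted lists.

```scheme
(import (lispkit base) (lispkit format))

;; The body " ~A" is repeated once per element.
(format "Numbers:~{ ~A~}" '("one" "two" "three"))
;; ⇒ "Numbers: one two three"

;; The body " ~A=>~A" consumes two elements per iteration.
(format "Numbers:~{ ~A=>~A~}" '("one" 1 "two" 2))
;; ⇒ "Numbers: one=>1 two=>2"
```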

Of course, it is also possible to nest arbitrary composite directives. Here is an example for a control string that uses a combination of iteration and conditional directives to output the elements of a sequence separated by a comma: (~{~#[~;~A~:;~A, ~]~}). When this control string is used with the argument ("one" "two" "three"), the following formatted output is generated: (one, two, three).
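
The nested control string can be wrapped in a small procedure. comma-list is a hypothetical helper name used only for this sketch. Inside the iteration, # counts the elements that are still left in the sequence: as long as more than one element remains, the default clause prints the element followed by a comma; the last element is printed by clause 1 without a separator.

```scheme
(import (lispkit base) (lispkit format))

;; Hypothetical helper: join the elements of a list with ", " in parentheses.
(define (comma-list items)
  (format "(~{~#[~;~A~:;~A, ~]~})" items))

(comma-list '("one" "two" "three"))   ; ⇒ "(one, two, three)"
(comma-list '("solo"))                ; ⇒ "(solo)"
```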

Formatting language

Control strings consist of characters that are copied verbatim into the output as well as formatting directives. All formatting directives start with a tilde (~) and end with a single character identifying the type of the directive. Directives may take prefix parameters, written immediately after the tilde character and separated by commas. Both integers and characters are allowed as parameters. They may be followed by formatting modifiers :, @, and +. This is the general format of a formatting directive:

~param1,param2,...mX

where

  • m is a potentially empty modifier, consisting of an arbitrary sequence of modifier characters :, @, and +

  • X is a character identifying a directive type

  • paramN is either a numeric or character parameter according to the specification below.
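
As an illustration of this general format, the comments in the following sketch split two directives used earlier in this document into their parameters, modifiers, and directive type. The literal 3.14159 stands in for pi so that the example does not depend on further definitions.

```scheme
(import (lispkit base) (lispkit format))

;; ~10,,,'_@A
;;   parameters: 10, <empty>, <empty>, '_  (minimal length and padding character)
;;   modifiers:  @                         (pad on the left)
;;   type:       A
(format "|~10,,,'_@A|" "Garcia")   ; ⇒ "|____Garcia|"

;; ~,vF
;;   parameters: <empty>, v   (v is taken from the next argument)
;;   modifiers:  none
;;   type:       F
(format "~,vF" 2 3.14159)          ; ⇒ "3.14"
```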

The following grammar describes the syntax of directives formally in BNF:

<directive>  ::= "~" <modifiers> <char>
               | "~" <parameters> <modifiers> <char>
<modifiers>  ::= <empty>
               | ":" <modifiers>
               | "@" <modifiers>
               | "+" <modifiers>
<parameters> ::= <parameter>
               | <parameter> "," <parameters>
<parameter>  ::= <empty>
               | "#"
               | "v"
               | <number>
               | "-" <number>
               | <character>
<number>     ::= <digit>
               | <digit> <number>
<digit>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<character>  ::= "'" <char>

Formatting directives

The formatting directives supported by library (lispkit format) are based on the directives specified in Common Lisp the Language, 2nd Edition by Guy L. Steele Jr. Some directives have been extended to meet today's formatting requirements (e.g. to support localization) and to enable a powerful usage throughout LispKit. Extensions were introduced in a way to not impact backward compatibility.

Formatting configurations

A few formatting directives provided by procedure format require access to environment variables such as the locale, the width of tab characters, the length of lines, etc. Also the type-specific customization of the formatting of native and user-defined objects, e.g. via the ~S directive, is based on a formatting control registry defined by an environment variable.

All relevant environment variables are bundled together into format config objects. Format configurations are organized hierarchically. Each format configuration optionally refers to a parent configuration. It inherits all environment variables and allows their values to be overridden.

base-format-config constitutes the root of this format configuration hierarchy. Typically, changes to this object impact all invocations of format, unless format is called with a custom format config object which is not derived from base-format-config. Without a custom format config, format reads the environment variables from the current format config parameter current-format-config (which, by default, inherits from base-format-config). As with every other parameter object, a new config can be installed dynamically via parameterize.

Format config objects are also used in combination with type-specific formatting as provided by the ~S directive, as explained in the next section.

Type-specific formatting

Procedure format provides great means to format numbers, characters, strings, as well as sequences, i.e. lists and vectors. But as soon as values of data types encapsulating their state have to be output, only the default textual representation is supported, which is also used when a value is output via procedure write.

For this reason, procedure format supports the customization of how composite objects are formatted. The approach for doing this is simple: Internally, a composite object can be mapped ("unpacked") into a vector of "field values". These field values are then interpreted as arguments for an object type-specific control string which defines how the field values of such objects are formatted. If there is no object type-specific control string available, the object is output as if it was written via procedure write.

The following example shows how to customize the formatting of objects defined by a record type. The following record is used to model colored 2-dimensional points:

(define-record-type <point>
  (make-point x y c)
  point?
  (x point-x)
  (y point-y)
  (c point-color))

By default, objects of type <point> are output in the following way:

(define pt (make-point 7 13 (color 0.5 0.9 0)))
(format "~S" pt)
 ⇒ "#<record <point>: x=7, y=13, c=#<color 0.5 0.9 0.0>>"

LispKit defines a type tag for every type. This type tag will later be used to define a custom format for records of type <point>. We can retrieve the type tag for type <point> via procedure record-type-tag:

(define point-type-tag (record-type-tag <point>))

Now we can define a custom format for objects of type <point> in which we refer to the unpacked fields in the order defined in the <point> record type definition. The fields follow a fixnum value denoting the identity of the record. The following control string formats <point> records as point{x=?,y=?,c=?}. Note that it skips the record identity via the ~* directive.

"point{x=~*~S,y=~S,c=~S}"

format refers to a number of environment variables via a formatting configuration (see previous section). The default configuration is defined by base-format-config and it includes custom type-specific formats. With procedure format-config-control-set! we can declare that all objects of type <point> should be formatted with the control string shown above:

(format-config-control-set!
  base-format-config
  point-type-tag
  "point{x=~*~S,y=~S,c=~S}")

Formatting records of type <point> via the ~S directive is now based on this new control string.

(format "~S" pt)
 ⇒ "point{x=7,y=13,c=#<color 0.5 0.9 0.0>}"

If we wanted to also change how colors are formatted, we could do that in a similar way:

(format-config-control-set!
  base-format-config
  color-type-tag
  "color{~S, ~S, ~S}")

Now colors are formatted differently:

(format "~S" pt)  ⇒ "point{x=7,y=13,c=#<color 0.5 0.9 0.0>}"
(format "~S" (color 1.0 0.3 0.7))  ⇒ "color{1.0, 0.3, 0.7}"

If we wanted to change how colors are formatted only in the context of formatting points, we could do that by creating a formatting configuration for colors and associating it only with the formatting control string for points. The following code first removes the global color format so that colors are formatted again using the default mechanism. Then it redefines the formatting control for points by also specifying a format configuration that is used while applying the point formatting control string.

(format-config-control-remove! base-format-config color-type-tag)
(format-config-control-set!
  base-format-config
  point-type-tag 
  "point{x=~*~S,y=~S,c=~S}"
  (format-config (list color-type-tag "color{~S, ~S, ~S}")))
(format "~S" (color 1.0 0.3 0.7))  ⇒ "#<color 1.0 0.3 0.7>"
(format "~S" pt)  ⇒ "point{x=7,y=13,c=color{0.5, 0.9, 0.0}}"
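
All of the examples above mutate base-format-config. Since current-format-config is a parameter object, a custom format can also be scoped to a block of code via parameterize without touching the base configuration. The following is a minimal sketch under a few assumptions: rgb-config and its control string are hypothetical, color and color-type-tag are assumed to be provided by library (lispkit draw), and no global color format is registered.

```scheme
(import (lispkit base) (lispkit format) (lispkit draw))

;; Hypothetical configuration; its parent defaults to current-format-config.
(define rgb-config
  (format-config (list color-type-tag "rgb(~S, ~S, ~S)")))

;; Inside the parameterize body, ~S uses the control string of rgb-config.
(parameterize ((current-format-config rgb-config))
  (format "~S" (color 1.0 0.3 0.7)))   ; expected: "rgb(1.0, 0.3, 0.7)"

;; Outside, the previous configuration applies again.
(format "~S" (color 1.0 0.3 0.7))      ; expected: "#<color 1.0 0.3 0.7>"
```

This keeps experimental formats local and avoids having to undo changes with format-config-control-remove! afterwards.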

API

Symbol representing the format-config type. The type-for procedure of library (lispkit type) returns this symbol for all formatting configuration objects.

Formatting configurations can have parent configurations from which all formatting environment variables are being inherited. base-format-config is the root formatting configuration for repl-format-config and current-format-config.

The formatting configuration that a read-eval-print loop might use for displaying the result of an evaluation. Initially, repl-format-config is set to an empty formatting configuration with parent base-format-config.

Parameter object referring to the current formatting configuration that is used as a default whenever no specific formatting configuration is specified, e.g. by procedure format. Initially, current-format-config is set to an empty formatting configuration with parent base-format-config.

format is the universal formatting procedure provided by library (lispkit format). format creates formatted output by outputting the characters of the control string cntrl while interpreting formatting directives embedded in cntrl. Each formatting directive starts with a tilde, which may be followed by formatting parameters and modifiers. The next character identifies the formatting directive and thus determines what output is being generated by the directive. Most directives use one or more arguments arg as input.

Formatting configuration config defines environment variables influencing the output of some formatting directives. If config is not provided, the formatting configuration from parameter object current-format-config is used. For convenience, some environment variables, such as locale, can be overridden if they are provided when format is being invoked. locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the number of characters per line; this is used by the justification directive only.

Returns #t if obj is a formatting configuration; otherwise #f is returned.

Creates a new formatting configuration with parent as parent configuration. If parent is not provided explicitly, current-format-config is used. If parent is #f, the new formatting configuration will not have a parent configuration. locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the maximum number of characters per line.

Creates a new formatting configuration with parent as parent configuration. If parent is #f, the new formatting configuration does not have a parent configuration. The remaining arguments define overrides for the environment variables inherited from parent.

locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the maximum number of characters per line.

Returns a copy of formatting configuration config. If collapse? is omitted or set to #f, a 1:1 copy of config is made. If collapse? is set to true, a new format config without parent configuration is created which contains the same values for the supported formatting environment variables as config.

Merges the format configurations child and parent by creating a new collapsed copy of child whose parent configuration is parent.

Returns the locale defined by format configuration config. If config defines a locale itself, it is being returned. Otherwise, the locale of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

Sets the locale of the format configuration config to locale. If locale is #f, the locale setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated.

Returns the width of a tab character in terms of space characters defined by format configuration config. If config defines a tab width itself, it is being returned. Otherwise, the tab width of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

Sets the tab width of the format configuration config to tabw. If tabw is #f, the tab width setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated. The "tab width" is the maximum number of space characters representing one tab character.

Returns the maximum number of characters per line defined by format configuration config. If config defines a line width itself, it is being returned. Otherwise, the line width of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

Sets the line width of the format configuration config to linew. If linew is #f, the line width setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated. The "line width" is the maximum number of characters per line.

Declares for formatting configuration config that objects whose type has type tag tag are being formatted with control string cntrl by formatting directive ~S. If formatting configuration sconf is provided, it is used as a type-specific configuration that is merged with the current configuration when ~S formats objects of type tag tag. If cntrl is #f, type-specific formatting rules for tag are being removed from config (but might still be inherited from the parent of config). If cntrl is #t, native formatting is being forced for tag, no matter what is inherited from the parent of config. If config is not provided, the default configuration current-format-config gets mutated.

Removes any type-specific formatting with directive ~S for objects whose type has tag tag from formatting configuration config. If config is not provided, the default configuration current-format-config gets mutated.

Returns a list of type tags, i.e. symbols, for which there is a type-specific formatting control string defined by formatting configuration config or its parents. If config is not provided, the default configuration current-format-config is used.

Returns the parent configuration of format configuration config. If config is not provided, the default configuration current-format-config is used. format-config-parent returns #f if config does not have a parent formatting configuration.
