(lispkit csv)

Library (lispkit csv) provides a simple API for reading structured data from, and writing it to, text files in CSV format. The API offers two levels of abstraction: reading and writing at

  1. line-level (lower-level API), and

  2. record-level (higher-level API).

A text file in CSV format typically has the following structure:

"First name", "Last name", "Birth date"
Steve, Jobs, 1955-02-24
Bill, Gates, "1955-10-28"
"Jeff", "Bezos", "1964-01-12"

The first line is called the header. It defines the names and the order of the columns. Columns are separated by a separator character (which is , in the example above). The column names can optionally be wrapped by a quotation character, which is needed if the name contains, for instance, the separator character.

Each following line defines one data record which provides values for the columns defined in the header. The values are again separated by the separator character and they may be optionally wrapped by the quotation character. If a value is wrapped with a quotation character, the same character can be used within the value if it is escaped. The quotation character can be escaped by a sequence of two quotation characters (e.g. if " is used as a quotation character, "" encodes a single " character within the string value).
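
The escaping rule can be illustrated with a minimal sketch. The helper quote-field below is hypothetical and not part of (lispkit csv); it only demonstrates how a value is wrapped in quotation characters and how embedded quotation characters are doubled.

```scheme
(import (scheme base))

;; Hypothetical helper, not part of (lispkit csv): wraps str in quote-char
;; and doubles every occurrence of quote-char inside str.
(define (quote-field str quote-char)
  (let loop ((chars (string->list str))
             (acc (list quote-char)))           ; opening quotation character
    (cond ((null? chars)
           ;; Append the closing quotation character and build the string.
           (list->string (reverse (cons quote-char acc))))
          ((char=? (car chars) quote-char)
           ;; Escape an embedded quotation character by emitting it twice.
           (loop (cdr chars) (cons quote-char (cons quote-char acc))))
          (else
           (loop (cdr chars) (cons (car chars) acc))))))

;; The value  say "hi"  becomes  "say ""hi"""
(define escaped (quote-field "say \"hi\"" #\"))
```

A value containing the separator, such as a,b, only needs the surrounding quotation characters; nothing inside it is changed.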

The client of the API decides how to handle inconsistencies between the lines, e.g. if lines have too few or too many values.

CSV ports

Both API levels operate on a CSV port, which bundles the underlying textual input/output port with the separator and quotation characters.

Returns #t if obj is a CSV port; returns #f otherwise.

Returns #t if obj is a CSV port for reading data; returns #f otherwise.

Returns #t if obj is a CSV port for writing data; returns #f otherwise.

Returns a new CSV port for reading or writing data via an underlying textual port tport. If tport is an output port, the CSV port can be used for writing. If tport is an input port, the CSV port can be used for reading. The default for tport is the current input port, current-input-port, exported from library (lispkit port).

The separation character used by the CSV port is sep, the quotation character is quote. The default for sep is #\, and for quote the default is #\".

Returns the textual port on which the CSV port csvp is based.

Returns the separation character used by the CSV port csvp.

Returns the quotation character used by the CSV port csvp.
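
As a minimal sketch of how a CSV port might be set up, the following wraps a string port and uses a semicolon as separator. The procedure names make-csv-port, csv-input-port?, csv-separator and csv-quotechar, as well as the argument order (tport, sep, quote), are assumptions: the descriptions above do not name these procedures, so they should be checked against the exports of (lispkit csv).

```scheme
(import (lispkit base) (lispkit csv))

;; A string port stands in for a text file in this sketch.
(define input (open-input-string "name;city\nAda;London"))

;; Assumed constructor name and argument order: tport, sep, quote.
(define csvp (make-csv-port input #\; #\"))

;; Assumed predicate and accessor names; an input port should yield
;; a CSV port for reading.
(display (csv-input-port? csvp))
(newline)
(display (csv-separator csvp))
(newline)
(display (csv-quotechar csvp))
(newline)
```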

Line-level API

The line-level API provides means to read a whole CSV file via csv-read and write data in CSV format via csv-write.

Reads from CSV port csvp: first the header, if readheader? is #t, and then all lines until the end of the input is reached. Procedure csv-read returns two values: the header line (a list of strings representing the column names) and a vector of all data lines, which are themselves lists of strings representing the individual field values. The default for readheader? is #t. If readheader? is #f, the first result of csv-read is always #f.

Writes the header (a list of strings representing the column names) to CSV port csvp unless header is #f. Procedure csv-write then writes each line of lines, a vector of lists representing the individual field values in string form.
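
The following sketch shows one way the line-level procedures might be used with string ports instead of files. It assumes that the CSV port constructor is exported as make-csv-port, and that the argument orders are (csv-read csvp) and (csv-write csvp header lines); the helper read-all is hypothetical.

```scheme
(import (lispkit base) (lispkit csv))

;; Hypothetical helper: reads a complete CSV text and collects the two
;; results of csv-read (header and data lines) into a two-element list.
(define (read-all text)
  (call-with-values
    (lambda () (csv-read (make-csv-port (open-input-string text))))
    list))

(define result (read-all "name,city\nAda,London\nAlan,Wilmslow"))
(define header (car result))   ; list of column names
(define lines (cadr result))   ; vector of lists of strings

;; Writing uses the same shapes in the other direction.
(define sink (open-output-string))
(csv-write (make-csv-port sink) header lines)
(display (get-output-string sink))
```

Because each data line is a plain list of strings, checks such as comparing the length of a line with the length of the header are left to the caller.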

Record-level API

The higher-level API has a notion of records. The default representation for records is the association list. The procedures for reading and writing records are csv-read-records and csv-write-records:

Reads from CSV port csvp: first the header and then all data lines until the end of the input is reached. Header names (strings) are mapped via procedure make-col into column identifiers or column factories (i.e. procedures which take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid). Procedure make-record then maps a list of column identifiers and column factories, together with the list of column values (strings) of a data line, into a record. Procedure csv-read-records returns a vector of records.

The default make-col procedure is make-symbol-column. The default make-record function is make-alist-record/excess.
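
A hedged sketch of reading records follows. It assumes the constructor name make-csv-port and the argument order (csv-read-records csvp make-col); typed-column is a hypothetical make-col procedure that turns the "age" column into a column factory and delegates every other header name to make-symbol-column.

```scheme
(import (lispkit base) (lispkit csv))

(define text "name,age\nAda,36\nAlan,41")

;; Defaults: symbols as column identifiers, association lists as records.
(define people
  (csv-read-records (make-csv-port (open-input-string text))))

;; Hypothetical make-col: "age" becomes a column factory that parses the
;; value as a number and returns #f for missing or malformed input.
(define (typed-column name)
  (if (string=? name "age")
      (lambda (value)
        (let ((n (and value (string->number value))))
          (and n (cons 'age n))))
      (make-symbol-column name)))

(define typed-people
  (csv-read-records (make-csv-port (open-input-string text)) typed-column))

(display (vector-ref people 0))
(newline)
(display (vector-ref typed-people 0))
(newline)
```

With the defaults every field stays a string; the factory variant is one way to get typed values into the records at read time.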

First writes the header to CSV port csvp by mapping header, which is a list of column identifiers, to a list of header names using procedure col->str. Then, csv-write-records writes all the records from the vector records by mapping each record to a data line. This is done by applying field->str to all column identifiers for the record. field->str takes two arguments: a column identifier and the record.

The default implementation for procedure col->str is symbol->string. The default implementation for procedure field->str is alist-field->string.
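
Writing records could look like the sketch below. The constructor name make-csv-port and the argument order (csv-write-records csvp header records) are assumptions; the conversion relies on the defaults symbol->string and alist-field->string named above.

```scheme
(import (lispkit base) (lispkit csv))

;; Records as association lists whose values are strings.
(define records
  (vector '((name . "Ada") (city . "London"))
          '((name . "Alan") (city . "Wilmslow"))))

(define sink (open-output-string))

;; The header lists the column identifiers in output order.
(csv-write-records (make-csv-port sink) '(name city) records)

(display (get-output-string sink))
```

The header list also determines which fields are written, so a record may carry additional entries that never reach the output.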

Returns a symbol representing the trimmed string str. If the trimmed string is empty, make-symbol-column returns #t. This procedure can be used for creating column identifiers out of column names in procedure csv-read-records.
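
A short illustration of the two cases, based only on the description above:

```scheme
(import (lispkit base) (lispkit csv))

;; Surrounding whitespace is trimmed before the symbol is created.
(define city-col (make-symbol-column "  city "))

;; Nothing is left after trimming, so the result is #t rather than a symbol.
(define blank-col (make-symbol-column "   "))

(display city-col)
(newline)
(display blank-col)
(newline)
```

One consequence worth keeping in mind: every unnamed column maps to the same identifier #t, so records built from a header with several empty names cannot tell those columns apart by key.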

Returns a new record given a list of column identifiers or column factories (i.e. procedures which take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid) cols, and a list of column values fields.

This procedure represents records as association lists, iterating through all cols and fields values. If there are more fields values than cols expressions, the excess values are skipped. If there are more cols expressions than fields values, #f is used as a default for missing fields values. If a cols expression is a procedure, the association entry gets created by calling the procedure with the corresponding fields value. For all other cols expression types, a pair is created with the cols expression being the car and the fields value being the cdr.

Returns a new record given a list of column identifiers or column factories (i.e. procedures which take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid) cols, and a list of column values fields.

This procedure represents records as association lists, iterating through all cols and fields values. If there are more fields values than cols expressions, then #f is used as a default cols expression. If there are more cols expressions than fields values, #f is used as a default for missing fields values. If a cols expression is a procedure, the association entry gets created by calling the procedure with the corresponding fields value. For all other cols expression types, a pair is created with the cols expression being the car and the fields value being the cdr.
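
The difference between the two record constructors described above can be modeled in plain Scheme. The procedure build-record below is a hypothetical sketch written from the prose, not the library's implementation. Its keep-excess? flag switches between skipping surplus field values and pairing them with #f. Dropping an entry when a column factory returns #f is an assumption of this sketch.

```scheme
(import (scheme base))

;; Hypothetical model of the behavior described above.
(define (build-record cols fields keep-excess?)
  (let loop ((cols cols) (fields fields) (acc '()))
    (cond
      ;; Both lists are exhausted: the record is complete.
      ((and (null? cols) (null? fields)) (reverse acc))
      ;; Surplus field values: skip them, or pair each with #f.
      ((null? cols)
       (if keep-excess?
           (loop '() (cdr fields) (cons (cons #f (car fields)) acc))
           (reverse acc)))
      (else
       (let* ((col (car cols))
              ;; A missing field value defaults to #f.
              (value (if (null? fields) #f (car fields)))
              (rest (if (null? fields) '() (cdr fields)))
              ;; A column factory creates the entry itself; any other
              ;; column expression becomes the car of a new pair.
              (entry (if (procedure? col) (col value) (cons col value))))
         ;; Assumption: an entry of #f from a factory is dropped.
         (loop (cdr cols) rest (if entry (cons entry acc) acc)))))))

(define skipped (build-record '(first last) '("Steve" "Jobs" "1955-02-24") #f))
(define kept    (build-record '(first last) '("Steve" "Jobs" "1955-02-24") #t))
(define padded  (build-record '(first last born) '("Steve") #f))
```

Here skipped holds two entries, kept additionally holds the pair (#f . "1955-02-24"), and padded holds #f as the value for last and born.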

Returns the column value of column col from association list record. alist-field->string assumes that record is an association list whose values are strings. This is how the procedure is defined:

(define (alist-field->string record column)
  (cdr (assq column record)))
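
Since assq returns #f for an unknown key, the definition above signals an error when a record lacks the requested column. The sketch below repeats the definition so that it runs on its own and adds a hypothetical, more forgiving variant, alist-field->string/default, which is not part of the library.

```scheme
(import (scheme base))

;; Definition as given above.
(define (alist-field->string record column)
  (cdr (assq column record)))

;; Hypothetical variant: yields "" when the column is missing or its
;; value is not a string (e.g. #f for a padded field).
(define (alist-field->string/default record column)
  (let ((entry (assq column record)))
    (if (and entry (string? (cdr entry)))
        (cdr entry)
        "")))

(define record '((first . "Steve") (last . "Jobs") (born . #f)))

(define last-name (alist-field->string record 'last))
(define born      (alist-field->string/default record 'born))
(define missing   (alist-field->string/default record 'email))
```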
