Duct Tape Programming With Chicken Scheme


I have heard people use the term "duct tape programmer" to refer to programmers who get software completed efficiently, often at the expense of obsessing over things like code structure, modularity, performance, etc. I will define duct taping here as the practice of loosely integrating a simple, pre-existing solution into a toolchain in favor of a tightly-integrated solution that would usually involve substantially more effort to implement.

For example, if a particular programming language does not have a library that can read a Microsoft Access database, duct taping that ability on might involve calling on another tool to convert the database to a simpler format. For instance, an Access to XML converter.

Since I typically use Python at work, it’s rare for me to need functionality that is not readily available as a Python library. On a handful of occasions, I have written code to generate input files for LaTeX to render as PDF tables. That's about it.

I thought it might be fun to see how difficult or easy it is to duct tape functionality onto a less mainstream language. I am going to see how Chicken Scheme might be used accomplish some tasks I frequently need to do at work without libraries readily available.

Plotting

Thankfully, a language-independent plotting solution already exists: Gnuplot. In general, software like Gnuplot which follows the rule of composition is great for adding functionality to a language.

The default plots in Gnuplot look like something from an ugly 1980s technical journal (see for yourself), but that’s okay. Python's Matplotlib doesn’t exactly have great-looking defaults either.

With a little bit of effort, I was able to get Gnuplot to generate something more acceptable-looking to me (full page pdf, gnuplot file)

At work, I sometimes need to generate hundreds of plots at a time. A simple way to do this using Chicken Scheme and Gnuplot could be to create a template with placeholder tags and reuse it for all plots that are of the same type.


set terminal svg size 400,300 enhanced font 'Helvetica,13'
set title '?title?'
set style line 1 lt 1 lw 2 pt 7 ps 0.2 linecolor rgb '#3399ff'
set xrange [-3:3]
unset key

set output '?output?'
plot "?data?" using 1:2 with lines ls 1
(template.gp)

The template tags in the example above are surrounded by question marks (e.g., ?output?). Below is a small example that generates 5 plots in Chicken Scheme using the above template.


(use srfi-1)
(use srfi-13)
(use utils)
(use shell)

(define-record PlotData
  function
  title)

;;-------------------------------------------------------------------------
;; Utility functions
;; ... These functions would normally be in a separate module to be
;; ... imported and reused.
;;------------------------------------------------------------------------

;; Works like Python's str.replace function
(define (my-string-replace str s1 s2)
  (let ((str-idx (string-contains str s1)))
    (if (boolean? str-idx)  ;; Returns #f if substring not found
    str
    (string-replace str s2
            str-idx (+ str-idx (string-length s1))
            0 (string-length s2))
    )))

;; Writes two arrays to an ASCII file as space-separated columns
(define (to-ascii-columns xs ys outfile)
  (define (write-data fhandle xs ys)
    (if (or (null? xs) (null? ys))
    '()
    (begin
      (format fhandle "~A ~A" (car xs) (car ys))
      (newline fhandle)
      (write-data fhandle (cdr xs) (cdr ys)))))
  (call-with-output-file outfile
    (lambda (fout)
      (write-data fout xs ys))))

(define (render-tag str tagpair)
  (let ((tag (car tagpair))
    (render-to (cdr tagpair)))
    (my-string-replace str tag render-to)))

;; Renders tags in a template. Tags can be in any format.
(define (render-tags str tags)
  (if (null? tags)
      str
      (render-tags (render-tag str (car tags)) (cdr tags))))

;;-----------------------------------------------------------------------
;; Example implementation
;;-----------------------------------------------------------------------
(define (generate-plot pd xs)
  (let ((template (read-all "template.gp"))
    (title (PlotData-title pd))
    (f (PlotData-function pd)))
    (begin
      (define output (render-tags
              template
              (list (cons "?title?" title)
                (cons "?data?" "outfile.dat")
                (cons "?output?" (string-append title ".svg")))))
      (call-with-output-file "temp.gp"
    (lambda (fout)
      (format fout output)))
      (to-ascii-columns xs (map f xs) "outfile.dat")
      (run "gnuplot temp.gp")
      )))
    

(define x0 -3)
(define xf 3)
(define xn 1000)
(define dx (/ (- xf x0) (- xn 1)))
(define xs (iota 1000 -3 dx))

(define plots (list
 (make-PlotData sin "Sine")
 (make-PlotData cos "Cosine")
 (make-PlotData sqrt "Square Root")
 (make-PlotData (lambda (x) (* x x)) "Square")
 (make-PlotData abs "Absolute Value")
 ))

(map (lambda (p) (generate-plot p xs)) plots)

(ex_gp_template.scm)

Aside from writing a couple functions to process data from another source (like a database), batch plotting just about anything else could be done in a nearly identical way to what is shown above.

Reading Excel

Since the .xlsx format is just zipped XML files, I could send a shell command to unzip the file and parse out the cell data. However, even if I thought I understood the file format by casually inspecting the XML innards, I cannot be sure without reading the file specification, which is nearly 1000 pages long. I am not interested in doing that.

There does not seem to be any free and mature C or C++ libraries that read .xlsx. Python on the other hand, has a couple. And, it is trivial to write a Python script to convert from .xlsx to SQLite. Most languages, including Chicken Scheme, tend to have a decent SQLite wrapper.

I used the xlrd library to read in Excel files. Since I only ever need cell values and types, my requirements for an SQLite database schema are pretty minimal.


create table workbook (
    workbook text primary key
  , date_mode int check(date_mode = 0 or date_mode = 1)
);

create table worksheets (
    sheet_index int primary key
  , worksheet text
  , nrows int
  , ncols int
);

create table cells (
    worksheet text references worksheets(worksheet)
  , row int
  , column int
  , value string
  , cell_type int
);

The Python code is similarly straight-forward


#!/usr/bin/python3

from datetime import datetime
import xlrd
import sqlite3
import sys
import os

CREATE_TABLES_SQL = "" # See SQL above

def createSqliteDB(db_name):
    if os.path.exists(db_name):
        os.remove(db_name)

    conn = sqlite3.connect(db_name)
    curs = conn.cursor()

    curs.executescript(CREATE_TABLES_SQL)
    conn.commit()

    return conn, curs

def importWorksheet(conn, curs, ws, index):
    sql = """
    insert into worksheets values (%d, '%s', %d, %d)
    """ % (index, ws.name, ws.nrows, ws.ncols)

    curs.execute(sql)

    for row in range(0, ws.nrows):
        for col in range(0, ws.ncols):
            value = ws.cell(row, col).value
            cell_type = ws.cell(row, col).ctype
            sql = """
            insert into cells values ('%s', %d, %d, '%s', %d)
            """ % (ws.name, row, col, value, cell_type)
            curs.execute(sql)


def importXlsx(conn, curs, infile):
    wb = xlrd.open_workbook(infile)
    wb_name = os.path.basename(infile)
    wb_datemode = wb.datemode

    sql = """
    insert into workbook values ('%s', %d)
    """ % (wb_name, wb_datemode)
    curs.execute(sql)

    for idx in range(0, wb.nsheets):
        ws = wb.sheet_by_index(idx)
        importWorksheet(conn, curs, ws, idx)

    conn.commit()

def main():
    if len(sys.argv) != 3:
        print("Usage: xlsxtosqlite.py <xlsx infile> <sqlite outfile>")

    infile = sys.argv[1]
    outfile = sys.argv[2]
            
    conn, curs = createSqliteDB(outfile)
    importXlsx(conn, curs, infile)

if __name__ == "__main__":
    main()

(xlsxtosqlite.py)

I am aware that Python’s sqlite3 library allows placeholder arguments in queries, but since I often need to use MS Access at work and have had difficulty using similar helper functions with the pyodbc library when interacting with Access, I have gotten in the awful habit of not using them. I do not do web design, so SQL injection is not a concern.

Excel dates are stored as floating-point values and there are two different formats: (1) days elapsed since january 1, 1900 and (2) days elapsed since January 1, 1904. The floating point date value might suffice for some applications, but since Chicken Scheme has a good datetime module, I utilized it in my wrapper to automatically cast Excel's numeric dates into a Chicken datetime.

With a little bit of effort, I was able to put together a wrapper around my xlsx/sqlite conversion script and the sqlite3 Chicken library. Basic typecasting is done on the fly (xlsx.import.scm).

Now I can do read an .xlsx file in Scheme like this:


(import xlsx)
(define wb (xlsx-open-workbook "samples/Financial Sample.xlsx"))
(define ws (xlsx-sheet-by-index wb 0))

(define headers (map
         (lambda (col) (xlsx-cell-value ws 0 col))
         (iota (worksheet-ncols ws))))

Conclusion

Duct tape solutions are often simpler than writing bindings to a C library or writing a library from scratch. I would never find the time to implement a stable .xlsx or plotting library in Scheme on my own, and I would not want to anyway.

I do not see myself switching to a language like Chicken Scheme anytime soon for work. The integrated solutions are too convenient. Every programmer should probably know at least one language like Python, C#, or Java that can just get shit done. That being said, it is fun and educational on occasion to branch out. :)