A brief introduction to Lisp


Introduction

Yacas owes much of its current form to Lisp. A lot of ideas from Lisp have been taken over into the Yacas interpreter. This book aims to describe this connection, to show how Yacas compares to other Lisp dialects (Yacas can be viewed as a dialect of Lisp), and how the Yacas interpreter is related to Lisp.

Viewed from the top level, Yacas is just one function; Yacas is called with one, or possibly two arguments, or from a different point of view, a request is made to the Yacas system. How the function performs the action is implementation-dependent, but at the top-level it can be viewed as one call. From c++ or Java, it might look like

yacas.Evaluate("expression");

Or from Lisp,

(yacas expression environment)

The environment argument is optional; it can be a global resource that doesn't have to be passed in.

For instance, taking the derivative of a function could look like

yacas.evaluate("D(x)Sin(x);");

when in c++, or

(yacas '(D x (sin x)) )

when in Lisp. The quote in Lisp makes it clear that the argument, the parameter to the request, is actually data, and not program. This is an important distinction which has to be made clear in Lisp, but is clearer in other environments, due to the form of data being distinct from the form of program.

To drive home this point, (* a 1) would raise an error when given to the Lisp eval function if the variable a is not bound to a value, but would be perfectly valid data when passed to a CAS. In fact, one would like it to return a in this case. The * operator is actually a function call when invoked from Lisp ( (* a 1) is a program), but is data when passed to a CAS.

One can conclude that Yacas, an engine for simplifying symbolic expressions and performing calculations, is just one huge function that performs that task, and has one entry point, a function call with one argument as the data to the request.

Of course this is a bit of an oversimplification. We are not just interested in evaluating and simplifying expressions. When Yacas is embedded inside an application that offers a user interface, one would want to have access to all the operations allowed, and their arguments. Furthermore, for a debugger, information about the internal state of the interpreter is needed (values of global variables, etcetera).

At the end of the day we want to arrive at a language where we can implement an algorithm in a few lines of code, and have the compiler generate efficient code for it.

Given the strong typed Lisp we are using here, how is it different from using c++/Java directly?

The difference is in the syntax. Algorithms for manipulating symbolic expressions are written down much more naturally in a language that supports these operations natively. The code to perform symbolic manipulation operations can look surprisingly complex when written in c++ or Java. Later on, a compiler for the Lisp dialect developed here that will try to compile to human-readable source code will be described. Although the results are somewhat readable, they are a lot less readable than the original Lisp or Yacas script code they were generated from.

Convenient notation aids the programmer greatly in thinking about a problem. It is not unusual in mathematics to devise a new notation scheme to make life easier. For example, the bra ket notation, where <a|b> stands for the inproduct defined as

Integrate()a*b

has the added advantage of being able to write down projection operators of the form |b><a| which would be a lot more cumbersome to deal with without this notation.

Flexibility is not lost in a strong typed system; one can define a type EXPRESSION, which (naturally) represents an expression that can be manipulated. This could even be the default type for variables for which types are not specified, to make life easier for the programmer. This is not advisable, though, as the section describing the compiler will make clear.

This section will first introduce the reader to the basic concepts of Lisp, and then proceed to show how the yacas function can be implemented in Lisp in general. Clearness of concepts explained will be favoured over efficiency and performance. A small subset of Lisp will be used, so it should be easy to take the code along to various different Lisp dialects, or to write a little Lisp implementation to implement the yacas function.

The first chapter briefly describes a small Lisp dialect. A small subset of what is usually found in the average Lisp interpreter is described. The Lisp dialect is different in small subtle ways from normal Lisp implementations. Simplicity is put first.

For the Lisp dialect presented, one was chosen that is different from the Lisp dialect that represents Yacas. This separation is done explicitly, on purpose, to facilitate implementation of execution of algorithms written out in Yacas syntax from any other language. All the Yacas-specific details have to be handled by the routine performing these algorithms, independent of language constructs that might support it.

Even though in Lisp program and data look the same, for Yacas they are distinctly different in that data will likely not be evaluated by the default evaluator. It is important to keep the distinction between data and program.


Why choose Lisp over other languages?

Lisp has some powerful options not available in for instance c++/Java. The syntax is closer to the way a human thinks about transforming expressions, and the resulting routines are thus shorter and easier to read. The same routine, when written out in Lisp or in c++, is much more readable and natural in Lisp than in c++, for instance. c++ syntax is made for other purposes.

Also, the typed Lisp dialect developed in this book is easily translated to other languages. This removes the argument of Lisp being less efficient; the code can be converted to efficient c++/Java code. What is more, the code is easy to parse and transform into these languages, giving a language-independent specification of algorithms.


A little history of Lisp

Lisp is one of the older computer programming languages in existence. It came from a mathematical background, lambda calculus, and is used even now. Arguably, most programming languages owe a lot to Lisp.

There are various sources on the internet describing the early history of Lisp.


The Lisp interactive command line interface

When Lisp is started, the user is generally greeted with a prompt. The Lisp interpreter is in its read-eval-print loop; it reads in an expression typed in by the user, evaluates it, and prints the result of evaluation to the console.

The definition of an expression can be given such;

So, the words foo and bar could be called atoms. A list is represented by writing the expressions in the list out with spaces in between them, and brackets around them. So, (foo bar) would be a list with two elements, foo and bar.

In Lisp, each expression can evaluate to a value (if no error occurred). When evaluating an atom, the atom can either be a 'special' atom that evaluates to itself, or otherwise a lexical word that refers to a variable. When it is a variable, the Lisp system looks up its value, and returns the value the variable is bound to, in what is called the 'environment' the Lisp interpreter is evaluating in. When a variable is not bound, an error is raised. The noted difference with the usual Lisp dialects is that a normal Lisp system raises an error when an expression tries to evaluate a variable that is not bound to a value, whereas in a CAS, the variable is a free parameter in some expression.

When a list of expressions is encountered, the first element in the list is a function that should be called, and the rest of the expressions in the list are the arguments. The interpreter handles the list by first evaluating the arguments (the rest of the list), and then calling the function with the evaluated arguments as parameters to the function. There are exceptions to this; the function quote returns its argument unevaluated. In stead of writing (quote <expression>), a common short-hand is '<expression>, eg. the expression with a quote in front of it. Quotes are used a lot, to tell the system that data is being passed in.

For the following example interaction, the following commands are used:

The following is an interaction with our little lisp dialect. In the usual Lisp dialects, when variables are unbound, an error is raised.

In> a
Error, variable has no value : a

Lisp dialects are usually case-insensitive (the atom a is the same as A).

Here the atom a evaluated to itself, because it was not assigned.

In> (setf a '(this is a list))
Out> (this is a list )
In> a
Out> (this is a list )

Here we assigned the value (this is a list) to the variable a, and a now evaluates to that list. The system looked up the value bound to the variable a without re-evaluating it.

In> 'a
Out> a
In> (eval a)
ERROR: function not defined : this
Out> nil

With quote we can withhold evaluating a, and with eval we can re-evaluate a. Since this is not a defined function in this case, an error is raised.

Note that evaluation as defined in Lisp dialects is less suitable for CAS systems. A CAS would return expressions unevaluated, which is what you want in a CAS. It then represents some function which has not been specified yet. In a Lisp intepreter, it is nice to have the interpreter raise an error, but in a CAS, where it is data, it should not be simplified further.

For atoms that refer to variables that are not yet bound, the variables should be returned unevaluated by default, and functions that are called but not defined should be returned as the same function but with their arguments evaluated. evaluation as defined in most Lisp dialects is not suited to simplifying expressions. Later in this book an evaluator will be developed in Lisp itself that is closer to a sense of 'simplification' required for CAS.

Sometimes the user wants to refer to a function without having to define it, because transformation rules for the function can be given without explaining explicitly how to go about evaluating the function. Certain functions have properties, and sometimes it is easier to work with the properties than to evaluate them explicitly.

For the rest of this book, it is assumed that the semi-colon, ;, means that the rest of the input line up to the end of line or end of file is a human readable comment, not to be executed.


A basic set of Lisp primitive commands

Surprisingly few Lisp functions are needed to be able to implement any algorithm in Lisp. Languages are usually referred to as being Turing complete if any algorithm can be implemented in them. Turing showed that a small apparatus, called a Turing machine, was universal in that theoretically any algorithm could be implemented in it. The way to show a language is Turing complete is to implement a Turing machine in that language. if that is done, that language can execute any algorithm, as a Turing machine can be implemented in it, and Turing machines can execute any algorithm.

Note that building a Turing machine is just a theoretical exercise. Turing machines don't concern themselves with efficiency. Algorithms can be run, in finite time, but it will most certainly not be the most efficient way to execute the algorithm on a real computer.

Methods to make execution more efficient break down in two steps:

This section describes a minimum set of Lisp primitives needed for building a Turing machine, theoretically. After that, first numeric operations are introduced, before implementing a Turing machine. Fundamentally, the language does not need to support arithmetic in order to be able to implement any algorithm, as other constructs using lists can be used to represent and work with numbers, but not implementing it does make the effort rather complex.

The following sections will describe the following minimal set of commands;

In the next sections, whenever a function call is explained, the arguments to the function will be delimited by < and >. Arguments are evaluated unless stated otherwise.


Naming conventions

The set of primitives described in this section are meant to be used as building blocks for building a final bigger system.


Controlling evaluation

To control whether evaluation is performed, the following commands are available:


Assigning variables and functions

In addition, functions and macros can be defined:

The default set of functions implemented in the core of a Lisp interpreter can be divided between function-like and macro-like functions, in that when a function-like (defun) function is called, its arguments are evaluated first. Examples of this are cons, first and rest.

When calling macro-like (defmacro) functions, however, the arguments are not evaluated first, and the variables defined in the calling environment are visible while expressions using arguments on the command line are evaluated. A macro can be viewed as a bit of code that got pasted in to the place where it got called, by the interpreter (in fact, this is what a compiler can do).

The defmacro function allows the programmer to expand the language by adding functions that act as if they were defined in the core part of the interpreter.

Macros are not necessary, strictly speaking. Macros clean up the syntax of a program, in that it allows a programmer to write more clean and brief code. For example, the expression

(if (equal a b) () (print a))

could be written more concisely as

(unless (equal a b) (print a))

if unless were defined. But it would have to be defined as a macro, as when unless is called, the arguments should not be evaluated first. Unless can be defined as:

(defmacro unless (predicate body) 
  (if predicate nil body) )    

This makes unless work, as the following interaction shows:

In> (unless t (print 'SOMETHING))
Out> nil
In>  (unless nil (print 'SOMETHING))
SOMETHING 
Out> SOMETHING

Especially when code is generated, and does not have to be human-readable, macros are not necessary. But they are a nice feature nonetheless, and have to be implemented in the interpreter, it can not be faked with normal defun functionality.

Examples involving defun:

In> (defun foo (atom) (cons atom (list rest)))
Out> foo
In> (setf rest 'end)
Out> end
In> (foo a)
Out> (a end )
In> (foo b)
Out> (b end )

In addition, nameless (pure) functions can be made, which work exactly as their defun/defmacro counterparts, as lambda and macro:

Examples:

In> (setf add '(lambda (x y) (+ x y)))
Out> add
In> (add 2 3)
Out> 5

Note that pure functions, the lambda and macro expressions described here, are not strictly necessary, if we have equivalents for defun and defmacro. But pure unnamed functions do allow for convenient notation, without clobbering up the global name space. They also allow for locally defined functions. If they are required, they have to be part of the interpreter, since it cannot be faked with defun and defmacro.

An interesting option is to not define the operations defun and defmacro, but just setf, lambda and macro. This set is just as powerful, and in fact allows you to create defun and defmacro from them. We choose defun and defmacro over the route of using setf, lambda and macro to define defun and defmacro because lambda and macro are not strictly required to build a CAS. When implementing a more mature Lisp interpreter, it is wise to define lambda and macro, for the convenience of the programmer.


Creating and disseminating lists of expressions

Lists of expressions are used to build more elaborate structures. The following commands are available for doing so:

To disseminate a list:

Examples, taken from an interaction with clisp:

In> (first '(a b c))
Out> a
In> (rest '(a b c))
Out> (b c )

To reconstruct a list:

Example taken from an interaction with clisp:

In> (cons 'a '(b c))
Out> (a b c )
In> (list (+ 2 3) 4 3)
Out> (5 4 3 )

One important function is assoc:

Examples:

In> (setf list '( ("name" "Peter") ("age" 28) ) )
Out> list
In> (assoc '"name" list)
Out> ("name" "Peter" )
In> (assoc '"age" list)
Out> ("age" 28 )
In> (assoc '"something" list)
Out> nil


Predicates

Common Lisp has two special atoms, t and nil. t stands for true, whereas nil stands for false. nil is actually usually also the empty list. Typing an empty list () is the same as typing nil:

In> ()
Out> nil
In> nil
Out> nil

In Common Lisp, nil is considered to mean false when considered as a boolean result from a list, and anything different from nil is considered to mean true. This is then used as an efficient way to find items in lists. The result returned can then either be a found result, or nil which means nothing was found.

In> (rest '(a b))
Out> (b )
In> (rest '(a))
Out> nil

The author feels uneasy with having the nil atom overloaded as both an empty list and false when used in predicates, because it makes code harder take along to another language. In some cases nil means an empty list, and in some cases it means false, which complicates converting code, due to the fact that the code has to be examined more closely for this. Yacas has a distinct set of atoms, True and False precisely for this reason. A test to see if something is an empty list is still possible, by comparing the result of evaluating an expression with an empty list. Indeed, Yacas is not alone in this, Scheme (another dialect of Lisp) has different symbols for true and false, #t and #f respectively.

Two predicates needed for operating on lists are then:

Note the difference in atom and listp, which was taken from Common Lisp. In Common Lisp predicates are supposed to be suffixed by a 'p', but apparently not so with atom.


Variable scoping

When setf is called, first the local variables are searched for a candidate variable to set, and after that the global variables are searched. When entering a function body, a local frame of local variables for that function are defined, and the parameter variables are bound to the arguments passed in to the function. let allows the programmer to declare additional local variables.

The following example sets up variables n and m and assigns them values locally. n is bound to a, and m bound to (b c).

In> (let ( (n 'a) (m '(b c)) ) (cons n m))
Out> (a b c )
In> n
ERROR: variable not defined : n
Out> nil

The expression part in the let form gets evaluated after the local variables have been assigned. Note that evaluation of the expressions in the let form are performed in the scope where the let form is called, thus expressions in the let form can use local variables from its surroundings.

Local variables that are declared in a certain scope where the variable already had a value mask the older declaration of the variable.

Note: the values that will be assigned to the local variables are evaluated before being assigned to the local variables. So the expressions in the assign-list cannot refer to other variables declared in the same assign-list.


Execution control

No language is complete without conditional execution and looping.

This concludes the a set of functions required for building up a computer algebra system. It is not a minimal set by any means, as a following section will show; if can be defined by using defmacro and cond, for instance.

The next section will first introduce arbitrary number arithmetic support. It is not strictly necessary for the interpreter to work and be able to do anything, but it is the first concession we will make towards efficiency, as it is rather hard to find an algorithm that does not need arithmetic.

The next sections after that will continue by introducing a few building blocks that are needed to implement a CAS like Yacas.


Arbitrary precision arithmetic

Before proceeding to build up a full system, we first introduce the arithmetic to the system. Arithmetic could theoretically be performed by using what we already have (lists of objects, where the number of objects in the list represents the number), but it would be impractical.

We start with the representation of numbers. This section will only refer to arbitrary precision integers, floating point numbers will come later. The implementation of the Lisp dialect described here is designed to be easily extended that way.

Numbers are represented as atoms, sequences of characters. More specifically, they can be a sequence of digits from 0 to 9 (we will work in the decimal number system here, as that is what most humans are used to). Negative integers have a minus sign in front of them. So, 1234 and -1234 are atoms representing the integers 1234 and -1234 respectively.

The most basic arithmetic operations are then addition and negating a number:

In> (+ 1 2 3)
Out> 6
In> (- 5)
Out> -5
In> (+ 2 (- 3))
Out> -1

Together with a predicate to determine that something is an integer (integerp), and an ordering operator less than denoted <, other arithmetic operations like multiplication, division, exponentiation etcetera can be built.

Some examples using < and integerp.

In> (integerp 4)
Out> t
In> (integerp -4)
Out> t
In> (integerp 'a)
Out> nil
In> Out> (a b c )
In> (integerp '(a b c))
Out> nil
In> (< 2 3)  
Out> t
In> (< 3 2)
Out> nil

Note that a computer algebra system requires arbitrary precision arithmetic, as for certain algorithms the intermediate results of a calculation can become big, even if the final result is small. So, a CAS should be able to do the following:

In> (+ 12345678901234567890999999999999999999 1)
Out> 12345678901234567891000000000000000000


A few useful macros and functions

Now we are ready to create more powerful building blocks using the primitive set of functions introduced in previous sections.

The routines developed here will be built from the primitive building blocks described before. However, an implementation is free to implement these hard-coded in the core kernel for speed. This is advisable because the constructs developed here will be used often, and so there is a need for efficiency.

Note that the functions introduced so far are not minimal. Some can be defined in terms of others. They are defined here because they are easy to define in an interpreter without making the interpreter much larger, but are used often, so efficiency is required. For instance, given the cond and defmacro operations, it is possible to define an if operation:

In> (defmacro Newif
          (pred then else)
          (cond
             ( pred then)
             ( t    else)
        ) )
Out> Newif
In> (Newif (equal 'a 'b) 'c 'd)
Out> d
In> (Newif (equal 'a 'a) 'c 'd)
Out> c
In> (if (equal 'a 'b) 'c 'd)
Out> d
In> (if (equal 'a 'a) 'c 'd)
Out> c

Now let us start by building a set of new primitives using the ones introduced so far. We start by defining Null, which returns True if its argument is an empty list:

In> (defun null (expr) (equal expr nil) )
Out> null
In> (null 'a)
Out> nil
In> (null ())
Out> t
In> (null nil)
Out> t

With this, we can define a Length function which returns the length of a list:

In> (defun length (list)
  (if (null list)
    0
    (+ 1 (length (rest list)))
) )
Out> length
In> (length '(a b c))
Out> 3
In> (length ())
Out> 0

In addition to the already defined function first, we can define the functions second, third, etcetera to print code that is easier to read:

In> (defun second (list) (first (rest list) ) )
Out> second
In> (defun third  (list) (first (rest (rest list) ) ) )
Out> third
In> (setf lst '(a b c))
Out> lst
In> (first lst)
Out> a
In> (second lst)
Out> b
In> (third lst)
Out> c

and with this we can start to print functions for boolean expressions. First, let us define a not operator, which returns True if its argument is nil:

In> (defmacro not (a)  (null a) )
Out> not
In> (not 'a)
Out> nil
In> (not nil)
Out> t

Then lazy and and lazy or operations can be defined:

In> (defmacro and (a b) (if a b nil))
Out> and
In> (defmacro or  (a b) (if a t b) )      
Out> or

These evaluate their first argument first, and decide to evaluate the second only if the first one does not give enough information to decide whether to stop. For instance, if the first argument to and returns false (eg. nil), then the result of evaluating the entire expression should be nil, otherwise it returns the result of evaluating the second argument. or works the other way round: it evaluates the second argument if the result of evaluating the first argument was nil.

Lazy and and or operators can be used to combine predicates into a larger predicate, but also to guard evaluation of an expression, let it only evaluate an expression if a pre-condition is met. The unless operation could equally well have been defined by using or:

In> (defmacro unless (predicate body) 
  (or predicate body) )    
Out> unless
In> (unless t (print 'SOMETHING))
Out> nil
In> (unless nil (print 'SOMETHING))
SOMETHING 
Out> SOMETHING

Most of these functions are convenience functions. They make code easier to read. Macros in effect allow you to extend the language.


A reality check: implementing more functions in the interpreter for efficiency

Even though new functions can be created in Lisp code rather than the interpreter, it is not necessarily a good idea to do so. For instance, declaring additional local variables can be done with defmacro (and in this case macro would also be necessary). However, it is used a lot in Lisp code, so it makes sense to write an efficient version in the interpreter itself. This section describes a few commands that, although not strictly necessary, should be implemented in the interpreter for efficiency of execution.

The reasons for adding these functions fall into the following categories:

The commands described here can be implemented in the interpreter, but they are not strictly necessary, or they can be implemented in Lisp code using more basic primitives.