Rlab2 Reference Manual: User Functions

6. User Functions

Functions are almost first class variables in Rlab. Functions are stored as ordinary variables in the symbol-table; this explains the unusual syntax for creating and storing a function. User written functions are a key feature in any high level language. Mastering their usage is key to getting the most out of Rlab.

User functions are created within the Rlab programming environment... that is the major distinction between user and builtin functions. Otherwise, there is no difference in the usage of builtin and user-written functions.

The argument passing and function scoping rules are fairly straightforward and simple. The primary objective is ease of use, and safety, without compromising capability or efficiency.

The two inter-related concepts important to understanding functions are execution-environments, and variable-scopes. An execution-environment consists of the variables available to the currently executing statements. A variable's scope refers to the environment that variable is bound to. For instance statements executed from the command line are always executed in the global-environment. The global environment consists of all variables and functions created within that environment. Thus, a statement executed in the global environment only has access to global variables.

All function arguments, and variables created within functions are visible only within the function's environment, or local-scope. Conversely, function statements do not have access to variables in the global-scope (unless certain steps are taken).

Furthermore, function arguments can be considered local variables (in simplistic sense). Any modifications to a function's arguments are only visible within the function's environment. This is commonly referred to as "pass by value" (as opposed to "pass by reference"). Pass by value means that a variable's value (as opposed to the variable itself) is passed to the function's environment. Thus, when a function argument is modified, only a copy of its value is affected, and not the original variable.

This separation of environments, or scopes, ensures a "safe" programming environment. Function writers will not accidentally clobber a global variable with a function statement, and users will not accidentally impair the operation of a function with global statements.

6.1 Function Syntax

The function definition syntax is:

function ( arg1 , arg2 , ... argN ) { statements }

Of course, the function must be assigned to a variable in order for it to be usable:

var = function ( arg1 , arg2 , ... argN ) { statements }

The number of arguments, and their names are arbitrary (actually there is a 32 argument limit. If anyone ever hits that limit, it can be easily increased). The function can be used with fewer arguments than specified, but not more.

The function statements are any valid Rlab statements, with the exception of other function definitions. The optional return-statement is peculiar to functions, and allows the program to return data to the calling environment.

6.2 Using Functions

Every invocation of a function returns a value. Normally the user determines the return-value of a function. However, if a return statement does not exist, the function return value will default to zero. Since function invocations are expressions, many syntactical shortcuts can be taken. For example, if a function returns a list, like the eig function:


> eig(a)
   val          vec          
> eig(a).val
   -0.677      0.469       1.44

eig returns a list containing the elements val and vec; if we just want to get at one of the list elements, such as val, we can extract it directly. The same sort of shortcut can be used when a function returns matrices:


> size(rand(20,30))
       20         30  
> size(rand(20,30))[2]
       30

6.3 Function Arguments

As mentioned earlier, functions can have an arbitrary number of arguments. When program execution is passed to a user-function, all or some, or none of the arguments can be specified. Arguments that are not specified are replaced with UNDEFINED objects. How well the user-functions behaves when invoked with varying argument lists is entirely up to the user/programmer. If you don't want to specify the last N arguments to a function, just truncate the argument list. If you don't want to specify arguments that are in the beginning, or middle of the complete list of arguments, use commas. For example, a user-function defined like:


> x = function ( arg1 , arg2 , arg3 , arg4 )
  {
    if (exist (arg1)) { arg1 }
    if (exist (arg2)) { arg2 }
    if (exist (arg3)) { arg3 }
    if (exist (arg4)) { arg4 }
  }
        <user-function>

Can be used as follows:


> x ( );
> x ( pi );
     3.14  
> x ( pi, 2*pi );
     3.14  
     6.28  
> x ( , , , 4*pi );
     12.6

In the first instance, no arguments are specified, and the function does nothing. In the second instance, only the first argument is specified, and it is echoed to the screen. In the third instance, the first and second arguments are specified, and are properly echoed. In the last case, only the fourth argument is specified.

Each function has a local variable named nargs available. nargs is automatically set to the number of arguments passed to a function for each invocation. For example:


> x = function ( a, b, c, d, e, f, g ) { return nargs; }
        <user-function>
> x(1,2,3)
        3
> x(1,,2)
        3

This simple one-line function demonstrates how the nargs variable is initialized. Notice that while nargs correctly reports the number of arguments in the first instance, the results from the second trial may seem a little confusing. The variable nargs is intended to provide programmers with the number of arguments passed to a user-function, and to provide some degree of compatibility with other popular very high-level languages. However, the existence of the variable nargs and the ability to skip arguments conflict and make the meaning of nargs ambiguous. The bottom line is: nargs has the value of the argument number of the last specified argument.

A better method of determining which arguments were explicitly specified, and where were not is the use the builtin function exist. For example:


> x = function ( a , b, c, d, e, f, g )
  {
    if (exist (a)) { "1st argument was specified" }
    if (!exist (a)) { "1st argument was not specified" }
  }
        <user-function>
> x(2);
1st argument was specified  
> x(,3);
1st argument was not specified

This method is often used to decide if an argument's default value needs to be used. Another example:


> # compute a partial sum of a vector...
> psum = function ( v , n )
  {
    if (!exist (n)) { n = length (v); }
    return sum (v[1:n]);
  }
        <user-function>
> v = rand(1024,1);
> psum(v)
      497  
> psum(v,5)
     2.99

6.4 Function Variable Scoping Rules

Although the subject of function variable scoping has already been addressed, it deserves more attention. Rlab utilizes a pass by value scheme for all function arguments. In many interpretive languages, this means a copy of the function arguments is made prior to function execution. This practice can be very time and memory inefficient, especially when large matrices or list-structures are function arguments. Rlab does not copy function arguments, unless the function modifies the arguments.


> x = function ( a ) { global(A) a = 2*a; A ? return a; }
        <user-function>
> A = 2*pi
     6.28  
> x(A)
     6.28  
     12.6  
> A
     6.28

In the above example the function x modifies its argument a. Before modifying its argument, the function prints the value of the global variable A. A is assigned the value of 2*pi, and x is called with argument A. After x modifies its argument, A is printed; note that the value of A is not changed. Printing the value of A afterwards verifies that its value has not changed.

6.5 Function Recursion

Functions can be used recursively. Each invocation of a function creates a unique environment. Probably the most popular example for displaying this type of behavior is the factorial function:

//
// Fac.r (the slowest fac(), but the most interesting)
//

fac = function(a) 
{
  if(a <= 1) 
  {
      return 1;
  else
      return a*$self(a-1);
  }
};

The special syntax: $self can be used within recursive functions to protect against the possibility of the function being renamed.


> fac(4)
       24

6.6 Loading Functions and Function Dependencies

Functions definitions are treated just like any other valid statements. Since function definitions are mostly stored in files for re-use, you will need to know how to load the program statements within a file into Rlab's workspace. There are three main methods for performing this task:

The load function reads the argument (a string), and reads that file, interpreting the file's contents as Rlab program(s). The file is read, and interpreted, regardless of file modification or access times.
the rfile statement is a short-cut to the load statement. Additionally, rfile searches a pre-defined path of directories to files (ending in .r) to load. The file is read, and interpreted, regardless of file modification or access times.
The require statement, is similar to the rfile statement, except it only loads the specified file once. require looks for a user-function with the same name as the argument to require, loading the file if, and only if, a user-function of the same name cannot be found. File access and/or modification times are not considered.

How a function is Loaded

Some discussion of the "behind the scenes work" that occurs when a function is loaded is warranted. Although not absolutely necessary, understanding this process will help the reader write more complicated programs efficiently.

The file is opened, and its contents are read.
As the program statements are read, they are compiled and executed. Thus, as Rlab encounters global program statements, they are executed. Function definitions (assignments) are global program statements. So... function definitions get compiled, and stored as they are read.
Variable (this includes) functions references are resolved at compile-time. A variable can be UNDEFINED as long as it is resolved before statement execution. Thus a function can reference other functions, that have not been previously defined; as long as these references are resolved before the function is executed.
When the file's end mark (EOF) is encountered, the file is closed, and execution returns to its previous environment.

Some examples will make the previous remarks more meaningful.

Example-1

The following file contains a trivial function definition/assignment. Note that the variable (function) y is undefined.

x = function ( A )
{
  return y(A);
};

Now, we will load the file (ex-1.r). Note that there are no apparent problems until the function is actually executed.


> load("./ex-1.r");
> x(3)
cannot use an undefined variable as a function
ERROR: ../rlab: function undefined
        near line 3, file: ./ex-1.r

Now, we can define y to be a function, and x will execute as expected.


> y = function ( A ) { return 3*A; };
> x(3)
        9

Example-2

Understanding how files are loaded will help the efficiency of your functions. For example, functions are often written which depend upon other functions. Resolving the dependencies within the file that contains the function is common, for example, lets use the previous file (ex-1.r). We know that the function x depends upon the function y, so we will put a rfile statement within the file ex-1.r, which we will now call ex-2.r.

rfile y

x = function ( A )
{
  return y(A);
};

Note that the rfile statement is exterior to the function; it is a global program statement. Thus, the file y.r will get loaded once, and only once when the file ex-2.r is loaded.


> rfile ex-2
> x(4)
       12

We could have put the rfile or load statement within the function definition. But, then the file y.r would get loaded every time the function x was executed! Performance would be worse than if the rfile statement was a global statement, especially if the function was used within a for-loop!

Restrictions

Although there has been no reason for the user to infer that there are limitations on the number of functions per file, or the way function comments must be formatted, the fact that there are no restrictions in these matters must be explicitly stated because other interpreted languages often make such restrictions.

There are no restrictions on the number of function definitions within a single file.
There are no restrictions on how a functions comments must be formatted to work with the online help system. When the help command is used, the entire contents of the named file are displayed on the screen (usually via a pager). The user can view comments at the top of the file, or look at the entire function.
There are no restrictions on the type of programs stored in files. Simple user-functions can be stored, as well as global program statements, or any mixture of the two.

6.7 Static Variables

Static variables, or file-static variables are variables that are only visible within the file in which they are declared. The syntax for the static-declaration is:

static ( var1, var2, ... varN )

File-static variables are accessible by all program statements after the static declaration. All functions have access to file-static variables as if they were local variables, no special declarations are necessary.

Since functions are variables, functions can be hidden within a file with the static declaration. These functions are only accessible to program statements within that particular file. This technique is very useful for writing function toolboxes or libraries, since it eliminates the problem of global variables accidentally colliding with toolbox variables.

6.8 Help for User Functions

Although help for user-functions has already been eluded to, it is worth covering in detail. By convention, all the documentation necessary to use a function, or a library of functions is included within the same file, in the form of comments. This practice is very convenient since your function is never without documentation. Rlab's online help system is very simple. when a user types help eig (for example) Rlab looks for a help file named eig in the designated help directory. If the file eig cannot be found, the rfile search path is checked for files named eig.r. If a file named eig.r is found, its entire contents are displayed in the terminal window. Normally some sort of paging program is used, such as more(1), which allows the user to terminate viewing the file at any time.

Next Previous Contents