Node:Top, Next:Introduction, Previous:(dir), Up:(dir)

- Introduction:
- R Basics:
- R and S:
- R Web Interfaces:
- R Add-On Packages:
- R and Emacs:
- R Miscellanea:
- R Programming:
- R Bugs:
- Acknowledgments:

Node:Introduction, Next:R Basics, Previous:Top, Up:Top

This document contains answers to some of the most frequently asked questions about R.

Node:Legalese, Next:Obtaining this document, Previous:Introduction, Up:Introduction

This document is copyright © 1998-2003 by Kurt Hornik.

This document is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

A copy of the GNU General Public License is available via WWW at http://www.gnu.org/copyleft/gpl.html.

You can also obtain it by writing to the Free Software Foundation, Inc., 59 Temple Place -- Suite 330, Boston, MA 02111-1307, USA.

Node:Obtaining this document, Next:Citing this document, Previous:Legalese, Up:Introduction

The latest version of this document is always available from

http://www.ci.tuwien.ac.at/~hornik/R/

From there, you can obtain versions converted to plain ASCII text, DVI, GNU info, HTML, PDF, PostScript as well as the Texinfo source used for creating all these formats using the GNU Texinfo system.

You can also obtain the R FAQ from the `doc/FAQ` subdirectory of a CRAN site (see What is CRAN?).

Node:Citing this document, Next:Notation, Previous:Obtaining this document, Up:Introduction

In publications, please refer to this FAQ as Hornik
(2003), "The R FAQ", and give the above,
*official* URL and the ISBN 3-901167-51-X.

Node:Notation, Next:Feedback, Previous:Citing this document, Up:Introduction

Everything should be pretty standard. `R>` is used for the R prompt, and a `$` for the shell prompt (where applicable).

Node:Feedback, Previous:Notation, Up:Introduction

Feedback is of course most welcome.

In particular, note that I do not have access to Windows or Mac systems. Features specific to the Windows and Mac OS ports of R are described in the "R for Windows FAQ" and the "R for Macintosh FAQ/DOC". If you have information on Mac or Windows systems that you think should be added to this document, please let me know.

Node:R Basics, Next:R and S, Previous:Introduction, Up:Top

- What is R?:
- What machines does R run on?:
- What is the current version of R?:
- How can R be obtained?:
- How can R be installed?:
- Are there Unix binaries for R?:
- What documentation exists for R?:
- Citing R:
- What mailing lists exist for R?:
- What is CRAN?:
- Can I use R for commercial purposes?:

Node:What is R?, Next:What machines does R run on?, Previous:R Basics, Up:R Basics

R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.

The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S (see What is S?) and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. See What are the differences between R and S?, for further details.

The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions. Most of the user-visible functions in R are written in R. It is possible for the user to interface to procedures written in the C, C++, or FORTRAN languages for efficiency. The R distribution contains functionality for a large number of statistical procedures. Among these are: linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing. There is also a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations. Additional modules ("add-on packages") are available for a variety of specific purposes (see R Add-On Packages).
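As a rough illustration of the features just listed (functions, branching, looping, and built-in statistical procedures), here is a small sketch. It relies only on the `cars` data set that ships with R; the helper name `describe` is made up for this example.

```r
# A user-defined function with branching (hypothetical helper for this example).
describe <- function(x) {
    if (is.numeric(x)) {
        c(mean = mean(x), sd = sd(x))
    } else {
        table(x)
    }
}

# Looping over the columns of the built-in 'cars' data frame.
for (v in names(cars)) {
    cat(v, ":\n")
    print(describe(cars[[v]]))
}

# One of the statistical procedures in the base distribution: a linear model.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))
```

The same pattern (write a small function, apply it to data, fit a model) carries over to most interactive work in R.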

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. In addition, a large group of individuals has contributed to R by sending code and bug reports.

Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code CVS archive. The group currently consists of Doug Bates, John Chambers, Peter Dalgaard, Robert Gentleman, Kurt Hornik, Stefano Iacus, Ross Ihaka, Friedrich Leisch, Thomas Lumley, Martin Maechler, Guido Masarotto, Duncan Murdoch, Paul Murrell, Martyn Plummer, Brian Ripley, Duncan Temple Lang, and Luke Tierney.

R has a home page at http://www.r-project.org/. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S").

Node:What machines does R run on?, Next:What is the current version of R?, Previous:What is R?, Up:R Basics

R is being developed for the Unix, Windows and Mac families of operating systems. Support for Mac OS Classic will end with the 1.7 series.

The current version of R will configure and build under a number of
common Unix platforms including i386-freebsd, `cpu`-linux-gnu for
the i386, alpha, arm, hppa, ia64, m68k, powerpc, and sparc CPUs (see
e.g. http://buildd.debian.org/build.php?&pkg=r-base),
i386-sun-solaris, powerpc-apple-darwin, mips-sgi-irix, alpha-dec-osf4,
rs6000-ibm-aix, hppa-hp-hpux, and sparc-sun-solaris.

If you know about other platforms, please drop us a note.

Node:What is the current version of R?, Next:How can R be obtained?, Previous:What machines does R run on?, Up:R Basics

The current released version is 1.7.1. Based on this "major.minor.patchlevel" numbering scheme, there are two development versions of R, working towards the next patch ("r-patched") and minor or eventually major ("r-devel") releases of R, respectively. Version r-patched is for bug fixes mostly. New features are typically introduced in r-devel.
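To see how the numbering scheme applies to the copy of R you are actually running, you can inspect the `R.version` list from the R prompt. The variable name `ver` below is only illustrative.

```r
# R.version describes the running R; its 'minor' component holds
# "minor.patchlevel", so pasting it after 'major' gives the full number.
ver <- paste(R.version$major, R.version$minor, sep = ".")
print(ver)

# A ready-made, human-readable version string.
print(R.version.string)
```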

Node:How can R be obtained?, Next:How can R be installed?, Previous:What is the current version of R?, Up:R Basics

Sources, binaries and documentation for R can be obtained via CRAN, the "Comprehensive R Archive Network" (see What is CRAN?).

Sources are also available via anonymous rsync. Use

    rsync -rC rsync.r-project.org::module R

to create a copy of the source tree specified by `module` in the subdirectory `R` of the current directory, where `module` specifies one of the three existing flavors of the R sources, and can be one of `r-release` (current released version), `r-patched` (patched released version), and `r-devel` (development version). The rsync trees are created directly from the master CVS archive and are updated hourly. The `-C` option in the `rsync` command causes it to skip the CVS directories. Further information on `rsync` is available at http://rsync.samba.org/rsync/.

Node:How can R be installed?, Next:Are there Unix binaries for R?, Previous:How can R be obtained?, Up:R Basics

- How can R be installed (Unix):
- How can R be installed (Windows):
- How can R be installed (Macintosh):

Node:How can R be installed (Unix), Next:How can R be installed (Windows), Previous:How can R be installed?, Up:How can R be installed?

If binaries are available for your platform (see Are there Unix binaries for R?), you can use these, following the instructions that come with them.

Otherwise, you can compile and install R yourself, which can be done very easily under a number of common Unix platforms (see What machines does R run on?). The file `INSTALL` that comes with the R distribution contains a brief introduction, and the "R Installation and Administration" guide (see What documentation exists for R?) has full details.

Note that you need a FORTRAN compiler or `f2c` in addition to a C compiler to build R. Also, you need Perl version 5 to build the R object documentation. (If this is not available on your system, you can obtain a PDF version of the object reference manual via CRAN.)

In the simplest case, untar the R source code, change to the directory
thus created, and issue the following commands (at the shell prompt):

    $ ./configure
    $ make

If these commands execute successfully, the R binary and a shell script front-end called `R` are created and copied to the `bin` directory. You can copy the script to a place where users can invoke it, for example to `/usr/local/bin`. In addition, plain text help pages as well as HTML and LaTeX versions of the documentation are built.

Use `make dvi` to create DVI versions of the R manuals, such as `refman.dvi` (an R object reference index) and `R-exts.dvi`, the "R Extension Writers Guide", in the `doc/manual` subdirectory. These files can be previewed and printed using standard programs such as `xdvi` and `dvips`. You can also use `make pdf` to build PDF (Portable Document Format) versions of the manuals, and view these using e.g. Acrobat. Manuals written in the GNU Texinfo system can also be converted to info files suitable for reading online with Emacs or stand-alone GNU Info; use `make info` to create these versions (note that this requires `makeinfo` version 4).

Finally, use `make check` to find out whether your R system works
correctly.

You can also perform a "system-wide" installation using `make install`. By default, this will install to the following directories:

- `${prefix}/bin`: the front-end shell script
- `${prefix}/man/man1`: the man page
- `${prefix}/lib/R`: all the rest (libraries, on-line help system, ...). This is the "R Home Directory" (`R_HOME`) of the installed system.

In the above, `prefix` is determined during configuration (typically `/usr/local`) and can be set by running `configure` with the option

    $ ./configure --prefix=/where/you/want/R/to/go

(E.g., the R executable will then be installed into `/where/you/want/R/to/go/bin`.)

To install DVI, info and PDF versions of the manuals, use `make install-dvi`, `make install-info` and `make install-pdf`, respectively.

Node:How can R be installed (Windows), Next:How can R be installed (Macintosh), Previous:How can R be installed (Unix), Up:How can R be installed?

The `bin/windows` directory of a CRAN site contains binaries for a base distribution and a large number of add-on packages from CRAN to run on Windows 95, 98, ME, NT4, 2000, and XP (at least) on Intel and clones (but not on other platforms). The Windows version of R was created by Robert Gentleman, and is now being developed and maintained by Duncan Murdoch and Brian D. Ripley.

For most installations the Windows installer program will be the easiest tool to use.

See the "R for Windows FAQ" for more details.

Node:How can R be installed (Macintosh), Previous:How can R be installed (Windows), Up:How can R be installed?

The `bin/macos` directory of a CRAN site contains bin-hexed (`hqx`) and stuffit (`sit`) archives for a base distribution and a large number of add-on packages to run under MacOS 8.6 to MacOS 9.1 or MacOS X natively. The Mac version of R and the Mac binaries are maintained by Stefano Iacus.

The "R for Macintosh FAQ/DOC" has more details.

Binaries of base distributions for MacOS X (Darwin) with X11 are made available by Jan de Leeuw in the `bin/macosx` directory of a CRAN site.

Node:Are there Unix binaries for R?, Next:What documentation exists for R?, Previous:How can R be installed?, Up:R Basics

The `bin/linux` directory of a CRAN site contains Debian stable/testing packages for the i386 platform (now part of the Debian distribution and maintained by Dirk Eddelbuettel), Mandrake 8.0/8.1/8.2/9.0/9.1 i386 packages by Michele Alzetta, Red Hat 7.x/8.x/9 i386 and 7.x alpha packages (maintained by Martyn Plummer and Naoki Takebayashi, respectively), SuSE 7.3/8.0/8.1/8.2 i386 packages by Detlef Steuer, and VineLinux 2.6 i386 packages by Susunu Tanimura.

The Debian packages can be accessed through APT, the Debian package
maintenance tool. Simply add the line

    deb http://cran.r-project.org/bin/linux/debian distribution main

(where `distribution` is either `stable` or `testing`; feel free to use a CRAN mirror instead of the master) to the file `/etc/apt/sources.list`. Once you have added that line the programs `apt-get`, `apt-cache`, and `dselect` (using the apt access method) will automatically detect and install updates of the R packages.

No other binary distributions are currently publicly available.

Node:What documentation exists for R?, Next:Citing R, Previous:Are there Unix binaries for R?, Up:R Basics

Online documentation for most of the functions and variables in R exists, and can be printed on-screen by typing `help(name)` (or `?name`) at the R prompt.

This documentation can also be made available as one reference manual for on-line reading in HTML and PDF formats, and as hardcopy via LaTeX, see How can R be installed?. An up-to-date HTML version is always available for web browsing at http://stat.ethz.ch/R-manual/.
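As a short sketch of how the on-screen help is typically requested from the R prompt, the calls below look up one function and two special cases. The topics chosen (`mean`, `if`, `+`) are just examples; names that are not ordinary identifiers generally need to be quoted.

```r
help(mean)      # documentation for the function mean()
?mean           # shorthand for the same request
help("if")      # control-flow special forms need quoting
help("+")       # so do unary and binary operators
```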

The R distribution also comes with the following manuals.

- "An Introduction to R" (`R-intro`) includes information on data types, programming elements, statistical modeling and graphics. This document is based on the "Notes on S-PLUS" by Bill Venables and David Smith.
- "Writing R Extensions" (`R-exts`) currently describes the process of creating R add-on packages, writing R documentation, R's system and foreign language interfaces, and the R API.
- "R Data Import/Export" (`R-data`) is a guide to importing and exporting data to and from R.
- "The R Language Definition" (`R-lang`), a first version of the "Kernighan & Ritchie of R", explains evaluation, parsing, object oriented programming, computing on the language, and so forth.
- "R Installation and Administration" (`R-admin`).

Books on R include

Peter Dalgaard (2002), "Introductory Statistics with R", Springer: New York, ISBN 0-387-95475-9.

J. Fox (2002), "An R and S-PLUS Companion to Applied Regression", Sage Publications, ISBN 0-761-92280-6 (softcover) or 0-761-92279-2 (hardcover), http://www.socsci.mcmaster.ca/jfox/Books/Companion/.

The book

W. N. Venables and B. D. Ripley (2002), "Modern Applied Statistics with S. Fourth Edition". Springer, ISBN 0-387-95457-0

has a home page at http://www.stats.ox.ac.uk/pub/MASS4/ providing additional material. Its companion is

W. N. Venables and B. D. Ripley (2000), "S Programming". Springer, ISBN 0-387-98966-8

and provides an in-depth guide to writing software in the S language which forms the basis of both the commercial S-PLUS and the Open Source R data analysis software systems. See http://www.stats.ox.ac.uk/pub/MASS3/Sprog/ for more information.

In addition to material written specifically or explicitly for R, documentation for S/S-PLUS (see R and S) can be used in combination with this FAQ (see What are the differences between R and S?). Introductory books include

P. Spector (1994), "An introduction to S and S-PLUS", Duxbury Press.

A. Krause and M. Olsen (2002), "The Basics of S-PLUS" (Third Edition). Springer, ISBN 0-387-95456-2

The book

J. C. Pinheiro and D. M. Bates (2000), "Mixed-Effects Models in S and S-PLUS", Springer, ISBN 0-387-98957-0

provides a comprehensive guide to the use of the **nlme** package
for linear and nonlinear mixed-effects models. This has a home page at
http://nlme.stat.wisc.edu/MEMSS/.

As an example of how R can be used in teaching an advanced introductory statistics course, see

D. Nolan and T. Speed (2000), "Stat Labs: Mathematical Statistics Through Applications", Springer Texts in Statistics, ISBN 0-387-98974-9

This integrates theory of statistics with the practice of statistics through a collection of case studies ("labs"), and uses R to analyze the data. More information can be found at http://www.stat.Berkeley.EDU/users/statlabs/.

Last, but not least, Ross' and Robert's experience in designing and
implementing R is described in Ihaka & Gentleman (1996), "R: A Language
for Data Analysis and Graphics",
*Journal of Computational and Graphical Statistics*, **5**, 299-314.
See Citing R.

An annotated bibliography (BibTeX format) of R-related publications
which includes most of the above references can be found at

http://www.r-project.org/doc/bib/R.bib

Node:Citing R, Next:What mailing lists exist for R?, Previous:What documentation exists for R?, Up:R Basics

To cite R in publications, use

    @article{,
      author  = {Ross Ihaka and Robert Gentleman},
      title   = {R: A Language for Data Analysis and Graphics},
      journal = {Journal of Computational and Graphical Statistics},
      year    = 1996,
      volume  = 5,
      number  = 3,
      pages   = {299--314}
    }

Node:What mailing lists exist for R?, Next:What is CRAN?, Previous:Citing R, Up:R Basics

Thanks to Martin Maechler, there are three mailing lists devoted to R.

- `r-announce`: This list is for announcements about the development of R and the availability of new code.
- `r-devel`: This list is for discussions about the future of R and pre-testing of new versions. It is meant for those who maintain an active position in the development of R.
- `r-help`: The "main" R mailing list, for announcements about the development of R and the availability of new code, questions and answers about problems and solutions using R, enhancements and patches to the source code and documentation of R, comparison and compatibility with S and S-PLUS, and for the posting of nice examples and benchmarks.

Note that the r-announce list is gatewayed into r-help, so you don't need to subscribe to both of them.

Send email to r-help@lists.r-project.org to reach everyone on the r-help mailing list. To subscribe (or unsubscribe) to this list send `subscribe` (or `unsubscribe`) in the *body* of the message (not in the subject!) to r-help-request@lists.r-project.org. Information about the list can be obtained by sending an email with `info` as its contents to r-help-request@lists.r-project.org.

Subscription and posting to the other lists is done analogously, with "r-help" replaced by "r-announce" and "r-devel", respectively.

Subscriptions to "r-help" and "r-devel" are also available in digest format, see the `doc/html/mail.html` file in CRAN for more information.

It is recommended that you send mail to r-help rather than only to the R Core developers (who are also subscribed to the list, of course). This may save them precious time they can use for constantly improving R, and will typically also result in much quicker feedback for yourself.

Of course, in the case of bug reports it would be very helpful to have code which reliably reproduces the problem. Also, make sure that you include information on the system and version of R being used. See R Bugs for more details.

Archives of the above three mailing lists are made available on the net on a monthly schedule via the `doc/html/mail.html` file in CRAN. Searchable archives of the lists are available via http://maths.newcastle.edu.au/~rking/R/.

The R Core Team can be reached at r-core@lists.r-project.org for comments and reports.

Node:What is CRAN?, Next:Can I use R for commercial purposes?, Previous:What mailing lists exist for R?, Up:R Basics

The "Comprehensive R Archive Network" (CRAN) is a collection of sites which carry identical material, consisting of the R distribution(s), the contributed extensions, documentation for R, and binaries.

The CRAN master site at TU Wien, Austria, can be found at the URL

http://cran.r-project.org/

and is currently being mirrored daily at

- http://cran.at.r-project.org/ (TU Wien, Austria)
- http://cran.au.r-project.org/ (PlanetMirror, Australia)
- http://cran.br.r-project.org/ (Universidade Federal de Paraná, Brazil)
- http://cran.ch.r-project.org/ (ETH Zürich, Switzerland)
- http://cran.de.r-project.org/ (APP, Germany)
- http://cran.dk.r-project.org/ (SunSITE, Denmark)
- http://cran.hu.r-project.org/ (Semmelweis U, Hungary)
- http://cran.uk.r-project.org/ (U of Bristol, United Kingdom)
- http://cran.us.r-project.org/ (U of Wisconsin, USA)
- http://cran.za.r-project.org/ (Rhodes U, South Africa)

Please use the CRAN site closest to you to reduce network load.

From CRAN, you can obtain the latest official release of R, daily snapshots of R (copies of the current CVS trees), as gzipped and bzipped tar files, a wealth of additional contributed code, as well as prebuilt binaries for various operating systems (Linux, MacOS Classic, MacOS X, and MS Windows). CRAN also provides access to documentation on R, existing mailing lists and the R Bug Tracking system.

To "submit" to CRAN, simply upload to ftp://cran.r-project.org/incoming/ and send an email to cran@r-project.org. Note that CRAN generally does not accept submissions of precompiled binaries due to security reasons.

Note: It is very important that you indicate the copyright (license) information (GPL, BSD, Artistic, ...) in your submission.

Please always use the URL of the master site when referring to CRAN.

Node:Can I use R for commercial purposes?, Previous:What is CRAN?, Up:R Basics

R is released under the GNU General Public License (GPL). If you have any questions regarding the legality of using R in any particular situation you should bring it up with your legal counsel. We are in no position to offer legal advice.

It is the opinion of the R Core Team that one can use R for commercial purposes (e.g., in business or in consulting). The GPL, like all Open Source licenses, permits all and any use of the package. It only restricts distribution of R or of other programs containing code from R. This is made clear in clause 6 ("No Discrimination Against Fields of Endeavor") of the Open Source Definition:

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

It is also explicitly stated in clause 0 of the GPL, which says in part

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program.

Most add-on packages, including all recommended ones, also explicitly allow commercial use in this way. A few packages are restricted to "non-commercial use"; you should contact the author to clarify whether these may be used or seek the advice of your legal counsel.

None of the discussion in this section constitutes legal advice. The R Core Team does not provide legal advice under any circumstances.

Node:R and S, Next:R Web Interfaces, Previous:R Basics, Up:Top

- What is S?:
- What is S-PLUS?:
- What are the differences between R and S?:
- Is there anything R can do that S-PLUS cannot?:
- What is R-plus?:

Node:What is S?, Next:What is S-PLUS?, Previous:R and S, Up:R and S

S is a very high level language and an environment for data analysis and graphics. In 1998, the Association for Computing Machinery (ACM) presented its Software System Award to John M. Chambers, the principal designer of S, for

the S system, which has forever altered the way people analyze, visualize, and manipulate data ... S is an elegant, widely accepted, and enduring software system, with conceptual integrity, thanks to the insight, taste, and effort of John Chambers.

The evolution of the S language is characterized by four books by John Chambers and coauthors, which are also the primary references for S.

- Richard A. Becker and John M. Chambers (1984), "S. An Interactive Environment for Data Analysis and Graphics," Monterey: Wadsworth and Brooks/Cole. This is also referred to as the "*Brown Book*", and of historical interest only.
- Richard A. Becker, John M. Chambers and Allan R. Wilks (1988), "The New S Language," London: Chapman & Hall. This book is often called the "*Blue Book*", and introduced what is now known as S version 2.
- John M. Chambers and Trevor J. Hastie (1992), "Statistical Models in S," London: Chapman & Hall. This is also called the "*White Book*", and introduced S version 3, which added structures to facilitate statistical modeling in S.
- John M. Chambers (1998), "Programming with Data," New York: Springer, ISBN 0-387-98503-4 (http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/). This "*Green Book*" describes version 4 of S, a major revision of S designed by John Chambers to improve its usefulness at every stage of the programming process.

See http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html for further information on "Stages in the Evolution of S".

There is a huge amount of user-contributed code for S, available at the S Repository at CMU.

Node:What is S-PLUS?, Next:What are the differences between R and S?, Previous:What is S?, Up:R and S

S-PLUS is a value-added version of S sold by Insightful Corporation. Based on the S language, S-PLUS provides functionality in a wide variety of areas, including robust regression, modern non-parametric regression, time series, survival analysis, multivariate analysis, classical statistical tests, quality control, and graphics drivers. Add-on modules add additional capabilities for wavelet analysis, spatial statistics, GARCH models, and design of experiments.

See the Insightful S-PLUS page for further information.

Node:What are the differences between R and S?, Next:Is there anything R can do that S-PLUS cannot?, Previous:What is S-PLUS?, Up:R and S

We can regard S as a language with three current implementations or
"engines", the "old S engine" (S version 3; S-PLUS 3.x and 4.x),
the "new S engine" (S version 4; S-PLUS 5.x and above), and R.
Given this understanding, asking for "the differences between R and S"
really amounts to asking for the specifics of the R implementation of
the S language, i.e., the difference between the R and S *engines*.

For the remainder of this section, "S" refers to the S engines and not the S language.

Node:Lexical scoping, Next:Models, Previous:What are the differences between R and S?, Up:What are the differences between R and S?

Contrary to other implementations of the S language, R has adopted the evaluation model of Scheme.

This difference becomes manifest when *free* variables occur in a
function. Free variables are those which are neither formal parameters
(occurring in the argument list of the function) nor local variables
(created by assigning to them in the body of the function). Whereas S
(like C) by default uses *static* scoping, R (like Scheme) has
adopted *lexical* scoping. This means the values of free variables
are determined by a set of global variables in S, but in R by the
bindings that were in effect at the time the function was created.

Consider the following function:

    cube <- function(n) {
        sq <- function() n * n
        n * sq()
    }

Under S, `sq()` does not "know" about the variable `n` unless it is defined globally:

    S> cube(2)
    Error in sq(): Object "n" not found
    Dumped
    S> n <- 3
    S> cube(2)
    [1] 18

In R, the "environment" created when `cube()` was invoked is also searched:

    R> cube(2)
    [1] 8
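To make the contrast with the S transcript concrete, the following sketch repeats the experiment in R with a global `n` defined. Under lexical scoping the global value is never consulted, because `sq()` finds `n` in the environment of the enclosing `cube()` call first.

```r
cube <- function(n) {
    sq <- function() n * n   # 'n' is a free variable inside sq()
    n * sq()
}

n <- 3           # a global 'n', as in the S transcript
print(cube(2))   # sq() sees the n = 2 of the enclosing call, so this is 8, not 18
```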

As a more "interesting" real-world problem, suppose you want to write a function which returns the density function of the r-th order statistic from a sample of size n from a (continuous) distribution. For simplicity, we shall use both the cdf and pdf of the distribution as explicit arguments. (Example compiled from various postings by Luke Tierney.)

The S-PLUS documentation for `call()` basically suggests the following:

    dorder <- function(n, r, pfun, dfun) {
        f <- function(x) NULL
        con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
        PF <- call(substitute(pfun), as.name("x"))
        DF <- call(substitute(dfun), as.name("x"))
        f[[length(f)]] <-
            call("*", con,
                 call("*", call("^", PF, r - 1),
                      call("*", call("^", call("-", 1, PF), n - r),
                           DF)))
        f
    }

Rather tricky, isn't it? The code uses the fact that in S, functions are just lists of special mode with the function body as the last argument, and hence does not work in R (one could make the idea work, though).

A version which makes heavy use of `substitute()` and seems to work under both S and R is

    dorder <- function(n, r, pfun, dfun) {
        con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
        eval(substitute(function(x) K * PF(x)^a * (1 - PF(x))^b * DF(x),
                        list(PF = substitute(pfun), DF = substitute(dfun),
                             a = r - 1, b = n - r, K = con)))
    }

(the `eval()` is not needed in S).

However, in R there is a much easier solution:

    dorder <- function(n, r, pfun, dfun) {
        con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
        function(x) {
            con * pfun(x)^(r - 1) * (1 - pfun(x))^(n - r) * dfun(x)
        }
    }

This seems to be the "natural" implementation, and it works because the free variables in the returned function can be looked up in the defining environment (this is lexical scope).
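A quick way to convince yourself that the closure-based `dorder()` behaves as intended is to apply it to a distribution where the answer is known. The sketch below picks the uniform distribution on [0, 1] (an arbitrary choice for illustration), for which the third order statistic of five observations has a Beta(3, 3) density.

```r
dorder <- function(n, r, pfun, dfun) {
    con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
    function(x) {
        con * pfun(x)^(r - 1) * (1 - pfun(x))^(n - r) * dfun(x)
    }
}

# Density of the median of 5 uniform observations.
f <- dorder(5, 3, punif, dunif)

# con = 5! / (2! 2!) = 30, so f(0.5) = 30 * 0.5^2 * 0.5^2 = 1.875.
print(f(0.5))
print(dbeta(0.5, 3, 3))             # same value from the Beta(3, 3) density

# A density should integrate to (approximately) one.
print(integrate(f, 0, 1)$value)
```

The returned function still "remembers" `con`, `n`, `r`, `pfun` and `dfun` even though the call to `dorder()` has long finished, which is what lexical scoping provides.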

Note that what you really need is the function *closure*, i.e., the body along with all variable bindings needed for evaluating it. Since in the above version, the free variables in the value function are not modified, you can actually use it in S as well if you abstract out the closure operation into a function `MC()` (for "make closure"):

    dorder <- function(n, r, pfun, dfun) {
        con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
        MC(function(x) {
               con * pfun(x)^(r - 1) * (1 - pfun(x))^(n - r) * dfun(x)
           },
           list(con = con, pfun = pfun, dfun = dfun, r = r, n = n))
    }

Given the appropriate definitions of the closure operator, this works in both R and S, and is much "cleaner" than a substitute/eval solution (or one which overrules the default scoping rules by using explicit access to evaluation frames, as is of course possible in both R and S).

For R, `MC()` simply is

    MC <- function(f, env) f

(lexical scope!), a version for S is

    MC <- function(f, env = NULL) {
        env <- as.list(env)
        if (mode(f) != "function")
            stop(paste("not a function:", f))
        if (length(env) > 0 && any(names(env) == ""))
            stop(paste("not all arguments are named:", env))
        fargs <- if(length(f) > 1) f[1:(length(f) - 1)] else NULL
        fargs <- c(fargs, env)
        if (any(duplicated(names(fargs))))
            stop(paste("duplicated arguments:", paste(names(fargs)),
                       collapse = ", "))
        fbody <- f[length(f)]
        cf <- c(fargs, fbody)
        mode(cf) <- "function"
        return(cf)
    }

Similarly, most optimization (or zero-finding) routines need some arguments to be optimized over and have other parameters that depend on the data but are fixed with respect to optimization. With R scoping rules, this is a trivial problem; simply make up the function with the required definitions in the same environment and scoping takes care of it. With S, one solution is to add an extra parameter to the function and to the optimizer to pass in these extras, which however can only work if the optimizer supports this.
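The optimization case can be sketched as follows. The example is only an illustration under assumed names (`make_nll` is hypothetical): it builds the negative log-likelihood of an exponential rate as a closure over the data, so that the optimizer `optimize()` only ever sees a function of the single parameter being optimized.

```r
# Returns a function of 'rate' alone; the data summaries n and s are
# captured in the enclosing environment rather than passed as extras.
make_nll <- function(x) {
    n <- length(x)
    s <- sum(x)
    function(rate) -(n * log(rate) - rate * s)
}

x <- c(0.5, 1.5, 2.0, 4.0)
nll <- make_nll(x)

fit <- optimize(nll, interval = c(0.01, 10))
print(fit$minimum)     # numerically close to the closed-form MLE below
print(1 / mean(x))     # 0.5
```

No extra argument has to be threaded through the optimizer, which is the point made in the paragraph above.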

Lexical scoping allows using function closures and maintaining local
state. A simple example (taken from Abelson and Sussman) is obtained by
typing `demo("scoping")` at the R prompt. Further information is
provided in the standard R reference "R: A Language for Data Analysis
and Graphics" (see What documentation exists for R?) and in Robert
Gentleman and Ross Ihaka (2000), "Lexical Scope and Statistical
Computing", *Journal of Computational and Graphical Statistics*, **9**,
491-508.
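The idea of maintaining local state can be sketched with a small bank-account example in the spirit of Abelson and Sussman. This is not the text of the demo itself; the name `make_account` and the amounts are made up for illustration.

```r
make_account <- function(total) {
    list(
        deposit = function(x) {
            total <<- total + x   # modifies 'total' in the enclosing environment
            invisible(total)
        },
        withdraw = function(x) {
            if (x > total) stop("insufficient funds")
            total <<- total - x
            invisible(total)
        },
        balance = function() total
    )
}

a <- make_account(100)
b <- make_account(50)

a$deposit(25)
b$withdraw(20)

print(a$balance())   # 125
print(b$balance())   # 30
```

Each call to `make_account()` creates a fresh environment, so the two accounts keep separate totals even though they share the same code.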

Lexical scoping also implies a further major difference. Whereas S stores all objects as separate files in a directory somewhere (usually `.Data` under the current directory), R does not. All objects in R are stored internally. When R is started up it grabs a very large piece of memory and uses it to store the objects. R performs its own memory management of this piece of memory. Having everything in memory is necessary because it is not really possible to externally maintain all relevant "environments" of symbol/value pairs. This difference also seems to make R *faster* than S.

The downside is that if R crashes you will lose all the work for the
current session. Saving and restoring the memory "images" (the
functions and data stored in R's internal memory at any time) can be a
bit slow, especially if they are big. In S this does not happen,
because everything is saved in disk files and if you crash nothing is
likely to happen to them. (In fact, one might conjecture that the S
developers felt that the price of changing their approach to persistent
storage just to accommodate lexical scope was far too expensive.)
Hence, when doing important work, you might consider saving often (see
How can I save my workspace?) to safeguard against possible
crashes. Other possibilities are logging your sessions, or having your R
commands stored in text files which can be read in using `source()`.

Note: If you run R from within Emacs (see R and Emacs), you can save the contents of the interaction buffer to a file and conveniently manipulate it using `ess-transcript-mode`, as well as save source copies of all functions and data used.
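The safeguards mentioned above (saving objects to disk and keeping commands in a text file) might look roughly like this. It is only a sketch: the temporary files stand in for files you would choose yourself, and `save()` is used on a single object where `save.image()` would write the whole workspace.

```r
## Temporary files stand in for files you would keep in your project.
img <- tempfile(fileext = ".RData")
script <- tempfile(fileext = ".R")

x <- c(1, 2, 3)

## Save selected objects; save.image() would save the whole workspace.
save(x, file = img)

## Simulate losing the session contents, then restore from the image.
rm(x)
load(img)
x

## Keep commands in a text file and replay them with source().
writeLines("y <- sum(x)", script)
source(script, local = TRUE)
y   # 6
```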

Node:Models, Next:Others, Previous:Lexical scoping, Up:What are the differences between R and S?

There are some differences in the modeling code, such as

- Whereas in S, you would use `lm(y ~ x^3)` to regress `y` on `x^3`, in R, you have to insulate powers of numeric vectors (using `I()`), i.e., you have to use `lm(y ~ I(x^3))`.
- The glm family objects are implemented differently in R and S. The same functionality is available but the components have different names.
- Option `na.action` is set to `"na.omit"` by default in R, but not set in S.
- Terms objects are stored differently. In S a terms object is an expression with attributes, in R it is a formula with attributes. The attributes have the same names but are mostly stored differently. The major difference in functionality is that a terms object is subscriptable in S but not in R. If you can't imagine why this would matter then you don't need to know.
- Finally, in R `y~x+0` is an alternative to `y~x-1` for specifying a model with no intercept. Models with no parameters at all can be specified by `y~0`.
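The formula-related points in the list above can be checked in R with a short sketch such as the following (the data are made up for illustration):

```r
x <- 1:10
y <- 2 + 0.5 * x^3 + rep(c(0.1, -0.1), 5)

## Without I(), `^` is a formula operator, so this fits y on x itself.
fit.plain <- lm(y ~ x^3)
names(coef(fit.plain))    # "(Intercept)" "x"

## With I(), the cube of x is used as the regressor.
fit.cubic <- lm(y ~ I(x^3))
names(coef(fit.cubic))    # "(Intercept)" "I(x^3)"

## Two equivalent ways of dropping the intercept.
fit.a <- lm(y ~ x + 0)
fit.b <- lm(y ~ x - 1)
all.equal(coef(fit.a), coef(fit.b))   # TRUE

## A model with no parameters at all.
length(coef(lm(y ~ 0)))   # 0
```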

Node:Others, Previous:Models, Up:What are the differences between R and S?

Apart from lexical scoping and its implications, R follows the S language definition in the Blue and White Books as much as possible, and hence really is an "implementation" of S. There are some intentional differences where the behavior of S is considered "not clean". In general, the rationale is that R should help you detect programming errors, while at the same time being as compatible as possible with S.

Some known differences are the following.

- In R, if `x` is a list, then `x[i] <- NULL` and `x[[i]] <- NULL` remove the specified elements from `x`. The first of these is incompatible with S, where it is a no-op. (Note that you can set elements to `NULL` using `x[i] <- list(NULL)`.)
- In S, the functions named `.First` and `.Last` in the `.Data` directory can be used for customizing, as they are executed at the very beginning and end of a session, respectively.

  In R, the startup mechanism is as follows. R first sources the system startup file `$R_HOME/library/base/R/Rprofile`. Then, it searches for a site-wide startup profile unless the command line option `--no-site-file` was given. The name of this file is taken from the value of the `R_PROFILE` environment variable. If that variable is unset, the default is `$R_HOME/etc/Rprofile.site` (`$R_HOME/etc/Rprofile` in versions prior to 1.4.0). This code is loaded in package **base**. Then, unless `--no-init-file` was given, R searches for a file called `.Rprofile` in the current directory or in the user's home directory (in that order) and sources it into the user workspace. It then loads a saved image of the user workspace from `.RData` in case there is one (unless `--no-restore` was specified). If needed, the functions `.First()` and `.Last()` should be defined in the appropriate startup profiles.
- In R, `T` and `F` are just variables being set to `TRUE` and `FALSE`, respectively, but are not reserved words as in S and hence can be overwritten by the user. (This helps e.g. when you have factors with levels `"T"` or `"F"`.) Hence, when writing code you should always use `TRUE` and `FALSE`.
- In R, `dyn.load()` can only load *shared objects*, as created for example by `R CMD SHLIB`.
- In R, `attach()` currently only works for lists and data frames, but not for directories. (In fact, `attach()` also works for R data files created with `save()`, which is analogous to attaching directories in S.) Also, you cannot attach at position 1.
- Categories do not exist in R, and never will as they are deprecated now in S. Use factors instead.
- In R, `For()` loops are not necessary and hence not supported.
- In R, `assign()` uses the argument `envir=` rather than `where=` as in S.
- The random number generators are different, and the seeds have different length.
- R passes integer objects to C as `int *` rather than `long *` as in S.
- R has no single precision storage mode. However, as of version 0.65.1, there is a single precision interface to C/FORTRAN subroutines.
- By default, `ls()` returns the names of the objects in the current (under R) and global (under S) environment, respectively. For example, given

      x <- 1; fun <- function() {y <- 1; ls()}

  then `fun()` returns `"y"` in R and `"x"` (together with the rest of the global environment) in S.
- R allows for zero-extent matrices (and arrays, i.e., some elements of the `dim` attribute vector can be 0). This has been determined to be a useful feature as it helps reduce the need for special-case tests for empty subsets. For example, if `x` is a matrix, `x[, FALSE]` is not `NULL` but a "matrix" with 0 columns. Hence, such objects need to be tested for by checking whether their `length()` is zero (which works in both R and S), and not using `is.null()`.
- Named vectors are considered vectors in R but not in S (e.g., `is.vector(c(a = 1:3))` returns `FALSE` in S and `TRUE` in R).
- Data frames are not considered as matrices in R (i.e., if `DF` is a data frame, then `is.matrix(DF)` returns `FALSE` in R and `TRUE` in S).
- R by default uses treatment contrasts in the unordered case, whereas S uses the Helmert ones. This is a deliberate difference reflecting the opinion that treatment contrasts are more natural.
- In R, the argument of a replacement function which corresponds to the right hand side must be named `value`. E.g., `f(a) <- b` is evaluated as `a <- "f<-"(a, value = b)`. S always takes the last argument, irrespective of its name.
- In S, `substitute()`
searches for names for substitution in the given expression in three places: the actual and the default arguments of the matching call, and the local frame (in that order). R looks in the local frame only, with the special rule to use a "promise" if a variable is not evaluated. Since the local frame is initialized with the actual arguments or the default expressions, thi