Getting started with R

Download R (search for "R" on Google), or log in to a Unix cluster computer and start R there by typing R at the command prompt. On Windows, open Windows Explorer, locate the downloaded file and double-click on it. R should start.

A main window will open, and within it will be another window called the "R console". This second window displays some introductory text, ending with a ">" prompt.

You type commands in this window; each command is executed when you press the "Enter" key. Type q() at the prompt and hit (gently) "Enter":

> q()

A dialog box will appear with three options. "Cancel" brings you back to the R console. "Yes" exits R, saving the workspace (any variables you have created) to the .RData file. "No" exits R without saving. Select "Cancel" for now. Next, type

> x <- 3

This assigns (by means of the "<-" above) the value 3 to the variable x. Next, type

> ls()

This lists the variables and user-defined functions in the current workspace, i.e. the objects that are written to the .RData file if you save the workspace when you exit. At the moment, the workspace contains only one variable, "x". Typing

> rm(x)

deletes the variable x.

Create "x" again, then exit by typing q() and selecting "No". Restart R and type ls(): "x" has disappeared. Create "x" again and exit R, but this time select "Yes" to save the workspace. Restart R (for example, by double-clicking on the .RData file that has just been saved). Typing ls() now shows that "x" has been saved.
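The same save-and-restore cycle can also be tried without quitting R. The sketch below uses the base functions save.image() and load(); writing to a temporary file (created with tempfile()) instead of .RData is an assumption made here only so that an existing workspace file is not overwritten.

```r
# Create a variable in the workspace
x <- 3

# save.image() writes every object in the workspace to a file;
# a temporary file is used so no existing .RData file is overwritten
ws.file <- tempfile(fileext = ".RData")
save.image(file = ws.file)

rm(x)            # delete x from the workspace
exists("x")      # FALSE: x is gone

load(ws.file)    # restore the objects saved in the file
exists("x")      # TRUE: x is back
x                # [1] 3
```

In normal use, selecting "Yes" when quitting does the equivalent of save.image() on the .RData file in the working directory, and R loads that file automatically when it starts there.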

3. Data Manipulation

Suppose we want to enter the following set of data into R:

Name    Age    Number of siblings
Bob     27     2
Sue     33     1
Bill    21     0
John    56     4

For now, we will enter the data by hand. This is fine for small datasets, but we will use other methods of data input later on. Type

> name <- c("Bob", "Sue", "Bill", "John")

> ls()

You will see a new variable "name". To see the contents of "name", type

> name

The variable "name" is actually a vector (created using the "c" above) with 4 elements, each element being a character string (specified by the double quotes). Now, we will create 2 other vectors, "age" and "num.siblings":
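A few base functions can confirm what the paragraph above describes. The sketch below re-creates "name" so that it runs on its own; length() counts the elements, is.character() checks that they are character strings, and square brackets pick out individual elements by position.

```r
name <- c("Bob", "Sue", "Bill", "John")

length(name)        # [1] 4     (four elements)
is.character(name)  # [1] TRUE  (each element is a character string)
name[2]             # [1] "Sue" (the second element)
name[c(1, 4)]       # the first and fourth elements
```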

> age <- c(27, 33, 21, 56)

> num.siblings <- c(2, 1, 0, 4)

We can perform various operations on these vectors. 

Exercise: Try the following (one at a time), record the output and describe what each command does:

> mean(age)

> median(age)

> summary(age)

> sum(age)

> num.siblings^2

> max(num.siblings)

> num.siblings + 2

> age + num.siblings

(Note: some of the above, e.g. age + num.siblings, have little, if any, real meaning. The aim is simply to introduce you to some of the available functions and operations.)

Suppose we want to create a vector consisting of the numbers 1 to 20. We can do this in a number of ways:

> x <- 1:20

> x <- c(1:20)

> x <- seq(1:20)
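One way to check that these three commands produce the same vector is the base function identical(), which returns TRUE only when two objects are exactly the same. The variable names a, b and d are chosen here just for illustration.

```r
a <- 1:20
b <- c(1:20)
d <- seq(1:20)

identical(a, b)   # [1] TRUE
identical(a, d)   # [1] TRUE
```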

Exercise: Look up the help file on "seq", then describe the differences between the outputs of the following:

> 5:20

> seq(from=5, to=20)

> seq(along=5:20)

> seq(5:20)

4. Getting help

For help regarding a specific R command or function, simply type

> help(functionname)

Of course, replace "functionname" with the actual name of the function or command. Also, for this to be useful, you will need to know the name of the command. 
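When you do not know the exact name of a function, base R offers some ways to search for it. The sketch below uses apropos(), which lists the names of objects on the search path that match a pattern; the search terms are only examples. The commented lines show help shortcuts intended for interactive use: ?sort is equivalent to help(sort), and ??"standard deviation" is equivalent to help.search("standard deviation"), which searches the installed help pages by topic.

```r
# apropos() returns the names of available objects matching a pattern;
# here, every name containing "sort"
sort.names <- apropos("sort")
print(sort.names)

# Two equivalent ways to open a help page (interactive use):
#   help(sort)
#   ?sort
# If you only know the topic, search the installed help pages:
#   help.search("standard deviation")
#   ??"standard deviation"
```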

Exercise: Find out what the commands "sort" and "rep" do.

5. Further exercises

Make sure you have read the material above. You will need it to complete these exercises.

A: Descriptive statistics of a data set

Brief background on data: Climatologists interested in flooding gather statistics on the daily rainfall in various cities. The following data set gives the maximum daily rainfall (in inches) for the years 1941 to 1970 in South Bend, Indiana.

Data:

1.88 2.23 2.58 2.07 2.94 2.29 3.14 2.14 1.95 2.51 2.86 1.48 1.12
2.76 1.48 1.12 2.76 1.50 2.99 3.48 2.12 4.69 2.29 2.12

(a) Enter the data into a variable called "rainfall" (without quotes).

(b) Compute the values of the mean and median. 

(c) Find the variance and the standard deviation. (Hint 1: the standard deviation is the square root of the variance. Hint 2: the commands "var" and "sqrt" may be useful; use the help function to find out what they do.)
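To illustrate Hint 1 without giving away the answer for the rainfall data, the sketch below uses a small made-up vector v (its values are arbitrary, chosen only for illustration). Note that var() computes the sample variance, dividing by n - 1, and that the base function sd() returns its square root directly.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative values only

mean(v)          # [1] 5
var(v)           # sample variance: sum((v - mean(v))^2) / (length(v) - 1)
sqrt(var(v))     # standard deviation, via Hint 1
sd(v)            # the same value, computed directly

# The two routes agree (all.equal allows for tiny rounding differences)
all.equal(sd(v), sqrt(var(v)))   # [1] TRUE
```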

B: Using the "rep" command (you will need to read the "arguments" section of the help file on "rep")

(a) Create a vector "x" containing the values 1 to 10.

> x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

or, more simply

> x <- 1:10
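To confirm that the two commands above give the same values, you can compare them element by element with == and wrap the result in all(). One subtlety, noted in the comments: the two vectors are stored differently (typeof() reports "double" for the first and "integer" for the second), so identical() returns FALSE even though every value matches. The names x1 and x2 are used here only to keep both versions side by side.

```r
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- 1:10

x1 == x2            # element-by-element comparison: ten TRUE values
all(x1 == x2)       # [1] TRUE
typeof(x1)          # [1] "double"
typeof(x2)          # [1] "integer"
identical(x1, x2)   # [1] FALSE, because the storage types differ
```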

(b) Write down the form of the command "rep" that takes "x" as input and produces the following output:

[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10

(c) Write down the form of the command "rep" that takes "x" as input and produces the following output:

[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10