R for reproducible scientific analysis
Reference
Introduction to R and RStudio
- Use the escape key to cancel incomplete commands or running code (Ctrl+C if you’re using R from the shell).
- Basic arithmetic operations follow standard order of precedence:
  - Brackets: `(`, `)`
  - Exponents: `^` or `**`
  - Divide: `/`
  - Multiply: `*`
  - Add: `+`
  - Subtract: `-`
- Scientific notation is available, e.g. `2e-3`
- Anything to the right of a `#` is a comment; R will ignore it!
- Functions are denoted by `function_name()`. Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Mathematical functions: `exp`, `sin`, `log`, `log10`, `log2`, etc.
- Comparison operators: `<`, `<=`, `>`, `>=`, `==`, `!=`
- Use `all.equal` to compare numbers! Unlike `==`, it tolerates small floating-point rounding differences.
- `<-` is the assignment operator. Anything to the right is evaluated, then stored in a variable named to the left.
- `ls` lists all variables and functions you’ve created
- `rm` can be used to remove them
- When assigning values to function arguments, you must use `=`.
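The points above can be sketched in a few lines. This is a minimal illustration; the variable names are made up for the example and are not part of the lesson.

```r
# Precedence: exponents bind tighter than multiplication,
# which binds tighter than addition
result <- 3 + 5 * 2 ^ 2        # evaluated as 3 + (5 * (2 ^ 2))
bracketed <- (3 + 5) * 2 ^ 2   # brackets force the addition first

# Scientific notation
small <- 2e-3                  # the same number as 0.002

# Comparing numbers: == is exact, all.equal allows a small tolerance
exact_match <- (0.1 + 0.2 == 0.3)
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Nested function calls; = names the argument, <- assigns the result
one <- log(x = exp(1))

print(c(result, bracketed, small, one))
print(c(exact_match, close_match))
```

Wrapping `all.equal` in `isTRUE` is a common idiom, because `all.equal` returns a description of the difference rather than `FALSE` when the values differ.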
Project management with RStudio
- To create a new project, go to File -> New Project
- Install the `packrat` package to create self-contained projects
- `install.packages` to install packages from CRAN
- `library` to load a package into R
- `packrat::status` to check whether all packages referenced in your scripts have been installed.
Reading data
- `read.table` to read in data in a regular structure
  - `sep` argument to specify the separator
    - `","` for comma separated
    - `"\t"` for tab separated
  - Other arguments:
    - `header=TRUE` if there is a header row
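A minimal sketch of reading delimited files, assuming small made-up data written to temporary files so the example is self-contained (the file contents and object names are illustrative only):

```r
# Write a small comma-separated file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,weight", "1,2.5", "2,3.1"), csv_file)

# sep = "," for comma-separated values;
# header = TRUE because the first row holds the column names
measurements <- read.table(csv_file, sep = ",", header = TRUE)

# The same data, tab-separated, is read with sep = "\t"
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("id\tweight", "1\t2.5", "2\t3.1"), tsv_file)
measurements_tab <- read.table(tsv_file, sep = "\t", header = TRUE)

print(measurements)
```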
Seeking help
- `?` or `help()` to seek help for a function.
- `??` to search for a function.
- Wrap special operators in quotes when searching for help: `help("+")`.
- CRAN Task Views.
- Stack Overflow.
Data structures
Basic data structures in R:
- atomic `?vector` (can only contain one type)
- `?list` (containers for other objects)
- `?data.frame` two dimensional objects whose columns can contain different types of data
- `?matrix` two dimensional objects that can contain only one type of data.
- `?factor` vectors that contain predefined categorical data.
- `?array` multi-dimensional objects that can only contain one type of data
Remember that matrices are really atomic vectors underneath the hood, and that data.frames are really lists underneath the hood (this explains some of the weirder behaviour of R).
Data types:
- `?numeric` real (decimal) numbers
- `?integer` whole numbers only
- `?character` text
- `?complex` complex numbers
- `?logical` TRUE or FALSE values
Special types:
- `?NA` missing values
- `?NaN` “not a number” for undefined values (e.g. `0/0`).
- `?Inf`, `-Inf` infinity.
- `?NULL` a data structure that doesn’t exist
NA can occur in any atomic vector. NaN and Inf can only occur in complex or numeric type vectors (an integer vector cannot store them). Atomic vectors are the building blocks for all other data structures. A NULL value will occur in place of an entire data structure (but can occur as list elements).
Useful functions for querying data structures:
- `?str` structure, prints out a summary of the whole data structure
- `?typeof` tells you the type inside an atomic vector
- `?class` what is the data structure?
- `?head` print the first `n` elements (rows for two-dimensional objects)
- `?tail` print the last `n` elements (rows for two-dimensional objects)
- `?rownames`, `?colnames`, `?dimnames` retrieve or modify the row names and column names of an object.
- `?names` retrieve or modify the names of an atomic vector or list (or columns of a data.frame).
- `?length` get the number of elements in an atomic vector
- `?nrow`, `?ncol`, `?dim` get the dimensions of an n-dimensional object (won’t work on atomic vectors or lists).
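The structures and query functions above can be tried on a few small objects. This is a sketch with made-up example data; the object names are illustrative.

```r
values <- c(1.5, 2, 3)                          # atomic vector
counts <- matrix(1:6, nrow = 2)                 # matrix: an atomic vector with dimensions
samples <- data.frame(id = 1:3,
                      label = c("a", "b", "c")) # data.frame: a list of columns

print(typeof(values))     # the type stored inside the vector
print(typeof(counts))     # a matrix still has a single underlying type
print(class(samples))     # the data structure
str(samples)              # compact summary of the whole structure
print(head(samples, n = 2))

print(dim(counts))        # rows and columns of the matrix
print(dim(values))        # NULL: a plain atomic vector has no dimensions
print(is.list(samples))   # a data.frame is a list underneath

print(is.nan(0 / 0))      # undefined result
print(is.infinite(1 / 0)) # infinity
```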
Data subsetting
- Elements can be accessed by:
  - Index
  - Name
- `:` to generate a sequence of numbers to extract slices
- `[` single square brackets:
  - extract single elements or subset vectors
  - extract single elements of a list
  - extract columns from a data.frame
- `[` with two arguments to:
  - extract rows and/or columns of
    - matrices
    - data.frames
- `[[` double square brackets to subset lists
- `$` to access columns or list elements by name
- negative indices skip elements
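A short sketch of the subsetting operators listed above, using made-up data (all object names are illustrative):

```r
scores <- c(a = 10, b = 20, c = 30, d = 40)
by_index <- scores[2]         # access by position
by_name <- scores["b"]        # access by name
slice <- scores[2:3]          # : generates the sequence 2, 3
without_first <- scores[-1]   # a negative index skips that element

record <- list(numbers = 1:3, text = "hello")
sub_list <- record[1]         # single brackets return a list of length one
element <- record[[1]]        # double brackets return the element itself
text <- record$text           # $ accesses an element by name

samples <- data.frame(id = 1:3, score = c(90, 85, 70))
second_row <- samples[2, ]          # two arguments: row 2, all columns
score_column <- samples[, "score"]  # all rows, one column (a vector)
score_frame <- samples["score"]     # one argument: a one-column data.frame

print(slice)
print(second_row)
```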
Writing data
- `write.table` to write out objects in regular format
- set `quote=FALSE` so that text isn’t wrapped in `"` marks
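A minimal sketch of writing a data.frame to a file. The data are made up, and `row.names = FALSE` is an extra choice made for this example (it is not mentioned above) so that row numbers are not written out.

```r
samples <- data.frame(id = 1:2, label = c("control", "treated"))
out_file <- tempfile(fileext = ".csv")

# quote = FALSE keeps text columns free of surrounding " marks
write.table(samples, file = out_file, sep = ",",
            quote = FALSE, row.names = FALSE)

# Read the raw lines back to inspect what was written
written <- readLines(out_file)
print(written)
```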
Vectorisation
- Most functions and operations apply to each element of a vector
- `*` applies element-wise to matrices
- `%*%` for true matrix multiplication
- `any()` will return `TRUE` if any element of a vector is `TRUE`
- `all()` will return `TRUE` if all elements of a vector are `TRUE`
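The difference between element-wise and matrix multiplication is easiest to see on a small matrix. A brief sketch with illustrative names:

```r
x <- 1:4
doubled <- x * 2                 # applied to every element of the vector

m <- matrix(1:4, nrow = 2)       # filled column by column: rows are (1, 3) and (2, 4)
elementwise <- m * m             # each entry multiplied by itself
matrix_product <- m %*% m        # rows times columns

any_large <- any(x > 3)          # at least one element is greater than 3
all_large <- all(x > 3)          # not every element is greater than 3

print(elementwise)
print(matrix_product)
```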
Control flow
- Use `if` condition to start a conditional statement, `else if` condition to provide additional tests, and `else` to provide a default
- The bodies of the branches of conditional statements are enclosed in curly braces; indent them for readability.
- Use `==` to test for equality.
- `X && Y` is only true if both X and Y are `TRUE`.
- `X || Y` is true if either X or Y, or both, are `TRUE`.
- Zero is considered `FALSE`; all other numbers are considered `TRUE`
- Nest loops to operate on multi-dimensional data.
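A minimal sketch of the control-flow constructs above, with made-up values and names:

```r
x <- 7
if (x > 10) {
  size <- "large"
} else if (x > 5) {
  size <- "medium"
} else {
  size <- "small"
}

both <- (x > 5) && (x < 10)     # TRUE only when both conditions hold
either <- (x > 10) || (x == 7)  # TRUE when at least one condition holds

# Zero is treated as FALSE in a condition
zero_is_true <- if (0) TRUE else FALSE

# Nested loops visit every cell of a two-dimensional object
m <- matrix(1:6, nrow = 2)
total <- 0
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    total <- total + m[i, j]
  }
}

print(size)
print(total)
```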
Functions
- Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.
- The value of the last line of a function is returned, or you can use `return` explicitly
- Variables created in the body of a function are isolated to that function call; they do not exist outside it.
- Document Why, then What, then lastly How (if the code isn’t self explanatory)
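As a small illustration of these points, here is a sketch of a documented function. The function name and the conversion it performs are chosen for the example.

```r
# Why: measurements arrive in Fahrenheit, but the analysis uses Celsius.
# What: convert a numeric vector of temperatures from Fahrenheit to Celsius.
fahr_to_celsius <- function(temp_f) {
  temp_c <- (temp_f - 32) * 5 / 9  # temp_c exists only inside this call
  temp_c                           # the last evaluated expression is returned
}

# Call the same code with different parameter values
boiling <- fahr_to_celsius(212)
freezing <- fahr_to_celsius(32)
print(c(boiling, freezing))
```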
Split-apply-combine
- Use the `xxply` family of functions to apply functions to groups within some data.
- the first letter (`a` for array, `d` for data.frame or `l` for list) corresponds to the input data
- the second letter denotes the output data structure
- Anonymous functions (those not assigned a name) are used inside the `plyr` family of functions on groups within data.
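A sketch of split-apply-combine with `ddply` (data.frame in, data.frame out), assuming the `plyr` package is installed; the example is skipped otherwise. The data and column names are made up.

```r
measurements <- data.frame(group = c("a", "a", "b", "b"),
                           value = c(1, 3, 10, 20))

if (requireNamespace("plyr", quietly = TRUE)) {
  # Split by group, apply an anonymous function to each piece,
  # and combine the results into one data.frame
  group_means <- plyr::ddply(
    measurements, "group",
    function(piece) data.frame(mean_value = mean(piece$value))
  )
  print(group_means)
}
```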
ggplot2
- figures can be created with the grammar of graphics:
  - `ggplot` to create the base figure
  - `aes`thetics specify the data axes, shape, color, and data size
  - `geom`etry functions specify the type of plot, e.g. `point`, `line`, `density`, `box`
  - `geom`etry functions also add statistical transforms, e.g. `geom_smooth`
  - `scale` functions change the mapping from data to aesthetics
  - `facet` functions stratify the figure into panels
  - `aes`thetics apply to individual layers, or can be set for the whole plot inside `ggplot`.
  - `theme` functions change the overall look of the plot
  - order of layers matters!
  - `ggsave` to save a figure.
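A sketch of how these layers combine, assuming the `ggplot2` package is installed; the example is skipped otherwise. The data, object names and output file are made up for illustration.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  growth <- data.frame(
    day = rep(1:5, times = 2),
    height = c(1, 2, 3, 5, 8, 1, 1.5, 2, 2.5, 3),
    treatment = rep(c("fertilised", "control"), each = 5)
  )

  # Aesthetics set inside ggplot() apply to every layer
  growth_plot <- ggplot(growth, aes(x = day, y = height, colour = treatment)) +
    geom_line() +                # layers are drawn in the order they are added
    geom_point() +
    facet_wrap(~ treatment) +    # one panel per treatment
    theme_bw()                   # overall look of the plot

  # Save to a temporary PDF file
  out_file <- tempfile(fileext = ".pdf")
  ggsave(filename = out_file, plot = growth_plot, width = 6, height = 4)
}
```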
Defensive Programming
- Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
- Write tests before writing code in order to help determine exactly what that code is supposed to do.
- Know what code is supposed to do before trying to debug it.
- Make it fail every time.
- Make it fail fast.
- Change one thing at a time, and for a reason.
- Keep track of what you’ve done.
- Be humble.
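One way to detect errors early is to check a function's inputs with `stopifnot` and to state the expected behaviour as a check before relying on the code. The sketch below assumes a hypothetical `center` function invented for this example.

```r
# Shift a numeric vector so that its mean equals `desired`
center <- function(values, desired) {
  # Fail fast, and every time, when the input is not usable
  stopifnot(is.numeric(values), length(values) > 0, !anyNA(values))
  values - mean(values) + desired
}

# A check written from the specification: the centred data must have the requested mean
centered <- center(c(1, 2, 3), desired = 0)
stopifnot(isTRUE(all.equal(mean(centered), 0)))

# Invalid input raises an error instead of silently producing a wrong result
bad_input <- tryCatch(center("not a number", 0), error = function(e) e)
print(inherits(bad_input, "error"))
```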