EGU 2018 | Wed, 11 Apr, 13:30-15:00 /Room 2.16

Overview

  • R packages – an introduction (5 min)
  • Structure and contents of R packages (15 min)
  • Initiating an R package with RStudio (5 min)
  • Writing a function with 'roxygen2' (20 min)
  • Building the package (5 min)
  • Maintaining a package with Git (15 min)
  • Testing and integrating checks (15 min)
  • Sharing a package (10 min)

R packages – What?

  • This course does cover
    • a brief idea of the concept of R packages
    • a discussion/justification of package contents
    • hands-on work to build a package
    • work beyond heaving a package onto the shelf and letting it vanish
  • You are looking for an introduction to R … sorry, bad luck!

R packages – What?

What you should already be familiar with

  • R and RStudio
  • Installing and using R packages
  • Writing scripts (and functions)
  • Structuring code and following good practice rules

  • You are looking for an introduction to R … sorry, bad luck!(*)

(*) Don't repeat yourself.

R packages

Further resources

R packages

Prerequisites and computational minions

  • RStudio: The skin for R, a proper developer (and user) environment
    • Support for creating scripts, packages, reports, books, slides, …
    • Plots, data environment, history, file browser, help viewer
    • Auto completion, syntax highlighting, context help
    • Version control and project management
  • The R package 'devtools' by H. Wickham, J. Hester & W. Chang
  • The R package 'roxygen2' by H. Wickham, P. Danenberg & M. Eugster

R packages

Concepts

  • R thrives on sharing code, and packages are the vehicles for this idea
  • Packages can be the pillars for open and reproducible science
  • Packages bundle all components (code, data, documentation, tests) into one self-contained unit

Libraries are not packages! Libraries contain packages. You pull a package from a library.
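The difference can be seen from the R console with base R alone: `.libPaths()` lists the library directories R searches, and `find.package()` shows where an installed package lives inside one of them. A minimal sketch:

```r
## list all library directories in which R looks for installed packages
libs <- .libPaths()
print(libs)

## locate the installed package 'stats' inside one of these libraries
stats_path <- find.package("stats")
print(stats_path)

## library() pulls the package from a library and attaches it
library(stats)
```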

R packages

Some illusions

  • R packages have to go to the Comprehensive R Archive Network (CRAN)
  • Writing R packages is a lot of boring effort
  • You can start easy and add further details later
  • People will use your package in the way you designed it

R packages

Why write packages

  • You want to share your code with others
  • The code is organised in a coherent way
  • You want to handle/distribute only a single file
  • Bundling code makes keeping track much easier than collecting scripts
  • The code is automatically tested (by your examples and other routines)

Working with packages simply saves time and brain cells

R packages – Contents

R package contents

An overview


What do you think? What belongs to an R package?

R package contents

An overview of obvious material

  • A name (yes, a package name)
  • A description (the meta information)
  • A function (yes, there are packages with only one function)
  • Documentation (you want to understand what the function does)
  • Working examples (you don't want to rely on the documentation only)

  • A set of further stuff that will be covered later

R package contents

An overview

R package contents

The package name

  • This is (or should be) the hardest job when creating a package
  • Requirements
    • Only letters, numbers and periods are allowed
    • Start with a letter, do not end with a period
  • Advice
    • Pick a unique, googleable name that best describes your package
    • Check if the name already exists beforehand
  • Example: 'praise' by G. Csardi & S. Sorhus
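A minimal sketch for checking a candidate name against these rules. The helper `is_valid_pkg_name()` is hypothetical (not part of any package); its regular expression follows the rules in the 'Writing R Extensions' manual (ASCII letters, digits and periods only, starting with a letter, not ending with a period, at least two characters). It does not tell you whether the name is already taken; that still needs a search on CRAN and GitHub.

```r
## Hypothetical helper: TRUE if a name satisfies the package name rules
is_valid_pkg_name <- function(name) {
  ## first character: a letter; middle: letters, digits, periods;
  ## last character: a letter or digit (enforces a minimum of two characters)
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

## returns TRUE TRUE FALSE FALSE FALSE
is_valid_pkg_name(c("eseis", "data.table", "my_pkg", "2fast", "toolbox."))
```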

R package contents

The DESCRIPTION file

  • The mandatory file that defines all the metadata of the package: name & short title, version & date, author & contact, license & dependencies.
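As an illustration, a minimal DESCRIPTION for a hypothetical package 'toolbox' (all field values are placeholders), written to a temporary file and read back with base R's `read.dcf()`, which parses files in this key-value format:

```r
## minimal DESCRIPTION content for a hypothetical package 'toolbox'
desc_lines <- c(
  "Package: toolbox",
  "Title: A Growing Set of Handy Functions",
  "Version: 0.1.0",
  "Date: 2018-04-11",
  "Authors@R: person(\"First\", \"Last\", email = \"first.last@example.com\",",
  "    role = c(\"aut\", \"cre\"))",
  "Description: A small collection of helper functions, bundled as an",
  "    exercise in building R packages.",
  "License: GPL-3",
  "Suggests: testthat"
)

## write it to a temporary file and parse it into a one-row character matrix
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)
desc <- read.dcf(desc_file)

## continuation lines (indented) are merged into their field
desc[, c("Package", "Version", "License")]
```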

R package contents

The DESCRIPTION file - Title and description

  • Title must be in title case, fit on one line and not end with a period
  • Description can be several sentences, but only one paragraph. Lines should not exceed 80 characters, and continuation lines must be indented (4 spaces by convention).
Title: Environmental Seismology Toolbox
Description: A collection of functions to handle seismic data for the
    purpose of investigating the seismic signals emitted by Earth surface
    processes. The package supports importing standard formats, data
    preparation and analysis techniques and data visualisation.
  • Both elements are important: they are indexed by search engines, and Google has become quite good at spotting R packages.

R package contents

The DESCRIPTION file – Dependencies

  • Dependency options in short (read more):
    • Depends: packages your package needs and that are attached (loaded onto the search path) together with it
    • Imports: will be covered by the namespace section
    • Suggests: optionally needed packages
    • LinkingTo: packages whose C/C++ header files your compiled code uses
  • CRAN has become strict about the number of entries in 'Depends'. List packages under 'Imports' instead and use importFrom() in the NAMESPACE file.
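For packages listed under 'Suggests', your package must still work when they are not installed. A common pattern is to test for them with `requireNamespace()` before use. A minimal sketch; the function name `run_self_check()` is hypothetical and 'testthat' merely stands in for any suggested package:

```r
## Hypothetical package function using a suggested package only if available
run_self_check <- function() {
  if (!requireNamespace("testthat", quietly = TRUE)) {
    message("Package 'testthat' is not installed; skipping the self-check.")
    return(invisible(FALSE))
  }
  ## call the suggested package with pkg::fun(), never via library()
  testthat::expect_equal(2 * 3, 6)
  invisible(TRUE)
}

run_self_check()
```

Packages under 'Imports', in contrast, are guaranteed to be installed, so their functions can be called directly via pkg::fun() or made available with importFrom().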

R package contents

The DESCRIPTION file – Author information and roles

  • Author information can be defined more comprehensively (read more):
Authors@R: person("First", "Last", email = "first.last@example.com",
                  role = c("aut", "cre"))
  • Essential for correct citation of packages! Nota bene: type citation("PACKAGENAME") to see how a package should be cited.
  • Don't use fake email addresses; otherwise CRAN and users cannot contact you.

Since the end of 2017, CRAN supports ORCID iDs:

Authors@R: person(...,
                  comment = c(ORCID = "0000-0002-9079-593X"))
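A short sketch to check such an entry before pasting it into DESCRIPTION: `person()` (package 'utils') builds the author object, and `format()` shows how it will be printed. Name and email are placeholders; the ORCID iD is the one from the slide above.

```r
## author entry with roles and an ORCID iD (name and email are placeholders)
author <- person("First", "Last",
                 email   = "first.last@example.com",
                 role    = c("aut", "cre"),
                 comment = c(ORCID = "0000-0002-9079-593X"))

## see how R formats the entry, e.g., for citations
format(author)

## individual fields can be extracted with $
author$role
```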

R package contents

The DESCRIPTION file – License issues

  • The key element that tells who may use the package, and for which purpose!
  • Either a link to a license file (License: file LICENSE) or a keyword of standard licenses (read more):
    • GPL-2 or GPL-3: copyleft licenses; code that builds on yours must be released under a GPL-compatible license. Common for CRAN submissions.
    • CC0: give away all rights, anything can be done with the code
    • BSD or MIT: permissive licenses; they require an additional LICENSE file (e.g., License: MIT + file LICENSE).

R package contents

The DESCRIPTION file – Version patterns

  • Version numbers must consist of integers separated by periods.
  • They are more than just counters: they determine whether dependencies are satisfied, e.g., when another package requires version >= 0.2.0 of yours

  • Format: MAJOR.MINOR.PATCH (start a released package with 0.1.0)
    • MAJOR releases should be rare
    • MINOR releases should keep the package up to date
    • PATCH releases may be frequent (but think of the CRAN team's time budget)
  • Use a NEWS file to announce the version history and changes.
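Because version numbers are compared component by component, they must not be treated as plain text. The sketch below contrasts base R's `package_version()` and `compareVersion()` with a string comparison and adds a hypothetical helper `bump_version()` (not part of any package) for the MAJOR.MINOR.PATCH pattern:

```r
## versions are compared numerically, component by component
package_version("0.10.0") > package_version("0.9.1")   # TRUE
"0.10.0" > "0.9.1"                                       # FALSE (text comparison)
compareVersion("0.10.0", "0.9.1")                        # 1, i.e., first is newer

## Hypothetical helper: raise one part of a MAJOR.MINOR.PATCH version
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3)
  ## lower-ranking parts are reset to zero
  v <- switch(part,
              major = c(v[1] + 1L, 0L, 0L),
              minor = c(v[1], v[2] + 1L, 0L),
              patch = c(v[1], v[2], v[3] + 1L))
  paste(v, collapse = ".")
}

bump_version("0.1.0", "minor")   # "0.2.0"
```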

R package contents

Further contents

R package contents

R code

  • The actual function definitions
  • Will be covered by the next section

R package contents

Code documentation

  • The (second) most important part of a reasonable package
  • Omitting it means nobody will be able to use your package

  • Documentation in R is reference documentation, similar to dictionaries
  • Additional, long-form documentation is provided by vignettes (not covered here)

  • In R, documentation in *.Rd-files is formalised and follows a pseudo LaTeX scheme (short version, long version)

R package contents

Code documentation - code example

R package contents

Code documentation - what it looks like

R package contents

Code documentation

  • Why should you not attempt to write documentation files manually?
    • Tedious, clumsy, not intuitive
    • Prone to forget updating after changing the function
  • Alternative: write the documentation within the function definition files
    • roxygen2
    • inlinedocs (no longer updated)
  • Application: see next section

R package contents

Examples and example data

  • Working (and worked) examples are mandatory documentation items
  • They serve two purposes
    • Explain usage of the function
    • Create a test, executed every time the package is checked
  • Typically useful to include example data sets
    • Must be provided as *.rda files (generated with save())
    • Must be stored in the directory data/
    • Must be documented individually

R package contents

Further contents

  • Namespace (will be dealt with automatically, read more)
    • Defines which functions the package exports and which it imports from other packages
  • Compiled code (not covered here)
    • C/C++ code present in src/ will be compiled during installation
  • Shiny apps and other installed software (not covered here)
    • further files/software in inst/ will be copied to the top-level directory of the installed package
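Two base R functions make these items visible for any installed package: `getNamespaceExports()` lists what a NAMESPACE exports, and `system.file()` locates files in the installed package directory, which is where the content of inst/ ends up. The sketch uses the base package 'stats' as a stand-in; for your own package you would call, e.g., `system.file("extdata", "my_file.csv", package = "PACKAGE_NAME")`, where the folder and file names are hypothetical.

```r
## functions exported via the NAMESPACE of 'stats'
exports <- getNamespaceExports("stats")
"median" %in% exports

## path to a file in the top-level directory of the installed package
system.file("DESCRIPTION", package = "stats")

## a file that does not exist yields an empty string
system.file("no_such_file.txt", package = "stats")
```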

Time for a short brainstorming break!

Pffffft, a lot of dense and boring input, right?

Brainstorming break

The task: Think about and collect the essential items for your own package. Note the results in a plain text document for later use. Time: about 4 minutes.

We will need this material soon to build a package, … an empty one, … which will finally only have one function.

Already need a short reminder? :)

Brainstorming break

  • package name
  • package title
  • package description
  • version
  • date
  • author(s)
  • maintainer (plus email address)
  • license

Creating a package in RStudio

Start with a new project: Menu >> File > New Project…

Alternatively use devtools::create("path/to/package/pkgname")

Alternatively, you could use package.skeleton(), but DON'T: it creates an overloaded package template that needs a lot of modification.
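To see how little a package needs at minimum, here is a base R sketch that creates a bare skeleton (DESCRIPTION, NAMESPACE and an empty R/ directory) in a temporary directory. The helper `create_minimal_pkg()` and all DESCRIPTION values are hypothetical placeholders; the RStudio template and devtools::create() generate a comparable, more complete structure.

```r
## Hypothetical helper: create the bare minimum of a package by hand
create_minimal_pkg <- function(path, name) {
  pkg_dir <- file.path(path, name)
  ## the R/ directory will hold the function definitions
  dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
  ## placeholder metadata
  writeLines(c(
    paste("Package:", name),
    "Title: What the Package Does (One Line, Title Case)",
    "Version: 0.1.0",
    "Authors@R: person(\"First\", \"Last\", email = \"first.last@example.com\",",
    "    role = c(\"aut\", \"cre\"))",
    "Description: What the package does (one paragraph).",
    "License: GPL-3"
  ), file.path(pkg_dir, "DESCRIPTION"))
  ## same default export rule as the RStudio template
  writeLines('exportPattern("^[[:alpha:]]+")', file.path(pkg_dir, "NAMESPACE"))
  invisible(pkg_dir)
}

pkg_dir <- create_minimal_pkg(tempdir(), "toolbox")
list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE)
```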

Creating a package in RStudio

The template as overview

  • A new RStudio-project is started (file PACKAGE_NAME.Rproj)
  • Close and delete the template function and its documentation straight away.
  • Configure the Build tools: Menu >> Tools > Project Options > Build Tools
    • make sure to check "Generate documentation with Roxygen".
  • The NAMESPACE contains only: exportPattern("^[[:alpha:]]+"), i.e., every object whose name starts with a letter is exported!
  • The DESCRIPTION file contains auto-generated content. Modify it so that it fits your needs.
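What does this pattern actually export? A quick check with `grepl()` on a few made-up object names: every name starting with a letter matches, whereas names starting with a dot, a common convention for internal helpers, stay unexported.

```r
## the regular expression used by the template's exportPattern() entry
obj_names <- c("mtp", "f", ".helper", "Plot_data")
grepl("^[[:alpha:]]+", obj_names)   # TRUE TRUE FALSE TRUE
```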

From scripts to functions

An unfortunate start …

a <- 10:50
print(a)
plot(a)
b <- 210:250
plot(a, b)
A <- a * b
c <- 5
V <- A * c
print(A)
print(V)
plot(a,V)

From scripts to functions

… ok …

a <- 10:50
b <- 210:250
c <- 5

A <- a * b
V <- A * c

plot(a)
plot(a, b)
plot(a,V)

print(a)
print(A)
print(V)

From scripts to functions

… better …

## define object geometry
a <- 10:50
b <- 210:250
c <- 5

## calculate area and volume
A <- a * b
V <- A * c

## plot object dimensions
plot(a)
plot(a, b)
plot(a, V)

## print values
print(a)
print(A)
print(V)

From scripts to functions

Wrapping code into functions

f <- function(a, b, c) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## plot object dimensions
  plot(a)
  plot(a, b)
  plot(a, V)

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code into functions

f <- function(a, b, c, plot = TRUE) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## optionally plot object dimensions
  if(plot == TRUE) {
    
    plot(a)
    plot(a, b)
    plot(a, V)
  }

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code into functions

f(a = 10, b = 100, c = 5, plot = FALSE)
## $A
## [1] 1000
## 
## $V
## [1] 5000

From scripts to functions

In words

  • Structure your script
    • Variable/object/argument definitions
    • Data/variable checks and automatic assignments
    • Data manipulation/evaluation
    • Optional further outputs
    • Return object creation
  • Wrap it into a function definition
    • FUNCTION_NAME <- function(ARGUMENT_1, ARGUMENT_2) {FUNCTION BODY}
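A sketch of a function that follows this structure step by step, extending the cuboid example from above. The name `cuboid()` is hypothetical, and the fallback to a unit height when `c` is missing is an assumption added only to illustrate an automatic assignment:

```r
## Hypothetical function following the structure outlined above
cuboid <- function(a, b, c, plot = TRUE) {

  ## data/variable checks and automatic assignments
  if (missing(c)) {
    c <- 1   # assumption for illustration: default to a unit height
  }
  if (!is.numeric(a) || !is.numeric(b) || !is.numeric(c)) {
    stop("Arguments a, b and c must be numeric.")
  }

  ## data manipulation/evaluation
  A <- a * b
  V <- A * c

  ## optional further output (plot() still finds the graphics function)
  if (isTRUE(plot)) {
    plot(a, V, xlab = "a", ylab = "V")
  }

  ## return object creation
  return(list(A = A, V = V))
}

cuboid(a = 10, b = 100, c = 5, plot = FALSE)
```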

From scripts to functions

In words

  • Oops, what did we forget?

  • Function documentation (in a separate file)

\name{f}
\alias{f}
\title{Calculate and plot cuboid areas and volumes.}
\usage{f(a, b, c, plot = TRUE)}
\arguments{
\item{a}{\code{Numeric} vector, length of the cuboid.}
\item{b}{\code{Numeric} vector, width of the cuboid.}
\item{c}{\code{Numeric} vector, height of the cuboid.}
\item{plot}{\code{Logical} value, option to plot the results, default is \code{TRUE}.}}
\value{A list with cuboid area and volume.}
\description{The function takes numeric vectors of the cardinal 
  dimensions of a cuboid object and calculates area and volume. The results can optionally be plotted.}
\examples{f(a = 10, b = 100, c = 5, plot = FALSE)}
\author{Michael Dietze}
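To see what R makes of such a file without building the whole package, the Rd text can be parsed and rendered with the 'tools' package that ships with R. A minimal sketch; the Rd content is a shortened copy of the file above, and backslashes have to be doubled inside R strings:

```r
## shortened copy of the Rd file above
rd_lines <- c(
  "\\name{f}",
  "\\alias{f}",
  "\\title{Calculate and plot cuboid areas and volumes}",
  "\\usage{f(a, b, c, plot = TRUE)}",
  "\\arguments{",
  "\\item{a}{\\code{Numeric} vector, length of the cuboid.}",
  "\\item{b}{\\code{Numeric} vector, width of the cuboid.}",
  "\\item{c}{\\code{Numeric} vector, height of the cuboid.}",
  "\\item{plot}{\\code{Logical} value, option to plot the results.}}",
  "\\value{A list with cuboid area and volume.}",
  "\\description{The function calculates cuboid areas and volumes.}",
  "\\examples{f(a = 10, b = 100, c = 5, plot = FALSE)}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

## parse the Rd file and render it as plain text, similar to the help viewer
rd <- tools::parse_Rd(rd_file)
tools::Rd2txt(rd)
```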

From scripts to functions

Documentation using 'roxygen2'

  • So why not write the documentation into the function definition file?

Another brief function example

f <- function(x, p = 2) {
  ## calculate the power of x
  y <- x^p

  ## return value
  return(y)
}

From scripts to functions

Documentation using roxygen2

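Can be rewritten like this (the comments on the right are annotations naming the resulting documentation section; omit them in real code, as roxygen2 would read them as part of the tag content):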

#' @title Calculate the power of a vector               # TITLE
#' 
#' @description The function raises x to the power p.    # DESCRIPTION
#' 
#' @details The function uses the vectorised ^ operator.  # DETAILS
#'
#' @param x input vector                                # ARGUMENTS
#' @param p power exponent                              # ARGUMENTS
#' @return vector of the power p of x.                  # VALUE
#' @author Michael Dietze                               # AUTHOR(S)
#' @examples
#' f(x = 10, p = 3)                                     # EXAMPLES
#' @export                                              # NAMESPACE ENTRY
f <- function(x, p = 2) {                               # USAGE
  return(x^p)
}

From scripts to functions

Documentation using roxygen2

And it becomes something like this in the help viewer.

From scripts to functions

Documentation using roxygen2

  • 'roxygen2' is a package that parses function source files for tags (e.g., #' @param) and converts them to the structure of a *.Rd-file.

  • Usually the first line becomes the title (thus, keep it to one line)
  • The second paragraph becomes the description
  • The third and any further paragraphs become the details (optional)

  • Tagged items follow further down
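A minimal sketch of this convention: the power function from above, documented without explicit @title, @description and @details tags. The function name `pow()` is hypothetical; the block is ordinary R code, since roxygen2 comments are plain comments until roxygen2 parses them.

```r
## paragraph 1 = title, paragraph 2 = description, paragraph 3 = details
#' Calculate the power of a vector
#'
#' The function raises each element of a numeric vector to the power p.
#'
#' The function relies on the vectorised \code{^} operator, so \code{x}
#' can be of any length.
#'
#' @param x Numeric input vector.
#' @param p Numeric power exponent, default is \code{2}.
#' @return Numeric vector, \code{x} to the power of \code{p}.
#' @examples
#' pow(x = 1:3, p = 2)
#' @export
pow <- function(x, p = 2) {
  return(x^p)
}
```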

From scripts to functions

Documentation using roxygen2

  • @param - Function arguments: the argument name, then its description
  • @return - Function value
  • @examples - Examples section
  • @export - Namespace export; the function name is inferred but can be given explicitly
  • @seealso - Related functions to link to
  • @keywords - Well, keywords
  • @section - Arbitrary sections to further structure the documentation

From scripts to functions

Documentation using roxygen2

  • Further LaTeX-like markup to structure the text includes
    • Text formatting (\emph{}, \strong{}, \code{})
    • Links (\code{\link{}}, \href{}{})
    • Lists (\enumerate{}, \itemize{})
    • Equations (\eqn{}, \deqn{})
    • Tables (\tabular{}{\tab \cr})
  • Details: see here; for an example, see the source of Luminescence::analyse_baSAR.R
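A sketch combining several of these markup elements in one roxygen2 block; the function `cuboid_volume()` is hypothetical and only serves as a carrier for the documentation:

```r
#' Calculate cuboid volumes
#'
#' The function computes the volume \eqn{V = a \cdot b \cdot c} of one or
#' more cuboids. For the product of all elements of a single vector, see
#' \code{\link{prod}}.
#'
#' The arguments are interpreted as follows:
#' \itemize{
#'   \item \code{a}: the \emph{length} of the cuboid,
#'   \item \code{b}: the \emph{width} of the cuboid,
#'   \item \code{c}: the \strong{height} of the cuboid.
#' }
#'
#' @param a Numeric vector, length of the cuboid.
#' @param b Numeric vector, width of the cuboid.
#' @param c Numeric vector, height of the cuboid.
#' @return Numeric vector of cuboid volumes.
#' @examples
#' cuboid_volume(a = 2, b = 3, c = 4)
#' @export
cuboid_volume <- function(a, b, c) {
  return(a * b * c)
}
```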

From scripts to functions

What else to document?

Are we finished? Not yet.

  • Data sets need to be documented, as well
  • The package should be documented, likewise

From scripts to functions

Documentation of data sets using roxygen2

  • Further tags
    • @format - Overview of data structure (copy-paste output of str())
    • @source - Source of the data set, e.g., the internet link
  • Documentation goes into a file named after the data set, saved as R/DATA_SET.R
  • Alternatively, all documentation in package documentation (see next slide)
#' Ten numbers from 1 to 10
#'
#' A dataset containing ten ordered natural numbers
#'
#' @format An integer vector with 10 elements:
#' int [1:10] 1 2 3 4 5 6 7 8 9 10
"x"

From scripts to functions

Documentation of packages using roxygen2

  • Similar to data sets, but the documented object is NULL
  • The place to define imports (entire packages or single external functions)
  • Save as a file called PACKAGE_NAME-package.R in R/
#' A package of diverse functions
#'
#' The package is used to store all my functions, safe from my brain.
#'
#' @docType package
#' @name PACKAGE_NAME
#' @import stats
#' @importFrom utils read.table write.table
NULL

From scripts to functions

HAVE WE FINALLY LOST EVERYONE? :)

In a nutshell: documenting package and datasets means:

  • create an empty R file pkg_name-package.R in pkg_name/R
  • write the 'roxygen2' tags for package documentation (followed by NULL)
  • write the 'roxygen2' tags for data set documentation (followed by data set name)

From scripts to functions

HAVE WE FINALLY LOST EVERYONE? :)

#' eseis: Environmental Seismology Toolbox
#' 
#' This package eseis provides functions to read/write seismic data files...
#'
#' @name eseis
#' @aliases eseis
#' @docType package
#' @author Michael Dietze
#' @importFrom graphics image plot axis axis.POSIXct box mtext
NULL

#' Seismic trace of a rockfall event.
#' 
#' The dataset comprises the seismic signal of a rockfall.
#' 
#' @name rockfall
#' @docType data
#' @format The format is: num [1:98400] 65158 65176 65206 65194 65155 ...
#' @examples
#' ## load example data set and plot it.
#' data(rockfall)
#' plot(rockfall, type = "l")
"rockfall"

Time to move the fingers!

Pffffft, no more details please!

Hands-on session

Task A: Create a simple example dataset: a sequence of numbers from 1 to 10 called x. Save it as x.rda in your package. Maybe you need to create a directory called data, first?

Task B: Write a function that can multiply a numeric vector (x) by a constant (c, default is 1) and return the result. Document the function using roxygen2 tags, including an example.

Task C: Document the package and the example data set using roxygen2.

Time: 5-8 min

Hands-on session

Task A

## create the folder data/ if it does not exist yet
dir.create("data", showWarnings = FALSE)
x <- seq(from = 1, to = 10)
save(x, file = "data/x.rda")

Hands-on session

Task B

#' Multiply a vector by a constant
#' 
#' The function uses simple R functionalities to multiply a numeric 
#' vector x by a constant c and returns the resulting vector.
#' 
#' @param x Numeric vector to be multiplied
#' @param c Numeric value multiplier, default is \code{1}
#' @return Numeric vector, product of \code{x} and \code{c}
#' @author Michael Dietze
#' @examples
#' data(x)
#' mtp(x = x, c = 2)
#' @export mtp
mtp <- function(x, c = 1) {
  return(x * c)
}

Hands-on session

Task C

#' toolbox: A growing set of handy functions
#' 
#' This package contains one function to help solve problems in R.
#'
#' @name toolbox
#' @docType package
#' @author Michael Dietze
NULL

#' Numeric vector
#' 
#' The dataset contains a numeric vector from 1 to 10.
#' 
#' @name x
#' @docType data
#' @format The format is: int [1:10] 1 2 3 4 5 6 7 8 9 10
#' @examples
#' ## load example data set and plot it.
#' data(x)
#' plot(x)
"x"

Building a package

Building a package in RStudio

Ready to go!

  • Click on the "Build" tab in the upper right pane of RStudio.
    • "Build & Reload" will build and install the package, restart R and reload the package
    • "Check" will run the R-internal check routines
    • "More" well, will show you the above and some more options
  • Click on "Check". See what happens. Are there any warnings? Notes? Errors?

Building a package in RStudio

Build a source version hardcopy

  • To share the package, you need to build a source version of it
    • Menu Build > Build Source Package
    • You will find a file PACKAGE_VERSION.tar.gz in the parent directory of your package
  • Windows users often expect binary packages (*.zip), which need to be built in a different way
  • The resulting package files are ready to be freely distributed and installed locally.
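The RStudio menu item essentially runs `R CMD build`. As a sketch, the same step can be triggered from R via `system2()`; the minimal package 'toolbox' built here is hypothetical and created in a temporary directory. Note that `R CMD build` writes the tarball into the current working directory.

```r
## Hypothetical minimal package 'toolbox' in a temporary directory
src_dir <- file.path(tempdir(), "build_demo")
pkg_dir <- file.path(src_dir, "toolbox")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: toolbox",
  "Title: A Growing Set of Handy Functions",
  "Version: 0.1.0",
  "Authors@R: person(\"First\", \"Last\", email = \"first.last@example.com\",",
  "    role = c(\"aut\", \"cre\"))",
  "Description: A small collection of helper functions.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(mtp)", file.path(pkg_dir, "NAMESPACE"))
writeLines("mtp <- function(x, c = 1) x * c", file.path(pkg_dir, "R", "mtp.R"))

## run 'R CMD build' from the parent directory of the package
old_wd <- setwd(src_dir)
status <- system2(file.path(R.home("bin"), "R"),
                  args = c("CMD", "build", "toolbox"),
                  stdout = FALSE, stderr = FALSE)
setwd(old_wd)

## the source package is named PACKAGE_VERSION.tar.gz
file.exists(file.path(src_dir, "toolbox_0.1.0.tar.gz"))
```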

The deliberately omitted points

  • Test your functions
  • What if Check stops with an error? See where the error happened:
    • If in the examples, then check the respective function
    • If in the preceding structural checks, well, … google for it
  • What if a package user reports an error?
    • Fix it, update package version and notify all relevant persons
    • Thus, use a version control system (next section)

Building a package in RStudio

Done!

Now, that is it. All that is left is for you to build your package. So, go ahead!

Maintaining a package

  • Software lives, it evolves, it ages and it gets kicked by changed hardware.

  • Thus, software needs to be maintained and updated, which involves more than just fixing a bug and raising the version number.

  • Maintenance should be handled as openly, transparently and coherently as possible.

  • Git is a powerful versioning tool, and R and RStudio perfectly align with it.

Tracking changes

Using the versioning tool Git

  • Git tracks all code changes on your computer
  • GitHub is an online platform to share and maintain your code based on Git
  • SmartGit is a local client software for Git (alternatively, use the console)
  • Git and GitHub are far from user friendly, but they are powerful
  • You will very likely use only a tiny fraction of their functionality

Tracking changes

Using the versioning tool Git: the idea behind

Git and GitHub beyond versioning

  • GitHub makes sharing up-to-date software as easy as possible
  • GitHub is a pillar of transparent, reproducible science
  • GitHub paves the way for collaborative package/software development
  • Git(Hub) makes spotting well-hidden coding accidents much easier

  • Further information about Git: well, I have not found time for everything :)

Converting a package to a local Git repository

Prerequisites

  • Let us hope you have successfully installed Git, e.g., via SmartGit
  • Register yourself with Git, via the console:
git config --global user.name "YOUR FULL NAME"
git config --global user.email "YOUR EMAIL ADDRESS"
  • To check the settings:
git config --global --list
  • Don't worry, this is only necessary once.

Converting a package to a local Git repository

Back to RStudio

  • Menu >> Tools > Project Options > Git/SVN
  • Select Git as version control option
  • Confirm and let RStudio restart itself

  • Note the changes in the restarted version
    • A new icon in the menu bar (green plus, red minus, grey circle)
    • A new tab (Git) in the upper right window of RStudio
  • Click on the Git tab in the upper right window

Converting a package to a local Git repository

The Git pane

  • The Git pane shows all items (files) that have changed
  • There are three categories:
    • Untracked (yellow ?)
    • Modified (blue M)
    • Deleted (red D)
  • Clicking on the Diff icon shows an in-depth view of the changes to a file

Converting a package to a local Git repository

  • That was it! All changes made will now be tracked.

  • First things first: account for untracked items
    • Select all untracked files and tick their "Staged" checkboxes
    • Then click "Commit"
    • In the new window, add a commit message (don't write silly things; commits stay forever)
  • After clicking on "Commit" explore the Commit History (click on "History")
    • All changes are listed (green = added, red = deleted)

Converting a package to a local Git repository

  • So let us make some changes.
    • Add a new function.
    • Modify the silly "Hello World" function.
    • Build the package.
  • Explore the Git pane
    • Which files are there?
    • Commit the changes.
    • Explore the history.

Some more details about Git

Commits

  • Commits are the fundamental unit of version control in Git (like saving a file under a different name)
  • Commit whenever you have solved an isolated issue but not before it is done properly
  • Commit messages should make clear what was done and why, even years later

Some more details about Git

Traceback and correct mistakes

  • To undo a change that was saved but not yet committed, right-click on the file and choose "Revert"
  • Use the history to trace back when something was changed longer ago
    • Copy the SHA (Secure Hash Algorithm checksum, i.e., the unique commit ID)
    • Use the console for restoring this commit (git checkout <SHA> <filename>)
    • Looks clumsy, right? See SmartGit slide for an alternative.

What about SmartGit? Why did we install this?

  • RStudio covers only a very tiny fraction of the Git functionality
  • SmartGit is much more elaborate – and more complicated to use
  • SmartGit is a powerful GUI for Git. Whatever you cannot do in RStudio, you can do it in SmartGit

  • We installed SmartGit mainly for getting access to additional options

  • Instead of using RStudio for tracking changes, we can use SmartGit as well

What about SmartGit? Why did we install this?

Testing

Everything is a test!

Testing

Why?

  • Testing is an integral part of the development process
  • Testing ensures that the function/package does what it should
  • Testing improves stability and code quality
  • Testing leads to better structured and robust code

Testing

What?

  • Allowed input/output scenarios
  • Algorithm robustness
  • Changed dependencies
  • Depending packages

Testing

Who?

  • The package developer (basically you)
  • Beta-testers (a fortunate situation)
  • User (good luck)
  • Machine

Testing

How?

  • Manual testing (basically you when writing the code)
  • Semi-automated testing (tests based on previously designed scenarios)
  • Automated testing (e.g., bootstrapping methods)

Manual testing gives you a good start, but it becomes a tedious task for large packages.
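A sketch of an automated test for the hands-on function mtp(): instead of hand-picked scenarios, it draws many random inputs and checks a property that must always hold (dividing the result by c recovers x). This is randomised, property-style testing rather than bootstrapping in the statistical sense; the number of runs and the value ranges are arbitrary choices.

```r
## function under test (the hands-on function mtp())
mtp <- function(x, c = 1) {
  return(x * c)
}

## draw random inputs repeatedly and check the property each time
set.seed(42)
results <- vapply(seq_len(1000), function(i) {
  x <- runif(n = sample(1:20, 1), min = -100, max = 100)
  c <- runif(n = 1, min = 0.1, max = 10)
  ## dividing the result by c must recover x (up to floating point error)
  isTRUE(all.equal(mtp(x, c) / c, x))
}, logical(1))

all(results)
```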

Testing

Solutions for the R environment

Testing

Some solutions for the R environment - CRAN vs Travis CI

Testing

Example for the R package 'Luminescence'

Testing

The R package 'testthat'

The R package 'testthat' by Hadley Wickham covers commonly encountered test scenarios in a semi-automated test environment. Features:

  • Write your test scenarios using familiar syntax and grammar
  • Run tests every time you build the package
  • Run the tests automatically in combination with GitHub and Travis CI

Testing

The R package 'testthat' - implementation

  • Create a folder 'tests/' in your package path and add the following subfolders
    • 'tests/testdata/'
    • 'tests/testthat/'
  • Create a file 'tests/testthat.R' with the following content
library(testthat)
library(YOURPACKAGE)

test_check("YOURPACKAGE")
  • Add the package 'testthat' to the 'Suggests' field of the DESCRIPTION file

Testing

The R package 'testthat' - simple example

library(testthat)

##create a simple function
f <- function(x){factorial(x)}

##test the output for two scenarios
testthat::expect_silent(f(10))
testthat::expect_error(f("a"))

Recall: Test scripts are stored in separate R files (named test*.R) in the folder 'tests/testthat/'
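For the hands-on function mtp(), a test file could look like the sketch below. The file name test-mtp.R is an assumption (test_check() picks up files whose names start with 'test'); in a package, mtp() would come from the package itself and is defined here only so the snippet runs on its own, with 'testthat' installed.

```r
## content of a hypothetical file tests/testthat/test-mtp.R
library(testthat)

## defined here only to make the snippet self-contained
mtp <- function(x, c = 1) {
  return(x * c)
}

test_that("mtp() multiplies a vector by a constant", {
  ## regular input
  expect_equal(mtp(x = 1:3, c = 2), c(2, 4, 6))
  ## default constant leaves the vector unchanged
  expect_equal(mtp(x = 1:3), 1:3)
  ## non-numeric input must fail
  expect_error(mtp(x = "a", c = 2))
})
```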

Testing

The R package 'testthat' - package example

context("analyse_Al2O3C_ITC")

test_that("Full check", {
  skip_on_cran()

  ##check stops
  expect_error(object = analyse_Al2O3C_ITC(object = "test"))

  ##input curve type
  a <- set_RLum(class = "RLum.Data.Curve", recordType = "OSL", data = matrix(1:20, ncol = 2))
  b <- set_RLum(class = "RLum.Data.Curve", recordType = "TL")
  object <- set_RLum(class = "RLum.Analysis", records = list(a,b))
  expect_error(object = analyse_Al2O3C_ITC(object))

  ##check with example data
  data(ExampleData.Al2O3C, envir = environment())
  expect_is(analyse_Al2O3C_ITC(data_ITC), "RLum.Results")

})

Wrapping up, apart from "R is great"

This course covered…

  • What packages are and why they are the way they are
  • Structure/contents of a package
  • How to turn a script into a function

  • Creating a package from scratch
  • Adding functions and their documentation
  • Using Git via RStudio to maintain a package
  • Code testing

What did we (deliberately) forget?

  • How to write "good" R-code for functions
  • Including low level code and Shiny-apps
  • How to submit a package to CRAN and GitHub
  • Writing Vignettes
  • Debugging strategies

Last note

It doesn’t matter if your first version isn’t perfect as long as the next version is better. (H. Wickham)