Writing and maintaining packages

EGU 2018 | Wed, 11 Apr, 13:30-15:00 /Room 2.16

Overview

R packages – an introduction (5 min)
Structure and contents of R packages (15 min)
Initiating an R package with RStudio (5 min)
Writing a function with 'roxygen2' (20 min)
Building the package (5 min)
Maintaining a package with Git (15 min)
Testing and integrating checks (15 min)
Sharing a package (10 min)

R packages – What?

This course does cover
- a brief idea of the concept of R packages
- a discussion/justification of package contents
- hands-on work to build a package
- work beyond heaving a package onto the shelf and vanishing
You are looking for an introduction to R … sorry, bad luck!

R packages – What?

What you should be already familiar with

R and RStudio
Installing and using R packages
Writing scripts (and functions)
Structuring code and following good practice rules
You are looking for an introduction to R … sorry, bad luck!(*)

(*) Don't repeat yourself.

R packages

Further resources

This course heavily borrows ideas and concepts from
- Hadley Wickham's book: R packages
- The official R-documentation for writing packages
- Discussions among the R package 'Luminescence' developer team

By the way, Hadley Wickham's Advanced R is a warm recommendation.

R packages

Prerequisites and computational minions

RStudio: The skin for R, a proper developer (and user) environment
- Support for creating scripts, packages, reports, books, slides, …
- Plots, data environment, history, file browser, help viewer
- Auto completion, syntax highlighting, context help
- Version control and project management
The R package 'devtools' by H. Wickham, J. Hester & W. Chang
The R package 'roxygen2' by H. Wickham, P. Danenberg & M. Eugster

R packages

Concepts

R lives from sharing code, and packages are the vehicles for this idea
Packages can be the pillars for open and reproducible science
Packages comprise the full set of self-contained components

Libraries are no packages! Libraries contain packages. You pull a package from a library.
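
The distinction can be explored from the R console. The lines below are a minimal sketch using base R only; the printed paths and package names depend on your own installation.

```r
## list the libraries (directories) that R searches for installed packages
libs <- .libPaths()
print(libs)

## list the packages stored in these libraries
pkgs <- rownames(installed.packages())
head(pkgs)

## find the directory from which a given package is pulled
pkg_path <- find.package("stats")
dirname(pkg_path)
```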

R packages

Some illusions

R packages have to go to the Comprehensive R Archive Network (CRAN)
Writing R packages is a lot of boring effort
You can start easy and add further details later
People will use your package in the way you designed it

R packages

Why write packages

You want to share your code with others
The code is organised in a coherent way
You want to handle/distribute only a single file
Bundling code makes keeping track much easier than collecting scripts
The code is automatically tested (by your examples and other routines)

Working with packages simply saves time and brain cells

R packages – Contents

R package contents

An overview

What do you think? What belongs to an R package?

R package contents

An overview of obvious material

A name (yes, a package name)
A description (the meta information)
A function (yes, there are packages with only one function)
A documentation (you want to understand what the function does)
A working example (you don't want to rely on the documentation only)
A set of further stuff that will be covered later

R package contents

An overview

R package contents

The package name

This is (or should be) the hardest job when creating a package
Requirements
- Only letters, numbers and periods are allowed
- Start with a letter, do not end with a period
Advice
- Pick a unique name you can google, best describing your package
- Check if the name already exists beforehand
Example: 'praise' by G. Csardi & S. Sorhus
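
The naming requirements can be checked with a regular expression. This is only a sketch: the helper name is_valid_pkg_name() is made up for illustration, and the pattern additionally assumes a name of at least two characters.

```r
## hypothetical helper: test candidate names against the requirements above
is_valid_pkg_name <- function(name) {

  ## letter first, then letters/numbers/periods, no period at the end
  grepl(pattern = "^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x = name)
}

## try some candidates: only the first one passes
is_valid_pkg_name(c("praise", "my_pkg", "2fast", "tool."))
```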

R package contents

The DESCRIPTION file

The mandatory file that defines all the metadata of the package: name & short title, version & date, author & contact, license & dependencies:

R package contents

The DESCRIPTION file - Title and description

Title must be in title case (capitalised), fit on one line and not end with a period
Description can be several sentences, but only one paragraph. Lines should not exceed 80 characters, and continuation lines must be indented by 4 spaces.

Title: Environmental Seismology Toolbox
Description: A collection of functions to handle seismic data for the
    purpose of investigating the seismic signals emitted by Earth surface
    processes. The package supports importing standard formats, data
    preparation and analysis techniques and data visualisation.

Both elements are important. They will be indexed and Google has learned a lot to spot R packages.

R package contents

The DESCRIPTION file – Dependencies

Dependency options in short (read more):
- Depends: all packages your package essentially needs to run
- Imports: will be covered by the namespace section
- Suggests: optionally needed packages
- LinkingTo: needed to reference C++/C libraries
CRAN became strict with the number of entries in 'Depends'. Use importFrom() in the file NAMESPACE, instead.
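
To see how the dependency fields are stored, a DESCRIPTION file can be read from within R. The sketch below writes a made-up, minimal DESCRIPTION for a hypothetical package 'toolbox' to a temporary directory and reads it back with read.dcf(); the field values are assumptions for illustration.

```r
## write a minimal, hypothetical DESCRIPTION file to a temporary directory
desc_file <- file.path(tempdir(), "DESCRIPTION")

writeLines(text = c("Package: toolbox",
                    "Version: 0.1.0",
                    "Imports: stats, utils",
                    "Suggests: testthat"),
           con = desc_file)

## read the dependency fields back in (one-row character matrix)
deps <- read.dcf(file = desc_file,
                 fields = c("Imports", "Suggests"))
print(deps)
```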

R package contents

The DESCRIPTION file – Author information and roles

Author information can be defined more comprehensively (read more):

Authors@R: person("First", "Last", email = "first.last@example.com",
                  role = c("aut", "cre"))

Essential for correct citation of packages! Nota bene: Type citation("PACKAGENAME") to see how a package should be cited
Don't use fake email addresses. Otherwise, CRAN and users cannot communicate with you.

Since end of 2017, CRAN supports ORCID

Authors@R: person(...,
                  comment = c(ORCID = "0000-0002-9079-593X"))

R package contents

The DESCRIPTION file – License issues

The key element to inform who can use the package for which purpose!
Either a link to a license file (License: file LICENSE) or a keyword of standard licenses (read more):
- GPL-2 or GPL-3: copyleft licenses, other users must license derived code GPL-compatible. Common for CRAN submission.
- CC0: give away all rights, anything can be done with the code
- BSD or MIT: permissive licenses, require additional file LICENSE.

R package contents

The DESCRIPTION file – Version patterns

Version numbers must be numeric and separated by a period.
They are more than just counters, they define dependency satisfactions
Format: MAJOR.MINOR.PATCH (start a released package with 0.1.0)
- MAJOR releases should be rare
- MINOR releases should keep the package up to date
- PATCH releases may be frequent (but think of CRAN team time budget)
Make use of a NEWS file to announce version history and changes.
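
That version numbers are more than counters (or plain text) can be tried in the console. A small sketch with made-up version numbers; package_version() and packageVersion() are part of base R.

```r
## define two hypothetical versions of a package
v_old <- package_version("0.9.0")
v_new <- package_version("0.10.0")

## versions are compared component by component (MAJOR, MINOR, PATCH)
v_new > v_old

## plain character strings are compared character by character instead,
## which is why versions should not be handled as text
"0.10.0" > "0.9.0"

## test whether an installed package satisfies a minimum version
packageVersion("stats") >= "3.0.0"
```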

R package contents

Further contents

R package contents

R code

The actual function definitions
Will be covered by the next section

R package contents

Code documentation

The (second) most important part of a reasonable package
Omitting it means nobody will be able to use your package
Documentation in R is reference documentation, similar to dictionaries
Additional documentation is covered by vignettes (not covered here)
In R, documentation in *.Rd-files is formalised and follows a pseudo LaTeX scheme (short version, long version)

R package contents

Code documentation - code example

R package contents

Code documentation - what it looks like

R package contents

Code documentation

Why should you not attempt to build documentation files manually?
- Tedious, clumsy, not intuitive
- Easy to forget updating them after changing the function
Alternative: write the documentation in the function definitions
- roxygen2
- inlinedocs (no longer updated)
For the application, see the next section

R package contents

Examples and example data

Working (and worked) examples are mandatory documentation items
They serve two purposes
- Explain the usage of the function
- Create a test, executed every time the package is checked
Typically useful to include example data sets
- Must be provided as *.rda files (generated with save())
- Must be stored in directory data
- Must be documented individually
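
A minimal sketch of creating such a data file and verifying its content. The package directory 'toolbox' is hypothetical and created in a temporary location, so nothing in your real project is touched.

```r
## create a hypothetical package directory with a data folder
pkg_dir <- file.path(tempdir(), "toolbox")
dir.create(path = file.path(pkg_dir, "data"),
           recursive = TRUE,
           showWarnings = FALSE)

## create the example data set and save it as *.rda file
x <- seq(from = 1, to = 10)
save(x, file = file.path(pkg_dir, "data", "x.rda"))

## load the file into a separate environment to check its content
e <- new.env()
loaded <- load(file = file.path(pkg_dir, "data", "x.rda"), envir = e)
print(loaded)
identical(e$x, x)
```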

R package contents

Further contents

Namespace (will be dealt with automatically, read more)
- Defines function name assignments to packages
Compiled code (not covered here)
- C++ code present in src will be compiled during installation
Shiny apps and other installed software (not covered here)
- further files/software in inst will be copied to the top-level directory of the installed package

Time for a short brainstorming break!

Pffffft, a lot of dense and boring input, right?

Brainstorming break

The task: Think about and collect the essential items for your own package. Note the results in a plain text document for later use. Time: about 4 minutes.

We will need this material soon to build a package, … an empty one, … which will finally only have one function.

Already need a short reminder? :)

Brainstorming break

package name
package title
package description
version
date
author(s)
maintainer (plus email address)
license

Creating a package in RStudio

Start with a new project: Menu >> File > New Project…

Creating a package in RStudio

The template as overview

A new RStudio-project is started (file PACKAGE_NAME.Rproj)
Close and delete the template function and its documentation straight away.
Configure the Build tools: Menu >> Tools > Project Options > Build Tools
- make sure to check "Generate documentation with Roxygen".
The NAMESPACE contains only: exportPattern("^[[:alpha:]]+"), i.e., every function is exported!
The DESCRIPTION file contains auto-generated content. Modify it so that it fits your needs.
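
What the export pattern of the template selects can be approximated with grepl(). The function names below are invented for illustration, and the actual matching is done by R's namespace mechanism, not by this sketch.

```r
## some hypothetical function names of a package
fun_names <- c("mtp", "f", ".helper", "plot_result")

## apply the pattern from the template NAMESPACE file
exported <- grepl(pattern = "^[[:alpha:]]+", x = fun_names)

## only names starting with a letter are matched, i.e., exported
setNames(exported, fun_names)
```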

From scripts to functions

An unfortunate start …

a <- 10:50
print(a)
plot(a)
b <- 210:250
plot(a, b)
A <- a * b
c <- 5
V <- A * c
print(A)
print(V)
plot(a,V)

From scripts to functions

… ok …

a <- 10:50
b <- 210:250
c <- 5

A <- a * b
V <- A * c

plot(a)
plot(a, b)
plot(a,V)

print(a)
print(A)
print(V)

From scripts to functions

… better …

## define object geometry
a <- 10:50
b <- 210:250
c <- 5

## calculate area and volume
A <- a * b
V <- A * c

## plot object dimensions
plot(a)
plot(a, b)
plot(a, V)

## print values
print(a)
print(A)
print(V)

From scripts to functions

Wrapping code to functions

f <- function(a, b, c) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## plot object dimensions
  plot(a)
  plot(a, b)
  plot(a, V)

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code to functions

f <- function(a, b, c, plot = TRUE) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## optionally plot object dimensions
  if(plot == TRUE) {
    
    plot(a)
    plot(a, b)
    plot(a, V)
  }

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code to functions

f(a = 10, b = 100, c = 5, plot = FALSE)

## $A
## [1] 1000
## 
## $V
## [1] 5000

From scripts to functions

In words

Structure your script
- Variable/object/argument definitions
- Data/variable checks and automatic assignments
- Data manipulation/evaluation
- Optional further outputs
- Return object creation
Wrap it into a function definition
- FUNCTION_NAME <- function(ARGUMENT_1, ARGUMENT_2) {FUNCTION BODY}
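
The structure listed above could look like the following sketch, based on the cuboid example. The function name cuboid() and the wording of the error message are assumptions for illustration.

```r
cuboid <- function(a, b, c, plot = FALSE) {

  ## check input data
  if (!is.numeric(a) || !is.numeric(b) || !is.numeric(c)) {
    stop("Arguments a, b and c must be numeric!")
  }

  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## optionally plot volume versus length
  if (isTRUE(plot)) {
    plot(a, V)
  }

  ## create and return output object
  return(list(A = A,
              V = V))
}

cuboid(a = 10, b = 100, c = 5)
```

Non-numeric input, e.g., cuboid(a = "10", b = 100, c = 5), now stops with an informative message instead of an obscure error from the multiplication.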

From scripts to functions

In words

Ooops, what did we forget?
Function documentation (in a separate file)

\name{f}
\alias{f}
\title{Calculate and plot cuboid areas and volumes.}
\usage{f(a, b, c, plot = TRUE)}
\arguments{
\item{a}{\code{Numeric} vector, length of the cuboid.}
\item{b}{\code{Numeric} vector, width of the cuboid.}
\item{c}{\code{Numeric} vector, height of the cuboid.}}
\value{A list with cuboid area and volume.}
\description{The function takes numeric vectors of the cardinal 
  dimensions of a cuboid object and calculates area and volume. The results can optionally be plotted.}
\examples{f(a = 10, b = 100, c = 5, plot = FALSE)}
\author{Michael Dietze}

From scripts to functions

Documentation using 'roxygen2'

So why not write the documentation into the function definition file?

Another brief function example

f <- function(x, p = 2) {
  ## calculate the power of x
  y <- x^p

  ## return value
  return(y)
}

From scripts to functions

Documentation using roxygen2

Can be rewritten like this:

#' @title Calculate the power of a vector               # TITLE
#' 
#' @description The function calculates something       # DESCRIPTION
#' 
#' @details The function simply combines the arguments. # DETAILS 
#'
#' @param x input vector                                # ARGUMENTS
#' @param p power exponent                              # ARGUMENTS
#' @return vector of the power p of x.                  # VALUE
#' @author Michael Dietze                               # AUTHOR(S)
#' @examples
#' f(x = 10, p = 3)                                     # EXAMPLES
#' @export                                              # NAMESPACE ENTRY
f <- function(x, p = 2) {                               # USAGE
  return(x^p)
}

From scripts to functions

Documentation using roxygen2

And becomes something like:

From scripts to functions

Documentation using roxygen2

'roxygen2' is a package that parses function source files for tags (e.g., #' @param) and converts them to the structure of a *.Rd-file.
Usually the first line becomes the title (thus, keep it to one line)
Second set of lines becomes the description
Third and further set of lines becomes details (optional)
Further down follow tagged items

From scripts to functions

Documentation using roxygen2

@param - Function arguments, note argument and then description
@return - Function value
@examples - Examples section
@export - Namespace export, usually the function name
@seealso - Related functions to link to
@keywords - Well, keywords
@section - Arbitrary sections to further structure the documentation

From scripts to functions

Documentation using roxygen2

Further LaTeX-like tags to structure the text can be
- Text formatting (\emph{}, \strong{}, \code{})
- Links (\code{\link{}}, \href{}{})
- Lists (\enumerate{}, \itemize{})
- Equations (\eqn{}, \deqn{})
- Tables (\tabular{}{\tab \cr})
Details see here, Example see source of Luminescence::analyse_baSAR.R
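
A short sketch of how some of these tags may be combined in a roxygen2 header. The function name pow() and all documentation texts are made up for illustration; how the result is rendered depends on the help viewer.

```r
#' Calculate the power of a vector
#'
#' The function returns \eqn{y = x^p} for a numeric vector \code{x}.
#'
#' @details Typical use cases are
#' \itemize{
#'   \item squaring values (\code{p = 2}, the default)
#'   \item taking roots (e.g., \code{p = 0.5})
#' }
#'
#' @param x \code{Numeric} vector, input values.
#' @param p \code{Numeric} value, power exponent.
#' @return \code{Numeric} vector, \code{x} to the power of \code{p}.
#' @seealso \code{\link{sqrt}}
#' @examples
#' pow(x = 1:3, p = 2)
#' @export
pow <- function(x, p = 2) {
  return(x^p)
}
```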

From scripts to functions

What else to document?

Are we finished? Not yet.

Data sets need to be documented, as well
The package should be documented, likewise

From scripts to functions

Documentation of data sets using roxygen2

Further tags
- @format - Overview of data structure (copy-paste output of str())
- @source - Source of the data set, e.g., the internet link
Documentation in a file called like the dataset and saved in R/DATA_SET.R
Alternatively, all documentation in package documentation (see next slide)

#' Ten numbers from 1 to 10
#'
#' A dataset containing ten ordered natural numbers
#'
#' @format A vector with 10 elements:
#' int [1:10] 1 2 3 4 5 6 7 8 9 10
"x"

From scripts to functions

Documentation of packages using roxygen2

Similar to data sets, but the package is defined as a NULL-object
Definition of imports (entire packages or external functions)
Save as file called PACKAGE_NAME-package.R in R/

#' A package of diverse functions
#'
#' The package is used to store all my functions, save from my brain.
#'
#' @docType package
#' @name PACKAGE_NAME
#' @import stats
#' @importFrom utils read.table write.table
NULL

From scripts to functions

HAVE WE FINALLY LOST EVERYONE? :)

In a nutshell: documenting package and datasets means:

create an empty R file pkg_name-package.R in pkg_name/R
write the 'roxygen2' tags for package documentation (followed by NULL)
write the 'roxygen2' tags for data set documentation (followed by data set name)

From scripts to functions

HAVE WE FINALLY LOST EVERYONE? :)

#' eseis: Environmental Seismology Toolbox
#' 
#' This package eseis provides functions to read/write seismic data files...
#'
#' @name eseis
#' @aliases eseis
#' @docType package
#' @author Michael Dietze
#' @importFrom graphics image plot axis axis.POSIXct box mtext
NULL

#' Seismic trace of a rockfall event.
#' 
#' The dataset comprises the seismic signal of a rockfall.
#' 
#' @name rockfall
#' @docType data
#' @format The format is: num [1:98400] 65158 65176 65206 65194 65155 ...
#' @examples
#' ## load example data set and plot it.
#' data(rockfall)
#' plot(rockfall, type = "l")
"rockfall"

Time to move the fingers!

Pffffft, no more details please!

Hands-on session

Task A: Create a simple example dataset: a sequence of numbers from 1 to 10 called x. Save it as x.rda in your package. Maybe you need to create a directory called data, first?

Task B: Write a function that can multiply a numeric vector (x) by a constant (c, default is 1) and return the result. Document the function using roxygen2 tags, including an example.

Task C: Document the package and the example data set using roxygen2.

Time: 5-8 min

Hands-on session

Task A

x <- seq(from = 1, to = 10)
save(x, file = "data/x.rda")

Hands-on session

Task B

#' Multiply a vector by a constant
#' 
#' The function uses simple R functionalities to multiply a numeric 
#' vector x by a constant c and returns the resulting vector.
#' 
#' @param x Numeric vector to be multiplied
#' @param c Numeric value multiplicator, default is \code{1}
#' @return Numeric vector, product of \code{x} and \code{c}
#' @author Michael Dietze
#' @examples
#' data(x)
#' mtp(x = x, c = 2)
#' @export mtp
mtp <- function(x, c = 1) {
  return(x * c)
}

Hands-on session

Task C

#' toolbox: A growing set of handy functions
#' 
#' This package contains one function to help solving problems in R.
#'
#' @name toolbox
#' @docType package
#' @author Michael Dietze
NULL

#' Numeric vector
#' 
#' The dataset contains a numeric vector from 1 to 10.
#' 
#' @name x
#' @docType data
#' @format The format is: num [1:10] 1 2 3 4 5 6 7 8 9 10
#' @examples
#' ## load example data set and plot it.
#' data(x)
#' plot(x)
"x"

Building a package

Building a package in RStudio

Ready to go!

Click on the "Build" tab in the upper right section of RStudio.
- "Build & Reload" will compile the package and restart R
- "Check" will run the R-internal check routines
- "More" well, will show you the above and some more options
Click on "Check". See what happens. Are there any warnings? Notes? Errors?

Building a package in RStudio

Build a source version hardcopy

To share the package, you need to build a source version of it
- Menu Build > Build Source Package
- You find a PACKAGE_VERSION.tar.gz file in your R directory
Packages for Windows-users need to be compiled in a different way
- The tedious way: CRAN website and Biostat description
- The shortcut: Winbuilder, but resources are precious!
These packages are ready, they can be freely distributed and installed locally.
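
The menu entries have counterparts in the package 'devtools', which may be handy for scripted workflows. The helper build_package() below is a hypothetical wrapper and only a sketch; it assumes that 'devtools' is installed and does nothing if the path does not contain a DESCRIPTION file.

```r
## hypothetical helper: document, check and build a package directory
build_package <- function(path = ".") {

  ## only proceed if 'devtools' is installed and 'path' holds a package
  if (!requireNamespace("devtools", quietly = TRUE) ||
      !file.exists(file.path(path, "DESCRIPTION"))) {

    message("Nothing to build in: ", path)
    return(invisible(NULL))
  }

  devtools::document(pkg = path) # update *.Rd files and NAMESPACE
  devtools::check(pkg = path)    # run the check routines
  devtools::build(pkg = path)    # create the source package (*.tar.gz)
}

## usage, run from within the package project
## tarball <- build_package(path = ".")
## install.packages(tarball, repos = NULL, type = "source")
```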

The deliberately omitted points

Test your functions
- Either write test scripts (apart from the examples)
- Or be holistic and rigorous
What if Check stops with an error? See where the error happened:
- If in the examples, then check the respective function
- If in the overhead tests before, well, … google for it
What if a package user reports an error?
- Fix it, update package version and notify all relevant persons
- Thus, use a versioning system (next section)

Building a package in RStudio

Done!

Now, that is it. All that is left is for you to build your package. So, go ahead!

Maintaining a package

Software lives, it evolves, it ages and it gets kicked by changed hardware.
Thus, software needs to be maintained and updated. But by more than just fixing a bug and raising the version number.
Maintenance should be handled as open, transparent and coherent as possible.
Git is a powerful versioning tool, and R and RStudio perfectly align with it.

Tracking changes

Using the versioning tool Git

Git tracks all code changes on your computer
GitHub is an online platform to share and maintain your code based on Git
SmartGit is a local client software for Git (alternative via console)
Git and GitHub are far from being user friendly, but they are powerful
It is very likely that you use only a tiny fraction of the functionality

Tracking changes

Using the versioning tool Git - the idea behind

The basic idea of Git according to https://git-scm.com/

Git and GitHub beyond versioning

GitHub makes sharing up-to-date software as easy as possible
GitHub is a pillar of transparent, reproducible science
GitHub paves the way for collaborative package/software development
Git(Hub) makes spotting well-hidden coding accidents deliberately easy
Further information about Git: well, I had not found time for everything :)

Converting a package to a local Git repository

Prerequisites

Let us hope you have successfully installed Git, e.g., via SmartGit
Register yourself at Git, via console

git config --global user.name "YOUR FULL NAME"
git config --global user.email "YOUR EMAIL ADDRESS"

To check things:

git config --global --list

Don't worry, this was only necessary once.

Converting a package to a local Git repository

Back to RStudio

Menu >> Tools > Project Options > Git/SVN
Select Git as version control option
Confirm and let RStudio restart itself
Note the changes in the restarted version
- A new icon in the menu bar (green plus, red minus, grey circle)
- A new tab (Git) in the upper right window of RStudio
Click on the Git tab in the upper right window

Converting a package to a local Git repository

The Git pane

The Git pane shows all items (files) that have experienced a change
There are three categories:
- Untracked (yellow ?)
- Modified (blue M)
- Deleted (red D)
Clicking on the Diff icon shows an in-depth view of the changes to a file

Converting a package to a local Git repository

That was it! All changes made will now be tracked.
First things first: account for untracked items
- Mark all untracked files and check them
- Then click "Commit"
- In the new window, you need to add a comment to the commit (don't comment silly things, commits will be there for eternity)
After clicking on "Commit" explore the Commit History (click on "History")
- All changes are listed (green = added, red = deleted)

Converting a package to a local Git repository

So let us make some changes.
- Add a new function.
- Modify the silly "Hello World" function.
- Build the package.
Explore the Git pane
- Which files are there?
- Commit the changes.
- Explore the history.

Some more details about Git

Commits

Commits are the fundamental unit of version control in Git (like saving a file under a different name)
Commit whenever you have solved an isolated issue but not before it is done properly
Commit comments should allow understanding what was done and why it was done, after years

Some more details about Git

Traceback and correct mistakes

Before you committed a change (but after saving the file), right-click on the file to restore and choose "Revert"
Use the history to trace back when something was changed longer ago
- Copy the SHA (secure hash algorithm, i.e. unique commit ID)
- Use the console for restoring this commit (git checkout <SHA> <filename>)
- Looks clumsy, right? See SmartGit slide for an alternative.

What about SmartGit? Why did we install this?

RStudio covers only a very tiny fraction of the Git functionality
SmartGit is much more elaborate – and complicated to use
SmartGit is a powerful GUI for Git. Whatever you cannot do in RStudio, you can do in SmartGit
We installed SmartGit mainly for getting access to additional options
Instead of using RStudio for tracking changes, we can use SmartGit as well

Testing

Everything is a test!

Testing

Why?

Testing is an integral part of the development process
Testing ensures that the function/package does what it should
Testing improves stability and code quality
Testing leads to better structured and robust code

Testing

What?

Allowed input/output scenarios
Algorithm robustness
Changed dependencies
Depending packages
…

Testing

Who?

The package developer (basically you)
Beta-testers (a fortunate situation)
User (good luck)
Machine

Testing

How?

Manual testing (basically you when writing the code)
Semi-automated testing (tests based on previously designed scenarios)
Automated testing (e.g., bootstrapping methods)

Manual testing gives you a good start, but it becomes a tedious task for large packages.

Testing

Solutions for the R environment

Testing

Some solutions for the R environment - CRAN vs Travis CI

Testing

Example for the R package 'Luminescence'

Testing

The R package 'testthat'

The R package 'testthat' by Hadley Wickham covers commonly encountered test scenarios in a semi-automated test environment. Features:

Write your test scenarios using familiar syntax and grammar
Run tests every time you build the package
Run the test automatically in combination with GitHub and Travis CI

Testing

The R package 'testthat' - implementation

Create a folder 'tests/' in your package path, add the following subfolders
- 'tests/testdata/'
- 'tests/testthat/'
Create a file 'tests/testthat.R' with the following content

library(testthat)
library(YOURPACKAGE)

test_check("YOURPACKAGE")

Add the package 'testthat' to the 'Suggests' field in the DESCRIPTION file
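
The folder structure can also be created from the R console. A sketch with base R functions; the package directory is created in a temporary location here and would be your real package path in practice.

```r
## hypothetical package path, replace by the path of your package
pkg_dir <- file.path(tempdir(), "YOURPACKAGE")

## create the test folders
dir.create(path = file.path(pkg_dir, "tests", "testthat"),
           recursive = TRUE,
           showWarnings = FALSE)
dir.create(path = file.path(pkg_dir, "tests", "testdata"),
           showWarnings = FALSE)

## create the file that launches the tests during the package check
writeLines(text = c("library(testthat)",
                    "library(YOURPACKAGE)",
                    "",
                    "test_check(\"YOURPACKAGE\")"),
           con = file.path(pkg_dir, "tests", "testthat.R"))

## inspect the result
list.files(path = pkg_dir, recursive = TRUE, include.dirs = TRUE)
```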

Testing

The R package 'testthat' - simple example

library(testthat)

##create a simple function
f <- function(x){factorial(x)}

##test the output for two scenarios
testthat::expect_silent(f(10))
testthat::expect_error(f("a"))

Recall: Test scripts are stored within separate R-files (file names starting with 'test') in the folder 'tests/testthat/'
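
For the function mtp() from the hands-on session, a test file might look like the sketch below. It assumes that 'testthat' is installed; the function is redefined here only to keep the example self-contained, and the chosen test scenarios are just suggestions.

```r
library(testthat)

## function from Task B, redefined to keep the example self-contained
mtp <- function(x, c = 1) {
  return(x * c)
}

test_that("mtp multiplies a vector by a constant", {

  ## check the result for a known input
  expect_equal(mtp(x = 1:3, c = 2), c(2, 4, 6))

  ## check the default value of c
  expect_equal(mtp(x = 5), 5)

  ## check that non-numeric input stops the function
  expect_error(mtp(x = "a", c = 2))
})
```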

Testing

The R package 'testthat' - package example

context("analyse_Al2O3C_ITC")

test_that("Full check", {
  skip_on_cran()

  ## check stops
  expect_error(object = analyse_Al2O3C_ITC(object = "test"))

  ## input curve type
  a <- set_RLum(class = "RLum.Data.Curve", recordType = "OSL", data = matrix(1:20, ncol = 2))
  b <- set_RLum(class = "RLum.Data.Curve", recordType = "TL")
  object <- set_RLum(class = "RLum.Analysis", records = list(a, b))
  expect_error(object = analyse_Al2O3C_ITC(object))

  ## check with example data
  data(ExampleData.Al2O3C, envir = environment())
  expect_is(analyse_Al2O3C_ITC(data_ITC), "RLum.Results")

})

Wrapping up, apart from "R is great"

This course covered…

What packages are and why they are like they are
Structure/contents of a package
How to turn a script into a function
Creating a package from scratch
Adding functions and their documentation
Using Git via RStudio to maintain a package
Code testing

What did we (deliberately) forget?

How to write "good" R-code for functions
Including low level code and Shiny-apps
How to submit a package to CRAN and GitHub
Writing Vignettes
Debugging strategies

Last note

It doesn’t matter if your first version isn’t perfect as long as the next version is better. (H. Wickham)