Basic for loop

  • Loops are the fundamental structure for repetition in programming
  • for loops perform the same action for each item in a list of things
for (item in list_of_items) {
  do_something(item)
}
  • To see an example of this let’s calculate masses from volumes using a loop
  • Need print() to display values inside a loop or function
volumes = c(1.6, 3, 8)
for (volume in volumes){
  print(2.65 * volume^0.9)
}
  • Code takes the first value from volumes and assigns it to volume and does the calculation and prints it
  • Then it takes the second value from volumes and assigns it to volume and does the calculation and prints it
  • And so on
  • So, this loop does the same exact thing as
volume <- volumes[1]
print(2.65 * volume ^ 0.9)
volume <- volumes[2]
print(2.65 * volume ^ 0.9)
volume <- volumes[3]
print(2.65 * volume ^ 0.9)
  • Like with functions and conditionals loops can have many rows of code
  • Everything between the curly brackets is executed each time through the loop
  • Let’s expand our look so that it first estimates the mass, then converts it from kilograms to pounds, and then prints out the value
for (volume in volumes){
   mass <- 2.65 * volume ^ 0.9
   mass_lb <- mass * 2.2
   print(mass_lb)
}

Do Tasks 1 & 2 in Basic For Loops.

Looping with an index & storing results

  • In the last video we saw that in R loops iterate over a series of values in a vector or other list like object
  • When we use that value directly this is called looping by value
  • But there is another way to loop, which is called looping by index
  • Looping by index loops over a list of integer index values, typically starting at 1
  • These integers are then used to access values in one or more vectors at the position inicated by the index
  • If we modified our previous loop to use an index it would look like this
  • We often use i to stand for “index” as the variable we update with each step through the loop
  • We then create a vector of position values starting at 1 (for the first value) and ending with the length of the object we are looping over
  • Then inside the loop instead of doing the calculation on the index (which is just a number between 1 and 3 in our case)
  • We use square brackets and the index to get the appropriate value out of our vector
volumes = c(1.6, 3, 8)
for (i in 1:length(volumes)){
   mass <- 2.65 * volumes[i] ^ 0.9
   print(mass)
}
  • This gives us the same result, but it’s more complicated to understand
  • So why would we loop by index?
  • The advantage to looping by index is that it lets us do more complicated things
  • One of the most common things we use this for are storing the results we calculated in the loop
  • To do this we start by creating an empty object the same length as the results will be
  • To store results in a vector we use the function vector to create an empty vector of the right length
  • mode is the type of data we are going to store
  • length is the length of the vector
masses <- vector(mode = "numeric", length = length(volumes))
masses
  • Then add each result in the right position in this vector
  • For each trip through the loop put the output into the empty vector at the ith position
for (i in 1:length(volumes)){
   mass <- 2.65 * volumes[i] ^ 0.9
   masses[i] <- mass
}
masses
  • Walk through iteration in debugger

Do Tasks 3-4 in Basic For Loops.

Looping over multiple values

  • Looping with an index also allows us to access values from multiple vectors
b0 <- c(2.65, 1.28, 3.29)
b1 <- c(0.9, 1.1, 1.2)
volumes = c(1.6, 3, 8)
masses <- vector(mode="numeric", length=length(volumes))
for (i in seq_along(volumes)){
   mass <- b0[i] * volumes[i] ^ b1[i]
   masses[i] <- mass
}

Do Task 5 in Basic For Loops.

Looping with functions

  • It is common to combine loops with with functions by calling one or more functions as a step in our loop
  • For example, let’s take the non-vectorized version of our est_mass function that returns an estimated mass if the volume > 5 and NA if it’s not.
est_mass <- function(volume){
  if (volume > 5) {
    mass <- 2.65 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes = c(1.6, 3, 8)
  • We can’t pass the vector to the function and get back a vector of results because of the if statements
  • So let’s loop over the values
  • First we’ll create an empty vector to store the results
  • And them loop by index, callling the function for each value of volumes
masses <- vector(mode="numeric", length=length(volumes))
for (i in length(volumes)){
   mass <- est_mass(volumes[i])
   masses[i] <- mass
}
  • This is the for loop equivalent of an sapply statement we used in a previous lesson
masses_apply <- sapply(volumes, est_mass)
  • How to choose when there are many ways to do the same thing?
    • Speed
      • Matters in few cases
      • Hard to identify bottlenecks
    • Readability
      • Easy to understand
    • Personal preference
  • There is single best choice

Do Size Estimates By Name Loop.

Looping over files

  • Repeat same actions on many similar files
  • Let’s download some simulated satellite collar data
download.file("http://www.datacarpentry.org/semester-biology/data/locations.zip",
              "locations.zip")
unzip("locations.zip")
  • Now we need to get the names of each of the files we want to loop over
  • We do this using list.files()
  • If we run it without arguments it will give us the names of all files in the directory
list.files()
  • But we just want the data files so we’ll add the optional pattern argument to only get the files that start with "locations-"
  • The * is a wild card, so this means “starts with locations- and includes anything afterwards”
data_files = list.files(pattern = "locations-*", 
                        full.names = TRUE)
  • Once we have this list we can loop over it count the number of observations in each file
  • First create an empty vector to store those counts
results <- vector(mode = "integer", length = length(data_files))
  • Then write our loop
for (i in 1:length(data_files){
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results[i] <- count
}

Do Task 1 of Multiple-file Analysis. Exercise uses different collar data

Storing loop results in a data frame

  • We often want to calculate multiple pieces of information in a loop making it useful to store results in things other than vectors
  • We can store them in a data frame instead by creating an empty data frame and storing the results in the ith row of the appropriate column
  • Associate the file name with the count
  • Start by creating an empty data frame
  • Use the data.frame function
  • Provide one argument for each column
  • “Column Name” = “an empty vector of the correct type”
results <- data.frame(file_name = vector(mode = "character", length = length(data_files)))
                      count = vector(mode = "integer", length = length(data_files)))
  • Now let’s modify our loop from last time
  • Instead of storing count in results[i] we need to first specify the count column using the $: results$count[i]
  • We also want to store the filename, which is data_files[i]
for (i in 1:length(data_files){
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results$file_name[i] <- data_files[i]
  results$count[i] <- count
}
  • We could also rewrite this a little to make it easier to understand by getting the file name at the begging
for (i in 1:length(data_files){
  filename <- data_files[i]
  data <- read.csv(filename)
  count <- nrow(data)
  results$file_name[i] <- filename
  results$count[i] <- count
}

Do Task 2 Multiple-file Analysis. Exercise uses different collar data

Subsetting Data (optional)

  • Loops can subset in ways that are difficult with things like group_by
  • Look at some data on trees from the National Ecological Observatory Network
library(ggplot2)
library(dplyr)

neon_trees <- read.csv('data/HARV_034subplt.csv')
ggplot(neon_trees, aes(x = easting, y = northing)) +
  geom_point()
  • Look at a north-south gradient in number of trees
  • Need to know number of trees in each band of y values
  • Start by defining the size of the window we want to use
    • Use the grid lines which are 2.5 m
window_size <- 2.5
  • Then figure out the edges for each window
south_edges <- seq(4713095, 4713117.5, by = window_size)
north_edges <- south_edges + window_size
  • But we don’t want to go all the way to the far edge
south_edges <- seq(4713095, 4713117.5 - window_size, by = window_size)
north_edges <- south_edges + window_size
  • Set up an empty data frame to store the output
counts <- vector(mode = "numeric", length = length(left_edges))
  • Look over the left edges and subset the data occuring within each window
for (i in 1:length(south_edges)) {
  data_in_window <- filter(neon_trees, northing >= south_edges[i], northing < north_edges[i])
  counts[i] <- nrow(data_in_window)
}
counts

Nested Loops (optional)

  • Sometimes need to loop over multiple things in a coordinate fashion
  • Pass a window over some spatial data
  • Look at full spatial pattern not just east-west gradient

  • Basic nested loops work by putting one loop inside another one
for (i in 1:10) {
  for (j in 1:5) {
    print(paste("i = " , i, "; j = ", j))
  }
}
  • Loop over x and y coordinates to create boxes
  • Need top and bottom edges
east_edges <- seq(731752.5, 731772.5 - window_size, by = window_size)
west_edges <- east_edges + window_size

  • Redefine out storage
output <- matrix(nrow = length(south_edges), ncol = length(east_edges))
for (i in 1:length(south_edges)) {
  for (j in 1:length(east_edges)) {
    data_in_window <- filter(neon_trees,
                            northing >= south_edges[i], northing < north_edges[i],
                            easting >= left_edges[j], easting < right_edges[j],)
    output[i, j] <- nrow(data_in_window)
  }
}
output

Sequence along (optional)

  • seq_along() generates a vector of numbers from 1 to length(volumes)