R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs plyr

From slide 21 of http://www.slideshare.net/hadley/plyr-one-data-analytic-strategy:

via plyr – R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs – Stack Overflow.

  • apply: When you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues).
    # Two dimensional matrix
    M <- matrix(seq(1,16), 4, 4)

    # apply min to rows
    apply(M, 1, min)
    [1] 1 2 3 4

    # apply max to columns
    apply(M, 2, max)
    [1]  4  8 12 16

    # 3 dimensional array
    M <- array(seq(32), dim = c(4,4,2))

    # Apply sum across each M[*, , ] - i.e. sum across 2nd and 3rd dimension
    apply(M, 1, sum)
    # Result is one-dimensional
    [1] 120 128 136 144

    # Apply sum across each M[*, *, ] - i.e. sum across 3rd dimension
    apply(M, c(1,2), sum)
    # Result is two-dimensional
         [,1] [,2] [,3] [,4]
    [1,]   18   26   34   42
    [2,]   20   28   36   44
    [3,]   22   30   38   46
    [4,]   24   32   40   48

    If you want row/column means or sums for a 2D matrix, be sure to investigate the highly optimized, lightning-quick colMeans, rowMeans, colSums, rowSums.
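As a quick check of the point about the specialized helpers, the sketch below compares apply with rowSums and colMeans on the same 4 x 4 matrix used above. The variable names are made up for illustration, and no timing is attempted here; it only confirms the results agree.

```r
M <- matrix(seq(1, 16), 4, 4)

# Row sums two ways: generic apply vs. the dedicated helper
row_sums_apply <- apply(M, 1, sum)
row_sums_fast  <- rowSums(M)

# Column means two ways
col_means_apply <- apply(M, 2, mean)
col_means_fast  <- colMeans(M)

# all.equal ignores the integer/double storage difference
print(all.equal(row_sums_apply, row_sums_fast))
print(all.equal(col_means_apply, col_means_fast))
```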

  • lapply: When you want to apply a function to each element of a list in turn and get a list back. This is the workhorse of many of the other *apply functions. Peel back their code and you will often find lapply underneath.
       x <- list(a = 1, b = 1:3, c = 10:100)
       lapply(x, FUN = length)
       $a
       [1] 1

       $b
       [1] 3

       $c
       [1] 91

       lapply(x, FUN = sum)
       $a
       [1] 1

       $b
       [1] 6

       $c
       [1] 5005
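To make the "function applied to each element in turn" idea concrete, here is a minimal sketch that spells out roughly what lapply does with an explicit loop, and shows extra arguments being forwarded to the function. The names out and first_two are illustrative only.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# A hand-written equivalent of lapply(x, length)
out <- vector("list", length(x))
names(out) <- names(x)
for (nm in names(x)) {
  out[[nm]] <- length(x[[nm]])
}
print(identical(out, lapply(x, length)))

# Extra named arguments after FUN are passed on to each call,
# so this runs head(element, n = 2) for every element
first_two <- lapply(x, head, n = 2)
print(first_two$c)
```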
  • sapply: When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list. If you find yourself typing unlist(lapply(...)), stop and consider sapply.
       x <- list(a = 1, b = 1:3, c = 10:100)
       # Compare with above; a named vector, not a list
       sapply(x, FUN = length)
        a  b  c
        1  3 91

       sapply(x, FUN = sum)
          a    b    c
          1    6 5005

    In more advanced uses of sapply it will attempt to coerce the result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:

       sapply(1:5,function(x) rnorm(3,x))

    If our function returns a 2 dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:

       sapply(1:5,function(x) matrix(x,2,2))

    Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:

       sapply(1:5,function(x) matrix(x,2,2), simplify ="array")

    Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.
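The simplification rules described above are easier to see by checking dimensions. The sketch below swaps the random rnorm example for a deterministic function so the shapes can be inspected reliably; all variable names are illustrative.

```r
# Each call returns a length-3 vector, so results become columns of a matrix
m <- sapply(1:5, function(x) x * c(1, 10, 100))
print(dim(m))

# Each call returns a 2 x 2 matrix, which is flattened into one column
flat <- sapply(1:5, function(x) matrix(x, 2, 2))
print(dim(flat))

# simplify = "array" keeps the 2 x 2 shape and stacks along a third dimension
arr <- sapply(1:5, function(x) matrix(x, 2, 2), simplify = "array")
print(dim(arr))

# When the returned lengths differ, no simplification happens and a list comes back
ragged <- sapply(1:3, function(x) seq_len(x))
print(is.list(ragged))
```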

  • vapply: When you want to use sapply but perhaps need to squeeze some more speed out of your code. For vapply, you basically give R an example of what sort of thing your function will return, which can save some time coercing returned values to fit in a single atomic vector.
    x <- list(a = 1, b = 1:3, c = 10:100)
    # Note that since the advantage here is mainly speed, this
    # example is only for illustration. We're telling R that
    # everything returned by length() should be an integer of
    # length 1.
    vapply(x, FUN = length, FUN.VALUE = 0L)
    a  b  c
    1  3 91
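Beyond speed, the template passed as FUN.VALUE also acts as a check on what the function returns. The following is a small sketch of that behavior, assuming nothing beyond base R; the variable names and the error message string are made up for the example.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# length() returns one integer per element, matching the template
ok <- vapply(x, FUN = length, FUN.VALUE = integer(1))
print(ok)

# range() returns two values, so a length-1 template is rejected with an error
result <- tryCatch(
  vapply(x, FUN = range, FUN.VALUE = numeric(1)),
  error = function(e) "vapply raised an error"
)
print(result)

# On empty input, sapply gives back an empty list, vapply an empty integer vector
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
print(class(empty_sapply))
print(class(empty_vapply))
```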
  • mapply: For when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, and then the 2nd elements of each, etc., coercing the result to a vector/array as in sapply. This is multivariate in the sense that your function must accept multiple arguments.
    # Sums the 1st elements, the 2nd elements, etc.
    mapply(sum, 1:5, 1:5, 1:5)
    [1]  3  6  9 12 15

    # To do rep(1,4), rep(2,3), etc.
    mapply(rep, 1:4, 4:1)
    [[1]]
    [1] 1 1 1 1

    [[2]]
    [1] 2 2 2

    [[3]]
    [1] 3 3

    [[4]]
    [1] 4
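A short sketch of two mapply options that come up often alongside the examples above: MoreArgs for arguments that stay fixed across calls, and SIMPLIFY = FALSE to always get a list. The variable names are illustrative.

```r
# Element-wise over two vectors: 1 + 10, 2 + 20, 3 + 30
sums <- mapply(function(x, y) x + y, 1:3, c(10, 20, 30))
print(sums)

# MoreArgs supplies arguments that are the same for every call,
# so this runs rep(1, times = 2), rep(2, times = 2), rep(3, times = 2).
# Equal-length results are simplified to a matrix, one column per call.
fixed <- mapply(rep, 1:3, MoreArgs = list(times = 2))
print(dim(fixed))

# SIMPLIFY = FALSE skips simplification and returns a list
as_list <- mapply(rep, 1:3, MoreArgs = list(times = 2), SIMPLIFY = FALSE)
print(is.list(as_list))
```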
  • rapply: For when you want to apply a function to each element of a nested list structure, recursively. To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I’m sure many people use it, so YMMV. This one is best illustrated with a user-defined function to apply:
    # Append ! to string, otherwise increment
    myFun <- function(x) {
        if (is.character(x)) {
            return(paste(x, "!", sep = ""))
        } else {
            return(x + 1)
        }
    }

    # A nested list structure
    l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
              b = 3, c = "Yikes",
              d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))

    # Result is named vector, coerced to character
    rapply(l, myFun)

    # Result is a nested list like l, with values altered
    rapply(l, myFun, how = "replace")
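Since the rapply calls above are shown without output, the sketch below inspects a few results directly. It redefines the same function and list so it runs on its own, and additionally uses the classes argument of rapply, which is not part of the original example, to touch only the character leaves.

```r
myFun <- function(x) {
  if (is.character(x)) paste(x, "!", sep = "") else x + 1
}

l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
          b = 3, c = "Yikes",
          d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))

# Default: flattened into a named character vector
flat <- rapply(l, myFun)
print(names(flat))
print(flat[["d.b2.a3"]])

# how = "replace": same nesting as l, with each leaf transformed
replaced <- rapply(l, myFun, how = "replace")
print(replaced$d$b2$b3)

# classes = "character" applies the function to character leaves only;
# with how = "replace" the other leaves are left as they were
only_chr <- rapply(l, function(x) paste(x, "!", sep = ""),
                   classes = "character", how = "replace")
print(only_chr$b)
```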
  • tapply: For when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor. The black sheep of the *apply family, of sorts. The help file's use of the phrase “ragged array” can be a bit confusing, but it is actually quite simple.

    A vector:

       x <- 1:20

    A factor (of the same length!) defining groups:

       y <- factor(rep(letters[1:5], each = 4))

    Add up the values in x within each subgroup defined by y:

       tapply(x, y, sum)
        a  b  c  d  e
       10 26 42 58 74

    More complex examples can be handled where the subgroups are defined by the unique combinations of a list of several factors. tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, ave, ddply, etc.). Hence its black sheep status.
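As a sketch of the multi-factor case mentioned above, the example below groups the same x by two factors at once, then shows the split-apply-combine reading of the single-factor call. The second factor z and the other variable names are invented for illustration.

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))

# A hypothetical second grouping factor: parity of each value in x
z <- factor(rep(c("odd", "even"), times = 10))

# One cell per combination of levels of y and z (a 5 x 2 table)
by_both <- tapply(x, list(y, z), sum)
print(by_both)

# The single-factor call spelled out as split, then apply, then combine
by_split <- sapply(split(x, y), sum)
print(by_split)
```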
