# Aggregating Data in R

Like any multi-paradigm language, R offers a number of options for looping. R’s declarative constructs provide a convenient mechanism for aggregating data. This Experimental-RLab groups the data by month and then calculates the correlation between two variables within each group. In SQL, this would look like:

```
SELECT month,
       fn_correlate(Ozone, Wind) AS 'Correlation'
FROM airquality
GROUP BY month;
```

## Method 1: Imperative Loops
```
aggregateC <- function(x) {

  results <- c()    # initialise an empty vector

  for (lclMonth in 1:12) {

    # clean the data, and extract the current month
    lclData <- x[complete.cases(x), ]
    lclData <- lclData[lclData$Month == lclMonth, ]

    # calculate the correlation, and extend `results`
    results <- c(results, cor(lclData$Ozone, lclData$Wind))

  }

  # drop any month whose correlation is NA (e.g. a month with no rows)
  return(Filter(Negate(is.na), results))
}
```

Things I like about the imperative approach:

• it is simple and there is clarity to the code
• it is easy to grasp what this function is doing
• as a “practical” method, imperative loops scale well with increasingly large datasets (on a single machine with ample memory)

Things I don’t like about the imperative approach:

• in general, I am not crazy about explicit loops
• explicit looping requires you to think about individual elements, as opposed to the dataset as a whole
• imperative concepts do not scale in my head! I find it more difficult to grasp a problem this way as it gets bigger
• practically, the imperative paradigm does not distribute easily, which is important when you are considering very large problems
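One caveat on the scaling point: extending a vector with `c()` inside a loop copies it on every iteration, which gets expensive as the number of groups grows. A minimal sketch of a variant that preallocates the result and visits only the months present in the data is shown here. The name `aggregateLoop` is hypothetical, and the example assumes R's built-in `airquality` dataset (columns `Ozone`, `Wind`, `Month`).

```r
# Hypothetical variant of the imperative loop (not the function above):
# clean once, preallocate the result, and loop over the months present.
aggregateLoop <- function(x) {
  lclData <- x[complete.cases(x), ]         # clean once, outside the loop
  months  <- sort(unique(lclData$Month))    # months that actually occur
  results <- numeric(length(months))        # preallocated result vector
  names(results) <- months

  for (i in seq_along(months)) {
    monthData  <- lclData[lclData$Month == months[i], ]
    results[i] <- cor(monthData$Ozone, monthData$Wind)
  }

  results[!is.na(results)]                  # drop undefined correlations
}

# Usage, assuming the built-in dataset:
aggregateLoop(airquality)
```

The result is a named vector with one correlation per month, which makes it easier to compare against the declarative version.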

## Method 2: Declarative Loops
```
aggregateC <- function(x) {

  # clean data
  lclData <- x[complete.cases(x), ]

  # aggregate the data by month, and apply cor() to each aggregate
  aggregateData <- split(lclData, lclData$Month)

  results <- sapply(aggregateData, function(z) cor(z$Ozone, z$Wind))

  return(Filter(Negate(is.na), results))
}
```

Things I like about declarative loops:

• conceptually, the code addresses the entire dataset (not individual elements)
• conceptually, a declarative approach scales better in my head (simplifies the way I think about a problem)
• more likely to scale gracefully to large distributed applications
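To illustrate the last point, here is a rough sketch of how the split-and-apply shape carries over to parallel execution. It swaps `sapply()` for `mclapply()` from the `parallel` package that ships with R. The function name `aggregateParallel` and its `cores` parameter are hypothetical, and the example again assumes the built-in `airquality` dataset. Note that `mclapply()` relies on forking, so values of `cores` above 1 are not supported on Windows.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical parallel variant of the declarative approach.
aggregateParallel <- function(x, cores = 1L) {
  lclData <- x[complete.cases(x), ]
  groups  <- split(lclData, lclData$Month)

  # mclapply() has the same shape as lapply(); each group is independent,
  # so the per-month work can be handed to separate worker processes.
  results <- mclapply(groups,
                      function(z) cor(z$Ozone, z$Wind),
                      mc.cores = cores)

  results <- unlist(results)               # named list -> named numeric vector
  results[!is.na(results)]
}

aggregateParallel(airquality)
```

Only the apply step changed; the cleaning and grouping code stayed the same. For a dataset this small the parallel version will not be faster, so treat it as an illustration of the shape rather than an optimisation.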