This document explores methods to calculate the sum of integers ≤ N divisible by 3 or 5. It includes R implementations, performance comparisons, and mathematical optimizations.
I’m exploring the following problem as a potential exercise for students in my Scientific Computing course:
Problem
Calculate the sum of all positive integers smaller or equal to N which are divisible by 3 or 5. For example, if N = 20, then the answer is 3+5+6+9+10+12+15+18+20 = 98
Unit tests
test <-function(.f) {structure(list( tinytest::expect_equal(.f(3), 3, info ="Input: N = 3"), tinytest::expect_equal(.f(-100), 0, info ="Input: N = -100"), tinytest::expect_equal(.f(20), 98, info ="Input: N = 20"), tinytest::expect_equal(.f(6), 14, info ="Input: N = 6"), tinytest::expect_equal(.f(0), 0, info ="Input: N = 0") ),class ="tinytests" )}
Code smell
Here is a horrible way to implement this in R:
sum_multiples_loop <-function(N) { out <-0for (i in1:N) {if (i %%3==0|| i %%5==0) { out <- out + i } }return(out)}sum_multiples_loop(20)
[1] 98
It works, but there are a ton of problems that make this code really slow and we can make it several thousand times faster by the end of this post. But it is also a good idea to test more than one case, and since I don’t want to copy paste calls to each function for many different scenarios, I have written a small suite of unit tests:
test(sum_multiples_loop)
----- FAILED[data]: <-->
call| test(sum_multiples_loop)
diff| Expected '0', got '-2418'
info| Input: N = -100
Showing 1 out of 5 results: 1 fails, 4 passes fubar!
Not surprisingly, our function fails when the input is less than 0, so this is something that we need to address later.
Standard vectorization
We can of course vectorize a lot and simplify the whole thing dramatically by using standard R vectorization. Instead of looping over positive integers from 1 to N, we create a vector x which stores those integers. Then we take advantage some vectorizations
sum_multiples_vectorized <-function(N) {1if (N <=0) return(0)2 x <-seq_len(N)3 is_divisible <- x %%3==0| x %%5==04sum(x[is_divisible])}test(sum_multiples_vectorized)
1
Take care of edge cases with an early return; alternative is to give an error
2
seq_len(N) is a safer alternative to 1:N (doesn’t matter here because we return early if N is not-positive, but good practice in general)
3
Boolean operations on vectors produce vectors of TRUE and FALSE values, so no need for loops and if statements
4
Subseting a vector with a boolean vector will return a new vector with only those values for which is_divisible is TRUE; sum is vectorized and will return the sum of all elements in x
All ok, 5 results fubar!
We can now compare the performance of the two functions. There are many ways to do this. Here is one popular option:
of course, in all cases we can parameterize the function rather than hard-coding it. The challenge would be to find the least common multiple. You could use the function pracma::Lcm
sum_series_divisors <-function(N, a, b) {if (N <=0) {return(0) } triangular_number <-function(n) { n * (n +1) /2 } lcm <- pracma::Lcm(a, b) a *triangular_number(floor(N / a)) + b *triangular_number(floor(N / b)) - lcm *triangular_number(floor(N / lcm))}test(\(x) sum_series_divisors(x, a =3, b =5))
All ok, 5 results fubar!
which incurs a cost of course, but that cost is minuscule relative to the other approaches above:
bench::mark(sum_multiples_math(1e6),sum_series_divisors(1e6, a =3, b =5))
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr><bch:t><bch:t><dbl><bch:byt><dbl>1 sum_multiples_math(1e+06) 615ns 697ns 1326018. 11.5KB 0
2 sum_series_divisors(1e+06, a = 3… 5.7µs 6.4µs 151825. 432.8KB 60.8
but let’s not use any libraries and write our own lcm function…