Caching in Excel using VBA – Coffee with Dennis

If you’ve been using Excel for a long time, you probably had written a macro or a VBA script.

This blog isn’t about writing a VBA script, this blog would assume you already know how to.

Instead, we’ll explore how and why to implement cache or caching in your VBA codes.

What is caching?

To me, caching is storing some data in memory so that it can be accessed very quickly later on (emphasis on “accessed very quickly”).

Why would you need caching?

In the context of Excel, you’d want to at least consider caching if you have a function that performs an expensive calculation. Or, maybe you integrated your Excel to an external data source like a database or a web service.

The other factor to consider is that that routine or function must be “pure” — meaning, given the same inputs, it returns the exact same value all the time.

To not overcomplicate our blog, we’ll just pretend to be using a factorial function that we wrote. This function returns the exact same value for a specific number so it’s a good candidate for caching. We can also opt for a simple function like adding 2 numbers, but won’t really be that expensive compared to factorial.

Setting up

Open the VBA IDE by pressing Alt and F11.

Next, click the “Tools” menu the under it click on “Reference” — we need to add a library that will provide storage of data.

A new window would pop up. From the list, scroll down until you find “Microsoft Scripting Runtime“.

Once you locate it, click on it then press OK.

VBA Code

Now let’s write the factorial function with caching.

To make our code reusable, add a (basic) Module to the project. That’s where we’ll write the factorial function.

Add (Basic) Module

You can do this via the “Insert” menu…

… or right-click on the project then click “Insert”:

Factorial Function

Since we’d want to store computed factorial values, we’ll add a module-level variable called “factorials” and auto-initialize it as a Scripting.Dictionary (this way, can just call the factorial function without needing to explicitly initialize it):

Private factorials          as New Scripting.Dictionary

Write the actual function as you’d normally do, we’ll just insert some logic there for storage and retrieval.

First, we convert the input value (num) into a string and we’ll use it as the “key” in our dictionary.
Next, before we even calculate the factorial, we always check if the factorial for the given number has already been computed — that is, if it’s already stored in our factorials dictionary. If it’s not in the dictionary, we calculate it.
After calculation, we store the result of it in the dictionary. The “key” is the input number, and the value is the factorial of it.

Our final code looks something like this:

Before we perform the test, make sure to open the Immediate Window in the VBA IDE (press Ctrl and G).

On my test, I placed the Excel and VBA window side by side so we can see in real-time, the printed messages in the Immediate window.

Test #1

To test it, we’ll switch to the Excel worksheet and type (unique) numbers 1 to 5 on column B.

Next, we’ll go to C1 and type:

=factorial(B1)

When we press Enter, it will print something on the Immediate window.

We then just copy the formula downwards to compute up to factorial of 5. You’ll see that all these unique calls invoked the calculation of factorial:

Test #2

For this, we’ll just repeat what we did in Column C, but this time on Column E (essentially invoking the exact same set of calls).

Now for this run, you’ll see that all five calls never invoked the calculation routine, which means the value we got this time immediately came from the cache.

When NOT to use caching?

So why would you not do this all the time for all your functions?

To understand when (or why) not to use caching, we have to go back and understand why we’d need it. Otherwise we’re just guessing.

Here are some of the reasons when NOT to use cache/ing:

Don’t use cache if you’d rarely pass repeating parameters to a function. In our factorial example, if you’re going to call it within a loop (say 1 to 30) and nothing repeats, caching will actually be counterproductive. The extra time to check the cache and to store new values would mean longer execution time, than if you just did it directly.

If you’re going to call the function once then there’s no point adding caching to your function.

If the function doesn’t do anything expensive or time-consuming, then you don’t need to use caching.

If you have function that accepts no parameter, it’s likely not a good candidate for caching (since there’d be no “unique key” to store).

Closing

As they say “better to have it and not need it, than not have it and need it (later)” —- caching and other techniques are like that.

Just because you learned a new technique doesn’t mean you apply it on everything.

You have to know what you’re doing.
You have to know why you’re doing it.
You should have considered other approaches and proved that it’s the one that makes most sense.
You have to know when you should stop using it (in our example, if there’s rarely a repeating number, it would be better to strip off the caching mechanism).