Script day: persistent memoize in bash

One type of task I often find myself implementing as a bash script is to periodically generate some data and display or operate on it – maybe through a cron job, watch, or simply a loop. Sometimes part of the process is an expensive computation (it could be network based, IO intensive, or simply subject to throttling by another entity). The way to deal with this in modern programming languages is a caching technique known as “memoization” (based on the word “memorandum”), in which the result of an expensive call is retained in memory after the first time and returned for future calls instead of re-running the expensive calculation. We also need to clear the cache every once in a while, but that’s a separate issue.
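For illustration, here is a minimal in-memory sketch of the technique (the function names are my own, hypothetical ones); it caches results in a bash associative array, which only lives as long as the shell process:

```shell
#!/usr/bin/env bash
# Minimal in-memory memoization sketch (bash 4+ for associative arrays).
# slow_square stands in for any expensive computation.
declare -A memo

slow_square() {
    sleep 0.1               # pretend this is expensive
    echo $(( $1 * $1 ))
}

memoized_square() {
    local key=$1
    if [ -z "${memo[$key]+set}" ]; then
        memo[$key]=$(slow_square "$key")   # compute once
    fi
    echo "${memo[$key]}"                   # cheap on every later call
}

memoized_square 7   # computes: 49
memoized_square 7   # cached:   49
```

This is the classic form of memoization, but the cache dies with the process – which leads to the next point.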

So, how do we implement this in bash?

One of the main problems with just putting the data in a local variable and returning it is that often the script is run to generate the data and then terminate, only to be started again a second later. So we want the cache to be persistent – so that the next process can take advantage of it.

Normally that type of cache is stored on disk, but that has some disadvantages – you need to prepare a directory for the file, make sure permissions are correct, and handle cleanup. Instead we’ll use the default memory file system on Linux, which is available at /dev/shm. The content of that directory gets cleared on reboot as it isn’t really stored anywhere, but it is persistent enough for our use.
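As a quick sanity check (my own addition, assuming a Linux system with /proc mounted), you can verify that /dev/shm really is a memory-backed tmpfs before relying on it:

```shell
# Confirm /dev/shm is a tmpfs mount; fall back to /tmp otherwise.
# (Assumption: Linux with /proc/mounts available.)
if [ -d /dev/shm ] && grep -q ' /dev/shm tmpfs ' /proc/mounts; then
    cache_dir=/dev/shm
else
    cache_dir=/tmp      # disk-backed, but still works
fi
echo "using $cache_dir for cache files"
```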

So how would the code look? It’s pretty simple – assume we have a function that generates some output (to standard output):

function get_data() {
    expensive_process
}

So instead of running our expensive process each time, we can cache the results very simply:

function get_data() {
    local cache_file=/dev/shm/get-data.cache
    if [ -f "$cache_file" ]; then
        cat "$cache_file"
    else
        expensive_process | tee "$cache_file"
    fi
}

So now get_data will run the expensive process once, and then return the cached results for every call after that. You probably want to recompute the expensive result every once in a while without rebooting the system, so you may want some logic to delete the cache file from time to time.
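One way to do that externally (my own sketch, not from the post – it assumes cache files are named with a .cache suffix) is a periodic find(1) invocation, e.g. from cron; the demo below uses a temporary directory in place of /dev/shm:

```shell
# Delete cache files not modified within the last minute, so the next
# get_data call recomputes. A temp dir stands in for /dev/shm here.
cache_dir=$(mktemp -d)
touch -d '10 minutes ago' "$cache_dir/get-data.cache"   # a stale cache
touch "$cache_dir/fresh.cache"                          # a fresh cache
find "$cache_dir" -maxdepth 1 -name '*.cache' -mmin +1 -delete
ls "$cache_dir"     # only fresh.cache remains
```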

Another way to clear the cache is to have get_data clear it itself after some time has passed, by looking at the modification time of the cache file:

function get_data() {
    local cache_file=/dev/shm/get-data.cache
    if [ -f "$cache_file" ] && [ "$(date +%s -r "$cache_file")" -gt "$(( $(date +%s) - 60 ))" ]; then
        cat "$cache_file"
    else
        expensive_process | tee "$cache_file"
    fi
}

So now the expensive process will run at most once every 60 seconds, which is basically what we wanted.
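The pattern generalizes; here is a sketch of a reusable wrapper (my own, not from the post) that memoizes any command’s stdout under a caller-supplied name and TTL, using GNU date’s -r option to read the cache file’s modification time:

```shell
# memoize TTL NAME CMD [ARGS...] -- cache CMD's stdout in /dev/shm
# for TTL seconds under NAME.cache. Assumes GNU date, whose -r option
# prints a file's modification time (here as an epoch timestamp).
memoize() {
    local ttl=$1 name=$2; shift 2
    local cache_file="/dev/shm/$name.cache"
    if [ -f "$cache_file" ] &&
       [ "$(date +%s -r "$cache_file")" -gt "$(( $(date +%s) - ttl ))" ]; then
        cat "$cache_file"           # fresh enough: serve the cache
    else
        "$@" | tee "$cache_file"    # recompute and refresh the cache
    fi
}

# usage: memoize 60 get-data expensive_process
```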
