Archive for November 14th, 2011

Script Day: find the oldest file in a directory structure

This piece of script came in handy when I wrote a utility that “recycles” space on a logging partition: before log rotation archives the current log file, we move some old log files (depending on some archive freshness policy) to a remote storage that archives older files.

The problem is that the “old archive storage” also has limited disk space and I got fed up managing the archive by hand. The solution I came up is to scan the hierarchy of  log files in the storage (logs are stored hierarchically according to origin and type) and delete old files until I have enough room to move some newer files in. That way the “old archive storage” is always kept full and keeps as much back-log as possible and does this automatically.

The piece of code that determines which files we want to delete works like this:

  1. Use find to list all the files in the directory structure
  2. Pipe it to perl and collect all the file names in a list
  3. Use perl’s sort operator to compare the modification times of each file in the list and show them in the order (i.e. oldest first)
  4. Use head to get just the first file

So it looks like this:

find /mnt/httpd_back/ -type f | perl -nle 'next unless -f; push @files, $_; END { foreach $file (sort { @a=stat($a); @b=stat($b); $a[9] <=> $b[9] } @files) { print $file; }}' | head -n1

Note: normally we use head to get some initial output and terminate the process early before it does more costly work – when head has enough data it terminates the pipe sending SIGPIPE to the upstream process and that usually terminates the process that generates the data. In this case – and in all other cases involving sort – the upstream process buffers all the data in its own memory before outputting anything, so it can sort everything, and using head here is just a filter to get what I want and does not actually save me from doing all the work. I could have easily done the same thing inside the perl script itself by replacing the block of  print $file; with print $file; last; – this has the same effect as using head, because head will send SIGPIPE to perl after getting the first print and will terminate it. Deciding which way you want to go is probably more about readability of the code and I prefer my original version because its easier to read to non-perl specialists.

I can then just remove that file, see if I have enough room to move in the newer log file and if no – repeat the process.

This would work well, I believe, but it may be inefficient if I find a bunch of small files and I want to copy in a large file. So what I did next is to take advantage of the fact that all the log files I have are named using the following simple format:

<service>-<type>_log-<year><month><day>.gz

and that allows me to easily find all the log files that record the same day and eliminate them at the same time. Subsequent moving of additional files will likely succeed because I cleared out all the log files of an entire day. If not, I can always go and clear up another day’s worth of logs.

Enhanced by Zemanta