Script Day: persistent memoize in bash

Thursday, September 10th, 2015

One type of task that I often find myself implementing as a bash script is to periodically generate some data and display or operate on it – maybe through a cron job, watch or simply a loop. Sometimes part of the process is an expensive computation (it could be network based, IO intensive or simply subject to throttling by another entity). The way to deal with issues like that in modern programming languages is a caching technique known as “memoization” (based on the word “memorandum”), in which the result of an expensive call is retained in memory after the first time and returned for future calls instead of re-running the expensive calculation. We also need to clear the cache every once in a while, but that’s another issue.

So, how do we implement this in bash?
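The full write-up is behind the cut, but the core idea can be sketched in a few lines (the function name, cache location and TTL handling here are my own illustration, not necessarily what the post settles on):

#!/bin/bash
# Sketch of a persistent memoize helper: cache a command's output in a file
# and reuse it on subsequent calls, re-running the command only when the
# cache is older than a given number of seconds.
memoize() {
    local ttl="$1"; shift
    # one cache file per distinct command line, keyed by a hash of the arguments
    local cache="${TMPDIR:-/tmp}/memo-$(echo -n "$*" | md5sum | cut -d' ' -f1)"
    if [ -f "$cache" ] && [ $(( $(date +%s) - $(stat -c %Y "$cache") )) -lt "$ttl" ]; then
        cat "$cache"          # fresh enough: serve the cached result
    else
        "$@" | tee "$cache"   # run the expensive command and remember its output
    fi
}

# example: the report is really generated at most once every 5 minutes
memoize 300 some_expensive_report --verbose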

(more…)

Script Day: Cloud-init for MS-Windows, The Poor Man’s Version

Thursday, August 20th, 2015

Cloud-init is a Linux technology that allows easy setup and automation of virtual machines. The concept is very simple – the VM infrastructure provides some way of setting custom data for each virtual machine (many providers call this “user data”), and when the operating system starts, the cloud-init service reads that configuration and loads a bunch of modules that handle its various parts and configure the system. As a user it is very convenient – you write a setup scenario using the variety of tools offered by cloud-init, you can keep the scenario in source control so you can keep developing it, and then just launch a bunch of machines with the specified scenario and watch them configure themselves.
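For example, on most providers the user data can be as simple as a shell script that cloud-init runs once on first boot. A minimal illustration (the package and configuration below are placeholders, not part of the original post):

#!/bin/bash
# Minimal cloud-init user-data script: executed once, on the instance's first boot.
# The web server and its config are only an example workload.
set -e
apt-get update
apt-get install -y nginx
cat > /etc/nginx/conf.d/app.conf <<'EOF'
server {
    listen 80;
    location / { return 200 "configured by cloud-init\n"; }
}
EOF
service nginx restart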

The situation is much worse on the MS-Windows side of the fence: want to have an MS-Windows server configured and ready to go? Start a virtual machine, connect to it using RDP and click Next, Next, Finish until your fingers are sore. Need to deploy a new version? Either retrofit an existing image (again, manually) and risk deployment side effects, or do the whole process again from scratch.

Here’s a script that tries to help a bit with the problem – at least on Amazon Web Services: a poor man’s cloud-init-like setup for MS-Windows server automation.

(more…)

Fix another ‘curl|sh’ bogus installation – Heroku

Friday, June 19th, 2015

The Heroku Toolbelt (which I don’t remember if it’s mentioned in the “curlpipesh” tumblr that Amir mentioned in his response to my “Fix RVM” post) is a CLI for managing applications on the Heroku PaaS platform. As is common (and horrible) in this day and age, they also offer a ‘curl|sh’ type install on their home page.

While the Debian/Ubuntu specific installer is not entirely horrible – it basically adds the Heroku Toolbelt Debian repository to the APT sources list, updates the package list and installs the package – the “standalone” version is as horrible as it can get: download an unsigned binary from the internet, get root permissions and then do something.

For users of Fedora and other distributions, or just Ubuntu users who don’t like installing external repositories on their system, here is a simpler method to get the Heroku Toolbelt running on your system without root permissions and without piping scripts off the internet into your shell:
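The full steps are after the jump; the gist is along these lines (a sketch only: the tarball location should be read out of Heroku’s standalone install script rather than trusted from here, and the install path is just my choice):

#!/bin/bash
# Sketch: unpack the standalone Heroku client into the home directory, no root needed.
# TOOLBELT_URL is a placeholder; take the real tarball URL from Heroku's install script.
TOOLBELT_URL="<tarball URL from Heroku's standalone install script>"
mkdir -p ~/.local/heroku
curl -L "$TOOLBELT_URL" | tar xz --strip-components=1 -C ~/.local/heroku
# make the client available in new shells
echo 'export PATH="$HOME/.local/heroku/bin:$PATH"' >> ~/.bashrc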

(more…)

Fix RVM “run script from the internet to install”

Friday, May 8th, 2015

On Wednesday I complained about the latest UN*X fad of installing software by running scripts from the internet, without any regard to how your operating system handles software installation.

Docker, which I complained about last time, at least has a script that takes into account the local software management solution (it uses apt for Ubuntu, yum for Fedora, etc.), but RVM – the Ruby Version Manager, a popular tool among rubyists everywhere – just downloads a bunch of executable stuff (granted, most of it is scripts, but the difference is lost on most people) into an arbitrary location on your file system. At least it doesn’t install system software – oh wait, it does.

While I can’t help with RVM’s desire to install system level software (which it actually needs, because one of the things you want RVM to do for you is to compile ruby versions from source), I can try to help you figure out how to install RVM where you want it and use it how you want it.
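The details are after the jump, but the very first step away from the fad is simple enough to sketch here: fetch the installer as a file, read it, and only then run it (the installer URL is RVM’s documented one; everything else is ordinary shell):

#!/bin/bash
# Download the RVM installer instead of piping it straight into bash
curl -sSL -o rvm-installer https://get.rvm.io
less rvm-installer          # actually read what you are about to run
bash rvm-installer stable   # then install the stable RVM release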

(more…)

Docker and the horrible “one line installation” fad

Wednesday, May 6th, 2015

One of the weird things that sane (or, some would say, “old skool”) system administrators complain about lately is that with the rising popularity of UN*X systems (mostly Mac OS X and Linux) in the world, and in particular in the software development world, people using UN*X systems want less and less to understand how to manage their systems, and the culmination is the

to install this complicated system level software, just copy and paste this simple wget command into your terminal

with Docker being the most horrible example of that behavior. No sane person (who understands UN*X) would ever think that installing Docker by feeding the content of a URL to bash is a good idea, but for some reason this is the documented and recommended way by the Docker people. Other examples abound, but let’s concentrate on fixing the Docker scenario.
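One obvious alternative (not necessarily the exact fix the full post settles on) is to just let the distribution’s package manager do its job; the package names below are the ones the distributions shipped at the time and may differ today:

#!/bin/bash
# Sketch: install Docker from the distribution's own repositories instead of
# piping an installer script from the internet into a shell.
if [ -f /etc/debian_version ]; then
    sudo apt-get update
    sudo apt-get install -y docker.io    # Debian/Ubuntu package name
elif [ -f /etc/redhat-release ]; then
    sudo yum install -y docker           # Fedora/RHEL package name
fi
sudo service docker start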

(more…)

Script Day: Upload Files to Amazon S3 Using Bash

Monday, May 26th, 2014

Here is a very simple Bash script that uploads a file to Amazon’s S3. I’ve looked for a simple explanation of how to do that without perl scripts or C# code, and could find none. So after a bit of experimentation and some reverse engineering, here’s the simple sample code:
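The actual script is after the jump; the core of such an upload looks roughly like the sketch below, which uses the legacy AWS “signature version 2” scheme that worked at the time (newer regions and buckets require signature version 4), with placeholder credentials, bucket and key names:

#!/bin/bash
# Sketch: PUT a local file ($1) to S3 using curl and a signature-v2 Authorization header.
AWS_ACCESS_KEY="AKIA..."                 # placeholder
AWS_SECRET_KEY="..."                     # placeholder
BUCKET="my-bucket"                       # placeholder
KEY="backups/$(basename "$1")"
CONTENT_TYPE="application/octet-stream"
DATE="$(date -R)"

# the string S3 expects to be signed for a simple PUT request
STRING_TO_SIGN="PUT\n\n${CONTENT_TYPE}\n${DATE}\n/${BUCKET}/${KEY}"
SIGNATURE=$(echo -en "$STRING_TO_SIGN" | openssl sha1 -hmac "$AWS_SECRET_KEY" -binary | base64)

curl -X PUT -T "$1" \
    -H "Date: $DATE" \
    -H "Content-Type: $CONTENT_TYPE" \
    -H "Authorization: AWS ${AWS_ACCESS_KEY}:${SIGNATURE}" \
    "https://${BUCKET}.s3.amazonaws.com/${KEY}"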

(more…)

Script Day: SSH to a host behind a NAT

Sunday, April 27th, 2014

I use SSH daily to work with different remote services, and it’s always a very straightforward process… unless the remote server you want to work with is on a LAN somewhere behind NAT(1). When you need to access such an internal server, the only option is to SSH into the firewall(2), and then SSH again to your server of choice.

But there’s a better way, and you don’t even have to fiddle with the firewall server!

(this is not actually a script, though minimal text editing is required)

The solution is actually quite simple: set up an alias in your .ssh/config file that you can use to reach the remote server when you are outside the LAN (if you are inside the LAN it’s better to access it directly), and for that alias we will set up a ProxyCommand that tells SSH to first access the firewall server and open a tunnel to the target LAN server.

It looks like this:

Host remote-alias
    ProxyCommand ssh firewall-user@firewall-server nc lan-server 22

This setup works best if your access to the firewall-user account is without password or passphrase (using an SSH private key that is either without a passphrase or already loaded in the agent); then the login is as streamlined as direct access. In the worst case you’d need to type in two passwords.
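With that entry in place the alias behaves like any other host (the user name here is just an example):

# log in to the LAN server from outside, through the firewall, in one step
ssh user@remote-alias

# scp and friends work through the same alias
scp report.tgz user@remote-alias:/tmp/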


  1. a router that does Network Address Translation, so the server’s address is not accessible from outside the LAN []
  2. or some other server that has legs both inside and outside the LAN – I’m using a DNATed server, which is what most off-the-shelf routers incorrectly call a “DMZ” []

Recovering InnoDB Tables In MySQL 5

Saturday, April 27th, 2013

The following procedure can be used to recover InnoDB database tables from a backup of a MySQL server that had the “innodb_file_per_table” setting enabled, but where all the metadata (in the “ibdata*” files) was lost – for example due to a server crash.

The process involves two steps:

  1. Recover the table structure from the .frm files
  2. Recover the data from the .ibd files (InnoDB tablespace)

There is a lot of copying the backed up files over and over into the MySQL datadir, so it’s useful to have the backup available on the database server machine. In my setup the backup for the databases was copied to the directory “backup” under the database’s datadir, so – for example – for the table somedb.sometable there exist the files somedb/backup/sometable.frm and somedb/backup/sometable.ibd.

Additionally, the process for recovering the table structures creates a lot of superfluous metadata in the InnoDB data files, so after the first stage I’m going to destroy the InnoDB data files and let the InnoDB engine re-generate them – as a result any existing InnoDB tables will be destroyed. This is important, so I’ll reiterate: using the procedure detailed here will destroy any existing and working InnoDB databases! This procedure is therefore useful for recovering a destroyed database server onto a new server, or as a temporary measure on a temporary server in order to dump the data to SQL files that will later be loaded into an existing server.

There is likely a way to do this which is less heavy-handed – for example, check out this article from Percona’s MySQL blog – but for my purpose this is enough.
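The full walk-through is after the jump; as a rough illustration of step 2, the per-table data recovery revolves around InnoDB’s tablespace discard/import (the table name, datadir and backup layout below just follow the example above):

#!/bin/bash
# Sketch of the .ibd recovery step for one table (somedb.sometable as in the
# example above). Assumes the table structure was already recreated in step 1.
DATADIR=/var/lib/mysql

# detach the empty tablespace the freshly recreated table currently has
mysql -e "ALTER TABLE somedb.sometable DISCARD TABLESPACE;"

# put the backed up tablespace file in its place
cp "$DATADIR/somedb/backup/sometable.ibd" "$DATADIR/somedb/sometable.ibd"
chown mysql:mysql "$DATADIR/somedb/sometable.ibd"

# attach the restored tablespace to the table
mysql -e "ALTER TABLE somedb.sometable IMPORT TABLESPACE;"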

(more…)

Script Day: Automatically backup your EC2 instance using snapshots

Thursday, December 27th, 2012

The following script I install as a cron job on the Amazon AWS virtual machines I deploy, to allow them to back themselves up automatically. The script uses the EC2 management utilities that are normally available on “Amazon Linux” installations (and can be easily installed on other Linux distributions) to create EBS snapshots of the currently mounted root EBS volume(1).
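The script itself is after the jump; the gist of such a job, sketched here with the current AWS CLI rather than the legacy ec2-* utilities (region and device name are example values), is:

#!/bin/bash
# Sketch: snapshot the EBS volume backing this instance's root device.
REGION=us-east-1
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# find the volume attached as the root device (/dev/xvda on many Amazon Linux AMIs)
VOLUME_ID=$(aws ec2 describe-volumes --region "$REGION" \
    --filters "Name=attachment.instance-id,Values=$INSTANCE_ID" \
              "Name=attachment.device,Values=/dev/xvda" \
    --query 'Volumes[0].VolumeId' --output text)

aws ec2 create-snapshot --region "$REGION" --volume-id "$VOLUME_ID" \
    --description "automatic backup of $INSTANCE_ID $(date +%F)"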
(more…)


  1. I don’t expect this script to work for instances that have an instance-store root device, but I don’t expect to encounter these any more []

Script Day: find the oldest file in a directory structure

Monday, November 14th, 2011

This piece of script came in handy when I wrote a utility that “recycles” space on a logging partition: before log rotation archives the current log file, we move some old log files (depending on some archive freshness policy) to remote storage that archives older files.

The problem is that the “old archive storage” also has limited disk space and I got fed up managing the archive by hand. The solution I came up with is to scan the hierarchy of log files in the storage (logs are stored hierarchically according to origin and type) and delete old files until there is enough room to move some newer files in. That way the “old archive storage” is always kept full, keeps as much backlog as possible, and does this automatically.

The piece of code that determines which files we want to delete works like this:

  1. Use find to list all the files in the directory structure
  2. Pipe it to perl and collect all the file names in a list
  3. Use perl’s sort operator to compare the modification times of the files in the list and print them in order (i.e. oldest first)
  4. Use head to get just the first file

So it looks like this:

find /mnt/httpd_back/ -type f | perl -nle 'next unless -f; push @files, $_; END { foreach $file (sort { @a=stat($a); @b=stat($b); $a[9] <=> $b[9] } @files) { print $file; }}' | head -n1

Note: normally we use head to get some initial output and terminate the process early, before it does more costly work – when head has enough data it closes the pipe, sending SIGPIPE to the upstream process, and that usually terminates the process that generates the data. In this case – and in all other cases involving sort – the upstream process buffers all the data in its own memory before outputting anything, so that it can sort everything, so using head here is just a filter to get what I want and does not actually save any of the work. I could have easily done the same thing inside the perl script itself by replacing the print $file; block with print $file; last; – this has the same effect as using head, which would otherwise send SIGPIPE to perl after getting the first printed line and terminate it. Deciding which way to go is probably more about readability of the code, and I prefer my original version because it’s easier to read for non-perl specialists.

I can then just remove that file, see if I have enough room to move in the newer log file, and if not, repeat the process.

This would work well, I believe, but it may be inefficient if the oldest files are a bunch of small files and I want to copy in a large file. So what I did next was take advantage of the fact that all the log files I have are named using the following simple format:

<service>-<type>_log-<year><month><day>.gz

and that allows me to easily find all the log files that record the same day and eliminate them at the same time. Subsequent moving of additional files will likely succeed because I cleared out all the log files of an entire day. If not, I can always go and clear up another day’s worth of logs.
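Putting the two ideas together, clearing out a whole day’s worth of logs looks something like the sketch below (the date extraction is mine, just matching the filename format above, and the archive path follows the earlier example):

#!/bin/bash
# Sketch: find the oldest log file, extract its <year><month><day> suffix,
# and remove every log file from that same day in one go.
ARCHIVE=/mnt/httpd_back

oldest=$(find "$ARCHIVE" -type f | perl -nle 'next unless -f; push @files, $_;
    END { foreach $file (sort { @a=stat($a); @b=stat($b); $a[9] <=> $b[9] } @files) {
        print $file; } }' | head -n1)

# pull the date stamp out of a name like <service>-<type>_log-<yyyymmdd>.gz
day=$(basename "$oldest" .gz | sed 's/.*_log-//')

# remove all the log files for that day, freeing a whole day's worth of space
find "$ARCHIVE" -type f -name "*_log-${day}.gz" -print -delete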

