Migrating Google Groups Archives Between Accounts

Recently I had to move a lot of data from an old GSuite Business account to a new GSuite Business account. Google's support for such a migration… well… could stand to be improved. The main pain points are email(1), Drive(2), Team Drives(3) and Calendar(4), and for everything else there is no migration at all: Google+ posts, password storage, Sites and Forms have to be recreated manually. And then there is my pet peeve: Google Groups Archives.

If you’re not familiar with this awesome GSuite feature, which is based on the Usenet-like Google Groups service: Google Groups for Business is a mailing list manager that, in addition to distributing emails to recipients, also stores each email in an accessible archive, so new users can have access to old communications (this is great for accounting and support). On top of that you get forum-like features, such as posting replies to topics. Unfortunately Google Groups has no export feature, and because the archive is all about past communications, you can’t reproduce that data manually when you move to a new GSuite account.

I’ve been looking around for ways to migrate a Google Groups for Business archive and there aren’t many good solutions. Frustratingly, Google offers a “Google Groups Migration API” whose only API call imports an RFC822 formatted email into a Google Group archive, so it is only meant for bringing other email archives into Google Groups, not for getting data out of Google Groups.

The internet to the rescue!

A developer who goes by the moniker Icy on GitHub has created a very useful crawler(5) that can be pointed at a Google Group archive, even a private Google Groups for Business archive, and downloads all the messages from the group in RFC822 format, which is exactly what we need for the import. I had to write the import call myself, and the result is quite useful, even if not really easy to run (Icy’s crawler has a really funky invocation syntax).

Here’s how to operate it in order to migrate your private Google Groups for Business archive to another GSuite account:

Setup

  1. First you need to gain authorization to crawl the old group: use your web browser to log in to your old GSuite account and browse to the group archive you want to migrate.
  2. Now grab all the cookies holding the authorization tokens for the Google Group and save them to a Netscape-style cookie jar file. For Firefox you can use the cookies.txt add-on, which adds a button that downloads all the cookies to a correctly formatted file. For Chrome I use the identically named extension cookies.txt (not by the same author), which offers very similar functionality.
  3. Next, take note of the group’s email “user name” and domain name. Both appear in the URL of the group’s archive page: the domain is at the beginning of the path, right after the “/a/” prefix, and the user part is at the end of the URL. For example, if the URL is “https://groups.google.com/a/example.com/forum/#!forum/info” then the group’s email address is “info@example.com”.
  4. Now that you have your cookie jar file and know the group’s email address, download or clone Icy’s google-group-crawler linked above and cd into the crawler’s directory.
  5. You’ll also want to download my import script and put it in the same directory. You can find the Google Group import script in this gist, or you can download it directly into the current directory using the command
    wget https://gist.githubusercontent.com/guss77/44369c39b6ce0cfa488ef476ea477c0b/raw/a51d76a44ac9cc0fdaa0784f2379b7d3fc5115a8/google-group-import.sh
  6. Give execute permissions to both scripts (the crawler and the importer), for example by running chmod a+x crawler.sh google-group-import.sh
  7. Lastly, you need authorization to import the messages into the new group. For that, go to Google’s OAuth 2.0 Playground page at https://developers.google.com/oauthplayground/
  8. Scroll down the list of APIs in the playground (under the “Step 1” section) until you find the “Groups Migration API v1” category, open it and select the only URL there.
  9. Click the “Authorize APIs” button at the bottom of the list.
  10. You’ll be directed to a “Sign in with Google” dialog. Choose the correct Google user account (or sign in with it), then approve the request to grant authorization.
  11. You’ll be returned to the playground page with the “Step 1” section closed and “Step 2” open and showing an authorization code. Click the “Exchange authorization code for tokens” button and copy the “Access token” from the box under it. You only have a few seconds to copy it before the UI closes “Step 2” and opens “Step 3”, but the access token is also shown on the right side, in the response section of the playground’s “Request / Response” view.
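
Steps 2 and 3 are easy to get subtly wrong, so here is a minimal sketch of two pre-flight helpers in Bash. The function names group_email_from_url and looks_like_cookie_jar are made up for illustration (they are not part of the crawler or of my import script), and the cookie check only assumes the usual Netscape layout of seven tab-separated fields per cookie line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the group's email address from the archive URL.
# Assumes the URL layout described in step 3:
#   https://groups.google.com/a/<domain>/forum/#!forum/<user>
group_email_from_url() {
  local url=$1 rest domain user
  rest=${url#*/a/}                    # drop everything up to and including "/a/"
  [ "$rest" != "$url" ] || return 1   # no "/a/" prefix found in the URL
  domain=${rest%%/*}                  # first path component after "/a/"
  user=${url##*/}                     # last path component of the URL
  [ -n "$domain" ] && [ -n "$user" ] || return 1
  printf '%s@%s\n' "$user" "$domain"
}

# Hypothetical helper: succeed if the file has at least one cookie line
# with 7 tab-separated fields (domain, flag, path, secure, expiry, name, value).
# Lines starting with "#" are comments, except the "#HttpOnly_" prefix that
# some exporters put in front of a real cookie line.
looks_like_cookie_jar() {
  awk -F '\t' '(!/^#/ || /^#HttpOnly_/) && NF == 7 { found = 1 }
               END { exit !found }' "$1"
}
```

For example, group_email_from_url 'https://groups.google.com/a/example.com/forum/#!forum/info' prints info@example.com. Use single quotes around the URL so an interactive shell doesn’t trip over the “!”.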

Running

To run the crawler you have to specify the crawl parameters – authorization cookies file, domain name and “email user name” part separately using environment variables. The crawler documentation has some good examples to first set the variables, then call the crawler, but I prefer to run everything on a single line, including setting up the cookie jar file, so watch for that below.

Additionally, the crawler has a weird invocation syntax in that it works in two parts: the first part crawls the group and retrieves references to all the messages but doesn’t actually download anything; instead it generates a new bash script, and running that script is the second part, which actually does the download. Again, I like to run all this in a single command.

To download all messages, run the following command, replacing “./cookies.txt” with the actual path to the cookie file you downloaded, “example.com” with the actual domain part of the group’s email, and “info” with the actual “user name” part of the group’s email:


_WGET_OPTIONS="--load-cookies ./cookies.txt --keep-session-cookies" _ORG=example.com _GROUP=info ./crawler.sh -sh | bash -s

This command will create a directory named after the “user name” part of the group’s email address and store all the information there, including all the emails downloaded from the archive, which go under a subdirectory named “mbox”.
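
If you’d rather look at the generated download script before running it, the same crawl can be split into its two parts. The following is only a sketch under a few assumptions: the wrapper function crawl_group, the CRAWLER override variable and the download-<group>.sh file name are all hypothetical names I picked for illustration; only the environment variables and the -sh flag come from the one-liner above.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the two-part crawl.
# Arguments: cookie jar path, domain part, "user name" part of the group email.
crawl_group() {
  local cookies=$1 domain=$2 group=$3
  local crawler=${CRAWLER:-./crawler.sh}   # CRAWLER is an illustrative override
  local script="download-$group.sh"        # hypothetical name for the generated script

  # Part 1: crawl the archive and save the generated bash script to a file
  # instead of piping it straight into bash.
  _WGET_OPTIONS="--load-cookies $cookies --keep-session-cookies" \
  _ORG="$domain" _GROUP="$group" \
    "$crawler" -sh > "$script" || return 1

  # Part 2: run the generated script, which performs the actual download.
  bash "$script"
}
```

Calling crawl_group ./cookies.txt example.com info should then behave like the one-liner, except that download-info.sh stays around so you can inspect or re-run it.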

Once that’s done, you can run the importer. It takes as arguments the access token from the Google API playground, the email address of the new group you want to import the messages into, and the “mbox” directory created by the crawler. For example, to import the emails downloaded by the previous command into the group at “info@example.org” using the access token “1234”(6), you’d run the import command like so:


./google-group-import.sh 1234 info@example.org ./info/mbox

Pay attention to the output of the script: it shows the responses from the Groups Migration API calls, which will tell you if you have authorization problems.
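
To make that output easier to read, it helps to know roughly what a single import call looks like. The sketch below is not a copy of the gist script; it is a minimal illustration that assumes the Groups Migration API’s media-upload endpoint and a message/rfc822 content type, so check both against Google’s API reference before relying on it. The function names and the CURL override variable are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: upload one RFC822 message file into a group's archive.
# Arguments: access token, group email address, path to the message file.
# The endpoint URL is an assumption based on the Groups Migration API v1.
import_message() {
  local token=$1 group=$2 file=$3
  ${CURL:-curl} -sS \
    -H "Authorization: Bearer $token" \
    -H "Content-Type: message/rfc822" \
    --data-binary "@$file" \
    "https://www.googleapis.com/upload/groups/v1/groups/$group/archive?uploadType=media"
}

# Hypothetical helper: import every regular file found in the mbox directory.
import_mbox_dir() {
  local token=$1 group=$2 dir=$3 file
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    echo "Importing $file"
    import_message "$token" "$group" "$file"
    echo    # keep each API response on its own line
  done
}
```

With this sketch, import_mbox_dir 1234 info@example.org ./info/mbox would print one API response per message, and an expired or wrong token should show up as an error response on every call.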

After the script completes, you should be able to see all the messages in the new group’s archive. Note that they may be ordered slightly differently, and I’ve seen cases where the import broke topics apart, so a topic with a long list of messages might end up as 2 or 3 topics with the same name.

You can import multiple groups in the same sitting – one after the other – just make sure they all use the same authorization for export and the same authorization for import, and that they don’t have conflicting email “user parts”, or make sure to delete the downloaded archive after each run. If the migration process takes more than an hour, you will likely need to refresh the export cookie jar file and the import access token.


  1. The built-in data migration tool in the admin console, which is the only data migration tool available, only moves emails, is not 100% reliable even at that, and doesn’t move rules or other settings.
  2. There are external tools available (I’m using Multcloud), but sharing is a problem: the best you can get is a copy of each shared file and Google Doc without any sharing information attached, so that breaks the sharing. Other tools may convert all your Google Docs to Microsoft formats.
  3. Which surprisingly works very well: you just share the team drive with a user on the new domain and they can move all the files to a new team drive they create on the new account. Sharing information is lost and you have to reshare, but documents retain comments by the old users and there are no duplications.
  4. You can manually export all calendars to iCal format and then manually import them one by one. Also not 100% reliable.
  5. Written in Bash – I literally love this guy!
  6. Access tokens are usually very, very long; I’m using this obviously incorrect short example token to fit the command on the page.
