The Beauty of CloudInit


Over the past few months, for work and for side projects, I have moved to deploying my web applications into the cloud. Why not? It’s all the rage right now. But it really was always the cloud to begin with; it used to be called dedicated or virtual private servers (VPS). And that is how I looked at Amazon AWS in the beginning. I was always busy with the app itself, so it took some time before I could get my hands around the platform and learn what new opportunities it provided.

Just before the recent AWS EC2 outage, I had been investigating RightScale, a cloud management application. My main issue was that they were hella expensive. I just couldn’t see it. But I took some time and played with their free edition, and it got me thinking about how you are really supposed to deploy into the cloud, and how I wasn’t doing that.

So I set out to use various built-in utilities and hooks to implement something similar that satisfied my needs. I looked at a bunch of tools, from Chef/Knife to Puppet. Ultimately, none of those really got me excited: different technologies, licensing, and I didn’t see the value for the limited set of things I needed to do. But again, the examples there showed me some of the ways I should be thinking about what I wanted to do. Here is what I came up with:

+ I want to be able to go from nothing to a server running my application in two minutes

+ I want to be able to repeat this process over and over with no intervention

+ I want to be able to apply the process to any Amazon Machine Image (AMI) of choice for upgrades/shifts; so I don’t want to bundle assets into a custom AMI

+ I want to be able to securely and reliably host/store my application’s assets

+ I may want to be able to parameterize parts of the setup so I can have different server roles, or deployment configurations

+ I want the whole process to be secure

Ultimately, this led me to the “User Data” box that appears when you launch an EC2 instance. There is also a command-line flag to pass in this data via the EC2 command line tools.

What goes in that box? Well, it’s processed by a module called CloudInit, at least if you are using an Ubuntu AMI or an Amazon AMI (which I am). There isn’t much documentation on the format, though; that link earlier is really all I found. So I decided to start with that and see what I could come up with.

So basically CloudInit can process a shell script or other types of data and run it. Cool. But I don’t want to paste a bunch of shell script code in that box on the EC2 launch page. Hmmm. Turns out you can give that box a syntax like:

#include
http://example.com/scripts/1-install.sh
http://example.com/scripts/2-configure.sh
It will download those 2 files from their URLs and execute them in order. Execution goes back through the various CloudInit formats, so each file can itself be a shell script or another one of the formats. So I could put a series of scripts on another “host” server, pull them down, and run them. I am starting to like the sound of this.

The next catch was that I wanted these scripts and other assets (the application server, my code, etc.) to be secured. The scripts might need to have some security token stuff in them too. So serving them from some random web server isn’t going to work. I looked back at AWS and saw a solution using S3. S3 is the “super data storage” feature of the cloud. You can put files up there and they are virtually indestructible. Great place to keep stuff safe. Plus, you don’t have to pay bandwidth charges between an EC2 instance and an S3 bucket. That should help a bit. So what I will do is put all my assets up in an S3 bucket and build URLs to fetch all of it via CloudInit. Sweet.

The next catch was that I didn’t want these files on S3 to be public, but I needed some way for my EC2 server to actually download them. To do that, the EC2 server would need my AWS credentials, and that is a bit of a no-no. Plus, there was a chicken-and-egg problem: if I put the credentials in a file on S3 to download, I would need credentials to download them. I don’t want to put the credentials into the User Data because that is stored in plain text in various parts of EC2. I thought I was at a dead end, until I learned about “expiring signed S3 URLs”. These are URLs that you can generate for your assets that include your AWS ID, a signature, and an expiry timeout. You can generate them for however long you want, and anyone with the URL can get the file. So what I can do is generate these URLs for every asset I want and give them, say, a 5-minute window. My boot is only going to take 2 minutes, so that should be long enough. More on the URLs in a minute.

To generate the URLs, you might want a helper library. I used this Perl one. It had some sample code in it that got me thinking further. It had the ability to add stuff to an S3 bucket, iterate the bucket, generate signed URLs, and more. So what if I adapt that to look into some specific bucket and iterate over everything in there in sorted order, then build signed URLs for everything it finds? The result was something like this:

#include
https://s3.amazonaws.com/mybucket/1-install-webserver.sh?AWSAccessKeyId=…&Expires=…&Signature=…
https://s3.amazonaws.com/mybucket/2-install-appserver.sh?AWSAccessKeyId=…&Expires=…&Signature=…
https://s3.amazonaws.com/mybucket/3-configure.sh?AWSAccessKeyId=…&Expires=…&Signature=…
https://s3.amazonaws.com/mybucket/4-start.sh?AWSAccessKeyId=…&Expires=…&Signature=…
So I run my new script and it generates this output, which is in CloudInit “User Data” format for EC2 launches. Notice the signed URLs. Click them if you like; they are all expired by now. I added all these scripts to my bucket and numbered them so the server start-up happens in a stepwise manner: it first installs the web server, then the application server, then configures everything, then starts it all up. Sure, there is a bunch more to the scripts, but they are mostly just common UNIX shell scripts that do what I need done.

A few things I ran into:

+ To debug, I looked at the cloud-init log under /var/log

+ I was trying to use #cloud-config in multiple files, but it wasn’t working well; it seemed like the last #cloud-config ended up being the only one executed. So I used a single #cloud-config to install packages and left everything else to regular old shell scripts that I know how to write and debug. Plus, you can test those by hand.

+ The process runs as root so you don’t need to mess with “su” or “sudo”

+ I had to remember to keep specifying full paths like a good scripter
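That single package-installing #cloud-config mentioned above might look something like this (the package names are illustrative guesses, not the post’s actual list):

```yaml
#cloud-config
# Install the web server and application server packages in one pass;
# everything else happens in the numbered shell scripts.
packages:
  - apache2
  - tomcat6
```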

I later moved all the assets I wanted to install into a different S3 bucket and wrote a little section in that Perl script that created little fetch scripts using wget to pull down the assets.
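Those generated fetch scripts are tiny. As a sketch (the helper name and the /tmp destination are my illustration, not the post’s actual Perl), the part that emits one fetch script could look like this in Python:

```python
def make_fetch_script(signed_url, filename):
    """Render one of the little /bin/sh fetch scripts: wget the signed,
    time-limited S3 URL and stash the asset under /tmp for a later
    install step to pick up."""
    return (
        "#!/bin/sh\n"
        # Single-quote the URL: signed S3 URLs contain '?' and '&'.
        "wget -O /tmp/%s '%s'\n" % (filename, signed_url)
    )
```

A later numbered script then copies the file out of /tmp, sets permissions, and so on.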

It took me a few tries, but now I can run my script, take the output, launch an instance with it, and be up and running with a fully available server in minutes. You could imagine taking this a step further and tying the logic into scripts that detect the health or load of a machine as well. I will worry about that some day in the future.

I thought about posting public versions of the scripts, but I’m not sure that’s the interesting part here. Plus, I didn’t want to sanitize them or support them. The real key lessons here are:

+ Using S3 for asset storage

+ Using S3 for a script repository

+ Accessing S3 via signed timed URLs

+ Using CloudInit to set up your server in minutes

Hope that helps you in your cloud efforts.

  • Jvoclv

    What is a “cloud”? I get the “ran Green/Bear/SoBo at X:XX pace at XXX HR with X,XXX vertical”, but you lost me on this one. Completely.

  • fetheryfeet

    The way my feeble mind explains it to folks (when I worked in telecommunications) is it’s like the roads that connect your house to Brandon’s to mine, and everyone else’s. There are all sorts of resources along the network of roads that we have access to but don’t have to store at our homes. OR, it’s the dark thing above Boulder that drops snow on you on May 1st.

  • I enjoyed the write-up. I like being able to find a way to do what you want/need to be done. Coming from an admin perspective, I would probably lean towards making some sort of gold standard system, then cloning that entire image as needed. Rather than build and pull in post-install scripts and data. I don’t have any EC2 experience, though – so am not sure what challenges there would be with that approach. Got me thinking…

  • Thanks. The problem there was 2 fold for me: 1) you would have to embed the credentials into the image; 2) you wouldn’t be able to easily update your OS/packages/etc. without rebuilding your images. Here you can do both.

  • David

    Thanks! I’m working through this problem and would love to see an example of the CloudInit script you’re including.

It’s lots of different shell scripts, if that’s what you are asking for. None of them are interesting in themselves.

  • David

    Makes sense. I’ve got cloudinit installing all the packages I need and setting permissions I need, but then I have to SCP my actual site files (php) to the instance. So, I was thinking of having cloudinit pull them over from S3, but I’m struggling a bit to figure out what the cloudinit syntax for that would be or whether I use another tool and a script to do it, etc. I was hoping that looking at your scripts would help me figure that out!

  • I used those “2-fetch” scripts to pull things from S3.  They are /bin/sh scripts that merely do a wget of the file I need and store it to temp.  A later script does the work to install the file (copy, permissions, etc.).  The wget uses the S3 time bound URLs as well.

  • Jeremy

    I came to the same conclusions and have been working on implementing just this myself.

    A problem that I’m running across is that the #include http://public_url works fine in user data for me, but a non-public #include http://signed_url doesn’t seem to work for me.  It’s showing up in /var/lib/cloud/data/user-data.txt, but I don’t see anything even attempted in /var/log/cloud-init.txt

    I’m glad to see that it is supposed to work, but I’m baffled at why it’s not working for me.

  • Jeremy

    nevermind.  It looks like I was using an expired URL.  There’s no message in the log if it gets a 403.

  • Jeremy

    note that if it’s a signed url, it’ll need to be https in your #include, rather than http.

  • Sol Kindle

    As far as the multiple #cloud-config files issue you ran into, it looks like there has been a bit of work done around allowing you to merge. You should be able to use multiple files now, although I have not tested it.

  • Charles Mulder


    I’ve run out of ideas, would appreciate some help.

I’m starting an EC2 Ubuntu 12.04 instance and adding the following script to the user data:

    #!/usr/bin/env python
    import sys
    from boto.s3.connection import S3Connection

    AWS_ACCESS_KEY_ID = 'MyAccessId'
    AWS_SECRET_ACCESS_KEY = 'MySecretKey'
    AWS_BOOTSTRAP_BUCKET = 'my-bootstrap-bucket'

    s3 = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

    install = s3.generate_url(300, 'GET', bucket=AWS_BOOTSTRAP_BUCKET, key='bash1.txt', force_http=True)
    config = s3.generate_url(300, 'GET', bucket=AWS_BOOTSTRAP_BUCKET, key='cloud-config.txt', force_http=True)
    start = s3.generate_url(300, 'GET', bucket=AWS_BOOTSTRAP_BUCKET, key='bash2.txt', force_http=True)

    print '#include'
    print install
    print config
    print start


    After the instance has started, I can right click on the instance and View Sys Log.

    I can see the following near the bottom:

    Generating locales…

    en_US.UTF-8… done

    Generation complete.


I can run wget from the instance on the provided URLs and see the contents of the txt files.

    Why aren’t the scripts added via #include working? Any help would be appreciated.

    Kind regards,

