This post illustrates how you can have a single script on your workstation (yes, of course, it’s a Mac) that provisions a new Windows EC2 instance and bootstraps it using Opscode Chef – written from the point of view of someone who is used to doing this all the time with ease for Linux instances using the knife-ec2 gem. I’ll assume the reader:
- has a basic working knowledge of Opscode Chef
- is using Hosted Chef
- already has a working chef-repo workstation with knife configured
- already has (or can figure out) knife-ec2 installed and configured with AWS API credentials
- is on their own for creating actual cookbooks and roles to configure their Windows instances
This is fairly easy to do with Linux instances. Using knife ec2 server create and a bunch of parameters, a single command provisions a new Linux instance in EC2, waits for it to come up, connects to it over SSH using the specified key pair, installs chef-client, and bootstraps the node using the specified run_list. Done.
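For reference, that Linux flow can be scripted as one knife ec2 server create call. The sketch below only assembles the argument list in Ruby; the flags are knife-ec2’s long options (--image, --flavor, --ssh-key, --identity-file, --ssh-user, --run-list), while the method name, AMI ID, key pair, key path, SSH user and roles are hypothetical placeholders:

```ruby
# Hypothetical helper: builds the argv for a one-shot Linux provision + bootstrap.
# All values passed in below are placeholders, not real resources.
def linux_create_args(image:, flavor:, key_name:, identity_file:, run_list:, ssh_user: "ubuntu")
  [
    "knife", "ec2", "server", "create",
    "--image", image,
    "--flavor", flavor,
    "--ssh-key", key_name,             # name of the EC2 key pair
    "--identity-file", identity_file,  # local private key matching that key pair
    "--ssh-user", ssh_user,
    "--run-list", run_list.join(",")   # knife expects a comma-separated run_list
  ]
end

args = linux_create_args(
  image: "ami-12345678", flavor: "m1.small", key_name: "my-keypair",
  identity_file: "/Users/me/.ssh/my-keypair.pem",
  run_list: ["role[base]", "role[webserver]"]
)
puts args.join(" ")
# system(*args) would run it, blocking until SSH is up and chef-client has run.
```

Passing the array to system(*args), rather than one string, keeps spaces or shell metacharacters in values from being reinterpreted by a shell.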
However, things are not so simple for Windows Server instances.
Working with Windows instances in EC2 using Chef presents a few hurdles:
- Windows doesn’t natively support SSH.
- The knife ec2 server create command waits for the instance to accept SSH connections. There is no option to circumvent this.
- Windows takes forever to provision.
- Windows instances typically get a random Administrator password generated for them that takes over 15 minutes to retrieve.
- The knife-windows gem provides a knife bootstrap windows winrm command that can bootstrap an existing Windows instance with Chef, but cannot provision a new instance.
- The knife bootstrap windows winrm command requires WinRM to be configured on the instance (which it isn’t by default), requires the Administrator password of the instance (which defaults to a random value), and requires the public IP address of the instance (which we don’t know until the instance is up).
Below I’ll provide a simplified example script that demonstrates how we can hack together a few techniques to create an all-in-one solution for bringing up new Windows Server nodes in the Amazon cloud. Other than knife and all the other prerequisites mentioned above, you’ll need to make sure you have the following Ruby gems installed:
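At a minimum, the approach relies on knife-ec2 (provisioning) and knife-windows (the knife bootstrap windows winrm command). A small pre-flight check along these lines, using RubyGems’ Gem::Specification.find_all_by_name, can fail fast before anything is provisioned; the helper name is hypothetical and the gem list is an assumption to extend for your own setup:

```ruby
require "rubygems"

# Hypothetical helper: returns the names that have no installed gem specification.
def missing_gems(names)
  names.select { |name| Gem::Specification.find_all_by_name(name).empty? }
end

missing = missing_gems(%w[knife-ec2 knife-windows])
if missing.empty?
  puts "All required gems are installed."
else
  puts "Missing gems: #{missing.join(', ')} (try: gem install #{missing.join(' ')})"
end
```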
The Tricks
The Ruby script below uses a few nasty tricks to make this all work:
First, we write a temporary “user data” file to pass to the EC2 API. This gets executed by the new instance when it is first provisioned. There are two tricks we need to stick into the user data file:
- A BAT <script> that configures WinRM (Windows Remote Management), which is what we’ll use to connect to and bootstrap the instance.
- A <powershell> script that sets the Administrator password to a value we define. This makes it so we don’t have to wait 15+ minutes for EC2 to generate a password for us, and then retrieve it manually through the GUI.
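A minimal sketch of generating that user data file in Ruby. The WinRM commands are the usual ones for allowing unencrypted HTTP and Basic auth on port 5985 (acceptable for a throwaway demo, not for production), and the PowerShell half resets the local Administrator password through ADSI. It assumes the AMI runs the EC2Config service, which executes <script> and <powershell> blocks from user data on first boot; the method names are hypothetical, and the password is assumed to contain no single quote:

```ruby
require "tempfile"

# Hypothetical helper: EC2 user data for a Windows AMI.
# <script> runs as a batch file; <powershell> runs as a PowerShell script.
def windows_user_data(admin_password)
  <<~USERDATA
    <script>
    winrm quickconfig -q
    winrm set winrm/config/winrs @{MaxMemoryPerShellMB="300"}
    winrm set winrm/config @{MaxTimeoutms="1800000"}
    winrm set winrm/config/service @{AllowUnencrypted="true"}
    winrm set winrm/config/service/auth @{Basic="true"}
    netsh advfirewall firewall add rule name="WinRM 5985" protocol=TCP dir=in localport=5985 action=allow
    </script>
    <powershell>
    $admin = [adsi]("WinNT://./administrator, user")
    $admin.psbase.invoke("SetPassword", '#{admin_password}')
    </powershell>
  USERDATA
end

# Writes the user data to a temp file and returns the Tempfile object.
# The file stays on disk after close until the object is unlinked or collected.
def write_user_data(admin_password)
  file = Tempfile.new(["user-data", ".txt"])
  file.write(windows_user_data(admin_password))
  file.close
  file
end
```

Returning the Tempfile object, rather than just its path, matters: Ruby deletes a Tempfile’s file when the object is garbage-collected, so the caller should hold on to it until knife has read the file.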
Then, we use knife ec2 server create to provision the Windows instance to specification, passing in that user data file. This works great for provisioning the instance, but since it was not really designed for Windows and WinRM, there are two tricks we have to employ here:
- Execute the knife command in a sub-process and read its STDOUT until we see it output the new instance’s public IP address. We grab that and save it for the next step.
- This is also our cue to bail out of knife ec2 server create. If you were doing this manually, you’d hit CTRL-C here, while knife is saying “Waiting for sshd” (which is never going to come up). We do that by sending the sub-process a SIGTERM signal.
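Both tricks fit in one helper. This sketch uses IO.popen to run knife, echoes its output, and sends SIGTERM with Process.kill as soon as a line carrying the public IP shows up. The helper name is hypothetical, and the output label (“Public IP Address:”) is an assumption about how knife-ec2 reports a new server; adjust the regular expression if your version prints something different:

```ruby
# Hypothetical helper: runs +cmd+ (an argv array), returns the first public IP
# it prints, and SIGTERMs the child at that point (the scripted CTRL-C).
# Returns nil if the process exits without printing an address.
def run_until_public_ip(cmd)
  ip_pattern = /Public IP Address:\s*(\d{1,3}(?:\.\d{1,3}){3})/
  IO.popen(cmd, err: [:child, :out]) do |io|
    io.each_line do |line|
      $stdout.print(line)                   # echo knife's progress
      plain = line.gsub(/\e\[[\d;]*m/, "")  # strip ANSI colour codes, if any
      if (match = plain.match(ip_pattern))
        Process.kill("TERM", io.pid)        # stop waiting for sshd
        return match[1]                     # popen's block closes the pipe and reaps the child
      end
    end
  end
  nil
end
```

Called with the Windows knife ec2 server create argument array, it returns the address while the instance keeps booting; killing the local knife process does not affect the EC2 instance, which already exists by then.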
Now, we can’t just move on to bootstrapping the node, because it is still booting up, and WinRM may not be configured yet. The trick here is to create a TCP socket to the WinRM port, using the IP address we acquired in the previous step, and wait for it to connect. If it fails to connect, try again until it does. By the time this succeeds, we know WinRM is up and accepting connections. However, we don’t know if the rest of the system is ready, and moving straight on to the bootstrapping step runs into intermittent errors. I’ve seen this manifest as an authentication error, presumably because we tried to bootstrap over WinRM before the PowerShell script set the password. There may be other mysteries of the Windows universe lurking here as well. My solution: sleep for two minutes. Lame, I know… but so far it is the only thing that has reliably worked.
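The polling loop might look like this minimal sketch, built on Ruby’s Socket.tcp with a connect timeout. The helper name, port 5985 (WinRM over HTTP) and the timeout values are assumptions; the deliberate two-minute pause remains a separate sleep once the port answers:

```ruby
require "socket"

# Hypothetical helper: polls host:port until a TCP connection succeeds
# (true) or +timeout+ seconds elapse (false).
def wait_for_port(host, port, timeout: 900, interval: 5, connect_timeout: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # The block form closes the socket for us; we only care that it connected.
      Socket.tcp(host, port, connect_timeout: connect_timeout) { return true }
    rescue SystemCallError, SocketError, IOError
      # Refused, unreachable, or timed out: the listener isn't up yet.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

In the flow described above, a true result would be followed by sleep 120 before bootstrapping, and false is a good reason to abort rather than bootstrap.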
Finally, we can bootstrap the new, running Windows instance with the knife bootstrap windows winrm command, using the IP address we acquired, the password we specified in the user data, and the other knife params we want to use, such as the run_list and environment.
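Assembling that final command is mostly string plumbing. A minimal sketch, assuming knife-windows’ --winrm-user, --winrm-password and --node-name options plus the usual --run-list and --environment; the helper name and the IP, password and roles in the example call are placeholders:

```ruby
# Hypothetical helper: argv for bootstrapping an already-running Windows node.
def bootstrap_args(ip, password:, run_list:, environment:, node_name: nil)
  args = [
    "knife", "bootstrap", "windows", "winrm", ip,
    "--winrm-user", "Administrator",
    "--winrm-password", password,      # the value set by the <powershell> user data
    "--run-list", run_list.join(","),
    "--environment", environment
  ]
  args += ["--node-name", node_name] if node_name
  args
end

args = bootstrap_args("203.0.113.10", password: "S3cret!Pass",
                      run_list: ["role[windows_base]"], environment: "staging")
# system(*args) or abort("bootstrap failed")
```

The password ends up on the command line, where other local users could see it in the process list; on a single-user workstation that is usually tolerable for a demo.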
The Script
Here is a stripped-down version of this script demonstrating all these tricks. As you can see, all the custom configuration is hard-coded in constants at the top of the script. You would obviously fill in your own information however you like – via command-line params, interactive prompts, config files, etc.
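As a rough illustration of that constants-at-the-top layout (a sketch with made-up values, not the original script), here feeding a Windows flavour of knife ec2 server create that passes the user data file via --user-data and leaves the run_list for the later WinRM bootstrap. Every constant value and the windows_create_args name are hypothetical, and --user-data assumes a knife-ec2 version that supports that option:

```ruby
# Hypothetical configuration; replace every value with your own.
AMI_ID         = "ami-00000000"     # a Windows Server AMI in your region
FLAVOR         = "m1.large"
SECURITY_GROUP = "windows-winrm"    # must allow inbound TCP 5985
KEY_PAIR       = "my-keypair"
RUN_LIST       = ["role[windows_base]"]
ENVIRONMENT    = "staging"
ADMIN_PASSWORD = "Change-Me-123"    # applied by the <powershell> user data

# Provision only: no run_list here, since bootstrapping happens later over WinRM.
def windows_create_args(user_data_path)
  [
    "knife", "ec2", "server", "create",
    "--image", AMI_ID,
    "--flavor", FLAVOR,
    "--groups", SECURITY_GROUP,
    "--ssh-key", KEY_PAIR,
    "--user-data", user_data_path
  ]
end

puts windows_create_args("/tmp/user-data.txt").join(" ")
```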
Big thanks to my colleague Jeremy Groh, who paired through this with me and did the bulk of the heavy lifting on the Windows side, especially with the WinRM and password-reset parts.