
Customizing Chef Bootstrap Templates

At AppNeta, we use Chef to set up EC2 instances for the purpose of testing our TraceView instrumentation modules.  TraceView has agents for Java, .NET, Python, Ruby, and PHP, and one of the most important requirements for all of these is that they’re easy to deploy (typically less than 5 minutes). This means it should be a single step across OSes, across versions, across VM versions, and across deployment methods. To make sure everything works, we extensively test all of these combinations with Chef.

Specifically, we use the knife-ec2 plugin to spin up an instance and bootstrap Chef onto it, and then the knife-solo plugin to sync the Chef repository onto the node and launch a chef-solo run. This configures the node with everything needed for that particular test, which we then run our installer against. Unfortunately, setting up the stack is often more complex than testing the install itself!

While most of these test platforms can be bootstrapped via the default chef-full.erb template which uses the omnibus installer, there are a few cases where a customized template is either needed or expedient.  I’ll talk about a few of them below.

Note that all the example customized templates below are geared toward solo runs. We only need to configure one of each node type, so introducing the additional complexity of a client/server architecture simply wasn’t worth it. Unlike the client/server bootstrap process, which installs Chef, sets up the node configuration, and starts a Chef run, these solo bootstraps just install Chef and rsync, then leave the rest to knife-solo’s cook command.

Ancient OS Support

Head over to our supported OS list, and you’ll see a few that are not supported by the Chef omnibus installer, one example being Debian Etch.  Fortunately we don’t have to start from scratch, since the chef installation (at least for now) comes with a few bootstrap templates that show the way.  For Etch, I started with the ubuntu10.04-gems.erb template and made a few needed modifications.  The complete template can be seen here.

Update APT Repository

The Etch AMI has an outdated repository setup, so we need to rip out the existing sources and put in working ones.

<%# need to use archive repo sources for etch -%>
rm /etc/apt/sources.list
echo "deb http://archive.debian.org/debian etch main contrib non-free" >> /etc/apt/sources.list
echo "deb http://archive.debian.org/debian-security etch/updates main contrib non-free" >> /etc/apt/sources.list
echo "deb http://archive.debian.org/debian-backports etch-backports main" >> /etc/apt/sources.list

<%# add backports signer key then update -%>
apt-key adv --keyserver pgp.mit.edu --recv EA8E8B2116BA136C &&
apt-get update

Install Chef via Ruby Gem

Since there is no Chef apt package for Etch, we’ll install Chef from rubygems.  However, we need to do a little Ruby environment setup first.

if [ ! -f /usr/bin/chef-client ]; then
  # get rid of existing ruby
  apt-get remove -y --purge ruby ruby1.8 ruby1.8-dev libruby1.8 libopenssl-ruby libopenssl-ruby1.8
  # install ruby 1.8.7 from backports repo
  apt-get install -y libruby1.8= ruby1.8= ruby1.8-dev= libopenssl-ruby1.8= rdoc ri irb build-essential wget ssl-cert curl
  # set ruby 1.8.7 as default ruby
  update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby1.8 100
  # install rubygems
  wget <%= "--proxy=on " if knife_config[:bootstrap_proxy] %>http://production.cf.rubygems.org/rubygems/rubygems-1.8.10.tgz -O - | tar zxf -
  (cd rubygems-1.8.10 && ruby setup.rb --no-format-executable --no-rdoc --no-ri)

  # install chef
  gem update --no-rdoc --no-ri
  gem install ohai --no-rdoc --no-ri --verbose
  gem install mime-types --no-rdoc --no-ri --verbose --version 1.25
  gem install chef --no-rdoc --no-ri --verbose <%= bootstrap_version_string %>
fi

It should be pretty clear from the above snippets that the bootstrap template is basically one big shell script, which is first wrung through the ERB template engine so that it has access to some predefined Chef variables.

Locking to Point Release

We ran into a situation where the instrumentation modules needed to be built (and thus tested) on CentOS that is locked to the 6.4 point release.  On the CentOS AMIs we use, this can be done by disabling all repositories and enabling only the “vaulted” one for the release — before any packages are installed on the system, since an installation could drag in updates from 6.5.

This time I based the custom template off of the default chef-full.erb and just added in a section to deal with the yum repository right after the Chef install.  The complete template can be seen here.

<%# ensure system is locked to 6.4 -%>
if [ ! -f "$new_repo" ]; then
  echo "Locking to CentOS 6.4 vault repo..."
  mkdir -p "$save_dir"
  if ls "${repo_dir}"/*.repo >/dev/null 2>&1 ; then
    cp -n "${repo_dir}"/*.repo "$save_dir"
    rm -f "${repo_dir}"/*.repo
  fi
  cat > "$new_repo" <<"EOF"
# Generated by Chef during bootstrap
name=CentOS-6.4 - Base
EOF
  yum makecache
  echo "Done. System locked to CentOS 6.4"
fi
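For illustration, here is a self-contained sketch of the same lock-down idea, written as a function that takes the repository directory as a parameter so it can be tried against a scratch directory instead of /etc/yum.repos.d. The function name `lock_to_vault`, the `saved` directory, the file name `CentOS-Vault-6.4.repo`, the section id, and the `baseurl` and `gpgkey` values are all assumptions made for this example rather than excerpts from our template, so check them against your own AMI and the vault layout before relying on them.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, file names, and repo values are assumptions.
lock_to_vault() {
  local repo_dir="$1"                                  # /etc/yum.repos.d on a real node
  local save_dir="${repo_dir}/saved"                   # hypothetical backup location
  local new_repo="${repo_dir}/CentOS-Vault-6.4.repo"   # hypothetical file name
  local f

  # Already locked: return early so the step stays idempotent
  if [ -f "$new_repo" ]; then
    return 0
  fi

  mkdir -p "$save_dir"
  # Move every existing repo definition out of the directory yum scans
  for f in "$repo_dir"/*.repo; do
    if [ -e "$f" ]; then
      mv "$f" "$save_dir"/
    fi
  done

  # Quoted delimiter keeps $basearch literal so yum expands it, not the shell
  cat > "$new_repo" <<'EOF'
# Generated during bootstrap (illustrative values)
[C6.4-base]
name=CentOS-6.4 - Base
baseurl=http://vault.centos.org/6.4/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
enabled=1
EOF
}

# Try it on a scratch directory rather than the real yum configuration
demo_dir="$(mktemp -d)"
touch "$demo_dir/CentOS-Base.repo"
lock_to_vault "$demo_dir"
ls "$demo_dir" "$demo_dir/saved"
```

On a real node the function would be followed by a `yum makecache`, as in the template. Moving the old files aside rather than deleting them makes it easy to undo the lock later.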

Our setup of Windows stacks via Chef is still a work in progress. One early roadblock was whether, and how, to do solo runs on Windows.  The pieces were all there:

  • use the knife-windows plugin to create and bootstrap a Windows EC2 instance
  • install Cygwin to get an SSH server and rsync on the Windows node
  • then just use the knife-solo cook command as usual to sync cookbooks and start a solo run

However, there were a few gotchas getting things to cook under Cygwin.  Maybe that’s why the knife-solo docs mention WinSSHd instead?  But Cygwin is free and comes with both sshd and rsync, so I’ll forge ahead and deal with the issues in a customized template.  What we have is working for us so far, but we’re keeping an eye on exactly what the tradeoffs between these two choices are. The complete template can be seen here, but here are a few of the problems we’ve dealt with so far.

Pathname Confusion

knife-solo constructs a command like chef-solo -c ~/chef-solo/solo.rb -j ~/chef-solo/dna.json to launch a solo run.  But when this is run under Windows > Cygwin > ssh we get a few problems:

Administrator@WIN-G3UP33FLLO5 ~
$ chef-solo -c ~/chef-solo/solo.rb -j ~/chef-solo/dna.json
-bash: chef-solo: command not found

OK, let’s try specifying the absolute path to chef-solo, and use the .bat version:

Administrator@WIN-G3UP33FLLO5 ~
$ /cygdrive/c/opscode/chef/bin/chef-solo.bat -c ~/chef-solo/solo.rb -j ~/chef-solo/dna.json
[2014-02-28T21:01:03+00:00] WARN: *****************************************
[2014-02-28T21:01:03+00:00] WARN: Did not find config file: /home/Administrator/chef-solo/solo.rb, using command line options.
[2014-02-28T21:01:03+00:00] WARN: *****************************************
[2014-02-28T21:01:03+00:00] FATAL: Cannot load configuration from /home/Administrator/chef-solo/dna.json

Turns out under Cygwin the drives are prefixed with /cygdrive, so C: becomes /cygdrive/c.  But Ruby under Windows still goes by C:, so to really make it work the command needs to look like /cygdrive/c/opscode/chef/bin/chef-solo.bat -c C:/cygwin/home/Administrator/chef-solo/solo.rb -j C:/cygwin/home/Administrator/chef-solo/dna.json

How to get knife-solo to construct this command without patching it?  Well, it kindly provides the --startup-script option, with which you can specify a file that will be sourced right before the solo run.  If this startup script defines a shell function that wraps the call to chef-solo and fixes the paths, it should fix my problems.  This is where the custom bootstrap template comes in, this time based off of knife-windows’ default windows-chef-client-msi.erb.  I’ll have it create a startup script after Chef is installed:

@echo Set up Cygwin environment for user...
@rem Part 1: create a file with helper function and env var
> c:\cygwin\home\administrator\chef_helper (
echo.function chef-solo^(^) {
echo.    # change the home path so ruby understands, i.e. C:/cygwin/home
echo.    processed_args=${*/\/home/C:\/cygwin\/home}
echo.    /cygdrive/c/opscode/chef/bin/chef-solo.bat $processed_args
echo.}
)

Appropriately for Windows, the template is one big batch file full of DOS commands (still pre-processed by ERB).  The above snippet creates a file called chef_helper on the Windows instance, with the following content:

function chef-solo() {
    # change the home path so ruby understands, i.e. C:/cygwin/home
    processed_args=${*/\/home/C:\/cygwin\/home}
    /cygdrive/c/opscode/chef/bin/chef-solo.bat $processed_args
}
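To see what the substitution inside the wrapper does, here is a small sketch that can be run in any bash shell. `translate_home_args` is a hypothetical stand-in for the wrapper: it applies the same parameter expansion but prints the rewritten arguments instead of calling chef-solo.bat. The paths are the already-expanded form of the `~/chef-solo/...` arguments from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the chef-solo wrapper: it applies the same
# parameter expansion but prints the result instead of launching Chef.
translate_home_args() {
  # ${*/pattern/replacement} rewrites the first match of "/home" in each
  # argument, turning Cygwin-style paths into ones Windows Ruby can open
  processed_args=${*/\/home/C:\/cygwin\/home}
  printf '%s\n' "$processed_args"
}

# The shell expands ~ to /home/Administrator before the function sees it,
# so the already-expanded form is passed here
translate_home_args -c /home/Administrator/chef-solo/solo.rb -j /home/Administrator/chef-solo/dna.json
```

Two limitations are worth knowing: only the first `/home` in each argument is rewritten, and because the wrapper passes `$processed_args` unquoted, a path containing spaces would be split into several arguments.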

Finally, we can launch the solo run!  I’ll just test this by ssh-ing into the Windows node and simulating what the cook command would do, i.e. sourcing our chef_helper before launching the solo run:

Administrator@WIN-G3UP33FLLO5 ~
$ . chef_helper && chef-solo -c ~/chef-solo/solo.rb -j ~/chef-solo/dna.json
Starting Chef Client, version 11.10.4
Chef Client failed. 0 resources updated in 10.639275 seconds
[2014-02-28T21:04:31+00:00] FATAL: TypeError: powershell_script[fix-explorer] (stack::default_windows line 21) had an error: TypeError: can't convert nil into String

As usual things don’t work the first time.  Digging into the failure, it seems Chef expects a Windows environment variable PROCESSOR_ARCHITECTURE which goes missing when you ssh into Windows.  A quick search on “PROCESSOR_ARCHITECTURE missing under ssh” explains that Cygwin only imports a defined set of system variables, so we’ll have to somehow propagate this value on our own — by tacking it onto the helper script.  And while at it, I’ll put in a step to make sure the helper script has Unix line-endings:

@echo Set up Cygwin environment for user...
@rem Part 1: create a file with helper function and env var
> c:\cygwin\home\administrator\helper.temp (
echo.function chef-solo^(^) {
echo.    # change the home path so ruby understands, i.e. C:/cygwin/home
echo.    processed_args=${*/\/home/C:\/cygwin\/home}
echo.    /cygdrive/c/opscode/chef/bin/chef-solo.bat $processed_args
echo.}
echo.export PROCESSOR_ARCHITECTURE=%processor_architecture%
)
   @echo Failed to create helper script with status code !ERRORLEVEL!. > "&2"
@rem Part 2: clean up line endings and do a sanity test
@set cygexec=c:\cygwin\bin\bash.exe --login -c
@call !cygexec! "cat helper.temp | tr -d '\r' >chef_helper"
@call !cygexec! "rm helper.temp"
@call !cygexec! "test -f chef_helper && . chef_helper"
   @echo Failed to finalize helper script with status code !ERRORLEVEL!. > "&2"
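The line-ending cleanup is not cosmetic: a file written by cmd.exe ends each line with CRLF, and bash treats the stray carriage return as part of the value. A minimal sketch of the effect on a scratch file follows. The variable name `DEMO_ARCH` and its value are made-up placeholders, not what ends up in our helper.

```shell
#!/usr/bin/env bash
# Build a scratch "helper" with Windows (CRLF) line endings, the way a file
# written from a batch script would look. Name and value are placeholders.
work="$(mktemp -d)"
printf 'export DEMO_ARCH=AMD64\r\n' > "$work/helper.temp"

# Same cleanup as in the template: drop every carriage return
tr -d '\r' < "$work/helper.temp" > "$work/chef_helper"

# Source each file in a subshell and measure the length of the value:
# the raw file leaves a trailing carriage return in it, the cleaned one does not
raw_len="$(. "$work/helper.temp"; printf '%s' "${#DEMO_ARCH}")"
clean_len="$(. "$work/chef_helper"; printf '%s' "${#DEMO_ARCH}")"
echo "raw=$raw_len clean=$clean_len"
```

The raw file yields a six-character value and the cleaned one the expected five, which is the kind of invisible difference that makes later string comparisons fail in confusing ways.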

Missing Chef Log Output

Now we’re finally getting a completed solo run, but why don’t I see any log output on STDOUT?  It turns out that under Cygwin, Ruby’s IO.tty? method returns false (you can find more details by searching for “ruby io.tty? under cygwin”), so the solo run doesn’t bother logging to standard output.  My workaround is definitely a hack, but I’ll do pretty much anything at this point to get things going, so the final bit is to override IO.tty? to return true and trick Chef into giving more feedback.

@rem Part 3: customize chef-solo to override the IO.tty? method
@copy c:\opscode\chef\bin\chef-solo c:\opscode\chef\bin\chef-solo-orig
@call !cygexec! "sed -i '/require .rubygems./a ## override to always return true\nclass IO; def tty?; true end end' /cygdrive/c/opscode/chef/bin/chef-solo"
@call !cygexec! "test -f /cygdrive/c/opscode/chef/bin/chef-solo-orig && /cygdrive/c/opscode/chef/bin/chef-solo.bat --version"
   @echo Failed to customize chef-solo with status code !ERRORLEVEL!. > "&2"
) else (
   @echo Set up Cygwin environment for user succeeded.
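The backup-then-patch step can be tried safely on a scratch file before pointing it at a real Chef install. The sketch below uses a three-line stand-in for the chef-solo launcher (its contents are illustrative only) and assumes GNU sed, since `-i` and the one-line form of the `a` command are GNU extensions.

```shell
#!/usr/bin/env bash
# Scratch stand-in for the chef-solo launcher; contents are illustrative only.
script="$(mktemp)"
printf '%s\n' "#!/usr/bin/env ruby" "require 'rubygems'" 'version = ">= 0"' > "$script"

# Keep a pristine copy, then append the override right after the require line.
# Assumes GNU sed: -i and the one-line form of "a" are GNU extensions.
cp "$script" "$script-orig"
sed -i "/require .rubygems./a class IO; def tty?; true end end" "$script"

# Show exactly what changed (diff exits 1 when the files differ, hence || true)
diff -u "$script-orig" "$script" || true
```

Keeping the `-orig` copy next to the patched file makes the change easy to inspect with diff and easy to revert if a Chef upgrade replaces the launcher.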

The final result is that after bootstrapping a Windows instance with this customized template, there is a chef_helper wrapper script under the login user’s home directory:

$ cat chef_helper
function chef-solo() {
    # change the home path so ruby understands, i.e. C:/cygwin/home
    processed_args=${*/\/home/C:\/cygwin\/home}
    /cygdrive/c/opscode/chef/bin/chef-solo.bat $processed_args
}


And a slightly tweaked chef-solo that overrides IO.tty? in Chef’s ruby process:

$ diff -bu /cygdrive/c/opscode/chef/bin/chef-solo-orig /cygdrive/c/opscode/chef/bin/chef-solo
--- /cygdrive/c/opscode/chef/bin/chef-solo-orig	2014-02-20 21:11:48.000000000 +0000
+++ /cygdrive/c/opscode/chef/bin/chef-solo	2014-02-28 21:45:41.657481600 +0000
@@ -7,6 +7,8 @@

 require 'rubygems'
+## override to always return true
+class IO; def tty?; true end end

 version = ">= 0"

Beyond Instrumentation

While customizing templates under omnibus has worked for us so far, testing instrumentation is certainly not the only application of this strategy. I hope this article has shown some interesting ways in which a customized Chef bootstrap template can be part of the automation toolbox.  It can play well with other similar mechanisms like EC2’s User Data and Ubuntu cloud-init, and in fact we are using both User Data (to enable WinRM and install Cygwin) and the custom bootstrap template as shown above to automate Windows stack setup.

We’re looking for other opportunities to use this approach in our infrastructure. If you’re doing something similar, let us know in the comments!

Team AppNeta: