Shit Gary Says

...things I don't want to forget

Building a Functional Puppet Workflow Part 3: Dynamic Environments With R10k

Workflows are like kickball games: everyone knows the general idea of what’s going on, there’s an orderly progression towards an end-goal, nobody wants to be excluded, and people lose their shit when they get hit in the face by a big rubber ball. Okay, so maybe it’s not a perfect mapping but you get the idea.

The previous two posts (one and two) focused on writing modules, wrapping modules, and classification. While all of these things are very important in the grand scheme of things, one of the biggest problems people get hung up on is how to iterate upon your modules and, more importantly, how to eventually get those changes pushed into production in a reasonably orderly fashion.

This post is going to be all over the place. We’re gonna cover the idea of separate environments in Puppet, touch on dynamic environments, and round it out with that mother-of-a-shell-script-turned-personal-savior, R10k. Hold on to your shit.

Puppet Environments

Puppet has the concept of ‘environments’ where you can logically separate your modules and manifest (read: site.pp) into separate folders to allow for nodes to get entirely separate bits of code based on which ‘environment’ the node belongs to.

Puppet environments are statically set in puppet.conf, but, as other blog posts have noted, you can do some crafty things in puppet.conf to get yourself 'dynamic environments'.

NOTE: The solutions in this post are going to rely on Puppet environments; however, environments aren’t without their own shortcomings (namely, this bug on Ruby plugins in Puppet). For testing and promoting Puppet classes written in the DSL, environments will help you out greatly. For complete separation of Ruby instances and any plugins to Puppet written in Ruby, however, you’ll need separate masters (which is something that I won’t be covering in this article).

One step further – ‘dynamic’ environments

Adrien Thebo, henceforth known as ‘Finch’ – who is known for building awesome things and talking like he’s fresh from a Redbull binge – created the now-famous blog post on creating dynamic environments in Puppet with git. That post relied upon a post-receive hook to do all the jiggery-pokery necessary to check out the correct branches in the correct places, and thus it had a heavy reliance upon git.

Truly, the only magic in puppet.conf was the inclusion of ‘$environment’ in the modulepath configuration entry on the Puppet master (literally that string, and not the evaluated form of your environment). By doing that, the Puppet master would replace the string ‘$environment’ with the environment of the node checking in and would look to that path for Puppet manifests and modules. If you used something OTHER than git, it would be up to you to create a post-receive hook that populated those paths, but you could still replicate the results (albeit with a little work on your part).
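For reference, here’s a minimal sketch of what that looked like in puppet.conf (the paths here match the Puppet Enterprise layout used later in this post; adjust for your own install):

puppet.conf
[main]
    # Puppet substitutes each agent's environment for the LITERAL string $environment
    modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules
    manifest   = /etc/puppetlabs/puppet/environments/$environment/manifests/site.pp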

People used this pattern and it worked fairly well. Hell, it STILL works fairly well; nothing has changed to STOP you from using it. What changed, however, was the ecosystem around modules, the need for individual module testing, and the further need to automate this whole goddamn process.

Before we deliver the ‘NEW SOLUTION’, let’s provide a bit of history and context.

Module repositories: the one-to-many problem

I touched on this topic in the first post, but one of the first problems you encounter when putting your modules in version control is whether or not to have ONE GIANT REPO with all of your modules, or a repository for every module you create. In the past we recommended putting every module in one repository (namely because it was easier, the module sharing landscape was pretty barren, and teams were smaller). Now, we recommend the opposite for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With the one-giant-repo solution, modules you pull down from the Forge (or Github) must be committed to your repo. Having a repository for each module allows you to keep everything separate.
  • Publishing a module to the Forge (or Github/Stash/whatever) is easier with separate repos (rather than having to split out the module later).

The problem with having a repository for every Puppet module you create is that you need a way to map every module to every Puppet master (and also to track which version of every module should be installed in which Puppet environment).

A project called librarian-puppet sprang up that created the ‘Puppetfile’, a file that would map modules and their versions to a specific directory. Librarian was awesome, but, as Finch noted in his post, it had some shortcomings when used in an environment with many fast-changing modules. His solution, which he documented here, was the tool we now know as R10k.

Enter R10k

R10k is essentially a Ruby project that wraps a bunch of shell commands you would NORMALLY use to maintain an environment of ever-changing Puppet modules. Its power is in its ability to use Git branches combined with a Puppetfile to keep your Puppet environments in-sync. Because of this, R10k is CURRENTLY restricted to git. There have been rumblings of porting it to Hg or svn, but I know of no serious attempts at doing this (and if you ARE doing this, may god have mercy on your soul). Great, so how does it work?

Well, you’ll need one main repository SIMPLY for tracking the Puppetfile. I’ve got one right here, and it only has my Puppetfile and a site.pp file for classification (should you use it).

NOTE: The Puppetfile and librarian-puppet-like capabilities under the hood are going to be doing most of the work here – this repository is solely so you can create topic branches with changes to your Puppetfile that will eventually become dynamically-created Puppet environments.

Let’s take a look at the Puppetfile and see what’s going on:

Puppetfile
forge "http://forge.puppetlabs.com"

# Modules from the Puppet Forge
mod "puppetlabs/stdlib"
mod "puppetlabs/apache", "0.11.0"
mod "puppetlabs/pe_gem"
mod "puppetlabs/mysql"
mod "puppetlabs/firewall"
mod "puppetlabs/vcsrepo"
mod "puppetlabs/git"
mod "puppetlabs/inifile"
mod "zack/r10k"
mod "gentoo/portage"
mod "thias/vsftpd"


# Modules from Github using various references
mod "wordpress",
  :git => "git://github.com/hunner/puppet-wordpress.git",
  :ref => '0.4.0'

mod "property_list_key",
  :git => "git://github.com/glarizza/puppet-property_list_key.git",
  :ref => '952a65d9ea2c5809f4e18f30537925ee45548abc'

mod 'redis',
  :git => 'git://github.com/glarizza/puppet-redis',
  :ref => 'feature/debian_support'

This example lists the syntax for dealing with modules from both the Forge and Github, as well as pulling specific versions of modules (release versions in the case of the Forge, or git references like tags, branches, or even specific commits in the case of Github). The syntax is not hard to follow – just remember that we’re mapping modules and their versions to a set/known environment.

For every topic branch on this repository (containing the Puppetfile), R10k will in turn create a Puppet environment with the same name. For this reason, it’s convention to rename the ‘master’ branch to ‘production’, since that’s the default environment in Puppet (note that renaming branches locally is easy – renaming the branch on Github can sometimes be a pain in the ass). This also shows why it would be somewhat hard to map R10k to Subversion, for example: it lacks a lightweight branching scheme.
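As a quick sketch, renaming a local ‘master’ branch to ‘production’ and moving the remote over looks something like this (assuming a remote named ‘origin’; on Github you’d also need to change the repo’s default branch before deleting the old remote ‘master’):

git branch -m master production
git push origin production:production
git push origin :master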

Explaining any more of R10k would read just like describing its installation, so let’s quit screwing around and actually INSTALL/SETUP the damn thing.

Setting up R10k

As I mentioned before, we have the main repository that will be used to track the Puppetfile, which in turn will track the modules to be installed (whether from The Forge, Github, or some internal git repo). Like any good Puppet component, R10k itself can be set up with a Puppet module. The module I’ll be using was developed by Zack Smith, and it’s pretty simple to get started with. Let’s download it from the Forge first:

[root@master1 vagrant]# puppet module install zack/r10k
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v1.0.2)
  ├─┬ gentoo-portage (v2.1.0)
  │ └── puppetlabs-concat (v1.0.1)
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.1.0)
  ├── puppetlabs-git (v0.0.3)
  ├── puppetlabs-inifile (v1.0.1)
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.1.0)
  └── puppetlabs-vcsrepo (v0.2.0)

The module will be installed into the first path in your modulepath, which in the case above is /etc/puppetlabs/puppet/modules. This modulepath will change due to the way we’re going to set up our dynamic Puppet environments. For this example, I’m going to have environments dynamically generated at /etc/puppetlabs/puppet/environments, so let’s create that directory first:

[root@master1 vagrant]# mkdir -p /etc/puppetlabs/puppet/environments

Now, we need to set up R10k on this machine. The module we downloaded will allow us to do that, but we’ll need to create a small Puppet manifest that will allow us to set up R10k out-of-band from a regular Puppet run (you CAN continuously enforce R10k configuration in-band with your regular Puppet run, but if we’re setting up a Puppet master to use R10k to serve out dynamic environments, it’s possible to create a chicken-and-egg situation). Let’s generate a file called r10k_installation.pp in /var/tmp and have it look like the following:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.3',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

So what is every section of that declaration doing?

  • version => '1.1.3' sets the version of the R10k gem to install
  • sources => {...} is a hash of sources that R10k is going to track. For now it’s only our main Puppet repo, but you can also track a Hiera installation too. This hash accepts key/value pairs for configuration settings that are going to be written to /etc/r10k.yaml, which is R10k’s main configuration file. The keys in use are: remote, which is the path to the repository to be checked out by R10k; basedir, which is the path on-disk where dynamic environments are to be created (we’re using the $::settings::confdir variable, which maps to the Puppet master’s configuration directory, or /etc/puppetlabs/puppet); and prefix, which is a boolean that determines whether to use R10k’s source-prefixing feature. NOTE: the false value is a BOOLEAN value, and thus SHOULD NOT BE QUOTED. Quoting it turns it into a string, and a non-empty string matches as a boolean TRUE value. Don’t quote false – that’s bad, mmkay.
  • purgedirs => ["${::settings::confdir}/environments"] configures R10k to purge the environments directory (so any folders that R10k didn’t create will be deleted). This configuration MAY be moot with newer versions of R10k, as I believe it implements this behavior by default.
  • manage_modulepath => true will ensure that this module sets the modulepath configuration item in /etc/puppetlabs/puppet/puppet.conf.
  • modulepath => ... sets the modulepath value to be dropped into /etc/puppetlabs/puppet/puppet.conf. Note that we are interpolating variables ($::settings::confdir again), AND inserting the LITERAL string of $environment into the modulepath – this is because Puppet will replace $environment with the value of the agent’s environment at catalog compilation.

JUST IN CASE YOU MISSED IT: Don’t quote the false value for the prefix setting in the sources block. That is all.
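For reference, the /etc/r10k.yaml generated from the manifest above should look roughly like this (the cachedir shown is the module’s default at the time; exact formatting may vary by module version):

/etc/r10k.yaml
:cachedir: '/var/cache/r10k'
:sources:
  :puppet:
    remote: 'https://github.com/glarizza/puppet_repository.git'
    basedir: '/etc/puppetlabs/puppet/environments'
:purgedirs:
  - '/etc/puppetlabs/puppet/environments'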

Okay, we have our one-time Puppet manifest, and now the only thing left to do is to run it:

[root@master1 tmp]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 2.05 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}0b619d5148ea493e2d6a5bb205727f0c'
Notice: /Stage[main]/R10k::Config/Ini_setting[R10k Modulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '/etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules'
Notice: /Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: Finished catalog run in 10.55 seconds

At this point, it goes without saying that git needs to be installed, but if you’re firing up a new VM that DOESN’T have git, then R10k is going to spit out an awesome error – so ensure that git is installed. After that, let’s synchronize R10k with the r10k deploy environment -pv command (-p for Puppetfile synchronization and -v for verbose mode):

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

I ran this first synchronization in verbose mode so you can see exactly what’s getting copied where. Further synchronizations don’t have to be in verbose mode, but it’s good for debugging. After all of that, we have an /etc/puppetlabs/puppet/environments folder containing our dynamic Puppet environments based off of the branches of the main Puppet repo:

[root@master1 puppet]# ls -lah /etc/puppetlabs/puppet/environments/
total 20K
drwxr-xr-x 5 root root 4.0K Feb 19 11:44 .
drwxr-xr-x 7 root root 4.0K Feb 19 11:25 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 puppet]# cd /etc/puppetlabs/puppet/environments/production/
[root@master1 production]# git branch -a
  master
* production
  remotes/origin/HEAD -> origin/master
  remotes/origin/development
  remotes/origin/master
  remotes/origin/production

As you can see (at the time of this writing), my main Puppet repo has three main branches: development, master, and production, and so R10k created three Puppet environments matching those names. It’s somewhat of a convention to rename the master branch to production, but in this case I left it alone to demonstrate how this works.

ONE OTHER BIG GOTCHA: R10k does NOT resolve dependencies, and so it is UP TO YOU to track them in your Puppetfile. Check this out:

[root@master1 production]# puppet module list
Warning: Module 'puppetlabs-firewall' (v1.0.0) fails to meet some dependencies:
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-firewall' (v0.3.x)
Warning: Module 'puppetlabs-stdlib' (v4.1.0) fails to meet some dependencies:
  'puppetlabs-pe_accounts' (v2.0.1) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-pe_mcollective' (v0.1.14) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-request_manager' (v0.0.10) requires 'puppetlabs-stdlib' (v3.2.x)
Warning: Missing dependency 'cprice404-inifile':
  'puppetlabs-pe_puppetdb' (v0.0.11) requires 'cprice404-inifile' (>=0.9.0)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'cprice404-inifile' (v0.10.x)
  'puppetlabs-puppetdb' (v1.5.1) requires 'cprice404-inifile' (>= 0.10.3)
Warning: Missing dependency 'puppetlabs-concat':
  'puppetlabs-apache' (v0.11.0) requires 'puppetlabs-concat' (>= 1.0.0)
  'gentoo-portage' (v2.1.0) requires 'puppetlabs-concat' (v1.0.x)
Warning: Missing dependency 'puppetlabs-gcc':
  'zack-r10k' (v1.0.2) requires 'puppetlabs-gcc' (>= 0.0.3)
/etc/puppetlabs/puppet/environments/production/modules
├── gentoo-portage (v2.1.0)
├── mhuffnagle-make (v0.0.2)
├── property_list_key (???)
├── puppetlabs-apache (v0.11.0)
├── puppetlabs-firewall (v1.0.0)  invalid
├── puppetlabs-git (v0.0.3)
├── puppetlabs-inifile (v1.0.1)
├── puppetlabs-mysql (v2.2.1)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.1.0)
├── puppetlabs-stdlib (v4.1.0)  invalid
├── puppetlabs-vcsrepo (v0.2.0)
├── redis (???)
├── ripienaar-concat (v0.2.0)
├── thias-vsftpd (v0.2.0)
├── wordpress (???)
└── zack-r10k (v1.0.2)
/opt/puppet/share/puppet/modules
├── cprice404-inifile (v0.10.3)
├── puppetlabs-apt (v1.1.0)
├── puppetlabs-auth_conf (v0.1.7)
├── puppetlabs-firewall (v0.3.0)  invalid
├── puppetlabs-java_ks (v1.1.0)
├── puppetlabs-pe_accounts (v2.0.1)
├── puppetlabs-pe_common (v0.1.0)
├── puppetlabs-pe_mcollective (v0.1.14)
├── puppetlabs-pe_postgresql (v0.0.5)
├── puppetlabs-pe_puppetdb (v0.0.11)
├── puppetlabs-postgresql (v2.5.0)
├── puppetlabs-puppet_enterprise (v3.1.0)
├── puppetlabs-puppetdb (v1.5.1)
├── puppetlabs-reboot (v0.1.2)
├── puppetlabs-request_manager (v0.0.10)
├── puppetlabs-stdlib (v3.2.0)  invalid
└── ripienaar-concat (v0.2.0)

I’ve installed Puppet Enterprise 3.1.0, and so /opt/puppet/share/puppet/modules reflects the state of the Puppet Enterprise (also known as ‘PE’) modules at that time. You can see that there are some conflicts because certain modules require certain versions of other modules. This is currently the nature of the beast with regard to Puppet modules. Some of these errors are loud and incidental (i.e. someone set a dependency on a version and forgot to update it), some are due to namespace changes (i.e. cprice404-inifile being ported over to puppetlabs-inifile), and so on. Basically, ensure that you handle the dependencies you care about inside the Puppetfile, as R10k won’t do it for you.
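For example, to quiet the missing-dependency warnings above, you’d add the dependencies to the Puppetfile yourself – something like the following (the versions here are illustrative, not gospel):

Puppetfile
mod "puppetlabs/concat", "1.0.0"
mod "puppetlabs/gcc"
mod "cprice404/inifile", "0.10.3"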

There – we’ve done it! We’ve configured R10k! Now how the hell do you use it?

R10k demonstration – from module iteration to environment iteration

Let’s take the environment we set up in the previous steps and walk through adding a new module to your production environment, iterating upon that module, pushing the changes to that module, pushing the changes to a Puppet environment, and then promoting those changes to production.

NOTES ON THE SETUP OF THIS DEMO:

  • In this demonstration, classification method is going to be left to the user (i.e. it’s not a part of the magic). So, when I tell you to classify your node with a specific class, I don’t care if you use the Puppet Enterprise Console, site.pp, or any other manner.
  • I’m using Github for my repositories so that you folks watching and playing along at home can have something to follow. Feel free to swap Github for something like Atlassian Stash/Bitbucket, internal repos, or whatever.

Add the module to an environment

The module we’ll be working with, a simple module called ‘notifyme’, will notify a message that will help us track the module’s progress through all phases of iteration.

The first thing we need to do is to add the module to an environment, so let’s dynamically create a NEW environment by creating a new topic branch and pushing it up to the main puppet repo. I will perform this step on my laptop and outside of the VM I’m using to test R10k:

└(~/src/puppet_repository)▷ git branch
  master
* production

└(~/src/puppet_repository)▷ git checkout -b notifyme
Switched to a new branch 'notifyme'

└(~/src/puppet_repository)▷ vim Puppetfile

# Perform the changes to Puppetfile here

└(~/src/puppet_repository)▷ git add Puppetfile
└(~/src/puppet_repository)▷ git commit
[notifyme 5239538] Add the 'notifyme' module
 1 file changed, 3 insertions(+)

└(~/src/puppet_repository)▷ git push origin notifyme:notifyme
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      notifyme -> notifyme

The contents I added to my Puppetfile look like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git"

Perform an R10k synchronization

To pull the new dynamic environment down to the Puppet master, do another R10k synchronization with r10k deploy environment -pv:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
<snip for brevity>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment notifyme
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/notifyme/modules
<more snipping>

I only included the relevant messages, but you can see that it pulled in a new environment called ‘notifyme’ that ALSO pulled in a module called ‘notifyme’.

Rename the branch to avoid confusion

Suddenly I realize that this may get confusing: having both an environment called ‘notifyme’ and a module/class called ‘notifyme’. No worries, how about we rename that branch?

└(~/src/puppet_repository)▷ git branch -m notifyme garysawesomeenvironment

└(~/src/puppet_repository)▷ git push origin :notifyme
To https://github.com/glarizza/puppet_repository.git
 - [deleted]         notifyme

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      garysawesomeenvironment -> garysawesomeenvironment

That bit of git renamed the ‘notifyme’ branch to ‘garysawesomeenvironment’. The next git command is a bit tricky – when you git push to a remote, the form is:

git push name_of_origin local_branch_name:remote_branch_name

In our case, the name of our origin is LITERALLY ‘origin’, but we actually want to DELETE a remote branch. The way to delete a local branch is with git branch -d branch_name, but the way to delete a REMOTE branch is to push NOTHING to it. So consider the following command:

git push origin :notifyme

We’re pushing to the origin named ‘origin’, but providing NO local branch name and pushing that bit of nothing to the remote branch of ‘notifyme’. This kills (deletes) the remote branch.

Finally, we push to our origin named ‘origin’ again and push the contents of the local branch ‘garysawesomeenvironment’ to the remote branch ‘garysawesomeenvironment’, which in turn CREATES that branch if it doesn’t exist. Whew. Let’s run another damn synchronization:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<more snippage>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
<more of that snipping shit>
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Cool, let’s check out our environments folder on our VM:

[root@master1 production]# ls -lah /etc/puppetlabs/puppet/environments/
total 24K
drwxr-xr-x 6 root root 4.0K Feb 19 13:34 .
drwxr-xr-x 7 root root 4.0K Feb 19 12:09 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 13:33 garysawesomeenvironment
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 production]# cd /etc/puppetlabs/puppet/environments/garysawesomeenvironment/

[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

Run Puppet to test the new environment

Perfect! Now classify your node to include the ‘notifyme’ class, and let’s run Puppet to see what we get when we try to join the environment called ‘garysawesomeenvironment’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snipping facts loading for brevity>
Info: Caching catalog for master1
Info: Applying configuration version '1392845863'
Notice: This is the notifyme module and its master branch
Notice: /Stage[main]/Notifyme/Notify[This is the notifyme module and its master branch]/message: defined 'message' as 'This is the notifyme module and its master branch'
Notice: Finished catalog run in 11.10 seconds

Cool! Now let’s try to run Puppet with another environment, say ‘production’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping facts loading for brevity>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class notifyme for master1 on node master1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

We get an error because that module hasn’t been loaded by R10k for that environment.

Tie a module version to an environment

Okay, so we added a module to a new environment, but what if we want to test out a specific commit, branch, or tag of a module and test it in this new environment? This is frequently what you’ll be doing – making a change to an existing module, pushing your change to a topic branch of that module’s repository, tying it to an environment (or creating a new environment by branching the main Puppet repository), and then testing the change.

Let’s go back to my ‘notifyme’ module that I’ve cloned to my laptop and push a change to a BRANCH of that module’s Github repository:

└(~/src/puppet-notifyme)▷ git branch
* master

└(~/src/puppet-notifyme)▷ git checkout -b change_the_message
Switched to a new branch 'change_the_message'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the notify message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[change_the_message bc3975b] Change the Message
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin change_the_message:change_the_message
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 448 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      change_the_message -> change_the_message

└(~/src/puppet-notifyme)▷ git branch -a
* change_the_message
  master
  remotes/origin/change_the_message
  remotes/origin/master

└(~/src/puppet-notifyme)▷ git log
commit bc3975bb5c75ada86bfc2c45db628b5a156f85ce
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 13:55:26 2014 -0800

    Change the Message

    This commit changes the message to test my workflow.

What I’m showing you is the workflow: create a new local branch called ‘change_the_message’ in the notifyme module’s repo, change the message in my notify resource, commit the change, and push the changes to a remote branch ALSO called ‘change_the_message’.

Because I created a topic branch, I can provide that branch name in the Puppetfile located in the ‘garysawesomeenvironment’ branch of the main Puppet repo. THAT is the piece that ties together the specific version of the module with the Puppet environment we want on the Puppet master. Here’s that change:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'change_the_message'

Again, that change gets put into the ‘garysawesomeenvironment’ branch of the main Puppet repo and pushed up to the remote:

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment 89b139c] Update garysawesomeenvironment
 1 file changed, 2 insertions(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 411 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5239538..89b139c  garysawesomeenvironment -> garysawesomeenvironment

└(~/src/puppet_repository)▷ git log -p
commit 89b139c8c2faa888a402b98ea76e4ca138b3463d
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:04:18 2014 -0800

    Update garysawesomeenvironment

    Tie this environment to the 'change_the_message' branch of my notifyme module.

diff --git a/Puppetfile b/Puppetfile
index 5e5d091..27fc06e 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -31,4 +31,5 @@ mod 'redis',
   :ref => 'feature/debian_support'

 mod "notifyme",
-  :git => "git://github.com/glarizza/puppet-notifyme.git"
+  :git => "git://github.com/glarizza/puppet-notifyme.git",
+  :ref => 'change_the_message'

Now let’s synchronize again!!

[root@master1 garysawesomeenvironment]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<snip>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
<snip>

Cool, let’s check our work on the VM:

[root@master1 garysawesomeenvironment]# pwd
/etc/puppetlabs/puppet/environments/garysawesomeenvironment
[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

And finally, let’s run Puppet:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392847743'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.10 seconds

TADA! We’ve successfully tied a specific version of a module to a specific dynamic environment, deployed it to a master, and tested it out! Smell that? That’s the smell of awesome. Or Jeff in the next cubicle eating a burrito. Either way, I like it.

Merge your changes with master/production

It’s green – fuck it; ship it! NOW you’re speaking ‘agile’! Assuming everything went according to plan, let’s merge our changes in with the production environment and synchronize. This is up to your company’s workflow docs (whether you use pull requests, a merge master, or poke Patrick and tell him to tell Andy to merge in your change). I’m using git and Github, so let’s merge.

First, do the Module:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge change_the_message
Updating d44a790..bc3975b
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   d44a790..bc3975b  master -> master

└(~/src/puppet-notifyme)▷ cat manifests/init.pp
class notifyme {
  notify { "This is the changed message in the change_the_message branch": }
}

So now we have an issue, and that issue is that the production environment has YET to have the ‘notifyme’ module added to it. If we merge the contents of the ‘garysawesomeenvironment’ branch with the ‘production’ branch of the main Puppet repo, then we’re going to be pointing at the ‘change_the_message’ branch of the ‘notifyme’ module (because that was our last commit).

Because of this, I can’t do a straight merge, can I? For posterity’s sake (in the event that someone in the future wants to look for that branch on my Github repo), I’m going to keep that branch alive. In a production environment, I most likely would NOT have additional branches open for all my component modules, as that would get pretty annoying/confusing. Understand that this is a one-off case because I’m doing a demo. So instead of merging, I’m going to modify the Puppetfile in the ‘production’ branch of the main Puppet repo:

└(~/src/puppet_repository)▷ git checkout production
Switched to branch 'production'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes here

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[production a74f269] Add notifyme module to Production environment
 1 file changed, 4 insertions(+)

└(~/src/puppet_repository)▷ git push origin production:production
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 362 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5ecefc8..a74f269  production -> production

└(~/src/puppet_repository)▷ git log -p
commit a74f26975102f3786eedddace89bda086162d801
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:24:05 2014 -0800

    Add notifyme module to Production environment

diff --git a/Puppetfile b/Puppetfile
index 0b1da68..9168a81 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -29,3 +29,7 @@ mod "property_list_key",
 mod 'redis',
   :git => 'git://github.com/glarizza/puppet-redis',
   :ref => 'feature/debian_support'
+
+mod 'notifyme',
+  :git => 'git://github.com/glarizza/puppet-notifyme'
+

Alright, we’ve updated the production environment, now synchronize again (I’ll spare you and do it WITHOUT verbose mode):

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

Okay, now run Puppet with the PRODUCTION environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.66 seconds

Beautiful, we’re synchronized!!!

Making a change to an EXISTING module in an environment

Okay, so we saw previously how to add a NEW module to an environment, but what if we already HAVE a module in an environment and we want to make an update/change to it? Well, it’s largely the same process:

  • Cut a branch to the module
  • Commit your code and push it up to the module’s repo
  • Cut a branch to the main Puppet repo
  • Push that branch up to the main Puppet repo
  • Perform an R10k synchronization to sync the environments
  • Test your changes
  • Merge the changes with the master branch of the module
  • DONE!

Let’s go back and change that notify message again, shall we?

└(~/src/puppet-notifyme)▷ git checkout -b 'another_change'
Switched to a new branch 'another_change'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[another_change 608166e] Change the message that already exists!
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin another_change:another_change
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 426 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      another_change -> another_change

Okay, let’s re-use ‘garysawesomeenvironment’ because I like the name, but tie it to the new ‘another_change’ branch of the ‘notifyme’ module:

└(~/src/puppet_repository)▷ git checkout garysawesomeenvironment
Switched to branch 'garysawesomeenvironment'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make change to Puppetfile to tie it to 'another_change' branch

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment ce84a30] Tie garysawesomeenvironment to 'another_change'
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 386 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   89b139c..ce84a30  garysawesomeenvironment -> garysawesomeenvironment

The Puppetfile for that branch now has an entry for the ‘notifyme’ module that looks like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'another_change'

Okay, synchronize again!

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now run Puppet in the ‘garysawesomeenvironment’ environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392849521'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 12.54 seconds

There’s the message that I changed in the ‘another_change’ branch of my ‘notifyme’ module! What’s it look like if I run in the ‘production’ environment, though?

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 14.11 seconds

There’s the old message that’s in the ‘master’ branch of the ‘notifyme’ module (which is where the ‘production’ branch Puppetfile is pointing). To merge the changes into the production environment, we now only have to do one thing: merge the ‘another_change’ branch of the ‘notifyme’ module into its ‘master’ branch – that’s it! Why? Because the Puppetfile in the production branch of the main Puppet repo (and thus the production Puppet ENVIRONMENT) is already POINTING at the master branch of the ‘notifyme’ module. Let’s do the merge:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge another_change
Updating bc3975b..608166e
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   bc3975b..608166e  master -> master

Another R10k synchronization is needed on the master:

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now let’s run Puppet in the production environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392850004'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 11.82 seconds

There’s the message that was previously in the ‘another_change’ branch that’s been merged to the ‘master’ branch (and thus is entered into the production Puppet environment).

OR, use tags

One more note – for production environments that want a BIT more stability (rather than hoping that everyone follows the policy of pushing commits to a BRANCH of a module instead of pushing directly to master – by accident or otherwise – and allowing that commit to make it DIRECTLY into production), the better way is to tie all modules to some sort of release version. For modules released to the Puppet Forge, that’s a version number; for modules stored in git repositories, that’s a tag. Tying all modules in your production environment (and thus your production Puppetfile) to specific tags in git repositories IS a “best practice” that gives the code executed in production some sort of ‘safe guard’.
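As a sketch, assuming the ‘notifyme’ repo had a ‘1.0.0’ tag cut from master (a hypothetical tag for illustration), the production Puppetfile entry would pin to it like so:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => '1.0.0'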

TL;DR: The example above that tied production to the ‘master’ branch was a demo, and not necessarily recommended for your production needs.

Holy crap, that’s a lot to take in…

Yeah, tell me about it. And, believe it or not, I’m STILL not done with everything that I want to talk about regarding R10k – there’s still more info on:

  • Using R10k with a monolithic modules repo
  • Incorporating Hiera data
  • Triggering R10k with MCollective
  • Tying R10k to CI workflow

Those will come in a later post once I have time to decide how to tackle them. Until then, this should give you more than enough information to get started with R10k in your own environment.

If you have any questions/comments/corrections, PLEASE enter them in the comments below and I’ll be happy to respond when I’m not flying from gig to gig! :) Cheers!

EDIT: 2/19/2014 – correct librarian-puppet assumption thanks to Reid Vandewiele

Building a Functional Puppet Workflow Part 2: Roles and Profiles

In my first post, I talked about writing functional component modules. Well, I didn’t really do much detailing other than pointing out key bits of information that tend to cause problems. In this post, I’ll describe the next layer to the functional Puppet module workflow.

People usually stop once they have a library of component modules (whether hand-written, taken from Github, or pulled from The Forge). The idea is that you can classify all of your nodes in site.pp, the Puppet Enterprise Console, The Foreman, or with some other ENC, so why not just declare all your classes for every node when you need them?

Because that’s a lot of extra work and opportunities for fuckups.

People recognized this, so in the EARLY days of Puppet they would create node blocks in site.pp and use inheritance to inherit from those blocks. This was the right IDEA, but probably not the best PLACE for it. Eventually, ‘Profiles’ were born.

The idea of ‘Roles and Profiles’ originally came from a piece that Craig Dunn wrote while he worked for the BBC, and then Adrien Thebo also wrote a piece that documents the same sort of pattern. So why am I writing about it a THIRD time? Well, because I feel it’s only a PIECE of an overall puzzle. The introduction of Hiera and other awesome tools (like R10k, which we will get to in the next post) still makes Roles and Profiles VIABLE, but also extends upon them.

One final note before we move on – the terms ‘Roles’ and ‘Profiles’ are ENTIRELY ARBITRARY. They’re not magic reserved words in Puppet, and you can call them whatever the hell you want. It’s also been pointed out that Craig MIGHT have misnamed them (a ROLE should be a model for an individual piece of tech, and a PROFILE should probably be a group of roles), but, like all good Puppet Labs employees – we suck at naming things.

Profiles: technology-specific wrapper classes

A profile is simply a wrapper class that groups Hiera lookups and class declarations into one functional unit. For example, if you wanted Wordpress installed on a machine, you’d probably need to declare the apache class to get Apache set up, declare an apache::vhost for the Wordpress directory, set up a MySQL database with the appropriate classes, and so on. There are a lot of components that go together when you set up a piece of technology; it’s not just a single class.

Because of this, a profile exists to give you a single class you can include that will set up all the necessary bits for that piece of technology (be it Wordpress, or Tomcat, or whatever).

Let’s look at a simple profile for Wordpress:

profiles/manifests/wordpress.pp
class profiles::wordpress {

  ## Hiera lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')
  $wordpress_user          = hiera('profiles::wordpress::wordpress_user')
  $wordpress_group         = hiera('profiles::wordpress::wordpress_group')
  $wordpress_docroot       = hiera('profiles::wordpress::wordpress_docroot')
  $wordpress_port          = hiera('profiles::wordpress::wordpress_port')

  ## Create user
  group { 'wordpress':
    ensure => present,
    name   => $wordpress_group,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => $wordpress_group,
    password => $wordpress_user_password,
    name     => $wordpress_user,
    home     => $wordpress_docroot,
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => $wordpress_port,
    docroot => $wordpress_docroot,
  }

  ## Configure wordpress
  class { '::wordpress':
    install_dir => $wordpress_docroot,
    db_name     => $wordpress_db_name,
    db_host     => $wordpress_db_host,
    db_password => $wordpress_db_password,
  }
}

Name your profiles according to the technology they setup

Profiles are technology-specific, so you’ll have one to setup wordpress, and tomcat, and jenkins, and…well, you get the picture. You can also namespace your profiles so that you have profiles::ssh::server and profiles::ssh::client if you want. You can even have profiles::jenkins::tomcat and profiles::jenkins::jboss or however you need to namespace according to the TECHNOLOGIES you use. You don’t need to include your environment in the profile name (a la profiles::dev::tomcat) as the bits of data that make the dev environment different from production should come from HIERA, and thus aren’t going to be different on a per-profile basis. You CAN setup profiles according to your business unit if multiple units use Puppet and have different setups (a la security::profiles::tomcat versus ops::profiles::tomcat), but the GOAL of Puppet is to have one main set of modules that every group uses (and the Hiera data being different for every group). That’s the GOAL, but I’m pragmatic enough to understand that not everywhere is a shiny, happy ‘DevOps Garden.’
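To make the namespacing concrete, here’s a minimal sketch of what a profiles::ssh::server class might look like (the ssh component module and its port parameter are hypothetical – the point is the shape, not the contents):

class profiles::ssh::server {
  ## All external data comes in via Hiera...
  $port = hiera('profiles::ssh::server::port')

  ## ...and gets passed explicitly to the component module
  class { '::ssh::server':
    port => $port,
  }
}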

Do all Hiera lookups in the profile

You’ll see that I declared variables and set their values with Hiera lookups. The profile is the place for these lookups because the profile collects all external data and declares all the classes you’ll need. In reality, you’ll USUALLY only see profiles looking up parameters and declaring classes (i.e. declaring users and groups like I did above will USUALLY be left to component classes).

I do the Hiera lookups first to make it easy to debug from where those values came. I don’t rely on ‘Automatic Parameter Lookup’ in Puppet 3.x.x because it can be ‘magic’ for people who aren’t aware of it (for people new to Puppet, it’s much easier to see a function call and trace back what it does rather than experience Puppet doing something unseen and wondering what the hell happened).

Finally, you’ll notice that my Hiera lookups have NO DEFAULT VALUES – this is BY DESIGN! For most people, their Hiera data is PROBABLY located in a separate repository from their Puppet module data. Imagine making a change to your profile to have it lookup a bit of data from Hiera, and then imagine you FORGOT to put that data into Hiera. What happens if you provide a default value to the lookup? The catalog compiles, that default value gets passed down to the component module, and gets enforced on disk. If you have good tests, you MIGHT see that the component you configured has a bit of data that’s not correct, but what if you don’t have a great post-Puppet testing workflow? Puppet will happily set this default value, according to Puppet everything is green and worked just fine, but now your component is setup incorrectly. That’s one of the WORST failures – the ones that you don’t catch. Now, imagine you DON’T provide a default value. In THIS case, Puppet will raise a compilation error because a Hiera lookup didn’t return a value, and you’ll catch the screwup before anything gets pushed to Production. This is a MUCH better solution.
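As an illustration, the Hiera data backing the Wordpress profile above might live in a YAML file somewhere in your hierarchy – something like this (the filename and values are made up; the key names just have to match the lookups in the profile):

# common.yaml (or wherever your hiera.yaml hierarchy dictates)
profiles::wordpress::site_name: 'blog.example.com'
profiles::wordpress::mysql_root_password: 'supersecret'
profiles::wordpress::wordpress_db_host: 'localhost'
profiles::wordpress::wordpress_db_name: 'wordpress'
profiles::wordpress::wordpress_db_password: 'alsosecret'
# ...and so on for the remaining profiles::wordpress::* keys

Leave one of those keys out and the catalog fails to compile – which, per the above, is exactly what you want.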

Use parameterized class declarations and explicitly pass values you care about

The parameterized class declaration syntax can be dangerous. The difference between the include function and the parameterized class syntax is that the include function is idempotent. You can do the following in a Puppet manifest, and Puppet doesn’t raise an error:

include apache
include apache
include apache

This is because the include function checks to see if the class is in the catalog. If it ISN’T, then it adds it. If it IS, then it exits cleanly. The include function is your pal.

Consider THIS manifest:

class { 'apache': }
include apache
include apache

Does this work? Yep. The parameterized class syntax adds the class to the catalog, the include function detects this and exits cleanly twice. What about THIS manifest:

include apache
class { 'apache': }
include apache

Does THIS work? Nope! Puppet raises a compilation error because a class was declared more than once in a catalog. Why? Well, consider that Puppet is ‘declarative’…all the way up until it isn’t. Puppet’s PARSER reads from the top of the file to the bottom of the file, and we have a single-pass parser when it comes to things like setting variables and declaring classes. When the parser hits the first include function, it adds the class to the catalog. The parameterized class syntax, however, is a honey badger: it doesn’t give a shit. It adds a class to the catalog regardless of whether it already exists or not. So why would we EVER use the parameterized class declaration syntax? We need to use it because the include function doesn’t allow you to pass parameters when you declare a class.

So wait – why did I spend all this time explaining why the parameterized class syntax is more dangerous than the include function ONLY to recommend its use in profiles? For two reasons:

  • We need to use it to pass parameters to classes
  • We’re wrapping its use in a class that we can IN TURN declare with the include function

Yes, we can get the best of BOTH worlds, the ability to pass parameters and the use of our pal the include function, with this wrapper class. We’ll see the latter usage when we come to roles, but for now let’s focus on passing parameter values.

In the first section, we set variables with Hiera lookups; now we can pass those variables to classes we’re declaring with the parameterized class syntax. This allows the declaration of the class to be static, while the parameters we pass to that class change according to the Hiera hierarchy. We’ve explicitly called the hiera function, so it’s easier to debug, and we’re explicitly passing parameter values so we know definitively which parameters are being passed (and thus are overriding default values) to the component module. Finally, since our component modules do NOT use Hiera at all, we can be sure that, if we’re not passing a parameter, it’s getting its value from the default set in the module’s ::params class.

Everything we do here is meant to make things easier to debug when it’s 3am and things aren’t working. Any asshole can do crazy shit in Puppet, but a seasoned sysadmin writes their code for ease of debugging during 3am pages.

An annoying Puppet bug – top-level class declarations and profiles

Oh, ticket 2053, how terrible are you? This is one of those bug numbers that I can remember by heart (like 8040 and 86). Puppet has the ability to do ‘relative namespacing’, which allows you to declare a variable called $port in a class called apache and refer to it as $port instead of fully-namespacing the variable, and thus having to call it $apache::port inside the apache class. It’s a shortcut – you can STILL refer to the variable as $apache::port in the class – but it comes in handy. The PROBLEM occurs when you create a profile, as we did above, called profiles::wordpress and you try to declare a class called wordpress. If you do the following inside the profiles::wordpress class, what class is being declared?

include wordpress

If you think you’re declaring a wordpress class from within a wordpress module in your Puppet modulepath, you would be wrong. Puppet ACTUALLY thinks you’re trying to declare profiles::wordpress because you’re INSIDE the profiles::wordpress class and it’s doing relative namespacing (i.e. in the same way you refer to $port and ACTUALLY mean $apache::port, it thinks you’re referring to wordpress and ACTUALLY mean profiles::wordpress).

Needless to say, this causes LOTS of confusion.

The solution here is to declare a class called ::wordpress which tells Puppet to go to the top-level namespace and look for a module called wordpress which has a top-level class called wordpress. It’s the same reason that we refer to Facter Fact values as $::osfamily instead of $osfamily in class definitions (because you can declare a local variable called $osfamily in your class). This is why in the profile above you see this:

class { '::wordpress':
  install_dir => $wordpress_docroot,
  db_name     => $wordpress_db_name,
  db_host     => $wordpress_db_host,
  db_password => $wordpress_db_password,
}

When you use profiles and roles, you’ll need to do this namespacing trick when declaring classes because you’re frequently going to have a profiles::<sometech> class that will declare the <sometech> top-level class.

Roles: business-specific wrapper classes

How do you refer to your machines? When I ask you about that cluster over there, do you say “Oh, you mean the machines with java 1.6, apache, mysql, etc…”? I didn’t think so. You usually have names for them, like the “internal compute cluster” or “app builder nodes” or “DMZ repo machines” or whatever. These names are your Roles. Roles are just the mapping of your machine’s names to the technology that should be ON them. In the past we had descriptive hostnames that afforded us a code for what the machine ‘did’ – roles are just that mapping for Puppet.

Roles are namespaced just like profiles, but now it’s up to your organization to fill in the blanks. Some people immediately want to put environments into the roles (a la roles::uat::compute_cluster), but that’s usually not necessary (as MOST LIKELY the compute cluster nodes have the SAME technology on them when they’re in dev versus when they’re in prod, it’s just the DATA – like database names, VIP locations, usernames/passwords, etc – that’s different. Again, these data differences will come from Hiera, so there should be no reason to put the environment name in your role). You still CAN put the environment name in the role if it makes you feel better, but it’ll probably be useless.

Roles ONLY include profiles

So what exactly is in the role wrapper class? That depends on what technology is on the node that defines that role. What I can tell you for CERTAIN is that roles should ONLY use the include function and should ONLY include profiles. What does this give us? This gives us our pal the include function back! You can include the same profile 100 times if you want, and Puppet only puts it in the catalog once.
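As a sketch, a role ends up being as boring as this (the role and profile names here are hypothetical):

class roles::app_server {
  include profiles::security::base
  include profiles::tomcat
  include profiles::our_application
}

Boring is the point – a role should read like a parts list.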

Every node is classified with one role. Period.

The beautiful thing about roles and profiles is that the GOAL is that you should be able to classify a node with a SINGLE role and THAT’S IT. This makes classification simple and static – the node gets its role, the role includes profiles, profiles call out to Hiera for data, that data is passed to component modules, and away we go. Also, since classification is static, you can use version control to see what changes were introduced to the role (i.e. what profiles were added or removed). In my opinion, if you need to apply more than one role to a node, you’ve introduced a new role (see below).
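In site.pp terms, classification then boils down to a one-line node block (hostname hypothetical, reusing the sketch role from above):

node 'appserver01.example.com' {
  include roles::app_server
}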

Roles CAN use inheritance…if you like

I’ve seen people implement roles a couple of different ways, and one of them is to use inheritance to build a catalog. For example, you can define a base roles class that includes something like a base security profile (i.e. something that EVERY node in your infrastructure should have). Moving down the line, you COULD namespace according to function like roles::app for your application server machines. The roles::app class could inherit from the roles class (which gets the base security profile), and could then include the profiles necessary to setup an application server. Next, you could subclass down to roles::app::site_foo for an application server that supports some site in your organization. That class inherits from the roles::app class, and then adds profiles that are specific to that site (maybe they use Jboss instead of Tomcat, and thus that’s where the differentiation occurs). This is great because you don’t have a lot of repeated use of the include function, but it also makes it hard to definitively look at a specific role to see exactly what’s being declared (i.e. all the profiles). You have to weigh what you value more: less typing or greater visibility. I will err on the side of greater visibility (just due to that whole 3am outage thing), but it’s up to you to decide what to optimize for.
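Sketched out, an inheritance-based role tree might look like this (all names hypothetical):

class roles {
  ## Every node in the infrastructure gets this
  include profiles::security::base
}

class roles::app inherits roles {
  ## Everything common to application servers
  include profiles::users::app_admins
}

class roles::app::site_foo inherits roles::app {
  ## Site-specific differentiation happens here
  include profiles::jboss
}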

A role similar to, yet different from, another role is: a new role

EVERYBODY says to me “Gary, I have this machine that’s an AWFUL LOT like this role over here, but…it’s different.” My answer to them is: “Great, that’s another role.” If the thing that’s different is data (i.e. which database to connect to, or what IP address to route traffic through), then that difference should be put in HIERA and the classification should remain the same. If that difference is technology-specific (i.e. this server uses JBoss instead of Tomcat) then first look and see if you can isolate how you know this machine is different (maybe it’s on a different subnet, maybe it’s at a different location, something like that). If you can figure that out and write a Fact for it (or use similar conditional logic to determine this logically), then you can just drop that conditional logic in your role and let it do the heavy lifting. If, in the end, this bit of data is totally arbitrary, then you’ll need to create another role (perhaps a subclass using the above namespacing) and assign it to your node.
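For the Fact-driven case, the conditional logic sits right in the role – something like this sketch, which assumes a custom Fact called $::datacenter that you’ve written:

class roles::app_server {
  include profiles::security::base

  case $::datacenter {
    'east':  { include profiles::tomcat }
    'west':  { include profiles::jboss }
    default: { fail("Unexpected datacenter: ${::datacenter}") }
  }
}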

The hardest thing about this setup is naming your roles. Why? Every site is different. It’s hard for me to account for differences in your setup because your workplace is dysfunctional (seriously).

Review: what does this get you?

Let’s walk through every level of this setup from the top to the bottom and see what it gets you. Every node is classified to a single role, and, for the most part, that classification isn’t going to change. Now you can take all the extra work off your classifier tool and put it back into the manifests (that are subject to version control, so you can git blame to your heart’s content and see who last changed the role/profile). Each role is going to include one or more profiles, which gives us the added idempotent protection of the include function (of course, if profiles have collisions with classes you’ll have to resolve those. Say one or more profiles tries to include an apache class – simply break that component out into a separate profile, extract the parameters from Hiera, and include that profile at a higher level). Each profile is going to do Hiera lookups, which should give you the ability to provide different data for different host types (i.e. different data on a per-environment level, or however you lay out your Hiera hierarchy), and that data will be passed directly to the class that is declared. Finally, each component module will accept parameters as variables internal to that module, default parameters/variables to sane values in the ::params class, and use those variables when declaring each resource throughout its classes.

  • Roles abstract profiles
  • Profiles abstract component modules
  • Hiera abstracts configuration data
  • Component modules abstract resources
  • Resources abstract the underlying OS implementation

Choose your level of comfortability

The roles and profiles pattern also buys you something else – the ability for less-skilled and more-skilled Puppet users to work with the same codebase. Let’s say you use some GUI classifier (like the Puppet Enterprise Console), someone who’s less skilled at Puppet looks and sees that a node is classified with a certain role, so they open the role file and see something like this:

include profiles::wordpress
include profiles::tomcat
include profiles::git::repo_server

That’s pretty legible, right? Someone who doesn’t regularly use Puppet can probably make a good guess as to what’s on the machine. Need more information? Open one of the profiles and look specifically at the classes that are being declared. Need to know the data being passed? Jump into Hiera. Need to know more information? Dig into each component module and see what’s going on there.

When you have everything abstracted correctly, you can have developers providing data (like build versions) to Hiera, junior admins grouping nodes for classification, more senior folk updating profiles, and your best Puppet people creating/updating component modules and building plugins like custom facts/functions/whatever.

Great! Now go and refactor…

If you’ve used Puppet for more than a month, you’re probably familiar with the “Oh shit, I should have done it THAT way…let me refactor this” game. I know, it sucks, and we at Puppet Labs haven’t been shy about incorporating something that we feel will help people out (but will also require some refactoring). This pattern, though, has been in use by the Professional Services team at Puppet Labs for over a year without modification. I’ve used this on sites GREAT and small, and every site with which I’ve consulted and implemented this pattern has been able to both understand its power and derive real value within a week. If you’re contemplating a refactor, you can’t go wrong with Roles and Profiles (or whatever names you decide to use).

Building a Functional Puppet Workflow Part 1: Module Structure

Working as a professional services engineer for Puppet Labs, my life consists almost entirely of either correcting some of the worst code atrocities you’ve seen in your life, or helping people get started with Puppet so that they don’t need to call us again due to: A.) Said code atrocities or B.) Refactoring the work we JUST helped them start. It wasn’t ALWAYS like this – I can remember some of my earliest gigs, and I almost feel like I should go revisit them if only to correct some of the previous ‘best practices’ that didn’t quite pan out.

This would be exactly why I’m wary of ‘Best Practices’ – because one person’s ‘Best Practice’ is another person’s ‘What the fuck did you just do?!’

Having said that, I’m finding myself repeating a story over and over again when I train/consult, and that’s the story of ‘The Usable Puppet Workflow.’ Everybody wants to know ‘The Right Way™’, and I feel like we finally have a way that survives a reasonable test of time. I’ve been promoting this workflow for over a year (which is a HELL of a long time in Startup time), and I’ve yet to really see an edge case it couldn’t handle.

(If you’re already savvy: yes, this is the Roles and Profiles talk)

I’ll be breaking this workflow down into separate blog posts for every component, and, as always, your comments are welcome…

It all starts with the component module

The first piece of a functional Puppet deployment starts with what we call ‘component modules’. Component modules are the lowest level in your deployment, and are modules that configure specific pieces of technology (like apache, ntp, mysql, etc…). Component modules are well-encapsulated, have a reasonable API, and focus on doing small, specific things really well (i.e. the *nix way).

I don’t want to write thousands of words on building component modules because I feel like others have done this better than I. As examples, check out RI’s Post on a simple module structure, Puppet Labs’ very own docs on the subject, and even Alessandro’s Puppetconf 2012 session. Instead, I’d like to provide some pointers on what I feel makes a good component module, and some ‘gotchas’ we’ve noticed.

Parameters are your API

In the current world of Puppet, you MUST define the parameters your module will accept in the Puppet DSL. Also, every parameter MUST ultimately have a value when Puppet compiles the catalog (whether by explicitly passing this parameter value when declaring the class, or by assuming a default value). Yes, it’s funny that, when writing a Puppet class, if you typo a VARIABLE Puppet will not alert you to this (in a NON use strict-ian sort of approach) and will happily accept a variable in an undefined state, but the second you don’t pass a value to your class parameter you’re in for a rude compilation error. This is the way of Puppet classes at the time of this writing, so you’re going to see Puppet classes with LINES of defined parameters. I expect this to change in the future (please let this change in the near future), but for now, it’s a necessary evil.

The parameters you expose to your top-level class (i.e. given class names like apache and apache::install, I’m talking specifically about apache) should be treated as an API to your module. IDEALLY, they’re the ONLY THING that a user needs to modify when using your module. Also, whenever possible, it should be the case that a user need ONLY interact with the top-level class when using your module (of course, defined resource types like apache::vhost are used on an ad-hoc basis, and thus are the exception here).

Inherit the ::params class

We’re starting to make enemies at this point. It’s been a convention for modules to use a ::params class to assign values to all variables that are going to be used for all classes inside the module. The idea is that the ::params class is the one-stop-shop to see where a variable is set. Also, to get access to a variable that’s set in a Puppet class, you have to declare the class (i.e. use the include() function or inherit from that class). When you declare a class that has both variables AND resources, those resources get put into the catalog, which means that Puppet ENFORCES THE STATE of those resources. What if you only needed a variable’s value and didn’t want to enforce the rest of the resources in that class? There’s no good way in Puppet to do that. Finally, when you inherit from a class in Puppet that has assigned variable values, you ALSO get access to those variables in the parameter definition section of your class – i.e. the following section of the class:

class apache (
  $port = $apache::params::port,
  $user = $apache::params::user,
) inherits apache::params {
  # ...the rest of the class body goes here...
}

See how I set the default value of $apache::port to $apache::params::port? I could only access the value of the variable $apache::params::port in that section by inheriting from the apache::params class. I couldn’t insert include apache::params below that section and be allowed access to the variable up in the parameter defaults section (due to the way that Puppet parses classes).
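For reference, the apache::params class being inherited might look something like this (a sketch with illustrative values, not the real puppetlabs-apache module):

class apache::params {
  case $::osfamily {
    'RedHat': {
      $port = '80'
      $user = 'apache'
    }
    'Debian': {
      $port = '80'
      $user = 'www-data'
    }
    default: {
      fail("The apache module does not support osfamily ${::osfamily}")
    }
  }
}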

FOR THIS REASON, THIS IS THE ONLY RECOMMENDED USAGE OF INHERITANCE IN PUPPET!

We do NOT recommend using inheritance anywhere else in Puppet and for any other reason because there are better ways to achieve what you want to do INSTEAD of using inheritance. Inheritance is a holdover from a scarier, more lawless time.

NOTE: Data in Modules – There’s a ‘Data in Modules’ pattern out there that attempts to eliminate the ::params class. I wrote about it in a previous post, and I recommend you read that post for more info (it’s near the bottom).

Do NOT do Hiera lookups in your component modules!

This is something that’s really only RECENTLY been pushed. When Hiera was released, we quickly recognized that it would be the answer to quite a few problems in Puppet. In the rush to adopt Hiera, many people started adding Hiera calls to their modules, and suddenly you had ‘Hiera-compatible’ modules out there. This caused all kinds of compatibility problems, and it was largely because there wasn’t a better module structure and workflow by which to integrate Hiera. The pattern that I’ll be pushing DOES INDEED use Hiera, BUT it confines all Hiera calls to a higher-level wrapper class we call a ‘profile’. The reasons for NOT using Hiera in your module are:

  • By doing Hiera calls at a higher level, you have a greater visibility on exactly what parameters were set by Hiera and which were set explicitly or by default values.
  • By doing Hiera calls elsewhere, your module remains backwards-compatible for those folks who are NOT using Hiera.

Remember – your module should just accept a value and use it somewhere. Don’t get TOO smart with your component module – leave the logic for other places.

Keep your component modules generic

We always get asked “How do I know if I’m writing a good module?” We USED to say “Well, does it work?” (and trust me, that was a BIG hurdle). Now, with data separation models out there like Hiera, I have a couple of other questions that I ask (you know, BEYOND asking if it compiles and actually installs the thing it’s supposed to install). The best way I’ve found to determine if your module is ‘generic enough’ is if I asked you TODAY to give me your module, would you give it to me, or would you be worried that there was some company-specific data locked in there? If you have company-specific data in your module, then you need to refactor the module, store the data in Hiera, and make your module more generic/reusable. Also, does your module focus on installing one piece of technology, or are you declaring packages for shared libraries or other components (like gcc, apache, or other common components)? You’re not going to win any prizes for having the biggest, most monolithic module out there. Rather, if your module is that large and that complex, you’re going to have a hell of a time debugging it. Err on the side of making your modules smaller and more task-specific. So what if you end up needing to declare 4 classes where you previously declared 1? In the roles and profiles pattern we will show you in the next blog post, you can abstract that away ANYHOW.

Don’t play the “what if” game

I’ve had more than a couple of gigs where the customer says something along the lines of “What if we need to introduce FreeBSD/Solaris/etc… nodes into our organization, shouldn’t I account for them now?” This leads more than a few people down a path of entirely too-complex modules that become bulky and unwieldy. Yes, your modules should be formatted so that you can simply add another case in your ::params class for another OS’s parameters, and yes, your module should be formatted so that your ::install or ::config class can handle another OS, but if you currently only manage Redhat, and you’ve only EVER managed Redhat, then don’t start adding Debian parameters RIGHT NOW just because you’re afraid you might inherit Ubuntu machines. The goal of Puppet is to automate the tasks that eat up the MAJORITY of your time so you can focus on the edge cases that really demand your time. If you can eventually automate those edge cases, then AWESOME! Until then, don’t spend the majority of your time trying to automate the edge cases only to drown under the weight of deadlines from simple work that you COULD have already automated (but didn’t, because you were so worried about the exceptions)!

Store your modules in version control

This should go without saying, but your modules should be stored in version control (a la git, svn, hg, whatever). We tend to prefer git due to its lightweight branching and merging (most of our tooling and solutions will use git because we’re big git users), but you’re free to use whatever you want. The bigger question is HOW to store your modules in version control. There are usually two schools of thought:

  • One repository per module
  • All modules in a single repository

Each model has its pros and cons, but we tend to recommend one module per repository for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With the all-modules-in-one-repository solution, modules you pull down from the Forge (or Github) must be committed to your repo; having a repository for each module allows you to keep everything separate.

NOTE: This becomes important in the third blog post in the series when we talk about moving changes to each Puppet Environment, but it’s important to introduce it NOW as a ‘best practice’. If you use our recommended module/environment solution, then one-module-per-repo is the best practice. If you DON’T use our solution, then a single repository for all modules will STILL work, but you’ll have to manage the above issues. Also note that even if you currently have every module in a single repository, you can STILL use our solution in part 3 of the series (you’ll just need to perform a couple of steps to conform).

Best practices are shit

In general, ‘best practices’ are only recommended if they fit into your organizational workflow. The best and worst part of Puppet is that it’s infinitely customizable, so ‘best practices’ will invariably be left wanting for a certain subset of the community. As always, take what I say under consideration; it’s quite possible that I could be entirely full of shit.

Seriously, What Is This Provider Doing?

Clarke’s third law states: “Any sufficiently advanced technology is indistinguishable from magic.” In the case of Ruby and Puppet provider interaction, I’m inclined to believe it. If you want proof, take a look at some of the native Puppet types – no amount of ‘Expecto Patronum’ will free you from the Ruby metaprogramming dementors that hover around lib/puppet/provider/exec-land.

In my first post tackling Puppet types and providers, I introduced the concept of Puppet types and the utility they provide. In the second post, I brought you to the great plain of Puppet providers and introduced the core methods necessary for creating a very basic Puppet provider with a single property (HINT: if you’ve not read either of those posts, or you’ve never dealt with basic types and providers, you might want to stop here and read up a bit on the topics). The problems with a provider like the one created in that post were:

  • puppet resource support wasn’t implemented, so you couldn’t query for existing instances of the type on the system (and their corresponding values)
  • The getter method would be called for EVERY instance of the type on the system, which would mean shelling-out multiple times during a run
  • Ditto for the setter method (if changes to multiple instances of the type were necessary)
  • That type was VERY basic (i.e. ensurable with a single property)

Unfortunately, when most of us have the need of a Puppet type and provider, we usually require multiple properties and reasonably complex system interaction. When it comes to creating both a getter and a setter method for every property (including the potential performance hit that could come from shelling-out many times during a Puppet run), ain’t nobody got time for that. And finally, puppet resource is a REALLY handy tool for querying the current state of your resources on a system. These problems all have solutions, but up until recently there was just one more problem:

Good luck finding documentation for those solutions.

NOTE: The Puppet Types and Providers book written by Nan and Dan is a great resource that provides a bit of a deeper dive than I’ll be doing in this post – DO check it out if you want to know more

Something, something, puppet resource

The puppet resource command (or ralsh, as it used to be known) is a very handy command for querying a system and returning the current state of resources for a specific Puppet type. Try it out if you never have (note that the following is being run on CentOS 6.4):

[root@linux ~]# puppet resource user
user { 'abrt':
  ensure           => 'present',
  gid              => '173',
  home             => '/etc/abrt',
  password         => '!!',
  password_max_age => '-1',
  password_min_age => '-1',
  shell            => '/sbin/nologin',
  uid              => '173',
}
user { 'adm':
  ensure           => 'present',
  comment          => 'adm',
  gid              => '4',
  groups           => ['sys', 'adm'],
  home             => '/var/adm',
  password         => '*',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/sbin/nologin',
  uid              => '3',
}
< ... and more users below ... >

The puppet resource command returns a list of all users on the system and their current property values (note you can only see the password hash if you’re running Puppet with sufficient privileges). You can even query puppet resource for the values of a specific resource:

[root@gary ~]# puppet resource user glarizza
user { 'glarizza':
  ensure           => 'present',
  gid              => '502',
  home             => '/home/glarizza',
  password         => '$1$hsUuCygh$kgLKG5epuRaXHMX5KmxrL1',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '502',
}

puppet resource seems magical, and you might think that if you create a custom type and sync it to your machine then puppet resource will automatically work for you.

And you would be wrong.

puppet resource will only work if you’ve implemented a special method in your provider called self.instances.

self.instances

The self.instances method is pretty sparsely documented, so let’s go straight to the source…code, that is:

lib/puppet/provider.rb
  # Returns a list of system resources (entities) this provider may/can manage.
  # This is a query mechanism that lists entities that the provider may manage on a given system. It is
  # is directly used in query services, but is also the foundation for other services; prefetching, and
  # purging.
  #
  # As an example, a package provider lists all installed packages. (In contrast, the File provider does
  # not list all files on the file-system as that would make execution incredibly slow). An implementation
  # of this method should be made if it is possible to quickly (with a single system call) provide all
  # instances.
  #
  # An implementation of this method should only cache the values of properties
  # if they are discovered as part of the process for finding existing resources.
  # Resource properties that require additional commands (than those used to determine existence/identity)
  # should be implemented in their respective getter method. (This is important from a performance perspective;
  # it may be expensive to compute, as well as wasteful as all discovered resources may perhaps not be managed).
  #
  # An implementation may return an empty list (naturally with the effect that it is not possible to query
  # for manageable entities).
  #
  # By implementing this method, it is possible to use the `resources´ resource type to specify purging
  # of all non managed entities.
  #
  # @note The returned instances are instance of some subclass of Provider, not resources.
  # @return [Array<Puppet::Provider>] a list of providers referencing the system entities
  # @abstract this method must be implemented by a subclass and this super method should never be called as it raises an exception.
  # @raise [Puppet::DevError] Error indicating that the method should have been implemented by subclass.
  # @see prefetch
  def self.instances
    raise Puppet::DevError, "Provider #{self.name} has not defined the 'instances' class method"
  end

You’ll find that method around lines 348 – 377 of the lib/puppet/provider.rb file in Puppet’s source code (as of this writing, which is a Friday… on a flight from DC to Seattle). To summarize, implementing self.instances in your provider means that you need to return an array of provider instances that have been discovered on the current system and all the current property values (we call these values the ‘is’ values for the properties, since each value IS the current value of the property on the system). It’s recommended to only implement self.instances if you can gather all resource property values in a reasonably ‘cheap’ manner (i.e. a single system call, read from a single file, or some similar low-IO means). Implementing self.instances not only gives you the ability to run puppet resource (which also affords you a quick-and-dirty way of testing your provider without creating unit tests by simply running puppet resource in debug mode and checking the output), but it also allows the ‘resources’ resource to work its magic (If you’ve never heard of the ‘resources’ resource, check this link for more information on this terribly/awesomely named resource type).

An important note about scope and self.instances

The self.instances method is a method of the PROVIDER, which is why it is prefixed with self. Even though it may be located in the provider file itself, and even though it sits among other methods like create, exists?, and destroy (which are methods of the INSTANCE of the provider), it does NOT have the ability to directly access or call those methods. It DOES have the ability to access other methods of the provider directly (i.e. other methods prefixed with self.). This means that if you were to define a method like:

def self.proxy_type
  'web'
end

You could access that directly from self.instances by simply calling it:

type_of_proxy = proxy_type()

Let’s say you had a method of the INSTANCE of the provider, like so:

def system_type
  'OS X'
end

You COULD NOT access this method from self.instances directly (there are always hacky ways around EVERYTHING in Ruby, sure, but there is no easy/straightforward way to access this method).

And here’s where it gets confusing…

Methods of the INSTANCE of the provider CAN access provider methods directly. Given our previous example, what if the system_type method wanted to access self.proxy_type for some reason? It could be done like so:

def system_type
  type_of_proxy = self.class.proxy_type()
  'OS X'
end

A method of the instance of the provider can access provider methods by simply calling the class method on itself (which returns the provider object). This is a one-way street for method creation that needs to be heeded when designing your provider.

Building a provider that uses self.instances (or: more Mac problems)

In the previous two posts on types/providers, I created a type and provider for managing bypass domains for network proxies on OS X. For this post, let’s create a provider for actually MANAGING the proxy settings for a given network interface. Here’s a quick type for managing a web proxy on a network interface on OS X:

puppet-mac_proxy/lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

This type has three properties, is ensurable, and has a namevar called ‘name’. As for the provider, let’s start with self.instances and get the web proxy values for all interfaces. To do that, we’re going to need to know how to get a list of all network interfaces, and also how to get the current proxy state for every interface. Fortunately, both of those tasks are accomplished with the networksetup binary:

▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

▷ networksetup -getwebproxy Ethernet
Enabled: No
Server: proxy.corp.net
Port: 1234
Authenticated Proxy Enabled: 0

Cool, so one binary will do both tasks and they’re REASONABLY low-cost to run.

Helper methods

To keep things separated and easier to test, let’s create separate helper methods for each task. Since these methods are going to be called by self.instances, they will be provider methods.

The first method will simply return an array of network interfaces:

def self.get_list_of_interfaces
  interfaces = networksetup('-listallnetworkservices').split("\n")
  interfaces.shift
  interfaces.sort
end

Remember from above that the networksetup -listallnetworkservices command prints an informational line before the list of interfaces, so this code strips that first line off and returns a sorted list of interfaces based on a one-line-per-interface assumption.

The next method we need will accept a network interface name as an argument, will run the networksetup -getwebproxy (interface) command, and will use its output to return all the current property values (including the ensure value) for that interface’s proxy settings (if the proxy is enabled, the resource is ensured as ‘present’; if it’s disabled, the resource is ensured as ‘absent’). Calling this once per interface gives self.instances everything it needs for every instance of the type on the system.

def self.get_proxy_properties(int)
  interface_properties = {}

  begin
    output = networksetup(['-getwebproxy', int])
  rescue Puppet::ExecutionFailure => e
    raise Puppet::Error, "#mac_web_proxy tried to run `networksetup -getwebproxy #{int}` and the command returned non-zero. Failing here..."
  end

  output_array = output.split("\n")
  output_array.each do |line|
    line_values = line.split(':')
    line_values.last.strip!
    case line_values.first
    when 'Enabled'
      interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
    when 'Server'
      interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
    when 'Port'
      interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
    when 'Authenticated Proxy Enabled'
      interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
    end
  end

  interface_properties[:provider] = :ruby
  interface_properties[:name]     = int.downcase
  interface_properties
end

A couple of notes on the method itself – first, this pattern works because the networksetup command exits zero on success and non-zero on failure (which it does). If networksetup ever returns non-zero, Puppet raises a Puppet::ExecutionFailure, which we rescue so we can raise our own Puppet::Error, document what happened, and bail out.

This method is going to return a hash of properties and values that is going to be used by self.instances – so the case statement needs to account for that. HOWEVER you populate that hash is up to you (in my case, I’m checking for specific output that networksetup returns), but make sure that the hash has a value for the :ensure key at the VERY least.
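For a concrete picture, here’s the shape of the hash this method returns for an interface with an enabled proxy (the values are illustrative – they mirror the debug output you’ll see later in this post):

{
  :ensure              => :present,
  :proxy_server        => 'proxy.corp.net',
  :proxy_port          => '1234',
  :proxy_authenticated => nil,
  :provider            => :ruby,
  :name                => 'ethernet',
}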

Assembling self.instances

Once the helper provider methods have been defined, self.instances becomes reasonably simple:

def self.instances
  get_list_of_interfaces.collect do |int|
    proxy_properties = get_proxy_properties(int)
    new(proxy_properties)
  end
end

Remember that self.instances must return an array of provider instances, and each one of these instances must include the namevar and ensure value at the very least. Since self.get_proxy_properties returns a hash containing all the property ‘is’ values for a resource, declaring a new provider instance is as easy as calling the new() method on the return value of self.get_proxy_properties for every network interface. In the end, the return value of the collect method on get_list_of_interfaces will be an array of provider instances.

Existence, @property_hash, and more magical methods

Even though we have assembled a functional self.instances method, we don’t have a complete implementation that will work with puppet resource. The problem is that Puppet can’t yet determine the existence of a resource (even though the resource’s ensure value has been set by self.instances). If you were to execute the code with puppet resource mac_web_proxy, you would get the error:

Error: Could not run: No ability to determine if mac_web_proxy exists

To satisfy Puppet, we need to implement an exists?() method for the instance of the provider. Fortunately, we don’t need to re-implement any existing logic and can instead use @property_hash.

A @property_hash is born…

I’ve omitted one last thing that is borne out of self.instances, and that’s the @property_hash instance variable. @property_hash is populated by self.instances as an instance variable that’s available to methods of the INSTANCE of the provider (i.e. methods that ARE NOT prefixed with self.) containing all the ‘is’ values for a resource. Do you need to get the ‘is’ value for a property? Just use @property_hash[:property_name]. Since the exists? method is a method of the instance of the provider, and it’s essentially the same thing as the ensure value for a resource, let’s implement exists? by doing a check on the ensure value from the @property_hash variable:

def exists?
  @property_hash[:ensure] == :present
end

Perfect, now exists? will return true or false accordingly and Puppet will be satisfied.

Getter methods – the slow way

Puppet may be happy that you have an exists? method, but puppet resource won’t successfully run until you have a method that returns an ‘is’ value for every property of the type (i.e. the proxy_server, proxy_authenticated, and proxy_port attributes for the mac_web_proxy type). These ‘is value methods’ are called ‘getter’ methods: they’re methods of the instance of the provider, and are named exactly the same as the properties they represent.

You SHOULD be thinking: “Hey, we already have @property_hash, why can’t we just use it again?” We can, and you COULD implement all the getter methods like so:

def proxy_server
  @property_hash[:proxy_server]
end

If you did that, you would be TECHNICALLY correct, but it would seem to be a waste of lines in a provider (especially if you have many properties).

Getter methods – the quicker ‘method’

Because uncle Luke hated excess lines of code, he made available a method called mk_resource_methods, which works very similarly to Ruby’s attr_accessor method. Adding mk_resource_methods to your provider will AUTOMATICALLY create getter methods that pull values out of @property_hash in a similar way to what I just demonstrated (it will also create SETTER methods too, but we’ll look at those later). Long story short – don’t make getter/setter methods if you’re using self.instances – just implement mk_resource_methods.
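To demystify it a bit, what mk_resource_methods generates is roughly equivalent to hand-writing the following for every property and parameter of the type (a sketch of the idea, not a line-for-line copy of Puppet’s implementation):

# Getter: read the 'is' value out of @property_hash
def proxy_server
  @property_hash[:proxy_server] || :absent
end

# Setter: record the new value in @property_hash
def proxy_server=(value)
  @property_hash[:proxy_server] = value
end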

JUST enough for puppet resource

Putting together everything that we’ve learned up until now, we should have a provider that looks like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def exists?
    @property_hash[:ensure] == :present
  end
end

Here’s a tree of the module I’ve assembled on my machine:

└(~/src/puppet-mac_web_proxy)▷ tree .
.
└── lib
   └── puppet
       ├── provider
       │   └── mac_web_proxy
       │       └── ruby.rb
       └── type
           └── mac_web_proxy.rb

To test out puppet resource, we need to make Puppet aware of our new custom module. To do that, let’s set the $RUBYLIB environment variable. $RUBYLIB is queried by Puppet and added to its load path when looking for additional Puppet plugins. You will need to set $RUBYLIB to the path of the lib directory in the custom module that you’ve assembled. Because my custom module is located in ~/src/puppet-mac_web_proxy, I’m going to set $RUBYLIB like so:

export RUBYLIB=~/src/puppet-mac_web_proxy/lib

You can execute that command from the command line, or set it in your ~/.{bash,zsh}rc and source that file.

Finally, with all the files in place and $RUBYLIB set, it’s time to officially run puppet resource (I’m going to do it in --debug mode to see the debug output that I’ve written into the code):

└(~/src/blogtests)▷ envpuppet puppet resource mac_web_proxy --debug
Debug: Executing '/usr/sbin/networksetup -listallnetworkservices'
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth DUN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth dun"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth PAN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth pan"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Display Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"foo.bar.baz", :proxy_port=>"80", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"display ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"proxy.corp.net", :proxy_port=>"1234", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy FireWire'
Debug: Interface properties: {:ensure=>:present, :proxy_server=>"stuff.bar.blat", :proxy_port=>"8190", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"firewire"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Wi-Fi'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"wi-fi"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy iPhone USB'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"iphone usb"}
mac_web_proxy { 'bluetooth dun':
  ensure => 'absent',
}
mac_web_proxy { 'bluetooth pan':
  ensure => 'absent',
}
mac_web_proxy { 'display ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8190',
  proxy_server => 'stuff.bar.blat',
}
mac_web_proxy { 'iphone usb':
  ensure => 'absent',
}
mac_web_proxy { 'wi-fi':
  ensure => 'absent',
}

Note that you will only see ‘is’ values if you have a proxy set on any of your network interfaces (obviously, if you’ve not setup a proxy, then it will show as ‘absent’ on every interface. You can setup a proxy by opening System Preferences, clicking on the Network icon, choosing an interface from the list on the left, clicking the Advanced button in the lower right corner of the window, clicking the ‘Proxies’ tab at the top of the window, clicking the checkbox next to the ‘Web Proxy (HTTP)’ choice, and entering a proxy URL and port. NOW do you get why we automate this bullshit?). Also, your list of network interfaces may not match mine if you have more or fewer interfaces than I do.
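Incidentally, if all that GUI clicking offends you, networksetup can also SET a test proxy from the command line (the interface name and values here are just examples):

networksetup -setwebproxy Ethernet proxy.corp.net 1234
networksetup -setwebproxystate Ethernet on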

TADA! puppet resource WORKS! ISN’T THAT AWESOME?! WHY AM I TYPING IN CAPS?!

Prefetching, flushing, caching, and other hard shit

Okay, so up until now we’ve implemented one half of the equation – we can query ‘is’ values and puppet resource works. What about using this ‘more efficient’ method of getting values for a type on the OTHER end of the spectrum? What if, instead of calling setter methods one-by-one to set values for all resources of a type in a catalog, we had a way to do it all at once? Well, such a way exists, and it’s called the flush method…but we’re getting slightly ahead of ourselves. Before we get to flushing, we need to point out that self.instances is ONLY used by puppet resource – THAT’S IT (and it’s only used by puppet resource when you GET values from the system, not when you SET values on the system…and if you never knew that puppet resource could actually SET values on the system, well, I guess you got another surprise today). If we want puppet agent or puppet apply to use the behavior that self.instances implements, we need to create another method: self.prefetch

self.prefetch

If you thought self.instances didn’t have much documentation, wait until you see self.prefetch. After wading through the waters of self.prefetch, I’m PRETTY SURE its implementation might have come to uncle Luke after a long night in Reed’s chem lab where he might have accidentally synthesized mescaline.

Let’s look at the codebase:

lib/puppet/provider.rb
# @comment Document prefetch here as it does not exist anywhere else (called from transaction if implemented)
# @!method self.prefetch(resource_hash)
# @abstract A subclass may implement this - it is not implemented in the Provider class
# This method may be implemented by a provider in order to pre-fetch resource properties.
# If implemented it should set the provider instance of the managed resources to a provider with the
# fetched state (i.e. what is returned from the {instances} method).
# @param resources_hash [Hash<{String => Puppet::Resource}>] map from name to resource of resources to prefetch
# @return [void]
# @api public

That’s right, documentation for self.prefetch in the Puppet codebase is 9 lines of comments in lib/puppet/provider.rb, which is awesome. So when is self.prefetch used to provide information to Puppet and when is self.instances used?

Puppet Subcommand    Provider Method    Execution Mode
puppet resource      self.instances     getting values
puppet resource      self.prefetch      setting values
puppet agent         self.prefetch      getting values
puppet agent         self.prefetch      setting values
puppet apply         self.prefetch      getting values
puppet apply         self.prefetch      setting values


This doesn’t mean that self.instances is really only handy for puppet resource – that’s definitely not the case. In fact, frequently you will find that self.instances is used by self.prefetch to do some of the heavy lifting. Even though self.prefetch works VERY SIMILARLY to the way that self.instances works for puppet resource (and by that I mean that it’s going to gather a list of instances of a type on the system, and it’s also going to populate @property_hash for puppet apply, puppet agent, and when puppet resource is setting values), it’s not an exact one-for-one match with self.instances. The self.prefetch method for a type is called once per run when Puppet encounters a resource of that type in the catalog. The argument to self.prefetch is a hash of all managed resources of that type that are encountered in a compiled catalog for that node (the hash’s key will be the namevar of the resource, and the value will be an instance of Puppet::Type – in this case, Puppet::Type::Mac_web_proxy). Your task is to implement a self.prefetch method that gets an array of instances of the provider that are discovered on the system, iterates through the hash passed to self.prefetch (containing all the resources of the type that were discovered in the catalog), and passes the correct instance of the provider that was discovered on the system to the provider= method of the correct instance of the type that was discovered in the catalog.

What the actual fuck?!

Okay, let’s break that apart to try and discover exactly what’s going on here. Assume that I’ve setup a proxy for the ‘FireWire’ interface on my laptop, and I want to try and manage that resource with puppet apply (i.e. something that uses self.prefetch). The resource in the manifest used to manage the proxy will look something like this:

mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8080',
  proxy_server => 'proxy.server.org',
}

When self.prefetch is called by Puppet, it’s going to be passed a hash looking something like this:

{ "firewire" => Mac_web_proxy[firewire] }

Because only one resource is encountered in the catalog, only one key/value pair shows up in the hash that’s passed as the argument to self.prefetch.

The job of self.prefetch is to find the current state of Mac_web_proxy['firewire'] on the system, create a new instance of the mac_web_proxy provider that contains the ‘is’ values for the Mac_web_proxy['firewire'] resource, and assign this provider instance as the value of the provider= method to the instance of the mac_web_proxy TYPE that is the VALUE of the ‘firewire’ key of the hash that’s passed to self.prefetch.

No, really, that’s what it’s supposed to do. I’m not even sure what’s real anymore.

You’ll remember that self.instances gives us an array of resources that were discovered on the system, so we have THAT part of the implementation written. We also have the hash of resources that were encountered in the catalog – so we have THAT part done too. Our only job is to connect the dots (la la la la), programmatically speaking. This should just about do it:

def self.prefetch(resources)
  instances.each do |prov|
    if resource = resources[prov.name]
      resource.provider = prov
    end
  end
end

I want to make a confession right now – I’ve only ever copied and pasted this code into every provider I’ve ever written that needed self.prefetch implemented. It wasn’t until someone actually asked me what it DID that I had to walk the path of figuring out EXACTLY what it did. Based on the last couple of paragraphs – can you blame me?

This code iterates through the array of provider instances returned by self.instances and tries to assign the local variable resource by using each provider’s name as a key into the resources hash (remember, resources is a hash containing all managed resources of this type in the catalog). If that assignment works (i.e. it isn’t nil, which is what you get when you reference a key that doesn’t exist in a Ruby hash), then we call the provider= method on the instance of the type that was found in the resources hash and pass it the provider instance that was discovered on the system by self.instances.

Wow.

Why DID we do all of that? We did it all for @property_hash. Doing this will populate @property_hash in all methods of the instance of the provider (i.e. exists?, create, destroy, etc..) just like self.instances did for puppet resource.
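
To make that concrete, here’s roughly what an exists? method looks like once prefetching has populated @property_hash (you’ll see the real thing in the complete provider below):

def exists?
  # No system call needed here – self.prefetch already stuffed the
  # discovered 'is' values into @property_hash for this resource
  @property_hash[:ensure] == :present
end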

Flush it; Ship it

As I alluded to above, the opposite side of the coin to prefetching (which is a way to query the state for all resources at once) is flushing (or specifically the flush method). The flush method is called once per resource whenever the ‘is’ and ‘should’ values for a property differ (and synchronization needs to occur). The flush method does not take the place of property setter methods, but, rather, is used in conjunction with them to determine how to synchronize resource property values. In this vein, it’s a single trigger that can be used to set all property values for an individual resource simultaneously.

There are a couple of strategies for implementing flush, but one of the more popular ones in use is to create an instance variable that will hold values to be synchronized, and then determine inside flush how best to make as-few-as-possible calls to the system to synchronize all the property values for an individual resource.

Our resource type is unique because the networksetup binary that we’ll be using to synchronize values allows us to set most every property value with a single command. Because of this, we really only need that instance variable for one property – the ensure value. But let’s start with the initialization of that instance variable for the flush method:

def initialize(value={})
  super(value)
  @property_flush = {}
end

The initialize method is magic to Ruby – it’s invoked when you instantiate a new object. In our case, we want to create a new instance variable – @property_flush – that will be available to all methods of the instance of the provider. This instance variable will be a hash and will contain all the ‘should’ values that will need to be synchronized for a resource. The super method in Ruby sends a message to the parent of the current object, asking it to invoke a method of the same name (e.g. initialize). Basically, the initialize method is doing the exact same thing as it has always done with one exception – making the instance variable available to all methods of the instance of the provider.

The only ‘setter’ method you need

This provider is going to be unique not only because the networksetup binary will set values for ALL properties, but because to change/set ANY property values you have to change/set ALL the property values at the same time. Typically, you’ll see providers that will need to pass arguments to a binary in order to set individual values. For example, if you had a binary fooset that took arguments of --bar and --baz to set values respectively for bar and baz properties of a resource, you might see the following setter and flush methods for bar and baz:

def bar=(value)
  @property_flush[:bar] = value
end

def baz=(value)
  @property_flush[:baz] = value
end

def flush
  array_arguments = []
  if @property_flush
    array_arguments << '--bar' << @property_flush[:bar] if @property_flush[:bar]
    array_arguments << '--baz' << @property_flush[:baz] if @property_flush[:baz]
  end
  if ! array_arguments.empty?
    fooset(array_arguments, resource[:name])
  end
end

That’s not the case for networksetup – in fact, one of the ONLY places in our code where we’re going to throw a value inside @property_flush is going to be in the destroy method. If our intention is to ensure a proxy absent (or, in this case, disable the proxy for a network interface), then we can short-circuit the method we’re going to create to set proxy values by simply checking for a value in @property_flush[:ensure]. Here’s what the destroy method looks like:

def destroy
  @property_flush[:ensure] = :absent
end

Next, we need a method that will set values for our proxy. This method will handle all interaction to networksetup. So, how do you set proxy values with networksetup?

networksetup -setwebproxy <networkservice> <domain> <port number> <authenticated> <username> <password>

The three properties of our mac_web_proxy type are proxy_port, proxy_server, and proxy_authenticated, which map to the ‘<port number>’, ‘<domain>’, and ‘<authenticated>’ values in this command. To change any of these values means we have to pass ALL of these values (again, which is why our flush implementation may be unique from other flush implementations). Here’s what the set_proxy method looks like:

def set_proxy
  if @property_flush[:ensure] == :absent
    networksetup(['-setwebproxystate', resource[:name], 'off'])
    return
  end

  if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
    raise Puppet::Error, "Proxy types other than 'auto' require both a proxy_server and proxy_port setting"
  end
  if resource[:proxy_authenticated] != :true
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port]
      ]
    )
  else
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port],
        'on',
        resource[:authenticated_username],
        resource[:authenticated_password]
      ]
    )
  end
  networksetup(['-setwebproxystate', resource[:name], 'on'])
end

This helper method does all the validation checks for required properties, executes the correct command, and enables the proxy. Now, let’s implement flush:

def flush
  set_proxy

  # Collect the resources again once they've been changed (that way `puppet
  # resource` will show the correct values after changes have been made).
  @property_hash = self.class.get_proxy_properties(resource[:name])
end

The last line re-populates @property_hash with the current resource values, and is necessary for puppet resource to return correct values after it makes a change to a resource during a run.

The final method

We’ve implemented logic to query the state of all resources, to prefetch those states, to make changes to all properties at once, and to destroy a resource if it exists, but we’ve yet to implement logic to CREATE a resource if it doesn’t exist and it should. Well, this is a bit of a lie – the logic is in the code, but we don’t have a create method, so Puppet’s going to complain:

def create
  @property_flush[:ensure] = :present
end

Technically, this method doesn’t have to do a DAMN thing. Why? Remember how the flush method is triggered when a resource’s ‘is’ values differ from its ‘should’ values? Also, remember how the flush method only calls the set_proxy method? And, finally, remember how set_proxy only checks if @property_flush[:ensure] == :absent (and if it doesn’t, then it goes about its merry way running networksetup)? Right, well add these things up and you’ll realize that the create method is essentially meaningless based on our implementation (but if you OMIT create, then Puppet’s going to throw a shit-fit in the shape of a Puppet::Error exception):

Error: /Mac_web_proxy[firewire]/ensure: change from absent to present failed: Could not set 'present' on ensure: undefined method `create' for Mac_web_proxy[firewire]:Puppet::Type::Mac_web_proxy

So make Puppet happy and write the goddamn create method, okay?

The complete provider:

Wow, that was a wild ride, huh? If you’ve been coding along, you should have created a file that looks something like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def initialize(value={})
    super(value)
    @property_flush = {}
  end

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def create
    @property_flush[:ensure] = :present
  end

  def exists?
    @property_hash[:ensure] == :present
  end

  def destroy
    @property_flush[:ensure] = :absent
  end

  def self.prefetch(resources)
    instances.each do |prov|
      if resource = resources[prov.name]
        resource.provider = prov
      end
    end
  end

  def set_proxy
    if @property_flush[:ensure] == :absent
      networksetup(['-setwebproxystate', resource[:name], 'off'])
      return
    end

    if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
      raise Puppet::Error, "Both the proxy_server and proxy_port parameters require a value."
    end
    if resource[:proxy_authenticated] != :true
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port]
        ]
      )
    else
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port],
          'on',
          resource[:authenticated_username],
          resource[:authenticated_password]
        ]
      )
    end
    networksetup(['-setwebproxystate', resource[:name], 'on'])
  end

  def flush
    set_proxy

    # Collect the resources again once they've been changed (that way `puppet
    # resource` will show the correct values after changes have been made).
    @property_hash = self.class.get_proxy_properties(resource[:name])
  end
end

Undoubtedly there are better ways to write this Ruby code, no? Also, I’m SURE I have some errors/bugs in that code. It’s those things that keep me in a job…

Final Thoughts

So, I write these posts not to belittle or mock anyone who works on Puppet or wrote any of its implementation (except the amazing/terrifying bastard who came up with self.prefetch). Anybody who contributes to open source and who builds a tool to save some time for a bunch of sysadmins is fucking awesome in my book.

No, I write these posts so that you can understand the ‘WHY’ piece of the puzzle. If you fuck up the ‘HOW’ of the code, you can spend some time in Google and IRB to figure it out, but if you don’t understand the ‘WHY’ then you’re probably not going to even bother.

Also, selfishly, I move from project to project so quickly that it’s REALLY easy to forget both why AND how I did what I did. Posts like these give me someplace to point people when they ask me “What’s self.prefetch?” that ISN’T just the source code or a liquor store.

This isn’t the last post in the series, by the way. I haven’t even TOUCHED on writing unit tests for this code, so that’s going to be a WHOLE other piece altogether. Also, while this provider manages a WEB proxy for a network interface, understand that there are MANY MORE kinds of proxies for OS X network interfaces (including socks and gopher!). A future post will show you how to refactor the above into a parent provider that can be inherited to allow for code re-use among all the proxy providers that I need to create.

As always, you’re more than welcome to comment, ask questions, or simply bitch at me both on this blog as well as on Twitter: @glarizza. Hopefully this post helped you out and you learned a little bit more about how Puppet providers do their dirty work…

Namaste, bitches.

When to Hiera (Aka: How Do I Module?)

I’m convinced that writing Puppet modules is the ultimate exercise in bikeshedding: if it works, someone’s probably going to tell you that you could have done it better, if you’re using the methods suggested today, they’re probably going to be out-of-date in about 6 months, and good luck writing something that someone else can use cleanly without needing to change it.

I can help you with the last two.

Data and Code Separation == bliss?

I wrote a blog post about 2 years ago detailing why separating your data from your Puppet code was a good idea. The idea is still valid, which means it’s probably one of the better ideas I’ve ever stolen (Does anyone want any HD-DVDs?). Hunter Haugen and I put together a quick blog post on using Hiera to solve the data/code problem because there wasn’t a great bit of documentation on Hiera at that point in time. Since then, Hiera’s been widely accepted as “a good idea” and is in use in production Puppet environments around the world. In most every environment, usage of Hiera by more than just one person eventually gives birth to the question that inspired this post:

“What the hell does and does NOT belong in Hiera?”

Puppet data models

The params class pattern

Many Puppet modules out there since Puppet 2.6 have begun using this pattern:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = $mysql::params::config_file,
  $manage_config_file      = $mysql::params::manage_config_file,
  $old_root_password       = $mysql::params::old_root_password,
  $override_options        = {},
  $package_ensure          = $mysql::params::server_package_ensure,
  $package_name            = $mysql::params::server_package_name,
  $purge_conf_dir          = $mysql::params::purge_conf_dir,
  $remove_default_accounts = false,
  $restart                 = $mysql::params::restart,
  $root_group              = $mysql::params::root_group,
  $root_password           = $mysql::params::root_password,
  $service_enabled         = $mysql::params::server_service_enabled,
  $service_manage          = $mysql::params::server_service_manage,
  $service_name            = $mysql::params::server_service_name,
  $service_provider        = $mysql::params::server_service_provider,
  # Deprecated parameters
  $enabled                 = undef,
  $manage_service          = undef
) inherits mysql::params {

  ## Puppet goodness goes here
}

If you’re not familiar, this is a Puppet class definition for mysql::server that has several parameters defined and defaulted to values that come out of the mysql::params class. The mysql::params class looks a bit like this:

puppetlabs-mysql/manifests/params.pp
class mysql::params {
  case $::osfamily {
    'RedHat': {
      if $::operatingsystem == 'Fedora' and (is_integer($::operatingsystemrelease) and $::operatingsystemrelease >= 19 or $::operatingsystemrelease == "Rawhide") {
        $client_package_name = 'mariadb'
        $server_package_name = 'mariadb-server'
      } else {
        $client_package_name = 'mysql'
        $server_package_name = 'mysql-server'
      }
      $basedir             = '/usr'
      $config_file         = '/etc/my.cnf'
      $datadir             = '/var/lib/mysql'
      $log_error           = '/var/log/mysqld.log'
      $pidfile             = '/var/run/mysqld/mysqld.pid'
      $root_group          = 'root'
    }

    'Debian': {
      ## More parameters defined here
    }
  }
}

This pattern puts all conditional logic for all the variables/parameters used in the module inside one class – the mysql::params class. It’s called the ‘params class pattern’ because we suck at naming things.

Pros:

  • All conditional logic is in a single class
  • You always know which class to seek out if you need to change any of the logic used to determine a variable’s value
  • You can use the include function because parameters for each class will be defaulted to the values that came out of the params class
  • If you need to override the value of a particular parameter, you can still use the parameterized class declaration syntax to do so
  • Anyone using Puppet version 2.6 or higher can use it (i.e. anyone who’s been using Puppet since about 2010)

Cons:

  • Conditional logic is repeated in every module
  • You will need to use inheritance to inherit parameter values in each subclass
  • It’s another place to look if you ALSO use Hiera inside the module
  • Data is inside the manifest, so business logic is also inside params.pp

Hiera defaults pattern

When Hiera hit the scene, one of the first things people tried to do was to incorporate it into existing modules. The logic at that time was that you could keep all parameter defaults inside Hiera, rid yourself of the params class, and then just make Hiera calls out for your data. This pattern looks like this:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', 'default value'),
  $manage_config_file      = hiera('mysql::params::manage_config_file', 'default value'),
  $old_root_password       = hiera('mysql::params::old_root_password', 'default value'),
  ## Repeat the above pattern
) {

  ## Puppet goodness goes here
}

Pros:

  • All data is locked up in Hiera (and its multiple backends)
  • Default values can be provided if a Hiera lookup fails

Cons:

  • You need to have Hiera installed, enabled, and configured to use this pattern
  • All data, including non-business logic, is in Hiera
  • If you use the default value, data could either come from Hiera OR the default (multiple places to look when debugging)

Hybrid data model

This pattern is for those people who want the portability of the params.pp class combined with the power of Hiera. Because it’s a hybrid, there are multiple ways that people have set it up. Here’s a general example:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', $mysql::params::config_file),
  $manage_config_file      = hiera('mysql::params::manage_config_file', $mysql::params::manage_config_file),
  $old_root_password       = hiera('mysql::params::old_root_password', $mysql::params::old_root_password),
  ## Repeat the above pattern
) inherits mysql::params {

  ## Puppet goodness goes here
}

Pros:

  • Data is sought from Hiera first and then defaulted back to the params class parameter
  • Keep non-business logic (i.e. OS specific data) in the params class and business logic in Hiera
  • Added benefits of both models

Cons:

  • Where did the variable get set – Hiera or the params class? Debugging can be hard
  • Requires Hiera to be setup to use the module
  • If you fudge a variable name in Hiera, you get the params class default – see Con #1

Hiera data bindings in Puppet 3.x.x

In Puppet 3.0.0, there was a concept introduced called Data Bindings. This created a federated data model that automatically incorporates a Hiera lookup. Previously, the order that Puppet would use to determine the value of a parameter was to first use a value passed with the parameterized class declaration syntax (i.e. the following):

parameterized class declaration
class { 'apache':
  package_name => 'httpd',
}

If a parameter was not passed with the parameterized class syntax (like the ‘package_name’ parameter above), Puppet would then look for a default value inside the class definition (i.e. the following):

parameter default in a class definition
class ntp (
  $ntpserver = 'default.ntpserver.org'
) {
  # Use $ntpserver in a file declaration...
}

If the value of ‘ntpserver’ wasn’t passed with a parameterized class declaration, then the value would be set to ‘default.ntpserver.org’, since that’s the default set in the above class definition.

Failing both of these conditions, Puppet would throw a parse error and say that it couldn’t determine a value for a class parameter.

As of Puppet 3.0.0, Puppet will now do a Hiera lookup for the fully namespaced value of a class parameter (e.g. apache::package_name) BEFORE falling back to the default value in the class definition. This lookup happens automatically – no explicit hiera() call is needed in your manifests.
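
To see data bindings in action, assume an apache class with a package_name parameter, and drop the fully namespaced key into your Hiera data (a sketch – paths and key names here assume a standard YAML backend):

hieradata/common.yaml
---
apache::package_name: 'httpd'

With that key in place, a simple include apache in site.pp gets package_name set to ‘httpd’ – no parameterized declaration and no explicit hiera() call required.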

Roles and Profiles

The roles and profiles pattern has been written about a number of times and is ALSO considered to be ‘a best practice’ when setting up your Puppet environment. What roles and profiles get you is a ‘wrapper class’ that allows you to declare other classes within it:

profiles/manifests/wordpress.pp
class profiles::wordpress {
  # Data Lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')

  ## Create user
  group { 'wordpress':
    ensure => present,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => 'wordpress',
    password => $wordpress_user_password,
    home     => '/var/www/wordpress',
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
}

## Continue with declarations...

Notice that any variables that might have business specific logic are set with Hiera lookups. These Hiera lookups do NOT have default values, which means the hiera() function will throw a parse error if a value is not returned. This is IDEAL because we WANT TO KNOW if a Hiera lookup fails – it means we failed to put the data in Hiera, and that should be corrected BEFORE a state that might contain invalid data is enforced with Puppet.

You then have a ‘Role’ wrapper class that simply includes many of the ‘Profile’ wrapper classes:

roles/manifests/frontend.pp
class roles::frontend {
  include profiles::mysql
  include profiles::apache
  include profiles::java
  include profiles::jboss
  # include more profiles...
}

The idea being that Profiles abstract all the technical bits that need to be declared to set up a piece of technology, and Roles abstract all the business logic for what pieces of technology should be installed on a certain ‘class’ of machine. Basically, you can say that “all our frontend infrastructure should have mysql, apache, java, jboss…”. In this statement, the Role is ‘frontend infrastructure’ and the Profiles are ‘mysql, apache, java, jboss…’.
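
Classifying a node with a role then becomes a one-liner (a sketch – substitute your own node-naming scheme):

site.pp
node 'frontend01.example.com' {
  include roles::frontend
}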

Pros:

  • Hiera data lookups are confined to a wrapper class OUTSIDE of the component modules (like mysql, apache, java, etc…)
  • Data lookups for parameters containing business logic are done with Hiera
  • Non-business specific data is pulled from the module (i.e. the params class)
  • Wrapper modules can be ‘included’ with the include function, helping to eliminate multiple class declarations using the parameterized class declaration syntax
  • Component modules are backward-compatible to Puppet 2.6 while wrapper modules still get to use a modern data lookup mechanism (Hiera)
  • Component modules do NOT contain any business specific logic, which means they’re portable

Cons:

  • Hiera must be setup to use the wrapper modules
  • Wrapper modules add another debug path for variable data
  • Wrapper modules add another layer of abstraction

Data in Puppet Modules

R.I. Pienaar (the original author of MCollective, Hiera, and much more) published a blog post recently on implementing a folder for Puppet modules that Hiera can traverse when it does data lookups. This construct isn’t new – there was a feature request for this behavior filed in October of 2012 with a subsequent pull request that implemented this functionality (both are worth reading for further information). The pull request didn’t get merged, and so R.I. implemented the functionality inside a module on the Puppet Forge. In a nutshell, it’s a hiera.yaml configuration file INSIDE THE MODULE that implements a module-specific hierarchy, and a ‘data’ folder (also inside the module) that allows for individual YAML files that Hiera can read. This hierarchy is consulted AFTER the site-specific hiera.yaml file is read (i.e. /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml), and the in-module data files are consulted AFTER the site-specific Hiera data files are read (normally found in either /etc/puppet/hieradata or /etc/puppetlabs/puppet/hieradata).

The argument here is that there’s a data store for SITE-SPECIFIC Hiera data that should be kept outside of modules, but there’s not a MODULE-SPECIFIC data store that Hiera can use. The argument isn’t whether data that should be shared with other people belongs inside a site-specific Hiera datastore (protip: it doesn’t. Data that’s not business-specific should be shared with others and kept inside the module), the argument is that it shouldn’t be locked up inside the DSL where the barrier-to-entry is learning Puppet’s DSL syntax. Whereas /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml sets up the hierarchy for all your site-specific data, there’s no per-module hiera.yaml file for all module-specific data, and there’s no place to put module-specific Hiera data.

But module-specific data goes inside the params class and business-specific data goes inside Hiera, right?

Sure, but for some people the Puppet DSL is a barrier. The argument is that there should be a lower barrier to entry to contribute parameter data to Puppet that doesn’t require you to learn the syntax of if/case/selector statements in the Puppet DSL. There’s also the argument that if you want to add support for an operatingsystem to your module, you have to modify the params class file and add another entry to the if/case/selector statement – wouldn’t it be easier to just add another YAML file into a data folder that doesn’t affect existing datafiles?
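
For example, adding Debian support under this model could be as simple as dropping in one new file (hypothetical values – consult a real Debian box for the actual paths):

mysql/data/Debian.yaml
---
mysql::config_file: '/etc/mysql/my.cnf'
mysql::manage_config_file: true
mysql::old_root_password: ''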

Great, ANOTHER hierarchy to traverse for data – that’s going to get confusing

Well, think about it right now – most EVERY params class of EVERY module (if it supports multiple operatingsystems) does some sort of conditional logic to determine values for parameters on a per-OS basis. That’s something that you need to traverse. And many modules use different conditional data to determine what parameters to use. Look at the mysql params class example above – it not only splits on $osfamily, but it also checks specific operatingsystems. That’s a conditional inside a conditional. You’re TRAVERSING conditional data right now to find a value – the only difference is that this method doesn’t use the DSL, it uses Hiera and YAML.

Sure, but this is outside of Puppet and you’re losing visibility inside Puppet with your data

You’re already doing that if you’re using the params class. In this case, visibility is moved to YAML files instead of separate Puppet classes.

Setting it up

You will first need to install R.I.’s module from the Puppet Forge. As of this writing, it’s version 0.0.1, so ensure you have the most recent version using the puppet module tool:

[root@linux modules]# puppet module install ripienaar/module_data
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└── ripienaar-module_data (v0.0.1)

Next, you’ll need to setup a module to use the data-in-modules pattern. Take a look at the tree of a sample module:

[root@linux modules]# tree mysql/
mysql/
├── data
│   ├── hiera.yaml
│   └── RedHat.yaml
└── manifests
    └── init.pp

I created a sample mysql module based on the examples above. All of the module’s Hiera data (including the module-specific hiera.yaml file) goes in the data folder. This module should be placed in Puppet’s modulepath – and if you don’t know where Puppet’s modulepath is set, run the puppet config face to determine that:

[root@linux modules]# puppet config print modulepath
/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules

In my case, I’m putting the module in /etc/puppetlabs/puppet/modules (since I’m running Puppet Enterprise). Here’s the hiera.yaml file from the sample mysql module:

mysql/data/hiera.yaml
:hierarchy:
  - "%{::osfamily}"

I’ve also included a YAML file for the $osfamily of RedHat:

mysql/data/RedHat.yaml
---
mysql::config_file: '/path/from/data_in_modules'
mysql::manage_config_file: true
mysql::old_root_password: 'password_from_data_in_modules'

Finally, here’s what the mysql class definition looks like from manifests/init.pp:

mysql/manifests/init.pp
class mysql (
  $config_file        = 'module_default',
  $manage_config_file = 'module_default',
  $old_root_password  = 'module_default'
) {
  notify { "The value of config_file: ${config_file}": }
  notify { "The value of manage_config_file: ${manage_config_file}": }
  notify { "The value of old_root_password: ${old_root_password}": }
}

Everything should be setup to notify the value of a couple of parameters. Now, to test it out…

Testing data-in-modules

Let’s include the mysql class with puppet apply to see where it’s looking for data:

[root@linux modules]# puppet apply -e 'include mysql'
Notice: The value of config_file: /path/from/data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of config_file: /path/from/data_in_modules]/message: defined 'message' as 'The value of config_file: /path/from/data_in_modules'
Notice: The value of manage_config_file: true
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: true]/message: defined 'message' as 'The value of manage_config_file: true'
Notice: The value of old_root_password: password_from_data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: password_from_data_in_modules]/message: defined 'message' as 'The value of old_root_password: password_from_data_in_modules'
Notice: Finished catalog run in 0.62 seconds

Since I’m running on an operatingsystem whose family is ‘RedHat’ (i.e. CentOS), you can see that the values of all the parameters were pulled from the Hiera data files inside the module. Let’s temporarily change the $osfamily fact value and see what happens:

[root@linux modules]# FACTER_osfamily=Debian puppet apply -e 'include mysql'
Notice: The value of config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of config_file: module_default]/message: defined 'message' as 'The value of config_file: module_default'
Notice: The value of old_root_password: module_default
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: module_default]/message: defined 'message' as 'The value of old_root_password: module_default'
Notice: The value of manage_config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: module_default]/message: defined 'message' as 'The value of manage_config_file: module_default'
Notice: Finished catalog run in 0.51 seconds

This time, when I specified a value of Debian for $osfamily, the parameter values were pulled from the declaration in the mysql class definition (i.e. from inside mysql/manifests/init.pp).

Testing outside of Puppet

One of the big pros of Hiera is that it comes with the hiera binary that can be run from the command line to test values. This works just fine for site-specific module data that’s defined in the central hiera.yaml file that’s usually defined in /etc/puppet or /etc/puppetlabs/puppet, but the data-in-modules pattern relies on a Puppet indirector to point to the current module’s data folder, and thus (as of right now) there’s not a simple way to run the hiera binary to pull data out of modules WITHOUT running Puppet. This is not a dealbreaker, and doesn’t stop anybody from hacking up something that WILL look inside modules for data, but as of right now it doesn’t yet exist. It also makes debugging for values that come out of modules a bit more difficult.
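
For site-specific data, though, the hiera binary works great – you can pass fact values right on the command line to exercise your hierarchy (assuming the key below actually exists in your datastore):

# Look up a key, supplying facts for hierarchy interpolation
hiera mysql::config_file ::osfamily=RedHat

# Point at a specific config and turn on debug output to watch the traversal
hiera -d -c /etc/puppetlabs/puppet/hiera.yaml mysql::config_file ::osfamily=RedHat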

The scorecard for data-in-modules

Pros:

  • Parameters are defined in YAML and not Puppet DSL (i.e. you only need to know YAML and not the Puppet DSL)
  • Adding parameters is as simple as adding another YAML file to the module
  • Module authors provide module data that can be read by Puppet 3.x.x Hiera data bindings

Cons:

  • Must be using Puppet 3.0.0 or higher
  • Additional hierarchy and additional Hiera data file to consult when debugging
  • Not (currently) an easy/straightforward way to use the hiera binary to test values
  • Currently depends on a Puppet Forge module being installed on your system

What are you trying to say?

I am ALL ABOUT code portability, re-usability, and not building 500 apache modules. Ever since people have been building modules, they’ve been putting too much data inside modules (to the point where they can’t share them with anyone else). I can’t tell you how many times I’ve heard “We have a module for that, but I can’t share it because it has all our company-specific data in it.”

Conversely, I’ve also seen organizations put EVERYTHING in their site-specific Hiera datastore because “that’s the place for Puppet data.” They usually end up with 15+ levels in their Hiera hierarchies because they’re doing things like this:

hiera.yaml
---
:backends:
  - yaml

:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - "%{osfamily}"
  - "%{osfamily}/%{operatingsystem}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_major}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_minor}"
  # Repeat until you have 15 levels of WTF

This leads us back again to “What does and DOESN’T go in Hiera?” I usually say the following:

Data in site-specific Hiera datastore

  • Business-specific data (i.e. internal NTP server, VIP address, per-environment java application versions, etc…)
  • Sensitive data
  • Data that you don’t want to share with anyone else

Data that does NOT go in the site-specific Hiera datastore

  • OS-specific data
  • Data that EVERYONE ELSE who uses this module will need to know (paths to config files, package names, etc…)

Basically, if I ask you if I can publish your module to the Puppet Forge, and you object because it has business-specific or sensitive data in it, then you probably need to pull that data out of the module and put it in Hiera.
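
As a gut-check, the site-specific Hiera datastore should hold values like these (made-up data, obviously – key names are illustrative):

hieradata/common.yaml
---
# Business-specific: nobody outside your org cares about these
ntp::servers:
  - 'ntp1.internal.corp.net'
  - 'ntp2.internal.corp.net'
profiles::wordpress::wordpress_db_host: 'db.internal.corp.net'

Package names, config file paths, and the like stay out of here and remain in the module’s params class.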

The recommendations that I give when I go on-site with Puppet users is the following:

  • Use Roles/Profiles to create wrapper-classes for class declaration
  • Do ALL Hiera lookups for site-specific data inside your ‘Profile’ wrapper classes
  • All module-specific data (like paths to config files, names of packages to install, etc…) should be kept in the module in the params class
  • All ‘Role’ wrapper classes should just include ‘Profile’ wrapper classes – nothing else

But what about Data in Modules?

I went through all the trouble of writing up the Data in Modules pattern, but I didn’t recommend or even MENTION it in the previous section. The reason is NOT because I don’t believe in it (I actually think the future will be data outside of the DSL inside a Puppet module), the reason is because it’s not YET in Puppet’s core and because it hasn’t YET been widely tested. If you’re an existing Puppet user that’s been looking for a way to split data outside of the DSL, here is your opportunity. Use the pattern and PLEASE report back on what you like and don’t like about it. The functionality is in a module, so it’s easy to tweak. If you’re new to Puppet and are comfortable with the DSL, then the params class exists and is available to you.

To voice your opinion or to follow the progress of data in modules, follow or comment on this Puppet ticket.

Update

R.I. posted another article on the problem with params.pp that is worth reading. He gives compelling reasons on why he built Hiera, why params.pp WORKS, but also why he believes it’s not the future of Puppet. R.I. goes even further to explain that it’s not necessarily the Puppet DSL that is the barrier to entry, it’s that this sort of data belongs in a file for config data and not INSIDE THE CODE itself (i.e. inside the Puppet DSL). Providing data inside modules gives module authors a way to provide this configuration data in files that AREN’T the Puppet DSL (i.e. not inside the code).

Who Abstracted My Ruby?

Previously, on Lost, I said a lot of words about Puppet Types; you should totally check it out. In this second installment, you’re going to find out how to actually throw pure Ruby at Puppet in a way that makes you feel accomplished. And useful. And elitist. Well, possibly just elitist. Either way, read on – there’s much thought-leadership to be done…

In the last post, we learned that Types will essentially dictate the attributes that you’ll be passing in your resource declaration using the DSL. In the simplest and crudest explanation I could muster, types model how your declaration will look in the manifest. Providers are where the actual IMPLEMENTATION happens. If you’ve ever wondered how this:

package { 'httpd':
  ensure => installed,
}

eventually gets turned into this:

yum install -e 0 -d 0 -y httpd

your answer would be “It’s in the provider file”.

Dirty black magic

I’ve seen people do the craziest shit imaginable in the Puppet DSL simply because they’re:

  • Unsure how types and providers work
  • Afraid of Ruby
  • Confused by error messages
  • Afraid to ask for help

Sometimes you have a problem that can only be solved by interacting with data that’s returned by a binary (using some binary to get a value, and then using that binary to set a value, and so on…). I see people writing defined resource types with a SHIT TON of exec statements and conditional logic to model this data when a type and provider would not only BETTER model the problem but would also be shareable and re-usable by other folk. The issue is that while the DSL is REALLY easy to get started with, types and providers still feel like dirty black magic.

The reason is because they’re dirty black magic.

Hopefully, I can help get you over the hump and onto a working implementation. Let’s take a problem I had last week:

Do this if that, and then be done

I was working with a group who wanted to set a list of domains that would bypass their web proxy for a specific network interface on an OS X workstation. It sounds so simple, because it was. Due to the amount of time I had on-site, I wrote a class with some nasty exec statements, a couple of facts, and some conditional logic because that’s what you do when you’re in a hurry…but it doesn’t make it right. When I left, I hacked up a type and provider, and it’s a GREAT example because you probably have a similar problem. Let’s look at the information we have:

The list of network interfaces:

└▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

Getting the list of bypass domains for an interface:

└▷ networksetup -getproxybypassdomains Ethernet
www.garylarizza.com
*.corp.net
10.13.1.3/24

The message displayed when no domains are set for an interface:

└▷ networksetup -getproxybypassdomains FireWire
There aren't any bypass domains set on FireWire.

Setting the list of bypass domains for an interface:

└▷ networksetup -setproxybypassdomains Ethernet '*.corp.net' '10.13.1.3/24' 'www.garylarizza.com'

Perfect – all of that is done with a single binary, and it’s pretty straightforward. Let’s look at the type I ended up creating for this problem:

lib/puppet/type/mac_proxy_bypassdomains.rb
Puppet::Type.newtype(:mac_proxy_bypassdomains) do
  desc "Puppet type that models bypass domains for a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
  end

  newproperty(:domains, :array_matching => :all) do
    desc "Domains which should bypass the proxy"
    def insync?(is)
      is.sort == should.sort
    end
  end
end

The type uses a namevar parameter called ‘name’, which is the name of the network interface. This means that we can set one list of bypass domains for every network interface. There’s a single property, ‘domains’, that accepts an array of domains that should bypass the proxy for the network interface. I’ve overridden the insync? method for the domains property to sort the array values on both ends – this means that the ORDER of the domains doesn’t matter, I only care that the domains specified exist on the system. Finally, the type is ensurable (which means that we can create a list of domains and remove/destroy the list of domains for a network interface).
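
Declaring a resource of this type in a manifest would look something like this (domains pulled from the networksetup output above):

mac_proxy_bypassdomains { 'Ethernet':
  ensure  => present,
  domains => ['*.corp.net', '10.13.1.3/24', 'www.garylarizza.com'],
}

Because of the insync? override, listing those domains in any order results in the same resource.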

Setup the provider

Okay, so we’ve defined the problem, seen how to interact with the system to get us the data that we need, setup a type to model the data, and now the last thing left to do is to wire up the provider to make the binary calls we need and return the data we want.

Typos are not your friend.

The first thing you will encounter is “Puppet’s predictable naming pattern” that is used by the Puppet autoloader. Typos are not fun, and omitting a single letter in either the filename or the provider name will render your provider (emotionally) unavailable to Puppet. Our type is called ‘mac_proxy_bypassdomains’, as types are generally named along the lines of ‘what does this data model?’ The provider name is generally the name of the underlying technology that’s doing the modeling. For the package type, the providers are named after the package management systems (e.g. yum, apt, pacman, zypper, pip), for the file type, the providers are loosely named for the operatingsystem kernel type on which files are to be created (e.g. windows, posix). In our example, I simply chose to name the provider ‘ruby’ because, as a Puppet Labs employee, I TOO suck at naming things.

Here’s a tree of my module to understand how the type and provider files are to be laid out:

Module tree
├── Modulefile
├── README.markdown
└── lib
    └── puppet
        ├── provider
        │   ├── mac_proxy_bypassdomains
        │   │   └── ruby.rb
        └── type
            └── mac_proxy_bypassdomains.rb

As you can see from above, the name of both the type and provider must EXACTLY match the filename of their corresponding files. Also, the provider file lives in a directory named after the type. There are MANY things that can be typoed here (filenames, foldernames, type/provider names in their files), so be absolutely sure that you’ve named your files correctly.

The reason for all this naming bullshit is because of the way Puppet syncs down plugin files (coincidentally, with a process known as Pluginsync). Everything in the lib directory in a Puppet module is going to get synced down to your nodes inside the vardir directory on the node itself. The vardir is a known library path to Puppet, and all files in the vardir are treated as if they had lived in Puppet’s source code (in the same relative paths). Because the Puppet source code has all type files in the lib/puppet/type directory, all CUSTOM types must go in the module’s lib/puppet/type directory for conformity. This is repeated for EVERY custom Puppet/Facter plugin (including custom facts, custom functions, etc…).
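
As an illustration, on an agent whose vardir is /var/lib/puppet (run puppet config print vardir to check yours), our two plugin files land here after pluginsync:

/var/lib/puppet/lib/puppet/type/mac_proxy_bypassdomains.rb
/var/lib/puppet/lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb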

More scaffolding

Let’s lay out the shell of our provider first, to ensure that we haven’t typoed anything. Here’s the provider declaration:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  # Provider work goes here
end

Note that the name of the type and the name of the provider are symbolized (i.e. they’re prepended with a colon). Like I mentioned above, they must be spelled EXACTLY right or Puppet will complain very loudly. You may see variants on that declaration line because there are multiple ways in Ruby to extend a class object. The method I’ve listed above is the ‘generally accepted best-practice’, which is to say it’s the way we’re doing it this month.

Congrats! You have THE SHELL of a provider that has yet to do a single goddamn thing! Technically, you’re further than about 90% of other Puppet users at this point! Let’s go the additional 20% (since we’re basing this on a management metric of 110%) by wiring up the methods and making the damn thing work!

Are you (en)sure about this?

We’ve explained before that a type is ‘ensurable’ when you can check for its existence on a system, create it when it doesn’t exist (and it SHOULD exist), and destroy it when it does exist (and it SHOULDN’T exist). The bare minimum number of methods necessary to make a type ensurable is three, and they’re called exists?, create, and destroy.

Method: exists?

The exists? method is a predicate method – that means it should either return the boolean true or false value based on whether the bypass domain list exists. Puppet will always call the exists? provider method to determine if that ‘thing’ (in this case, ‘thing’ means ‘a list of domains to bypass for a specific network interface’) exists before calling any other methods. How do we know if this thing exists? Like I showed before, you need to run the networksetup -getproxybypassdomains command and pass the interface name. If it returns ‘There aren’t any bypass domains set on (interface name)’, then the list doesn’t exist. Let’s do some binary execution…

Calling binaries from Puppet

Puppet provides some helper syntax around basic actions that most providers perform. MOST providers are going to need to call out to an external binary (e.g. yum, apt, etc…) at some point, and so Puppet allows you to create your own method JUST for a system binary. The commands method abstracts all the dirtyness of making a method for each system binary you want to call. The way you use the commands method is like so:

commands :networksetup => 'networksetup'

The commands method accepts a hash whose key must be a symbolized name. The CONVENTION is to use a symbolized name that matches the binary name, but it’s not REQUIRED to do so. The value for that symbolized key MUST be the binary name. Note that I’ve not passed a full path to the binary. Why? Well, Puppet will automatically do a path lookup for that binary and store its full path for use when the binary is invoked. We don’t REQUIRE you to pass the full path because sometimes the same binary exists in different locations for different operatingsystems. Instead of creating a provider for each OS you manage with Puppet, we abstract away the path stuff. You CAN still pass a full path as a value, but if you elect to do that and the binary doesn’t exist at that path, Puppet will disqualify the provider and you’ll be quite upset.

In the event that Puppet CANNOT find this binary, it will disqualify the entire provider, and you’ll get a message saying as much in the debug output of your Puppet run. Because of that, the commands method is a good way to confine your provider to a specific system or class of system.

When the commands method is successfully invoked, you will get a new provider method named after the SYMBOLIZED key, and not necessarily the binary name (unless you made them the same). After the above command is evaluated, Puppet will now have a networksetup() method in our provider. The argument to the networksetup method should be an array of arguments that are passed to the binary. It’s c-style, so each element is going to be individually quoted. You can run into issues here if you pass values containing quotes as part of your argument array. Read that again – quoting your values is totally acceptable (e.g. [‘foo’, ‘bar’]), but passing a value that contains quotes can potentially cause problems (e.g. [“‘foo’”, “‘bar’”]).
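
In other words, once that commands line is evaluated, a call like this inside the provider:

networksetup(['-getproxybypassdomains', 'Ethernet'])

executes roughly the same thing as running /usr/sbin/networksetup -getproxybypassdomains Ethernet from a shell (with the full path resolved for you), and returns the command’s output as a string.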

You’re probably thinking “Why the hell would I go through this trouble when I can use the %x{} syntax in Ruby to execute a shell command?!” And to that I would say “Quit yelling at me” and also “Because: testing.” When you write spec tests for your provider (which will be covered in a later blog post, since it’s its OWN path of WTF), you’re going to need to mock out calls to the system during your tests (i.e. sometimes you may be running the tests on a system that doesn’t have the binary you’re meant to be calling in your provider. You don’t want the tests to fail due to the absence of a binary file). The %x{} construct in Ruby is hard to mock out, but a method of our provider is a relatively easy thing to mock out. Also – see the path problem above. We don’t STOP you from doing %x{} in your code (it will still totally work), but we give you a couple of good reasons to NOT do it.

Objects are a provider’s best friend

Within your provider, you’re going to be doing lots of system calls and data manipulation. Often we’re asked whether you do that ugliness inside the main methods (i.e. inside the exists? method directly), or if you create a helper method for some of this data manipulation. The answer I usually give is that you should probably create a helper method if:

  • The code is going to be called more than once
  • The code does something that would be tricky to test (like reading from a file)
  • Complexity would be reduced by creating a helper method

The act of getting a list of domains for a specific interface is definitely going to be utilized in more than one place in our provider (we’ll use it in the exists? method as well as in a ‘getter’ method for the domains property). Also, you could argue that it might be tricky to test since it’s going to be a binary call that’s going to return some data. Because of this, let’s create a helper method that returns a list of domains for a specific interface:

def get_proxy_bypass_domains(int)
  begin
    output = networksetup(['-getproxybypassdomains', int])
  rescue Puppet::ExecutionFailure => e
    Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
    return nil
  end
  domains = output.split("\n").sort
  return nil if domains.first =~ /There aren\'t any bypass domains set/
  domains
end

Ruby convention is to use underscores (i.e. versus camelCase or hyphens) in method names. You want to give your methods very descriptive names based on what it is that they DO. In this case, get_proxy_bypass_domains seems adequately descriptive. Also, you should err on the side of readability when you’re writing code. You can get pretty creative with Ruby metaprogramming, but that can quickly become hard to follow (and then you’re just a dick). Finally, error-handling is a good thing. If you’re going to do any error-handling, though, be very specific about the errors you catch/rescue. When you have a rescue block, make sure you catch a specific exception class (in the case above, we’re catching a Puppet::ExecutionFailure – which means the binary is returning a non-zero exit code).

The code above will return an array containing all the domains, or it will return nil if domains aren’t found or the networksetup binary had an issue.

Using the helper method above, here’s what the final exists? method looks like:

def exists?
  get_proxy_bypass_domains(resource[:name]) != nil
end

All provider methods have the ability to access the ‘should’ values for the resource (and by that I mean the values that are set in the Puppet manifest on the Puppet master server, or locally if you’re using puppet apply). Those values reside in the resource method, which responds like a hash. In the code above, resource[:name] will return the network interface name (e.g. Ethernet, FireWire, etc…) that was specified in the Puppet manifest. The exists? method will return true if a list of domains exists for an interface, or it will return false if a list of domains does not exist (i.e. get_proxy_bypass_domains returns nil).

Method: create

The create method is called when exists? returns false and a resource has an ensure value set to present. Because of this, you don’t need to call the exists? method explicitly in create – it’s already been evaluated. Remember from above that the -setproxybypassdomains argument to the networksetup binary will set a domain list, so the create method is going to be very short-and-sweet:

def create
  networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
end

In the end, the create method will call the networksetup binary with the -setproxybypassdomains argument, pass the interface name (from resource[:name]) and pass an array of domain values (which comes from resource[:domains]). That’s it; it’s done!

Method: destroy

The destroy method is easier than the create method:

def destroy
  networksetup(['-setproxybypassdomains', resource[:name]])
end

Here, we’re calling networksetup with the -setproxybypassdomains argument, passing the interface name, and then passing the keyword ‘Empty’ (which networksetup accepts in place of a domain list). This clears the bypass domain list for the interface.

Synchronizing properties

Getter method: domains

At this point our type is ensurable, which means we can create and destroy resources. What we CAN’T do, however, is change the value of any properties that are out-of-sync. A property is out-of-sync when the value discovered by Puppet on the node differs from the value in the catalog (i.e. set by the Puppet manifest using the DSL on the Puppet master). Just like exists? is called to determine if a resource exists, Puppet needs a way to get the current value for a property on a node. The method that gets this value is called the ‘getter method’ for a property, and its name must match the name of the property. Because we have a property called domains, the provider must have a domains method that returns a value (in this case, an array of domains to be bypassed by the proxy). We’ve already written a helper method that does this work for us, so the domains getter method is pretty easy:

def domains
  get_proxy_bypass_domains(resource[:name])
end

Tada! Just call the helper method and pass the interface name. Boom – instant array of values. The getter method will return the ‘is’ value, because that’s what the value IS (currently on the node). Get it? Anyone? The ‘is’ value is the other side of the coin to the ‘should’ value (that comes from the Puppet manifest), because that’s what the value SHOULD be set to on the node.

Setter method: domains=

If the getter method (e.g. domains) returns a value that doesn’t match the value in the catalog, then Puppet changes the value on the node and sets it to the value in the catalog. It does this by calling the ‘setter’ method for the property, whose name is the name of the property followed by an equals sign ( = ). In this case, the setter method for the domains property must be called domains=. It looks like this:

def domains=(value)
  networksetup(['-setproxybypassdomains', resource[:name], value])
end

Setter methods are always passed a single argument – the ‘should’ value of the property. In our example, we’re calling the networksetup binary with the -setproxybypassdomains argument, passing the name of the interface, and then passing the ‘should’ value – or the array of domains. It’s easy, it’s one line, and I love it when a plan comes together.

Putting the whole damn thing together

I’ve broken the provider down piece by piece, but here’s the entire file:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  commands :networksetup => 'networksetup'

  def get_proxy_bypass_domains(int)
    begin
      output = networksetup(['-getproxybypassdomains', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
      return nil
    end
    domains = output.split("\n").sort
    return nil if domains.first =~ /There aren\'t any bypass domains set/
    domains
  end

  def exists?
    get_proxy_bypass_domains(resource[:name]) != nil
  end

  def destroy
    # 'Empty' is the keyword networksetup accepts in place of a domain list
    networksetup(['-setproxybypassdomains', resource[:name], 'Empty'])
  end

  def create
    networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
  end

  def domains
    get_proxy_bypass_domains(resource[:name])
  end

  def domains=(value)
    networksetup(['-setproxybypassdomains', resource[:name], value])
  end
end

Testing the type/provider

And that’s it, we’re done! The last thing to do is to test it out. You can test out your provider in one of two ways: the first is to add the module to the modulepath of your Puppet master and include it that way, or you can test it locally by setting the $RUBYLIB environment variable to point to the lib directory of your module (which is the preferred method, since it won’t serve untested code out to all of your nodes). Because this module is on my system at /Users/glarizza/src/puppet-mac_proxy, here’s how my $RUBYLIB is set:

export RUBYLIB=/Users/glarizza/src/puppet-mac_proxy/lib

Next, we need to create a resource declaration to try and set a couple of bypass domains. I’ll create a tests directory and a simple test file at tests/mac_proxy_bypassdomains.pp:

tests/mac_proxy_bypassdomains.pp
mac_proxy_bypassdomains { 'Ethernet':
  ensure  => 'present',
  domains => ['www.garylarizza.com','*.puppetlabs.com','10.13.1.3/24'],
}

Finally, let’s run Puppet and test it out:

└▷ puppet apply ~/src/puppet-mac_proxy/tests/mac_proxy_bypassdomains.pp
Notice: Compiled catalog for satori.local in environment production in 0.06 seconds
Notice: /Stage[main]//Mac_proxy_bypassdomains[Ethernet]/domains: domains changed [] to 'www.garylarizza.com *.puppetlabs.com 10.13.1.3/24'
Notice: Finished catalog run in 3.47 seconds

NOTE: If you run this as a local user, you will be prompted by OS X to enter an administrative password for a change. Since Puppet will ultimately be run as root on OS X when we’re NOT testing out code, this shouldn’t be required during a normal Puppet run. To test this out (i.e. that you don’t always have to enter an admin password in a pop-up window), you’ll need to sudo -s to change to root, set the $RUBYLIB as the root user, and then run Puppet again.
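
That root-testing workflow looks something like this (a quick sketch using the same paths as above):

sudo -s
export RUBYLIB=/Users/glarizza/src/puppet-mac_proxy/lib
puppet apply ~/src/puppet-mac_proxy/tests/mac_proxy_bypassdomains.pp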

And that’s it – looks like our code worked! To check and make sure it will notice a change, open System Preferences, then the Network pane, click on the Ethernet interface, then the Advanced button, then the Proxies tab, and finally note the ‘Bypass proxy settings…’ text box at the bottom of the screen (now do you see why we automate this shit?!). Make a change to the entries in there and run Puppet again – it should correct it for you.

Wait…so that was it? Really? We’re done?

Yeah, that was a whole type and provider. Granted, it has only one property and it’s not too complicated, but that’s the point. We’ve still got some latent bugs (the network interface passed must be capitalized exactly like OS X expects it, we could do some better error handling, etc…), and the type doesn’t work with puppet resource (yet), but we’ll handle all of these things in the next blog post (or two…or three).

Until then, take this time to crack open a type and a provider for something that’s been pissing you off and FIX it! Better yet, push it up to Github, tweet about it, and post it up on The Forge so the rest of the community can use it!

Like always, feel free to comment, tweet me (@glarizza), email me (gary AT puppetlabs DOT com), or use the social media platform of choice to get a hold of me (Snapchats may or may not get a response. Maybe.) Cheers!

Fun With Puppet Providers - Part 1 of Whatever

I don’t know why I write blog posts – everybody in open-source software knows that the code IS the documentation. If you’ve ever tried to write a Puppet type/provider, you know this fact better than ANYONE. To this day, when someone asks me for the definitive source on this activity I usually refer them first to Nan Liu and Dan Bode’s awesome Types and Providers book (which REALLY is a fair bit of quality information), and THEN to the source code for Puppet. Everything else falls in-between those sources (sadly).

As someone who truly came from knowing absolute fuckall about Ruby and only marginally more than that about Puppet, I’ve walked through the valley of the shadow of self.instances and have survived to tell the tale. That’s what this post is about – hopefully some GOOD information if you want to start writing your own Puppet type and provider. I also wrote this because this knowledge has been passed down from Puppet employee to Puppet employee, and I wanted to break the priesthood being held on type and provider magic. If you don’t hear from me after tomorrow, well, then you know what happened…

Because 20 execs in a defined type…

What would drive someone to write a custom type and provider for Puppet anyhow? After all, you can do ANYTHING IMAGINABLE in the Puppet DSL*! After drawing back my sarcasm a bit, let me explain where the Puppet DSL tends to fall over and the idea of a custom type and provider starts becoming more than just an incredibly vivid dream:

  • You have more than a couple of exec statements in a single class/defined type that have multiple conditional properties like ‘onlyif’ and/or ‘unless’.
  • You need to use pure Ruby to manipulate data and parse it through a system binary
  • Your defined type has more conditional logic than your prenuptial agreement
  • Any combination of similar arguments related to the above

If the above sounds familiar to you, then you’re probably ready to build your own custom Puppet type and provider. Do note that custom types and providers are written in Ruby and not the Puppet DSL. This can initially feel very scary, but get over it (there are much scarier things coming).

* Just because you can doesn’t mean you don’t, in fact, suck.

I’m not your Type

This blog post is going to focus on types and type-interaction, while later posts will focus on providers and ultimately dirty provider tricks to win friends and influence others. Type and provider interaction can be totally daunting for newcomers, let ALONE just naming files correctly due to Puppet’s predictable (note: anytime I write the word “predictable”, just substitute the phrase “annoying pain in the ass”) naming pattern. Let’s break it down a bit for you – somebody cue Dre…

(NOTE: I’m going to ASSUME you understand the fundamentals of a Puppet run already. If you’re pretty hazy on that concept, check out docs.puppetlabs.com for more information)

Types are concerned about your looks

The type file defines all the properties and parameters that can be used by your new custom resource. Think of the type file like the opening stanza to a new Puppet class – we’re describing all the tweakable knobs and buttons to the new thing we’re creating. The type file also gives you some added validation abilities, which is very handy.

It’s important to understand that there is a BIG difference between a ‘property’ and a ‘parameter’ with regard to a type (even though they’re both assigned values identically in a resource declaration). Think of it this way: a property is something that can be inspected and changed by Puppet, while a parameter is just helper data that Puppet uses to do its job. A property would be something like a file’s mode. You can inspect a file and determine its mode, and you can even CHANGE a file’s mode on disk. The file resource type also has a parameter called ‘backup’. Its sole job is to tell Puppet whether to back up the file to the filebucket before making changes. This data is useful for Puppet during a run, but you can’t inspect a file on disk and know definitively whether Puppet is going to back it up or not (and it goes without saying that if you can’t determine this aspect about a file on disk just by inspecting it, then you also can’t CHANGE this aspect about a file on disk either). You’ll see later where the property/parameter distinction becomes very important.
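
To make that distinction concrete, here’s a hypothetical file resource – mode is a property that Puppet can inspect and change on disk, while backup is just helper data:

file { '/etc/sudoers':
  ensure => file,
  mode   => '0440',  # property: can be inspected AND changed on disk
  backup => '.bak',  # parameter: only tells Puppet how to behave during the run
}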

Recently I built a type modeling the setting of proxy data for network interfaces on OS X, so we’ll use that as a demonstration of a type. It looks like the following:

lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

First note the type file’s path listed above the code: lib/puppet/type/mac_web_proxy.rb. This path is relative to the module that you’re building, and it’s VERY important that it be named EXACTLY this way to appease Puppet’s predictable naming pattern. The name of the file directly correlates to the name of the type listed in the Puppet::Type.newtype() method.

Next, let’s look at a sample parameter declaration – for starters, let’s look at the ‘authenticated_password’ parameter declaration in the type above. The newparam() method is called and the lone argument passed is the symbolized name of our parameter (i.e. it’s prepended with a colon). This parameter provides the password to use when setting up an authenticated web proxy on OS X. It’s a parameter because, as far as I know, there’s no way for me to query the system for this password (it’s obfuscated in the GUI and I’m not entirely certain where it’s stored on-disk). If there were a way for us to query this value from the system, then we could turn it into a property (since we could both ‘GET’ as well as ‘SET’ the value). As of right now, it exists as helper data for when I need to set up an authenticated proxy.

Having seen a parameter, let’s look at the ‘proxy_server’ property declared in the type file above. We’re able to both query the system for this value, as well as change/set the value by using the networksetup binary, so it’s able to be ‘synchronized’ (according to Puppet). Because of this, it must be a property.

Just enough validation

The second major function of the type file is to provide methods to validate property and parameter data that is being passed. There are two methods to validate this data, and one method that allows you to massage the data into an acceptable format (which is called ‘munging’).

validate()

The first method, named ‘validate’, is widely believed to be the only successfully-named method in the entire Puppet codebase. Validate accepts a block and allows you to perform free-form validation in any way you prefer. For example:

lib/puppet/type/user.rb
validate do |value|
  raise ArgumentError, "Passwords cannot include ':'" if value.is_a?(String) and value.include?(":")
end

This example, pulled straight from the Puppet codebase, will raise an ArgumentError if a password contains a colon. In this case, we’re validating the value ourselves and raising a specific exception when it’s unacceptable.
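
As a hypothetical example for our own type, we could use validate to ensure the proxy_server property is handed a string:

newproperty(:proxy_server) do
  desc "Proxy Server setting for the interface"
  validate do |value|
    # A sketch - reject anything that isn't a String
    raise ArgumentError, "proxy_server must be a String, got #{value.class}" unless value.is_a?(String)
  end
end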

newvalues()

The second method, named ‘newvalues’, accepts a regex that property/parameter values need to match (if you’re one of the 8 people in the world that speak regex fluently), or a list of acceptable values. From the example above:

lib/puppet/type/mac_web_proxy.rb
  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end

munge()

The final method, named ‘munge’, accepts a block like validate does, but allows you to convert an unacceptable value into an acceptable value. Again, this is from the example above:

lib/puppet/type/mac_web_proxy.rb
munge do |value|
  value.downcase
end

In this case, we want to ensure that the parameter value is lowercase. Rather than throwing an error, we quietly ‘munge’ the value into a form that’s acceptable without alerting the user.
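
As another sketch (this munge isn’t in the original type), we could normalize the proxy_port property so the port always ends up as a string, no matter how it was typed in the manifest:

newproperty(:proxy_port) do
  desc "Proxy Server setting for the interface"
  newvalues(/^\d+$/)
  munge do |value|
    # Normalize the port, e.g. 8080 -> '8080'
    value.to_s
  end
end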

Important type considerations

You could write half a book just on how types work (and, again, check out the book referenced above which DOES just that), but there are a couple of final considerations that will prove helpful when developing your type.

Defaulting values

The defaultto method provides a default value should the user not provide one for your property/parameter. It’s a pretty simple construct, but it’s important to remember when you write spec tests for your type (which you ARE doing, right?) that there will ALWAYS be values for properties/parameters that utilize defaultto. Here’s a quick example:

Defaultto example
newparam(:enable_lacp) do
  defaultto :true
  newvalues(:true, :false)
end

Ensurable types

A resource is considered ‘ensurable’ when its presence can be verified (i.e. it exists on the system), it can be created when it doesn’t exist and it SHOULD, and it can be destroyed when it exists and it SHOULDN’T. The simplest way to tell Puppet that a resource type is ensurable is to call the ensurable method within the body of the type (i.e. outside of any property/parameter declarations). Doing this will automatically create an ‘ensure’ property that accepts values of ‘absent’ and ‘present’ that are automatically wired to the ‘exists?’, ‘create’ and ‘destroy’ methods of the provider (something I’ll write about in the next post). Optionally, you can choose to pass a block to the ensurable method and define acceptable property values as well as the methods of the provider that are to be called. That would look something like this:

lib/puppet/type/package.rb
ensurable do
  newvalue(:present) do
    provider.install
  end

  newvalue(:absent) do
    provider.uninstall
  end

  newvalue(:purged) do
    provider.purge
  end

  newvalue(:held) do
    provider.hold
  end
end

This means that instead of calling the create method to create a new resource that SHOULD exist (but doesn’t), Puppet is going to call the install method. Conversely, it will call the uninstall method to destroy a resource based on this type. The ensure property will also accept values of ‘purged’ and ‘held’ which will be wired up to the purge and hold methods respectively.

Namevars are unique little snowflakes

Puppet has a concept known as the ‘namevar’ for a resource. If you’re hazy about the concept check out the documentation, but basically it’s the parameter that describes the form of uniqueness for a resource type on the system. For the package resource type, the ‘name’ parameter is the namevar because the way you tell one package from another is its name. For the file resource, it’s the ‘path’ parameter, because you can differentiate unique files from each other according to their path (and not necessarily their filename, since filenames don’t have to be unique on systems).

When designing a type, it’s important to consider WHICH parameter will be the namevar (i.e. how you can tell unique resources from one another). To make a parameter the namevar, you simply set the :namevar attribute to true like below:

newparam(:name, :namevar => true) do
  # Type declaration attributes here...
end

Handling array values

Nearly every property/parameter value that is declared for a resource is ‘stringified’, or cast to a string. Sometimes, however, it’s necessary to accept an array of elements as the value for a property/parameter. To do this, you have to explicitly tell Puppet that you’ll be passing an array by setting the :array_matching attribute to :all (if you don’t set this attribute, it defaults to :first, which means that if you pass an array as a value for a property/parameter, Puppet will only accept the FIRST element in that array).

newproperty(:domains, :array_matching => :all) do
  # Type declaration attributes here... 
end

If you set :array_matching to :all, EVERY value passed for that parameter/property will be cast to an array (which means if you pass a value of ‘foo’, you’ll get an array with a single element – the string of ‘foo’).
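
For example, given the domains property above, both of these hypothetical declarations hand the provider an array – the second is simply an array with one element:

mac_proxy_bypassdomains { 'Ethernet':
  domains => ['garylarizza.com', 'puppetlabs.com'],
}

mac_proxy_bypassdomains { 'FireWire':
  domains => 'garylarizza.com',  # single value - still becomes ['garylarizza.com']
}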

Documenting your property/parameter

It’s a best-practice to document the purpose of your property or parameter declaration, and this can be done by passing a string to the desc method within the body of the property/parameter declaration.

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
# Type declaration attributes here...
end

Synchronization tricks

Puppet uses a method called insync? to determine whether a property value is synchronized (i.e. whether Puppet needs to change its value, or it’s set appropriately). You usually have no need to change the behavior of this method since most of the properties you create for a type will have string values (and the == operator does a good job of checking string equality). For structured data types like arrays and hashes, however, things can be a bit trickier. Arrays, for example, are an ordered construct – they have a definitive idea of what the first element and the last element of the array are. Sometimes you WANT to ensure that values are in a very specific order, and sometimes you don’t necessarily care about the ORDER that values for a property are set – you just want to make sure that all of them are set.

If the latter case sounds like what you need, then you’ll need to override the behavior of the insync? method. Take a look at the example below:

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
  def insync?(is)
    is.sort == should.sort
  end
end

In this case, I’ve overridden the insync? method to first sort the ‘is’ value (or, the value that was discovered by Puppet on the target node) and compare it with the sorted ‘should’ value (or, the value that was specified in the Puppet manifest when the catalog was compiled by the Puppet master). You can do WHATEVER you want in here as long as insync? returns either a true or a false value. If insync? returns true, then Puppet determines that everything is in sync and no changes are necessary, whereas if it returns false then Puppet will trigger a change.

And this was the EASY part!

Wow this went longer than I expected… and types are usually the ‘easier’ bit since you’re only describing the format to be used by the Puppet admin in manifests. There are some hacky type tricks that I’ve not yet covered (i.e. features, ‘inheritance’, and other meta-bullshit), but those will be saved for a final ‘dirty tips and tricks’ post. In the next section, I’ll touch on providers (which is where all interaction with the system takes place), so stay tuned for more brain-dumping-goodness…

From the Archive: Using Crankd

Supporting laptops in a managed environment is tricky (and doubly so if you allow them to be taken off your corporate network). While you can be reasonably assured that your desktops will remain on and connected during the workday, it’s not uncommon for laptops to go to sleep, change wireless access points, and even change between an Ethernet or AirPort connection several times during the day. It’s important to have a tool that can “tweak” certain settings in response to these changes.

This is where crankd comes in.

Crankd is a cool utility that’s part of the Pymacadmin (http://code.google.com/p/pymacadmin/) suite of tools co-authored by Chris Adams and Nigel Kersten. Specifically, crankd is a Python daemon that lets you trigger shell scripts, or execute Python methods, based upon state changes in SystemConfiguration, NSWorkspace and FSEvents.

Use Cases

It’s easier to see how crankd can help you with a couple of scenarios:

  1. Your laptops, like all of the other machines in your organization, are bound to your corporate LDAP servers. When they’re on network, they will query the LDAP servers for things like authentication information. Unless your corporate LDAP directory is accessible outside your corporate network, your laptops may exhibit the “spinning wheel of death” when they attempt to contact a suddenly-unreachable LDAP directory at the neighborhood Starbucks. A solution to this is to remove the LDAP servers from your Search (and Contacts) path whenever the laptop is taken off-network and add the LDAP servers when you come back on-network.

  2. Perhaps you’re using Puppet, Munki, Chef, StarDeploy, Filewave, Absolute Manage, Casper, or any other configuration management system that needs to contact a centralized server for configuration information. Usually these tools will have your machine contact their servers once an hour or so, but this can be a problem if the machine is constantly sleeping and waking. Plus if you take your machine off-network, you don’t want it trying to contact a server that might not be reachable from the outside world. It would be nice to have your laptop “phone home” when it establishes a network connection on your corporate network, and skip this step when the laptop is taken outside your organization.

  3. OS X allows you to set a preferred order for your network connections, but it would be nice to disable the AirPort when your laptop establishes an Ethernet connection.

  4. Finally, maybe you have the need to perform an action whenever your laptop sleeps (or wakes), changes a network connection, mounts a volume, or runs a specific Application (whether it’s located in the Applications directory or anywhere else on your machine).

All of these situations can be made trivial through the help of crankd.

How do I get it working?

Crankd is a daemon, so it’s running in the background while you work. It uses an XML plist file that tells it which scripts (or which Python methods) to execute in response to specific state changes (like a network connection going up or down, or a volume being mounted). Since it’s a small Python library, the files aren’t huge and the entire finished installation is around 100 KB (or larger with your custom code/scripts). Let’s download crankd and experiment with its settings:

  1. Download the Pymacadmin source. You can do this through Google Code or Github – I’ll demonstrate the Github method. Navigate to http://github.com/acdha/pymacadmin, click the Downloads button, and download either the .tar.gz or the .zip version of the source code. Drag it to your desktop and then double-click on the file to expand it. It should open a folder named “acdha-pymacadmin-

  2. Install crankd. Upon opening the pymacadmin folder, you should see a series of folders, readme files, and an “install-crankd.sh” installation script. Let’s open Terminal.app and navigate to the pymacadmin folder that we expanded on our desktop (you can type “cd” into Terminal.app, drag and drop the folder into the Terminal window, and hit the Return key to change to the directory). The install-crankd.sh script is executable, so run it by typing “sudo ./install-crankd.sh” into the Terminal window and hitting Return. Enter your password when it prompts you.

  3. Set up a plist file for crankd. If you’ve never worked with crankd before, it’s best to let it set up a configuration plist for you. If you don’t specify a configuration plist with the “--config” argument, or you don’t have a com.googlecode.pymacadmin.crankd.plist file in your /Users/<username>/Library/Preferences folder, crankd will automatically create a sample plist for you. Let’s do that by typing “/usr/local/sbin/crankd.py” into Terminal and hitting the Return key. Take a look at the sample configuration plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>NSWorkspace</key>
  <dict>
    <key>NSWorkspaceDidMountNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "A new volume was mounted!"</string>
    </dict>
    <key>NSWorkspaceDidWakeNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "The system woke from sleep!"</string>
    </dict>
    <key>NSWorkspaceWillSleepNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "The system is about to go to sleep!"</string>
    </dict>
  </dict>
  <key>SystemConfiguration</key>
  <dict>
    <key>State:/Network/Global/IPv4</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "Global IPv4 config changed"</string>
    </dict>
  </dict>
</dict>
</plist>

This XML plist has two main keys – one for NSWorkspace events (such as mounted volumes and sleeping/waking your laptop), and one for SystemConfiguration events (such as network state changes). Under each of those is a key naming the specific event that we’re monitoring, a key specifying whether we’ll be executing a command or a Python method in response to that event, and a string (or an array of strings, as we’ll see later) specifying the actual command that’s to be executed. For all of the events in the sample plist, we’re going to be echoing a message to the console.

  4. Start crankd. Once crankd has been installed and your configuration plist file is set up, you’re ready to let crankd monitor for state changes. Let’s start crankd with the sample plist that was created in the previous step by executing the following command in Terminal: “/usr/local/sbin/crankd.py --config=/Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist”. Remember to substitute your username for <username> in that command (if you don’t know your username, you can type “whoami” into Terminal and hit the Return key). If everything was executed correctly, you should see the following lines displayed in Terminal:

Module directory /Users/<username>/Library/Application Support/crankd does not exist: Python handlers will need to use absolute pathnames
INFO: Loading configuration from /Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist
INFO: Listening for these NSWorkspace notifications: NSWorkspaceWillSleepNotification, NSWorkspaceDidWakeNotification, NSWorkspaceDidMountNotification
INFO: Listening for these SystemConfiguration events: State:/Network/Global/IPv4

It might look like Terminal isn’t doing anything, but in all actuality crankd is listening for changes. You can make crankd come to life by either connecting to (or disconnecting from) an AirPort network, sleeping/waking your machine, or mounting a volume (by inserting a USB memory stick, for example). Performing any of these actions will cause crankd to echo messages to your Terminal window. Here’s the message I received when I disconnected from an AirPort network:

INFO: SystemConfiguration: State:/Network/Global/IPv4: executing /bin/echo "Global IPv4 config changed"
Global IPv4 config changed

To quit this sample configuration of crankd, simply hold down the control button on your keyboard and press the C key. Congratulations, crankd is now up and running!

A more complex example

Let’s look at one of our previous situations

Puppet + Github = Laptop <3

Everybody wants to be special (I blame our moms). The price of special, when it comes to IT, is time. Consider how long you’ve spent on just your damn PROMPT and you’ll realize why automation gives any good sysadmin a case of the giggles. Like the cobbler’s kids, though, your laptop and development environment are always the last to be visited by the automation gnomes.

Until now.

Will Farrington gave a great talk at Puppetconf 2012 about managing an army of developer laptops using Puppet + some Github love that left more than a couple of people asking for his code. That request has unfortunately been denied.

Until now.

Boxen, and hand-tune no more

Enter Boxen (née ‘The Setup’), a full-fledged open source project from the guys at Github that melds Puppet with your Github credentials to create a framework for automating everything from applications, to dotfiles, and even printers and emacs extensions (that last bit’s a lie – no one should be using emacs).

How does it work? Think ‘Masterless Puppet’ (or, just a bunch of Puppet modules that are enforced by running puppet apply on your local machine). Boxen not only includes the framework ITSELF, but is a project on Github that hosts over 75 individual modules for managing things like rbenv, homebrew, git, mysql, postgres, riak, redis, npm, erlang, dropbox, skype, minecraft, heroku, 1password, iterm2, and much more. Odds are there’s a module for many of the common things you setup on your laptop. And what about things like dotfiles that are intrinsically personal? You can create your own repository and manage them like you would any other file/directory/symlink on the file system. The goal is to model every little piece of your laptop that makes it ‘unique’ from everyone else until you have your entire environment managed and the hardware becomes…well…disposable. How many times have you shied away from doing an upgrade because some component of your laptop required you to spend countless hours tinkering with it? If you’ve done the time, you should do something to make sure that you NEVER have to repeat that process manually ever again.

Boxen is also very self-contained. Packages and binaries that come out of Homebrew are installed into /opt/boxen/homebrew/bin, frameworks like Ruby and Node are installed into /opt/boxen/{rbenv,nvm}, and individual versions of those frameworks are kept separate from your system version of those frameworks. These details are important when you consider that you could purge the whole setup without having to rip out components scattered around your system.

You may be reading this and thinking “There’s no way in hell I can use this to manage every laptop in my organization!”, and you’re right. The POINT of Boxen is that it’s a tool written by developers for developers to automate THEIR systems. The goal of development is to have as little friction as possible between writing code and deploying that code into production. A tool like Boxen allows you to more quickly GET to the state where your laptop is ready for you to start developing. If you want a tool to completely manage and lock down all the laptops in your organization, look to using Puppet in agent/master mode or to a tool like Munki to manage all the packages on your systems. If you’re interested in giving your developers/users the freedom to manage their OWN ‘boxes’ because they know best what works for them, then Boxen is your tool.

There IS one catch – it’s targeted to OS X (10.8 to be exact).

Diary of an elitist

I was fortunate to have early access to Boxen in order to kick its Ruby tyres. As someone who’s managed Macs with Puppet before (all the way down to the desktop/laptop level), I was embarrassed to admit that I had NOTHING about my laptop automated. Will unlocked the project and basically said “Have fun, break shit, fix it, and file pull requests” and away I went. To commit completely to the project, I did what any sane person would do.

I reformatted my laptop and started entirely from scratch.

(Let’s be clear here – you don’t have to do that. Initially Will reported problems getting Boxen running in VMs, but I never ran into an issue. I ran Boxen in VMware Fusion 5 a number of times to make sure the changes I made were going to do the right thing on a fresh install. I’d recommend going down THAT road if you’re hesitant of immediately throwing this on your pretty snowflake of a laptop.)

Installing Boxen was pretty easy – the only prerequisites were downloading the Xcode Command Line Tools (which include git), pulling down the Boxen repo, and running script/boxen. It was stupid simple. What you GOT, by default, was:

  • Homebrew
  • Git
  • Hub
  • DNSMasq w/ .dev resolver for localhost
  • NVM
  • RBenv
  • Full Disk Encryption requirement
  • NodeJS 0.4
  • NodeJS 0.6
  • NodeJS 0.8
  • Ruby 1.8.7
  • Ruby 1.9.2
  • Ruby 1.9.3
  • Ack
  • Findutils
  • GNU-Tar

Remember, this is all tunable and you don’t need to pull down ALL of these packages, but, since it was new, I decided to install everything and sort it out later. Yes, the initial setup took a good number of minutes, but think about everything that’s being installed. In the end, I had a full Ruby development environment with rbenv, multiple versions of Ruby, and a laptop that could be customized without much work at all.

Which end do I blow in?

The readme on the project page for Boxen describes how to clone the project into /opt/boxen/repo, so that’s the directory we’ll be working with. To see what will be enforced on your machine, open manifests/site.pp – it should look something like this:

manifests/site.pp
require boxen::environment
require homebrew::repo

Exec {
  group       => 'staff',
  logoutput   => on_failure,
  user        => $luser,

  path => [
    "${boxen::config::home}/rbenv/shims",
    "${boxen::config::home}/homebrew/bin",
    '/usr/bin',
    '/bin',
    '/usr/sbin',
    '/sbin'
  ],

  environment => [
    "HOMEBREW_CACHE=${homebrew::cachedir}",
    "HOME=/Users/${::luser}"
  ]
}

File {
  group => 'staff',
  owner => $luser
}

Package {
  provider => homebrew,
  require  => Class['homebrew']
}

Repository {
  provider => git,
  extra    => [
    '--recurse-submodules'
  ],
  require  => Class['git']
}

Service {
  provider => ghlaunchd
}

This is largely scaffolding setting up the Boxen environment and resource defaults. If you’re familiar with Puppet, this should be recognizable to you, but for everyone else, let’s dissect one of the resource defaults:

File {
  group => 'staff',
  owner => $luser
}

This block basically means that any file you declare with Puppet should default to having its owner set as your username and its group set to ‘staff’ (which is standard in OS X). You can override this explicitly with a file declaration by providing the owner or group attribute, but if you omit it then it’s going to default to these values.
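
For example, a hypothetical file resource that needs root ownership just overrides the default:

file { '/opt/boxen/notes':
  ensure => file,
  owner  => 'root',  # overrides the $luser default from the File block above
}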

The rest of the defaults are customized for Boxen’s preferences (i.e. homebrew will be used to install all packages unless you specify otherwise, exec resources will log all output on failure, service resources will use Github’s customized service provider, etc…). Now let’s look below:

manifests/site.pp
node default {
  # core modules, needed for most things
  include dnsmasq
  include git
  include hub
  include nginx
  include nvm
  include ruby

  # fail if FDE is not enabled
  if $::root_encrypted == false {
    fail('Please enable full disk encryption and try again')
  }

  # node versions
  include nodejs::0-4
  include nodejs::0-6
  include nodejs::0-8

  # default ruby versions
  include ruby::1-8-7
  include ruby::1-9-2
  include ruby::1-9-3

  # common, useful packages
  package {
    [
      'ack',
      'findutils',
      'gnu-tar'
    ]:
  }

  file { "${boxen::config::srcdir}/our-boxen":
    ensure => link,
    target => $boxen::config::repodir
  }
}

These are the things that Boxen has chosen to enforce ‘out of the box’. Knowing that Boxen was designed so that developers could customize their ‘boxes’ THEMSELVES, it makes sense that there’s not much that’s being enforced on everyone. In fact, the most significant thing being ‘thrust’ upon you is that the machine must have full disk encryption enabled (which is a good idea anyways).

If you want to pare down what Boxen gives you by default, you can choose to comment out lines providing, for example, nvm and nodejs versions (if you don’t use node.js in your environment). I’m a Ruby developer, so all the Ruby builds (and rbenv) are very helpful to me, but you could also remove those if you were so inclined. The point is that this file contains the ‘knobs’ to dial your base Boxen setup up or down.

Customizing (or, my dotfiles are better than yours)

The whole point of Boxen is to customize your laptop and keep its customization automated. To do this, we’re going to need to make some Puppet class files.

CAUTION: PUPPET AHEAD

If you’ve not had experience with Puppet before, I can’t recommend the Learning Puppet series enough. In the vein of “Puppet now, learn later”, I’m going to give you Puppet code that works for ME and only explain the trickier bits.

Boxen has some ‘magic’ code that’s going to automatically look for a class called people::<github username>, and so I’m going to create a file in modules/people/manifests called glarizza.pp. This file will contain Puppet code specific to MY laptop(s). Here’s a snippet of that file:

modules/people/manifests/glarizza.pp
class people::glarizza {

  notify { 'class people::glarizza declared': }

  # Changes the default shell to the zsh version we get from Homebrew
  # Uses the osx_chsh type out of boxen/puppet-osx
  osx_chsh { $::luser:
    shell   => '/opt/boxen/homebrew/bin/zsh',
    require => Package['zsh'],
  }

  file_line { 'add zsh to /etc/shells':
    path    => '/etc/shells',
    line    => "${boxen::config::homebrewdir}/bin/zsh",
    require => Package['zsh'],
  }

  ##################################
  ## Facter, Puppet, and Envpuppet##
  ##################################

  repository { "${::boxen_srcdir}/puppet":
    source => 'puppetlabs/puppet',
  }

  repository { "${::boxen_srcdir}/facter":
    source => 'puppetlabs/facter',
  }

  file { '/bin/envpuppet':
    ensure  => link,
    mode    => '0755',
    target  => "${::boxen_srcdir}/puppet/ext/envpuppet",
    require => Repository["${::boxen_srcdir}/puppet"],
  }
}

The notify resource is only there to prove that this class is being declared when you run Boxen – it simply displays a message to the console when you run the boxen binary.

The osx_chsh resource is a custom type out of Github’s boxen/puppet-osx module that changes the default shell for a user. The file_line resource below it ensures that the Homebrew-installed zsh shows up in /etc/shells as an acceptable shell. Because Boxen installs zsh from homebrew into /opt/boxen/homebrew, we need to ensure that /etc/shells is correct. Note the syntax of $boxen::config::homebrewdir, which refers to a variable called $homebrewdir in the boxen::config class.

Next, I’ve setup a couple of resources to make sure the puppet and facter repositories are installed on my system. Github has also developed a lightweight repository resource that will simply ensure that a repo is cloned at a location on disk. $::boxen_srcdir is one of the custom Facter facts that Boxen provides in shared/boxen/lib/facter/boxen.rb in the Boxen repository.

The file resource sets up a symlink from /bin/envpuppet to /Users/glarizza/src/puppet/ext/envpuppet on my system. The attributes should be pretty self-explanatory, but the newest attribute, require, says that the repository resource must be managed BEFORE this file resource. This is a demonstration of Puppet’s ordering metaparameters that are described in the Learning Puppet series.

Since we briefly touched on $::boxen_srcdir, what are some other custom facts that come out of shared/boxen/lib/facter/boxen.rb?

shared/boxen/lib/facter/boxen.rb
require "json"
require "boxen/config"

config   = Boxen::Config.load
facts    = {}
factsdir = File.join config.homedir, "config", "facts"

facts["github_login"]   = config.login
facts["github_email"]   = config.email
facts["github_name"]    = config.name

facts["boxen_home"]     = config.homedir
facts["boxen_srcdir"]   = config.srcdir

if config.respond_to? :reponame
  facts["boxen_reponame"] = config.reponame
end

facts["luser"]          = config.user

Dir["#{config.homedir}/config/facts/*.json"].each do |file|
  facts.merge! JSON.parse File.read file
end

facts.each { |k, v| Facter.add(k) { setcode { v } } }

This file will also give you $::luser, which will evaluate out to your system username, and $::github_login, which is your Github username (note that this is what Boxen uses to find your class file in modules/people/manifests). If you’re looking for all the other values set by these custom facts, check out config/boxen/defaults.json after you run Boxen.
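
Also note the loop near the bottom of that fact file: any JSON file dropped into config/facts under the Boxen home directory gets merged in as facts. A hypothetical example:

config/facts/team.json
{
  "team": "ops",
  "work_machine": "true"
}

Those keys would then be available in your manifests as $::team and $::work_machine (both names are made up for this example).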

Using modules out of the Boxen namespace

Not only is Boxen its own project, but it’s a separate organization on Github that hosts a number of Puppet modules. Some of these modules are pretty simple (a single resource to install a package), but the point is that they’ve been provided FOR you – so use, fork, and improve them (but most of all, submit pull requests). The way you use them with Boxen may not be readily clear, so let’s walk through that with a simple module for installing Google Chrome.

  1. Add the module to your Puppetfile
  2. Classify the module in your Puppet setup
  3. Run Boxen

Add the module to your Puppetfile

Boxen uses a tool called librarian-puppet to source and install Puppet modules from Github. Librarian-puppet uses the Puppetfile in the root of the Boxen repo to install modules. Let’s look at a couple of lines in that file:

Puppetfile
mod "boxen",    "0.1.8",  :github_tarball => "boxen/puppet-boxen"
mod "dnsmasq",  "0.0.1",  :github_tarball => "boxen/puppet-dnsmasq"
mod "git",      "0.0.3",  :github_tarball => "boxen/puppet-git"
mod "hub",      "0.0.1",  :github_tarball => "boxen/puppet-hub"
mod "homebrew", "0.0.17", :github_tarball => "boxen/puppet-homebrew"
mod "inifile",  "0.0.1",  :github_tarball => "boxen/puppet-inifile"
mod "nginx",    "0.0.2",  :github_tarball => "boxen/puppet-nginx"
mod "nodejs",   "0.0.2",  :github_tarball => "boxen/puppet-nodejs"
mod "nvm",      "0.0.5",  :github_tarball => "boxen/puppet-nvm"
mod "ruby",     "0.4.0",  :github_tarball => "boxen/puppet-ruby"
mod "stdlib",   "3.0.0",  :github_tarball => "puppetlabs/puppetlabs-stdlib"
mod "sudo",     "0.0.1",  :github_tarball => "boxen/puppet-sudo"

This evaluates out to the following syntax:

mod "<module name>", "<version or tag>", <source>

The HARDEST thing about this file is finding the version number of modules on Github (HINT: it’s a tag). Once you’re given that information, it’s easy to pull up a module on Github, look at its tags, and then fill out the file. Let’s do that with a line for the Chrome module:

mod "chrome",     "0.0.2",   :github_tarball => "boxen/puppet-chrome"

Classify the module in your Puppet setup

In the previous section, we created modules/people/manifests/<github username>.pp. We COULD continue to fill this file with a ton of resources, but I tend to like to split resources out into subclasses. Puppet has module naming conventions to ensure that it can FIND your subclasses, so I recommend browsing that guide before randomly naming files (HINT: Filenames ARE important and DO matter here). I want to create a people::glarizza::applications subclass, so I need to do the following:

## YES, make sure to replace YOUR USERNAME for 'glarizza'
$ cd /opt/boxen/repo
$ mkdir -p modules/people/manifests/glarizza
$ vim modules/people/manifests/glarizza/applications.pp

It’s totally fine that there’s a glarizza directory alongside the glarizza.pp file – this is intentional and desired. Puppet’s not going to automatically declare anything in the people::glarizza::applications class until we TELL it to, so let’s open modules/people/manifests/glarizza.pp and add the following line at the top:

modules/people/manifests/glarizza.pp
include people::glarizza::applications

That tells Puppet to find the people::glarizza::applications class and make sure it ‘does’ everything in that file. Now, let’s create the people::glarizza::applications class:

modules/people/manifests/glarizza/applications.pp
class people::glarizza::applications {
  include chrome
}

Yep, all it takes is one line to include the module we will get from Boxen. Because of the way Boxen works, it will consult the Puppetfile FIRST, pull down any modules that are in the Puppetfile but NOT on the system, drop them into place so Puppet can find them, and then run Puppet normally.

Run Boxen

Once you have Boxen set up, you can just run boxen from the command line to have it enforce your configuration. By default, if there are any errors, it will log them as Github Issues on your fork of the main Boxen repository (this can be disabled with boxen --no-issue). As you’re just getting started, don’t worry about the errors. The good news is that once you fix things and perform a successful Boxen run, it will automatically close all open issues. If everything went well, you should now have Google Chrome in your /Applications directory!
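
For example, the two most common invocations look like this:

boxen             # enforce your configuration, filing Github Issues on failure
boxen --no-issue  # same run, but without creating any Github Issues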

¡Más Puppet!

You’ll find as you start customizing all the things that you’re usually managing one of the following resources:

  1. Packages
  2. Files
  3. Repositories
  4. Plist files

We’ve covered managing a repository and a file, but let’s look at a couple of the other popular resources:

Packages are annoying

I would be willing to bet that most of the things you end up managing will be packages. Using Puppet with Boxen, you have the ability to install four different kinds of packages:

  1. Applications inside a DMG
  2. Installer packages inside a DMG
  3. Homebrew Packages
  4. Applications inside a .zip file

Here’s an example of every type of package installer:

  # Application in a DMG
  package { 'Gephi':
    ensure   => installed,
    source   => 'https://launchpadlibrarian.net/98903476/gephi-0.8.1-beta.dmg',
    provider => appdmg,
  }

  # Installer in a DMG
  package { 'Virtualbox':
    ensure => installed,
    source => 'http://download.virtualbox.org/virtualbox/4.1.22/VirtualBox-4.1.23-80870-OSX.dmg',
    provider => pkgdmg,
  }

  # Homebrew Package
  package { 'tmux':
    ensure => installed,
  }

  # Application in a .zip
  package { 'Github for Mac':
    ensure   => installed,
    source   => 'https://github-central.s3.amazonaws.com/mac%2FGitHub%20for%20Mac%2069.zip',
    provider => compressed_app
  }

Notice that the only thing that changes among these resources is the provider attribute. Remember from before that Boxen sets the default package provider to be ‘homebrew’, so for ‘tmux’ I omitted the provider attribute to utilize the default. Also, the ensure attribute defaults to ‘installed’, so technically I could remove it…but I tend to prefer to use it for people who will be reading my code later.

There’s no provider for bare .pkg files. Why? Well, packages on OS X are either bundles or flat packages. Bundles LOOK like individual files, but they’re actually folders that contain everything necessary to expand and install the package. Flat packages are just that – an actual file that ends in .pkg that can be expanded to install whatever you want. Bundle packages are pretty common, but they’re also hard for curl to download (being that a bundle is just a folder full of files) – this is why most installer packages you encounter on OS X are going to be enclosed in a .dmg disk image.

So which provider will you use? Well, if your file ends in .dmg then you’re going to be using either the pkgdmg or appdmg provider. How do you know which to use? Expand the .dmg file and look inside it. If it contains an application ending in .app that simply needs to be dragged into the /Applications folder on disk, then choose the appdmg provider (that’s essentially all it does – expand the .dmg file and ditto the .app file into /Applications). If the disk image contains a .pkg package installer, then you’ll choose the pkgdmg provider (which expands the .dmg file and uses installer to install the contents of the .pkg file silently in the background). If your file is a .zip file containing an application (.app file), then you can use Github’s custom compressed_app provider that will unzip the file and ditto the app into /Applications. Finally, if you want to install a package from Homebrew, then the homebrew provider is pretty self-explanatory.

(NOTE: There is ONE more package provider I haven’t covered here – the macports provider. It requires Macports to be installed on your system, and will use it to install a package. Macports vs. Homebrew arguments notwithstanding, if you’re into Macports then there’s a provider for you.)

Plists: because why NOT XML :\

Apple falls somewhere between “the registry” and “config files” on the spectrum of tweaking system settings. Most settings are locked up in plist files that can be managed by hand or with PlistBuddy or defaults. A couple of people have saved their customizations alongside their dotfiles (Zach Holman has an example here), but Puppet is a great way to manage individual keys in your plist files. I’ve written a module that will manage any number of keys in a plist file. You can modify your Puppetfile to make sure Boxen picks up my module by adding the following line:

mod "property_list_key",  "0.1.0",   :github_tarball => "glarizza/puppet-property_list_key"

Next, you’ll need to add resources to your classes:

  # Disable Gatekeeper so you can install any package you want
  property_list_key { 'Disable Gatekeeper':
    ensure => present,
    path   => '/var/db/SystemPolicy-prefs.plist',
    key    => 'enabled',
    value  => 'no',
  }

  $my_homedir = "/Users/${::luser}"

  # NOTE: Dock prefs only take effect when you restart the dock
  property_list_key { 'Hide the dock':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'autohide',
    value      => true,
    value_type => 'boolean',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Align the Dock Left':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'orientation',
    value      => 'left',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Lower Right Hotcorner - Screen Saver':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'wvous-br-corner',
    value      => 10,
    value_type => 'integer',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Lower Right Hotcorner - Screen Saver - modifier':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'wvous-br-modifier',
    value      => 0,
    value_type => 'integer',
    notify     => Exec['Restart the Dock'],
  }

  exec { 'Restart the Dock':
    command     => '/usr/bin/killall -HUP Dock',
    refreshonly => true,
  }

  file { 'Dock Plist':
    ensure  => file,
    require => [
                 Property_list_key['Hide the dock'],
                 Property_list_key['Align the Dock Left'],
                 Property_list_key['Lower Right Hotcorner - Screen Saver'],
                 Property_list_key['Lower Right Hotcorner - Screen Saver - modifier'],
               ],
    path    => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    mode    => '0600',
    notify  => Exec['Restart the Dock'],
  }

The important attributes are:

  1. path: The path to the plist file on disk
  2. key: The individual KEY in the plist file you want to manage
  3. value: The value that the key should have in the plist file
  4. value_type: The datatype the value should be (defaults to string, but could also be array, hash, boolean, or integer)

You MUST pass a path, key, and value or Puppet will throw an error.

The first resource above disables Gatekeeper in 10.8, which allows you to install packages from the web that HAVEN’T been signed (by default in 10.8, Apple won’t allow you to install unsigned packages or anything from outside of the App Store without changing this setting).

All of the other resources relate to making changes to the Dock. Because of the way the Dock is managed, you must HUP its process after making changes to your dock plist before they take effect. Also, the dock plist has to be owned by you or else the changes won’t take effect. Every dock plist resource has a notify metaparameter, which means “any time this resource changes, run this exec resource”. That exec resource is a simple command that HUPs the dock process. It will ONLY be run if a resource notifies it – so if no changes are made in a Puppet run, then the command won’t fire. Finally, the file resource to manage the dock plist ensures that permissions are set (and notifies the exec in case it needs to CHANGE permissions).

Again, this is purely dealing with Puppet – but plists are a major part of OS X and you’ll be dealing with them regularly!

But seriously, dotfiles

I know I’ve joked about it a couple of times, but getting your dotfiles into their correct location is a quick win. The secret is to lock them all up in a repository, and then symlink them where you need them. Let’s look at that:

  # My dotfile repository
  repository { "${my_sourcedir}/dotfiles":
    source => 'glarizza/dotfiles',
  }

  file { "${my_homedir}/.tmux.conf":
    ensure  => link,
    mode    => '0644',
    target  => "${my_sourcedir}/dotfiles/tmux.conf",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  file { "/Users/${my_username}/.zshrc":
    ensure  => link,
    mode    => '0644',
    target  => "${my_sourcedir}/dotfiles/zshrc",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  file { "/Users/${my_username}/.vimrc":
    ensure  => link,
    mode    => '0644',
    target  => "${my_sourcedir}/dotfiles/vimrc",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  # Yes, oh-my-zsh. Judge me.
  file { "/Users/${my_username}/.oh-my-zsh":
    ensure  => link,
    target  => "${my_sourcedir}/oh-my-zsh",
    require => Repository["${my_sourcedir}/oh-my-zsh"],
  }

It’s worth mentioning that Puppet does not do things procedurally. Just because the dotfiles repository is listed before every symlink DOES NOT mean that Puppet will apply it first. You’ll need to specify ordering explicitly, and that’s what the require metaparameter does.

Based on what I’ve already shown you, this code block should be very simple to follow. Because I’m using symlinks, the dotfiles should always be current. Because the dotfiles are under revision control, updating them all is as simple as making commits and updating your repository. If you’ve ever had to migrate these files to a new VM/machine, then you know how full of win this block of code is.
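As a concrete example of that update loop (the path below is an assumption – use wherever your ${my_sourcedir} points):

$ cd ~/src/dotfiles
$ git pull

Because every dotfile is a symlink into this working copy, a single pull updates all of them at once – no copying required.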

Don’t sweat petty (or pet sweaty)

When I show sysadmins/developers automation like this, they usually want to apply it to the HARDEST part of their day-job IMMEDIATELY. That’s a somewhat rational reaction, but it’s not going to give you the results you want. The cool thing ABOUT Boxen and Puppet is that it’s going to remove those little annoyances in your day that slowly sap your time. START by tackling those small annoyances to build your confidence (like the dotfiles example above). Yeah, you’ll only save a couple of minutes a day, but those minutes compound quickly. Also, when you solve a problem during the course of your day, MANAGE it with Boxen by putting it in your Puppet class (then test it out on a VM or another machine to make sure it does what you expect).

Don’t worry that you’re not saving the world with a massive Puppet class – sometimes the secret to happiness is opening iTerm on a new machine and seeing your finely-crafted prompt shining in its cornflower-blue glory.

Now show me some cool stuff

So that’s a quick tour of the basics of Boxen and the kinds of things you can do from the start. I’m really excited for everyone to get their hands on Boxen and do more cool stuff with Puppet. I’ve done a bunch of work with Puppet for OS X, and that’s enough to know that there’s still PLENTY that can be improved in the codebase. A giant THANK YOU to John Barnette, Will Farrington, and the entire Github Boxen team for all their work on this tool (and letting me tinker with it before it hit the general public)! Feel free to comment below, email me (gary at puppetlabs), or yell at me on Twitter for more information!

Repeatable Puppet Development With Vagrant

I miss testing code in production. In smaller organizations, ‘testing’ and ‘development’ can sometimes consist of making changes directly on a server, walking to an active machine, and hoping things work. Once you were done, you MIGHT document what changes you made, but more often than not you kept that information in your head and referred to it later.

I lied – that is everything that sucks about manual configuration of machines.

The best way to get out of this rut is to get addicted to automating first the menial tasks on your machines, and then work your way up from there. We STILL have the problem, though, of doing this in production – that’s what this post is meant to address.

What we want is the ability to spin up a couple of test nodes for the purpose of testing our automation workflow BEFORE it gets committed and applied to our production nodes. This post details using Vagrant and Puppet to both establish a clean test environment and also test automation changes BEFORE applying them to your production environment.

Puppet is a Configuration Management tool that automates all the annoying aspects of manual configuration out of your infrastructure. The bulk of its usage is beyond the scope of THIS post, however we’re going to be using it as the means to describe the changes we want to make on our systems.

Vagrant is a magical project that uses minimal VM templates (boxes) to spin up clean virtualized environments on your workstation for the purpose of testing changes. Currently, it only supports a VirtualBox backend, but its creator, Mitchell Hashimoto, has teased a preview of upcoming VMware integration that SHOULD be coming any day now. In this post, Vagrant will be the means by which we spin up new VMs for development purposes.

Getting set up

The only moving piece you need installed on your system is Vagrant. Fortunately, Mitchell provides native package installers on his website for downloading Vagrant. If you’ve never used Vagrant before, and you AREN’T a Ruby developer who maintains multiple Ruby versions on your system, then you’ll want to opt for the native package installer since it’s the easiest method to get Vagrant installed (and, on Macs, Vagrant embeds its own Ruby AND Rubygems binaries in the Package bundle…which is kind of cool).

IF, however, you are developing in Ruby and you use RVM or Rbenv to maintain multiple copies of Ruby on your system, then you’ll want to favor installing Vagrant via Rubygems a la:

$ gem install vagrant --no-ri --no-rdoc

If you have no idea how to use RVM or Rbenv – stick with the native installers :)

Puppet does NOT need to be on your workstation since we’re only going to be using it on the VMs that Vagrant spins up – so don’t worry about Puppet yet.

My kingdom for a box

Vagrant uses box files as templates from which to spin up a new virtual machine for development purposes. There are sites that host boxes available for download, OR you could use an awesome project called Veewee to build your own. Again, building your box file is outside the scope of this article, so just make sure you download a box with an OS that’s to your liking. This box DOES NOT need to have Puppet preinstalled – in fact, it’s probably better that it doesn’t (because the version will probably be old, and we’re going to work around this anyways). I’m going to choose a CentOS 6.3 box that the SE team at Puppet Labs uses for demos, but, again, it’s up to you.
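If you’d rather fetch the box ahead of time (or reuse one you’ve already downloaded), you can register it with Vagrant manually – the name you pick here is the name you’ll reference in your Vagrantfile. A quick sketch using the same box URL from the Vagrantfile below (which, as noted later, is subject to change):

$ vagrant box add centos-6.3-x86_64 https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box
$ vagrant box list
centos-6.3-x86_64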

Vagrantfile, assemble!

Now that we’ve got the pieces we need, let’s start stitching together a repeatable workflow. To do that, we’ll need to create a directory for this project and a Vagrantfile to direct Vagrant on how it should set up your VM. I’m going to use ~/src/vagrant_projects for the purpose of this demo:

$ mkdir -p ~/src/vagrant_projects
$ cd ~/src/vagrant_projects
$ vim Vagrantfile

Let’s take a look at a sample Vagrantfile that I use to get Puppet installed on a box:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"
end

Stepping through this file line-by-line, the first two config.vm lines establish the box we want to use for our development VM as well as the URL to the box file where it can be downloaded (in the event that it does not exist on our system). Because, initially, this box will NOT be known to Vagrant, it will attempt to reach out to that address and download it (note that the URL to THIS PARTICULAR BOX is subject to change – please find a box file that works for you and substitute its URL in the config.vm.box_url config setting). The next three lines define the machine’s hostname, the network type, and the IP address for this VM. In this case, I’m using a host-only network and giving it an IP address on a made-up 192.168.33.0/24 subnet (feel free to use your own private IP range as long as it doesn’t conflict with anything). The next line is forwarding port 80 on the VM to port 8084 on my local laptop – this allows you to test out web services by simply navigating to http://localhost:8084 from your web browser. I’ll save explaining the last line for the next section.

NOTE: For more documentation on these settings, visit Vagrant’s documentation site, as it’s quite good.

Getting Puppet on your VM

The final line in the sample Vagrantfile runs what’s called the ‘Shell Provisioner’ for Vagrant. Essentially, it runs a shell script on the VM once it’s been booted and configured. What does this shell script do?

centos_6_x.sh
#!/usr/bin/env bash
# This bootstraps Puppet on CentOS 6.x
# It has been tested on CentOS 6.3 64bit

set -e

REPO_URL="http://yum.puppetlabs.com/el/6/products/i386/puppetlabs-release-6-6.noarch.rpm"

if [ "$EUID" -ne "0" ]; then
  echo "This script must be run as root." >&2
  exit 1
fi

if which puppet > /dev/null 2>&1; then
  echo "Puppet is already installed"
  exit 0
fi

# Install puppet labs repo
echo "Configuring PuppetLabs repo..."
repo_path=$(mktemp)
wget --output-document=${repo_path} ${REPO_URL} 2>/dev/null
rpm -i ${repo_path} >/dev/null

# Install Puppet...
echo "Installing puppet"
yum install -y puppet > /dev/null

echo "Puppet installed!"

As you can see, it sets up the Puppet Labs el6 repository containing the current packages for Puppet/Facter/Hiera/PuppetDB/etc and installs the most recent version of Puppet and Facter that are in the repository. This will ensure that you have the most recent version of Puppet on your VM, and you don’t need to worry about creating a new box every time Puppet releases a new version.
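A hedged aside: if your production nodes are pinned to a specific Puppet release and you want the VM to match, yum can install a specific version instead of the latest (the version string below is just an example – substitute whatever production runs):

# Example only: pin a specific Puppet release instead of the newest
yum install -y puppet-3.0.2

Swap that into the bootstrap script and your test VM will track production instead of the bleeding edge.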

This code came from Mitchell’s puppet-bootstrap repo where he maintains a list of scripts that will bootstrap Puppet onto many of the common operating systems out there. This code was current as of the initial posting date of this blog, but make sure to check that repo for any updates. If you’re maintaining your OWN provisioning script, consider filing pull requests against Mitchell’s repo so we can ALL benefit from good code and don’t have to keep creating ‘another wheel’ just to provision Puppet on VMs!

Spin up your VM

Once you’ve created a Vagrantfile in a directory, the next logical thing to do is to test out Vagrant and fire up your VM. Let’s first check the status of the VM:

$ vagrant status

Current VM states:

default                  not created

The environment has not yet been created. Run `vagrant up` to
create the environment.

As expected, this VM has yet to be created, so let’s create it with a vagrant up:

$ vagrant up

[default] Box centos-6.3-x86_64 was not found. Fetching box from specified
URL...
[vagrant] Downloading with Vagrant::Downloaders::HTTP...
[vagrant] Downloading box:
https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box
[vagrant] Extracting box...
[vagrant] Verifying box...
[vagrant] Cleaning up downloaded box...
[default] Importing base box 'centos-6.3-x86_64'...
[default] The guest additions on this VM do not match the install version of
VirtualBox! This may cause things such as forwarded ports, shared
folders, and more to not work properly. If any of those things fail on
this machine, please update the guest additions and repackage the
box.

Guest Additions Version: 4.1.18
VirtualBox Version: 4.1.23
[default] Matching MAC address for NAT networking...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] -- 80 => 8084 (adapter 1)
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Preparing network interfaces based on configuration...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] VM booted and ready for use!
[default] Configuring and enabling network interfaces...
[default] Setting host name...
[default] Mounting shared folders...
[default] -- v-root: /vagrant
[default] Running provisioner: Vagrant::Provisioners::Shell...
Configuring PuppetLabs repo...
warning: 
/tmp/tmp.FvW0K7FJWU: Header V4 RSA/SHA1 Signature, key ID 4bd6ec30: NOKEY
Installing puppet
warning: 
rpmts_HdrFromFdno: Header V4 RSA/SHA1 Signature, key ID 4bd6ec30: NOKEY
Importing GPG key 0x4BD6EC30:
 Userid : Puppet Labs Release Key (Puppet Labs Release Key) <info@puppetlabs.com>
 Package: puppetlabs-release-6-6.noarch (installed)
 From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-puppetlabs
Warning: RPMDB altered outside of yum.
Puppet installed!

Vagrant first noticed that we did not have the CentOS box on our machine, so it downloaded, extracted, and verified the box before importing it and creating our custom VM. Next, it configured the VM’s network settings according to our Vagrantfile, and finally it provisioned the box using the script we passed in the Vagrantfile.

We’ve now got a VM running and Puppet is installed. Let’s ssh to our VM and check the Puppet version:

$ vagrant ssh

Last login: Tue Jul 10 22:56:01 2012 from 10.0.2.2
[vagrant@development ~]$ puppet --version
3.0.2
[vagrant@development ~]$ hostname
development.puppetlabs.vm
[vagrant@development ~]$ exit
logout
Connection to 127.0.0.1 closed.

$ vagrant destroy -f
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...

Cool – so we demonstrated that we could ssh into the VM, check the Puppet version, check the hostname to ensure that Vagrant had set it correctly, exit out, and then we finally destroyed the VM with vagrant destroy -f. The next step is to actually configure Puppet to DO something with this VM…

Using Puppet to setup your node

The act of GETTING a clean VM is all well and good (and is probably magic enough for most people out there), but the purpose of this post is to demonstrate a workflow for testing out Puppet code changes. In the previous step we showed how to get Puppet installed, but we’ve yet to demonstrate how to use Vagrant’s built-in Puppet provisioner to configure your VM. Let’s use the example of a developer wanting to spin up a LAMP stack. To manually configure that would require installing a number of packages, editing a number of config files, and then making sure services were installed (among other things). We’re going to use some of the Puppet modules from the Puppet Forge to tackle these tasks and make Vagrant automatically configure our VM.

Scaffolding Puppet

We need a way to pass our Puppet code to the VM Vagrant creates. Fortunately, Vagrant has a way to define Shared Folders that can be shared from your workstation and mounted on your VM at a particular mount point. Let’s modify our Vagrantfile to account for this shared folder:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"

  # Puppet Shared Folder
  config.vm.share_folder "puppet_mount", "/puppet", "puppet"
end

The syntax for the config.vm.share_folder line is that the first argument is a logical name for the shared folder mapping, the second argument is the path IN THE VM where this folder will be mounted (so, a folder called ‘puppet’ in the root of the filesystem), and the last argument is the path to the folder ON YOUR WORKSTATION that will be mounted in the VM (it can be a full or relative path – which is what we’ve done here). This folder hasn’t been created yet, so let’s create it (and a couple of subfolders):

$ cd ~/src/vagrant_projects
$ mkdir -p puppet/{manifests,modules}

This command will create the puppet directory in the same directory that contains our Vagrantfile, and then two subdirectories, manifests and modules, that will be used by the Puppet provisioner later. Now that we’ve told Vagrant to create our shared folder, and we’ve created the folder structure, let’s bring up the VM with vagrant up again, ssh into the VM with vagrant ssh, and then check to see that the folder has been mounted.

$ vagrant up

<output suppressed - see above for example output>

$ vagrant ssh

Last login: Tue Jul 10 22:56:01 2012 from 10.0.2.2
[vagrant@development ~]$ ls /puppet
manifests  modules

Great! We’ve set up a shared folder. To further test it out, you can try dropping a file in the puppet directory or one of its subdirectories – it should immediately show up on the VM without having to recreate the VM (because it’s a shared folder). There are pros and cons to this workflow – the main pro is that changes you make on your workstation are immediately reflected in the VM, and the main con is that you can’t symlink folders INSIDE the shared folder on your workstation because of the nature of symlinks.
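Here’s that drop-a-file test as a quick sketch (vagrant ssh -c runs a single command on the VM and returns):

$ touch puppet/testfile
$ vagrant ssh -c 'ls /puppet'
manifests  modules  testfile
$ rm puppet/testfile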

Installing the necessary Puppet Modules

Since we’ve already spun up a new VM and ssh’d into it, let’s use our VM to download modules we’re going to need to setup our LAMP stack:

[vagrant@development ~]$ puppet module install puppetlabs/apache --target-dir /puppet/modules/
Notice: Preparing to install into /puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/puppet/modules
└─┬ puppetlabs-apache (v0.5.0-rc1)
  ├── puppetlabs-firewall (v0.0.4)
  └── puppetlabs-stdlib (v3.2.0)

[vagrant@development ~]$ puppet module install puppetlabs/mysql --target-dir /puppet/modules/
Notice: Preparing to install into /puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/puppet/modules
└── puppetlabs-mysql (v0.6.1)

[vagrant@development ~]$ ls /puppet/modules/
apache  concat  firewall  mysql  stdlib

The puppet binary has a module subcommand that will connect to the Puppet Forge to download Puppet modules and their dependencies. The commands we used will install Puppet Labs’ apache and mysql modules (and their dependencies). We’re also passing the --target-dir argument that will tell the puppet module subcommand to install the module into our shared directory (instead of Puppet’s default module path).
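A related tip: had we wanted to pin a module to an exact release (say, to match production), puppet module install takes a --version flag, and puppet module list shows what ended up in a given path. A sketch (version pulled from the output above):

[vagrant@development ~]$ puppet module install puppetlabs/apache --version 0.5.0-rc1 --target-dir /puppet/modules/
[vagrant@development ~]$ puppet module list --modulepath /puppet/modules/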

I’m choosing to use puppet module to install these modules, but there are a multitude of other methods you can use (from downloading the modules directly out of Github to using a tool like librarian-puppet). The point is that we need to ultimately get the modules into the modules directory in our shared puppet folder – however you want to do that works for me :)
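If the librarian-puppet route appeals to you, the gist is a Puppetfile next to your Vagrantfile that declares the modules you want. A rough sketch (check librarian-puppet’s docs for the flags your version supports):

$ gem install librarian-puppet
$ cat > Puppetfile <<'EOF'
forge "https://forge.puppetlabs.com"
mod "puppetlabs/apache"
mod "puppetlabs/mysql"
EOF
$ librarian-puppet install --path puppet/modules

The win here is that your module set lives in a file you can commit, instead of only in the modules directory itself.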

Once the modules are in puppet/modules, we’re good. You only ever need to do this step ONCE. Because this folder is a shared folder, you can now vagrant up and vagrant destroy to your heart’s content – Vagrant will not remove the content in our shared folder when a VM is destroyed. Remember, too, that any changes made to those modules from either the VM or on your Workstation will be IMMEDIATELY available to both.

Since we’re done with the VM for now, let’s destroy it with vagrant destroy:

$ vagrant destroy

Classifying your development VM

The modules we installed are a framework that we will use to configure the node. The act of directing the actions that Puppet should take on a particular node is called ‘Classification’. Puppet uses a file called site.pp to map Puppet code with the corresponding ‘node’ (or, in our case, our VM) that should receive it. Let’s create a site.pp file and open it for editing:

$ cd ~/src/vagrant_projects
$ vim puppet/manifests/site.pp

Let’s create a site.pp that will set up the LAMP stack on the development.puppetlabs.vm node we create with Vagrant:

~/src/vagrant_projects/puppet/manifests/site.pp
node 'development.puppetlabs.vm' {
  # Configure mysql
  class { 'mysql::server':
    config_hash => { 'root_password' => '8ZcJZFHsvo7fINZcAvi0' }
  }
  include mysql::php

  # Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => '80',
    docroot => '/var/www/test',
    require => File['/var/www/test'],
  }

  # Configure Docroot and index.html
  file { ['/var/www', '/var/www/test']:
    ensure => directory
  }

  file { '/var/www/test/index.php':
    ensure  => file,
    content => '<?php echo \'<p>Hello World</p>\'; ?> ',
  }

  # Realize the Firewall Rule
  Firewall <||>
}

Again, the point of this post is not about writing Puppet code but more about testing the Puppet code you write. The above node declaration will set up MySQL with a root password (passed via the config_hash parameter above), set up Apache and a VHost for development.puppetlabs.vm with a docroot of /var/www/test, drop an index.php file into that docroot, and realize a firewall rule to allow access through to port 80 on our VM.
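Once the provisioner runs (we’ll wire that up in the next section), one way to confirm that the vhost specifically – and not just Apache’s default page – is answering is to send a matching Host header through the forwarded port. A sketch:

$ curl -H 'Host: development.puppetlabs.vm' http://localhost:8084/
<p>Hello World</p>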

Setting up the Puppet provisioner for Vagrant

We’re going to have to modify our Vagrantfile one more time to tell Vagrant to use the Puppet provisioner to execute our Puppet code and set up our VM:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"

  # Puppet Shared Folder
  config.vm.share_folder "puppet_mount", "/puppet", "puppet"

  # Puppet Provisioner setup
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "puppet/manifests"
    puppet.module_path    = "puppet/modules"
    puppet.manifest_file  = "site.pp"
  end
end

Notice the block for the Puppet provisioner that sets up the manifest path (i.e. where to find site.pp), the module path (i.e. where to find our Puppet modules), and the name of our manifest file (i.e. site.pp). Again, this is all documented on the Vagrant documentation page should you need to use it for reference.

This bumps the number of provisioners in our Vagrantfile to two, but which one runs first? Vagrant works through the Vagrantfile in order, so the Shell provisioner will always run first and the Puppet provisioner second. This lets us be certain that Puppet is installed before the Puppet provisioner tries to use it. You can add as many provisioning blocks as you like – Vagrant will run them in the order it encounters them.

Give the entire workflow a try

Now that we have our Vagrantfile finalized, our Puppet directory structure set up, our Puppet modules installed, and our site.pp file set to classify our new VM, let’s actually let Vagrant do what it does best and set up our VM:

1
$ vagrant up

You should see Vagrant use the Shell provisioner to install Puppet, hand off to the Puppet provisioner, and then use Puppet to set up a LAMP stack on our VM. After everything completes, try visiting http://localhost:8084 in your web browser and see if you get a shiny “Hello World” staring back at you. If you do – Awesome! If you don’t, check the error messages to determine if there are typos in the Puppet code or if something went wrong in the Vagrantfile.
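When something does go wrong, you don’t have to destroy and rebuild the VM – fix the manifest and re-run just the provisioners against the running VM:

$ vim puppet/manifests/site.pp
$ vagrant provision

That edit/provision/check loop is the heart of this workflow, and it’s fast because the VM (and the shared folder) stay put between runs.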

Where do you take it from here?

The first thing to do is to take the Vagrantfile you’ve created and put it under revision control so you can track the changes you make. I personally have a couple of workflows up on Github that I use as templates when I’m testing out something new. You’ll probably find that your Vagrantfile won’t change much – just the modules you use for testing.

Now that you understand the pattern, you can expand it to fit your workflow. Single-VM projects are great when you’re testing a specific component, but the next logical step is to test out multi-tiered components/applications. In these instances, Vagrant has the ability to spin up multiple VMs from a single Vagrantfile. That workflow saves a TON of time and lets you create your own private network of VMs for the purpose of simulating changes. That’s a post for another time, though…

Get involved

Stay tuned to the Vagrant website for updates on the VMware provisioner. Stability with VirtualBox has notoriously been an issue, but, as of this posting, things have been relatively rock-solid for me (using VirtualBox version 4.1.23 on OS X).

If you want to keep up-to-date on all things Vagrant, follow Mitchell on Twitter, check out #vagrant on Freenode, join the Vagrant list, and check out Google for what other folks have done!

A GIANT thank you to Mitchell Hashimoto for all the work he’s done on Vagrant – I can’t count the number of hours it’s saved me personally (let ALONE everyone at Puppet Labs)!