Shit Gary Says

...things I don't want to forget

On R10k and ‘Environments’

There have been more than a couple of moments where I’m on-site with a customer who asks a seemingly simple question and I’ve gone “Oh shit; that’s a great question and I’ve never thought of that…” Usually that’s followed by me changing up the workflow and immediately regretting things I’ve done on prior gigs. Some people call that ‘agile’; I call it ‘me not having the forethought to consider conditions properly’.

‘Environment’, like ‘scaling’, ‘agent’, and ‘test’, has many meanings

It’s not a secret that we’ve made some shitty decisions in the past with regard to naming things in Puppet (and anyone who asks me what puppet agent -t stands for usually gets a heavy sigh, a shaken head, and an explanation emitted in dulcet, apologetic tones). It’s also very easy to conflate certain concepts that unfortunately share very common labels (quick – what’s the difference between properties and parameters, and give me the lowdown on MCollective agents versus Puppet agents!).

And then we have ‘environments’ + Hiera + R10k.

Puppet ‘environments’

Puppet has the concept of ‘environments’, which, to me, exist to provide a means of compiling a catalog using different paths to Puppet modules on the Puppet master. Using a Puppet environment is the same as saying “I made some changes to my tomcat class, but I don’t want to push it DIRECTLY to my production machines yet because I don’t drink Dos Equis. It would be great if I could stick this code somewhere and have a couple of my nodes test how it works before merging it in!”

Puppet environments suffer some ‘seepage’ issues, which you can read about here, but do a reasonable job of quickly testing out changes you’ve made to the Puppet DSL (as opposed to custom plugins, as detailed in the bug). Puppet environments work well when you need a pipeline for testing your Puppet code (again, when you’re refactoring or adding new functionality), and using them for that purpose is great.

Internal ‘environments’

What I consider ‘internal environments’ have a couple of names – sometimes they’re referred to as application or deployment gateways, sometimes as ‘tiers’, but in general they’re long-term groupings that machines/nodes are attached to (usually for the purpose of phased-out application deployments). They frequently have names such as ‘dev’, ‘test’, ‘prod’, ‘qa’, ‘uat’, and the like.

For the purpose of distinguishing them from Puppet environments, I’m going to refer to them as ‘application tiers’ or just ‘tiers’ because, fuck it, it’s a word.

Making both of them work

The problems with having Puppet environments and application tiers are:

  • Puppet environments are usually assigned to a node for short periods of time, while application tiers are usually assigned to a node for the life of the node.
  • Application tiers usually need different bits of data (e.g. NTP server addresses, versions of packages, etc.), while Puppet environments usually use/involve differences to the Puppet DSL.
  • Similarly to the first point, the goal of Puppet environments is to eventually merge code differences into the main production Puppet environment. Application tiers, however, may always have differences about them and never become unified.

You can see where this would be problematic – especially when you might want to do things like use different Hiera values between different application tiers, but you want to TEST out those values before applying them to all nodes in an application tier. If you previously didn’t have a way to separate Puppet environments from application tiers, and you used R10k to generate Puppet environments, you would have things like long-term branches in your repositories that would make it difficult/annoying to manage.

NOTE: This is all assuming you’re managing component modules, Hiera data, and Puppet environments using R10k.

The first step in making both monikers work together is to have two separate variables in Puppet – namely $environment for Puppet environments, and something ELSE (say, $tier) for the application tier. The “something else” is going to depend on how your workflow works. For example, do you have something centrally that can correlate nodes to the tier in which they belong? If so, you can write a custom fact that will query that service. If you don’t have this magical service, you can always just attach an application tier to a node in your classification service (i.e. the Puppet Enterprise Console or Foreman). Failing both of those, you can look to external facts. External fact support was introduced in Facter 1.7 (but Puppet Enterprise has supported them through stdlib for quite a while). External facts give you the ability to create a text file inside the facts.d directory in the format of:

tier=qa
location=portland

Facter will read this text file and store the values as facts for a Puppet run, so $tier will be qa and $location will be portland. This is handy for when you have arbitrary information that can’t be easily discovered by the node, but DOES need to be assigned for the node on a reasonably consistent basis. Usually these files are created during the provisioning process, but can also be managed by Puppet. At any rate, having $environment and $tier available allows us to start making decisions based on the values.
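
As a sketch of that provisioning step, something like this would drop the fact file into place (the file name, node_info.txt, is arbitrary – any *.txt file in facts.d works):

```shell
# In production this file would live in the external facts directory --
# /etc/puppetlabs/facter/facts.d for Puppet Enterprise, or
# /etc/facter/facts.d for open source Facter 1.7+. Using a temp dir
# here purely to demonstrate the file format.
facts_d=$(mktemp -d)
printf 'tier=qa\nlocation=portland\n' > "${facts_d}/node_info.txt"
cat "${facts_d}/node_info.txt"
```

Once the file is sitting in the real facts.d directory, `facter tier` on that node should return `qa`.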

Branch with $environment, Hiera with $tier

Like we said above, Puppet environments are frequently short-term assignments, while application tiers are usually long-term residencies. Relating those back to the R10k workflow: branches to the main puppet repo (containing the Puppetfile) are usually short-lived, while data in Hiera is usually longer-lived. It would then make sense that the name of the branches to the main puppet repo would resolve to being $environment (and thus the Puppet environment name), and $tier (and thus the application tier) would be used in the Hiera hierarchy for lookups of values that would remain different across application tiers (like package versions, credentials, and so on).
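
To make that concrete, a hierarchy sketch that gives $tier its own level might look like this (the level names and paths here are examples, not prescriptions – adapt them to your own layout):

```yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "tier/%{tier}"     # per-tier data: tier/qa.yaml, tier/prod.yaml, etc.
  - global             # values common to every tier
:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'
```

Note the split: $environment picks WHICH copy of the data you read (the datadir), while $tier picks WHERE in the hierarchy a value comes from.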

Wins:

  • Puppet environment names (like repository branch names) become relatively meaningless and are the “means” to the end of getting Puppet code merged into the PUPPET CODE’s production branch (i.e. code that has been tested to work across all application tiers)
  • Puppet environments become short lived and thus have less opportunity to deviate from the main production codebase
  • Differences across application tiers are locked in one place (Hiera)
  • Differences to Puppet DSL code (i.e. in Manifests) can be pushed up to the profile level, and you have a fact ($tier) to catch those differences.
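
For that last bullet, a profile can switch on the fact directly. A minimal sketch (the profile, the tomcat class, and its parameter are all hypothetical here):

```puppet
class profiles::tomcat {
  # $::tier comes from the custom/external fact described earlier
  case $::tier {
    'prod':  { $java_heap = '2g' }
    'qa':    { $java_heap = '1g' }
    default: { $java_heap = '512m' }
  }

  class { 'tomcat':
    java_heap => $java_heap,
  }
}
```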

The ultimate reason why I’m writing about this is because I’ve seen people try to incorporate both the Puppet environment and application tier into both the environment name and/or the Hiera hierarchy. Many times, they run into all kinds of unscalable issues (large hierarchies, many Puppet environments, confusing testing paths to ‘production’). I tend to prefer this workflow choice, but, like everything I write about, take it and model it toward what works for you (because what works now may not work 6 months from now).

Thoughts?

Like I said before, I tend to discover new corner cases that change my mind on things like this, so it’s quite possible that this theory isn’t the most solid in the world. It HAS helped out some customers to clean up their code and make for a cleaner pipeline, though, and that’s always a good thing. Feel free to comment below – I look forward to making the process better for all!

Building a Functional Puppet Workflow Part 3b: More R10k Madness

In the last workflows post, I talked about dynamic Puppet environments and introduced R10k, which is an awesome tool for mapping modules into Puppet environments that are dynamically generated from git branches. I didn’t get out everything I wanted to say because:

  • I was tired of that post sitting stale in a Google Doc
  • It was already goddamn long

So because of that, consider this a continuation of that previous monstrosity that talks about additional uses of R10k beyond the ordinary.

Let’s talk Hiera

But seriously, let’s not actually talk about what Hiera does since there are better docs out there for that. I’m also not going to talk about WHEN to use Hiera because I’ve already done that before. Instead, let’s talk about a workflow for submitting changes to Hiera data and testing it out before it enters into production.

Most people store their Hiera data (if they’re using a backend that reads Hiera data from disk, anyway) in a separate repo from their Puppet repo. Some DO tie the Hiera datadir folder to something like the main Puppet repo that houses their Puppetfile (if they’re using R10k), but for the most part it’s a separate repo because you may want separate permissions for accessing that data. For the purposes of this post, I’m going to refer to a repository I use for storing Hiera data that’s out on Github.

The next logical step would be to integrate that Hiera repo into R10k so R10k can track and create paths for Hiera data just like it did for Puppet.

NOTE: Fundamentally, all that R10k does is checkout modules to a specific path whose folder name comes from a git branch. PUPPET ties its environment to this folder name with some puppet.conf trickery. So, to say that R10k “creates dynamic environments” is the end-result, but not the actual job of the tool.
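
That puppet.conf trickery is just $environment interpolation in the modulepath (and friends), along the lines of:

```ini
[main]
modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules
```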

We COULD add Hiera’s repository to the /etc/r10k.yaml file to track and create folders for us, and if we did it EXACTLY like we did for Puppet we would most definitely run into this R10k bug (AND, it comes up again in this bug).

UPDATE: So, I originally wrote this post BEFORE R10k version 1.1.4 was released. Finch released version 1.1.4 which FIXES THESE BUGS…so the workflow I’m going to describe (i.e. using prefixing to solve the problem of using multiple repos in /etc/r10k.yaml that could possibly share branch names) TECHNICALLY does NOT need to be followed ‘to the T’, as it were. You can disable prefixing when it comes to that step, and modify /etc/puppetlabs/puppet/hiera.yaml so you don’t prepend ‘hiera_’ to the path of each environment’s folder, and you should be totally fine…you know, as long as you use version 1.1.4 or greater of R10k. So, be forewarned.

The issue in those bugs is that R10k collects the names of ALL the environments from ALL the sources at once, so if you have multiple source repositories and they share branch names, then you have clashes (since it only stores ONE branch name internally). The solution that Finch came up with was prefixing (or, prefixing the name of the branch with the name of the source). When you prefix, however, it creates a folder on-disk that matches the prefixed name (e.g. NameOfTheSource_NameOfTheBranch). This is actually fine since we’ll catch it and deal with it, but you should be aware of it. Future versions of R10k will most likely deal with this in a different manner, so make sure to check out the R10k docs before blindly copying my code, okay? (Update: See the previous, bolded paragraph where I describe how Finch DID JUST THAT.)
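
For reference, the /etc/r10k.yaml doing this work ends up looking roughly like the following (this mirrors the two sources I set up with the r10k module below; double-check the current format against the R10k docs before copying, since it has changed between versions):

```yaml
:cachedir: '/var/cache/r10k'
:sources:
  :puppet:
    remote: 'https://github.com/glarizza/puppet_repository.git'
    basedir: '/etc/puppetlabs/puppet/environments'
    prefix: false
  :hiera:
    remote: 'https://github.com/glarizza/hiera_environment.git'
    basedir: '/etc/puppetlabs/puppet/hiera'
    prefix: true
```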

In the previous post I set up a file called r10k_installation.pp to set up R10k. Let’s revisit that manifest and modify it for my Hiera repo:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.4',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    },
    'hiera' => {
      'remote'  => 'https://github.com/glarizza/hiera_environment.git',
      'basedir' => "${::settings::confdir}/hiera",
      'prefix'  => true,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

NOTE: For the duration of this post, I’ll be referring to Puppet Enterprise specific paths (like /etc/puppetlabs/puppet for $confdir). Please do the translation for open source Puppet, as R10k will work just fine with either the open source edition or the Enterprise edition of Puppet

You’ll note that I added a source called ‘hiera’ that tracks my Hiera repository, creates sub-folders in /etc/puppetlabs/puppet/hiera, and enables prefixing to deal with the bug I mentioned in the previous paragraph. Now, let’s run Puppet and do an R10k synchronization:

[root@master1 garysawesomeenvironment]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 1.78 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/content: content changed '{md5}c686917fcb572861429c83f1b67cfee5' to '{md5}69d38a14b5de0d9869ebd37922e7dec4'
Notice: Finished catalog run in 1.24 seconds

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_testing
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/hiera

[root@master1 puppet]# ls /etc/puppetlabs/puppet/hiera
hiera_master  hiera_production  hiera_testing

[root@master1 puppet]# ls /etc/puppetlabs/puppet/environments/
development  garysawesomeenvironment  master  production

Great, so it configured R10k to clone the Hiera repository to /etc/puppetlabs/puppet/hiera like we wanted it to, and you can see that with prefixing enabled we have folders named “hiera_${branchname}”.

In Puppet, the magical connection that maps these subfolders to Puppet environments is in puppet.conf, but for Hiera that’s the hiera.yaml file. I’ve included that file in my Hiera repo, so let’s look at the copy at /etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml:

/etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'

The magical line is in the :datadir: setting of the :yaml: section; it uses %{environment} to evaluate the environment variable set by Puppet and set the path accordingly.

As of right now R10k is configured to clone Hiera data from a known repository to /etc/puppetlabs/puppet/hiera, to create sub-folders based on branches to that repository, and to tie data provided to each Puppet environment to the respective subfolder of /etc/puppetlabs/puppet/hiera that matches the pattern of “hiera_(environment_name)”.

The problem with hiera.yaml

You’ll notice that each subfolder to /etc/puppetlabs/puppet/hiera contains its own copy of hiera.yaml. You’re probably drawing the conclusion that each Puppet environment can read from its own hiera.yaml for Hiera configuration.

And you would be wrong.

For information on this bug, check out this link. You’ll see that we provide a ‘hiera_config’ configuration option in Puppet that allows you to specify the path to hiera.yaml, but Puppet loads that config as a singleton, which means that it’s read initially when the Puppet master process starts up and it’s NOT environment-aware. The workaround is to use one hiera.yaml for all environments on a Puppet master but to dynamically change the :datadir: path according to the current environment (in the same way that dynamic Puppet environments abuse ‘$environment’ in puppet.conf). You gain the ability to have per-environment changes to Hiera data but lose the ability to do things like using different hierarchies for different environments. As of right now, if you want a different hierarchy then you’re going to need to use a different master (or do some hacky things that I don’t even want to BEGIN to approach in this article).

In summary – there will be a hiera.yaml per environment, but they will not be consulted on a per-environment basis.

Workflow for per-environment Hiera data

Looking back on the previous post, you’ll see that the workflow for updating Hiera data is identical to the workflow for updating code to your Puppet environments. Namely, to create a new environment for testing Hiera data, you will:

  • Push a branch to the Hiera repository and name it accordingly (remembering that the name you choose will be a new environment).
  • Run R10k to synchronize the data down to the Puppet master
  • Add your node to that environment and test out the changes

For existing environments, simply push changes to that environment’s branch and repeat the last two steps.

NOTE: Puppet environments and Hiera environments are linked – both tools use the same ‘environment’ concept and so environment names MUST match for the data to be shared (i.e. if you create an environment in Puppet called ‘yellow’, you will need a Hiera environment called ‘yellow’ for that data).

This tight-coupling can cause issues, and will ultimately mean that certain branches are longer-lived than others. It’s also the reason why I don’t use defaults in my hiera() lookups inside Puppet manifests – I WANT the early failure of a compilation error to alert me of something that needs fixing.
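
In other words, inside a profile I’d rather write this (the key name is hypothetical) and let a missing value blow up the compile:

```puppet
# No default: if Hiera can't find the key for this environment,
# catalog compilation fails loudly -- the early warning I want.
$ntp_servers = hiera('profiles::ntp::servers')

# With a default, a missing key silently slides through:
# $ntp_servers = hiera('profiles::ntp::servers', ['0.pool.ntp.org'])
```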

You will need to determine whether this tight coupling is worth it for your organization – that is, whether to tie your Hiera repository directly into R10k or to handle it out-of-band.

R10k and monolithic module repositories

One of the first requirements you encounter when working with R10k is that your component modules need to be stored in their own repositories. That convention is still relatively new – it wasn’t so long ago that we were recommending that modules be locked away in a giant repo. Why?

  • It’s easier to clone
  • The state of module reusability was poor

The main reason was that it was easier to put everything in one repo and clone it out on all your Puppet master servers. This becomes insidious as your module count rises and people start doing lovely things like committing large binaries into modules, pulling in old versions of modules they find out on the web, and the like. It also becomes an issue when you start needing to lock committers out of specific directories due to sensitive data, and blah blah blah blah…

There are better posts out there justifying/vilifying the choice of one or multiple repositories; this section’s meant only to show you how to incorporate a single repository containing multiple modules into your R10k workflow.

From the last post you’ll remember that the Puppetfile allows you to tie a repository, and some version reference, to a directory using R10k. Incorporating a monolithic repository starts with an entry in the Puppetfile like so:

Puppetfile
mod "my_big_module_repo",
  :git => "git://github.com/glarizza/my_big_module_repo.git",
  :ref => '1.0.0'

NOTE: That git repository doesn’t exist. I don’t HAVE a monolithic repo to demonstrate, so I’ve chosen an arbitrary URI. Also note that you can use ANY name you like after the mod syntax to name the resultant folder – it doesn’t HAVE to mirror the URI of the repository.

Adding this entry to the Puppetfile would check out that repository to wherever all the other modules are checked out, with a folder name of ‘my_big_module_repo’. That folder would then (again, depending on how you’ve laid out your repository) contain subfolders, each containing a Puppet module. This entry gets the modules onto your Puppet master, but it doesn’t make Puppet aware of their location. For that, we’re going to need to add an entry to the ‘modulepath’ configuration item in puppet.conf

Inside /etc/puppetlabs/puppet/puppet.conf you should see a configuration item called ‘modulepath’ that currently has a value of:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules

The modulepath itself works like the PATH environment variable in Linux – it’s a priority-based lookup mechanism that Puppet uses to find modules. Currently, Puppet will first look in /etc/puppetlabs/puppet/environments/$environment/modules for a module. If the module Puppet is looking for is found there, Puppet will use it and not inspect the second path. If the module is not found at the FIRST path, Puppet will inspect the second path. Failing to find the module at the second path results in a compilation error. Using this to our advantage, we can add the path to the monolithic repository checked out by the Puppetfile AFTER the path where all the individual modules are checked out. This should look something like this:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:/opt/puppet/share/puppet/modules

Note: This assumes all modules are in the root of the monolithic repo. If they’re in a subdirectory, you must adjust accordingly

That’s a huge line (and if you’re afraid of anything over 80 column-widths then I’m sorry…and you should probably buy a new monitor…and the 80s are over), but the gist is that we’re first going to look for modules checked out by R10k, THEN we’re going to look for modules in our monolithic repo, then we’re going to look in Puppet Enterprise’s vendored module directory, and finally, like I said above, we’ll fail if we can’t find our module. This will allow you to KEEP using your monolithic repository and also slowly cut modules inside that monolithic repo over to their own repositories (since when they gain their own repository, they will be located in a path that COMES before the monolithic repo, and thus will be given priority).
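That priority-based lookup can be modeled with a quick sketch – the shell function below is purely illustrative (the function name and paths are hypothetical, not anything Puppet actually runs), but it behaves the way the modulepath does: first directory containing the module wins, and exhausting the list is an error.

```shell
# Illustrative sketch of modulepath-style lookup: check each directory in
# priority order and return the first one that contains the named module.
find_module() {
  module="$1"; shift
  for dir in "$@"; do
    if [ -d "$dir/$module" ]; then
      printf '%s\n' "$dir/$module"
      return 0
    fi
  done
  echo "Error: Could not find module '$module' on the modulepath" >&2
  return 1
}
```

Called as `find_module apache /path/to/individual_modules /path/to/my_big_module_repo`, a copy of ‘apache’ in the first directory shadows any copy in the monolithic repo – which is exactly the shadowing that lets you migrate modules out one at a time.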

Using MCollective to perform R10k synchronizations

This section is going to be much less specific than the rest because the piece that does the ACTION is part of a module for R10k. As of the time of this writing, this agent is in one state, but that could EASILY change. I will defer to the module in question (and specifically its README file) should you need specifics (or if my module is dated). What I CAN tell you, however, is that the R10k module does come with a class that will setup and configure both an MCollective agent for R10k and also a helper application that should make doing R10k synchronizations on multiple Puppet masters much easier than doing them by hand. First, you’ll need to INSTALL the MCollective agent/application, and you can do that by pulling down the module and its dependencies, and classifying all Puppet masters with R10k enabled by doing the following:

include r10k::mcollective

Terribly difficult, huh? With that, both the MCollective agent and application should be available to MCollective on that node. The way to trigger a synchronization is to login to an account on a machine that has MCollective client access (in Puppet Enterprise, this would be any Puppet master that’s allowed the role, and then, specifically, the peadmin user…so doing a su - peadmin should afford you access to that user), and perform the following command:

mco r10k deploy

This is where the README differs a bit, and the reason for that is because Finch changed the syntax that R10k uses to synchronize and deploy modules to a Master. The CURRENTLY accepted command (because, knowing Finch, that shit might change) is r10k deploy environment -p, and the action to the MCollective agent that EXECUTES that command is the ‘deploy’ action. The README refers to the ‘synchronize’ action, which executes the r10k synchronize command. This command MAY STILL WORK, but it’s deprecated, and so it’s NOT recommended to be used.

Like I said before, this agent is subject to change (mainly due to R10k command deprecation and maturation), so definitely refer to the README and the code itself for more information (or file issues and pull requests on the module repo directly).

Tying R10k to CI workflows

I spent a year doing some presales work for the Puppet Labs SE team, so I can hand-wave and tapdance like a motherfucker. I’m going to need those skills for this next section, because if you thought the previous section glossed over the concepts pretty quickly and without much detail, then this section is going to feel downright vaporous (is that a word? Fuck it; I’m handwaving – it’s a word). I really debated whether to include the following sections in this post because I don’t really give you much specific information; it’s all very generic and full of “ideas” (though I do list some testing libraries below that are helpful if you’ve never heard of them). Feel free to abandon ship and skip to the FINAL section right now if you don’t want to hear about ‘ideas’.

For the record, I’m going to just pick and use the term “CI” when I’m referring to the process of automating the testing and deployment of, in this case, Puppet code. There have definitely been posts arguing about which definition is more appropriate, but, frankly, I’m just going to pick a term and go with it.

The issue at hand is that when you talk “CI” or “CD” or “Continuous (fill_in_the_blank)”, you’re talking about a workflow that’s tailored to each organization (and sometimes each DEPARTMENT of an organization). Sometimes places can agree on a specific tool to assist them with this process (be it Jenkins, Hudson, Bamboo, or whatever), but beyond that it’s anyone’s game.

Since we’re talking PUPPET code, though, you’re restricted to certain tasks that will show up in any workflow…and THAT is what I want to talk about here.

To implement some sort of CI workflow means laying down a ‘pipeline’ that takes a change of your Puppet code (a new module, a change to an existing module, some Hiera data updates, whatever) from the developer’s/operations engineer’s workstation right into production. The way we do this with R10k currently is to:

  • Make a change to an individual module
  • Commit/push those changes to the module’s remote repository
  • Create a test branch of the puppet_repository
  • Modify the Puppetfile and tie your module’s changes to this environment
  • Commit/push those changes to the puppet_repository
  • Perform an R10k synchronization
  • Test
  • Repeat steps 1-7 as necessary until shit works how you like it
  • Merge the changes in the test branch of the puppet_repository with the production branch
  • Perform an R10k synchronization
  • Watch code changes become active in your production environment

Of those steps, there’s arguably about 3 unique steps that could be automated:

  • R10k synchronizations
  • ‘Testing’ (whatever that means)
  • Merging the changes in the test branch of the puppet_repository with the production branch

NOTE: As we get progressively-more-handwavey (also probably not a word, but fuck it – let’s be thought leaders and CREATE IT), each one of these steps is going to be more and more…generic. For example – to say “test your code” is a great idea, but, seriously, defining how to do that could (and should) be multiple blog posts.
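To make those three automatable steps concrete, here’s a hedged stub of a driver script a CI job might run – every command is an echo standing in for the real git/mco/test-harness calls, and all names (functions, the branch) are hypothetical:

```shell
# Stubbed pipeline driver: each function just echoes what the real step
# would do. Swap the echoes for actual mco/test/git invocations.
deploy_environment() { echo "r10k deploy of environment '$1'"; }
run_tests()          { echo "running tests against environment '$1'"; }
merge_to_production() { echo "merging '$1' into production"; }

pipeline() {
  branch="$1"
  deploy_environment "$branch"      # mco r10k deploy (this branch only)
  if run_tests "$branch"; then      # rspec-puppet / Beaker / Serverspec
    merge_to_production "$branch"   # only on a PASS
    deploy_environment production   # final sync out to the masters
  fi
}
```

Running `pipeline feature_tomcat` walks the branch through deploy, test, merge, and the final production sync – the same order as the manual steps above.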

Laying down the pipeline

If I were building an automated workflow, the first thing I would do is setup something like Jenkins and configure it to watch the puppet_repository that contains the Puppetfile mapping all my modules and versions to Puppet environments. On changes to this repository, we want Jenkins to perform an R10k synchronization, run tests, and then, possibly, merge those changes into production (depending on the quality of your tests and how ‘webscale’ you think you are on that day).

R10k synchronizations

If you’re paying attention, we solved this problem in the previous section with the R10k MCollective agent. Jenkins should be running on a machine that has the ability to execute MCollective client commands (such as triggering mco r10k deploy when necessary). You’ll want to tailor your calls from Jenkins to only deploy environments it’s currently testing (remember in the puppet_repository that topic branches map to Puppet environments, so this is a per-branch action) as opposed to deploying ALL environments every time.

Also, if you’re building a pipeline, you might not want to do R10k synchronizations on ALL of your Puppet Masters at this point. Why not? Well, if your testing framework is good enough and has sufficient coverage that you’re COMPLETELY trusting it to determine whether code is acceptable or not, then this is just the FIRST step – making the code available to be tested. It’s not passed tests yet, so pushing it out to all of your Puppet masters is a bit wasteful. You’ll probably want to only synchronize with a single master that’s been identified for testing (and a master that has the ability to spin up fresh nodes, enforce the Puppet code on them, submit those nodes to a battery of tests, and then tear them down when everything has been completed).

If you’re like the VAST majority of Puppet users out there that DON’T have a completely automated testing framework that has such complete coverage that you trust it to determine whether code changes are acceptable or not, then you’re probably ‘testing’ changes manually. For these people, you’ll probably want to synchronize code to whichever Puppet master(s) are suitable.

The cool thing about these scenarios is that MCollective is flexible enough to handle this. MCollective has the ability to filter your nodes based on things like available MCollective agents, Facter facts, Puppet classes, and even things like the MD5 hashes of arbitrary files on the filesystem…so however you want to restrict synchronization, you can do it with MCollective.

After all of that, the answer here is “Use MCollective to do R10k syncs/deploys.”

Testing

This section needs its own subset of blog posts. There are all kinds of tools that will allow you to test all sorts of things about your Puppet code (from basic syntax checking and linting, to integration tests that check for the presence of resources in the catalog, to acceptance-level tests that check the end-state of the system to make sure Puppet left it in a state that’s acceptable). The most common tools for these types of tests are things like rspec-puppet for catalog-level checks, and Beaker and Serverspec for acceptance-level checks (plus the usual syntax checkers and linters).

Unfortunately, the point of this section is NOT to walk you through setting up one or more of those tools (I’d love to write those posts soon…), but rather to make you aware of their presence and identify where they fit in our Pipeline.

Once you’ve synchronized/deployed code changes to a specific machine (or subset of machines), the next step is to trigger tests.

Backing up the train a bit, certain kinds of ‘tests’ should be done WELL in advance of this step. For example, if code changes don’t even pass basic syntax checking and linting, they shouldn’t even MAKE it into your repository. Things like pre-commit hooks will allow you to trigger syntactical checks and linting before a commit is allowed. We’re assuming you’ve already set those up (and if you’ve NOT, then you should probably do that RIGHT NOW).

Rather, in this section, we’re talking about doing some basic integration smoke testing (i.e. running the rspec-puppet tests on all the modules to ensure that what we EXPECT in the catalog is actually IN the catalog), moving into acceptance level testing (i.e. spinning up pristine/clean nodes, actually applying the Puppet code to the nodes, and then running things like Beaker or Serverspec on the nodes to check the end-state of things like services, open ports, configuration files, and whatever to ensure that Puppet ACTUALLY left the system in a workable state), and then returning a “PASS” or “FAIL” response to Jenkins (or whatever is controlling your pipeline).

These tests can be as thorough or as loose as is acceptable to you (obviously, the goal is to automate ALL of your tests so you don’t have to manually check ANY changes, but that’s the nerd-nirvana state where we’re all browsing the web all day), but they should catch the most NOTORIOUS and OBVIOUS things FIRST. Follow the same rules you did when you got started with Puppet – catch the things that are easiest to catch and start building up your cache of “Total Time Saved.”

Jenkins needs to be able to trigger these tests from wherever it’s running, so your Jenkins box needs the ability to, say, spin up nodes in ESX, or locally with something like Vagrant, or even cloud nodes in EC2 or GCE, then TRIGGER the tests, and finally get a “PASS” or “FAIL” response back. The HARDEST part here, by far, is that you have to define what level of testing you’re going to implement, how you’re going to implement it, and devise the actual process to perform the testing. Like I said before, there are other blog posts that talk about this (and I hope to tackle this topic in the very near future), so I’ll leave it to them for the moment.

To merge or not to merge

The final step for any test code is to determine whether it should be merged into production or not. Like I said before, if your tests are sufficient and are adequate at determining whether a change is ‘good’ or not, then you can look at automating the process of merging those changes into production and killing off the test branch (or, NOT merging those changes, and leaving the branch open for more changes).

Automatically merging is scary for obvious reasons, but it’s also a good ‘test’ for your test coverage. Committing to a ‘merge upon success’ workflow takes trust, and there’s absolutely no shame in leaving this step to a human, to a change review board, or to some out-of-band process.

Use your illusion

These are the most common questions I get asked after the initial shock of R10k, and its workflow, wears off. Understand that I do these posts NOT from a “Here’s what you should absolutely be doing!” standpoint, but more from a “Here’s what’s going on out there.” vantage. Every time I’m called on-site with a customer, I evaluate:

  • The size and experience level of the team involved
  • The processes that the team must adhere to
  • The Puppet experience level of the team
  • The goals of the team

Frankly, after all those observations, sometimes I ABSOLUTELY come to the conclusion that something like R10k is entirely-too-much process for not-enough benefit. For those who are a fit, though, we go down the checklists and tailor the workflow to the environment.

What more IS there on R10k?

I do have at least a couple of more posts in me on some specific issues I’ve hit when consulting with companies using R10k, such as:

  • How best to use Hiera and R10k with Puppet ‘environments’ and internal, long-term ‘environments’
  • Better ideas on ‘what to branch and why’ with regard to component modules and the puppet_repository
  • To inherit or not to inherit with Roles
  • How to name things (note that I work for Puppet Labs, so I’m most likely very WRONG with this section)
  • Other random things I’ve noticed…

Also, I apologize if it’s been a while since I’ve replied to a couple of comments. I’m booked out 3 months in advance and things are pretty wild at the moment, but I’m REALLY thankful for everyone who cares enough to drop a note, and I hope I’m providing some good info you can actually use! Cheers!

Building a Functional Puppet Workflow Part 3: Dynamic Environments With R10k

Workflows are like kickball games: everyone knows the general idea of what’s going on, there’s an orderly progression towards an end-goal, nobody wants to be excluded, and people lose their shit when they get hit in the face by a big rubber ball. Okay, so maybe it’s not a perfect mapping but you get the idea.

The previous two posts (one and two) focused on writing modules, wrapping modules, and classification. While BOTH of these things are very important in the grand scheme of things, one of the biggest problems people get hung-up on is how do you iterate upon your modules, and, more importantly, how do you eventually get these changes pushed into production in a reasonably orderly fashion?

This post is going to be all over the place. We’re gonna cover the idea of separate environments in Puppet, touch on dynamic environments, and round it out with that mother-of-a-shell-script-turned-personal-savior, R10k. Hold on to your shit.

Puppet Environments

Puppet has the concept of ‘environments’ where you can logically separate your modules and manifest (read: site.pp) into separate folders to allow for nodes to get entirely separate bits of code based on which ‘environment’ the node belongs to.

Puppet environments are statically set in puppet.conf, but, as other blog posts have noted, you can do some crafty things in puppet.conf to give you the solution of having ‘dynamic environments’.

NOTE: The solutions in this post are going to rely on Puppet environments; however, environments aren’t without their own shortcomings (namely, this bug on Ruby plugins in Puppet). For testing and promoting Puppet classes written in the DSL, environments will help you out greatly. For complete separation of Ruby instances and any plugins to Puppet written in Ruby, however, you’ll need separate masters (which is something that I won’t be covering in this article).

One step further – ‘dynamic’ environments

Adrien Thebo – hereafter known as ‘Finch’, and known for building awesome things and talking like he’s fresh from a Redbull binge – created the now-famous blog post on creating dynamic environments in Puppet with git. That post relied upon a post-receive hook to do all the jiggery-pokery necessary to checkout the correct branches in the correct places, and thus it had a heavy reliance upon git.

Truly, the only magic in puppet.conf was the inclusion of ‘$environment’ in the modulepath configuration entry on the Puppet master (literally that string and not the evaluated form of your environment). By doing that, the Puppet master would replace the string ‘$environment’ with the environment of the node checking in and would look to that path for Puppet manifests and modules. If you use something OTHER than git, it would be up to you to create a post-receive hook that populated those paths, but you could still replicate the results (albeit with a little work on your part).
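The trick from that post boils down to a puppet.conf entry like this hedged reconstruction (paths reflect the pre-Puppet-Enterprise layout and are illustrative; the manifest line is my assumption of how site.pp was handled alongside it):

```ini
# Hypothetical [master] stanza for dynamic environments. '$environment'
# is the LITERAL string here; Puppet substitutes the checking-in node's
# environment at lookup time.
[master]
  modulepath = /etc/puppet/environments/$environment/modules
  manifest   = /etc/puppet/environments/$environment/manifests/site.pp
```

With that in place, the git hook’s only job is making sure a checkout actually exists at each branch’s corresponding path.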

People used this pattern and it worked fairly well. Hell, it STILL works fairly well, nothing has changed to STOP you from using it. What changed, however, was the ecosystem around modules, the need for individual module testing, and the further need to automate this whole goddamn process.

Before we deliver the ‘NEW SOLUTION’, let’s provide a bit of history and context.

Module repositories: the one-to-many problem

I touched on this topic in the first post, but one of the first problems you encounter when putting your modules in version control is whether or not to have ONE GIANT REPO with all of your modules, or a repository for every module you create. In the past we recommended putting every module in one repository (namely because it was easier, the module sharing landscape was pretty barren, and teams were smaller). Now, we recommend the opposite for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs within a single repository; having multiple repos allows per-module security settings.
  • With the single-repository solution, modules you pull down from the Forge (or Github) must be committed to your repo. Having a repository for each module allows you to keep everything separate.
  • Publishing this module to the Forge (or Github/Stash/whatever) is easier with separate repos (rather than having to split-out the module later).

The problem with having a repository for every Puppet module you create is that you need a way to map every module with every Puppet master (and, also which version of every module should be installed in which Puppet environment).

A project called librarian-puppet sprang up that created the ‘Puppetfile’, a file that would map modules and their versions to a specific directory. Librarian was awesome, but, as Finch noted in his post, it had some shortcomings when used in an environment with many and fast-changing modules. His solution, which he documented here, was the tool we now come to know as R10k.

Enter R10k

R10k is essentially a Ruby project that wraps a bunch of shell commands you would NORMALLY use to maintain an environment of ever-changing Puppet modules. Its power is in its ability to use Git branches combined with a Puppetfile to keep your Puppet environments in-sync. Because of this, R10k is CURRENTLY restricted to git. There have been rumblings of porting it to Hg or svn, but I know of no serious attempts at doing this (and if you ARE doing this, may god have mercy on your soul). Great, so how does it work?

Well, you’ll need one main repository SIMPLY for tracking the Puppetfile. I’ve got one right here, and it only has my Puppetfile and a site.pp file for classification (should you use it).

NOTE: The Puppetfile and librarian-puppet-like capabilities under the hood are going to be doing most of the work here – this repository is solely so you can create topic branches with changes to your Puppetfile that will eventually become dynamically-created Puppet environments.

Let’s take a look at the Puppetfile and see what’s going on:

Puppetfile
forge "http://forge.puppetlabs.com"

# Modules from the Puppet Forge
mod "puppetlabs/stdlib"
mod "puppetlabs/apache", "0.11.0"
mod "puppetlabs/pe_gem"
mod "puppetlabs/mysql"
mod "puppetlabs/firewall"
mod "puppetlabs/vcsrepo"
mod "puppetlabs/git"
mod "puppetlabs/inifile"
mod "zack/r10k"
mod "gentoo/portage"
mod "thias/vsftpd"


# Modules from Github using various references
mod "wordpress",
  :git => "git://github.com/hunner/puppet-wordpress.git",
  :ref => '0.4.0'

mod "property_list_key",
  :git => "git://github.com/glarizza/puppet-property_list_key.git",
  :ref => '952a65d9ea2c5809f4e18f30537925ee45548abc'

mod 'redis',
  :git => 'git://github.com/glarizza/puppet-redis',
  :ref => 'feature/debian_support'

This example lists the syntax for dealing with modules from both the Forge and Github, as well as pulling specific versions of modules (whether versions in the case of the Forge, or Github references as tags, branches, or even specific commits). The syntax is not hard to follow – just remember that we’re mapping modules and their versions to a set/known environment.

For every topic branch on this repository (containing the Puppetfile), R10k will in turn create a Puppet environment with the same name. For this reason, it’s convention to rename the ‘master’ branch to ‘production’ since that’s the default environment in Puppet (note that renaming branches locally is easy – renaming the branch on Github can sometimes be a pain in the ass). You will also note why it’s going to be somewhat hard to map R10k to subversion, for example, due to the lack of lightweight branching schemes.
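Renaming the local branch really is a one-liner; here’s a throwaway sketch that builds a scratch repo just to demonstrate it (the repo contents are illustrative – you’d run the rename inside your actual puppet_repository and then push the new branch):

```shell
# Scratch demo of renaming 'master' to 'production'. Only the
# 'git branch -m' line matters in a real repo.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # start on 'master' regardless of git defaults
git config user.email you@example.com
git config user.name  "You"
echo 'forge "http://forge.puppetlabs.com"' > Puppetfile
git add Puppetfile && git commit -qm 'initial Puppetfile'
git branch -m master production           # the actual rename
git rev-parse --abbrev-ref HEAD           # now reports 'production'
```

After that, a `git push origin production` publishes the branch; cleaning up the old default branch on Github is where the aforementioned pain in the ass lives.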

To explain any more of R10k reads just as if I were describing its installation, so let’s quit screwing around and actually INSTALL/SETUP the damn thing.

Setting up R10k

As I mentioned before, we have the main repository that will be used to track the Puppetfile, which in turn will track the modules to be installed (whether from The Forge, Github, or some internal git repo). Like any good Puppet component, R10k itself can be setup with a Puppet module. The module I’ll be using was developed by Zack Smith, and is pretty simple to get started. Let’s download it from the forge first:

[root@master1 vagrant]# puppet module install zack/r10k
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v1.0.2)
  ├─┬ gentoo-portage (v2.1.0)
  │ └── puppetlabs-concat (v1.0.1)
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.1.0)
  ├── puppetlabs-git (v0.0.3)
  ├── puppetlabs-inifile (v1.0.1)
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.1.0)
  └── puppetlabs-vcsrepo (v0.2.0)

The module will be installed into the first path in your modulepath, which in the case above is /etc/puppetlabs/puppet/modules. This modulepath will change due to the way we’re going to setup our dynamic Puppet environments. For this example, I’m going to have environments dynamically generated at /etc/puppetlabs/puppet/environments, so let’s create that directory first:

[root@master1 vagrant]# mkdir -p /etc/puppetlabs/puppet/environments

Now, we need to setup R10k on this machine. The module we downloaded will allow us to do that, but we’ll need to create a small Puppet manifest that will allow us to setup R10k out-of-band from a regular Puppet run (you CAN continuously-enforce R10k configuration in-band with your regular Puppet run, but if we’re setting up a Puppet master to use R10k to serve out dynamic environments it’s possible to create a chicken-and-egg situation). Let’s generate a file called r10k_installation.pp in /var/tmp and have it look like the following:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.3',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

So what is every section of that declaration doing?

  • version => '1.1.3' sets the version of the R10k gem to install
  • sources => {...} is a hash of sources that R10k is going to track. For now it’s only our main Puppet repo, but you can also track a Hiera installation too. This hash accepts key/value pairs for configuration settings that are going to be written to /etc/r10k.yaml, which is R10k’s main configuration file. The keys in-use are remote, which is the path to the repository to-be-checked-out by R10k, basedir, which is the path on-disk to where dynamic environments are to be created (we’re using the $::settings::confdir variable which maps to the Puppet master’s configuration directory, or /etc/puppetlabs/puppet), and prefix which is a boolean to determine whether to use R10k’s source-prefixing feature. NOTE: the false value is a BOOLEAN value, and thus SHOULD NOT BE QUOTED. Quoting it turns it into a string, which matches as a boolean TRUE value. Don’t quote false – that’s bad, mmkay.
  • purgedirs => ["${::settings::confdir}/environments"] configures R10k to implement purging on the environments directory (any folders in it that R10k didn’t create will be deleted). This configuration MAY be moot with newer versions of R10k, as I believe it implements this behavior by default.
  • manage_modulepath => true will ensure that this module sets the modulepath configuration item in /etc/puppetlabs/puppet/puppet.conf
  • modulepath => ... sets the modulepath value to be dropped into /etc/puppetlabs/puppet/puppet.conf. Note that we are interpolating variables ($::settings::confdir again), AND inserting the LITERAL string of $environment into the modulepath – this is because Puppet will replace $environment with the value of the agent’s environment at catalog compilation.

JUST IN CASE YOU MISSED IT: Don’t quote the false value for the prefix setting in the sources block. That is all.
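For reference, the sources hash above should land on disk as an /etc/r10k.yaml roughly like this sketch (I’m assuming the symbol-keyed format of the r10k 1.x era here – verify the keys and layout against the version you actually installed):

```yaml
# Hypothetical /etc/r10k.yaml written by the r10k class declaration
# above (1.x-era format assumed; remote/basedir match the example).
:sources:
  :puppet:
    remote: 'https://github.com/glarizza/puppet_repository.git'
    basedir: '/etc/puppetlabs/puppet/environments'
:purgedirs:
  - '/etc/puppetlabs/puppet/environments'
```

If something about your deploys looks off later, this file is the first place to check that the remote and basedir ended up where you expected.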

Okay, we have our one-time Puppet manifest, and now the only thing left to do is to run it:

[root@master1 tmp]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 2.05 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}0b619d5148ea493e2d6a5bb205727f0c'
Notice: /Stage[main]/R10k::Config/Ini_setting[R10k Modulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '/etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules'
Notice: /Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: Finished catalog run in 10.55 seconds

At this point, it goes without saying that git needs to be installed, but if you’re firing up a new VM that DOESN’T have git, then R10k is going to spit out an awesome error – so ensure that git is installed. After that, let’s synchronize R10k with the r10k deploy environment -pv command (-p for Puppetfile synchronization and -v for verbose mode):

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

I ran this first synchronization in verbose mode so you can see exactly what’s getting copied where. Further synchronizations don’t have to be verbose, but it’s good for debugging. After all of that, we have an /etc/puppetlabs/puppet/environments folder containing our dynamic Puppet environments based on the branches of the main Puppet repo:

[root@master1 puppet]# ls -lah /etc/puppetlabs/puppet/environments/
total 20K
drwxr-xr-x 5 root root 4.0K Feb 19 11:44 .
drwxr-xr-x 7 root root 4.0K Feb 19 11:25 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 puppet]# cd /etc/puppetlabs/puppet/environments/production/
[root@master1 production]# git branch -a
  master
* production
  remotes/origin/HEAD -> origin/master
  remotes/origin/development
  remotes/origin/master
  remotes/origin/production

As you can see (at the time of this writing), my main Puppet repo has three branches: development, master, and production, so R10k created three Puppet environments matching those names. It’s somewhat of a convention to rename the master branch to production, but in this case I left it alone to demonstrate how this works.

ONE OTHER BIG GOTCHA: R10k does NOT resolve dependencies, and so it is UP TO YOU to track them in your Puppetfile. Check this out:

[root@master1 production]# puppet module list
Warning: Module 'puppetlabs-firewall' (v1.0.0) fails to meet some dependencies:
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-firewall' (v0.3.x)
Warning: Module 'puppetlabs-stdlib' (v4.1.0) fails to meet some dependencies:
  'puppetlabs-pe_accounts' (v2.0.1) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-pe_mcollective' (v0.1.14) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-request_manager' (v0.0.10) requires 'puppetlabs-stdlib' (v3.2.x)
Warning: Missing dependency 'cprice404-inifile':
  'puppetlabs-pe_puppetdb' (v0.0.11) requires 'cprice404-inifile' (>=0.9.0)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'cprice404-inifile' (v0.10.x)
  'puppetlabs-puppetdb' (v1.5.1) requires 'cprice404-inifile' (>= 0.10.3)
Warning: Missing dependency 'puppetlabs-concat':
  'puppetlabs-apache' (v0.11.0) requires 'puppetlabs-concat' (>= 1.0.0)
  'gentoo-portage' (v2.1.0) requires 'puppetlabs-concat' (v1.0.x)
Warning: Missing dependency 'puppetlabs-gcc':
  'zack-r10k' (v1.0.2) requires 'puppetlabs-gcc' (>= 0.0.3)
/etc/puppetlabs/puppet/environments/production/modules
├── gentoo-portage (v2.1.0)
├── mhuffnagle-make (v0.0.2)
├── property_list_key (???)
├── puppetlabs-apache (v0.11.0)
├── puppetlabs-firewall (v1.0.0)  invalid
├── puppetlabs-git (v0.0.3)
├── puppetlabs-inifile (v1.0.1)
├── puppetlabs-mysql (v2.2.1)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.1.0)
├── puppetlabs-stdlib (v4.1.0)  invalid
├── puppetlabs-vcsrepo (v0.2.0)
├── redis (???)
├── ripienaar-concat (v0.2.0)
├── thias-vsftpd (v0.2.0)
├── wordpress (???)
└── zack-r10k (v1.0.2)
/opt/puppet/share/puppet/modules
├── cprice404-inifile (v0.10.3)
├── puppetlabs-apt (v1.1.0)
├── puppetlabs-auth_conf (v0.1.7)
├── puppetlabs-firewall (v0.3.0)  invalid
├── puppetlabs-java_ks (v1.1.0)
├── puppetlabs-pe_accounts (v2.0.1)
├── puppetlabs-pe_common (v0.1.0)
├── puppetlabs-pe_mcollective (v0.1.14)
├── puppetlabs-pe_postgresql (v0.0.5)
├── puppetlabs-pe_puppetdb (v0.0.11)
├── puppetlabs-postgresql (v2.5.0)
├── puppetlabs-puppet_enterprise (v3.1.0)
├── puppetlabs-puppetdb (v1.5.1)
├── puppetlabs-reboot (v0.1.2)
├── puppetlabs-request_manager (v0.0.10)
├── puppetlabs-stdlib (v3.2.0)  invalid
└── ripienaar-concat (v0.2.0)

I’ve installed Puppet Enterprise 3.1.0, and so /opt/puppet/share/puppet/modules reflects the state of the Puppet Enterprise (also known as ‘PE’) modules at that time. You can see that there are some conflicts because certain modules require specific versions of other modules. This is currently the nature of the beast with regard to Puppet modules. Some of these warnings are loud but incidental (e.g. someone set a dependency on a version and forgot to update it), some are due to namespace changes (e.g. cprice404-inifile being ported over to puppetlabs-inifile), and so on. Basically, ensure that you handle the dependencies you care about inside the Puppetfile, as R10k won’t do it for you.
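Since R10k won’t chase dependencies, one approach is to list every dependency you care about explicitly. A minimal Puppetfile sketch, using modules flagged in the warnings above (the version pins here are illustrative, not recommendations):

```ruby
# Puppetfile fragment: pin each dependency yourself, because r10k
# installs exactly what is listed here and nothing more.
forge 'forge.puppetlabs.com'

mod 'puppetlabs/apache', '0.11.0'
mod 'puppetlabs/concat', '1.0.0'   # apache and gentoo/portage want concat 1.0.x
mod 'puppetlabs/gcc',    '0.0.3'   # zack/r10k requires gcc >= 0.0.3
mod 'puppetlabs/stdlib', '4.1.0'
```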

There – we’ve done it! We’ve configured R10k! Now how the hell do you use it?

R10k demonstration – from module iteration to environment iteration

Let’s take the environment we set up in the previous steps, and I’ll walk you through adding a new module to your production environment: iterating upon that module, pushing the changes to that module, pushing those changes to a Puppet environment, and then promoting the changes to production.

NOTES ON THE SETUP OF THIS DEMO:

  • In this demonstration, classification method is going to be left to the user (i.e. it’s not a part of the magic). So, when I tell you to classify your node with a specific class, I don’t care if you use the Puppet Enterprise Console, site.pp, or any other manner.
  • I’m using Github for my repositories so that you folks watching and playing along at home can have something to follow. Feel free to substitute something like Atlassian Stash/Bitbucket, internal repos, or whatever for Github.

Add the module to an environment

The module we’ll be working with, a simple module called ‘notifyme’, emits a notify message that will help us track the module’s progress through all phases of iteration.
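In case you’re wondering what that module looks like inside, a minimal sketch of its init.pp might be (the real contents live in the module’s repo; this is illustrative):

```puppet
# manifests/init.pp -- illustrative sketch of the 'notifyme' module
class notifyme {
  notify { 'This is the notifyme module and its master branch': }
}
```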

The first thing we need to do is to add the module to an environment, so let’s dynamically create a NEW environment by creating a new topic branch and pushing it up to the main puppet repo. I will perform this step on my laptop and outside of the VM I’m using to test R10k:

└(~/src/puppet_repository)▷ git branch
  master
* production

└(~/src/puppet_repository)▷ git checkout -b notifyme
Switched to a new branch 'notifyme'

└(~/src/puppet_repository)▷ vim Puppetfile

# Perform the changes to Puppetfile here

└(~/src/puppet_repository)▷ git add Puppetfile
└(~/src/puppet_repository)▷ git commit
[notifyme 5239538] Add the 'notifyme' module
 1 file changed, 3 insertions(+)

└(~/src/puppet_repository)▷ git push origin notifyme:notifyme
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      notifyme -> notifyme

The contents I added to my Puppetfile look like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git"

Perform an R10k synchronization

To pull the new dynamic environment down to the Puppet master, do another R10k synchronization with r10k deploy environment -pv:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
<snip for brevity>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment notifyme
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/notifyme/modules
<more snipping>

I only included the relevant messages, but you can see that R10k pulled in a new environment called ‘notifyme’ that ALSO contains a module called ‘notifyme’.

Rename the branch to avoid confusion

Suddenly I realize that it may get confusing having both an environment called ‘notifyme’ and a module/class called ‘notifyme’. No worries, how about we rename that branch?

└(~/src/puppet_repository)▷ git branch -m notifyme garysawesomeenvironment

└(~/src/puppet_repository)▷ git push origin :notifyme
To https://github.com/glarizza/puppet_repository.git
 - [deleted]         notifyme

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      garysawesomeenvironment -> garysawesomeenvironment

That bit of git renamed the ‘notifyme’ branch to ‘garysawesomeenvironment’. The next git command is a bit tricky – when you git push to a remote, the general form is:

git push name_of_origin local_branch_name:remote_branch_name

In our case, the name of our origin is LITERALLY ‘origin’, but we actually want to DELETE a remote branch. The way to delete a local branch is with git branch -d branch_name, but the way to delete a REMOTE branch is to push NOTHING to it. So consider the following command:

git push origin :notifyme

We’re pushing to the origin named ‘origin’, but providing NO local branch name and pushing that bit of nothing to the remote branch of ‘notifyme’. This kills (deletes) the remote branch.
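If you want to convince yourself of this without touching a real remote, here’s a self-contained sketch that uses a throwaway local bare repository as the ‘origin’ (all paths and names here are made up for the demo):

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"          # stand-in remote
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git config user.email demo@example.com && git config user.name Demo
git checkout -qb notifyme                     # topic branch with one commit
echo 'mod "notifyme"' > Puppetfile
git add Puppetfile && git commit -qm "Add the 'notifyme' module"
git push -q origin notifyme:notifyme          # remote branch now exists
git push -q origin :notifyme                  # push "nothing" to it...
git ls-remote --heads "$tmp/origin.git"       # ...and the remote branch is gone
```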

Finally, we push to our origin named ‘origin’ again and push the contents of the local branch ‘garysawesomeenvironment’ to the remote branch of ‘garysawesomeenvironment’ which in turn CREATES that branch if it doesn’t exist. Whew. Let’s run another damn synchronization:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<more snippage>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
<more of that snipping shit>
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Cool, let’s check out our environments folder on our VM:

[root@master1 production]# ls -lah /etc/puppetlabs/puppet/environments/
total 24K
drwxr-xr-x 6 root root 4.0K Feb 19 13:34 .
drwxr-xr-x 7 root root 4.0K Feb 19 12:09 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 13:33 garysawesomeenvironment
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 production]# cd /etc/puppetlabs/puppet/environments/garysawesomeenvironment/

[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

Run Puppet to test the new environment

Perfect! Now classify your node to include the ‘notifyme’ class, and let’s run Puppet to see what we get when we try to join the environment called ‘garysawesomeenvironment’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snipping facts loading for brevity>
Info: Caching catalog for master1
Info: Applying configuration version '1392845863'
Notice: This is the notifyme module and its master branch
Notice: /Stage[main]/Notifyme/Notify[This is the notifyme module and its master branch]/message: defined 'message' as 'This is the notifyme module and its master branch'
Notice: Finished catalog run in 11.10 seconds

Cool! Now let’s try to run Puppet with another environment, say ‘production’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping facts loading for brevity>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class notifyme for master1 on node master1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

We get an error because that module hasn’t been loaded by R10k for that environment.

Tie a module version to an environment

Okay, so we added a module to a new environment, but what if we want to test out a specific commit, branch, or tag of a module and test it in this new environment? This is frequently what you’ll be doing – making a change to an existing module, pushing your change to a topic branch of that module’s repository, tying it to an environment (or creating a new environment by branching the main Puppet repository), and then testing the change.

Let’s go back to my ‘notifyme’ module that I’ve cloned to my laptop and push a change to a BRANCH of that module’s Github repository:

└(~/src/puppet-notifyme)▷ git branch
* master

└(~/src/puppet-notifyme)▷ git checkout -b change_the_message
Switched to a new branch 'change_the_message'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the notify message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[change_the_message bc3975b] Change the Message
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin change_the_message:change_the_message
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 448 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      change_the_message -> change_the_message

└(~/src/puppet-notifyme)▷ git branch -a
* change_the_message
  master
  remotes/origin/change_the_message
  remotes/origin/master

└(~/src/puppet-notifyme)▷ git log
commit bc3975bb5c75ada86bfc2c45db628b5a156f85ce
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 13:55:26 2014 -0800

    Change the Message

    This commit changes the message to test my workflow.

What I’m showing you is the workflow that creates a new local branch called ‘change_the_message’ for the notifyme module, changes the message in my notify resource, commits the change, and pushes the change to a remote branch ALSO called ‘change_the_message’.

Because I created a topic branch, I can provide that branch name in the Puppetfile located in the ‘garysawesomeenvironment’ branch of the main Puppet repo. THAT is the piece that ties together the specific version of the module with the Puppet environment we want on the Puppet master. Here’s that change:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'change_the_message'
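One thing worth knowing: :ref isn’t limited to branch names – it can point at anything git can resolve. A hedged sketch of the options (the branch and SHA come from this demo; the tag is hypothetical):

```ruby
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'change_the_message'    # a branch...

# :ref => '1.0.0'                 # ...or a tag (hypothetical here)
# :ref => 'bc3975b'               # ...or a specific commit SHA
```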

Again, that change gets put into the ‘garysawesomeenvironment’ branch of the main Puppet repo and pushed up to the remote:

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment 89b139c] Update garysawesomeenvironment
 1 file changed, 2 insertions(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 411 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5239538..89b139c  garysawesomeenvironment -> garysawesomeenvironment

└(~/src/puppet_repository)▷ git log -p
commit 89b139c8c2faa888a402b98ea76e4ca138b3463d
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:04:18 2014 -0800

    Update garysawesomeenvironment

    Tie this environment to the 'change_the_message' branch of my notifyme module.

diff --git a/Puppetfile b/Puppetfile
index 5e5d091..27fc06e 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -31,4 +31,5 @@ mod 'redis',
   :ref => 'feature/debian_support'

 mod "notifyme",
-  :git => "git://github.com/glarizza/puppet-notifyme.git"
+  :git => "git://github.com/glarizza/puppet-notifyme.git",
+  :ref => 'change_the_message'

Now let’s synchronize again!!

[root@master1 garysawesomeenvironment]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<snip>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
<snip>

Cool, let’s check our work on the VM:

[root@master1 garysawesomeenvironment]# pwd
/etc/puppetlabs/puppet/environments/garysawesomeenvironment
[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

And finally, let’s run Puppet:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392847743'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.10 seconds

TADA! We’ve successfully tied a specific version of a module to a specific dynamic environment, deployed it to a master, and tested it out! Smell that? That’s the smell of awesome. Or Jeff in the next cubicle eating a burrito. Either way, I like it.

Merge your changes with master/production

It’s green – fuck it; ship it! NOW you’re speaking ‘agile’! Assuming everything went according to plan, let’s merge our changes in with the production environment and synchronize. How you do that is up to your company’s workflow (whether you use pull requests, a merge master, or poke Patrick and tell him to tell Andy to merge in your change). I’m using git and Github, so let’s merge.

First, do the Module:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge change_the_message
Updating d44a790..bc3975b
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   d44a790..bc3975b  master -> master

└(~/src/puppet-notifyme)▷ cat manifests/init.pp
class notifyme {
  notify { "This is the changed message in the change_the_message branch": }
}

So now we have an issue, and that issue is that the production environment has YET to have the ‘notifyme’ module added to it. If we merge the contents of the ‘garysawesomeenvironment’ branch with the ‘production’ branch of the main Puppet repo, then we’re going to be pointing at the ‘change_the_message’ branch of the ‘notifyme’ module (because that was our last commit).

Because of this, I can’t do a straight merge, can I? For posterity’s sake (in the event that someone in the future wants to look for that branch on my Github repo), I’m going to keep that branch alive. In a production environment, I most likely would NOT have additional branches open for all my component modules as that would get pretty annoying/confusing. Understand that this is a one-off case because I’m doing a demo. BECAUSE of this, I’m going to modify the Puppetfile in the ‘production’ branch of the main Puppet repo:

└(~/src/puppet_repository)▷ git checkout production
Switched to branch 'production'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes here

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[production a74f269] Add notifyme module to Production environment
 1 file changed, 4 insertions(+)

└(~/src/puppet_repository)▷ git push origin production:production
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 362 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5ecefc8..a74f269  production -> production

└(~/src/puppet_repository)▷ git log -p
commit a74f26975102f3786eedddace89bda086162d801
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:24:05 2014 -0800

    Add notifyme module to Production environment

diff --git a/Puppetfile b/Puppetfile
index 0b1da68..9168a81 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -29,3 +29,7 @@ mod "property_list_key",
 mod 'redis',
   :git => 'git://github.com/glarizza/puppet-redis',
   :ref => 'feature/debian_support'
+
+mod 'notifyme',
+  :git => 'git://github.com/glarizza/puppet-notifyme'
+

Alright, we’ve updated the production environment, now synchronize again (I’ll spare you and do it WITHOUT verbose mode):

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

Okay, now run Puppet with the PRODUCTION environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.66 seconds

Beautiful, we’re synchronized!!!

Making a change to an EXISTING module in an environment

Okay, so we saw previously how to add a NEW module to an environment, but what if we already HAVE a module in an environment and we want to make an update/change to it? Well, it’s largely the same process:

  • Cut a branch of the module’s repo
  • Commit your code and push it up to the module’s repo
  • Cut a branch of the main Puppet repo
  • Push that branch up to the main Puppet repo
  • Perform an R10k synchronization to sync the environments
  • Test your changes
  • Merge the changes with the master branch of the module
  • DONE!
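Those bullet points can be sketched end to end. The following is a self-contained simulation that stands up local bare repos in place of Github (every path and name here is illustrative – swap in your real module and main Puppet repos):

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/notifyme.git"           # stand-in for the module repo
git init -q --bare "$tmp/puppet_repository.git"  # stand-in for the main Puppet repo

# Steps 1 & 2: cut a branch on the module, commit, and push it up.
git clone -q "$tmp/notifyme.git" "$tmp/mod"
cd "$tmp/mod"
git config user.email demo@example.com && git config user.name Demo
mkdir manifests
echo 'class notifyme { notify { "old message": } }' > manifests/init.pp
git add . && git commit -qm 'Initial commit' && git push -q origin HEAD:master
git checkout -qb another_change
echo 'class notifyme { notify { "new message": } }' > manifests/init.pp
git commit -qam 'Change the message' && git push -q origin another_change:another_change

# Steps 3 & 4: cut a branch on the main Puppet repo whose Puppetfile pins that ref.
git clone -q "$tmp/puppet_repository.git" "$tmp/ctl"
cd "$tmp/ctl"
git config user.email demo@example.com && git config user.name Demo
printf 'mod "notifyme",\n  :git => "%s",\n  :ref => "another_change"\n' "$tmp/notifyme.git" > Puppetfile
git add Puppetfile && git commit -qm "Tie environment to 'another_change'"
git push -q origin HEAD:garysawesomeenvironment

# Steps 5-8 happen on the master: r10k deploy environment -p, test, then merge
# 'another_change' into the module's master branch.
git ls-remote --heads "$tmp/puppet_repository.git"
```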

Let’s go back and change that notify message again, shall we?

└(~/src/puppet-notifyme)▷ git checkout -b 'another_change'
Switched to a new branch 'another_change'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[another_change 608166e] Change the message that already exists!
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin another_change:another_change
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 426 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      another_change -> another_change

Okay, let’s re-use ‘garysawesomeenvironment’ because I like the name, but tie it to the new ‘another_change’ branch of the ‘notifyme’ module:

└(~/src/puppet_repository)▷ git checkout garysawesomeenvironment
Switched to branch 'garysawesomeenvironment'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make change to Puppetfile to tie it to 'another_change' branch

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment ce84a30] Tie garysawesomeenvironment to 'another_change'
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 386 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   89b139c..ce84a30  garysawesomeenvironment -> garysawesomeenvironment

The Puppetfile for that branch now has an entry for the ‘notifyme’ module that looks like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'another_change'

Okay, synchronize again!

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now run Puppet in the ‘garysawesomeenvironment’ environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392849521'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 12.54 seconds

There’s the message that I changed in the ‘another_change’ branch of my ‘notifyme’ module! What’s it look like if I run in the ‘production’ environment, though?

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 14.11 seconds

There’s the old message that’s in the ‘master’ branch of the ‘notifyme’ module (which is where the ‘production’ branch Puppetfile is pointing). To merge the changes into the production environment, we now only have to do one thing: merge the ‘another_change’ branch of the ‘notifyme’ module into its ‘master’ branch – that’s it! Why? Because the Puppetfile in the production branch of the main Puppet repo (and thus the production Puppet ENVIRONMENT) is already POINTING at the master branch of the ‘notifyme’ module. Let’s do the merge:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge another_change
Updating bc3975b..608166e
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   bc3975b..608166e  master -> master

Another R10k synchronization is needed on the master:

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now let’s run Puppet in the production environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392850004'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 11.82 seconds

There’s the message that was previously in the ‘another_change’ branch that’s been merged to the ‘master’ branch (and thus is entered into the production Puppet environment).

OR, use tags

One more note – for production environments that want a BIT more stability (rather than hoping that everyone follows the policy of pushing commits to a BRANCH of a module instead of pushing directly to master – by accident or otherwise – and allowing that commit to make it DIRECTLY into production), the better way is to tie all modules to some sort of release version. For modules released to the Puppet Forge, that’s a version number; for modules stored in git repositories, that’s a tag. Tying all modules in your production environment (and thus your production Puppetfile) to specific tags in git repositories IS a “best practice” that ensures the code executed in production has some sort of safeguard.
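
As an illustration, pinning the ‘notifyme’ module to a tag in the Puppetfile would look something like this (the tag name here is hypothetical):

```ruby
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => '1.0.0'   # an immutable release tag instead of a moving branch
```

Since a tag doesn’t move, nothing lands in production until someone deliberately cuts a new tag and updates the Puppetfile.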

TL;DR: The example above tied the production environment to the ‘master’ branch purely as a demo; it’s not necessarily recommended for your production needs.

Holy crap, that’s a lot to take in…

Yeah, tell me about it. And, believe it or not, I’m STILL not done with everything that I want to talk about regarding R10k – there’s still more info on:

  • Using R10k with a monolithic modules repo
  • Incorporating Hiera data
  • Triggering R10k with MCollective
  • Tying R10k to CI workflow

Those will come in a later post once I have time to decide how to tackle them. Until then, this should give you more than enough information to get started with R10k in your own environment.

If you have any questions/comments/corrections, PLEASE enter them in the comments below and I’ll be happy to respond when I’m not flying from gig to gig! :) Cheers!

EDIT: 2/19/2014 – correct librarian-puppet assumption thanks to Reid Vandewiele

Building a Functional Puppet Workflow Part 2: Roles and Profiles

In my first post, I talked about writing functional component modules. Well, I didn’t really do much detailing other than pointing out key bits of information that tend to cause problems. In this post, I’ll describe the next layer to the functional Puppet module workflow.

People usually stop once they have a library of component modules (whether hand-written, taken from Github, or pulled from The Forge). The idea is that you can classify all of your nodes in site.pp, the Puppet Enterprise Console, The Foreman, or with some other ENC, so why not just declare all your classes for every node when you need them?

Because that’s a lot of extra work and opportunities for fuckups.

People recognized this, so in the EARLY days of Puppet they would create node blocks in site.pp and use inheritance to inherit from those blocks. This was the right IDEA, but probably not the best PLACE for it. Eventually, ‘Profiles’ were born.

The idea of ‘Roles and Profiles’ originally came from a piece that Craig Dunn wrote while he worked for the BBC, and then Adrien Thebo also wrote a piece that documents the same sort of pattern. So why am I writing about it a THIRD time? Well, because I feel it’s only a PIECE of an overall puzzle. The introduction of Hiera and other awesome tools (like R10k, which we will get to on the next post) still make Roles and Profiles VIABLE, but they also extend upon them.

One final note before we move on – the terms ‘Roles’ and ‘Profiles’ are ENTIRELY ARBITRARY. They’re not magic reserve words in Puppet, and you can call them whatever the hell you want. It’s also been pointed out that Craig MIGHT have misnamed them (a ROLE should be a model for an individual piece of tech, and a PROFILE should probably be a group of roles), but, like all good Puppet Labs employees – we suck at naming things.

Profiles: technology-specific wrapper classes

A profile is simply a wrapper class that groups Hiera lookups and class declarations into one functional unit. For example, if you wanted Wordpress installed on a machine, you’d probably need to declare the apache class to get Apache setup, declare an apache::vhost for the Wordpress directory, setup a MySQL database with the appropriate classes, and so on. There are a lot of components that go together when you setup a piece of technology, it’s not just a single class.

Because of this, a profile exists to give you a single class you can include that will setup all the necessary bits for that piece of technology (be it Wordpress, or Tomcat, or whatever).

Let’s look at a simple profile for Wordpress:

profiles/manifests/wordpress.pp
class profiles::wordpress {

  ## Hiera lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')
  $wordpress_user          = hiera('profiles::wordpress::wordpress_user')
  $wordpress_group         = hiera('profiles::wordpress::wordpress_group')
  $wordpress_docroot       = hiera('profiles::wordpress::wordpress_docroot')
  $wordpress_port          = hiera('profiles::wordpress::wordpress_port')

  ## Create user
  group { 'wordpress':
    ensure => present,
    name   => $wordpress_group,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => $wordpress_group,
    password => $wordpress_user_password,
    name     => $wordpress_user,
    home     => $wordpress_docroot,
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => $wordpress_port,
    docroot => $wordpress_docroot,
  }

  ## Configure wordpress
  class { '::wordpress':
    install_dir => $wordpress_docroot,
    db_name     => $wordpress_db_name,
    db_host     => $wordpress_db_host,
    db_password => $wordpress_db_password,
  }
}
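
Each of those lookups maps to a key in your Hiera data. As a sketch, a matching Hiera YAML data file might contain the following (all values here are hypothetical):

```yaml
---
profiles::wordpress::site_name: 'blog.example.com'
profiles::wordpress::wordpress_user_password: 'CHANGEME'
profiles::wordpress::mysql_root_password: 'CHANGEME'
profiles::wordpress::wordpress_db_host: 'localhost'
profiles::wordpress::wordpress_db_name: 'wordpress'
profiles::wordpress::wordpress_db_password: 'CHANGEME'
profiles::wordpress::wordpress_user: 'wordpress'
profiles::wordpress::wordpress_group: 'wordpress'
profiles::wordpress::wordpress_docroot: '/var/www/wordpress'
profiles::wordpress::wordpress_port: '80'
```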

Name your profiles according to the technology they setup

Profiles are technology-specific, so you’ll have one to setup wordpress, and tomcat, and jenkins, and…well, you get the picture. You can also namespace your profiles so that you have profiles::ssh::server and profiles::ssh::client if you want. You can even have profiles::jenkins::tomcat and profiles::jenkins::jboss or however you need to namespace according to the TECHNOLOGIES you use. You don’t need to include your environment in the profile name (a la profiles::dev::tomcat) as the bits of data that make the dev environment different from production should come from HIERA, and thus aren’t going to be different on a per-profile basis. You CAN setup profiles according to your business unit if multiple units use Puppet and have different setups (a la security::profiles::tomcat versus ops::profiles::tomcat), but the GOAL of Puppet is to have one main set of modules that every group uses (and the Hiera data being different for every group). That’s the GOAL, but I’m pragmatic enough to understand that not everywhere is a shiny, happy ‘DevOps Garden.’

Do all Hiera lookups in the profile

You’ll see that I declared variables and set their values with Hiera lookups. The profile is the place for these lookups because the profile collects all external data and declares all the classes you’ll need. In reality, you’ll USUALLY only see profiles looking up parameters and declaring classes (i.e. declaring users and groups like I did above will USUALLY be left to component classes).

I do the Hiera lookups first to make it easy to debug from where those values came. I don’t rely on ‘Automatic Parameter Lookup’ in Puppet 3.x.x because it can be ‘magic’ for people who aren’t aware of it (for people new to Puppet, it’s much easier to see a function call and trace back what it does rather than experience Puppet doing something unseen and wondering what the hell happened).

Finally, you’ll notice that my Hiera lookups have NO DEFAULT VALUES – this is BY DESIGN! For most people, their Hiera data is PROBABLY located in a separate repository from their Puppet module data. Imagine making a change to your profile to have it look up a bit of data from Hiera, and then imagine you FORGOT to put that data into Hiera. What happens if you provide a default value to Hiera? The catalog compiles, that default value gets passed down to the component module, and gets enforced on disk. If you have good tests, you MIGHT see that the component you configured has a bit of data that’s not correct, but what if you don’t have a great post-Puppet testing workflow? Puppet will dutifully set this default value; according to Puppet, everything is green and worked just fine, but now your component is set up incorrectly. That’s one of the WORST failures – the ones that you don’t catch. Now, imagine you DON’T provide a default value. In THIS case, Puppet will raise a compilation error because a Hiera lookup didn’t return a value. You’ll catch your error before anything gets pushed to production, and you can fix the screwup. This is a MUCH better solution.
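
A quick sketch of the difference inside a profile (the key name comes from the example above; the default value is hypothetical). Note these are alternatives, not meant to appear together:

```puppet
# With a default: if the Hiera key is missing, '8080' silently flows
# down to the component module and Puppet reports a green run
$port_with_default = hiera('profiles::wordpress::wordpress_port', '8080')

# Without a default: a missing key fails catalog compilation loudly,
# so the mistake is caught before anything is enforced on a node
$port_required = hiera('profiles::wordpress::wordpress_port')
```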

Use parameterized class declarations and explicitly pass values you care about

The parameterized class declaration syntax can be dangerous. The difference between the include function and the parameterized class syntax is that the include function is idempotent. You can do the following in a Puppet manifest, and Puppet doesn’t raise an error:

include apache
include apache
include apache

This is because the include function checks to see if the class is in the catalog. If it ISN’T, then it adds it. If it IS, then it exits cleanly. The include function is your pal.

Consider THIS manifest:

class { 'apache': }
include apache
include apache

Does this work? Yep. The parameterized class syntax adds the class to the catalog, the include function detects this and exits cleanly twice. What about THIS manifest:

include apache
class { 'apache': }
include apache

Does THIS work? Nope! Puppet raises a compilation error because a class was declared more than once in a catalog. Why? Well, consider that Puppet is ‘declarative’…all the way up until it isn’t. Puppet’s PARSER reads from the top of the file to the bottom of the file, and we have a single-pass parser when it comes to things like setting variables and declaring classes. When the parser hits the first include function, it adds the class to the catalog. The parameterized class syntax, however, is a honey badger: it doesn’t give a shit. It adds a class to the catalog regardless of whether it already exists or not. So why would we EVER use the parameterized class declaration syntax? We need to use it because the include function doesn’t allow you to pass parameters when you declare a class.

So wait – why did I spend all this time explaining why the parameterized class syntax is more dangerous than the include function ONLY to recommend its use in profiles? For two reasons:

  • We need to use it to pass parameters to classes
  • We’re wrapping its use in a class that we can IN TURN declare with the include function

Yes, we can get the best of BOTH worlds, the ability to pass parameters and the use of our pal the include function, with this wrapper class. We’ll see the latter usage when we come to roles, but for now let’s focus on passing parameter values.

In the first section, we set variables with Hiera lookups; now we can pass those variables to the classes we’re declaring with the parameterized class syntax. This keeps the declaration of the class static while the parameters we pass to it change according to the Hiera hierarchy. We’ve explicitly called the hiera function, which makes it easier to debug, and we’re explicitly passing parameter values, so we know definitively which parameters are being passed (and thus are overriding default values) to the component module. Finally, since our component modules do NOT use Hiera at all, we can be sure that any parameter we don’t pass gets its value from the default set in the module’s ::params class.

Everything we do here is meant to make things easier to debug when it’s 3am and things aren’t working. Any asshole can do crazy shit in Puppet, but a seasoned sysadmin writes their code for ease of debugging during 3am pages.

An annoying Puppet bug – top-level class declarations and profiles

Oh, ticket 2053, how terrible are you? This is one of those bug numbers that I can remember by heart (like 8040 and 86). Puppet has the ability to do ‘relative namespacing’, which allows you to declare a variable called $port in a class called apache and refer to it as $port instead of fully namespacing the variable (and thus having to call it $apache::port inside the apache class). It’s a shortcut – you can STILL refer to the variable as $apache::port in the class – but it comes in handy. The PROBLEM occurs when you create a profile, as we did above, called profiles::wordpress and you try to declare a class called wordpress. If you do the following inside the profiles::wordpress class, which class is being declared?

include wordpress

If you think you’re declaring a wordpress class from within a wordpress module in your Puppet modulepath, you would be wrong. Puppet ACTUALLY thinks you’re trying to declare profiles::wordpress because you’re INSIDE the profiles::wordpress class and it’s doing relative namespacing (i.e. in the same way you refer to $port and ACTUALLY mean $apache::port, it thinks you’re referring to wordpress and ACTUALLY mean profiles::wordpress).

Needless to say, this causes LOTS of confusion.

The solution here is to declare a class called ::wordpress which tells Puppet to go to the top-level namespace and look for a module called wordpress which has a top-level class called wordpress. It’s the same reason that we refer to Facter Fact values as $::osfamily instead of $osfamily in class definitions (because you can declare a local variable called $osfamily in your class). This is why in the profile above you see this:

class { '::wordpress':
  install_dir => $wordpress_docroot,
  db_name     => $wordpress_db_name,
  db_host     => $wordpress_db_host,
  db_password => $wordpress_db_password,
}

When you use profiles and roles, you’ll need to do this namespacing trick when declaring classes, because you’re frequently going to have a profiles::<sometech> class that declares the <sometech> top-level class.

Roles: business-specific wrapper classes

How do you refer to your machines? When I ask you about that cluster over there, do you say “Oh, you mean the machines with java 1.6, apache, mysql, etc…”? I didn’t think so. You usually have names for them, like the “internal compute cluster” or “app builder nodes” or “DMZ repo machines” or whatever. These names are your Roles. Roles are just the mapping of your machine’s names to the technology that should be ON them. In the past we had descriptive hostnames that afforded us a code for what the machine ‘did’ – roles are just that mapping for Puppet.

Roles are namespaced just like profiles, but now it’s up to your organization to fill in the blanks. Some people immediately want to put environments into the roles (a la roles::uat::compute_cluster), but that’s usually not necessary (as MOST LIKELY the compute cluster nodes have the SAME technology on them when they’re in dev versus when they’re in prod, it’s just the DATA – like database names, VIP locations, usernames/passwords, etc – that’s different. Again, these data differences will come from Hiera, so there should be no reason to put the environment name in your role). You still CAN put the environment name in the role if it makes you feel better, but it’ll probably be useless.

Roles ONLY include profiles

So what exactly is in the role wrapper class? That depends on what technology is on the node that defines that role. What I can tell you for CERTAIN is that roles should ONLY use the include function and should ONLY include profiles. What does this give us? This gives us our pal the include function back! You can include the same profile 100 times if you want, and Puppet only puts it in the catalog once.
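
As a sketch, a complete role might look like this (the role and profile names are invented for illustration):

```puppet
# roles/manifests/app_server.pp
class roles::app_server {
  include profiles::security::base
  include profiles::tomcat
  include profiles::our_application
}
```

Nothing but include statements and profiles – no resources, no Hiera lookups, no conditional data.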

Every node is classified with one role. Period.

The beautiful thing about roles and profiles is that the GOAL is that you should be able to classify a node with a SINGLE role and THAT’S IT. This makes classification simple and static – the node gets its role, the role includes profiles, profiles call out to Hiera for data, that data is passed to component modules, and away we go. Also, since classification is static, you can use version control to see what changes were introduced to the role (i.e. what profiles were added or removed). In my opinion, if you need to apply more than one role to a node, you’ve introduced a new role (see below).

Roles CAN use inheritance…if you like

I’ve seen people implement roles a couple of different ways, and one of them is to use inheritance to build a catalog. For example, you can define a base roles class that includes something like a base security profile (i.e. something that EVERY node in your infrastructure should have). Moving down the line, you COULD namespace according to function like roles::app for your application server machines. The roles::app class could inherit from the roles class (which gets the base security profile), and could then include the profiles necessary to setup an application server. Next, you could subclass down to roles::app::site_foo for an application server that supports some site in your organization. That class inherits from the roles::app class, and then adds profiles that are specific to that site (maybe they use Jboss instead of Tomcat, and thus that’s where the differentiation occurs). This is great because you don’t have a lot of repeated use of the include function, but it also makes it hard to definitively look at a specific role to see exactly what’s being declared (i.e. all the profiles). You have to weigh what you value more: less typing or greater visibility. I will err on the side of greater visibility (just due to that whole 3am outage thing), but it’s up to you to decide what to optimize for.
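The inheritance layout described above might look like this (all class and profile names are hypothetical – this is one way to carve it up, not THE way):

```puppet
# Base role: profiles that EVERY node in the infrastructure should receive
class roles {
  include profiles::security::base
}

# Generic application server, building on the base role
class roles::app inherits roles {
  include profiles::java
  include profiles::our_application
}

# Site-specific application server: everything inherited from roles::app,
# plus this site's particular stack
class roles::app::site_foo inherits roles::app {
  include profiles::jboss
}
```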

A role similar, yet different, from another role is: a new role

EVERYBODY says to me “Gary, I have this machine that’s an AWFUL LOT like this role over here, but…it’s different.” My answer to them is: “Great, that’s another role.” If the thing that’s different is data (i.e. which database to connect to, or what IP address to route traffic through), then that difference should be put in HIERA and the classification should remain the same. If that difference is technology-specific (i.e. this server uses JBoss instead of Tomcat) then first look and see if you can isolate how you know this machine is different (maybe it’s on a different subnet, maybe it’s at a different location, something like that). If you can figure that out and write a Fact for it (or use similar conditional logic to determine this logically), then you can just drop that conditional logic in your role and let it do the heavy lifting. If, in the end, this bit of data is totally arbitrary, then you’ll need to create another role (perhaps a subclass using the above namespacing) and assign it to your node.
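
If a Fact can make that determination, the conditional logic stays inside the role. Here’s a sketch, assuming a hypothetical custom Fact called $::datacenter and invented profile names:

```puppet
class roles::app_server {
  include profiles::security::base

  # Hypothetical custom Fact that distinguishes the two setups
  if $::datacenter == 'dmz' {
    include profiles::jboss
  } else {
    include profiles::tomcat
  }
}
```

The node keeps its single role, and the variation is derived rather than hand-assigned.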

The hardest thing about this setup is naming your roles. Why? Every site is different. It’s hard for me to account for differences in your setup because your workplace is dysfunctional (seriously).

Review: what does this get you?

Let’s walk through every level of this setup from the top to the bottom and see what it gets you. Every node is classified to a single role, and, for the most part, that classification isn’t going to change. Now you can take all the extra work off your classifier tool and put it back into the manifests (which are subject to version control, so you can git blame to your heart’s content and see who last changed the role/profile). Each role is going to include one or more profiles, which gives us the added idempotent protection of the include function (of course, if profiles have collisions with classes you’ll have to resolve those. Say one or more profiles tries to include an apache class – simply break that component out into a separate profile, extract the parameters from Hiera, and include that profile at a higher level). Each profile is going to do Hiera lookups, which gives you the ability to provide different data for different host types (i.e. different data on a per-environment level, or however you lay out your Hiera hierarchy), and that data will be passed directly to the class that is declared. Finally, each component module will accept parameters as variables internal to that module, default those parameters/variables to sane values in the ::params class, and use those variables when declaring each resource throughout its classes.

  • Roles abstract profiles
  • Profiles abstract component modules
  • Hiera abstracts configuration data
  • Component modules abstract resources
  • Resources abstract the underlying OS implementation

Choose your level of comfortability

The roles and profiles pattern also buys you something else – the ability for less-skilled and more-skilled Puppet users to work with the same codebase. Let’s say you use some GUI classifier (like the Puppet Enterprise Console), someone who’s less skilled at Puppet looks and sees that a node is classified with a certain role, so they open the role file and see something like this:

include profiles::wordpress
include profiles::tomcat
include profiles::git::repo_server

That’s pretty legible, right? Someone who doesn’t regularly use Puppet can probably make a good guess as to what’s on the machine. Need more information? Open one of the profiles and look specifically at the classes that are being declared. Need to know the data being passed? Jump into Hiera. Need to know more information? Dig into each component module and see what’s going on there.

When you have everything abstracted correctly, you can have developers providing data (like build versions) to Hiera, junior admins grouping nodes for classification, more senior folk updating profiles, and your best Puppet people creating/updating component modules and building plugins like custom facts/functions/whatever.

Great! Now go and refactor…

If you’ve used Puppet for more than a month, you’re probably familiar with the “Oh shit, I should have done it THAT way…let me refactor this” game. I know, it sucks, and we at Puppet Labs haven’t been shy of incorporating something that we feel will help people out (but will also require some refactoring). This pattern, though, has been in use by the Professional Services team at Puppet Labs for over a year without modification. I’ve used this on sites GREAT and small, and every site with which I’ve consulted and implemented this pattern has been able to both understand its power and derive real value within a week. If you’re contemplating a refactor, you can’t go wrong with Roles and Profiles (or whatever names you decide to use).

Building a Functional Puppet Workflow Part 1: Module Structure

Working as a professional services engineer for Puppet Labs, my life consists almost entirely of either correcting some of the worst code atrocities you’ve seen in your life, or helping people get started with Puppet so that they don’t need to call us again due to: A.) Said code atrocities or B.) Needing to refactor the work we JUST helped them start. It wasn’t ALWAYS like this – I can remember some of my earliest gigs, and I almost feel like I should go revisit them, if only to correct some of the previous ‘best practices’ that didn’t quite pan out.

This would be exactly why I’m wary of ‘Best Practices’ – because one person’s ‘Best Practice’ is another person’s ‘What the fuck did you just do?!’

Having said that, I’m finding myself repeating a story over and over again when I train/consult, and that’s the story of ‘The Usable Puppet Workflow.’ Everybody wants to know ‘The Right Way™’, and I feel like we finally have a way that survives a reasonable test of time. I’ve been promoting this workflow for over a year (which is a HELL of a long time in Startup time), and I’ve yet to really see an edge case it couldn’t handle.

(If you’re already savvy: yes, this is the Roles and Profiles talk)

I’ll be breaking this workflow down into separate blog posts for every component, and, as always, your comments are welcome…

It all starts with the component module

The first piece of a functional Puppet deployment starts with what we call ‘component modules’. Component modules are the lowest level in your deployment, and are modules that configure specific pieces of technology (like apache, ntp, mysql, etc.). Component modules are well-encapsulated, have a reasonable API, and focus on doing small, specific things really well (i.e. the *nix way).

I don’t want to write thousands of words on building component modules because I feel like others have done this better than I. As examples, check out RI’s Post on a simple module structure, Puppet Labs’ very own docs on the subject, and even Alessandro’s Puppetconf 2012 session. Instead, I’d like to provide some pointers on what I feel makes a good component module, and some ‘gotchas’ we’ve noticed.

Parameters are your API

In the current world of Puppet, you MUST define the parameters your module will accept in the Puppet DSL. Also, every parameter MUST ultimately have a value when Puppet compiles the catalog (whether by explicitly passing this parameter value when declaring the class, or by assuming a default value). Yes, it’s funny that, when writing a Puppet class, if you typo a VARIABLE Puppet will not alert you to this (in a NON use strict-ian sort of approach) and will happily accept a variable in an undefined state, but the second you don’t pass a value to your class parameter you’re in for a rude compilation error. This is the way of Puppet classes at the time of this writing, so you’re going to see Puppet classes with LINES of defined parameters. I expect this to change in the future (please let this change in the near future), but for now, it’s a necessary evil.

The parameters you expose to your top-level class (i.e. given class names like apache and apache::install, I’m talking specifically about apache) should be treated as an API to your module. IDEALLY, they’re the ONLY THING that a user needs to modify when using your module. Also, whenever possible, it should be the case that a user need ONLY interact with the top-level class when using your module (of course, defined resource types like apache::vhost are used on an ad-hoc basis, and thus are the exception here).

Inherit the ::params class

We’re starting to make enemies at this point. It’s been a convention for modules to use a ::params class to assign values to all the variables that are going to be used by all the classes inside the module. The idea is that the ::params class is the one-stop shop to see where a variable is set. Also, to get access to a variable that’s set in a Puppet class, you have to declare the class (i.e. use the include() function or inherit from that class). When you declare a class that has both variables AND resources, those resources get put into the catalog, which means that Puppet ENFORCES THE STATE of those resources. What if you only needed a variable’s value and didn’t want to enforce the rest of the resources in that class? There’s no good way in Puppet to do that. Finally, when you inherit from a class in Puppet that has assigned variable values, you ALSO get access to those variables in the parameter definition section of your class (i.e. the following section):

class apache (
  $port = $apache::params::port,
  $user = $apache::params::user,
) inherits apache::params {

See how I set the default value of $apache::port to $apache::params::port? I could only access the value of the variable $apache::params::port in that section by inheriting from the apache::params class. I couldn’t insert include apache::params below that section and be allowed access to the variable up in the parameter defaults section (due to the way that Puppet parses classes).

FOR THIS REASON, THIS IS THE ONLY RECOMMENDED USAGE OF INHERITANCE IN PUPPET!

We do NOT recommend using inheritance anywhere else in Puppet and for any other reason because there are better ways to achieve what you want to do INSTEAD of using inheritance. Inheritance is a holdover from a scarier, more lawless time.
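
For reference, a minimal ::params class that the inheritance pattern above would pull from might look like this (the values are placeholders for illustration, not the real apache module’s defaults):

```puppet
class apache::params {
  case $::osfamily {
    'RedHat': {
      $user = 'apache'
      $port = '80'
    }
    'Debian': {
      $user = 'www-data'
      $port = '80'
    }
    default: {
      fail("The apache module does not support an osfamily of ${::osfamily}")
    }
  }
}
```

All the conditional, per-platform logic lives here, so the rest of the module just consumes clean variables.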

NOTE: Data in Modules – There’s a ‘Data in Modules’ pattern out there that attempts to eliminate the ::params class. I wrote about it in a previous post, and I recommend you read that post for more info (it’s near the bottom).

Do NOT do Hiera lookups in your component modules!

This is something that’s really only RECENTLY been pushed. When Hiera was released, we quickly recognized that it would be the answer to quite a few problems in Puppet. In the rush to adopt Hiera, many people started adding Hiera calls to their modules, and suddenly you had ‘Hiera-compatible’ modules out there. This caused all kinds of compatibility problems, and it was largely because there wasn’t a better module structure and workflow by which to integrate Hiera. The pattern that I’ll be pushing DOES INDEED use Hiera, BUT it confines all Hiera calls to a higher-level wrapper class we call a ‘profile’. The reasons for NOT using Hiera in your module are:

  • By doing Hiera calls at a higher level, you have greater visibility into exactly which parameters were set by Hiera and which were set explicitly or by default values.
  • By doing Hiera calls elsewhere, your module remains backwards-compatible for those folks who are NOT using Hiera.

Remember – your module should just accept a value and use it somewhere. Don’t get TOO smart with your component module – leave the logic for other places.
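As an illustration of “leave the logic for other places,” here’s a minimal sketch of a wrapper (‘profile’) class that does the Hiera lookups and hands plain values to a generic component module. The class and Hiera key names here are hypothetical, and the hiera() function with a default value is the lookup style of the era this was written:

```puppet
# Hypothetical profile wrapping the generic apache component module.
# ALL Hiera calls happen here -- the apache module itself just accepts values.
class profile::webserver {
  $port = hiera('profile::webserver::port', '80')
  $user = hiera('profile::webserver::user', 'apache')

  class { 'apache':
    port => $port,
    user => $user,
  }
}
```

The apache module never knows (or cares) whether the values came from Hiera, an ENC, or a hardcoded default in the profile.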

Keep your component modules generic

We always get asked “How do I know if I’m writing a good module?” We USED to say “Well, does it work?” (and trust me, that was a BIG hurdle). Now, with data separation models out there like Hiera, I have a couple of other questions that I ask (you know, BEYOND asking if it compiles and actually installs the thing it’s supposed to install). The best way I’ve found to determine if your module is ‘generic enough’ is if I asked you TODAY to give me your module, would you give it to me, or would you be worried that there was some company-specific data locked in there? If you have company-specific data in your module, then you need to refactor the module, store the data in Hiera, and make your module more generic/reusable.

Also, does your module focus on installing one piece of technology, or are you declaring packages for shared libraries or other components (like gcc, apache, or other common components)? You’re not going to win any prizes for having the biggest, most monolithic module out there. Rather, if your module is that large and that complex, you’re going to have a hell of a time debugging it. Err on the side of making your modules smaller and more task-specific. So what if you end up needing to declare 4 classes where you previously declared 1? In the roles and profiles pattern we will show you in the next blog post, you can abstract that away ANYHOW.

Don’t play the “what if” game

I’ve had more than a couple of gigs where the customer says something along the lines of “What if we need to introduce FreeBSD/Solaris/etc… nodes into our organization, shouldn’t I account for them now?” This leads more than a few people down a path of entirely too-complex modules that become bulky and unwieldy. Yes, your modules should be formatted so that you can simply add another case in your ::params class for another OS’s parameters, and yes, your module should be formatted so that your ::install or ::config class can handle another OS, but if you currently only manage Redhat, and you’ve only EVER managed Redhat, then don’t start adding Debian parameters RIGHT NOW just because you’re afraid you might inherit Ubuntu machines.

The goal of Puppet is to automate the tasks that eat up the MAJORITY of your time so you can focus on the edge cases that really demand your time. If you can eventually automate those edge cases, then AWESOME! Until then, don’t spend the majority of your time trying to automate the edge cases only to drown under the weight of deadlines from simple work that you COULD have already automated (but didn’t, because you were so worried about the exceptions)!
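To make “formatted so you can simply add another case” concrete, here’s a hypothetical ::params class laid out so that supporting another OS later is a one-case change, while explicitly failing today on anything you don’t actually manage (module and parameter names are made up for illustration):

```puppet
# Hypothetical mymodule::params -- one case per supported osfamily.
# Adding Debian support later means adding one more case, nothing else.
class mymodule::params {
  case $::osfamily {
    'RedHat': {
      $package_name = 'httpd'
      $service_name = 'httpd'
    }
    default: {
      fail("mymodule does not support osfamily ${::osfamily}")
    }
  }
}
```

Failing loudly in the default case beats silently half-configuring an OS you never tested.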

Store your modules in version control

This should go without saying, but your modules should be stored in version control (a la git, svn, hg, whatever). We tend to prefer git due to its lightweight branching and merging (most of our tooling and solutions will use git because we’re big git users), but you’re free to use whatever you want. The bigger question is HOW to store your modules in version control. There are usually two schools of thought:

  • One repository per module
  • All modules in a single repository

Each model has its pros and cons, but we tend to recommend one module per repository for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With the all-modules-in-one-repository solution, modules you pull down from the Forge (or Github) must be committed to your repo. Having a repository per module lets you keep everything separate

NOTE: This becomes important in the third blog post in the series when we talk about moving changes to each Puppet Environment, but it’s important to introduce it NOW as a ‘best practice’. If you use our recommended module/environment solution, then one-module-per-repo is the best practice. If you DON’T use our solution, then the single repository for all modules will STILL work, but you’ll have to manage the above issues. Also note that even if you currently have every module in a single repository, you can STILL use our solution in part 3 of the series (you’ll just need to perform a couple of steps to conform).

Best practices are shit

In general, ‘best practices’ are only recommended if they fit into your organizational workflow. The best and worst part of Puppet is that it’s infinitely customizable, so ‘best practices’ will invariably be left wanting for a certain subset of the community. As always, take what I say under consideration; it’s quite possible that I could be entirely full of shit.

Seriously, What Is This Provider Doing?

Clarke’s third law states: “Any sufficiently advanced technology is indistinguishable from magic.” In the case of Ruby and Puppet provider interaction, I’m inclined to believe it. If you want proof, take a look at some of the native Puppet types – no amount of ‘Expecto Patronum’ will free you from the Ruby metaprogramming dementors that hover around lib/puppet/provider/exec-land.

In my first post tackling Puppet types and providers, I introduced the concept of Puppet types and the utility they provide. In the second post, I brought you to the great plain of Puppet providers and introduced the core methods necessary for creating a very basic Puppet provider with a single property (HINT: if you’ve not read either of those posts, or you’ve never dealt with basic types and providers, you might want to stop here and read up a bit on the topics). The problems with a provider like the one created in that post were:

  • puppet resource support wasn’t implemented, so you couldn’t query for existing instances of the type on the system (and their corresponding values)
  • The getter method would be called for EVERY instance of the type on the system, which would mean shelling-out multiple times during a run
  • Ditto for the setter method (if changes to multiple instances of the type were necessary)
  • That type was VERY basic (i.e. ensurable with a single property)

Unfortunately, when most of us have the need of a Puppet type and provider, we usually require multiple properties and reasonably complex system interaction. When it comes to creating both a getter and a setter method for every property (including the potential performance hit that could come from shelling-out many times during a Puppet run), ain’t nobody got time for that. And finally, puppet resource is a REALLY handy tool for querying the current state of your resources on a system. These problems all have solutions, but up until recently there was just one more problem:

Good luck finding documentation for those solutions.

NOTE: The Puppet Types and Providers book written by Nan and Dan is a great resource that provides a bit of a deeper dive than I’ll be doing in this post – DO check it out if you want to know more

Something, something, puppet resource

The puppet resource command (or ralsh, as it used to be known) is a very handy command for querying a system and returning the current state of resources for a specific Puppet type. Try it out if you never have (note that the following was run on CentOS 6.4):

[root@linux ~]# puppet resource user
user { 'abrt':
  ensure           => 'present',
  gid              => '173',
  home             => '/etc/abrt',
  password         => '!!',
  password_max_age => '-1',
  password_min_age => '-1',
  shell            => '/sbin/nologin',
  uid              => '173',
}
user { 'adm':
  ensure           => 'present',
  comment          => 'adm',
  gid              => '4',
  groups           => ['sys', 'adm'],
  home             => '/var/adm',
  password         => '*',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/sbin/nologin',
  uid              => '3',
}
< ... and more users below ... >

The puppet resource command returns a list of all users on the system and their current property values (note you can only see the password hash if you’re running Puppet with sufficient privileges). You can even query puppet resource for the values of a specific resource:

[root@gary ~]# puppet resource user glarizza
user { 'glarizza':
  ensure           => 'present',
  gid              => '502',
  home             => '/home/glarizza',
  password         => '$1$hsUuCygh$kgLKG5epuRaXHMX5KmxrL1',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '502',
}

puppet resource seems magical, and you might think that if you create a custom type and sync it to your machine then puppet resource will automatically work for you.

And you would be wrong.

puppet resource will only work if you’ve implemented a special method in your provider called self.instances.

self.instances

The self.instances method is pretty sparsely documented, so let’s go straight to the source…code, that is:

lib/puppet/provider.rb
  # Returns a list of system resources (entities) this provider may/can manage.
  # This is a query mechanism that lists entities that the provider may manage on a given system. It is
  # is directly used in query services, but is also the foundation for other services; prefetching, and
  # purging.
  #
  # As an example, a package provider lists all installed packages. (In contrast, the File provider does
  # not list all files on the file-system as that would make execution incredibly slow). An implementation
  # of this method should be made if it is possible to quickly (with a single system call) provide all
  # instances.
  #
  # An implementation of this method should only cache the values of properties
  # if they are discovered as part of the process for finding existing resources.
  # Resource properties that require additional commands (than those used to determine existence/identity)
  # should be implemented in their respective getter method. (This is important from a performance perspective;
  # it may be expensive to compute, as well as wasteful as all discovered resources may perhaps not be managed).
  #
  # An implementation may return an empty list (naturally with the effect that it is not possible to query
  # for manageable entities).
  #
  # By implementing this method, it is possible to use the `resources´ resource type to specify purging
  # of all non managed entities.
  #
  # @note The returned instances are instance of some subclass of Provider, not resources.
  # @return [Array<Puppet::Provider>] a list of providers referencing the system entities
  # @abstract this method must be implemented by a subclass and this super method should never be called as it raises an exception.
  # @raise [Puppet::DevError] Error indicating that the method should have been implemented by subclass.
  # @see prefetch
  def self.instances
    raise Puppet::DevError, "Provider #{self.name} has not defined the 'instances' class method"
  end

You’ll find that method around lines 348 – 377 of the lib/puppet/provider.rb file in Puppet’s source code (as of this writing, which is a Friday… on a flight from DC to Seattle). To summarize, implementing self.instances in your provider means that you need to return an array of provider instances that have been discovered on the current system, along with all of their current property values (we call these the ‘is’ values for the properties, since each value IS the current value of the property on the system). It’s recommended to only implement self.instances if you can gather all resource property values in a reasonably ‘cheap’ manner (i.e. a single system call, a read from a single file, or some similar low-IO means).

Implementing self.instances not only gives you the ability to run puppet resource (which also affords you a quick-and-dirty way of testing your provider without writing unit tests – simply run puppet resource in debug mode and check the output), but it also allows the ‘resources’ resource to work its magic (if you’ve never heard of the ‘resources’ resource, check this link for more information on this terribly/awesomely named resource type).

An important note about scope and self.instances

The self.instances method is a method of the PROVIDER, which is why it is prefixed with self. Even though it may be located in the provider file itself, and even though it sits among other methods like create, exists?, and destroy (which are methods of the INSTANCE of the provider), it does NOT have the ability to directly access or call those methods. It DOES have the ability to access other methods of the provider directly (i.e. other methods prefixed with self.). This means that if you were to define a method like:

def self.proxy_type
  'web'
end

You could access that directly from self.instances by simply calling it:

type_of_proxy = proxy_type()

Let’s say you had a method of the INSTANCE of the provider, like so:

def system_type
  'OS X'
end

You COULD NOT access this method from self.instances directly (there are always hacky ways around EVERYTHING in Ruby, sure, but there is no easy/straightforward way to access this method).

And here’s where it gets confusing…

Methods of the INSTANCE of the provider CAN access provider methods directly. Given our previous example, what if the system_type method wanted to access self.proxy_type for some reason? It could be done like so:

def system_type
  type_of_proxy = self.class.proxy_type()
  'OS X'
end

A method of the instance of the provider can access provider methods by simply calling the class method on itself (which returns the provider object). This is a one-way street for method creation that needs to be heeded when designing your provider.

Building a provider that uses self.instances (or: more Mac problems)

In the previous two posts on types/providers, I created a type and provider for managing bypass domains for network proxies on OS X. For this post, let’s create a provider for actually MANAGING the proxy settings for a given network interface. Here’s a quick type for managing a web proxy on a network interface on OS X:

puppet-mac_proxy/lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models web proxy settings for a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Whether the proxy requires authentication"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy server port for the interface"
    newvalues(/^\d+$/)
  end
end

This type is ensurable, has three properties (plus two parameters for proxy authentication), and has a namevar called ‘name’. As for the provider, let’s start with self.instances and get the web proxy values for all interfaces. To do that we’re going to need to know how to get a list of all network interfaces, and also how to get the current proxy state for every interface. Fortunately, both of those tasks are accomplished with the networksetup binary:

▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

▷ networksetup -getwebproxy Ethernet
Enabled: No
Server: proxy.corp.net
Port: 1234
Authenticated Proxy Enabled: 0

Cool, so one binary will do both tasks and they’re REASONABLY low-cost to run.

Helper methods

To keep things separated and easier to test, let’s create separate helper methods for each task. Since these methods are going to be called by self.instances, they will be provider methods.

The first method will simply return an array of network interfaces:

def self.get_list_of_interfaces
  interfaces = networksetup('-listallnetworkservices').split("\n")
  interfaces.shift
  interfaces.sort
end

Remember from above that the networksetup -listallnetworkservices command prints an informational line before the interfaces, so this code strips that line off and returns a sorted list of interfaces based on a one-line-per-interface assumption.
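To make that concrete, here’s the same parsing logic run against canned command output (a standalone simulation – no networksetup call is made):

```ruby
# Canned output in the shape of `networksetup -listallnetworkservices`:
# an informational first line, then one interface name per line.
raw_output = "An asterisk (*) denotes that a network service is disabled.\n" \
             "Wi-Fi\nEthernet\nFireWire\n"

interfaces = raw_output.split("\n")
interfaces.shift            # drop the informational header line
interfaces = interfaces.sort

puts interfaces.inspect     # => ["Ethernet", "FireWire", "Wi-Fi"]
```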

The next method accepts a network interface name as an argument, runs the networksetup -getwebproxy (interface) command, and uses its output to return the current property values (including the ensure value) for that interface (i.e. the interface’s proxy settings and whether the proxy is enabled, which means the resource is ensured as ‘present’, or disabled, which means it’s ensured as ‘absent’):

def self.get_proxy_properties(int)
  interface_properties = {}

  begin
    output = networksetup(['-getwebproxy', int])
  rescue Puppet::ExecutionFailure => e
    raise Puppet::Error, "#mac_web_proxy tried to run `networksetup -getwebproxy #{int}` and the command returned non-zero. Failing here..."
  end

  output_array = output.split("\n")
  output_array.each do |line|
    line_values = line.split(':')
    line_values.last.strip!
    case line_values.first
    when 'Enabled'
      interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
    when 'Server'
      interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
    when 'Port'
      interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
    when 'Authenticated Proxy Enabled'
      interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
    end
  end

  interface_properties[:provider] = :ruby
  interface_properties[:name]     = int.downcase
  interface_properties
end

A couple of notes on the method itself: first, the networksetup command exits zero on success and non-zero on failure (which is exactly what Puppet’s command wrapper expects). If networksetup ever returns non-zero, the wrapper raises Puppet::ExecutionFailure, which we rescue so we can raise our own Puppet::Error, document what happened, and bail out.

This method is going to return a hash of properties and values that is going to be used by self.instances – so the case statement needs to account for that. HOWEVER you populate that hash is up to you (in my case, I’m checking for specific output that networksetup returns), but make sure that the hash has a value for the :ensure key at the VERY least.
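For example, feeding the sample networksetup -getwebproxy Ethernet output from earlier through the same case-statement logic (a standalone simulation – again, no networksetup call is made) yields the hash below:

```ruby
# Canned output matching the `networksetup -getwebproxy Ethernet` sample above.
output = "Enabled: No\nServer: proxy.corp.net\nPort: 1234\nAuthenticated Proxy Enabled: 0"

interface_properties = {}
output.split("\n").each do |line|
  line_values = line.split(':')
  line_values.last.strip!
  case line_values.first
  when 'Enabled'
    interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
  when 'Server'
    interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
  when 'Port'
    interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
  when 'Authenticated Proxy Enabled'
    interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
  end
end
interface_properties[:name] = 'Ethernet'.downcase

# Disabled proxy => ensure is :absent, but the configured server/port survive:
puts interface_properties.inspect
```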

Assembling self.instances

Once the helper provider methods have been defined, self.instances becomes reasonably simple:

6
def self.instances
  get_list_of_interfaces.collect do |int|
    proxy_properties = get_proxy_properties(int)
    new(proxy_properties)
  end
end

Remember that self.instances must return an array of provider instances, and each one of these instances must include the namevar and ensure value at the very least. Since self.get_proxy_properties returns a hash containing all the property ‘is’ values for a resource, declaring a new provider instance is as easy as calling the new() method on the return value of self.get_proxy_properties for every network interface. In the end, the return value of the collect method on get_list_of_interfaces will be an array of provider instances.
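Conceptually, that collect call just turns each property hash into a provider instance. Here’s a plain-Ruby analogue (FakeProvider is a hypothetical stand-in for the real provider class, not Puppet code):

```ruby
# Stand-in for a Puppet provider: new() stashes the discovered 'is' values.
class FakeProvider
  attr_reader :property_hash

  def initialize(hash)
    @property_hash = hash
  end

  def name
    @property_hash[:name]
  end
end

# What get_proxy_properties would return for each discovered interface:
property_hashes = [
  { :name => 'ethernet', :ensure => :absent },
  { :name => 'firewire', :ensure => :present },
]

# collect returns an array of provider instances -- self.instances' contract.
instances = property_hashes.collect { |hash| FakeProvider.new(hash) }

puts instances.map(&:name).inspect   # => ["ethernet", "firewire"]
```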

Existence, @property_hash, and more magical methods

Even though we have assembled a functional self.instances method, we don’t have a complete implementation that will work with puppet resource. The problem is that Puppet can’t yet determine the existence of a resource (even though the resource’s ensure value has been set by self.instances). If you were to execute the code with puppet resource mac_web_proxy, you would get the error:

Error: Could not run: No ability to determine if mac_web_proxy exists

To satisfy Puppet, we need to implement an exists?() method for the instance of the provider. Fortunately, we don’t need to re-implement any existing logic and can instead use @property_hash.

A @property_hash is born…

I’ve omitted one last thing that is borne out of self.instances, and that’s the @property_hash instance variable. @property_hash is populated by self.instances and is available to methods of the INSTANCE of the provider (i.e. methods that ARE NOT prefixed with self.); it contains all the ‘is’ values for a resource. Do you need the ‘is’ value for a property? Just use @property_hash[:property_name]. Since the exists? method is a method of the instance of the provider, and it’s essentially the same thing as the ensure value for a resource, let’s implement exists? by checking the ensure value from the @property_hash variable:

def exists?
  @property_hash[:ensure] == :present
end

Perfect, now exists? will return true or false accordingly and Puppet will be satisfied.

Getter methods – the slow way

Puppet may be happy that you have an exists? method, but puppet resource won’t successfully run until you have a method that returns an ‘is’ value for every property of the type (i.e. the proxy_server, proxy_authenticated, and proxy_port attributes for the mac_web_proxy type). These ‘is value methods’ are called ‘getter’ methods: they’re methods of the instance of the provider, and are named exactly the same as the properties they represent.

You SHOULD be thinking: “Hey, we already have @property_hash, why can’t we just use it again?” We can, and you COULD implement all the getter methods like so:

def proxy_server
  @property_hash[:proxy_server]
end

If you did that, you would be TECHNICALLY correct, but it would seem to be a waste of lines in a provider (especially if you have many properties).

Getter methods – the quicker ‘method’

Because uncle Luke hated excess lines of code, he made available a method called mk_resource_methods, which works very similarly to Ruby’s attr_accessor method. Adding mk_resource_methods to your provider will AUTOMATICALLY create getter methods that pull values out of @property_hash in much the same way I just demonstrated (it will also create SETTER methods, but we’ll look at those later). Long story short – don’t hand-write getter/setter methods if you’re using self.instances – just implement mk_resource_methods.
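To take some of the magic out of it: conceptually, mk_resource_methods defines a getter (and setter) per property that reads from (and writes to) @property_hash. A simplified plain-Ruby sketch of that idea (NOT Puppet’s actual implementation) looks like this:

```ruby
class FakeProvider
  # Roughly what mk_resource_methods does for each property of the type:
  # metaprogram a getter and a setter backed by @property_hash.
  [:proxy_server, :proxy_port, :proxy_authenticated].each do |prop|
    define_method(prop) { @property_hash[prop] }
    define_method("#{prop}=") { |value| @property_hash[prop] = value }
  end

  def initialize(hash)
    @property_hash = hash
  end
end

prov = FakeProvider.new(:proxy_server => 'proxy.corp.net', :proxy_port => '1234')

puts prov.proxy_server   # => proxy.corp.net
prov.proxy_port = '8190' # setter writes back into @property_hash
puts prov.proxy_port     # => 8190
```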

JUST enough for puppet resource

Putting together everything we’ve learned up until now, we should have a provider that looks like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def exists?
    @property_hash[:ensure] == :present
  end
end

Here’s a tree of the module I’ve assembled on my machine:

└(~/src/puppet-mac_web_proxy)▷ tree .
.
└── lib
   └── puppet
       ├── provider
       │   └── mac_web_proxy
       │       └── ruby.rb
       └── type
           └── mac_web_proxy.rb

To test out puppet resource, we need to make Puppet aware of our new custom module. To do that, let’s set the $RUBYLIB environment variable. $RUBYLIB is read by Puppet and added to its load path when looking for additional Puppet plugins. You will need to set $RUBYLIB to the path of the lib directory in the custom module that you’ve assembled. Because my custom module is located in ~/src/puppet-mac_web_proxy, I’m going to set $RUBYLIB like so:

export RUBYLIB=~/src/puppet-mac_web_proxy/lib

You can execute that command from the command line, or set it in your ~/.{bash,zsh}rc and source that file.

Finally, with all the files in place and $RUBYLIB set, it’s time to officially run puppet resource (I’m going to do it in --debug mode to see the debug output that I’ve written into the code):

└(~/src/blogtests)▷ envpuppet puppet resource mac_web_proxy --debug
Debug: Executing '/usr/sbin/networksetup -listallnetworkservices'
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth DUN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth dun"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth PAN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth pan"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Display Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"foo.bar.baz", :proxy_port=>"80", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"display ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"proxy.corp.net", :proxy_port=>"1234", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy FireWire'
Debug: Interface properties: {:ensure=>:present, :proxy_server=>"stuff.bar.blat", :proxy_port=>"8190", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"firewire"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Wi-Fi'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"wi-fi"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy iPhone USB'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"iphone usb"}
mac_web_proxy { 'bluetooth dun':
  ensure => 'absent',
}
mac_web_proxy { 'bluetooth pan':
  ensure => 'absent',
}
mac_web_proxy { 'display ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8190',
  proxy_server => 'stuff.bar.blat',
}
mac_web_proxy { 'iphone usb':
  ensure => 'absent',
}
mac_web_proxy { 'wi-fi':
  ensure => 'absent',
}

Note that you will only see ‘is’ values if you have a proxy set on any of your network interfaces (obviously, if you’ve not set up a proxy, it will show as ‘absent’ on every interface. You can set up a proxy by opening System Preferences, clicking on the Network icon, choosing an interface from the list on the left, clicking the Advanced button in the lower right corner of the window, clicking the ‘Proxies’ tab at the top of the window, checking the box next to ‘Web Proxy (HTTP)’, and entering a proxy URL and port. NOW do you get why we automate this bullshit?). Also, your list of network interfaces may not match mine if you have more or fewer interfaces than I do.

TADA! puppet resource WORKS! ISN’T THAT AWESOME?! WHY AM I TYPING IN CAPS?!

Prefetching, flushing, caching, and other hard shit

Okay, so up until now we’ve implemented one half of the equation – we can query ‘is’ values and puppet resource works. What about using this ‘more efficient’ method of getting values on the OTHER end of the spectrum? What if, instead of calling setter methods one-by-one to set values for all resources of a type in a catalog, we had a way to do it all at once? Well, such a way exists, and it’s called the flush method…but we’re getting slightly ahead of ourselves.

Before we get to flushing, we need to point out that self.instances is ONLY used by puppet resource – THAT’S IT (and even then, it’s only used when you GET values from the system, not when you SET values on the system…and if you never knew that puppet resource could actually SET values on the system, well, I guess you got another surprise today). If we want puppet agent or puppet apply to use the behavior that self.instances implements, we need to create another method: self.prefetch

self.prefetch

If you thought self.instances didn’t have much documentation, wait until you see self.prefetch. After wading through the waters of self.prefetch, I’m PRETTY SURE its implementation might have come to uncle Luke after a long night in Reed’s chem lab where he might have accidentally synthesized mescaline.

Let’s look at the codebase:

lib/puppet/provider.rb
# @comment Document prefetch here as it does not exist anywhere else (called from transaction if implemented)
# @!method self.prefetch(resource_hash)
# @abstract A subclass may implement this - it is not implemented in the Provider class
# This method may be implemented by a provider in order to pre-fetch resource properties.
# If implemented it should set the provider instance of the managed resources to a provider with the
# fetched state (i.e. what is returned from the {instances} method).
# @param resources_hash [Hash<{String => Puppet::Resource}>] map from name to resource of resources to prefetch
# @return [void]
# @api public

That’s right, documentation for self.prefetch in the Puppet codebase is 9 lines of comments in lib/puppet/provider.rb, which is awesome. So when is self.prefetch used to provide information to Puppet and when is self.instances used?

Puppet Subcommand    Provider Method    Execution Mode
puppet resource      self.instances     getting values
puppet resource      self.prefetch      setting values
puppet agent         self.prefetch      getting values
puppet agent         self.prefetch      setting values
puppet apply         self.prefetch      getting values
puppet apply         self.prefetch      setting values

This doesn’t mean that self.instances is really only handy for puppet resource – that’s definitely not the case. In fact, frequently you will find that self.instances is used by self.prefetch to do some of the heavy lifting. Even though self.prefetch works VERY SIMILARLY to the way that self.instances works for puppet resource (and by that I mean that it’s going to gather a list of instances of a type on the system, and it’s also going to populate @property_hash for puppet apply, puppet agent, and when puppet resource is setting values), it’s not an exact one-for-one match with self.instances. The self.prefetch method for a type is called once per run when Puppet encounters a resource of that type in the catalog. The argument to self.prefetch is a hash of all managed resources of that type that are encountered in a compiled catalog for that node (the hash’s key will be the namevar of the resource, and the value will be an instance of Puppet::Type – in this case, Puppet::Type::Mac_web_proxy). Your task is to implement a self.prefetch method that gets an array of instances of the provider that are discovered on the system, iterates through the hash passed to self.prefetch (containing all the resources of the type that were discovered in the catalog), and passes the correct instance of the provider that was discovered on the system to the provider= method of the correct instance of the type that was discovered in the catalog.

What the actual fuck?!

Okay, let’s break that apart to try and discover exactly what’s going on here. Assume that I’ve set up a proxy for the ‘FireWire’ interface on my laptop, and I want to try and manage that resource with puppet apply (i.e. something that uses self.prefetch). The resource in the manifest used to manage the proxy will look something like this:

mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8080',
  proxy_server => 'proxy.server.org',
}

When self.prefetch is called by Puppet, it’s going to be passed a hash looking something like this:

{ "firewire" => Mac_web_proxy[firewire] }

Because only one resource is encountered in the catalog, only one key/value pair shows up in the hash that’s passed as the argument to self.prefetch.

The job of self.prefetch is to find the current state of Mac_web_proxy['firewire'] on the system, create a new instance of the mac_web_proxy provider that contains the ‘is’ values for the Mac_web_proxy['firewire'] resource, and assign this provider instance as the value of the provider= method to the instance of the mac_web_proxy TYPE that is the VALUE of the ‘firewire’ key of the hash that’s passed to self.prefetch.

No, really, that’s what it’s supposed to do. I’m not even sure what’s real anymore

You’ll remember that self.instances gives us an array of resources that were discovered on the system, so we have THAT part of the implementation written. We also have the hash of resources that were encountered in the catalog – so we have THAT part done too. Our only job is to connect the dots (la la la la), programmatically speaking. This should just about do it:

def self.prefetch(resources)
  instances.each do |prov|
    if resource = resources[prov.name]
      resource.provider = prov
    end
  end
end

I want to make a confession right now – I’ve only ever copied and pasted this code into every provider I’ve ever written that needed self.prefetch implemented. It wasn’t until someone actually asked me what it DID that I had to walk the path of figuring out EXACTLY what it did. Based on the last couple of paragraphs – can you blame me?

This code iterates through the array of resources returned by self.instances, tries to assign a variable resource based on referencing a key in the resources hash using the name of the resource (remember, resources is a hash containing all resources in the catalog), and, if this assignment works (i.e. it isn’t nil, which is what happens when you reference a key in a Ruby hash that doesn’t exist), then we’re calling the provider= method on the instance of the type that was referenced in the resources hash, and passing it the resource that was discovered on the system by self.instances.

Wow.

Why DID we do all of that? We did it all for the @property_hash. Doing this will populate @property_hash in all methods of the instance of the provider (i.e. exists?, create, destroy, etc.) just like self.instances did for puppet resource.
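
If it helps, here’s the same connect-the-dots dance modeled with plain Ruby objects – no Puppet required. The OpenStruct stand-ins for the provider and type instances are invented purely for illustration:

```ruby
require 'ostruct'

# Stand-ins for what self.instances returns (providers discovered on the system)
discovered = [
  OpenStruct.new(name: 'firewire', proxy_server: 'proxy.server.org'),
  OpenStruct.new(name: 'wi-fi',    proxy_server: nil),
]

# Stand-in for the hash passed to self.prefetch (resources in the catalog)
catalog = { 'firewire' => OpenStruct.new(name: 'firewire', provider: nil) }

# The prefetch pattern: match system state to catalog resources by name
discovered.each do |prov|
  if resource = catalog[prov.name]   # nil (falsey) if not in the catalog
    resource.provider = prov         # hand the 'is' values to the type instance
  end
end

puts catalog['firewire'].provider.proxy_server  # => proxy.server.org
```

The ‘wi-fi’ provider is discovered on the system but isn’t in the catalog, so it’s simply skipped – exactly what happens to unmanaged resources during a real run.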

Flush it; Ship it

As I alluded to above, the opposite side of the coin to prefetching (which is a way to query the state for all resources at once) is flushing (or specifically the flush method). The flush method is called once per resource whenever the ‘is’ and ‘should’ values for a property differ (and synchronization needs to occur). The flush method does not take the place of property setter methods, but, rather, is used in conjunction with them to determine how to synchronize resource property values. In this vein, it’s a single trigger that can be used to set all property values for an individual resource simultaneously.

There are a couple of strategies for implementing flush, but one of the more popular ones in use is to create an instance variable that will hold values to be synchronized, and then determine inside flush how best to make as-few-as-possible calls to the system to synchronize all the property values for an individual resource.

Our resource type is unique because the networksetup binary that we’ll be using to synchronize values allows us to set most every property value with a single command. Because of this, we really only need that instance variable for one property – the ensure value. But let’s start with the initialization of that instance variable for the flush method:

def initialize(value={})
  super(value)
  @property_flush = {}
end

The initialize method is magic to Ruby – it’s invoked when you instantiate a new object. In our case, we want to create a new instance variable – @property_flush – that will be available to all methods of the instance of the provider. This instance variable will be a hash and will contain all the ‘should’ values that will need to be synchronized for a resource. The super method in Ruby sends a message to the parent of the current object, asking it to invoke a method of the same name (e.g. initialize). Basically, the initialize method is doing the exact same thing as it has always done with one exception – making the instance variable available to all methods of the instance of the provider.
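
Here’s a standalone sketch of that super call (class names here are invented for illustration):

```ruby
class Base
  def initialize(value = {})
    @value = value
  end
end

class Child < Base
  attr_reader :property_flush

  def initialize(value = {})
    super(value)          # ask the parent class to run ITS initialize first
    @property_flush = {}  # then add our own instance variable
  end
end

puts Child.new.property_flush.inspect  # => {}
```

The parent’s initialize still runs (so @value gets set just like before); the child merely tacks on @property_flush afterward.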

The only ‘setter’ method you need

This provider is going to be unique not only because the networksetup binary will set values for ALL properties, but because to change/set ANY property values you have to change/set ALL the property values at the same time. Typically, you’ll see providers that will need to pass arguments to a binary in order to set individual values. For example, if you had a binary fooset that took arguments of --bar and --baz to set values respectively for bar and baz properties of a resource, you might see the following setter and flush methods for bar and baz:

def bar=(value)
  @property_flush[:bar] = value
end

def baz=(value)
  @property_flush[:baz] = value
end

def flush
  array_arguments = []
  if @property_flush
    array_arguments << '--bar' << @property_flush[:bar] if @property_flush[:bar]
    array_arguments << '--baz' << @property_flush[:baz] if @property_flush[:baz]
  end
  if ! array_arguments.empty?
    fooset(array_arguments, resource[:name])
  end
end

That’s not the case for networksetup – in fact, one of the ONLY places in our code where we’re going to throw a value inside @property_flush is going to be in the destroy method. If our intention is to ensure a proxy absent (or, in this case, disable the proxy for a network interface), then we can short-circuit the method we’re going to create to set proxy values by simply checking for a value in @property_flush[:ensure]. Here’s what the destroy method looks like:

def destroy
  @property_flush[:ensure] = :absent
end

Next, we need a method that will set values for our proxy. This method will handle all interaction to networksetup. So, how do you set proxy values with networksetup?

networksetup -setwebproxy <networkservice> <domain> <port number> <authenticated> <username> <password>

The three properties of our mac_web_proxy type are proxy_port, proxy_server, and proxy_authenticated, which map to the ‘<port number>’, ‘<domain>’, and ‘<authenticated>’ values in this command. To change any of these values means we have to pass ALL of these values (again, which is why our flush implementation may be unique from other flush implementations). Here’s what the set_proxy method looks like:

def set_proxy
  if @property_flush[:ensure] == :absent
    networksetup(['-setwebproxystate', resource[:name], 'off'])
    return
  end

  if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
    raise Puppet::Error, "Proxy types other than 'auto' require both a proxy_server and proxy_port setting"
  end
  if resource[:proxy_authenticated] != :true
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port]
      ]
    )
  else
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port],
        'on',
        resource[:authenticated_username],
        resource[:authenticated_password]
      ]
    )
  end
  networksetup(['-setwebproxystate', resource[:name], 'on'])
end

This helper method does all the validation checks for required properties, executes the correct command, and enables the proxy. Now, let’s implement flush:

def flush
  set_proxy

  # Collect the resources again once they've been changed (that way `puppet
  # resource` will show the correct values after changes have been made).
  @property_hash = self.class.get_proxy_properties(resource[:name])
end

The last line re-populates @property_hash with the current resource values, and is necessary for puppet resource to return correct values after it makes a change to a resource during a run.

The final method

We’ve implemented logic to query the state of all resources, to prefetch those states, to make changes to all properties at once, and to destroy a resource if it exists, but we’ve yet to implement logic to CREATE a resource if it doesn’t exist and it should. Well, this is a bit of a lie – the logic is in the code, but we don’t have a create method, so Puppet’s going to complain:

def create
  @property_flush[:ensure] = :present
end

Technically, this method doesn’t have to do a DAMN thing. Why? Remember how the flush method is triggered when a resource’s ‘is’ values differ from its ‘should’ values? Also, remember how the flush method only calls the set_proxy method? And, finally, remember how set_proxy only checks if @property_flush[:ensure] == :absent (and if it doesn’t, then it goes about its merry way running networksetup)? Right, well add these things up and you’ll realize that the create method is essentially meaningless based on our implementation (but if you OMIT create, then Puppet’s going to throw a shit-fit in the shape of a Puppet::Error exception):

Error: /Mac_web_proxy[firewire]/ensure: change from absent to present failed: Could not set 'present' on ensure: undefined method `create' for Mac_web_proxy[firewire]:Puppet::Type::Mac_web_proxy

So make Puppet happy and write the goddamn create method, okay?

The complete provider:

Wow, that was a wild ride, huh? If you’ve been coding along, you should have created a file that looks something like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def initialize(value={})
    super(value)
    @property_flush = {}
  end

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def create
    @property_flush[:ensure] = :present
  end

  def exists?
    @property_hash[:ensure] == :present
  end

  def destroy
    @property_flush[:ensure] = :absent
  end

  def self.prefetch(resources)
    instances.each do |prov|
      if resource = resources[prov.name]
        resource.provider = prov
      end
    end
  end

  def set_proxy
    if @property_flush[:ensure] == :absent
      networksetup(['-setwebproxystate', resource[:name], 'off'])
      return
    end

    if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
      raise Puppet::Error, "Both the proxy_server and proxy_port parameters require a value."
    end
    if resource[:proxy_authenticated] != :true
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port]
        ]
      )
    else
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port],
          'on',
          resource[:authenticated_username],
          resource[:authenticated_password]
        ]
      )
    end
    networksetup(['-setwebproxystate', resource[:name], 'on'])
  end

  def flush
    set_proxy

    # Collect the resources again once they've been changed (that way `puppet
    # resource` will show the correct values after changes have been made).
    @property_hash = self.class.get_proxy_properties(resource[:name])
  end
end

Undoubtedly there are better ways to write this Ruby code, no? Also, I’m SURE I have some errors/bugs in that code. It’s those things that keep me in a job…

Final Thoughts

So, I write these posts not to belittle or mock anyone who works on Puppet or wrote any of its implementation (except the amazing/terrifying bastard who came up with self.prefetch). Anybody who contributes to open source and who builds a tool to save some time for a bunch of sysadmins is fucking awesome in my book.

No, I write these posts so that you can understand the ‘WHY’ piece of the puzzle. If you fuck up the ‘HOW’ of the code, you can spend some time in Google and IRB to figure it out, but if you don’t understand the ‘WHY’ then you’re probably not going to even bother.

Also, selfishly, I move from project to project so quickly that it’s REALLY easy to forget both why AND how I did what I did. Posts like these give me someplace to point people when they ask me “What’s self.prefetch?” that ISN’T just the source code or a liquor store.

This isn’t the last post in the series, by the way. I haven’t even TOUCHED on writing unit tests for this code, so that’s going to be a WHOLE other piece altogether. Also, while this provider manages a WEB proxy for a network interface, understand that there are MANY MORE kinds of proxies for OS X network interfaces (including socks and gopher!). A future post will show you how to refactor the above into a parent provider that can be inherited to allow for code re-use among all the proxy providers that I need to create.

As always, you’re more than welcome to comment, ask questions, or simply bitch at me both on this blog as well as on Twitter: @glarizza. Hopefully this post helped you out and you learned a little bit more about how Puppet providers do their dirty work…

Namaste, bitches.

When to Hiera (Aka: How Do I Module?)

I’m convinced that writing Puppet modules is the ultimate exercise in bikeshedding: if it works, someone’s probably going to tell you that you could have done it better, if you’re using the methods suggested today, they’re probably going to be out-of-date in about 6 months, and good luck writing something that someone else can use cleanly without needing to change it.

I can help you with the last two.

Data and Code Separation == bliss?

I wrote a blog post about 2 years ago detailing why separating your data from your Puppet code was a good idea. The idea is still valid, which means it’s probably one of the better ideas I’ve ever stolen (Does anyone want any HD-DVDs?). Hunter Haugen and I put together a quick blog post on using Hiera to solve the data/code problem because there wasn’t a great bit of documentation on Hiera at that point in time. Since then, Hiera’s been widely accepted as “a good idea” and is in use in production Puppet environments around the world. In most every environment, usage of Hiera by more than just one person eventually gives birth to the question that inspired this post:

“What the hell does and does NOT belong in Hiera?”

Puppet data models

The params class pattern

Many Puppet modules out there since Puppet 2.6 have begun using this pattern:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = $mysql::params::config_file,
  $manage_config_file      = $mysql::params::manage_config_file,
  $old_root_password       = $mysql::params::old_root_password,
  $override_options        = {},
  $package_ensure          = $mysql::params::server_package_ensure,
  $package_name            = $mysql::params::server_package_name,
  $purge_conf_dir          = $mysql::params::purge_conf_dir,
  $remove_default_accounts = false,
  $restart                 = $mysql::params::restart,
  $root_group              = $mysql::params::root_group,
  $root_password           = $mysql::params::root_password,
  $service_enabled         = $mysql::params::server_service_enabled,
  $service_manage          = $mysql::params::server_service_manage,
  $service_name            = $mysql::params::server_service_name,
  $service_provider        = $mysql::params::server_service_provider,
  # Deprecated parameters
  $enabled                 = undef,
  $manage_service          = undef
) inherits mysql::params {

  ## Puppet goodness goes here
}

If you’re not familiar, this is a Puppet class definition for mysql::server that has several parameters defined and defaulted to values that come out of the mysql::params class. The mysql::params class looks a bit like this:

puppetlabs-mysql/manifests/params.pp
class mysql::params {
  case $::osfamily {
    'RedHat': {
      if $::operatingsystem == 'Fedora' and (is_integer($::operatingsystemrelease) and $::operatingsystemrelease >= 19 or $::operatingsystemrelease == "Rawhide") {
        $client_package_name = 'mariadb'
        $server_package_name = 'mariadb-server'
      } else {
        $client_package_name = 'mysql'
        $server_package_name = 'mysql-server'
      }
      $basedir             = '/usr'
      $config_file         = '/etc/my.cnf'
      $datadir             = '/var/lib/mysql'
      $log_error           = '/var/log/mysqld.log'
      $pidfile             = '/var/run/mysqld/mysqld.pid'
      $root_group          = 'root'
    }

    'Debian': {
      ## More parameters defined here
    }
  }
}

This pattern puts all conditional logic for all the variables/parameters used in the module inside one class – the mysql::params class. It’s called the ‘params class pattern’ because we suck at naming things.

Pros:

  • All conditional logic is in a single class
  • You always know which class to seek out if you need to change any of the logic used to determine a variable’s value
  • You can use the include function because parameters for each class will be defaulted to the values that came out of the params class
  • If you need to override the value of a particular parameter, you can still use the parameterized class declaration syntax to do so
  • Anyone using Puppet version 2.6 or higher can use it (i.e. anyone who’s been using Puppet since about 2010).

Cons:

  • Conditional logic is repeated in every module
  • You will need to use inheritance to inherit parameter values in each subclass
  • It’s another place to look if you ALSO use Hiera inside the module
  • Data is inside the manifest, so business logic is also inside params.pp
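
To make the include-vs-override bullets concrete, here’s a sketch of the two declaration styles against a params-pattern class (the password value is obviously made up):

```puppet
# Take every default from mysql::params:
include mysql::server

# ...OR declare the class with an override for a specific parameter
# (resource-like declarations can only happen once per node):
class { 'mysql::server':
  root_password => 'changeme',
}
```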

Hiera defaults pattern

When Hiera hit the scene, one of the first things people tried to do was to incorporate it into existing modules. The logic at that time was that you could keep all parameter defaults inside Hiera, rid yourself of the params class, and then just make Hiera calls out for your data. This pattern looks like this:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', 'default value'),
  $manage_config_file      = hiera('mysql::params::manage_config_file', 'default value'),
  $old_root_password       = hiera('mysql::params::old_root_password', 'default value'),
  ## Repeat the above pattern
) {

  ## Puppet goodness goes here
}

Pros:

  • All data is locked up in Hiera (and its multiple backends)
  • Default values can be provided if a Hiera lookup fails

Cons:

  • You need to have Hiera installed, enabled, and configured to use this pattern
  • All data, including non-business logic, is in Hiera
  • If you use the default value, data could either come from Hiera OR the default (multiple places to look when debugging)

Hybrid data model

This pattern is for those people who want the portability of the params.pp class combined with the power of Hiera. Because it’s a hybrid, there are multiple ways that people have set it up. Here’s a general example:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', $mysql::params::config_file),
  $manage_config_file      = hiera('mysql::params::manage_config_file', $mysql::params::manage_config_file),
  $old_root_password       = hiera('mysql::params::old_root_password', $mysql::params::old_root_password),
  ## Repeat the above pattern
) inherits mysql::params {

  ## Puppet goodness goes here
}

Pros:

  • Data is sought from Hiera first and then defaulted back to the params class parameter
  • Keep non-business logic (i.e. OS specific data) in the params class and business logic in Hiera
  • Added benefits of both models

Cons:

  • Where did the variable get set – Hiera or the params class? Debugging can be hard
  • Requires Hiera to be setup to use the module
  • If you fudge a variable name in Hiera, you get the params class default – see Con #1

Hiera data bindings in Puppet 3.x.x

In Puppet 3.0.0, there was a concept introduced called Data Bindings. This created a federated data model by automatically incorporating a Hiera lookup into parameter resolution. Previously, the order that Puppet would use to determine the value of a parameter was to first use a value passed with the parameterized class declaration syntax (i.e. the below):

parameterized class declaration
class { 'apache':
  package_name => 'httpd',
}

If a parameter was not passed with the parameterized class syntax (like the ‘package_name’ parameter above), Puppet would then look for a default value inside the class definition (i.e. the below):

parameter default in a class definition
class ntp (
  $ntpserver = 'default.ntpserver.org'
) {
  # Use $ntpserver in a file declaration...
}

If the value of ‘ntpserver’ wasn’t passed with a parameterized class declaration, then the value would be set to ‘default.ntpserver.org’, since that’s the default set in the above class definition.

Failing both of these conditions, Puppet would throw a parse error and say that it couldn’t determine a value for a class parameter.

As of Puppet 3.0.0, Puppet adds one more step between those two: if a parameter isn’t passed explicitly, Puppet performs an automatic Hiera lookup for the fully namespaced name of the class parameter (e.g. ntp::ntpserver) before falling back to the default value in the class definition.
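
For example, given the ntp class above, a single entry in a site-wide Hiera data file (the path and hostname here are assumed, not prescribed) satisfies the parameter without any explicit declaration:

```yaml
# /etc/puppet/hieradata/common.yaml (assuming a default hiera.yaml datadir)
---
ntp::ntpserver: 'ntp1.corp.example.com'
```

Now a simple include ntp resolves $ntpserver from Hiera; only if that lookup fails does Puppet fall back to ‘default.ntpserver.org’.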

Roles and Profiles

The roles and profiles pattern has been written about a number of times and is ALSO considered to be ‘a best practice’ when setting up your Puppet environment. What roles and profiles gets you is a ‘wrapper class’ that allows you to declare classes within this wrapper class:

profiles/manifests/wordpress.pp
class profiles::wordpress {
  # Data Lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')

  ## Create user
  group { 'wordpress':
    ensure => present,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => 'wordpress',
    password => $wordpress_user_password,
    home     => '/var/www/wordpress',
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
}

## Continue with declarations...

Notice that any variables that might have business-specific logic are set with Hiera lookups. These Hiera lookups do NOT have default values, which means the hiera() function will throw a parse error if a value is not returned. This is IDEAL because we WANT TO KNOW if a Hiera lookup fails – it means we failed to put the data in Hiera, and that should be corrected BEFORE a state that might contain invalid data is enforced with Puppet.

You then have a ‘Role’ wrapper class that simply includes many of the ‘Profile’ wrapper classes:

roles/manifests/frontend.pp
class roles::frontend {
  include profiles::mysql
  include profiles::apache
  include profiles::java
  include profiles::jboss
  # include more profiles...
}

The idea being that Profiles abstract all the technical bits that need to be declared to set up a piece of technology, and Roles will abstract all the business logic for what pieces of technology should be installed on a certain ‘class’ of machine. Basically, you can say that “all our frontend infrastructure should have mysql, apache, java, jboss…”. In this statement, the Role is ‘frontend infrastructure’ and the Profiles are ‘mysql, apache, java, jboss…’.

Pros:

  • Hiera data lookups are confined to a wrapper class OUTSIDE of the component modules (like mysql, apache, java, etc…)
  • Data lookups for parameters containing business logic are done with Hiera
  • Non-business specific data is pulled from the module (i.e. the params class)
  • Wrapper modules can be ‘included’ with the include function, helping to eliminate multiple class declarations using the parameterized class declaration syntax
  • Component modules are backward-compatible to Puppet 2.6 while wrapper modules still get to use a modern data lookup mechanism (Hiera)
  • Component modules do NOT contain any business specific logic, which means they’re portable

Cons:

  • Hiera must be setup to use the wrapper modules
  • Wrapper modules add another debug path for variable data
  • Wrapper modules add another layer of abstraction

Data in Puppet Modules

R.I. Pienaar (the original author of MCollective, Hiera, and much more) published a blog post recently on implementing a folder for Puppet modules that Hiera can traverse when it does data lookups. This construct isn’t new – there was a feature request for this behavior filed in October of 2012 with a subsequent pull request that implemented this functionality (both are worth reading for further information). The pull request didn’t get merged, and so R.I. implemented the functionality inside a module on the Puppet Forge. In a nutshell, it’s a hiera.yaml configuration file INSIDE THE MODULE that implements a module-specific hierarchy, and a ‘data’ folder (also inside the module) that allows for individual YAML files that Hiera could read. This hierarchy is consulted AFTER the site-specific hiera.yaml file is read (i.e. /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml), and the in-module data files are consulted AFTER the site-specific Hiera data files are read (normally found in either /etc/puppet/hieradata or /etc/puppetlabs/puppet/hieradata).

The argument here is that there’s a data store for SITE-SPECIFIC Hiera data that should be kept outside of modules, but there’s not a MODULE-SPECIFIC data store that Hiera can use. The argument isn’t whether data that should be shared with other people belongs inside a site-specific Hiera datastore (protip: it doesn’t. Data that’s not business-specific should be shared with others and kept inside the module), the argument is that it shouldn’t be locked up inside the DSL where the barrier-to-entry is learning Puppet’s DSL syntax. Whereas /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml sets up the hierarchy for all your site-specific data, there’s no per-module hiera.yaml file for all module-specific data, and there’s no place to put module-specific Hiera data.

But module-specific data goes inside the params class and business-specific data goes inside Hiera, right?

Sure, but for some people the Puppet DSL is a barrier. The argument is that there should be a lower barrier to entry to contribute parameter data to Puppet that doesn’t require you to learn the syntax of if/case/selector statements in the Puppet DSL. There’s also the argument that if you want to add support for an operatingsystem to your module, you have to modify the params class file and add another entry to the if/case/selector statement – wouldn’t it be easier to just add another YAML file into a data folder that doesn’t affect existing datafiles?
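
For example, adding Debian support to a module using this pattern could be as simple as dropping in one new data file – no existing conditional logic gets touched (the file name follows the $osfamily hierarchy; the values here are hypothetical):

```yaml
# mysql/data/Debian.yaml -- a new, self-contained data file
---
mysql::config_file: '/etc/mysql/my.cnf'
mysql::manage_config_file: true
```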

Great, ANOTHER hierarchy to traverse for data – that’s going to get confusing

Well, think about it right now – most EVERY params class of EVERY module (if it supports multiple operatingsystems) does some sort of conditional logic to determine values for parameters on a per-OS basis. That’s something that you need to traverse. And many modules use different conditional data to determine what parameters to use. Look at the mysql params class example above – it not only splits on $osfamily, but it also checks specific operatingsystems. That’s a conditional inside a conditional. You’re TRAVERSING conditional data right now to find a value – the only difference is that this method doesn’t use the DSL; it uses Hiera and YAML.
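
To see that the two traversals are equivalent, here’s a toy Ruby sketch (the data, keys, and hierarchy are invented for illustration) of a Hiera-style first-match lookup over per-level YAML data:

```ruby
require 'yaml'

# Hypothetical per-level data files, already parsed from YAML
data = {
  'RedHat' => YAML.load("mysql::config_file: '/etc/my.cnf'"),
  'common' => YAML.load("mysql::config_file: '/etc/mysql/my.cnf'\nmysql::root_group: 'root'"),
}
hierarchy = ['RedHat', 'common']  # most-specific level first; first match wins

def lookup(key, hierarchy, data)
  hierarchy.each do |level|
    return data[level][key] if data[level] && data[level].key?(key)
  end
  nil  # mirrors a failed Hiera lookup
end

puts lookup('mysql::config_file', hierarchy, data)  # => /etc/my.cnf
puts lookup('mysql::root_group', hierarchy, data)   # => root
```

Swap the hash for a case statement on $osfamily and you have a params class; the traversal is the same, only the notation changes.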

Sure, but this is outside of Puppet and you’re losing visibility inside Puppet with your data

You’re already doing that if you’re using the params class. In this case, visibility is moved to YAML files instead of separate Puppet classes.

Setting it up

You will first need to install R.I.’s module from the Puppet Forge. As of this writing, it’s version 0.0.1, so ensure you have the most recent version using the puppet module tool:

[root@linux modules]# puppet module install ripienaar/module_data
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└── ripienaar-module_data (v0.0.1)

Next, you’ll need to set up a module to use the data-in-modules pattern. Take a look at the tree of a sample module:

[root@linux modules]# tree mysql/
mysql/
├── data
│   ├── hiera.yaml
│   └── RedHat.yaml
└── manifests
    └── init.pp

I created a sample mysql module based on the examples above. All of the module’s Hiera data (including the module-specific hiera.yaml file) goes in the data folder. This module should be placed in Puppet’s modulepath – and if you don’t know where Puppet’s modulepath is set, run the puppet config face to determine that:

[root@linux modules]# puppet config print modulepath
/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules

In my case, I’m putting the module in /etc/puppetlabs/puppet/modules (since I’m running Puppet Enterprise). Here’s the hiera.yaml file from the sample mysql module:

mysql/data/hiera.yaml
:hierarchy:
  - "%{::osfamily}"

I’ve also included a YAML file for the $osfamily of RedHat:

mysql/data/RedHat.yaml
---
mysql::config_file: '/path/from/data_in_modules'
mysql::manage_config_file: true
mysql::old_root_password: 'password_from_data_in_modules'

Finally, here’s what the mysql class definition looks like from manifests/init.pp:

mysql/manifests/init.pp
class mysql (
  $config_file        = 'module_default',
  $manage_config_file = 'module_default',
  $old_root_password  = 'module_default'
) {
  notify { "The value of config_file: ${config_file}": }
  notify { "The value of manage_config_file: ${manage_config_file}": }
  notify { "The value of old_root_password: ${old_root_password}": }
}

Everything should be set up to notify the value of a couple of parameters. Now, to test it out…

Testing data-in-modules

Let’s include the mysql class with puppet apply to see where it’s looking for data:

[root@linux modules]# puppet apply -e 'include mysql'
Notice: The value of config_file: /path/from/data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of config_file: /path/from/data_in_modules]/message: defined 'message' as 'The value of config_file: /path/from/data_in_modules'
Notice: The value of manage_config_file: true
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: true]/message: defined 'message' as 'The value of manage_config_file: true'
Notice: The value of old_root_password: password_from_data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: password_from_data_in_modules]/message: defined 'message' as 'The value of old_root_password: password_from_data_in_modules'
Notice: Finished catalog run in 0.62 seconds

Since I’m running on an operatingsystem whose family is ‘RedHat’ (i.e. CentOS), you can see that the values of all the parameters were pulled from the Hiera data files inside the module. Let’s temporarily change the $osfamily fact value and see what happens:

[root@linux modules]# FACTER_osfamily=Debian puppet apply -e 'include mysql'
Notice: The value of config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of config_file: module_default]/message: defined 'message' as 'The value of config_file: module_default'
Notice: The value of old_root_password: module_default
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: module_default]/message: defined 'message' as 'The value of old_root_password: module_default'
Notice: The value of manage_config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: module_default]/message: defined 'message' as 'The value of manage_config_file: module_default'
Notice: Finished catalog run in 0.51 seconds

This time, when I specified a value of Debian for $osfamily, the parameter values were pulled from the declaration in the mysql class definition (i.e. from inside mysql/manifests/init.pp).

Testing outside of Puppet

One of the big pros of Hiera is that it comes with the hiera binary that can be run from the command line to test values. This works just fine for site-specific data governed by the central hiera.yaml file (usually in /etc/puppet or /etc/puppetlabs/puppet), but the data-in-modules pattern relies on a Puppet indirector to point at the current module’s data folder, and so (as of right now) there’s no simple way to run the hiera binary to pull data out of modules WITHOUT running Puppet. That’s not a dealbreaker, and it doesn’t stop anybody from hacking up something that WILL look inside modules for data, but as of right now such a tool doesn’t exist. It does make debugging values that come out of modules a bit more difficult.
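Nothing stops you from sketching that hack yourself for debugging, though. Here’s a hedged, minimal Ruby stand-in (NOT the real module_data backend; the function name lookup_module_data is made up) that reads a module’s data/hiera.yaml and walks its hierarchy the same way the pattern does:

```ruby
require 'yaml'

# Hypothetical debugging helper: resolve a key from a module's data/
# folder using the module's own hiera.yaml hierarchy. The real lookup
# happens through Puppet's data bindings; this is just for poking at
# module data without a Puppet run.
def lookup_module_data(module_path, key, facts)
  raw    = File.read(File.join(module_path, 'data', 'hiera.yaml'))
  config = YAML.safe_load(raw, permitted_classes: [Symbol])
  config[:hierarchy].each do |level|
    # Interpolate %{::osfamily}-style variables from the supplied facts
    name = level.gsub(/%\{(?:::)?(\w+)\}/) { facts[Regexp.last_match(1)] }
    file = File.join(module_path, 'data', "#{name}.yaml")
    next unless File.exist?(file)
    data = YAML.safe_load(File.read(file)) || {}
    return data[key] if data.key?(key)
  end
  nil # fall through: let the class parameter default win
end
```

Given the sample mysql module above, `lookup_module_data('mysql', 'mysql::config_file', 'osfamily' => 'RedHat')` would return the value from RedHat.yaml, and an unknown osfamily would return nil.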

The scorecard for data-in-modules

Pros:

  • Parameters are defined in YAML and not Puppet DSL (i.e. you only need to know YAML and not the Puppet DSL)
  • Adding parameters is as simple as adding another YAML file to the module
  • Module authors provide module data that can be read by Puppet 3.x.x Hiera data bindings

Cons:

  • Must be using Puppet 3.0.0 or higher
  • Additional hierarchy and additional Hiera data file to consult when debugging
  • Not (currently) an easy/straightforward way to use the hiera binary to test values
  • Currently depends on a Puppet Forge module being installed on your system

What are you trying to say?

I am ALL ABOUT code portability, re-usability, and not building 500 apache modules. Ever since people have been building modules, they’ve been putting too much data inside modules (to the point where they can’t share them with anyone else). I can’t tell you how many times I’ve heard “We have a module for that, but I can’t share it because it has all our company-specific data in it.”

Conversely, I’ve also seen organizations put EVERYTHING in their site-specific Hiera datastore because “that’s the place for Puppet data.” They usually end up with 15+ levels in their Hiera hierarchies because they’re doing things like this:

hiera.yaml
---
:backends:
  - yaml

:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - "%{osfamily}"
  - "%{osfamily}/%{operatingsystem}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_major}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_minor}"
  # Repeat until you have 15 levels of WTF

This leads us back again to “What does and DOESN’T go in Hiera?” I usually say the following:

Data in site-specific Hiera datastore

  • Business-specific data (i.e. internal NTP server, VIP address, per-environment java application versions, etc…)
  • Sensitive data
  • Data that you don’t want to share with anyone else

Data that does NOT go in the site-specific Hiera datastore

  • OS-specific data
  • Data that EVERYONE ELSE who uses this module will need to know (paths to config files, package names, etc…)

Basically, if I ask you if I can publish your module to the Puppet Forge, and you object because it has business-specific or sensitive data in it, then you probably need to pull that data out of the module and put it in Hiera.

The recommendations that I give when I go on-site with Puppet users are the following:

  • Use Roles/Profiles to create wrapper-classes for class declaration
  • Do ALL Hiera lookups for site-specific data inside your ‘Profile’ wrapper classes
  • All module-specific data (like paths to config files, names of packages to install, etc…) should be kept in the module in the params class
  • All ‘Role’ wrapper classes should just include ‘Profile’ wrapper classes – nothing else
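Sketched out, those recommendations look something like this (the class names and the ‘ntp_server’ key are made up for illustration, not from a real site):

```puppet
# Hypothetical profile: ALL site-specific Hiera lookups happen here.
class profile::webserver {
  $ntp_server = hiera('ntp_server')   # business-specific data from Hiera
  class { 'ntp':
    server => $ntp_server,
  }
  include apache                       # module keeps its own OS data in params
}

# Hypothetical role: nothing but profile includes.
class role::appserver {
  include profile::webserver
}
```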

But what about Data in Modules?

I went through all the trouble of writing up the Data in Modules pattern, but I didn’t recommend or even MENTION it in the previous section. The reason is NOT because I don’t believe in it (I actually think the future will be data outside of the DSL inside a Puppet module); the reason is that it’s not YET in Puppet’s core and hasn’t YET been widely tested. If you’re an existing Puppet user who’s been looking for a way to split data out of the DSL, here is your opportunity. Use the pattern and PLEASE report back on what you like and don’t like about it. The functionality is in a module, so it’s easy to tweak. If you’re new to Puppet and are comfortable with the DSL, then the params class exists and is available to you.

To voice your opinion or to follow the progress of data in modules, follow or comment on this Puppet ticket.

Update

R.I. posted another article on the problem with params.pp that is worth reading. He gives compelling reasons on why he built Hiera, why params.pp WORKS, but also why he believes it’s not the future of Puppet. R.I. goes even further to explain that it’s not necessarily the Puppet DSL that is the barrier to entry, it’s that this sort of data belongs in a file for config data and not INSIDE THE CODE itself (i.e. inside the Puppet DSL). Providing data inside modules gives module authors a way to provide this configuration data in files that AREN’T the Puppet DSL (i.e. not inside the code).

Who Abstracted My Ruby?

Previously, on Lost, I said a lot of words about Puppet Types; you should totally check it out. In this second installment, you’re going to find out how to actually throw pure Ruby at Puppet in a way that makes you feel accomplished. And useful. And elitist. Well, possibly just elitist. Either way, read on – there’s much thought-leadership to be done…

In the last post, we learned that Types will essentially dictate the attributes that you’ll be passing in your resource declaration using the DSL. In the simplest and crudest explanation I could muster, types model how your declaration will look in the manifest. Providers are where the actual IMPLEMENTATION happens. If you’ve ever wondered how this:

package { 'httpd':
  ensure => installed,
}

eventually gets turned into this:

yum install -e 0 -d 0 -y httpd

your answer would be “It’s in the provider file”.

Dirty black magic

I’ve seen people do the craziest shit imaginable in the Puppet DSL simply because they’re:

  • Unsure how types and providers work
  • Afraid of Ruby
  • Confused by error messages
  • Afraid to ask for help

Sometimes you have a problem that can only be solved by interacting with data that’s returned by a binary (using some binary to get a value, and then using that binary to set a value, and so on…). I see people writing defined resource types with a SHIT TON of exec statements and conditional logic to model this data when a type and provider would not only BETTER model the problem but would also be shareable and re-useable by other folk. The issue is that while the DSL is REALLY easy to get started with, types and providers still feel like dirty black magic.

The reason is because they’re dirty black magic.

Hopefully, I can help get you over the hump and onto a working implementation. Let’s take a problem I had last week:

Do this if that, and then be done

I was working with a group who wanted to set a list of domains that would bypass their web proxy for a specific network interface on an OS X workstation. It sounds so simple, because it was. Due to the amount of time I had on-site, I wrote a class with some nasty exec statements, a couple of facts, and some conditional logic because that’s what you do when you’re in a hurry…but it doesn’t make it right. When I left, I hacked up a type and provider, and it’s a GREAT example because you probably have a similar problem. Let’s look at the information we have:

The list of network interfaces:

└▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

Getting the list of bypass domains for an interface:

└▷ networksetup -getproxybypassdomains Ethernet
www.garylarizza.com
*.corp.net
10.13.1.3/24

The message displayed when no domains are set for an interface:

└▷ networksetup -getproxybypassdomains FireWire
There aren't any bypass domains set on FireWire.

Setting the list of bypass domains for an interface:

└▷ networksetup -setproxybypassdomains Ethernet '*.corp.net' '10.13.1.3/24' 'www.garylarizza.com'

Perfect – all of that is done with a single binary, and it’s pretty straightforward. Let’s look at the type I ended up creating for this problem:

lib/puppet/type/mac_proxy_bypassdomains.rb
Puppet::Type.newtype(:mac_proxy_bypassdomains) do
  desc "Puppet type that models bypass domains for a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
  end

  newproperty(:domains, :array_matching => :all) do
    desc "Domains which should bypass the proxy"
    def insync?(is)
      is.sort == should.sort
    end
  end
end

The type uses a namevar parameter called ‘name’, which is the name of the network interface. This means that we can set one list of bypass domains for every network interface. There’s a single property, ‘domains’, that accepts an array of domains that should bypass the proxy for the network interface. I’ve overridden the insync? method for the domains property to sort the array values on both ends – this means that the ORDER of the domains doesn’t matter; I only care that the domains specified exist on the system. Finally, the type is ensurable (which means that we can create a list of domains and remove/destroy the list of domains for a network interface).
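The effect of that insync? override is easy to see in plain Ruby: sorting both the ‘is’ and ‘should’ arrays before comparing makes the comparison order-insensitive.

```ruby
# Same comparison trick as the insync? override above: sort both sides
# so that element ORDER no longer matters, only membership.
is     = ['*.corp.net', 'www.garylarizza.com', '10.13.1.3/24']
should = ['10.13.1.3/24', '*.corp.net', 'www.garylarizza.com']

puts is == should            # false: same domains, different order
puts is.sort == should.sort  # true: order no longer matters
```

Without the override, Puppet would treat a reordered-but-identical domain list as out-of-sync and needlessly reset it on every run.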

Setup the provider

Okay, so we’ve defined the problem, seen how to interact with the system to get us the data that we need, setup a type to model the data, and now the last thing left to do is to wire up the provider to make the binary calls we need and return the data we want.

Typos are not your friend.

The first thing you will encounter is “Puppet’s predictable naming pattern” that is used by the Puppet autoloader. Typos are not fun, and omitting a single letter in either the filename or the provider name will render your provider (emotionally) unavailable to Puppet. Our type is called ‘mac_proxy_bypassdomains’, as types are generally named along the lines of ‘what does this data model?’ The provider name is generally the name of the underlying technology that’s doing the modeling. For the package type, the providers are named after the package management systems (e.g. yum, apt, pacman, zypper, pip), for the file type, the providers are loosely named for the operatingsystem kernel type on which files are to be created (e.g. windows, posix). In our example, I simply chose to name the provider ‘ruby’ because, as a Puppet Labs employee, I TOO suck at naming things.

Here’s a tree of my module to understand how the type and provider files are to be laid out:

Module tree
├── Modulefile
├── README.markdown
└── lib
    └── puppet
        ├── provider
        │   ├── mac_proxy_bypassdomains
        │   │   └── ruby.rb
        └── type
            └── mac_proxy_bypassdomains.rb

As you can see from above, the name of both the type and provider must EXACTLY match the filename of their corresponding files. Also, the provider file lives in a directory named after the type. There are MANY things that can be typoed here (filenames, foldernames, type/provider names in their files), so be absolutely sure that you’ve named your files correctly.

The reason for all this naming bullshit is because of the way Puppet syncs down plugin files (coincidentally, with a process known as Pluginsync). Everything in the lib directory in a Puppet module is going to get synced down to your nodes inside the vardir directory on the node itself. The vardir is a known library path to Puppet, and all files in the vardir are treated as if they had lived in Puppet’s source code (in the same relative paths). Because the Puppet source code has all type files in the lib/puppet/type directory, all CUSTOM types must go in the module’s lib/puppet/type directory for conformity. This is repeated for EVERY custom Puppet/Facter plugin (custom facts, custom functions, etc…).

More scaffolding

Let’s layout the shell of our provider, first, to ensure that we haven’t typoed anything. Here’s the provider declaration:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  # Provider work goes here
end

Note that the name of the type and the name of the provider are symbolized (i.e. they’re prepended with a colon). Like I mentioned above, they must be spelled EXACTLY or Puppet will complain very loudly. You may see variants on that declaration line because there are multiple ways in Ruby to extend a class object. The method I’ve listed above is the ‘generally accepted best-practice’, which is to say it’s the way we’re doing it this month.

Congrats! You have THE SHELL of a provider that has yet to do a single goddamn thing! Technically, you’re further than about 90% of other Puppet users at this point! Let’s go the additional 20% (since we’re basing this on a management metric of 110%) by wiring up the methods and making the damn thing work!

Are you (en)sure about this?

We’ve explained before that a type is ‘ensurable’ when you can check for its existence on a system, create it when it doesn’t exist (and it SHOULD exist), and destroy it when it does exist (and it SHOULDN’T exist). The bare minimum number of methods necessary to make a type ensurable is three, and they’re called exists?, create, and destroy.
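That three-method contract can be sketched as a tiny decision table. This is an illustration of the flow, not Puppet’s actual internals (the action_for helper is made up):

```ruby
# Hedged sketch of how Puppet decides which ensurable method to call:
# exists? is always consulted first, then create/destroy as needed.
def action_for(ensure_value, exists)
  if ensure_value == :present && !exists
    :create    # SHOULD exist, doesn't -> create it
  elsif ensure_value == :absent && exists
    :destroy   # SHOULDN'T exist, does -> destroy it
  else
    :nothing   # already in the desired state
  end
end

puts action_for(:present, false)  # create
puts action_for(:absent,  true)   # destroy
puts action_for(:present, true)   # nothing
```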

Method: exists?

The exists? method is a predicate method – that means it should either return the boolean true or false value based on whether the bypass domain list exists. Puppet will always call the exists? provider method to determine if that ‘thing’ (in this case, ‘thing’ means ‘a list of domains to bypass for a specific network interface’) exists before calling any other methods. How do we know if this thing exists? Like I showed before, you need to run the networksetup -getproxybypassdomains command and pass the interface name. If it returns ‘There aren’t any bypass domains set on (interface name)’, then the list doesn’t exist. Let’s do some binary execution…

Calling binaries from Puppet

Puppet provides some helper syntax around basic actions that most providers perform. MOST providers are going to need to call out to an external binary (e.g. yum, apt, etc…) at some point, and so Puppet allows you to create your own method JUST for a system binary. The commands method abstracts all the dirtyness of making a method for each system binary you want to call. The way you use the commands method is like so:

commands :networksetup => 'networksetup'

The commands method accepts a hash whose key must be a symbolized name. The CONVENTION is to use a symbolized name that matches the binary name, but it’s not REQUIRED to do so. The value for that symbolized key MUST be the binary name. Note that I’ve not passed a full path to the binary. Why? Well, Puppet will automatically do a path lookup for that binary and store its full path for use when the binary is invoked. We don’t REQUIRE you to pass the full path because sometimes the same binary exists in different locations for different operatingsystems. Instead of creating a provider for each OS you manage with Puppet, we abstract away the path stuff. You CAN still pass a full path as a value, but if you elect to do that and the binary doesn’t exist at that path, Puppet will disqualify the provider and you’ll be quite upset.

In the event that Puppet CANNOT find this binary, it will disqualify the entire provider, and you’ll get a message saying as much in the debug output of your Puppet run. Because of that, the commands method is a good way to confine your provider to a specific system or class of system.

When the commands method is successfully invoked, you will get a new provider method named after the SYMBOLIZED key, and not necessarily the binary name (unless you made them the same). After the above command is evaluated, Puppet will now have a networksetup() method in our provider. The argument to the networksetup method should be an array of arguments that are passed to the binary. It’s C-style, so each element is going to be individually quoted. You can run into issues here if you pass values containing quotes as part of your argument array. Read that again – quoting your values is totally acceptable (e.g. [‘foo’, ‘bar’]), but passing a value that contains quotes can potentially cause problems (e.g. [“‘foo’”, “‘bar’”]).
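You can see the quoting gotcha with any binary. Here’s a quick Ruby illustration using open3 and echo (standing in for a provider’s command call, which likewise executes the binary without a shell):

```ruby
require 'open3'

# With C-style (no-shell) execution, each array element is one argv
# entry, so quote characters inside a value reach the binary literally
# instead of being stripped by a shell.
plain,  = Open3.capture2('echo', 'foo')
quoted, = Open3.capture2('echo', "'foo'")

puts plain.strip   # foo
puts quoted.strip  # 'foo'
```

The second call passes the literal five characters ‘foo’ (quotes included) to echo, which is exactly how extra quotes end up embedded in values your provider sets on the system.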

You’re probably thinking “Why the hell would I go through this trouble when I can use the %x{} syntax in ruby to execute a shell command?!” And to that I would say “Quit yelling at me” and also “Because: testing.” When you write spec tests for your provider (which will be covered in a later blog post, since it’s its OWN path of WTF), you’re going to need to mock out calls to the system during your tests (i.e. sometimes you may be running the tests on a system that doesn’t have the binary you’re meant to be calling in your provider. You don’t want the tests to fail due to the absence of a binary file). The %x{} construct in Ruby is hard to mock out, but a method of our provider is a relatively easy thing to mock out. Also – see the path problem above. We don’t STOP you from doing %x{} in your code (it will still totally work), but we give you a couple of good reasons to NOT do it.

Objects are a provider’s best friend

Within your provider, you’re going to be doing lots of system calls and data manipulation. Often we’re asked whether you do that ugliness inside the main methods (i.e. inside the exists? method directly), or if you create a helper method for some of this data manipulation. The answer I usually give is that you should probably create a helper method if:

  • The code is going to be called more than once
  • The code does something that would be tricky to test (like reading from a file)
  • Complexity would be reduced by creating a helper method

The act of getting a list of domains for a specific interface is definitely going to be utilized in more than one place in our provider (we’ll use it in the exists? method as well as in a ‘getter’ method for the domains property). Also, you could argue that it might be tricky to test since it’s going to be a binary call that’s going to return some data. Because of this, let’s create a helper method that returns a list of domains for a specific interface:

def get_proxy_bypass_domains(int)
  begin
    output = networksetup(['-getproxybypassdomains', int])
  rescue Puppet::ExecutionFailure => e
    Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
    return nil
  end
  domains = output.split("\n").sort
  return nil if domains.first =~ /There aren\'t any bypass domains set/
  domains
end

Ruby convention is to use underscores (i.e. versus camelCase or hyphens) in method names. You want to give your methods very descriptive names based on what it is that they DO. In this case, get_proxy_bypass_domains seems adequately descriptive. Also, you should err on the side of readability when you’re writing code. You can get pretty creative with Ruby metaprogramming, but that can quickly become hard to follow (and then you’re just a dick). Finally, error-handling is a good thing. If you’re going to do any error-handling, though, be very specific about the errors you catch/rescue. When you have a rescue block, make sure you catch a specific exception class (in the case above, we’re catching a Puppet::ExecutionFailure – which means the binary is returning a non-zero exit code).
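The difference between a targeted rescue and a blanket one deserves a tiny sketch. Here Puppet::ExecutionFailure is stubbed with a plain ExecutionFailure class so the snippet runs standalone:

```ruby
# Stub standing in for Puppet::ExecutionFailure (raised on a non-zero
# exit code from a commands-generated method).
class ExecutionFailure < StandardError; end

def domains_or_nil
  raise ExecutionFailure, 'networksetup returned 1'
rescue ExecutionFailure => e
  # Only the error class we EXPECT is rescued; a genuine bug (say, a
  # NoMethodError from a typo) would still blow up loudly instead of
  # being silently swallowed by a bare rescue.
  nil
end

puts domains_or_nil.inspect  # nil
```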

The code above will return an array containing all the domains, or it will return nil if domains aren’t found or the networksetup binary had an issue.

Using the helper method above, here’s what the final exists? method looks like:

def exists?
  get_proxy_bypass_domains(resource[:name]) != nil
end

All provider methods have the ability to access the ‘should’ values for the resource (and by that I mean the values that are set in the Puppet manifest on the Puppet master server, or locally if you’re using puppet apply). Those values live in the resource method, which responds like a hash. In the code above, resource[:name] will return the network interface name (e.g. Ethernet, FireWire, etc…) that was specified in the Puppet manifest. The exists? method will return true if a list of domains exists for an interface, or false if a list of domains does not exist (i.e. get_proxy_bypass_domains returns nil).

Method: create

The create method is called when exists? returns false and a resource has an ensure value set to present. Because of this, you don’t need to call the exists? method explicitly in create – it’s already been evaluated. Remember from above that the -setproxybypassdomains argument to the networksetup binary will set a domain list, so the create method is going to be very short-and-sweet:

def create
  networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
end

In the end, the create method will call the networksetup binary with the -setproxybypassdomains argument, pass the interface name (from resource[:name]) and pass an array of domain values (which comes from resource[:domains]). That’s it; it’s done!

Method: destroy

The destroy method is easier than the create method:

def destroy
  networksetup(['-setproxybypassdomains', nil])
end

Here, we’re calling networksetup with the -setproxybypassdomains argument and passing nothing else. This will initialize the list and set it to be empty.

Synchronizing properties

Getter method: domains

At this point our type is ensurable, which means we can create and destroy resources. What we CAN’T do, however, is change the value of any properties that are out-of-sync. A property is out-of-sync when the value discovered by Puppet on the node differs from the value in the catalog (i.e. set by the Puppet manifest using the DSL on the Puppet master). Just like exists? is called to determine if a resource exists, Puppet needs a way to get the current value for a property on a node. The method that gets this value is called the ‘getter method’ for a property, and its name must match the name of the property. Because we have a property called domains, the provider must have a domains method that returns a value (in this case, an array of domains to be bypassed by the proxy). We’ve already written a helper method that does this work for us, so the domains getter method is pretty easy:

def domains
  get_proxy_bypass_domains(resource[:name])
end

Tada! Just call the helper method and pass the interface name. Boom – instant array of values. The getter method will return the ‘is’ value, because that’s what the value IS (currently on the node). Get it? Anyone? The IS value is the other side of the coin to the ‘should’ value (which comes from the Puppet manifest), because that’s what the value SHOULD be on the node.

Setter method: domains=

If the getter method (e.g. domains) returns a value that doesn’t match the value in the catalog, then Puppet changes the value on the node and sets it to the value in the catalog. It does this by calling the ‘setter’ method for the property, which is the name of the property and the equals ( = ) sign. In this case, the setter method for the domains property must be called domains=. It looks like this:

def domains=(value)
  networksetup(['-setproxybypassdomains', resource[:name], value])
end

Setter methods are always passed a single argument – the ‘should’ value of the property. In our example, we’re calling the networksetup binary with the -setproxybypassdomains argument, passing the name of the interface, and then passing the ‘should’ value – or the array of domains. It’s easy, it’s one line, and I love it when a plan comes together.

Putting the whole damn thing together

I’ve broken down the provider line by line, but here’s the entire file:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  commands :networksetup => 'networksetup'

  def get_proxy_bypass_domains(int)
    begin
      output = networksetup(['-getproxybypassdomains', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
      return nil
    end
    domains = output.split("\n").sort
    return nil if domains.first =~ /There aren\'t any bypass domains set/
    domains
  end

  def exists?
    get_proxy_bypass_domains(resource[:name]) != nil
  end

  def destroy
    networksetup(['-setproxybypassdomains', nil])
  end

  def create
    networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
  end

  def domains
    get_proxy_bypass_domains(resource[:name])
  end

  def domains=(value)
    networksetup(['-setproxybypassdomains', resource[:name], value])
  end
end

Testing the type/provider

And that’s it, we’re done! The last thing to do is to test it out. You can test your provider in one of two ways: the first is to add the module to the modulepath of your Puppet master and include it that way; the second is to test it locally by setting the $RUBYLIB environment variable to point to the lib directory of your module (the preferred method, since it won’t serve untested code out to all of your nodes). Because this module is on my system at /users/glarizza/src/puppet-mac_proxy, here’s how my $RUBYLIB is set:

export RUBYLIB=/users/glarizza/src/puppet-mac_proxy/lib

Next, we need to create a resource declaration to try and set a couple of bypass domains. I’ll create a tests directory and simple test file in tests/mac_proxy_bypassdomains.pp:

tests/mac_proxy_bypassdomains.pp
mac_proxy_bypassdomains { 'Ethernet':
  ensure  => 'present',
  domains => ['www.garylarizza.com','*.puppetlabs.com','10.13.1.3/24'],
}

Finally, let’s run Puppet and test it out:

└▷ puppet apply ~/src/puppet-mac_proxy/tests/mac_proxy_bypassdomains.pp
Notice: Compiled catalog for satori.local in environment production in 0.06 seconds
Notice: /Stage[main]//Mac_proxy_bypassdomains[Ethernet]/domains: domains changed [] to 'www.garylarizza.com *.puppetlabs.com 10.13.1.3/24'
Notice: Finished catalog run in 3.47 seconds

NOTE: If you run this as a local user, you will be prompted by OS X to enter an administrative password for a change. Since Puppet will ultimately be run as root on OS X when we’re NOT testing out code, this shouldn’t be required during a normal Puppet run. To test this out (i.e. that you don’t always have to enter an admin password in a pop-up window), you’ll need to sudo -s to change to root, set the $RUBYLIB as the root user, and then run Puppet again.

And that’s it – looks like our code worked! To check and make sure it will notice a change, open System Preferences, then the Network pane, click on the Ethernet interface, then the Advanced button, then the Proxies tab, and finally note the ‘Bypass proxy settings…’ text box at the bottom of the screen (now do you see why we automate this shit?!). Make a change to the entries in there and run Puppet again – it should correct it for you.

Wait…so that was it? Really? We’re done?

Yeah, that was a whole type and provider. Granted, it has only one property and it’s not too complicated, but that’s the point. We’ve still got some latent bugs (the network interface passed must be capitalized exactly like OS X expects it, we could do some better error handling, etc…), and the type doesn’t work with puppet resource (yet), but we’ll handle all of these things in the next blog post (or two…or three).

Until then, take this time to crack open a type and a provider for something that’s been pissing you off and FIX it! Better yet, push it up to Github, tweet about it, and post it up on The Forge so the rest of the community can use it!

Like always, feel free to comment, tweet me (@glarizza), email me (gary AT puppetlabs DOT com), or use the social media platform of choice to get a hold of me (Snapchats may or may not get a response. Maybe.) Cheers!

Fun With Puppet Providers - Part 1 of Whatever

I don’t know why I write blog posts – everybody in open-source software knows that the code IS the documentation. If you’ve ever tried to write a Puppet type/provider, you know this fact better than ANYONE. To this day, when someone asks me for the definitive source on this activity I usually refer them first to Nan Liu and Dan Bode’s awesome Types and Providers book (which REALLY is a fair bit of quality information), and THEN to the source code for Puppet. Everything else falls in-between those sources (sadly).

As someone who truly came from knowing absolute fuckall about Ruby and only marginally more than that about Puppet, I’ve walked through the valley of the shadow of self.instances and have survived to tell the tale. That’s what this post is about – hopefully some GOOD information if you want to start writing your own Puppet type and provider. I also wrote this because this knowledge has been passed down from Puppet employee to Puppet employee, and I wanted to break the priesthood being held on type and provider magic. If you don’t hear from me after tomorrow, well, then you know what happened…

Because 20 execs in a defined type…

What would drive someone to write a custom type and provider for Puppet anyhow? After all, you can do ANYTHING IMAGINABLE in the Puppet DSL*! After drawing back my sarcasm a bit, let me explain where the Puppet DSL tends to fall over and the idea of a custom type and provider starts becoming more than just an incredibly vivid dream:

  • You have more than a couple of exec statements in a single class/defined type that have multiple conditional attributes like ‘onlyif’ and/or ‘unless’.
  • You need to use pure Ruby to manipulate data and parse it through a system binary
  • Your defined type has more conditional logic than your prenuptial agreement
  • Any combination of similar arguments related to the above

If the above sounds familiar to you, then you’re probably ready to build your own custom Puppet type and provider. Do note that custom types and providers are written in Ruby and not the Puppet DSL. This can initially feel very scary, but get over it (there are much scarier things coming).

* Just because you can doesn’t mean you don’t, in fact, suck.

I’m not your Type

This blog post is going to focus on types and type-interaction, while later posts will focus on providers and ultimately dirty provider tricks to win friends and influence others. Type and provider interaction can be totally daunting for newcomers, let ALONE just naming files correctly due to Puppet’s predictable (note: anytime I write the word “predictable”, just substitute the phrase “annoying pain in the ass”) naming pattern. Let’s break it down a bit for you – somebody cue Dre…

(NOTE: I’m going to ASSUME you understand the fundamentals of a Puppet run already. If you’re pretty hazy on that concept, check out docs.puppetlabs.com for more information)

Types are concerned about your looks

The type file defines all the properties and parameters that can be used by your new custom resource. Think of the type file like the opening stanza to a new Puppet class – we’re describing all the tweakable knobs and buttons to the new thing we’re creating. The type file also gives you some added validation abilities, which is very handy.

It’s important to understand that there is a BIG difference between a ‘property’ and a ‘parameter’ with regard to a type (even though they’re both assigned values identically in a resource declaration). Think of it this way: a property is something that can be inspected and changed by Puppet, while a parameter is just helper data that Puppet uses to do its job. A property would be something like a file’s mode. You can inspect a file and determine its mode, and you can even CHANGE a file’s mode on disk. The file resource type also has a parameter called ‘backup’. Its sole job is to tell Puppet whether to back up the file to the filebucket before making changes. This data is useful for Puppet during a run, but you can’t inspect a file on disk and know definitively whether Puppet is going to back it up or not (and it goes without saying that if you can’t determine this aspect about a file on disk just by inspecting it, then you also can’t CHANGE this aspect about a file on disk either). You’ll see later where the property/parameter distinction becomes very important.

Recently I built a type modeling the setting of proxy data for network interfaces on OS X, so we’ll use that as a demonstration of a type. It looks like the following:

lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

First note the type file’s path in the listing title above: lib/puppet/type/mac_web_proxy.rb. This path is relative to the module that you’re building, and it’s VERY important that it be named EXACTLY this way to appease Puppet’s predictable naming pattern. The name of the file directly correlates to the name of the type passed to the Puppet::Type.newtype() method.
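For reference, that layout inside the module looks like this (the module directory name here matches the puppet-mac_proxy repo path used earlier):

```
puppet-mac_proxy/
└── lib/
    └── puppet/
        └── type/
            └── mac_web_proxy.rb    # contains Puppet::Type.newtype(:mac_web_proxy)
```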

Next, let’s look at a sample parameter declaration – for starters, the ‘authenticated_password’ parameter from the type above. The newparam() method is called and the lone argument passed is the symbolized name of our parameter (i.e. it’s prepended with a colon). This parameter provides the password to use when setting up an authenticated web proxy on OS X. It’s a parameter because, as far as I know, there’s no way for me to query the system for this password (it’s obfuscated in the GUI and I’m not entirely certain where it’s stored on-disk). If there were a way for us to query this value from the system, then we could turn it into a property (since we could both ‘GET’ as well as ‘SET’ the value). As of right now, it exists as helper data for when I need to set up an authenticated proxy.

Having seen a parameter, let’s look at the ‘proxy_server’ property declared in the type file above. We’re able to both query the system for this value and change/set it using the networksetup binary, so it can be ‘synchronized’ (according to Puppet). Because of this, it must be a property.

Just enough validation

The second major function of the type file is to provide methods to validate property and parameter data that is being passed. There are two methods to validate this data, and one method that allows you to massage the data into an acceptable format (which is called ‘munging’).

validate()

The first method, named ‘validate’, is widely believed to be the only successfully-named method in the entire Puppet codebase. Validate accepts a block and allows you to perform free-form validation in any way you prefer. For example:

lib/puppet/type/user.rb
validate do |value|
  raise ArgumentError, "Passwords cannot include ':'" if value.is_a?(String) and value.include?(":")
end

This example, pulled straight from the Puppet codebase, will raise an error if a password contains a colon. In this case, we’re checking for a specific bad value and raising an error accordingly.
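If you want a feel for what that block does without loading Puppet, here’s the same check as a minimal plain-Ruby sketch (validate_password is a hypothetical helper name introduced for illustration; it is not part of Puppet’s API):

```ruby
# Hypothetical standalone version of the validate block above: reject any
# String value that contains a colon, pass everything else through.
def validate_password(value)
  if value.is_a?(String) && value.include?(':')
    raise ArgumentError, "Passwords cannot include ':'"
  end
  value
end

validate_password('secret')   # passes through untouched
```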

newvalues()

The second method, named ‘newvalues’, accepts a regex that property/parameter values need to match (if you’re one of the 8 people in the world that speak regex fluently), or a list of acceptable values. From the example above:

lib/puppet/type/mac_web_proxy.rb
  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
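As a quick plain-Ruby sanity check of what that proxy_port pattern accepts:

```ruby
# /^\d+$/ matches only strings composed entirely of digits, which is exactly
# the shape we want for a port number.
'8080' =~ /^\d+$/   # => 0   (match at position 0 – accepted)
'80a'  =~ /^\d+$/   # => nil (rejected)
```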

munge()

The final method, named ‘munge’, accepts a block like newvalues but allows you to convert an unacceptable value into an acceptable one. Again, this is from the example above:

lib/puppet/type/mac_web_proxy.rb
munge do |value|
  value.downcase
end

In this case, we want to ensure that the parameter value is lower case. Rather than throwing an error, we ‘munge’ the value into something more acceptable without alerting the user.
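Stripped of the Puppet DSL, the munge above is just a downcasing transform:

```ruby
# Plain-Ruby equivalent of the munge block: whatever the user typed,
# store it lowercased.
munge = ->(value) { value.downcase }

munge.call('Ethernet')  # => "ethernet"
```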

Important type considerations

You could write half a book just on how types work (and, again, check out the book referenced above which DOES just that), but there are a couple of final considerations that will prove helpful when developing your type.

Defaulting values

The defaultto method provides a default value should the user not provide one for your property/parameter. It’s a pretty simple construct, but it’s important to remember when you write spec tests for your type (which you ARE doing, right?) that there will ALWAYS be values for properties/parameters that utilize defaultto. Here’s a quick example:

Defaultto example
newparam(:enable_lacp) do
  defaultto :true
  newvalues(:true, :false)
end

Ensurable types

A resource is considered ‘ensurable’ when its presence can be verified (i.e. it exists on the system), it can be created when it doesn’t exist and it SHOULD, and it can be destroyed when it exists and it SHOULDN’T. The simplest way to tell Puppet that a resource type is ensurable is to call the ensurable method within the body of the type (i.e. outside of any property/parameter declarations). Doing this will automatically create an ‘ensure’ property that accepts values of ‘absent’ and ‘present’ that are automatically wired to the ‘exists?’, ‘create’ and ‘destroy’ methods of the provider (something I’ll write about in the next post). Optionally, you can choose to pass a block to the ensurable method and define acceptable property values as well as the methods of the provider that are to be called. That would look something like this:

lib/puppet/type/package.rb
ensurable do
  newvalue(:present) do
    provider.install
  end

  newvalue(:absent) do
    provider.uninstall
  end

  newvalue(:purged) do
    provider.purge
  end

  newvalue(:held) do
    provider.hold
  end
end

This means that instead of calling the create method to create a new resource that SHOULD exist (but doesn’t), Puppet is going to call the install method. Conversely, it will call the uninstall method to destroy a resource based on this type. The ensure property will also accept values of ‘purged’ and ‘held’ which will be wired up to the purge and hold methods respectively.
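One way to picture what that block wires up is a simple dispatch table mapping ensure values to provider methods. This is a hypothetical plain-Ruby sketch of the effect, not Puppet’s actual internals:

```ruby
# Hypothetical sketch: each acceptable ensure value maps to the provider
# method Puppet will call when the resource is out of sync.
ENSURE_DISPATCH = {
  :present => :install,
  :absent  => :uninstall,
  :purged  => :purge,
  :held    => :hold,
}

ENSURE_DISPATCH[:present]  # => :install
```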

Namevars are unique little snowflakes

Puppet has a concept known as the ‘namevar’ for a resource. If you’re hazy about the concept check out the documentation, but basically it’s the parameter that describes the form of uniqueness for a resource type on the system. For the package resource type, the ‘name’ parameter is the namevar because the way you tell one package from another is its name. For the file resource, it’s the ‘path’ parameter, because you can differentiate unique files from each other according to their path (and not necessarily their filename, since filenames don’t have to be unique on systems).

When designing a type, it’s important to consider WHICH parameter will be the namevar (i.e. how you can tell unique resources from one another). To make a parameter the namevar, you simply set the :namevar attribute to true like below:

newparam(:name, :namevar => true) do
  # Type declaration attributes here...
end

Handling array values

Nearly every property/parameter value that is declared for a resource is ‘stringified’, or cast to a string. Sometimes, however, it’s necessary to accept an array of elements as the value for a property/parameter. To do this, you have to explicitly tell Puppet that you’ll be passing an array by setting the :array_matching attribute to :all (if you don’t set this attribute, it defaults to :first, which means that if you pass an array as a value for a property/parameter, Puppet will only accept the FIRST element in that array).

newproperty(:domains, :array_matching => :all) do
  # Type declaration attributes here... 
end

If you set :array_matching to :all, EVERY value passed for that parameter/property will be cast to an array (which means if you pass a value of ‘foo’, you’ll get an array with a single element – the string of ‘foo’).
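That wrapping behaves much like Ruby’s own Array() conversion, which makes for a quick illustration of the shape your property code will see:

```ruby
# Single values arrive wrapped in an array; arrays pass through unchanged –
# the same shape a property with :array_matching => :all receives.
Array('foo')     # => ["foo"]
Array(%w[a b])   # => ["a", "b"]
```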

Documenting your property/parameter

It’s a best-practice to document the purpose of your property or parameter declaration, and this can be done by passing a string to the desc method within the body of the property/parameter declaration.

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
# Type declaration attributes here...
end

Synchronization tricks

Puppet uses a method called insync? to determine whether a property value is synchronized (i.e. whether Puppet needs to change its value, or it’s already set appropriately). You usually have no need to change the behavior of this method, since most of the properties you create for a type will have string values (and the == operator does a good job of checking string equality). For structured data types like arrays and hashes, however, the check can be a bit trickier. Arrays, for example, are ordered constructs – they have a definitive idea of what the first element and the last element of the array are. Sometimes you WANT to ensure that values are in a very specific order, and sometimes you don’t necessarily care about the ORDER that values for a property are set in – you just want to make sure that all of them are set.

If the latter case sounds like what you need, then you’ll need to override the behavior of the insync? method. Take a look at the example below:

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
  def insync?(is)
    is.sort == should.sort
  end
end

In this case, I’ve overridden the insync? method to first sort the ‘is’ value (or, the value that was discovered by Puppet on the target node) and compare it with the sorted ‘should’ value (or, the value that was specified in the Puppet manifest when the catalog was compiled by the Puppet master). You can do WHATEVER you want in here as long as insync? returns either a true or a false value. If insync? returns true, then Puppet determines that everything is in sync and no changes are necessary, whereas if it returns false then Puppet will trigger a change.
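To see why the sort matters, here’s the comparison in plain Ruby:

```ruby
# Ruby compares arrays element-by-element, in order, so equal membership with
# different ordering still compares unequal – unless both sides are sorted first.
is     = ['b.example.com', 'a.example.com']
should = ['a.example.com', 'b.example.com']

is == should            # => false (order differs)
is.sort == should.sort  # => true  (same members, order ignored)
```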

And this was the EASY part!

Wow this went longer than I expected… and types are usually the ‘easier’ bit since you’re only describing the format to be used by the Puppet admin in manifests. There are some hacky type tricks that I’ve not yet covered (i.e. features, ‘inheritance’, and other meta-bullshit), but those will be saved for a final ‘dirty tips and tricks’ post. In the next section, I’ll touch on providers (which is where all interaction with the system takes place), so stay tuned for more brain-dumping-goodness…

From the Archive: Using Crankd

Supporting laptops in a managed environment is tricky (and doubly so if you allow them to be taken off your corporate network). While you can be reasonably assured that your desktops will remain on and connected during the workday, it’s not uncommon for laptops to go to sleep, change wireless access points, and even change between an Ethernet or AirPort connection several times during the day. It’s important to have a tool that can “tweak” certain settings in response to these changes.

This is where crankd comes in.

Crankd is a cool utility that’s part of the Pymacadmin (http://code.google.com/p/pymacadmin/) suite of tools co-authored by Chris Adams and Nigel Kersten. Specifically crankd is a Python daemon that lets you trigger shell scripts, or execute Python methods, based upon state changes in SystemConfiguration, NSWorkspace and FSEvents.

Use Cases

It’s easier to see how crankd can help you with a couple of scenarios:

  1. Your laptops, like all of the other machines in your organization, are bound to your corporate LDAP servers. When they’re on network, they will query the LDAP servers for things like authentication information. Unless your corporate LDAP directory is accessible outside your corporate network, your laptops may exhibit the “spinning wheel of death” when they attempt to contact a suddenly-unreachable LDAP directory at the neighborhood Starbucks. A solution to this is to remove the LDAP servers from your Search (and Contacts) path whenever the laptop is taken off-network and add the LDAP servers when you come back on-network.

  2. Perhaps you’re using Puppet, Munki, Chef, StarDeploy, Filewave, Absolute Manage, Casper, or any other configuration management system that needs to contact a centralized server for configuration information. Usually these tools will have your machine contact their servers once an hour or so, but this can be a problem if the machine is constantly sleeping and waking. Plus if you take your machine off-network, you don’t want it trying to contact a server that might not be reachable from the outside world. It would be nice to have your laptop “phone home” when it establishes a network connection on your corporate network, and skip this step when the laptop is taken outside your organization.

  3. OS X allows you to set a preferred order for your network connections, but it would be nice to disable the AirPort when your laptop establishes an Ethernet connection.

  4. Finally, maybe you have the need to perform an action whenever your laptop sleeps (or wakes), changes a network connection, mounts a volume, or runs a specific Application (whether it’s located in the Applications directory or anywhere else on your machine).

All of these situations can be made trivial through the help of crankd.

How do I get it working?

Crankd is a daemon, so it runs in the background while you work. It uses an XML plist file that tells it which scripts (or which Python methods) to execute in response to specific state changes (like a network connection going up or down, or a volume being mounted). Since it’s a small Python library, the files aren’t huge and the entire finished installation is around 100 KB (or larger with your custom code/scripts). Let’s download crankd and experiment with its settings:

  1. Download the Pymacadmin source. You can do this through Google Code or Github – I’ll demonstrate the Github method. Navigate to http://github.com/acdha/pymacadmin, click the Downloads button, and download either the .tar.gz or the .zip version of the source code. Drag it to your desktop and then double-click on the file to expand it. It should open a folder named “acdha-pymacadmin-

  2. Install crankd. Upon opening the pymacadmin folder, you should see a series of folders, readme files, and an “install-crankd.sh” installation script. Let’s open Terminal.app and navigate to the pymacadmin folder that we expanded on our desktop (you can type “cd ” into Terminal.app, drag and drop the folder into the Terminal window, and hit Return to change to the directory). The install-crankd.sh script is executable, so run it by typing “sudo ./install-crankd.sh” into the Terminal window and hitting Return. Enter your password when it prompts you.

  3. Set up a plist file for crankd. If you’ve never worked with crankd before, it’s best to let it set up a configuration plist for you. If you don’t specify a configuration plist with the “--config” argument, or you don’t have a com.googlecode.pymacadmin.crankd.plist file in your /Users/<username>/Library/Preferences folder, crankd will automatically create a sample plist for you. Let’s do that by typing “/usr/local/sbin/crankd.py” into Terminal and hitting Return. Take a look at the sample configuration plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>NSWorkspace</key>
  <dict>
    <key>NSWorkspaceDidMountNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "A new volume was mounted!"</string>
    </dict>
    <key>NSWorkspaceDidWakeNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "The system woke from sleep!"</string>
    </dict>
    <key>NSWorkspaceWillSleepNotification</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "The system is about to go to sleep!"</string>
    </dict>
  </dict>
  <key>SystemConfiguration</key>
  <dict>
    <key>State:/Network/Global/IPv4</key>
    <dict>
      <key>command</key>
      <string>/bin/echo "Global IPv4 config changed"</string>
    </dict>
  </dict>
</dict>
</plist>

This XML file has two main keys – one for NSWorkspace events (such as mounted volumes and sleeping/waking your laptop), and one for SystemConfiguration events (such as network state changes). Each of those contains a key for the specific event we’re monitoring, a key specifying whether we’ll execute a command or a Python method in response to the event, and a string (or an array of strings, as we’ll see later) specifying the actual command to execute. For all of the events in the sample plist, we’re echoing a message to the console.

  4. Start crankd. Once crankd has been installed and your configuration plist file is set up, you’re ready to let crankd monitor for state changes. Let’s start crankd with the sample plist that was created in the previous step by executing the following command in Terminal: “/usr/local/sbin/crankd.py --config=/Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist”. Remember to substitute your username for <username> in that command (if you don’t know your username, you can type “whoami” into Terminal and hit Return). If everything was executed correctly, you should see the following lines displayed in Terminal:

Module directory /Users/<username>/Library/Application Support/crankd does not exist: Python handlers will need to use absolute pathnames
INFO: Loading configuration from /Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist
INFO: Listening for these NSWorkspace notifications: NSWorkspaceWillSleepNotification, NSWorkspaceDidWakeNotification, NSWorkspaceDidMountNotification
INFO: Listening for these SystemConfiguration events: State:/Network/Global/IPv4

It might look like Terminal isn’t doing anything, but in actuality crankd is listening for changes. You can make crankd come to life by connecting to (or disconnecting from) an AirPort network, sleeping/waking your machine, or mounting a volume (by inserting a USB memory stick, for example). Performing any of these actions will cause crankd to echo messages to your Terminal window. Here’s the message I received when I disconnected from an AirPort network:

INFO: SystemConfiguration: State:/Network/Global/IPv4: executing /bin/echo "Global IPv4 config changed"
Global IPv4 config changed

To quit this sample configuration of crankd, simply hold down the control button on your keyboard and press the C key. Congratulations, crankd is now up and running!

A more complex example

Let’s look at one of our previous situations