Shit Gary Says

...things I don't want to forget

Building a Functional Puppet Workflow Part 3b: More R10k Madness

In the last workflows post, I talked about dynamic Puppet environments and introduced R10k, which is an awesome tool for mapping modules to their environments which are dynamically generated by git branches. I didn’t get out everything I wanted to say because:

  • I was tired of that post sitting stale in a Google Doc
  • It was already goddamn long

So because of that, consider this a continuation of that previous monstrosity that talks about additional uses of R10k beyond the ordinary.

Let’s talk Hiera

But seriously, let’s not actually talk about what Hiera does since there are better docs out there for that. I’m also not going to talk about WHEN to use Hiera because I’ve already done that before. Instead, let’s talk about a workflow for submitting changes to Hiera data and testing it out before it enters into production.

Most people store their Hiera data (if they’re using a backend that reads Hiera data from disk anyways) in a separate repo from their Puppet repo. Some DO tie the Hiera datadir folder to something like the main Puppet repo that houses their Puppetfile (if they’re using R10k), but for the most part it’s a separate repo because you may want separate permissions for accessing that data. For the purposes of this post, I’m going to refer to a repository I use for storing Hiera data that’s out on Github.

The next logical step would be to integrate that Hiera repo into R10k so R10k can track and create paths for Hiera data just like it did for Puppet.

NOTE: Fundamentally, all that R10k does is check out modules to a specific path whose folder name comes from a git branch. PUPPET ties its environment to this folder name with some puppet.conf trickery. So, to say that R10k “creates dynamic environments” describes the end result, but not the actual job of the tool.

We COULD add Hiera’s repository to the /etc/r10k.yaml file to track and create folders for us, and if we did it EXACTLY like we did for Puppet we would most definitely run into this R10k bug (AND, it comes up again in this bug).

UPDATE: So, I originally wrote this post BEFORE R10k version 1.1.4 was released. Finch released version 1.1.4 which FIXES THESE BUGS…so the workflow I’m going to describe (i.e. using prefixing to solve the problem of using multiple repos in /etc/r10k.yaml that could possibly share branch names) TECHNICALLY does NOT need to be followed ‘to the T’, as it were. You can disable prefixing when it comes to that step, and modify /etc/puppetlabs/puppet/hiera.yaml so you don’t prepend ‘hiera_’ to the path of each environment’s folder, and you should be totally fine…you know, as long as you use version 1.1.4 or greater of R10k. So, be forewarned.

The issue in those bugs is that R10k collects the names of ALL the environments from ALL the sources at once, so if you have multiple source repositories and they share branch names, then you have clashes (since it only stores ONE branch name internally). The solution that Finch came up with was prefixing (that is, prefixing the name of the branch with the name of the source). When you prefix, however, it creates a folder on-disk that matches the prefixed name (e.g. NameOfTheSource_NameOfTheBranch). This is actually fine since we’ll catch it and deal with it, but you should be aware of it. Future versions of R10k will most likely deal with this in a different manner, so make sure to check out the R10k docs before blindly copying my code, okay? (Update: See the UPDATE paragraph above, where I describe how Finch DID JUST THAT).
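If the clash is hard to picture, here is a rough Python sketch of it. This is NOT how R10k is implemented; the function name is made up, and raising an error on a duplicate name is just this sketch's way of making the collision visible. The branch lists mirror the two repositories used in this post.

```python
# Illustration only: a toy model of environment-name collection, not R10k's code.
# `collect_environments` and its arguments are hypothetical names.

def collect_environments(sources, prefix_sources=()):
    """Map each environment name to the (source, branch) that provides it.

    `sources` maps a source name to a list of its git branches. Any source
    named in `prefix_sources` gets "<source>_" prepended to each branch name.
    A ValueError is raised when two sources produce the same environment name.
    """
    environments = {}
    for source, branches in sources.items():
        for branch in branches:
            name = f"{source}_{branch}" if source in prefix_sources else branch
            if name in environments:
                other = environments[name][0]
                raise ValueError(f"'{name}' comes from both '{other}' and '{source}'")
            environments[name] = (source, branch)
    return environments


SOURCES = {
    "puppet": ["development", "garysawesomeenvironment", "master", "production"],
    "hiera": ["master", "production", "testing"],
}

if __name__ == "__main__":
    try:
        collect_environments(SOURCES)
    except ValueError as err:
        print("without prefixing:", err)
    # With the 'hiera' source prefixed, every name is unique again.
    print(sorted(collect_environments(SOURCES, prefix_sources={"hiera"})))
```

With prefixing turned on for the 'hiera' source only, the names come out as hiera_master, hiera_production and hiera_testing alongside the unprefixed Puppet environments, which is the folder layout this post relies on.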

In the previous post I set up a file called r10k_installation.pp to set up R10k. Let’s revisit that manifest and modify it for my Hiera repo:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.4',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    },
    'hiera' => {
      'remote'  => 'https://github.com/glarizza/hiera_environment.git',
      'basedir' => "${::settings::confdir}/hiera",
      'prefix'  => true,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

NOTE: For the duration of this post, I’ll be referring to Puppet Enterprise specific paths (like /etc/puppetlabs/puppet for $confdir). Please do the translation for open source Puppet, as R10k will work just fine with either the open source edition or the Enterprise edition of Puppet.

You’ll note that I added a source called ‘hiera’ that tracks my Hiera repository, creates sub-folders in /etc/puppetlabs/puppet/hiera, and enables prefixing to deal with the bug I mentioned in the previous paragraph. Now, let’s run Puppet and do an R10k synchronization:

[root@master1 garysawesomeenvironment]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 1.78 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/content: content changed '{md5}c686917fcb572861429c83f1b67cfee5' to '{md5}69d38a14b5de0d9869ebd37922e7dec4'
Notice: Finished catalog run in 1.24 seconds

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_testing
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/hiera

[root@master1 puppet]# ls /etc/puppetlabs/puppet/hiera
hiera_master  hiera_production  hiera_testing

[root@master1 puppet]# ls /etc/puppetlabs/puppet/environments/
development  garysawesomeenvironment  master  production

Great, so it configured R10k to clone the Hiera repository to /etc/puppetlabs/puppet/hiera like we wanted it to, and you can see that with prefixing enabled we have folders named “hiera_${branchname}”.

In Puppet, the magical connection that maps these subfolders to Puppet environments is in puppet.conf, but for Hiera that’s the hiera.yaml file. I’ve included that file in my Hiera repo, so let’s look at the copy at /etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml:

/etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'

The magical line is in the :datadir: setting of the :yaml: section; it uses %{environment} to evaluate the environment variable set by Puppet and set the path accordingly.

As of right now R10k is configured to clone Hiera data from a known repository to /etc/puppetlabs/puppet/hiera, to create sub-folders based on the branches of that repository, and to tie data provided to each Puppet environment to the respective subfolder of /etc/puppetlabs/puppet/hiera that matches the pattern of “hiera_(environment_name)”.
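To make the %{environment} substitution concrete, here is a small Python sketch that expands the :datadir: and :hierarchy: values from the hiera.yaml above into the files the yaml backend would look for (one "<level>.yaml" per hierarchy level). It only imitates the interpolation; the function names and the example certname are assumptions, and raising KeyError on an unknown variable is this sketch's choice rather than Hiera's behavior.

```python
import re

# Values copied from the hiera.yaml shown above.
DATADIR = "/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata"
HIERARCHY = ["%{clientcert}", "%{environment}", "global"]


def interpolate(template, scope):
    """Replace every %{name} token in `template` with scope[name]."""
    return re.sub(r"%\{([^}]+)\}", lambda match: scope[match.group(1)], template)


def candidate_files(scope):
    """List the data files for this node, highest priority first."""
    datadir = interpolate(DATADIR, scope)
    return [f"{datadir}/{interpolate(level, scope)}.yaml" for level in HIERARCHY]


if __name__ == "__main__":
    # 'agent1.example.com' is a made-up certname for illustration.
    for path in candidate_files({"clientcert": "agent1.example.com",
                                 "environment": "testing"}):
        print(path)
```

A node in the 'testing' environment ends up reading everything from the hiera_testing checkout, which is the whole trick.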

The problem with hiera.yaml

You’ll notice that each subfolder of /etc/puppetlabs/puppet/hiera contains its own copy of hiera.yaml. You’re probably drawing the conclusion that each Puppet environment can read from its own hiera.yaml for Hiera configuration.

And you would be wrong.

For information on this bug, check out this link. You’ll see that we provide a ‘hiera_config’ configuration option in Puppet that allows you to specify the path to hiera.yaml, but Puppet loads that config as a singleton, which means that it’s read once when the Puppet master process starts up and it’s NOT environment-aware. The workaround is to use one hiera.yaml for all environments on a Puppet master but to dynamically change the :datadir: path according to the current environment (in the same way that dynamic Puppet environments abuse ‘$environment’ in puppet.conf). You gain the ability to have per-environment changes to Hiera data but lose the ability to do things like using different hierarchies for different environments. As of right now, if you want a different hierarchy then you’re going to need to use a different master (or do some hacky things that I don’t even want to BEGIN to approach in this article).

In summary – there will be a hiera.yaml per environment, but they will not be consulted on a per-environment basis.

Workflow for per-environment Hiera data

Looking back on the previous post, you’ll see that the workflow for updating Hiera data is identical to the workflow for updating code to your Puppet environments. Namely, to create a new environment for testing Hiera data, you will:

  • Push a branch to the Hiera repository and name it accordingly (remembering that the name you choose will be a new environment).
  • Run R10k to synchronize the data down to the Puppet master
  • Add your node to that environment and test out the changes

For existing environments, simply push changes to that environment’s branch and repeat the last two steps.

NOTE: Puppet environments and Hiera environments are linked – both tools use the same ‘environment’ concept and so environment names MUST match for the data to be shared (i.e. if you create an environment in Puppet called ‘yellow’, you will need a Hiera environment called ‘yellow’ for that data).

This tight-coupling can cause issues, and will ultimately mean that certain branches are longer-lived than others. It’s also the reason why I don’t use defaults in my hiera() lookups inside Puppet manifests – I WANT the early failure of a compilation error to alert me of something that needs fixed.
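Here is a toy Python model of why skipping defaults buys you that early failure. It is not Hiera's code: `lookup`, the level names and the data are all invented for illustration. The point is only that a priority lookup with no fallback blows up loudly when an environment's data is missing a key, while a default hides the gap.

```python
# Toy model of a first-match hierarchy lookup; names and data are hypothetical.
_NO_DEFAULT = object()


def lookup(key, levels, default=_NO_DEFAULT):
    """Return the value for `key` from the first level that defines it.

    `levels` is a list of (level_name, data_dict) pairs, highest priority first.
    With no default, a missing key raises KeyError instead of passing silently.
    """
    for _level_name, data in levels:
        if key in data:
            return data[key]
    if default is _NO_DEFAULT:
        raise KeyError(f"no value found for '{key}' in any hierarchy level")
    return default


# Pretend the 'yellow' environment's Hiera branch never got this key.
YELLOW_LEVELS = [
    ("agent1.example.com", {}),
    ("yellow", {"ntp_server": "time.example.com"}),
    ("global", {}),
]

if __name__ == "__main__":
    print(lookup("ntp_server", YELLOW_LEVELS))
    print(lookup("wordpress_db_host", YELLOW_LEVELS, default="localhost"))  # hides the gap
    lookup("wordpress_db_host", YELLOW_LEVELS)  # raises KeyError: the early failure
```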

You will need to determine whether this tight-coupling is worth it for your organization to tie your Hiera repository directly into R10k or to handle it out-of-band.

R10k and monolithic module repositories

One of the first requirements you encounter when working with R10k is that your component modules need to be stored in their own repositories. That convention is still relatively new – it wasn’t so long ago that we were recommending that modules be locked away in a giant repo. Why?

  • It’s easier to clone
  • The state of module reusability was poor

The main reason was that it was easier to put everything in one repo and clone it out on all your Puppet master servers. This becomes insidious as your module count rises and people start doing lovely things like committing large binaries into modules, pulling in old versions of modules they find out on the web, and the like. It also becomes an issue when you start needing to lock committers out of specific directories due to sensitive data, and blah blah blah blah…

There are better posts out there justifying/vilifying the choice of one or multiple repositories. This section is meant only to show you how to incorporate a single repository containing multiple modules into your R10k workflow.

From the last post you’ll remember that the Puppetfile allows you to tie a repository, and some version reference, to a directory using R10k. Incorporating a monolithic repository starts with an entry in the Puppetfile like so:

Puppetfile
mod "my_big_module_repo",
  :git => "git://github.com/glarizza/my_big_module_repo.git",
  :ref => '1.0.0'

NOTE: That git repository doesn’t exist. I don’t HAVE a monolithic repo to demonstrate, so I’ve chosen an arbitrary URI. Also note that you can use ANY name you like after the mod syntax to name the resultant folder – it doesn’t HAVE to mirror the URI of the repository.

Adding this entry to the Puppetfile would check out that repository to wherever all the other modules are checked out, with a folder name of ‘my_big_module_repo’. That folder would most likely (again, depending on how you’ve laid out your repository) contain subfolders containing Puppet modules. This entry gets the modules onto your Puppet master, but it doesn’t make Puppet aware of their location. For that, we’re going to need to add an entry to the ‘modulepath’ configuration item in puppet.conf.

Inside /etc/puppetlabs/puppet/puppet.conf you should see a configuration item called ‘modulepath’ that currently has a value of:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules

The modulepath itself works like a PATH environment variable in Linux – it’s a priority-based lookup mechanism that Puppet uses to find modules. Currently, Puppet will first look in /etc/puppetlabs/puppet/environments/$environment/modules for a module. If the module that Puppet is looking for is found there, Puppet will use it and not inspect the second path. If the module is not found at the FIRST path, Puppet will inspect the second path. Failing to find the module at the second path results in a compilation error for Puppet. Using this to our advantage, we can add the path to the monolithic repository checked-out by the Puppetfile AFTER the path to where all the individual modules are checked-out. This should look something like this:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:/opt/puppet/share/puppet/modules

Note: This assumes all modules are in the root of the monolithic repo. If they’re in a subdirectory, you must adjust accordingly.

That’s a huge line (and if you’re afraid of anything over 80 column-widths then I’m sorry…and you should probably buy a new monitor…and the 80s are over), but the gist is that we’re first going to look for modules checked out by R10k, THEN we’re going to look for modules in our monolithic repo, then we’re going to look in Puppet Enterprise’s vendored module directory, and finally, like I said above, we’ll fail if we can’t find our module. This will allow you to KEEP using your monolithic repository and also slowly cut modules inside that monolithic repo over to their own repositories (since when they gain their own repository, they will be located in a path that COMES before the monolithic repo, and thus will be given priority).
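The priority lookup is easy to model. Below is a Python sketch (not Puppet's implementation) that walks the three-entry modulepath from above and returns the first directory containing the module. The `is_dir` parameter is an assumption added so the lookup can be tried without a real master, and the 'ntp' module in the usage example is hypothetical.

```python
import os
import posixpath

# The modulepath from puppet.conf above, split across lines for readability.
MODULEPATH = (
    "/etc/puppetlabs/puppet/environments/$environment/modules:"
    "/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:"
    "/opt/puppet/share/puppet/modules"
)


def find_module(name, environment, modulepath=MODULEPATH, is_dir=os.path.isdir):
    """Return the first modulepath directory that holds module `name`.

    Entries are checked left to right, so earlier entries win. LookupError
    stands in for the compilation error Puppet raises when nothing matches.
    """
    for entry in modulepath.split(":"):
        candidate = posixpath.join(entry.replace("$environment", environment), name)
        if is_dir(candidate):
            return candidate
    raise LookupError(f"module '{name}' not found on the modulepath")


if __name__ == "__main__":
    base = "/etc/puppetlabs/puppet/environments/production/modules"
    # Pretend filesystem: apache has its own repo AND still lives in the big repo.
    on_disk = {
        base + "/apache",
        base + "/my_big_module_repo/apache",
        base + "/my_big_module_repo/ntp",
    }
    print(find_module("apache", "production", is_dir=on_disk.__contains__))
    print(find_module("ntp", "production", is_dir=on_disk.__contains__))
```

In the example, 'apache' resolves to its standalone checkout even though a copy is still sitting in the monolithic repo, which is exactly what lets you migrate modules out one at a time.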

Using MCollective to perform R10k synchronizations

This section is going to be much less specific than the rest because the piece that does the ACTION is part of a module for R10k. As of the time of this writing, this agent is in one state, but that could EASILY change. I will defer to the module in question (and specifically its README file) should you need specifics (or if my module is dated). What I CAN tell you, however, is that the R10k module does come with a class that will set up and configure both an MCollective agent for R10k and also a helper application that should make doing R10k synchronizations on multiple Puppet masters much easier than doing them by hand. First, you’ll need to INSTALL the MCollective agent/application, and you can do that by pulling down the module and its dependencies, and classifying all Puppet masters with R10k enabled by doing the following:

include r10k::mcollective

Terribly difficult, huh? With that, both the MCollective agent and application should be available to MCollective on that node. The way to trigger a synchronization is to log in to an account on a machine that has MCollective client access (in Puppet Enterprise, this would be any Puppet master that’s allowed the role, and then, specifically, the peadmin user…so doing a su - peadmin should afford you access to that user), and perform the following command:

mco r10k deploy

This is where the README differs a bit, and the reason for that is because Finch changed the syntax that R10k uses to synchronize and deploy modules to a Master. The CURRENTLY accepted command (because, knowing Finch, that shit might change) is r10k deploy environment -p, and the action to the MCollective agent that EXECUTES that command is the ‘deploy’ action. The README refers to the ‘synchronize’ action, which executes the r10k synchronize command. This command MAY STILL WORK, but it’s deprecated, and so it’s NOT recommended to be used.

Like I said before, this agent is subject to change (mainly due to R10k command deprecation and maturation), so definitely refer to the README and the code itself for more information (or file issues and pull requests on the module repo directly).

Tying R10k to CI workflows

I spent a year doing some presales work for the Puppet Labs SE team, so I can hand-wave and tapdance like a motherfucker. I’m going to need those skills for this next section, because if you thought the previous section glossed over the concepts pretty quickly and without much detail, then this section is going to feel downright vaporous (is that a word? Fuck it; I’m handwaving – it’s a word). I really debated whether to include the following sections in this post because I don’t really give you much specific information; it’s all very generic and full of “ideas” (though I do list some testing libraries below that are helpful if you’ve never heard of them). Feel free to abandon ship and skip to the FINAL section right now if you don’t want to hear about ‘ideas’.

For the record, I’m going to just pick and use the term “CI” when I’m referring to the process of automating the testing and deployment of, in this case, Puppet code. There have definitely been posts arguing about which definition is more appropriate, but, frankly, I’m just going to pick a term and go with it.

The issue at hand is that when you talk “CI” or “CD” or “Continuous (fill_in_the_blank)”, you’re talking about a workflow that’s tailored to each organization (and sometimes each DEPARTMENT of an organization). Sometimes places can agree on a specific tool to assist them with this process (be it Jenkins, Hudson, Bamboo, or whatever), but beyond that it’s anyone’s game.

Since we’re talking PUPPET code, though, you’re restricted to certain tasks that will show up in any workflow…and THAT is what I want to talk about here.

To implement some sort of CI workflow means laying down a ‘pipeline’ that takes a change of your Puppet code (a new module, a change to an existing module, some Hiera data updates, whatever) from the developer’s/operations engineer’s workstation right into production. The way we do this with R10k currently is to:

  1. Make a change to an individual module
  2. Commit/push those changes to the module’s remote repository
  3. Create a test branch of the puppet_repository
  4. Modify the Puppetfile and tie your module’s changes to this environment
  5. Commit/push those changes to the puppet_repository
  6. Perform an R10k synchronization
  7. Test
  8. Repeat steps 1-7 as necessary until shit works how you like it
  9. Merge the changes in the test branch of the puppet_repository with the production branch
  10. Perform an R10k synchronization
  11. Watch code changes become active in your production environment

Of those steps, there are arguably about three unique steps that could be automated:

  • R10k synchronizations
  • ‘Testing’ (whatever that means)
  • Merging the changes in the test branch of the puppet_repository with the production branch

NOTE: As we get progressively-more-handwavey (also probably not a word, but fuck it – let’s be thought leaders and CREATE IT), each one of these steps is going to be more and more…generic. For example – to say “test your code” is a great idea, but, seriously, defining how to do that could (and should) be multiple blog posts.

Laying down the pipeline

If I were building an automated workflow, the first thing I would do is set up something like Jenkins and configure it to watch the puppet_repository that contains the Puppetfile mapping all my modules and versions to Puppet environments. On changes to this repository, we want Jenkins to perform an R10k synchronization, run tests, and then, possibly, merge those changes into production (depending on the quality of your tests and how ‘webscale’ you think you are on that day).

R10k synchronizations

If you’re paying attention, we solved this problem in the previous section with the R10k MCollective agent. Jenkins should be running on a machine that has the ability to execute MCollective client commands (such as triggering mco r10k deploy when necessary). You’ll want to tailor your calls from Jenkins to only deploy environments it’s currently testing (remember in the puppet_repository that topic branches map to Puppet environments, so this is a per-branch action) as opposed to deploying ALL environments every time.
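As a rough sketch of the per-branch idea, assuming your CI hands you a ref that looks like origin/feature_x: strip the remote prefix to get the environment name, then build the deploy call for just that environment. The trailing environment argument on mco r10k deploy is an assumption about how your version of the agent takes its target, so check it against the agent you actually have installed. Both function names are made up for this example.

```python
from typing import List


def environment_for_branch(git_ref: str, remote: str = "origin") -> str:
    """Turn a ref like 'origin/feature_x' into the environment name 'feature_x'."""
    prefix = remote + "/"
    if git_ref.startswith(prefix):
        return git_ref[len(prefix):]
    return git_ref


def deploy_command(environment: str) -> List[str]:
    """Build (but do not run) a deploy call for a single environment.

    ASSUMPTION: the agent accepts the environment name as a trailing
    argument. Adjust to match the agent version you are running.
    """
    return ["mco", "r10k", "deploy", environment]
```

Returning an argument list rather than a string means you can hand it straight to subprocess.run without worrying about shell quoting.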

Also, if you’re building a pipeline, you might not want to do R10k synchronizations on ALL of your Puppet Masters at this point. Why not? Well, if your testing framework is good enough and has sufficient coverage that you’re COMPLETELY trusting it to determine whether code is acceptable or not, then this is just the FIRST step – making the code available to be tested. It hasn’t passed tests yet, so pushing it out to all of your Puppet masters is a bit wasteful. You’ll probably want to only synchronize with a single master that’s been identified for testing (and a master that has the ability to spin up fresh nodes, enforce the Puppet code on them, submit those nodes to a battery of tests, and then tear them down when everything has been completed).

If you’re like the VAST majority of Puppet users out there that DON’T have a completely automated testing framework that has such complete coverage that you trust it to determine whether code changes are acceptable or not, then you’re probably ‘testing’ changes manually. For these people, you’ll probably want to synchronize code to whichever Puppet master(s) are suitable.

The cool thing about these scenarios is that MCollective is flexible enough to handle this. MCollective has the ability to filter your nodes based on things like available MCollective agents, Facter facts, Puppet classes, and even things like the MD5 hashes of arbitrary files on the filesystem…so however you want to restrict synchronization, you can do it with MCollective.
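For example, here is a small sketch that turns "only the test master" into MCollective filter arguments. The --with-fact, --with-class, and --with-agent options are the standard mco client filters; the fact name and value in the usage note (puppet_role=test_master) are invented for illustration, so substitute whatever actually identifies your test master.

```python
from typing import Dict, List, Optional, Sequence


def mco_filter_args(
    facts: Optional[Dict[str, str]] = None,
    classes: Sequence[str] = (),
    agents: Sequence[str] = (),
) -> List[str]:
    """Build mco discovery filter arguments from facts, classes, and agents."""
    args: List[str] = []
    # Sort facts so the generated command line is stable between runs.
    for name, value in sorted((facts or {}).items()):
        args += ["--with-fact", f"{name}={value}"]
    for puppet_class in classes:
        args += ["--with-class", puppet_class]
    for agent in agents:
        args += ["--with-agent", agent]
    return args
```

Calling mco_filter_args(facts={"puppet_role": "test_master"}, agents=["r10k"]) gives you a list you can append to whichever mco call you are making, which keeps the "who gets synchronized" decision in one place.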

After all of that, the answer here is “Use MCollective to do R10k syncs/deploys.”

Testing

This section needs its own subset of blog posts. There are all kinds of tools that will allow you to test all sorts of things about your Puppet code (from basic syntax checking and linting, to integration tests that check for the presence of resources in the catalog, to acceptance-level tests that check the end-state of the system to make sure Puppet left it in a state that’s acceptable). Common tools for these types of tests include rspec-puppet, Beaker, and Serverspec, all of which come up again below.

Unfortunately, the point of this section is NOT to walk you through setting up one or more of those tools (I’d love to write those posts soon…), but rather to make you aware of their presence and identify where they fit in our Pipeline.

Once you’ve synchronized/deployed code changes to a specific machine (or subset of machines), the next step is to trigger tests.

Backing up the train a bit, certain kinds of ‘tests’ should be done WELL in advance of this step. For example, if code changes don’t even pass basic syntax checking and linting, they shouldn’t even MAKE it into your repository. Things like pre-commit hooks will allow you to trigger syntactical checks and linting before a commit is allowed. We’re assuming you’ve already set those up (and if you’ve NOT, then you should probably do that RIGHT NOW).
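If you have not set one up yet, a pre-commit hook can be pretty small. This is a sketch, not a drop-in: it assumes puppet and git are on the PATH, it only looks at .pp manifests, and it validates the working-tree copy of each file rather than the exact staged content. The function names are made up for this example.

```python
import subprocess
from typing import Iterable, List


def puppet_manifests(paths: Iterable[str]) -> List[str]:
    """Keep only Puppet manifests (.pp files) from a list of paths."""
    return [p for p in paths if p.endswith(".pp")]


def staged_files() -> List[str]:
    """Ask git for files added, copied, or modified in the pending commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    """Return 0 if every staged manifest parses, 1 otherwise."""
    failed = False
    for manifest in puppet_manifests(staged_files()):
        # `puppet parser validate` exits non-zero on a syntax error.
        result = subprocess.run(["puppet", "parser", "validate", manifest])
        if result.returncode != 0:
            failed = True
    return 1 if failed else 0
```

To use it as a hook, save it as .git/hooks/pre-commit, make it executable, and end the file with raise SystemExit(main()) so a non-zero return blocks the commit.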

Rather, in this section, we’re talking about doing some basic integration smoke testing (i.e. running the rspec-puppet tests on all the modules to ensure that what we EXPECT in the catalog is actually IN the catalog), moving into acceptance level testing (i.e. spinning up pristine/clean nodes, actually applying the Puppet code to the nodes, and then running things like Beaker or Serverspec on the nodes to check the end-state of things like services, open ports, configuration files, and whatever to ensure that Puppet ACTUALLY left the system in a workable state), and then returning a “PASS” or “FAIL” response to Jenkins (or whatever is controlling your pipeline).
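One way to picture that ordering: run the cheap catalog-level checks first and only pay for fresh nodes if they pass. The sketch below assumes each stage is just a callable that returns True or False; the stage names in the usage note are labels, and how you actually invoke rspec-puppet, Beaker, or Serverspec is up to you.

```python
from typing import Callable, List, Tuple

# A stage is a label plus a zero-argument check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run stages in order, stopping at the first failure.

    Returns ("PASS" or "FAIL", names of the stages that actually ran).
    """
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check():
            # Later (usually more expensive) stages never start.
            return "FAIL", ran
    return "PASS", ran
```

With stages like [("rspec-puppet", smoke), ("acceptance", acceptance)], a failing smoke test means the acceptance stage never spins up a node, and the list of stages that ran tells whoever reads the Jenkins output where things stopped.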

These tests can be as thorough or as loose as is acceptable to you (obviously, the goal is to automate ALL of your tests so you don’t have to manually check ANY changes, but that’s the nerd-nirvana state where we’re all browsing the web all day), but they should catch the most NOTORIOUS and OBVIOUS things FIRST. Follow the same rules you did when you got started with Puppet – catch the things that are easiest to catch and start building up your cache of “Total Time Saved.”

Jenkins needs to be able to trigger these tests from wherever it’s running, so your Jenkins box needs the ability to, say, spin up nodes in ESX, or locally with something like Vagrant, or even cloud nodes in EC2 or GCE, then TRIGGER the tests, and finally get a “PASS” or “FAIL” response back. The HARDEST part here, by far, is that you have to define what level of testing you’re going to implement, how you’re going to implement it, and devise the actual process to perform the testing. Like I said before, there are other blog posts that talk about this (and I hope to tackle this topic in the very near future), so I’ll leave it to them for the moment.

To merge or not to merge

The final step for any test code is to determine whether it should be merged into production or not. Like I said before, if your tests are sufficient to determine whether a change is ‘good’ or not, then you can look at automating the process of merging those changes into production and killing off the test branch (or, NOT merging those changes, and leaving the branch open for more changes).

Automatically merging is scary for obvious reasons, but it’s also a good ‘test’ for your test coverage. Committing to a ‘merge upon success’ workflow takes trust, and there’s absolutely no shame in leaving this step to a human, to a change review board, or to some out-of-band process.
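If you do go the automated route, the git side of "merge and kill the branch" might look something like this sketch. It only builds the command lists so you can inspect (or log) them before anything runs. The remote name origin, the production target, and the choice of a --no-ff merge are all assumptions for illustration, not a recommendation.

```python
from typing import List


def merge_commands(
    branch: str,
    target: str = "production",
    remote: str = "origin",
    delete_branch: bool = True,
) -> List[List[str]]:
    """Build the git commands to merge a tested branch into the target branch."""
    commands = [
        ["git", "fetch", remote],
        ["git", "checkout", target],
        # Bring the local target branch up to date before merging into it.
        ["git", "merge", "--ff-only", f"{remote}/{target}"],
        # --no-ff keeps a merge commit so the change is easy to find (and revert).
        ["git", "merge", "--no-ff", f"{remote}/{branch}"],
        ["git", "push", remote, target],
    ]
    if delete_branch:
        # Removing the remote branch retires the matching test environment
        # on the next R10k synchronization.
        commands.append(["git", "push", remote, "--delete", branch])
    return commands
```

Passing delete_branch=False covers the "leave the branch open for more changes" case, and handing the whole list to a human for review instead of to subprocess is a perfectly reasonable way to use it.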

Use your illusion

These are the most common questions I get asked after the initial shock of R10k, and its workflow, wears off. Understand that I do these posts NOT from a “Here’s what you should absolutely be doing!” standpoint, but more from a “Here’s what’s going on out there.” vantage. Every time I’m called on-site with a customer, I evaluate:

  • The size and experience level of the team involved
  • The processes that the team must adhere to
  • The Puppet experience level of the team
  • The goals of the team

Frankly, after all those observations, sometimes I ABSOLUTELY come to the conclusion that something like R10k is entirely-too-much process for not-enough benefit. For those who are a fit, though, we go down the checklists and tailor the workflow to the environment.

What more IS there on R10k?

I do have at least a couple more posts in me on some specific issues I’ve hit when consulting with companies using R10k, such as:

  • How best to use Hiera and R10k with Puppet ‘environments’ and internal, long-term ‘environments’
  • Better ideas on ‘what to branch and why’ with regard to component modules and the puppet_repository
  • To inherit or not to inherit with Roles
  • How to name things (note that I work for Puppet Labs, so I’m most likely very WRONG with this section)
  • Other random things I’ve noticed…

Also, I apologize if it’s been a while since I’ve replied to a couple of comments. I’m booked out 3 months in advance and things are pretty wild at the moment, but I’m REALLY thankful for everyone who cares enough to drop a note, and I hope I’m providing some good info you can actually use! Cheers!
