Shit Gary Says

...things I don't want to forget

Puppet Workflows 4: Using Hiera in Anger

Hiera. That thing nobody is REALLY quite sure how to say (FYI: It’s pronounced ‘hiera’), the tool that everyone says you should be using, and the tool that will make you hate YAML syntax errors with a passion. It’s a data/code separation dream, (potentially) a debugging nightmare, and absolutely vital in creating a Puppet workflow that scales better than your company’s Wifi strategy (FYI: your company’s Wifi password just changed. Again. Because they’re not using certificates). I’ve already written a GOOD AMOUNT on why/how to use it, but now I’m going to give you a couple of edge cases. Call them “best practices” (and I’ll cut you), but I like to call it “shit I learned after using Hiera in anger.” Here are a couple of the most popular questions I hear, and my usual responses…

“How should I setup my hierarchy?”

This is such a subjective question because it’s specific to your organization (because it’s your data). I usually ask back “What are the things about your nodes that are different, and when are they different?” Usually I hear something back like “Well, nodes in this datacenter have different DNS settings” or “Application servers in production use one version of Java, and those in dev use a different version” or “All machines in the dev environment in this datacenter need to have a specific repository”. All of these replies give me ideas for your hierarchy. When you think of Hiera as a giant conditional statement, you can start seeing how your hierarchy could be laid out. With the first response, we know we need a location fact to determine where a node is, and then we can have a hierarchy level for that location. The second response tells me we need a level for the application tier (i.e. dev/test/prod). The third response tells me we need a level that combines both the location and the application tier. When you add in that you should probably have a node-specific level at the top (for overrides) and a default level at the bottom (or not: see the next section), I’m starting to picture this:

:hierarchy:
  - "nodes/%{::clientcert}"
  - "%{::location}/%{::applicationtier}"
  - "%{::location}/common"
  - "tier/%{::applicationtier}"
  - common

Every time you have a need, you consider a level. Now, obviously, it doesn’t mean that you NEED a level for every request (sometimes, if it’s an edge case, you can handle it in the profile or the role). There’s a performance hit for every level of your Hiera hierarchy, so ideally keep it minimal (around five levels or so), but we’re talking about flexibility here, and if that’s more important to you than performance, then go for it.

Next comes ordering. This one’s SLIGHTLY easier – your hierarchy should read from most-specific to least-specific. Note that specifying an application tier at a specific location is MORE specific than just saying “all nodes in this application tier.” Sometimes you will have levels whose order is hard to define – such as location vs. application tier. You kinda just have to go with your gut here. In many cases you may find that the data you put in those two levels will be entirely different (location-based data may not ever overlap with application-tier-specific data). Do remember that any time you change the order of your hierarchy, you introduce the possibility that values get flip/flopped.
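As a sketch of what “most-specific wins” means in practice (the key name and hostnames here are invented for illustration): if the same key exists at two levels, the higher level shadows the lower one.

```yaml
# tier/dev.yaml -- higher in the hierarchy, wins for dev nodes
ntpserver: 'ntp-dev.example.com'
```

```yaml
# common.yaml -- only consulted when no level above provides the key
ntpserver: 'ntp.example.com'
```

A dev node looking up ‘ntpserver’ gets the dev value; everything else falls through to common. Reordering the hierarchy can silently flip which file wins, which is exactly the flip/flopping risk.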

If you look at level 3 of the hierarchy above, you’ll see that I have ‘common’ at the end. Some people like this syntax (where they put a ‘common’ file in a folder that matches the fact they’re checking against), and some people prefer a filename matching the fact. Do what makes you happy, but, in this case, we can unify the location folder and just put the common file underneath the application tier files.

Finally, DO MAKE USE OF FOLDERS! For the love of god, this. Putting all your data files in a single folder not only makes that one BIG folder, it also introduces namespace collisions (what if you have a location named ‘dev’, for example? Now you have both an application tier and a location with the same name. Oops).

How you setup your hierarchy is up to you, but this should hopefully give you somewhere to start.

Common.yaml, your organization’s common values – REVISED

UPDATE – 28 October

Previously, this section was where I presented the idea of removing the lowest level of the hierarchy as a way of ensuring that you didn’t omit a value in Hiera (the idea being that common values would be in the profile, anything higher would be in Hiera, and all your ‘defaults’, or ‘common values’ would be inside the profile). The idea of removing the lowest level of the Hiera hierarchy was always something I was kicking around in my head, but R.I. made a comment below that’s made me revise my thought process. There’s still a greater concern around definitively tracking down values pulled from Hiera, but I think we can accomplish that through other means. I’m going to revise what I wrote below to point out the relevant details.

When using Hiera, you need to define a hierarchy that Hiera uses in its search for your data. Most often, it looks something like this:

hiera.yaml
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppetlabs/puppet/hieradata
:hierarchy:
  - "nodes/%{::clientcert}"
  - "location/%{::location}"
  - "environment/%{::applicationtier}"
  - common

Notice that little “common” at the end? That means that, failing everything else, it’s going to look in common.yaml for a value. I had thought of common as the ‘defaults’ level, but the reality is that it is a list of values common across all the nodes in your infrastructure. These are the values, SPECIFIC TO YOUR ORGANIZATION, that should be the same everywhere. Barring an override at a higher level, these values are your organization’s ‘defaults’, if you will.
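A hedged sketch of what such a common.yaml might hold (every key and value here is made up; yours will be specific to your organization):

```yaml
---
# Organization-wide 'defaults': the values every node gets unless a
# higher hierarchy level overrides them.
ntpserver: 'ntp.example.com'
dns_servers:
  - '10.0.0.53'
  - '10.0.1.53'
smtp_relay: 'mail.example.com'
```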

Previously, you may have heard me rail against Hiera’s optional second argument and how I really don’t like it. Take this example:

$foo = hiera('port', '80')

Given this code, Hiera is going to look for a parameter called ‘port’ in its hierarchy, and, if it doesn’t find one in ANY of the levels, assign back a default value of ‘80’. I don’t like using this second argument because:

  1. If you forget to enter the ‘port’ parameter into the hierarchy, or typo it in the YAML file, Hiera will gladly assign the default value of ‘80’ (which, unless you’re checking for this, might sneak into production)
  2. Where is the real ‘default’ value: the value in common.yaml or the optional second argument?

It actually depends on where you do the hiera() call as to what ‘kind’ of default value this is. Note that previously we talked about how the ‘common’ level represented values common across your infrastructure. If you do this hiera() call inside a profile (which is where I recommend it be done), providing the optional second argument ends up being redundant (i.e. the value should be inside Hiera).
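Inside a profile, that reasoning looks something like this (the class and key names are hypothetical):

```puppet
class profiles::apache {
  # No second argument: if 'apache_port' is missing from every level of
  # the hierarchy (including common.yaml), this lookup fails loudly at
  # compile time instead of silently falling back to a hardcoded value.
  $port = hiera('apache_port')

  class { '::apache':
    port => $port,
  }
}
```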

The moral of this story being: values common to all nodes should be in the lowest level of the Hiera hierarchy, and all explicit hiera calls should omit the default second argument if that common value is expected to be found in the hierarchy.

Data Bindings

In Puppet 3, we introduced the concept of ‘data bindings’ for parameterized classes, which meant that Puppet now had another choice for gathering parameter values. Previously, the order Puppet would look to assign a value for parameters to classes was:

  1. A value passed to the class via the parameterized class syntax
  2. A default value provided by the class

As of Puppet 3, this is the new parameter assignment order:

  1. A value passed to the class via the parameterized class syntax
  2. A Hiera lookup for classname::parametername
  3. A default value provided by the class

Data bindings are meant to be pluggable to allow for ANY data backend, but, as of this writing, there’s currently only one: Hiera. Because of this, Puppet will now automatically do a Hiera lookup for every parameter to a parameterized class that isn’t explicitly passed a value via the parameterized class syntax (which means that if you just do include classname, Puppet will do a Hiera lookup for EVERY parameter defined by the “classname” class).

This is really cool because it means that you can just add classname::parametername to your Hiera setup, and, as long as you’re not EXPLICITLY passing that parameter’s value to the class, Puppet will do a lookup and find the value.
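As a sketch (the class, parameter, and value are invented): given a parameterized class and a matching classname::parametername key in Hiera, a bare include is enough.

```puppet
class ntp (
  $server = 'pool.ntp.org',  # class default: only used if Hiera has no value
) {
  # ...manage the ntp package/config/service here...
}

# No parameters passed explicitly, so data bindings kick in and Puppet
# automatically looks up 'ntp::server' in Hiera behind the scenes.
include ntp
```

```yaml
# somewhere in the hierarchy, e.g. common.yaml
ntp::server: 'ntp.example.com'
```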

It’s also completely transparent to you unless you know it’s happening.

The issue here is that this is new functionality to Puppet, and it feels like magic to me. You can make the argument and say “If you don’t start using it, Gary, people will never take to it,” however I feel like this kind of magical lookup in the background is always going to be a bad thing.

There’s also another problem. Consider a Hiera hierarchy that has 15 levels (they exist, TRUST ME). What happens if you don’t define ANY parameters in Hiera in the form of classname::parametername and simply want to rely on the default values for every class? Well, it means that Hiera is STILL going to be triggered for every parameter to a class that isn’t explicitly passed a value. That’s a hell of a performance hit. Fortunately, there’s a way to disable this lookup. Simply add the following to the Puppet master’s puppet.conf file:

data_binding_terminus = none

It’s going to be up to how your team needs to work as to whether you use Hiera data bindings or not. If you have a savvy team that feels they can debug these lookups, then cool – use the hell out of it. I prefer to err on the side of an explicit hiera() lookup for every value I’m querying, even if it’s a lot of extra lines of code. I prefer the visibility, especially for new members to your team. For those people with large hierarchies, you may want to weigh the performance hit. Try to disable data bindings and see if your master is more performant. If so, then explicit hiera() calls may actually buy you some rewards.

PROS:

  • Adding parameters to Hiera in the style of classname::parametername will set parameterized class values automatically
  • Simplified code – simply use the include() function everywhere (which is safer than the parameterized class syntax)

CONS:

  • Lookup is completely transparent unless you know what’s going on
  • Debugging parameter values can be difficult (especially with typos or forgetting to set values in Hiera)
  • Performance hit for values you want to be assigned the class default value

Where to data – Hiera or Profile?

“Does this go right into the Profile or into Hiera?” I get that question repeatedly when I’m working with customers. It’s a good question, and one of the quickest ways to blow up your YAML files in Hiera. Here’s the order I use when deciding where to put data:

WHERE did that data come from?

Remember that the profile is YOUR implementation – it describes how YOU define the implementation of a piece of technology in YOUR organization. As such, it’s less about Puppet code and more about pulling data and passing it TO the Puppet code. It’s the glue-code that grabs the data and wires it up to the model that uses it. How it grabs the data is not really a big deal, so long as it grabs the RIGHT data – right? You can choose to hardcode it into the Profile, or use Hiera, or use some other magical data lookup mechanism – we don’t really care (so long as the Profile gathers the data and passes it to the correct Puppet class).

The PROBLEM here is debugging WHERE the data came from. As I said previously, Hiera has a level for all bits of data common to your organization, and, obviously, data overridden at a higher level takes precedence over the ‘common’ level at the bottom. With Hiera, unless you run the hiera binary in debug mode (-d), you can never be completely sure where the data came from. Puppet has no way of dumping out every variable and where it came from (whether Hiera or set directly in the DSL, and, if it WAS Hiera, exactly what level or file it came from).

It is THIS REASON that causes me to eschew things like data bindings in Puppet. Debugging where a value came from can be a real pain in the ass. If there were amazing tooling around this, I would 100% support using data bindings and just setting everything inside Hiera and using the include() function, but, alas, that’s not been my experience. Until then, I will continue to recommend explicit hiera calls for visibility into when Hiera is being called and when values are being set inside the DSL.

Enter the data into the Profile

One of the first choices people make is to enter the data (like ntpserver address, java version, or whatever it is) directly into the Profile. “BUT GARY! IT’S GOING TO MAKE IT HARD TO DEBUG!” Not really. You’re going to have to open the Profile anyway to see what’s going on (whether you pull the data from Hiera or hardcode it in the Profile), right? And, arguably, the Profile is legible…doing Hiera lookups gives you flexibility at a cost of abstracting away how it got that bit of data (i.e. “It used Hiera”). For newer users of Puppet, having the data in the Profile is easier to follow. So, in the end, putting the data into the Profile itself is the least-flexible and most-visible option…so consequently people consider it as the first available option. This option is good for common/default values, BUT, if you eventually want to use Hiera, you need to re-enter the data into the common level of Hiera. It also splits up your “source of truth” to include BOTH the Profile manifest and Hiera. In the end, you need to weigh your team’s goals, who has access to the Hiera repo, and how flexible you need to be with your data.
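Hardcoding into the profile looks something like this (names invented). The value is right there to read, but it’s effectively a constant:

```puppet
class profiles::ntp {
  # Visible at a glance, but cannot be overridden per tier or location
  # without editing this file.
  $ntpserver = 'ntp.example.com'

  class { '::ntp':
    server => $ntpserver,
  }
}
```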

PROS:

  • Data is clearly visible and legible in the profile (no need to open additional files)

CONS:

  • Inability to redefine variables in Puppet DSL makes any settings constants by default (i.e. no overriding permitted)
  • Data outside of Hiera creates a second “source of truth”

Enter the data into Hiera

If you find that you need to have different bits of data for different nodes (i.e. a different version of Java in the dev tier instead of the prod tier), then you can look to put the data into Hiera. Where to put the data is going to depend on your own needs – I’m trusting that you can figure this part out – but the bigger piece here is that once the data is in Hiera you need to ensure you’re getting the RIGHT data (i.e. if it’s overridden at a higher level, you are certain you entered it into the right file and didn’t typo anything).
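The Java-version case from above might look like this (the key name and versions are made up); each tier simply gets its own value at the tier level of the hierarchy:

```yaml
# tier/dev.yaml
java_version: '1.8.0'
```

```yaml
# tier/prod.yaml
java_version: '1.7.0'
```

A prod node’s hiera('java_version') lookup resolves at the tier level, and a typo in the filename or key means it silently falls through to whatever lower level defines it, which is why checking you edited the right file matters.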

This answers that “where” question, but doesn’t answer the “what” question…as in “What data should I put into Hiera?” For that, we have another section…

PROS:

  • Flexibility in returning different values based on different conditions
  • All the data is inside one ‘source of truth’ for data according to your organization

CONS:

  • Visibility – you must do a Hiera lookup to find the value (or open Hiera’s YAML files)

“What exactly goes into Hiera?”

If there were one question that, if answered incorrectly, could make or break your Puppet deployment, this would be it. The greatest strength and weakness of Hiera is its flexibility. You can truly put almost anything in Hiera, and, when combined with something like the create_resources() function, you can create your own YAML configuration language (tip: don’t actually do this).

“But, seriously, what should go into Hiera, and what shouldn’t?”

The important thing to consider here is the price you pay by putting data into Hiera. You’re gaining flexibility at a cost of visibility. This means that you can do things like enter values at all levels of the hierarchy that can be concatenated together with a single hiera_array() call, BUT you’re losing the visibility of having the data right in front of you (i.e. you need to open up all the YAML files individually, or use the hiera binary to debug how you got those values). Hiera is REALLY COOL until you have to debug why it grabbed (or DIDN’T grab) a particular value.

Here’s what I usually tell people about what should be put into Hiera:

  • The exact data values that need to be different conditionally (i.e. a different ntp server for different sites, different java versions in dev/prod, a password hash, etc.)
  • Dynamic data expressed in multiple levels of the hierarchy (i.e. a lookup for ‘packages’ that returns back an array of all the values that were found in all the levels of the hierarchy)
  • Resources as a hash ONLY WHEN ABSOLUTELY NECESSARY
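Bullet 2 (dynamic data expressed at multiple levels) might look like this, with a made-up ‘packages’ key:

```yaml
# common.yaml
packages:
  - 'vim'
  - 'curl'
```

```yaml
# tier/dev.yaml
packages:
  - 'gcc'
```

```puppet
# hiera_array() concatenates the arrays found at EVERY level of the
# hierarchy, rather than stopping at the first match like hiera() does.
$packages = hiera_array('packages')

package { $packages:
  ensure => installed,
}
```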

Puppet manifest vs. create_resources()

Bullets 1 and 2 above should be pretty straightforward – you either need to use Hiera to grab a specific value or return back a list of ALL the values from ALL the levels of the hierarchy. The point here is that Hiera should be returning back only the minimal amount of data that is necessary (i.e. instead of returning back a hash that contains the title of the resource, all the attributes of the resource, and all the attribute values for that resource, just return back a specific value that will be assigned to an attribute…like the password hash itself for a user). This data lookup appears to be “magic” to new users of Puppet – all they see is the magic phrase of “hiera” and a parameter to search for – and so it becomes slightly confusing. It IS, however, easier to understand that this magical phrase will return data, and that that data is going to be used to set the value for an attribute. Consider this example:

$password = hiera('garypassword')

user { 'gary':
  ensure   => present,
  uid      => '5001',
  gid      => 'gary',
  shell    => 'zsh',
  password => $password,
}

This leads us to bullet 3, which is “the Hiera + create_resources() solution.” This solution allows you to lookup data from within Hiera and pass it directly to a function where Puppet creates the individual resources as if you had typed them into a Puppet manifest itself. The previous example can be entered into a Hiera YAML file like so:

sysadmins.yaml
users:
  gary:
    ensure: 'present'
    uid: '5001'
    gid: 'gary'
    shell: 'zsh'
    password: 'biglongpasswordhash'

And then a resource can be created inside the Puppet DSL by doing the following:

$users = hiera('users')
create_resources('user', $users)

Both examples are functionally identical, except the first one only uses Hiera to get the password hash value, whereas the second one grabs both the attributes, and their values, for a specific resource. Imagine Puppet gives you an error with the ‘gary’ user resource and you were using the latter example. You grep your Puppet code looking for ‘gary’, but you won’t find that user resource in your Puppet manifest anywhere (because it’s being created with the create_resources() function). You will instead have to know to go into Hiera’s data directory, then the correct datafile, and then look for the hash of values for the ‘gary’ user.

Functional differences between the two approaches

Functionally, you COULD do this either way. When you come up with a solution using create_resources(), I challenge you to draw up another solution using Puppet code in a Puppet manifest (however lengthy it may be) that queries Hiera for ONLY the specific values necessary. Consider this example, but, instead, you need to manage 500 users. If you use create_resources(), you would then need to add 500 more blocks to the ‘users’ parameter in your Hiera datafiles. That’s a lot of YAML. And at what level will you add these blocks? prod.yaml? dev.yaml? Are you using a common.yaml? Your YAML files suddenly got huge, and the rest of your team modifying them will not be so happy to scroll through 500 entries. Now consider the first example using Puppet code. Your Puppet manifest suddenly grew, but it didn’t affect all the OTHER manifests out there: only this file. The Hiera YAML files will still grow – but now by 500 individual lines instead of the 3000 lines in the previous example. Okay, now which one is more LEGIBLE? I would argue that the Puppet manifest is, because I consider the Puppet DSL to be very legible (again, subject to debate versus YAML). Moreover, when debugging, you can stay inside Puppet files more often when using Puppet manifests to define your resources. Using create_resources(), you need to jump into Hiera more often. That’s a context shift, which adds more annoyance to debugging. Also, it creates multiple “sources of truth.” Suddenly you have the ability to enter data in Hiera as well as in the Puppet manifest, which may be clear to YOU, but if you leave the company, or you get another person on your team, they may choose to abuse the Hiera settings without knowing why.

Now consider an example that you might say is more tailored to create_resources(). Say you have a defined type that sets up Tomcat applications. This defined type accepts things like a path to install the application, the application’s package name, the version, which Tomcat installation to target, and so on. Now consider that all application servers need application1, but only a couple of servers need application2, and a very snowflake server needs application3 (in this case, we’re NOT saying that all applications are on all boxes and that their data, like the version they’re using, is different. We’re actually saying that different machines require entirely different applications).

Using Hiera + create_resources(), you could enter the resource for application1 at a low level, then, at a higher level, add the resource for application2, and finally add the resource for application3 at the node-specific level. In the end, you do a hiera_hash() lookup to discover and concatenate all resources from all levels of the hierarchy and pipe that to create_resources().
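That approach, sketched out (the ‘tomcat_apps’ key and the tomcat::application defined type are hypothetical):

```yaml
# common.yaml -- every application server gets application1
tomcat_apps:
  'application1':
    version: '1.2.3'
```

```yaml
# nodes/snowflake.example.com.yaml -- this one node also gets application3
tomcat_apps:
  'application3':
    version: '0.1.0'
```

```puppet
# hiera_hash() merges the 'tomcat_apps' hashes from every level of the
# hierarchy, then create_resources() declares one tomcat::application
# resource per key in the merged hash.
$apps = hiera_hash('tomcat_apps')
create_resources('tomcat::application', $apps)
```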

How would you do this with Puppet code? Well, I would create profiles for every application, and either different roles for the different kinds of servers (i.e. the snowflake machine gets its own role), or conditional checks inside the role (i.e. if this node is at the London location, it gets these application profiles, and so on).
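The pure-Puppet alternative, as a sketch (the role, profile, and fact names are invented):

```puppet
class role::application_server {
  # Every application server gets application1...
  include profiles::tomcat::application1

  # ...and extra applications are layered on conditionally, in plain
  # sight, instead of being assembled from Hiera hashes.
  if $::location == 'london' {
    include profiles::tomcat::application2
  }
}
```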

Now which is more legible? At this point, I’d still say that separate profiles and conditional checks in roles (or sub-roles) are more legible – including a class is a logical thing to follow, and conditionals inside Puppet code are easy to follow. The create_resources() solution just becomes magic. Suddenly, applications are on the node. If you want to know where they came from, you have to switch contexts and open Hiera data files or use the hiera binary and do a debug run. If you’re a small team that’s been using Puppet forever, then rock on and go for it. If you’re just getting started, though, I’d shy away.

Final word on create_resources?

Some people, when confronted with a problem, think "I know, I'll use create_resources()."
Now they have two problems.

The create_resources() function is often called the “PSE Swiss Army knife” (or, Professional Services Engineer – the people who do what I do and consult with our customers) because we like to break it out when we’re painted into a corner by customer requirements. It will work ANYWHERE, but, again, at that cost of visibility. I am okay with someone using it so long as they understand the cost of visibility and the potential debugging issues they’ll hit. I will always argue against using it, however, for those reasons. More code in a Puppet manifest is not a bad thing…especially if it’s reasonably legible code that can be kept to a specific class. Consider the needs and experience level of your team before using create_resources() – if you don’t have a good reason for using it, simply don’t.

create_resources()

PROS:

  • Dynamically iterate and create resources based on Hiera data
  • Using Hiera’s hash merging capability, you can functionally override resource values at higher levels of the hierarchy

CONS:

  • Decreased visibility
  • Becomes a second ‘source of truth’ to Puppet
  • Can increase confusion about WHERE to manage resources
  • When used too much, it creates a DSL to Puppet’s DSL (DSLs all the way down)

Puppet DSL + single Hiera lookup

PROS:

  • More visible (sans the bit of data you’re looking up)
  • Using wrapper classes allows for flexibility and conditional inclusion of resources/classes

CONS:

  • Very explicit – doesn’t have the dynamic overriding capability like Hiera does

Using Hiera as an ENC

One of the early “NEAT!” moments everyone has with Hiera is using it as an External Node Classifier, or ENC. There is a function called hiera_include() that allows you to include classes into the catalog as if you had written “include classname” in a Puppet manifest. It works like this:

london.yaml
classes:
  - profiles::london::base
  - profiles::london::network
dev.yaml
classes:
  - profiles::tomcat::application2
site.pp
node default {
  hiera_include('classes')
}

Given the above example, the hiera_include() function will search every level of the hierarchy looking for a parameter called ‘classes’. It returns a concatenated list of classnames, which it then passes to Puppet’s include() function (in the end, Puppet will declare the profiles::london::base, profiles::london::network, and profiles::tomcat::application2 classes). Puppet puts the contents of these classes into the catalog, and away we go. This is awesome because you can change the classification of a node conditionally according to a Hiera lookup, and it’s terrible because you can CHANGE THE CLASSIFICATION OF A NODE CONDITIONALLY ACCORDING TO A HIERA LOOKUP! This means that anyone with access to the repo holding your Hiera data files can effect changes to every node in Puppet just by modifying a magical key. It also means that in order to see the classification for a node, you need to do a Hiera lookup (i.e. you can’t just open a file and see it).

Remember that WHOLE blog post about Roles and Profiles? I do, because I wrote the damn thing. You can even go back and read it again, too, if you want to. One of the core tenets of that article was that each node get classified with a single role. If you adhere to that (and you should; it makes for a much more logical Puppet deployment), a node really only ever needs to be classified ONCE. You don’t NEED this conditional classification behavior. It’s one of those “It seemed like a good idea at the time” moments that I assure you will pass.

Now, you CAN use Roles with hiera_include() – simply create a Facter fact that returns the node’s role, add a level to the Hiera hierarchy for this role fact, and in the role’s YAML file in Hiera, simply do:

appserver.yaml
classes:
  - role::application_server

Then you can use the same hiera_include() call in the default node definition in site.pp. The ONLY time I recommend this is if you don’t already have some other classification method. The downside of this method is that if your role fact CHANGES, for some reason or another, classification immediately changes. Facts are NOT secure – they can be overridden really easily. I don’t like to leave classification to an insecure method that anyone with root access on a machine can change. Using an ENC or site.pp for classification means that the node ABSOLUTELY CANNOT override its classification. It’s the difference between being authoritative and simply ‘suggesting’ a classification.
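For completeness, the role fact itself can be as simple as an external fact dropped on the node (the path below is typical for Facter 2-era installs; adjust for your version), which also illustrates the security caveat above, since anyone with root can rewrite it:

```yaml
# /etc/facter/facts.d/role.yaml -- an external fact; Facter exposes
# this as $::role on the next run. Writable by root, hence insecure.
role: 'application_server'
```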

PROS:

  • Dynamic classification: no need to maintain a site.pp file or group in the Console
  • Fact-based: a node’s classification can change immediately when its role fact does

CONS:

  • Decreased visibility: need to do a Hiera lookup to determine classification
  • Insecure: since facts are insecure and can be overridden, so can classification

Puppetconf 2014 Talk - the Refactor Dance

This year at Puppetconf 2014, I presented a 1.5 hour talk entitled “The Refactor Dance” that comprised nearly EVERYTHING that I’ve written about in my Puppet Workflows series (from writing better component modules, to Roles/Profiles, to Workflow, and lots of stories in-between) as well as a couple of bad words, a pair of leather pants (trousers), and an Uber story that beats your Uber story. It’s long, informative, and you get to watch the sweat stains under my arms grow in an attractive grey Puppet Labs shirt. What’s not to love?

To watch the video, click here to check it out!

On Dependencies and Order

This blog post was born out of a number of conversations that I’ve had about Puppet, its dependency model, and why ‘ordering’ is not necessarily the way to think about dependencies when writing Puppet manifests. Like most everything on this site, I’m getting it down in a file so I don’t have to repeat this all over again the next time someone asks. Instead, I can point them to this page (and, when they don’t actually READ this page, I can end up explaining everything I’ve written here anyways…).

Before we go any further, let me define a couple of terms:

dependencies     - In a nutshell, what happens when you use the metaparameters of
                   'before', 'require', 'subscribe' or 'notify' on resources in a
                   Puppet manifest: it's a chain of resources that are to be
                   evaluated in a specific order every time Puppet runs. Any failure
                   of a resource in this chain stops Puppet from evaluating the
                   remaining resources in the chain.

evaluate         - When Puppet determines the 'is' value (or current state) of a
                   resource (i.e. for package resources, "is the package installed?")

remediate        - When Puppet determines that the 'is' value (or current state of
                   the resource) is different from the 'should' value (or the value
                   entered into the Puppet manifest...the way the resource SHOULD
                   end up looking on the system) and Puppet needs to make a change.

declarative(ish) - When I use the word 'declarative(ish)', I mean that the order
                   by which Puppet evaluates resources that do not contain dependencies
                   does not have a set procedure/order. The way Puppet EVALUATES
                   resources does not have a set procedure/order, but the order
                   that Puppet reads/parses manifest files IS from top-to-bottom
                   (which is why variables in Puppet manifests need to be declared
                   before they can be used).

Why Puppet doesn’t care about execution order (until it does)

The biggest shock to the system when getting started with a declarative(ish) configuration management tool like Puppet is understanding that Puppet describes the end-state of the machine, NOT the order in which it (Puppet) is going to take you to that state. To Puppet, the order in which it chooses to affect change in any resource (be it a file to be corrected, a package to be installed, or any other resource type) is entirely arbitrary, because resources that have no relationship to another resource shouldn’t CARE about the order in which they’re evaluated and remediated.

For example, imagine Puppet is going to create both /etc/sudoers and update the system’s authorized keys file to enter all the sysadmins’ SSH keys. Which one should it do first? In an imperative system like shell scripts or a runbook-style system, you are forced to choose an order. So I ask again, which one goes first? If you try to update the sudoers file in your script first, and there’s a problem with that update, then the script fails and the SSH keys aren’t installed. If you switch the order and there’s a problem with the SSH keys, then you can’t sudo up because the sudoers file hasn’t been touched.

Because of this, Puppet has always taken the stance that if there are failures, we want to get as much of the system into a working state as possible (i.e. any resources that don’t depend upon the failing resource are going to still be evaluated, or ‘inspected’, and remediated, or ‘changed if need be’). There are definitely philosophical differences here: the argument can be made that if there’s a failure somewhere, the system is bad and you should cast it off until you’ve fixed whatever the problem is (or the part of the code causing the problem). In virtualized or ‘cloud’ environments where everything is automated, this is just fine, but in environments without complete and full automation, sometimes you have to fix and deal with what you have. Puppet “believes in your system”, which is borderline marketing-doubletalk for “alert you of errors and give you time to fix the damn thing and do another Puppet run without having to spin up a whole new system.”

Once you know WHY Puppet takes the stance it does, you realize that Puppet does not give two shits about the order of resources without dependencies. If you write perfect Puppet code, you’re fine. But the majority of the known-good-world does not do that. In fact, most of us write shit code. Which was the problem…

The history of Puppet’s ordering choices

‘Random’ random order

In the early days, the only resources that were guaranteed to have a consistent order were those resources with dependencies (i.e. as I stated above, resources that used the ‘before’, ‘require’, ‘subscribe’, or ‘notify’ metaparameters to establish an evaluation order). Every other resource was evaluated at random every time that Puppet ran…which meant that you could run Puppet ten times and, theoretically, resources without dependencies could be evaluated in a different order between every Puppet run (we call this non-deterministic ordering). This made things REALLY hard to debug. Take the case where you had a catalog of thousands of resources but you forgot a SINGLE dependency between a couple of file resources. If you rolled that change out to 1000 nodes, you might have 10 or fewer of them fail (because Puppet chose an evaluation order that ordered those two resources incorrectly). Imagine trying to figure out what happened and replicate the problem. You could waste lots of time just trying to REPLICATE the issue, even if it was a small fix like this.

PROS:

  • IS there a pro here?

CONS:

  • Ordering could change between runs, and thus it was very hard to debug missing dependencies

Philosophically, we were correct: resources that are to be evaluated in a certain order require dependencies. Practically, we were creating more work for ourselves.
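To make those dependency metaparameters concrete, here’s a minimal sketch (the classic package/config/service pattern; resource titles are illustrative, not from any specific module) showing the two flavors of explicit ordering:

```puppet
# 'require' says this resource is evaluated AFTER the named resource:
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  content => template('ntp/ntp.conf.erb'),
  require => Package['ntp'],   # package first, then the config file
}

# 'subscribe' adds ordering AND triggers a refresh (restart) on change:
service { 'ntpd':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/ntp.conf'],  # restart if the file changes
}
```

These three resources will be evaluated in the same order on every run, on every node, no matter what ordering mode Puppet is using — because the order is part of the model, not an accident of evaluation.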

Incidentally, I’d heard that Adam Jacob, who created Chef, had cited this reason as one of the main motivators for creating Chef. I’d heard that as a Puppet consultant, he would run into these buried dependency errors and want to flip tables. Even if it’s not a true STORY, it was absolutely true for tables where I used to work…

Title-hash, ‘Predictable’ random order

Cut to Puppet version 2.7 where we introduced deterministic ordering with ‘title-hash’ ordering. In a nutshell, resources that didn’t have dependencies would still be executed in a random order, but the order Puppet chose could be replicated (it created a SHA1 hash based on the titles of the resources without dependencies, and ordered the hashes alphabetically). This meant that if you tested out a catalog on a node, and then ran that same catalog on 1000 other nodes, Puppet would choose the same order for all 1000 of the nodes. This gave you the ability to actually TEST whether your changes would successfully run in production. If you omitted a dependency, but Puppet managed to pick the correct evaluation order, you STILL had a missing dependency, but you didn’t care about it because the code worked. The next time you changed the catalog (by adding or removing resources), the order might change, but you would discover and fix the dependency at that time.

PROS:

  • ‘Predictable’ and repeatable order made testing possible

CONS:

  • Easy to miss dependency omissions if Puppet chose the right order (but do you really care?)

Manifest ordering, the ‘bath salts’ of ordering

Title-hash ordering seemed like the best of both worlds – being opinionated about resource dependencies but also giving sysadmins a reliable, and repeatable, way to test evaluation order before it’s pushed out to production.

Buuuuuuuuuut, y’all JUST weren’t happy enough, were you?

When you move from an imperative solution like scripts to a declarative(ish) solution like Puppet, it is absolutely a new way to think about modeling your system. Frequently we heard that people were having issues with Puppet because the order in which resources show up in a Puppet manifest WASN’T the order in which Puppet would evaluate those resources. I just dropped a LOT of words explaining why that is, but who really has the time to read up on all of this? People were dismissing Puppet too quickly because their expectations of how the tool worked didn’t align with reality. The idea, then, was to align those expectations in the hopes that people wouldn’t dismiss Puppet so quickly.

Eric Sorenson wrote a blog post on our thesis and experimentation around manifest ordering that is worth a read (and, incidentally, is shorter than this damn post), but the short version is that we tested this theory out and determined that Manifest Ordering would help users who are new to Puppet. Because of this work, we created a feature called ‘Manifest Ordering’ that stated that resources that DID NOT HAVE DEPENDENCIES would be evaluated by Puppet in the order that they showed up in the Puppet manifest (when read top to bottom). If a resource truly does not have any dependencies, then you honestly should not care one bit what order it’s evaluated (because it doesn’t matter). Manifest Ordering made ordering of resources without dependencies VERY predictable.
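If you want to choose a mode explicitly, the behavior is controlled by the ordering setting (this assumes Puppet 3.3.0 or later, where the setting was introduced, and it affects puppet agent and puppet apply). A sketch:

```ini
# puppet.conf — sketch, assumes Puppet 3.3.0+
[main]
# allowed values: manifest, title-hash, random
# set it back to 'title-hash' if Manifest Ordering proves too tempting
ordering = manifest
```

Setting it to ‘random’ is actually a decent way to smoke out missing dependencies in a test environment, since every run shuffles the undeclared edges.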

But….

This doesn’t mean I think it’s the best thing in the world. In fact, I’m really wary of how I feel people will come to use Manifest Ordering. There’s a reason I called it the “bath salts of ordering” – because a little bit of it, when used correctly, can be a lovely thing, but too much of it, used in unintended circumstances, leads to hypothermia, paranoia, and the desire to gnaw someone else’s face off. We were/are giving you a way to bypass our dependency model by using the mental-model you had with scripts, but ALSO telling you NOT to rely on that mental-model (and instead set dependencies explicitly using metaparameters).

Seriously, what could go wrong?

Manifest Ordering is not a substitution for setting dependencies – that IS NOT what it was created for. Puppet Labs still maintains that you should use dependencies to order resources and NOT simply rely on Manifest Ordering as a form of setting dependencies! Again, the problem is that you need to KNOW this…and if Manifest Ordering allows you to keep the same imperative “mindset” inside a declarative(ish) language, then eventually you’re going to experience pain (if not today, then possibly later when you actually try to refactor code, or share code, or use this code on a system that ISN’T using Manifest Ordering). A declarative(ish) language like Puppet requires seeing your systems according to the way their end-state will look and worrying about WHAT the system will look like, and not necessarily HOW it will get there. Any shortcut to understanding this process means you’re going to miss key bits of what makes Puppet a good tool for modeling this state.

PROS:

  • Evaluation order of resources without dependencies is absolutely predictable

CONS:

  • If used as a substitution for setting dependencies, then refactoring code (moving around the order in which resources show up in a manifest) means changing the evaluation order

What should I actually take from this?

Okay, here’s a list of things you SHOULD be doing if you don’t want to create a problem for future-you or future-organization:

  • Use dependency metaparameters like ‘before’, ‘require’, ‘notify’, and ‘subscribe’ if resources in a catalog NEED to be evaluated in a particular order
  • Do not use Manifest Ordering as a substitute for explicitly setting dependencies (disable it if this is too tempting)
  • Use Roles and Profiles for a logical module layout (see: http://bit.ly/puppetworkflows2 for information on Roles and Profiles)
  • Order individual components inside the Profile
  • Order Profiles (if necessary) inside the Role

And, seriously, trust us with the explicit dependencies. It seems like a giant pain in the ass initially, but you’re ultimately documenting your infrastructure, and a dependency (or, saying ‘this thing MUST come before that thing’) is a pretty important decision. There’s a REASON behind it – treat it with some more weight other than having one line come before another line, ya know? The extra time right now is absolutely going to buy you the time you spend at home with your kids (and by ‘kids’, I mean ‘XBox’).
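As a rough sketch of those last two bullets (all class names here are hypothetical), ordering components inside a profile, and profiles inside a role, tends to look like this with chaining arrows:

```puppet
# Hypothetical profile: order only the pieces that genuinely depend
# on one another; everything else is free to float.
class profiles::webserver {
  class { 'repo::corporate': } ->
  class { 'apache': }
  # 'repo before apache' is a real dependency worth documenting
}

# Hypothetical role: order whole profiles only if necessary.
class roles::app_server {
  include profiles::base
  include profiles::webserver

  # the base profile must be fully applied before the webserver profile
  Class['profiles::base'] -> Class['profiles::webserver']
}
```

Note that the arrows document a decision (“this MUST come first”), which is exactly the weight a dependency deserves.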

And don’t use bath salts, folks.

R10k + Directory Environments

If you’ve read anything I’ve posted in the past year, you know my feelings about the word ‘environments’ and about how well we tend to name things here at Puppet Labs (and if you don’t, you can check out that post here). Since then, Puppet Labs has released a new feature called directory environments (click this link for further reading) that replace the older ‘config file environments’ that we all used to use (i.e. stanzas in puppet.conf). Directory environments weren’t without their false starts and issues, but further releases of Puppet, and their inclusion in Puppet Enterprise 3.3.0, have allowed more people to ask about them. SO, I thought I’d do a quick writeup about them…

R10k had a child: Directory Environments

The Puppet platform team had a couple of problems with config file environments in puppet.conf – namely:

  • Entering them in puppet.conf meant that you couldn’t use environments named ‘master’, ‘main’, or ‘agent’
  • There was no easy/reliable way to determine all the available/used Puppet environments without making assumptions (and hacky code) – especially if someone were using R10k + dynamic environments
  • Adding more environments to puppet.conf made managing that file something of a nightmare (environments.d anyone?)

Combine this with the fact that most of the Professional Services team was rolling out R10k to create dynamic environments (which meant we were abusing $environment inside puppet.conf and creating environments…well… dynamically and on-the-fly), and they knew something needed to be done. Because R10k was so popular and widely deployed, an environment solution that was a simple step-up from an R10k deployment was made the target, and directory environments were born.

How does it work?

Directory environments, essentially, are born out of a folder on the Puppet master (typically $confdir/environments, where $confdir is /etc/puppetlabs/puppet in Puppet Enterprise) wherein every subfolder is a new Puppet environment. Every subfolder contains a couple of key items:

  • A modules folder containing all modules for that environment
  • A manifests/site.pp file containing the site.pp file for that environment
  • A new environment.conf file which can be used to set the modulepath, the environment_timeout, and, a new and often-requested feature, the ability to have environment-specific config_version settings

Basically, it’s everything that R10k ALREADY does with a couple of added goodies dropped into an environment.conf file. Feel free to read the official docs on configuring directory environments for further information on all of the goodies!
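Laid out on disk, the environments folder might look something like this (paths assume Puppet Enterprise’s $confdir; the environment and module names match the demo repo used later in this post):

```text
/etc/puppetlabs/puppet/environments/
├── production/
│   ├── environment.conf
│   ├── manifests/
│   │   └── site.pp
│   └── modules/
│       ├── apache/
│       ├── ntp/
│       └── profiles/
└── webinar_env/
    ├── environment.conf
    ├── manifests/
    │   └── site.pp
    └── modules/
```

Every subfolder of environments/ is an environment; delete the folder and the environment is gone. That’s the whole trick.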

Cool, how do we set it up?

It wouldn’t be one of my blog posts if it didn’t include exact steps to configure shit, would it? For this walkthrough, I’m using a CentOS 6.5 VM with DNS working (i.e. the node can ping itself and knows its own hostname and FQDN), and I’ve already performed an all-in-one installation of Puppet Enterprise 3.3.0. For the walkthrough, we’re going to set up:

  • Directory environments based on a control repo
  • Hiera data inside a hieradata folder in the control repo
  • Hiera to use the per-environment hieradata folder

Let’s start to break down the components:

The ‘Control Repo’?

Sometime between my initial R10k post and THIS post, the Puppet Labs PS team has come to call the repository that contains the Puppetfile and is used to track Puppet environments on all Puppet masters the ‘Control Repo’ (because it ‘Controls the creation of Puppet environments’, ya dig? Zack Smith and James Sweeny are actually pretty tickled about making that name stick). For the purpose of this demonstration, I’m using a repository on Github:

https://github.com/glarizza/puppet_repository

Everything you will need for this walkthrough is in that repository, and we will refer to it frequently. You DO NOT need to use my repository (eventually you’ll want to create your OWN), but it’s there for reference purposes (and to give you a couple of Puppet manifests to make setup a bit easier).
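The heart of a control repo is its Puppetfile, which tells R10k what to deploy into each environment. A trimmed-down sketch of what one might contain (the Forge versions match the module list shown later in this post; the git URL for the profiles module is hypothetical):

```ruby
# Puppetfile — sketch; R10k deploys these modules into the environment
forge 'https://forgeapi.puppetlabs.com'

# Modules from the Puppet Forge, pinned to a version
mod 'puppetlabs/ntp',    '3.1.2'
mod 'puppetlabs/apache', '1.1.1'

# A module tracked from your own git repository (URL is hypothetical)
mod 'profiles',
  :git => 'https://github.com/yourorg/profiles.git',
  :ref => 'master'
```

Each git branch of the control repo becomes a Puppet environment, and each branch carries its own Puppetfile, so different environments can pin different module versions.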

Configuring the Puppet master

We’re going to first clone my control repo to /tmp so we can use it to configure R10k and the Puppet master itself:

[root@master ~]# cd /tmp

[root@master /tmp]# git clone https://github.com/glarizza/puppet_repository.git
Initialized empty Git repository in /tmp/puppet_repository/.git/
remote: Counting objects: 164, done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 164 (delta 54), reused 81 (delta 16)
Receiving objects: 100% (164/164), 22.68 KiB, done.
Resolving deltas: 100% (54/54), done.

[root@master /tmp]# cd puppet_repository

Great, I’ve cloned my repo. To configure R10k, we’re going to pull down Zack Smith’s R10k module from the Forge with puppet module install zack/r10k and then use puppet apply on a manifest in my repo with puppet apply configure_r10k.pp. DO NOTE: If you want to use YOUR Control Repo, and NOT the one I use on Github, then you need to modify the configure_r10k.pp file and replace the remote property with the URL to YOUR Control Repo as hosted in its git repository!

[root@master /tmp/puppet_repository:production]# puppet module install zack/r10k

Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forgeapi.puppetlabs.com ...
Notice: Found at least one version of puppetlabs-stdlib compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-inifile compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-vcsrepo compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-concat compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v2.2.7)
  ├─┬ gentoo-portage (v2.2.0)
  │ └── puppetlabs-concat (v1.0.3) [/opt/puppet/share/puppet/modules]
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.2.0)
  ├── puppetlabs-git (v0.2.0)
  ├── puppetlabs-inifile (v1.1.0) [/opt/puppet/share/puppet/modules]
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.2.1)
  ├── puppetlabs-stdlib (v3.2.2) [/opt/puppet/share/puppet/modules]
  └── puppetlabs-vcsrepo (v1.1.0)

[root@master /tmp/puppet_repository:production]# puppet apply configure_r10k.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.71 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/type.rb:816:in `set_default')
Notice: /Stage[main]/R10k::Install/Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}5cda58e8a01e7ff12544d30105d13a2a'
Notice: Finished catalog run in 11.24 seconds

Performing those commands will successfully set up R10k to point to my Control Repo out on Github (and, again, if you don’t WANT that, then you need to make the change to the remote property in configure_r10k.pp). We next need to configure directory environments in puppet.conf by setting two attributes:

  • environmentpath (Or the path to the folder containing environments)
  • basemodulepath (Or, the set of modules that will be shared across ALL ENVIRONMENTS)

I have created a Puppet manifest that will set these attributes, and this manifest requires the puppetlabs/inifile module from the Puppet Forge. Fortunately, since I’m using Puppet Enterprise, that module is already installed. If you’re using open source Puppet and the module is NOT installed, feel free to install it by running puppet module install puppetlabs/inifile. Once this is done, go ahead and execute the manifest by running puppet apply configure_directory_environments.pp:

[root@master /tmp/puppet_repository:production]# puppet apply configure_directory_environments.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.05 seconds
Notice: /Stage[main]/Main/Ini_setting[Configure environmentpath]/ensure: created
Notice: /Stage[main]/Main/Ini_setting[Configure basemodulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '$confdir/modules:/opt/puppet/share/puppet/modules'
Notice: Finished catalog run in 0.20 seconds

The last step to configuring the Puppet master is to execute an R10k run. We can do that by running r10k deploy environment -pv:

[root@master /tmp/puppet_repository:production]# r10k deploy environment -pv

[R10K::Source::Git - INFO] Determining current branches for "https://github.com/glarizza/puppet_repository.git"
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment webinar_env
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Great! Everything should be set up (if you’re using my repo)! My repository has a production branch, which matches the name of Puppet’s default environment, so we can test that everything works by listing out all modules in the main production environment with the puppet module list command:

[root@master /tmp/puppet_repository:production]# puppet module list

Warning: Module 'puppetlabs-stdlib' (v3.2.2) fails to meet some dependencies:
  'puppetlabs-ntp' (v3.1.2) requires 'puppetlabs-stdlib' (>= 4.0.0)
/etc/puppetlabs/puppet/environments/production/modules
├── notifyme (???)
├── profiles (???)
├── puppetlabs-apache (v1.1.1)
└── puppetlabs-ntp (v3.1.2)
/etc/puppetlabs/puppet/modules
├── gentoo-portage (v2.2.0)
├── mhuffnagle-make (v0.0.2)
├── puppetlabs-gcc (v0.2.0)
├── puppetlabs-git (v0.2.0)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.2.1)
├── puppetlabs-vcsrepo (v1.1.0)
└── zack-r10k (v2.2.7)
/opt/puppet/share/puppet/modules
├── puppetlabs-apt (v1.5.0)
├── puppetlabs-auth_conf (v0.2.2)
├── puppetlabs-concat (v1.0.3)
├── puppetlabs-firewall (v1.1.2)
├── puppetlabs-inifile (v1.1.0)
├── puppetlabs-java_ks (v1.2.4)
├── puppetlabs-pe_accounts (v2.0.2-3-ge71b5a0)
├── puppetlabs-pe_console_prune (v0.1.1-4-g293f45b)
├── puppetlabs-pe_mcollective (v0.2.10-15-gb8343bb)
├── puppetlabs-pe_postgresql (v1.0.4-4-g0bcffae)
├── puppetlabs-pe_puppetdb (v1.1.1-7-g8cb11bf)
├── puppetlabs-pe_razor (v0.2.1-1-g80acb4d)
├── puppetlabs-pe_repo (v0.7.7-32-gfd1c97f)
├── puppetlabs-pe_staging (v0.3.3-2-g3ed56f8)
├── puppetlabs-postgresql (v2.5.0-pe2)
├── puppetlabs-puppet_enterprise (v3.2.1-27-g8f61956)
├── puppetlabs-reboot (v0.1.4)
├── puppetlabs-request_manager (v0.1.1)
└── puppetlabs-stdlib (v3.2.2)  invalid

Notice a couple of things:

  • First, I’ve got some dependency issues…oh well, nothing that’s a game-stopper
  • Second, the path to the production environment’s modules is correct at: /etc/puppetlabs/puppet/environments/production/modules

Configuring Hiera

The last dinghy to be configured on this dreamboat is Hiera. Hiera is Puppet’s data lookup mechanism, and is used to gather specific bits of data (such as versions of packages, hostnames, passwords, and other business-specific data). Explaining HOW Hiera works is beyond the scope of this article, but configuring Hiera data on a per-environment basis IS absolutely a worthwhile endeavor.

In this example, I’m going to demonstrate coupling Hiera data with the Control Repo for simple replication of Hiera data across environments. You COULD also choose to put your Hiera data in a separate repository and set it up in /etc/r10k.yaml as another source, but that exercise is left to the reader (and if you’re interested, I talk about it in this post).
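If you did want Hiera data in its own repository, the legacy /etc/r10k.yaml format supports multiple sources. A hypothetical sketch (the hiera remote URL and its basedir are made up for illustration):

```yaml
# /etc/r10k.yaml — sketch: one source for Puppet code, one for Hiera data
:cachedir: '/var/cache/r10k'
:sources:
  :puppet:
    remote: 'https://github.com/glarizza/puppet_repository.git'
    basedir: '/etc/puppetlabs/puppet/environments'
  :hiera:
    remote: 'https://github.com/yourorg/hiera_data.git'   # hypothetical
    basedir: '/etc/puppetlabs/puppet/hieradata'
```

With two sources, code and data get independent branching/merge workflows, at the cost of keeping two repos’ branches in sync.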

You’ll notice that my demonstration repository ALREADY includes Hiera data, and so that data is automatically being replicated to all environments. By default, Hiera’s configuration file (hiera.yaml) has no YAML data directory specified, so we’ll need to make that change. In my demonstration control repository, I’ve included a sample hiera.yaml, but let’s take a look at one below:

## /etc/puppetlabs/puppet/hiera.yaml

---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{application_tier}"
  - common

:yaml:
# When specifying a datadir, make sure the directory exists:
  :datadir: "/etc/puppetlabs/puppet/environments/%{environment}/hieradata"

This hiera.yaml file specifies a hierarchy with three levels – a node-specific level, a level for different application tiers (like ‘dev’, ‘test’, and ‘prod’), and a common level – and finally makes the change we need: mapping the data directory to each environment’s hieradata folder. The path to hiera.yaml is Puppet’s configuration directory (which is /etc/puppetlabs/puppet for Puppet Enterprise, or /etc/puppet for the open source version of Puppet), so open the file there, make your changes, and finally restart the Puppet master service to have the changes picked up.
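For reference, the data files that feed this hierarchy live in each environment’s hieradata folder and look something like this (the ‘message’ values match what my demo repository returns in the lookups below):

```yaml
# hieradata/common.yaml — the bottom of the hierarchy
---
message: 'This node is using common data'
```

```yaml
# hieradata/dev.yaml — matched when the application_tier fact is 'dev'
---
message: 'You are in the development application tier'
```

Hiera walks the hierarchy top-down and returns the first match, so a value in dev.yaml wins over the same key in common.yaml.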

Next, let’s perform a test by executing the hiera binary from the command line before running puppet:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=webinar_env -d
DEBUG: 2014-08-31 19:55:44 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:55:44 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:55:44 +0000: Looking for data source common
DEBUG: 2014-08-31 19:55:44 +0000: Found message in common
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=bad_env -d
DEBUG: 2014-08-31 19:58:22 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:58:22 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:58:22 +0000: Looking for data source common
DEBUG: 2014-08-31 19:58:22 +0000: Cannot find datafile /etc/puppetlabs/puppet/environments/bad_env/hieradata/common.yaml, skipping
nil

You can see that for the first example, I passed the environment of production and did a simple lookup for a key called message – Hiera then returned the value out of that environment’s common.yaml file. Next, I did another lookup, but added -d to enable debug mode (debug mode on the hiera binary is REALLY handy for debugging problems with Hiera – combine it with specifying values from the command line, and you can pretty quickly simulate what value a node is going to get). Notice the last example where I specified an invalid environment – Hiera logged that it couldn’t find the datafile requested and ultimately returned a nil, or empty, value.

Since we’re working on the Puppet master machine, we can even check for a value using puppet apply combined with the notice function:

[root@master /etc/puppetlabs/puppet/environments]# puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): This node is using common data
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.19 seconds

Great, it’s working, but let’s look at pulling data from a higher level in the hierarchy – like from the application_tier level. We haven’t defined an application_tier fact, however, so we’ll need to fake it. First, let’s do that with the hiera binary:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production application_tier=dev -d
DEBUG: 2014-08-31 20:04:12 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 20:04:12 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 20:04:12 +0000: Looking for data source dev
DEBUG: 2014-08-31 20:04:12 +0000: Found message in dev
You are in the development application tier

And then also with puppet apply:

[root@master /etc/puppetlabs/puppet/environments]# FACTER_application_tier=dev puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): You are in the development application tier
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.18 seconds

Tuning environment.conf

The brand-new, per-environment environment.conf file is meant to be (for the most part) a one-stop-shop for your Puppet environment tuning needs. Right now, the only things you’ll need to tune will be the modulepath, config_version, and possibly the environment_timeout.

Module path

Before directory environments, every environment had its own modulepath that needed to be tuned to allow for modules that were to be used by this machine/environment, as well as shared modules. That modulepath worked like $PATH in that it was a priority-based lookup for modules (i.e. the first directory in modulepath that had a module matching the module name you wanted won). It also previously required the FULL path to be used for every path in modulepath.

Those days are over.

As I mentioned before, the main puppet.conf configuration file has a new parameter called basemodulepath that can be used to specify modules that are to be shared across ALL environments. Paths defined here (typically $confdir/modules and /opt/puppet/share/puppet/modules) are usually put at the END of a modulepath so Puppet can search for any overridden modules that show up in earlier modulepath paths. In the previous configuration steps, we executed a manifest that set up basemodulepath to look like:

basemodulepath = $confdir/modules:/opt/puppet/share/puppet/modules

Again, feel free to add or remove paths (except don’t remove /opt/puppet/share/puppet/modules if you’re using Puppet Enterprise, because that’s where all Puppet Enterprise modules are located), especially if you’re using a giant monolithic repo of modules (which was typically done before things like R10k evolved).

With basemodulepath configured, it’s now time to configure the modulepath to be defined for every environment. My demonstration control repo contains a sample environment.conf that defines a modulepath like so:

modulepath = modules:$basemodulepath

You’ll notice, now, that there are relative paths in modulepath. This is possible because each environment now contains an environment.conf, and thus relative paths make sense. In this example, nodes in the production environment (/etc/puppetlabs/puppet/environments/production) will look for a module by its name FIRST in a folder called modules inside the current environment folder (i.e. /etc/puppetlabs/puppet/environments/production/modules/<module_name>). If the module isn’t found there, Puppet looks for the module in the order that paths are defined for basemodulepath above. If Puppet fails to find a module in ANY of the paths, a compile error is raised.

Per-environment config_version

Setting config_version has been around for awhile – hell, I remember video of Jeff McCune talking about it at the first Puppetcamp Europe in like 2010 – but the new directory environments implementation has fine tuned it a bit. Previously, config_version was a command executed on the Puppet master at compile time to determine a string used for versioning the configuration enforced during that Puppet run. When it’s not set it defaults to something of a time/date stamp off the parser, but it’s way more useful to make it do something like determine the most recent commit hash from a repository.

In the past, when we used a giant monolithic repository containing all Puppet modules, it was SUPER easy to get a single commit hash and be done. As everyone moved their modules into individual repositories, determining WHAT you were enforcing became harder. With the birth of R10k and the control repo, we suddenly had something we could query for the state of our modules being enforced. The problem, though, was that with multiple dynamic environments using multiple git branches, config_version wasn’t easily tuned to be able to grab the most recent commit from every branch.

Now that config_version is set in a per-environment environment.conf, we can make config_version much smarter. Again, looking in the environment.conf defined in my demonstration control repo produces this:

config_version = '/usr/bin/git --git-dir $confdir/environments/$environment/.git rev-parse HEAD'

This setting will cause the Puppet master to produce the most recent commit ID for whatever environment you’re in and embed it in the catalog and the report that is sent back to the Puppet master after a Puppet run.

I actually discovered a bug in config_version while writing this post: config_version is subject to the same relative pathing as other environment.conf settings. Relative pathing is great for things like modulepath, and it’s even fine for config_version if the script you want to run is included inside the control repo, but a one-line command that executes a binary WITHOUT the full path to that binary causes an error (because Puppet attempts to look for that binary in the current environment path, and NOT by searching $PATH on the system). Feel free to follow or comment on the bug if the mood hits you.

Caching and environment_timeout

The Puppet master loads environments on-request, but it also caches data associated with each environment to make things faster. This caching is finally tunable on a per-environment basis by defining the environment_timeout setting in environment.conf. The default setting is 3 minutes, which means the Puppet master will invalidate its caches and reload environment data every 3 minutes, but that’s now tunable. Definitely read up on this setting before making changes.
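For illustration, tuning the cache for a single environment might look like this environment.conf sketch (the values shown are examples, not recommendations):

```ini
# environment.conf for one environment
# "3m" matches the current default; "0" disables caching entirely,
# and "unlimited" caches until the master process is restarted
environment_timeout = 3m
```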

Classification

One of the last new features of directory environments is the ability to include an environment-specific site.pp file for classification. You could ALWAYS do this by modifying the manifest configuration item in puppet.conf, but now each environment can have its own manifest setting. The default behavior is for the Puppet master to look for manifests/site.pp in every environment directory, and I really wouldn’t change that unless you have a good reason. DO NOTE, however, that Puppet Enterprise defines things like the Filebucket and overrides for the File resource in site.pp, so if you’re using Puppet Enterprise you’ll need to copy those changes into the site.pp file you add to your control repo (as I did).

It may take you a couple of times to change your thinking from looking at the main site.pp in $confdir/manifests to looking at each environment-specific site.pp file, but definitely take advantage of Puppet’s commandline tool to help you track which site.pp Puppet is monitoring:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest
/etc/puppetlabs/puppet/environments/production/manifests

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment webinar_env
/etc/puppetlabs/puppet/environments/webinar_env/manifests

You can see that puppet config print can be used to get the path to the directory that contains site.pp. Even cooler is what happens when you specify an environment that doesn’t exist:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment bad_env
no_manifest

Yep, Puppet tells you if it can’t find the manifest file. That’s pretty cool.

Wrapping Up

Even though the new implementation of directory environments is meant to map closely to a workflow most of us have been using (if you’ve been using R10k, that is), there are still some new features that may take you by surprise. Hopefully this post gets you started with just enough information to setup your own test environment and start playing. PLEASE DO make sure to file bugs on any behavior that comes as unexpected or stops you from using your existing workflow. Cheers!

On R10k and ‘Environments’

There have been more than a couple of moments where I’m on-site with a customer who asks a seemingly simple question and I’ve gone “Oh shit; that’s a great question and I’ve never thought of that…” Usually that’s followed by me changing up the workflow and immediately regretting things I’ve done on prior gigs. Some people call that ‘agile’; I call it ‘me not having the forethought to consider conditions properly’.

‘Environment’, like ‘scaling’, ‘agent’, and ‘test’, has many meanings

It’s not a secret that we’ve made some shitty decisions in the past with regard to naming things in Puppet (and anyone who asks me what puppet agent -t stands for usually gets a heavy sigh, a shaken head, and an explanation emitted in dulcet, apologetic tones). It’s also very easy to conflate certain concepts that unfortunately share very common labels (quick – what’s the difference between properties and parameters, and give me the lowdown on MCollective agents versus Puppet agents!).

And then we have ‘environments’ + Hiera + R10k.

Puppet ‘environments’

Puppet has the concept of ‘environments’, which, to me, exist to provide a means of compiling a catalog using different paths to Puppet modules on the Puppet master. Using a Puppet environment is the same as saying “I made some changes to my tomcat class, but I don’t want to push it DIRECTLY to my production machines yet because I don’t drink Dos Equis. It would be great if I could stick this code somewhere and have a couple of my nodes test how it works before merging it in!”

Puppet environments suffer some ‘seepage’ issues, which you can read about here, but do a reasonable job of quickly testing out changes you’ve made to the Puppet DSL (as opposed to custom plugins, as detailed in the bug). Puppet environments work well when you need a pipeline for testing your Puppet code (again, when you’re refactoring or adding new functionality), and using them for that purpose is great.

Internal ‘environments’

What I consider ‘internal environments’ have a couple of names – sometimes they’re referred to as application or deployment gateways, sometimes as ‘tiers’, but in general they’re long-term groupings that machines/nodes are attached to (usually for the purpose of phased application deployments). They frequently have names such as ‘dev’, ‘test’, ‘prod’, ‘qa’, ‘uat’, and the like.

For the purpose of distinguishing them from Puppet environments, I’m going to refer to them as ‘application tiers’ or just ‘tiers’ because, fuck it, it’s a word.

Making both of them work

The problems with having Puppet environments and application tiers are:

  • Puppet environments are usually assigned to a node for short periods of time, while application tiers are usually assigned to a node for the life of the node.
  • Application tiers usually need different bits of data (i.e. NTP server addresses, versions of packages, etc), while Puppet environments usually use/involve differences to the Puppet DSL.
  • Similarly to the first point, the goal of Puppet environments is to eventually merge code differences into the main production Puppet environment. Application tiers, however, may always have differences about them and never become unified.

You can see where this would be problematic – especially when you might want to do things like use different Hiera values between different application tiers, but you want to TEST out those values before applying them to all nodes in an application tier. If you previously didn’t have a way to separate Puppet environments from application tiers, and you used R10k to generate Puppet environments, you would end up with things like long-lived branches in your repositories that are difficult/annoying to manage.

NOTE: This is all assuming you’re managing component modules, Hiera data, and Puppet environments using R10k.

The first step in making both monikers work together is to have two separate variables in Puppet – namely $environment for Puppet environments, and something ELSE (say, $tier) for the application tier. The “something else” is going to depend on how your workflow works. For example, do you have something centrally that can correlate nodes to the tier in which they belong? If so, you can write a custom fact that will query that service. If you don’t have this magical service, you can always just attach an application tier to a node in your classification service (i.e. the Puppet Enterprise Console or Foreman). Failing both of those, you can look to external facts. External fact support was introduced into Facter 1.7 (but Puppet Enterprise has supported them through the standard lib for quite a while). External facts give you the ability to create a text file inside the facts.d directory in the format of:

tier=qa
location=portland

Facter will read this text file and store the values as facts for a Puppet run, so $tier will be qa and $location will be portland. This is handy when you have arbitrary information that can’t be easily discovered by the node but DOES need to be assigned to the node on a reasonably consistent basis. Usually these files are created during the provisioning process, but they can also be managed by Puppet. At any rate, having $environment and $tier available allows us to start making decisions based on their values.
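As an example, a provisioning script could drop that file into facts.d with something like the following sketch (the facts.d path and file name are assumptions; on Puppet Enterprise the directory is commonly /etc/puppetlabs/facter/facts.d, so adjust for your install):

```shell
#!/bin/sh
# FACTS_D is the external facts directory; defaulting to a temp dir here
# so the sketch is safe to run anywhere without root
FACTS_D="${FACTS_D:-$(mktemp -d)/facts.d}"
mkdir -p "$FACTS_D"

# Write the key=value pairs Facter will expose as $tier and $location
cat > "$FACTS_D/node_info.txt" <<'EOF'
tier=qa
location=portland
EOF

grep '^tier=' "$FACTS_D/node_info.txt"   # prints tier=qa
```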

Branch with $environment, Hiera with $tier

Like we said above, Puppet environments are frequently short-term assignments, while application tiers are usually long-term residencies. Relating those back to the R10k workflow: branches to the main puppet repo (containing the Puppetfile) are usually short-lived, while data in Hiera is usually longer-lived. It would then make sense that the names of branches to the main puppet repo resolve to $environment (and thus the Puppet environment name), while $tier (and thus the application tier) is used in the Hiera hierarchy for lookups of values that remain different across application tiers (like package versions, credentials, and so on).
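A hierarchy built around $tier (rather than $environment) might look like this sketch of a hiera.yaml, where only the :datadir: uses the Puppet environment; the level names and paths here are examples, not prescriptions:

```yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{tier}_%{location}"   # e.g. qa_portland
  - "%{tier}"               # e.g. qa
  - global
:yaml:
  # the Puppet environment picks WHICH checkout of the data is read;
  # $tier decides which values within it apply
  :datadir: '/etc/puppetlabs/puppet/environments/%{environment}/hieradata'
```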

Wins:

  • Puppet environment names (like repository branch names) become relatively meaningless and are the “means” to the end of getting Puppet code merged into the PUPPET CODE’s production branch (i.e. code that has been tested to work across all application tiers)
  • Puppet environments become short lived and thus have less opportunity to deviate from the main production codebase
  • Differences across application tiers are locked in one place (Hiera)
  • Differences in Puppet DSL code (i.e. in manifests) can be pushed up to the profile level, and you have a fact ($tier) to catch those differences.
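That last point might look like this sketch of a profile (the class and package names are made up for illustration):

```puppet
# Hypothetical profile catching per-tier differences with the $tier fact
class profile::appserver {
  $java_package = $::tier ? {
    'prod'  => 'java-1.7.0-openjdk',
    default => 'java-1.8.0-openjdk',
  }

  package { $java_package:
    ensure => installed,
  }
}
```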

The ultimate reason why I’m writing about this is because I’ve seen people try to incorporate both the Puppet environment and application tier into both the environment name and/or the Hiera hierarchy. Many times, they run into all kinds of scaling issues (large hierarchies, many Puppet environments, confusing testing paths to ‘production’). I tend to prefer this workflow choice, but, like everything I write about, take it and model it toward what works for you (because what works now may not work 6 months from now).

Thoughts?

Like I said before, I tend to discover new corner cases that change my mind on things like this, so it’s quite possible that this theory isn’t the most solid in the world. It HAS helped out some customers to clean up their code and make for a cleaner pipeline, though, and that’s always a good thing. Feel free to comment below – I look forward to making the process better for all!

Building a Functional Puppet Workflow Part 3b: More R10k Madness

In the last workflows post, I talked about dynamic Puppet environments and introduced R10k, which is an awesome tool for mapping modules to their environments which are dynamically generated by git branches. I didn’t get out everything I wanted to say because:

  • I was tired of that post sitting stale in a Google Doc
  • It was already goddamn long

So because of that, consider this a continuation of that previous monstrosity that talks about additional uses of R10k beyond the ordinary.

Let’s talk Hiera

But seriously, let’s not actually talk about what Hiera does since there are better docs out there for that. I’m also not going to talk about WHEN to use Hiera because I’ve already done that before. Instead, let’s talk about a workflow for submitting changes to Hiera data and testing it out before it enters into production.

Most people store their Hiera data (if they’re using a backend that reads Hiera data from disk, anyway) in a separate repo from their Puppet repo. Some DO tie the Hiera datadir folder to something like the main Puppet repo that houses their Puppetfile (if they’re using R10k), but for the most part it’s a separate repo because you may want separate permissions for accessing that data. For the purposes of this post, I’m going to refer to a repository I use for storing Hiera data that’s out on Github.

The next logical step would be to integrate that Hiera repo into R10k so R10k can track and create paths for Hiera data just like it did for Puppet.

NOTE: Fundamentally, all that R10k does is checkout modules to a specific path whose folder name comes from a git branch. PUPPET ties its environment to this folder name with some puppet.conf trickery. So, to say that R10k “creates dynamic environments” is the end-result, but not the actual job of the tool.
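The “trickery” in question is simply interpolating $environment into the modulepath, something like this puppet.conf sketch (the paths match the Puppet Enterprise layout used elsewhere in this post):

```ini
# puppet.conf on the master (pre-directory-environments style):
# $environment expands per-node, so each R10k-created folder
# under environments/ becomes a Puppet environment
[main]
modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules
```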

We COULD add Hiera’s repository to the /etc/r10k.yaml file to track and create folders for us, and if we did it EXACTLY like we did for Puppet we would most definitely run into this R10k bug (AND, it comes up again in this bug).

UPDATE: So, I originally wrote this post BEFORE R10k version 1.1.4 was released. Finch has since released version 1.1.4, which FIXES THESE BUGS…so the workflow I’m going to describe (i.e. using prefixing to solve the problem of multiple repos in /etc/r10k.yaml possibly sharing branch names) TECHNICALLY does NOT need to be followed ‘to the T’, as it were. You can disable prefixing when it comes to that step, and modify /etc/puppetlabs/puppet/hiera.yaml so you don’t prepend ‘hiera_’ to the path of each environment’s folder, and you should be totally fine…you know, as long as you use version 1.1.4 or greater of R10k. So, be forewarned.

The issue with those bugs is that R10k collects the names of ALL the environments from ALL the sources at once, so if you have multiple source repositories that share branch names, you get clashes (since it only stores ONE branch name internally). The solution that Finch came up with was prefixing (i.e. prefixing the name of the branch with the name of the source). When you prefix, however, R10k creates a folder on-disk that matches the prefixed name (e.g. NameOfTheSource_NameOfTheBranch). This is actually fine since we’ll catch it and deal with it, but you should be aware of it. Future versions of R10k will most likely deal with this in a different manner, so make sure to check out the R10k docs before blindly copying my code, okay? (Update: see the previous, bolded paragraph where I describe how Finch DID JUST THAT.)

In the previous post I set up a file called r10k_installation.pp to configure R10k. Let’s revisit that manifest and modify it for my Hiera repo:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.4',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    },
    'hiera' => {
      'remote'  => 'https://github.com/glarizza/hiera_environment.git',
      'basedir' => "${::settings::confdir}/hiera",
      'prefix'  => true,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

NOTE: For the duration of this post, I’ll be referring to Puppet Enterprise specific paths (like /etc/puppetlabs/puppet for $confdir). Please do the translation for open source Puppet, as R10k will work just fine with either the open source edition or the Enterprise edition of Puppet

You’ll note that I added a source called ‘hiera’ that tracks my Hiera repository, creates sub-folders in /etc/puppetlabs/puppet/hiera, and enables prefixing to deal with the bug I mentioned in the previous paragraph. Now, let’s run Puppet and do an R10k synchronization:

[root@master1 garysawesomeenvironment]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 1.78 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/content: content changed '{md5}c686917fcb572861429c83f1b67cfee5' to '{md5}69d38a14b5de0d9869ebd37922e7dec4'
Notice: Finished catalog run in 1.24 seconds

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_testing
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/hiera

[root@master1 puppet]# ls /etc/puppetlabs/puppet/hiera
hiera_master  hiera_production  hiera_testing

[root@master1 puppet]# ls /etc/puppetlabs/puppet/environments/
development  garysawesomeenvironment  master  production

Great, so it configured R10k to clone the Hiera repository to /etc/puppetlabs/puppet/hiera like we wanted it to, and you can see that with prefixing enabled we have folders named “hiera_${branchname}”.

In Puppet, the magical connection that maps these subfolders to Puppet environments is in puppet.conf, but for Hiera that’s the hiera.yaml file. I’ve included that file in my Hiera repo, so let’s look at the copy at /etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml:

/etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'

The magical line is in the :datadir: setting of the :yaml: section; it uses %{environment} to evaluate the environment variable set by Puppet and set the path accordingly.

As of right now R10k is configured to clone Hiera data from a known repository to /etc/puppetlabs/puppet/hiera, to create sub-folders based on branches to that repository, and to tie data provided to each Puppet environment to the respective subfolder of /etc/puppetlabs/puppet/hiera that matches the pattern of “hiera_(environment_name)”.

The problem with hiera.yaml

You’ll notice that each subfolder to /etc/puppetlabs/puppet/hiera contains its own copy of hiera.yaml. You’re probably drawing the conclusion that each Puppet environment can read from its own hiera.yaml for Hiera configuration.

And you would be wrong.

For information on this bug, check out this link. You’ll see that we provide a ‘hiera_config’ configuration option in Puppet that allows you to specify the path to hiera.yaml, but Puppet loads that config as a singleton, which means that it’s read once when the Puppet master process starts up and it’s NOT environment-aware. The workaround is to use one hiera.yaml for all environments on a Puppet master but to dynamically change the :datadir: path according to the current environment (in the same way that dynamic Puppet environments abuse ‘$environment’ in puppet.conf). You gain the ability to have per-environment changes to Hiera data but lose the ability to do things like use different hierarchies for different environments. As of right now, if you want a different hierarchy then you’re going to need a different master (or do some hacky things that I don’t even want to BEGIN to approach in this article).

In summary – there will be a hiera.yaml per environment, but they will not be consulted on a per-environment basis.

Workflow for per-environment Hiera data

Looking back on the previous post, you’ll see that the workflow for updating Hiera data is identical to the workflow for updating code to your Puppet environments. Namely, to create a new environment for testing Hiera data, you will:

  • Push a branch to the Hiera repository and name it accordingly (remembering that the name you choose will be a new environment).
  • Run R10k to synchronize the data down to the Puppet master
  • Add your node to that environment and test out the changes

For existing environments, simply push changes to that environment’s branch and repeat the last two steps.

NOTE: Puppet environments and Hiera environments are linked – both tools use the same ‘environment’ concept and so environment names MUST match for the data to be shared (i.e. if you create an environment in Puppet called ‘yellow’, you will need a Hiera environment called ‘yellow’ for that data).

This tight coupling can cause issues, and will ultimately mean that certain branches are longer-lived than others. It’s also the reason why I don’t use defaults in my hiera() lookups inside Puppet manifests – I WANT the early failure of a compilation error to alert me to something that needs fixing.
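To show what I mean, here’s a hypothetical profile (the class name, key name, and package are invented for illustration) doing a hiera() lookup with no default:

```puppet
# Hypothetical example - class, key, and package names are made up.
class profiles::java {
  # No default argument passed to hiera(): if the key is missing from
  # every level of the hierarchy for this environment, catalog
  # compilation fails immediately instead of shipping a bogus value.
  $version = hiera('profiles::java::version')

  package { 'java':
    ensure => $version,
  }
}
```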

You will need to determine whether this tight coupling is worth it for your organization – whether to tie your Hiera repository directly into R10k or to handle it out-of-band.

R10k and monolithic module repositories

One of the first requirements you encounter when working with R10k is that your component modules need to be stored in their own repositories. That convention is still relatively new – it wasn’t so long ago that we were recommending that modules be locked away in a giant repo. Why?

  • It’s easier to clone
  • The state of module reusability was poor

The main reason was that it was easier to put everything in one repo and clone it out on all your Puppet master servers. This becomes insidious as your module count rises and people start doing lovely things like committing large binaries into modules, pulling in old versions of modules they find out on the web, and the like. It also becomes an issue when you start needing to lock committers out of specific directories due to sensitive data, and blah blah blah blah…

There are better posts out there justifying/vilifying the choice of one or multiple repositories; this section’s meant only to show you how to incorporate a single repository containing multiple modules into your R10k workflow.

From the last post you’ll remember that the Puppetfile allows you to tie a repository, and some version reference, to a directory using R10k. Incorporating a monolithic repository starts with an entry in the Puppetfile like so:

Puppetfile
mod "my_big_module_repo",
  :git => "git://github.com/glarizza/my_big_module_repo.git",
  :ref => '1.0.0'

NOTE: That git repository doesn’t exist. I don’t HAVE a monolithic repo to demonstrate, so I’ve chosen an arbitrary URI. Also note that you can use ANY name you like after the mod syntax to name the resultant folder – it doesn’t HAVE to mirror the URI of the repository.

Adding this entry to the Puppetfile would check out that repository, with a folder name of ‘my_big_module_repo’, to wherever all the other modules are checked out. That folder will most likely (again, depending on how you’ve laid out your repository) contain subfolders, each holding a Puppet module. This entry gets the modules onto your Puppet master, but it doesn’t make Puppet aware of their location. For that, we’re going to need to add an entry to the ‘modulepath’ configuration item in puppet.conf.

Inside /etc/puppetlabs/puppet/puppet.conf you should see a configuration item called ‘modulepath’ that currently has a value of:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules

The modulepath works like the PATH environment variable in Linux – it’s a priority-based lookup mechanism that Puppet uses to find modules. Currently, Puppet will first look in /etc/puppetlabs/puppet/environments/$environment/modules for a module. If the module Puppet is looking for is found there, Puppet will use it and not inspect the second path. If the module is not found at the FIRST path, Puppet will inspect the second path. Failing to find the module at the second path results in a compilation error. Using this to our advantage, we can add the path to the monolithic repository checked out by the Puppetfile AFTER the path where all the individual modules are checked out. This should look something like this:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:/opt/puppet/share/puppet/modules

Note: This assumes all modules are in the root of the monolithic repo. If they’re in a subdirectory, you must adjust accordingly

That’s a huge line (and if you’re afraid of anything over 80 column-widths then I’m sorry…and you should probably buy a new monitor…and the 80s are over), but the gist is that we’re first going to look for modules checked out by R10k, THEN we’re going to look for modules in our monolithic repo, then we’re going to look in Puppet Enterprise’s vendored module directory, and finally, like I said above, we’ll fail if we can’t find our module. This will allow you to KEEP using your monolithic repository and also slowly cut modules inside that monolithic repo over to their own repositories (since when they gain their own repository, they will be located in a path that COMES before the monolithic repo, and thus will be given priority).
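The first-match-wins behavior can be sketched in a few lines of Ruby (purely illustrative – the paths and module names are stand-ins, and this is NOT Puppet’s actual loader):

```ruby
# Illustrative first-match-wins lookup, mimicking how Puppet walks the
# modulepath. Paths and module names here are stand-ins.
def find_module(name, modulepath, contents)
  # Return the first directory in the modulepath that contains the module
  modulepath.find { |dir| contents.fetch(dir, []).include?(name) }
end

modulepath = [
  '/etc/puppetlabs/puppet/environments/production/modules',
  '/etc/puppetlabs/puppet/environments/production/modules/my_big_module_repo',
  '/opt/puppet/share/puppet/modules',
]
contents = {
  modulepath[0] => ['apache'],               # modules checked out individually
  modulepath[1] => ['apache', 'legacy_app'], # modules inside the monolithic repo
}

puts find_module('apache', modulepath, contents)        # individual checkout wins
puts find_module('legacy_app', modulepath, contents)    # falls through to the monolith
puts find_module('nginx', modulepath, contents).inspect # nil -> compilation error
```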

Using MCollective to perform R10k synchronizations

This section is going to be much less specific than the rest because the piece that does the ACTION is part of a module for R10k. As of the time of this writing, this agent is in one state, but that could EASILY change. I will defer to the module in question (and specifically its README file) should you need specifics (or if my module is dated). What I CAN tell you, however, is that the R10k module does come with a class that will setup and configure both an MCollective agent for R10k and also a helper application that should make doing R10k synchronizations on multiple Puppet masters much easier than doing them by hand. First, you’ll need to INSTALL the MCollective agent/application, and you can do that by pulling down the module and its dependencies, and classifying all Puppet masters with R10k enabled by doing the following:

include r10k::mcollective

Terribly difficult, huh? With that, both the MCollective agent and application should be available to MCollective on that node. The way to trigger a synchronization is to log in to an account on a machine that has MCollective client access (in Puppet Enterprise, this would be any Puppet master that’s allowed the role, and then, specifically, the peadmin user…so doing a su - peadmin should afford you access to that user), and perform the following command:

mco r10k deploy

This is where the README differs a bit, because Finch changed the syntax that R10k uses to synchronize and deploy modules to a master. The CURRENTLY accepted command (because, knowing Finch, that shit might change) is r10k deploy environment -p, and the action on the MCollective agent that EXECUTES that command is the ‘deploy’ action. The README refers to the ‘synchronize’ action, which executes the r10k synchronize command. That command MAY STILL WORK, but it’s deprecated, and so it’s NOT recommended.

Like I said before, this agent is subject to change (mainly due to R10k command deprecation and maturation), so definitely refer to the README and the code itself for more information (or file issues and pull requests on the module repo directly).

Tying R10k to CI workflows

I spent a year doing some presales work for the Puppet Labs SE team, so I can hand-wave and tapdance like a motherfucker. I’m going to need those skills for this next section, because if you thought the previous section glossed over the concepts pretty quickly and without much detail, then this section is going to feel downright vaporous (is that a word? Fuck it; I’m handwaving – it’s a word). I really debated whether to include the following sections in this post because I don’t really give you much specific information; it’s all very generic and full of “ideas” (though I do list some testing libraries below that are helpful if you’ve never heard of them). Feel free to abandon ship and skip to the FINAL section right now if you don’t want to hear about ‘ideas’.

For the record, I’m going to just pick and use the term “CI” when I’m referring to the process of automating the testing and deployment of, in this case, Puppet code. There have definitely been posts arguing about which definition is more appropriate, but, frankly, I’m just going to pick a term and go with it.

The issue at hand is that when you talk “CI” or “CD” or “Continuous (fill_in_the_blank)”, you’re talking about a workflow that’s tailored to each organization (and sometimes each DEPARTMENT of an organization). Sometimes places can agree on a specific tool to assist them with this process (be it Jenkins, Hudson, Bamboo, or whatever), but beyond that it’s anyone’s game.

Since we’re talking PUPPET code, though, you’re restricted to certain tasks that will show up in any workflow…and THAT is what I want to talk about here.

To implement some sort of CI workflow means laying down a ‘pipeline’ that takes a change of your Puppet code (a new module, a change to an existing module, some Hiera data updates, whatever) from the developer’s/operations engineer’s workstation right into production. The way we do this with R10k currently is to:

  • Make a change to an individual module
  • Commit/push those changes to the module’s remote repository
  • Create a test branch of the puppet_repository
  • Modify the Puppetfile and tie your module’s changes to this environment
  • Commit/push those changes to the puppet_repository
  • Perform an R10k synchronization
  • Test
  • Repeat steps 1-7 as necessary until shit works how you like it
  • Merge the changes in the test branch of the puppet_repository with the production branch
  • Perform an R10k synchronization
  • Watch code changes become active in your production environment
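Strung together as a (completely hypothetical – every repo, branch, and module name here is invented) shell session, that workflow looks something like:

```shell
# Hypothetical walkthrough; repo, branch, and module names are invented.
# Steps 1-2: change a module and push it to its own repository
cd ~/src/puppet-apache
git commit -am 'Tune KeepAlive settings' && git push origin master

# Steps 3-5: branch the puppet_repository and pin the new ref
cd ~/src/puppet_repository
git checkout -b apache_keepalive        # branch name == new environment name
$EDITOR Puppetfile                      # point the apache entry at the new ref
git commit -am 'Test apache changes' && git push origin apache_keepalive

# Steps 6-7: synchronize and test
mco r10k deploy
puppet agent -t --environment apache_keepalive   # run on a test node

# Steps 9-11: merge once you're happy, then deploy again
git checkout production && git merge apache_keepalive && git push
mco r10k deploy
```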

Of those steps, there’s arguably about 3 unique steps that could be automated:

  • R10k synchronizations
  • ‘Testing’ (whatever that means)
  • Merging the changes in the test branch of the puppet_repository with the production branch

NOTE: As we get progressively-more-handwavey (also probably not a word, but fuck it – let’s be thought leaders and CREATE IT), each one of these steps is going to be more and more…generic. For example – to say “test your code” is a great idea, but, seriously, defining how to do that could (and should) be multiple blog posts.

Laying down the pipeline

If I were building an automated workflow, the first thing I would do is setup something like Jenkins and configure it to watch the puppet_repository that contains the Puppetfile mapping all my modules and versions to Puppet environments. On changes to this repository, we want Jenkins to perform an R10k synchronization, run tests, and then, possibly, merge those changes into production (depending on the quality of your tests and how ‘webscale’ you think you are on that day).

R10k synchronizations

If you’re paying attention, we solved this problem in the previous section with the R10k MCollective agent. Jenkins should be running on a machine that has the ability to execute MCollective client commands (such as triggering mco r10k deploy when necessary). You’ll want to tailor your calls from Jenkins to only deploy environments it’s currently testing (remember in the puppet_repository that topic branches map to Puppet environments, so this is a per-branch action) as opposed to deploying ALL environments every time.

Also, if you’re building a pipeline, you might not want to do R10k synchronizations on ALL of your Puppet Masters at this point. Why not? Well, if your testing framework is good enough and has sufficient coverage that you’re COMPLETELY trusting it to determine whether code is acceptable or not, then this is just the FIRST step – making the code available to be tested. It’s not passed tests yet, so pushing it out to all of your Puppet masters is a bit wasteful. You’ll probably want to only synchronize with a single master that’s been identified for testing (and a master that has the ability to spin up fresh nodes, enforce the Puppet code on them, submit those nodes to a battery of tests, and then tear them down when everything has been completed).

If you’re like the VAST majority of Puppet users out there that DON’T have a completely automated testing framework that has such complete coverage that you trust it to determine whether code changes are acceptable or not, then you’re probably ‘testing’ changes manually. For these people, you’ll probably want to synchronize code to whichever Puppet master(s) are suitable.

The cool thing about these scenarios is that MCollective is flexible enough to handle this. MCollective has the ability to filter your nodes based on things like available MCollective agents, Facter facts, Puppet classes, and even things like the MD5 hashes of arbitrary files on the filesystem…so however you want to restrict synchronization, you can do it with MCollective.
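For example (the filter values are hypothetical; the flags are MCollective’s standard filter options), you can scope a deploy like so:

```shell
# Hypothetical filter values; -F, -C, and -I are standard MCollective filters.
mco r10k deploy -F role=ci_test_master     # only nodes with this Facter fact
mco r10k deploy -C role::puppet::master    # only nodes with this Puppet class
mco r10k deploy -I master1.example.com     # a single node by identity
```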

After all of that, the answer here is “Use MCollective to do R10k syncs/deploys.”

Testing

This section needs its own subset of blog posts. There are all kinds of tools that will allow you to test all sorts of things about your Puppet code (from basic syntax checking and linting, to integration tests that check for the presence of resources in the catalog, to acceptance-level tests that check the end-state of the system to make sure Puppet left it in a state that’s acceptable). The most common tools for these types of tests are puppet-lint, rspec-puppet, Beaker, and Serverspec.

Unfortunately, the point of this section is NOT to walk you through setting up one or more of those tools (I’d love to write those posts soon…), but rather to make you aware of their presence and identify where they fit in our Pipeline.

Once you’ve synchronized/deployed code changes to a specific machine (or subset of machines), the next step is to trigger tests.

Backing up the train a bit, certain kinds of ‘tests’ should be done WELL in advance of this step. For example, if code changes don’t even pass basic syntax checking and linting, they shouldn’t even MAKE it into your repository. Things like pre-commit hooks will allow you to trigger syntactical checks and linting before a commit is allowed. We’re assuming you’ve already set those up (and if you’ve NOT, then you should probably do that RIGHT NOW).
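If you’ve NOT set one up, a minimal pre-commit hook might look like this sketch (it assumes the puppet and puppet-lint binaries are on the PATH, and only checks staged .pp files):

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit hook: syntax-check and lint any
# staged Puppet manifests before the commit is allowed.
for manifest in $(git diff --cached --name-only --diff-filter=ACM | grep '\.pp$'); do
  puppet parser validate "$manifest" || exit 1
  puppet-lint "$manifest"            || exit 1
done
```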

Rather, in this section, we’re talking about doing some basic integration smoke testing (i.e. running the rspec-puppet tests on all the modules to ensure that what we EXPECT in the catalog is actually IN the catalog), moving into acceptance level testing (i.e. spinning up pristine/clean nodes, actually applying the Puppet code to the nodes, and then running things like Beaker or Serverspec on the nodes to check the end-state of things like services, open ports, configuration files, and whatever to ensure that Puppet ACTUALLY left the system in a workable state), and then returning a “PASS” or “FAIL” response to Jenkins (or whatever is controlling your pipeline).
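As a taste of that catalog-level checking, an rspec-puppet spec for a hypothetical ‘apache’ module might look like this (resource names invented; it requires the rspec-puppet gem and a generated spec_helper, so it’s not runnable standalone):

```ruby
# spec/classes/apache_spec.rb - hypothetical rspec-puppet test.
# Asserts that the compiled catalog contains the resources we EXPECT,
# without ever touching a real system.
require 'spec_helper'

describe 'apache' do
  it { should contain_package('httpd').with_ensure('installed') }
  it { should contain_service('httpd').with_ensure('running') }
end
```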

These tests can be as thorough or as loose as is acceptable to you (obviously, the goal is to automate ALL of your tests so you don’t have to manually check ANY changes, but that’s the nerd-nirvana state where we’re all browsing the web all day), but they should catch the most NOTORIOUS and OBVIOUS things FIRST. Follow the same rules you did when you got started with Puppet – catch the things that are easiest to catch and start building up your cache of “Total Time Saved.”

Jenkins needs to be able to trigger these tests from wherever it’s running, so your Jenkins box needs the ability to, say, spin up nodes in ESX, or locally with something like Vagrant, or even cloud nodes in EC2 or GCE, then TRIGGER the tests, and finally get a “PASS” or “FAIL” response back. The HARDEST part here, by far, is that you have to define what level of testing you’re going to implement, how you’re going to implement it, and devise the actual process to perform the testing. Like I said before, there are other blog posts that talk about this (and I hope to tackle this topic in the very near future), so I’ll leave it to them for the moment.

To merge or not to merge

The final step for any test code is to determine whether it should be merged into production or not. Like I said before, if your tests are sufficient and are adequate at determining whether a change is ‘good’ or not, then you can look at automating the process of merging those changes into production and killing off the test branch (or, NOT merging those changes, and leaving the branch open for more changes).

Automatically merging is scary for obvious reasons, but it’s also a good ‘test’ for your test coverage. Committing to a ‘merge upon success’ workflow takes trust, and there’s absolutely no shame in leaving this step to a human, to a change review board, or to some out-of-band process.

Use your illusion

These are the most common questions I get asked after the initial shock of R10k, and its workflow, wears off. Understand that I do these posts NOT from a “Here’s what you should absolutely be doing!” standpoint, but more from a “Here’s what’s going on out there.” vantage. Every time I’m called on-site with a customer, I evaluate:

  • The size and experience level of the team involved
  • The processes that the team must adhere to
  • The Puppet experience level of the team
  • The goals of the team

Frankly, after all those observations, sometimes I ABSOLUTELY come to the conclusion that something like R10k is entirely-too-much process for not-enough benefit. For those who are a fit, though, we go down the checklists and tailor the workflow to the environment.

What more IS there on R10k?

I do have at least a couple of more posts in me on some specific issues I’ve hit when consulting with companies using R10k, such as:

  • How best to use Hiera and R10k with Puppet ‘environments’ and internal, long-term ‘environments’
  • Better ideas on ‘what to branch and why’ with regard to component modules and the puppet_repository
  • To inherit or not to inherit with Roles
  • How to name things (note that I work for Puppet Labs, so I’m most likely very WRONG with this section)
  • Other random things I’ve noticed…

Also, I apologize if it’s been a while since I’ve replied to a couple of comments. I’m booked out 3 months in advance and things are pretty wild at the moment, but I’m REALLY thankful for everyone who cares enough to drop a note, and I hope I’m providing some good info you can actually use! Cheers!

Building a Functional Puppet Workflow Part 3: Dynamic Environments With R10k

Workflows are like kickball games: everyone knows the general idea of what’s going on, there’s an orderly progression towards an end-goal, nobody wants to be excluded, and people lose their shit when they get hit in the face by a big rubber ball. Okay, so maybe it’s not a perfect mapping but you get the idea.

The previous two posts (one and two) focused on writing modules, wrapping modules, and classification. While ALL of these things are very important in the grand scheme of things, one of the biggest problems people get hung up on is how to iterate upon your modules and, more importantly, how to eventually get those changes pushed into production in a reasonably orderly fashion.

This post is going to be all over the place. We’re gonna cover the idea of separate environments in Puppet, touch on dynamic environments, and round it out with that mother-of-a-shell-script-turned-personal-savior, R10k. Hold on to your shit.

Puppet Environments

Puppet has the concept of ‘environments’ where you can logically separate your modules and manifest (read: site.pp) into separate folders to allow for nodes to get entirely separate bits of code based on which ‘environment’ the node belongs to.

Puppet environments are statically set in puppet.conf, but, as other blog posts have noted, you can do some crafty things in puppet.conf to give you the solution of having ‘dynamic environments’.
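For reference, statically-set environments in puppet.conf look something like this (the paths are examples):

```
# puppet.conf: one statically-defined block per environment (example paths)
[production]
  modulepath = /etc/puppetlabs/puppet/environments/production/modules
  manifest   = /etc/puppetlabs/puppet/environments/production/manifests/site.pp

[development]
  modulepath = /etc/puppetlabs/puppet/environments/development/modules
  manifest   = /etc/puppetlabs/puppet/environments/development/manifests/site.pp
```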

NOTE: The solutions in this post are going to rely on Puppet environments; however, environments aren’t without their own shortcomings (namely, this bug on Ruby plugins in Puppet). For testing and promoting Puppet classes written in the DSL, environments will help you out greatly. For complete separation of Ruby instances and any plugins to Puppet written in Ruby, however, you’ll need separate masters (which is something that I won’t be covering in this article).

One step further – ‘dynamic’ environments

Adrien Thebo, hereafter known as ‘Finch’ – who is known for building awesome things and talking like he’s fresh from a Redbull binge – created the now-famous blog post on creating dynamic environments in Puppet with git. That post relied upon a post-commit hook to do all the jiggery-pokery necessary to checkout the correct branches in the correct places, and thus it had a heavy reliance upon git.

Truly, the only magic in puppet.conf was the inclusion of ‘$environment’ in the modulepath configuration entry on the Puppet master (literally that string and not the evaluated form of your environment). By doing that, the Puppet master would replace the string ‘$environment’ with the environment of the node checking in and would look to that path for Puppet manifests and modules. If you use something OTHER than git, it would be up to you to create a post-receive hook that populated those paths, but you could still replicate the results (albeit with a little work on your part).

People used this pattern and it worked fairly well. Hell, it STILL works fairly well, nothing has changed to STOP you from using it. What changed, however, was the ecosystem around modules, the need for individual module testing, and the further need to automate this whole goddamn process.

Before we deliver the ‘NEW SOLUTION’, let’s provide a bit of history and context.

Module repositories: the one-to-many problem

I touched on this topic in the first post, but one of the first problems you encounter when putting your modules in version control is whether or not to have ONE GIANT REPO with all of your modules, or a repository for every module you create. In the past we recommended putting every module in one repository (namely because it was easier, the module sharing landscape was pretty barren, and teams were smaller). Now, we recommend the opposite for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With the one-giant-repo solution, modules you pull down from the Forge (or Github) must be committed to your repo. Having a repository for each module allows you to keep everything separate
  • Publishing this module to the Forge (or Github/Stash/whatever) is easier with separate repos (rather than having to split-out the module later).

The problem with having a repository for every Puppet module you create is that you need a way to map every module with every Puppet master (and, also which version of every module should be installed in which Puppet environment).

A project called librarian-puppet sprang up that created the ‘Puppetfile’, a file that would map modules and their versions to a specific directory. Librarian was awesome, but, as Finch noted in his post, it had some shortcomings when used in an environment with many and fast-changing modules. His solution, which he documented here, was the tool we now know as R10k.

Enter R10k

R10k is essentially a Ruby project that wraps a bunch of shell commands you would NORMALLY use to maintain an environment of ever-changing Puppet modules. Its power is in its ability to use Git branches combined with a Puppetfile to keep your Puppet environments in-sync. Because of this, R10k is CURRENTLY restricted to git. There have been rumblings of porting it to Hg or svn, but I know of no serious attempts at doing this (and if you ARE doing this, may god have mercy on your soul). Great, so how does it work?

Well, you’ll need one main repository SIMPLY for tracking the Puppetfile. I’ve got one right here, and it only has my Puppetfile and a site.pp file for classification (should you use it).

NOTE: The Puppetfile and librarian-puppet-like capabilities under the hood are going to be doing most of the work here – this repository is solely so you can create topic branches with changes to your Puppetfile that will eventually become dynamically-created Puppet environments.

Let’s take a look at the Puppetfile and see what’s going on:

Puppetfile
forge "http://forge.puppetlabs.com"

# Modules from the Puppet Forge
mod "puppetlabs/stdlib"
mod "puppetlabs/apache", "0.11.0"
mod "puppetlabs/pe_gem"
mod "puppetlabs/mysql"
mod "puppetlabs/firewall"
mod "puppetlabs/vcsrepo"
mod "puppetlabs/git"
mod "puppetlabs/inifile"
mod "zack/r10k"
mod "gentoo/portage"
mod "thias/vsftpd"


# Modules from Github using various references
mod "wordpress",
  :git => "git://github.com/hunner/puppet-wordpress.git",
  :ref => '0.4.0'

mod "property_list_key",
  :git => "git://github.com/glarizza/puppet-property_list_key.git",
  :ref => '952a65d9ea2c5809f4e18f30537925ee45548abc'

mod 'redis',
  :git => 'git://github.com/glarizza/puppet-redis',
  :ref => 'feature/debian_support'

This example lists the syntax for dealing with modules from both the Forge and Github, as well as pulling specific versions of modules (whether versions in the case of the Forge, or Github references as tags, branches, or even specific commits). The syntax is not hard to follow – just remember that we’re mapping modules and their versions to a set/known environment.

For every topic branch on this repository (containing the Puppetfile), R10k will in turn create a Puppet environment with the same name. For this reason, it’s convention to rename the ‘master’ branch to ‘production’ since that’s the default environment in Puppet (note that renaming branches locally is easy – renaming the branch on Github can sometimes be a pain in the ass). You will also note why it’s going to be somewhat hard to map R10k to subversion, for example, due to the lack of lightweight branching schemes.

To explain any more of R10k reads just as if I were describing its installation, so let’s quit screwing around and actually INSTALL/SETUP the damn thing.

Setting up R10k

As I mentioned before, we have the main repository that will be used to track the Puppetfile, which in turn will track the modules to be installed (whether from The Forge, Github, or some internal git repo). Like any good Puppet component, R10k itself can be setup with a Puppet module. The module I’ll be using was developed by Zack Smith, and is pretty simple to get started. Let’s download it from the forge first:

[root@master1 vagrant]# puppet module install zack/r10k
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v1.0.2)
  ├─┬ gentoo-portage (v2.1.0)
  │ └── puppetlabs-concat (v1.0.1)
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.1.0)
  ├── puppetlabs-git (v0.0.3)
  ├── puppetlabs-inifile (v1.0.1)
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.1.0)
  └── puppetlabs-vcsrepo (v0.2.0)

The module will be installed into the first path in your modulepath, which in the case above is /etc/puppetlabs/puppet/modules. This modulepath will change due to the way we’re going to set up our dynamic Puppet environments. For this example, I’m going to have environments dynamically generated at /etc/puppetlabs/puppet/environments, so let’s create that directory first:

[root@master1 vagrant]# mkdir -p /etc/puppetlabs/puppet/environments

Now, we need to set up R10k on this machine. The module we downloaded will allow us to do that, but we’ll need to create a small Puppet manifest that sets up R10k out-of-band from a regular Puppet run (you CAN continuously enforce R10k configuration in-band with your regular Puppet run, but if we’re setting up a Puppet master to use R10k to serve out dynamic environments, it’s possible to create a chicken-and-egg situation). Let’s create a file called r10k_installation.pp in /var/tmp and have it look like the following:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.3',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

So what is every section of that declaration doing?

  • version => '1.1.3' sets the version of the R10k gem to install
  • sources => {...} is a hash of sources that R10k is going to track. For now it’s only our main Puppet repo, but you can also track a Hiera installation too. This hash accepts key/value pairs for configuration settings that are going to be written to /etc/r10k.yaml, which is R10k’s main configuration file. The keys in-use are remote, which is the path to the repository to-be-checked-out by R10k, basedir, which is the path on-disk to where dynamic environments are to be created (we’re using the $::settings::confdir variable which maps to the Puppet master’s configuration directory, or /etc/puppetlabs/puppet), and prefix which is a boolean to determine whether to use R10k’s source-prefixing feature. NOTE: the false value is a BOOLEAN value, and thus SHOULD NOT BE QUOTED. Quoting it turns it into a string, which matches as a boolean TRUE value. Don’t quote false – that’s bad, mmkay.
  • purgedirs => ["${::settings::confdir}/environments"] is configuring R10k to implement purging on the environments directory (so any folders that R10k doesn’t create it will delete). This configuration MAY be moot with newer versions of R10k as I believe it implements this behavior by default.
  • manage_modulepath => true will ensure that this module sets the modulepath configuration item in /etc/puppetlabs/puppet/puppet.conf
  • modulepath => ... sets the modulepath value to be dropped into /etc/puppetlabs/puppet/puppet.conf. Note that we are interpolating variables ($::settings::confdir again), AND inserting the LITERAL string of $environment into the modulepath – this is because Puppet will replace $environment with the value of the agent’s environment at catalog compilation.

JUST IN CASE YOU MISSED IT: Don’t quote the false value for the prefix setting in the sources block. That is all.
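The gotcha is plain old truthiness, easy to demonstrate in Ruby (the same rule bites you here): a non-empty string – even ‘false’ – is truthy, while only the bare boolean false is falsey.

```ruby
# Demonstrating why a quoted 'false' misbehaves: a non-empty string is
# truthy; only the bare boolean false is falsey.
quoted   = 'false'  # what you get if you quote the value
unquoted = false    # the real boolean

puts(quoted   ? 'truthy' : 'falsey')  # prints "truthy"
puts(unquoted ? 'truthy' : 'falsey')  # prints "falsey"
```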

Okay, we have our one-time Puppet manifest, and now the only thing left to do is to run it:

[root@master1 tmp]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 2.05 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}0b619d5148ea493e2d6a5bb205727f0c'
Notice: /Stage[main]/R10k::Config/Ini_setting[R10k Modulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '/etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules'
Notice: /Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: Finished catalog run in 10.55 seconds

One prerequisite that’s easy to forget: git needs to be installed. If you’re firing up a new VM that DOESN’T have git, R10k is going to spit out an awesome error, so install git first. After that, let’s synchronize R10k with the r10k deploy environment -pv command (-p for Puppetfile synchronization and -v for verbose mode):

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

I ran this first synchronization in verbose mode so you can see exactly what’s getting copied where. Further synchronizations don’t have to be in verbose mode, but it’s good for debugging. After all of that, we have an /etc/puppetlabs/puppet/environments folder containing our dynamic Puppet environments based on the branches of the main Puppet repo:

[root@master1 puppet]# ls -lah /etc/puppetlabs/puppet/environments/
total 20K
drwxr-xr-x 5 root root 4.0K Feb 19 11:44 .
drwxr-xr-x 7 root root 4.0K Feb 19 11:25 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 puppet]# cd /etc/puppetlabs/puppet/environments/production/
[root@master1 production]# git branch -a
  master
* production
  remotes/origin/HEAD -> origin/master
  remotes/origin/development
  remotes/origin/master
  remotes/origin/production

As you can see (at the time of this writing), my main Puppet repo has three main branches: development, master, and production, and so R10k created three Puppet environments matching those names. It’s somewhat of a convention to rename the master branch to production, but in this case I left it alone to demonstrate how this works.

ONE OTHER BIG GOTCHA: R10k does NOT resolve dependencies, and so it is UP TO YOU to track them in your Puppetfile. Check this out:

[root@master1 production]# puppet module list
Warning: Module 'puppetlabs-firewall' (v1.0.0) fails to meet some dependencies:
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-firewall' (v0.3.x)
Warning: Module 'puppetlabs-stdlib' (v4.1.0) fails to meet some dependencies:
  'puppetlabs-pe_accounts' (v2.0.1) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-pe_mcollective' (v0.1.14) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-request_manager' (v0.0.10) requires 'puppetlabs-stdlib' (v3.2.x)
Warning: Missing dependency 'cprice404-inifile':
  'puppetlabs-pe_puppetdb' (v0.0.11) requires 'cprice404-inifile' (>=0.9.0)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'cprice404-inifile' (v0.10.x)
  'puppetlabs-puppetdb' (v1.5.1) requires 'cprice404-inifile' (>= 0.10.3)
Warning: Missing dependency 'puppetlabs-concat':
  'puppetlabs-apache' (v0.11.0) requires 'puppetlabs-concat' (>= 1.0.0)
  'gentoo-portage' (v2.1.0) requires 'puppetlabs-concat' (v1.0.x)
Warning: Missing dependency 'puppetlabs-gcc':
  'zack-r10k' (v1.0.2) requires 'puppetlabs-gcc' (>= 0.0.3)
/etc/puppetlabs/puppet/environments/production/modules
├── gentoo-portage (v2.1.0)
├── mhuffnagle-make (v0.0.2)
├── property_list_key (???)
├── puppetlabs-apache (v0.11.0)
├── puppetlabs-firewall (v1.0.0)  invalid
├── puppetlabs-git (v0.0.3)
├── puppetlabs-inifile (v1.0.1)
├── puppetlabs-mysql (v2.2.1)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.1.0)
├── puppetlabs-stdlib (v4.1.0)  invalid
├── puppetlabs-vcsrepo (v0.2.0)
├── redis (???)
├── ripienaar-concat (v0.2.0)
├── thias-vsftpd (v0.2.0)
├── wordpress (???)
└── zack-r10k (v1.0.2)
/opt/puppet/share/puppet/modules
├── cprice404-inifile (v0.10.3)
├── puppetlabs-apt (v1.1.0)
├── puppetlabs-auth_conf (v0.1.7)
├── puppetlabs-firewall (v0.3.0)  invalid
├── puppetlabs-java_ks (v1.1.0)
├── puppetlabs-pe_accounts (v2.0.1)
├── puppetlabs-pe_common (v0.1.0)
├── puppetlabs-pe_mcollective (v0.1.14)
├── puppetlabs-pe_postgresql (v0.0.5)
├── puppetlabs-pe_puppetdb (v0.0.11)
├── puppetlabs-postgresql (v2.5.0)
├── puppetlabs-puppet_enterprise (v3.1.0)
├── puppetlabs-puppetdb (v1.5.1)
├── puppetlabs-reboot (v0.1.2)
├── puppetlabs-request_manager (v0.0.10)
├── puppetlabs-stdlib (v3.2.0)  invalid
└── ripienaar-concat (v0.2.0)

I’ve installed Puppet Enterprise 3.1.0, and so /opt/puppet/share/puppet/modules reflects the state of the Puppet Enterprise (also known as ‘PE’) modules at that time. You can see that there are some conflicts because certain modules require certain versions of other modules. This is currently the nature of the beast with regard to Puppet modules. Some of these errors are loud and incidental (e.g. someone set a dependency on a version and forgot to update it), some are due to namespace changes (e.g. cprice404-inifile being ported over to puppetlabs-inifile), and so on. Basically, ensure that you handle the dependencies you care about inside the Puppetfile, as R10k won’t do it for you.
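In practice, “handling it yourself” just means declaring the dependencies explicitly in the Puppetfile alongside the modules that need them. A sketch (the version numbers here are illustrative, not recommendations):

```ruby
# Puppetfile - R10k does NOT resolve dependencies, so list them yourself
mod 'puppetlabs/apache', '0.11.0'
mod 'puppetlabs/concat', '1.0.0'   # apache requires concat >= 1.0.0
mod 'zack/r10k'
mod 'puppetlabs/gcc', '0.1.0'      # zack-r10k requires gcc >= 0.0.3
```

If a module’s metadata declares a dependency you never list, R10k will silently leave it out and you’ll find the hole at catalog compilation time.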

There – we’ve done it! We’ve configured R10k! Now how the hell do you use it?

R10k demonstration – from module iteration to environment iteration

Let’s take the environment we’ve setup in the previous steps and walk you through adding a new module to your production environment, iterating upon that module, pushing the changes to that module, pushing the changes to a Puppet environment, and then promoting those changes to production.

NOTES ON THE SETUP OF THIS DEMO:

  • In this demonstration, the classification method is going to be left to the user (i.e. it’s not a part of the magic). So, when I tell you to classify your node with a specific class, I don’t care if you use the Puppet Enterprise Console, site.pp, or any other manner.
  • I’m using Github for my repositories so that you folks watching and playing along at home can have something to follow. Feel free to substitute something like Atlassian Stash/Bitbucket, internal repos, or whatever for Github.

Add the module to an environment

The module we’ll be working with, a simple module called ‘notifyme’, will notify a message that will help us track the module’s progress through all phases of iteration.

The first thing we need to do is to add the module to an environment, so let’s dynamically create a NEW environment by creating a new topic branch and pushing it up to the main puppet repo. I will perform this step on my laptop and outside of the VM I’m using to test R10k:

└(~/src/puppet_repository)▷ git branch
  master
* production

└(~/src/puppet_repository)▷ git checkout -b notifyme
Switched to a new branch 'notifyme'

└(~/src/puppet_repository)▷ vim Puppetfile

# Perform the changes to Puppetfile here

└(~/src/puppet_repository)▷ git add Puppetfile
└(~/src/puppet_repository)▷ git commit
[notifyme 5239538] Add the 'notifyme' module
 1 file changed, 3 insertions(+)

└(~/src/puppet_repository)▷ git push origin notifyme:notifyme
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      notifyme -> notifyme

The contents I added to my Puppetfile look like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git"

Perform an R10k synchronization

To pull the new dynamic environment down to the Puppet master, do another R10k synchronization with r10k deploy environment -pv:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
<snip for brevity>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment notifyme
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/notifyme/modules
<more snipping>

I only included the relevant messages, but you can see that R10k created a new environment called ‘notifyme’ whose Puppetfile ALSO pulled in a module called ‘notifyme’.

Rename the branch to avoid confusion

Suddenly I realize that this may get confusing having both an environment called ‘notifyme’ with a module/class called ‘notifyme’. No worries, how about we rename that branch?

└(~/src/puppet_repository)▷ git branch -m notifyme garysawesomeenvironment

└(~/src/puppet_repository)▷ git push origin :notifyme
To https://github.com/glarizza/puppet_repository.git
 - [deleted]         notifyme

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      garysawesomeenvironment -> garysawesomeenvironment

That bit of git renamed the ‘notifyme’ branch to ‘garysawesomeenvironment’. The next git command is a bit tricky – when you git push to a remote, the syntax is:

git push name_of_origin local_branch_name:remote_branch_name

In our case, the name of our origin is LITERALLY ‘origin’, but we actually want to DELETE a remote branch. The way to delete a local branch is with git branch -d branch_name, but the way to delete a REMOTE branch is to push NOTHING to it. So consider the following command:

git push origin :notifyme

We’re pushing to the origin named ‘origin’, but providing NO local branch name and pushing that bit of nothing to the remote branch of ‘notifyme’. This kills (deletes) the remote branch.
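If the colon syntax feels too cryptic, newer versions of git also accept git push origin --delete notifyme, which does the same thing. Here’s a self-contained sketch of the local half of the rename dance, using a throwaway repo at a scratch path (no remote involved):

```shell
# A throwaway local repo to demonstrate the branch rename.
# /tmp/branch_rename_demo is just a scratch path; the remote-delete and
# remote-push steps from the article still apply unchanged.
d=/tmp/branch_rename_demo
rm -rf "$d"
git init -q "$d"
cd "$d"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'
git branch notifyme                              # the badly-named branch
git branch -m notifyme garysawesomeenvironment   # rename it locally
git branch --list garysawesomeenvironment        # prints the new name
git branch --list notifyme                       # prints nothing - it's gone
```

After the local rename, the old remote branch still exists until you push nothing to it (or --delete it), which is why the article does both steps.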

Finally, we push to our origin named ‘origin’ again and push the contents of the local branch ‘garysawesomeenvironment’ to the remote branch of ‘garysawesomeenvironment’ which in turn CREATES that branch if it doesn’t exist. Whew. Let’s run another damn synchronization:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<more snippage>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
<more of that snipping shit>
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Cool, let’s check out our environments folder on our VM:

[root@master1 production]# ls -lah /etc/puppetlabs/puppet/environments/
total 24K
drwxr-xr-x 6 root root 4.0K Feb 19 13:34 .
drwxr-xr-x 7 root root 4.0K Feb 19 12:09 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 13:33 garysawesomeenvironment
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 production]# cd /etc/puppetlabs/puppet/environments/garysawesomeenvironment/

[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

Run Puppet to test the new environment

Perfect! Now classify your node to include the ‘notifyme’ class, and let’s run Puppet to see what we get when we try to join the environment called ‘garysawesomeenvironment’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snipping facts loading for brevity>
Info: Caching catalog for master1
Info: Applying configuration version '1392845863'
Notice: This is the notifyme module and its master branch
Notice: /Stage[main]/Notifyme/Notify[This is the notifyme module and its master branch]/message: defined 'message' as 'This is the notifyme module and its master branch'
Notice: Finished catalog run in 11.10 seconds

Cool! Now let’s try to run Puppet with another environment, say ‘production’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping facts loading for brevity>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class notifyme for master1 on node master1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

We get an error because that module hasn’t been loaded by R10k for that environment.

Tie a module version to an environment

Okay, so we added a module to a new environment, but what if we want to test out a specific commit, branch, or tag of a module and test it in this new environment? This is frequently what you’ll be doing – making a change to an existing module, pushing your change to a topic branch of that module’s repository, tying it to an environment (or creating a new environment by branching the main Puppet repository), and then testing the change.

Let’s go back to my ‘notifyme’ module that I’ve cloned to my laptop and push a change to a BRANCH of that module’s Github repository:

└(~/src/puppet-notifyme)▷ git branch
* master

└(~/src/puppet-notifyme)▷ git checkout -b change_the_message
Switched to a new branch 'change_the_message'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the notify message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[change_the_message bc3975b] Change the Message
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin change_the_message:change_the_message
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 448 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      change_the_message -> change_the_message

└(~/src/puppet-notifyme)▷ git branch -a
* change_the_message
  master
  remotes/origin/change_the_message
  remotes/origin/master

└(~/src/puppet-notifyme)▷ git log
commit bc3975bb5c75ada86bfc2c45db628b5a156f85ce
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 13:55:26 2014 -0800

    Change the Message

    This commit changes the message to test my workflow.

What I’m showing you is the workflow that creates a new local branch called ‘change_the_message’ to the notifyme module, changes the message in my notify resource, commits the change, and pushes the changes to a remote branch ALSO called ‘change_the_message’.

Because I created a topic branch, I can provide that branch name in the Puppetfile located in the ‘garysawesomeenvironment’ branch of the main Puppet repo. THAT is the piece that ties together the specific version of the module with the Puppet environment we want on the Puppet master. Here’s that change:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'change_the_message'

Again, that change gets put into the ‘garysawesomeenvironment’ branch of the main Puppet repo and pushed up to the remote:

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment 89b139c] Update garysawesomeenvironment
 1 file changed, 2 insertions(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 411 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5239538..89b139c  garysawesomeenvironment -> garysawesomeenvironment

└(~/src/puppet_repository)▷ git log -p
commit 89b139c8c2faa888a402b98ea76e4ca138b3463d
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:04:18 2014 -0800

    Update garysawesomeenvironment

    Tie this environment to the 'change_the_message' branch of my notifyme module.

diff --git a/Puppetfile b/Puppetfile
index 5e5d091..27fc06e 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -31,4 +31,5 @@ mod 'redis',
   :ref => 'feature/debian_support'

 mod "notifyme",
-  :git => "git://github.com/glarizza/puppet-notifyme.git"
+  :git => "git://github.com/glarizza/puppet-notifyme.git",
+  :ref => 'change_the_message'

Now let’s synchronize again!!

[root@master1 garysawesomeenvironment]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<snip>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
<snip>

Cool, let’s check our work on the VM:

[root@master1 garysawesomeenvironment]# pwd
/etc/puppetlabs/puppet/environments/garysawesomeenvironment
[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

And finally, let’s run Puppet:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392847743'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.10 seconds

TADA! We’ve successfully tied a specific version of a module to a specific dynamic environment, deployed it to a master, and tested it out! Smell that? That’s the smell of awesome. Or Jeff in the next cubicle eating a burrito. Either way, I like it.

Merge your changes with master/production

It’s green – fuck it; ship it! NOW you’re speaking ‘agile’! Assuming everything went according to plan, let’s merge our changes in with the production environment and synchronize. This is up to your company’s workflow docs (whether you use pull requests, a merge master, or poke Patrick and tell him to tell Andy to merge in your change). I’m using git and Github, so let’s merge.

First, do the Module:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge change_the_message
Updating d44a790..bc3975b
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   d44a790..bc3975b  master -> master

└(~/src/puppet-notifyme)▷ cat manifests/init.pp
class notifyme {
  notify { "This is the changed message in the change_the_message branch": }
}

So now we have an issue, and that issue is that the production environment has YET to have the ‘notifyme’ module added to it. If we merge the contents of the ‘garysawesomeenvironment’ branch with the ‘production’ branch of the main Puppet repo, then we’re going to be pointing at the ‘change_the_message’ branch of the ‘notifyme’ module (because that was our last commit).

Because of this, I can’t do a straight merge, can I? For posterity’s sake (in the event that someone in the future wants to look for that branch on my Github repo), I’m going to keep that branch alive. In a production environment, I most likely would NOT have additional branches open for all my component modules, as that would get pretty annoying/confusing – understand that this is a one-off case because I’m doing a demo. Instead, I’m going to modify the Puppetfile in the ‘production’ branch of the main Puppet repo directly:

└(~/src/puppet_repository)▷ git checkout production
Switched to branch 'production'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes here

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[production a74f269] Add notifyme module to Production environment
 1 file changed, 4 insertions(+)

└(~/src/puppet_repository)▷ git push origin production:production
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 362 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5ecefc8..a74f269  production -> production

└(~/src/puppet_repository)▷ git log -p
commit a74f26975102f3786eedddace89bda086162d801
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:24:05 2014 -0800

    Add notifyme module to Production environment

diff --git a/Puppetfile b/Puppetfile
index 0b1da68..9168a81 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -29,3 +29,7 @@ mod "property_list_key",
 mod 'redis',
   :git => 'git://github.com/glarizza/puppet-redis',
   :ref => 'feature/debian_support'
+
+mod 'notifyme',
+  :git => 'git://github.com/glarizza/puppet-notifyme'
+

Alright, we’ve updated the production environment, now synchronize again (I’ll spare you and do it WITHOUT verbose mode):

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

Okay, now run Puppet with the PRODUCTION environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.66 seconds

Beautiful, we’re synchronized!!!

Making a change to an EXISTING module in an environment

Okay, so we saw previously how to add a NEW module to an environment, but what if we already HAVE a module in an environment and we want to make an update/change to it? Well, it’s largely the same process:

  • Cut a branch to the module
  • Commit your code and push it up to the module’s repo
  • Cut a branch to the main Puppet repo
  • Push that branch up to the main Puppet repo
  • Perform an R10k synchronization to sync the environments
  • Test your changes
  • Merge the changes with the master branch of the module
  • DONE!

Let’s go back and change that notify message again, shall we?

└(~/src/puppet-notifyme)▷ git checkout -b 'another_change'
Switched to a new branch 'another_change'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[another_change 608166e] Change the message that already exists!
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin another_change:another_change
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 426 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      another_change -> another_change

Okay, let’s re-use ‘garysawesomeenvironment’ because I like the name, but tie it to the new ‘another_change’ branch of the ‘notifyme’ module:

└(~/src/puppet_repository)▷ git checkout garysawesomeenvironment
Switched to branch 'garysawesomeenvironment'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make change to Puppetfile to tie it to 'another_change' branch

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment ce84a30] Tie garysawesomeenvironment to 'another_change'
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 386 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   89b139c..ce84a30  garysawesomeenvironment -> garysawesomeenvironment

The Puppetfile for that branch now has an entry for the ‘notifyme’ module that looks like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'another_change'

Okay, synchronize again!

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now run Puppet in the ‘garysawesomeenvironment’ environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392849521'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 12.54 seconds

There’s the message that I changed in the ‘another_change’ branch of my ‘notifyme’ module! What’s it look like if I run in the ‘production’ environment, though?

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 14.11 seconds

There’s the old message that’s in the ‘master’ branch of the ‘notifyme’ module (which is where the ‘production’ branch Puppetfile is pointing). To merge the changes into the production environment, we now only have to do ONE thing: merge the ‘another_change’ branch of the ‘notifyme’ module into its ‘master’ branch – that’s it! Why? Because the Puppetfile in the production branch of the main Puppet repo (and thus the production Puppet ENVIRONMENT) is already POINTING at the master branch of the ‘notifyme’ module. Let’s do the merge:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge another_change
Updating bc3975b..608166e
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   bc3975b..608166e  master -> master

Another R10k synchronization is needed on the master:

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now let’s run Puppet in the production environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392850004'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 11.82 seconds

There’s the message that was previously in the ‘another_change’ branch that’s been merged to the ‘master’ branch (and thus is entered into the production Puppet environment).

OR, use tags

One more note – for production environments that want a BIT more stability (rather than hoping that someone follows the policy of pushing commits to a BRANCH of a module instead of pushing directly to master – by accident or otherwise – and allowing that commit to make it DIRECTLY into production), the better way is to tie all modules to some sort of release version. For modules released to the Puppet Forge, that’s a version; for modules stored in git repositories, that’s a tag. Tying all modules in your production environment (and thus production Puppetfile) to specific tags in git repositories IS a “best practice” to ensure that the code that’s executed in production has some sort of safeguard.
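As a quick sketch of what that looks like in a Puppetfile, here’s the ‘notifyme’ entry pinned to a tag instead of a branch (the ‘1.0.0’ tag name is hypothetical – substitute whatever tag you’ve actually cut):

```ruby
mod 'notifyme',
  :git => 'git://github.com/glarizza/puppet-notifyme',
  :ref => '1.0.0'   # a git TAG, not a branch -- hypothetical version number
```

Now nothing lands in production until someone deliberately cuts a new tag and bumps this ref.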

TL;DR: Example tied to ‘master’ branch above was demo, and not necessarily recommended for your production needs.

Holy crap, that’s a lot to take in…

Yeah, tell me about it. And, believe it or not, I’m STILL not done with everything that I want to talk about regarding R10k – there’s still more info on:

  • Using R10k with a monolithic modules repo
  • Incorporating Hiera data
  • Triggering R10k with MCollective
  • Tying R10k to CI workflow

Those will come in a later post once I have time to decide how to tackle them. Until then, this should give you more than enough information to get started with R10k in your own environment.

If you have any questions/comments/corrections, PLEASE enter them in the comments below and I’ll be happy to respond when I’m not flying from gig to gig! :) Cheers!

EDIT: 2/19/2014 – correct librarian-puppet assumption thanks to Reid Vandewiele

Building a Functional Puppet Workflow Part 2: Roles and Profiles

In my first post, I talked about writing functional component modules. Well, I didn’t really do much detailing other than pointing out key bits of information that tend to cause problems. In this post, I’ll describe the next layer to the functional Puppet module workflow.

People usually stop once they have a library of component modules (whether hand-written, taken from Github, or pulled from The Forge). The idea is that you can classify all of your nodes in site.pp, the Puppet Enterprise Console, The Foreman, or with some other ENC, so why not just declare all your classes for every node when you need them?

Because that’s a lot of extra work and opportunities for fuckups.

People recognized this, so in the EARLY days of Puppet they would create node blocks in site.pp and use inheritance to inherit from those blocks. This was the right IDEA, but probably not the best PLACE for it. Eventually, ‘Profiles’ were born.

The idea of ‘Roles and Profiles’ originally came from a piece that Craig Dunn wrote while he worked for the BBC, and then Adrien Thebo also wrote a piece that documents the same sort of pattern. So why am I writing about it a THIRD time? Well, because I feel it’s only a PIECE of an overall puzzle. The introduction of Hiera and other awesome tools (like R10k, which we will get to in the next post) still make Roles and Profiles VIABLE, but they also extend upon them.

One final note before we move on – the terms ‘Roles’ and ‘Profiles’ are ENTIRELY ARBITRARY. They’re not magic reserved words in Puppet, and you can call them whatever the hell you want. It’s also been pointed out that Craig MIGHT have misnamed them (a ROLE should be a model for an individual piece of tech, and a PROFILE should probably be a group of roles), but, like all good Puppet Labs employees – we suck at naming things.

Profiles: technology-specific wrapper classes

A profile is simply a wrapper class that groups Hiera lookups and class declarations into one functional unit. For example, if you wanted Wordpress installed on a machine, you’d probably need to declare the apache class to get Apache set up, declare an apache::vhost for the Wordpress directory, setup a MySQL database with the appropriate classes, and so on. There are a lot of components that go together when you setup a piece of technology; it’s not just a single class.

Because of this, a profile exists to give you a single class you can include that will setup all the necessary bits for that piece of technology (be it Wordpress, or Tomcat, or whatever).

Let’s look at a simple profile for Wordpress:

profiles/manifests/wordpress.pp
class profiles::wordpress {

  ## Hiera lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')
  $wordpress_user          = hiera('profiles::wordpress::wordpress_user')
  $wordpress_group         = hiera('profiles::wordpress::wordpress_group')
  $wordpress_docroot       = hiera('profiles::wordpress::wordpress_docroot')
  $wordpress_port          = hiera('profiles::wordpress::wordpress_port')

  ## Create user
  group { 'wordpress':
    ensure => present,
    name   => $wordpress_group,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => $wordpress_group,
    password => $wordpress_user_password,
    name     => $wordpress_group,
    home     => $wordpress_docroot,
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => $wordpress_port,
    docroot => $wordpress_docroot,
  }

  ## Configure wordpress
  class { '::wordpress':
    install_dir => $wordpress_docroot,
    db_name     => $wordpress_db_name,
    db_host     => $wordpress_db_host,
    db_password => $wordpress_db_password,
  }
}

Name your profiles according to the technology they setup

Profiles are technology-specific, so you’ll have one to setup wordpress, and tomcat, and jenkins, and…well, you get the picture. You can also namespace your profiles so that you have profiles::ssh::server and profiles::ssh::client if you want. You can even have profiles::jenkins::tomcat and profiles::jenkins::jboss or however you need to namespace according to the TECHNOLOGIES you use. You don’t need to include your environment in the profile name (a la profiles::dev::tomcat) as the bits of data that make the dev environment different from production should come from HIERA, and thus aren’t going to be different on a per-profile basis. You CAN setup profiles according to your business unit if multiple units use Puppet and have different setups (a la security::profiles::tomcat versus ops::profiles::tomcat), but the GOAL of Puppet is to have one main set of modules that every group uses (and the Hiera data being different for every group). That’s the GOAL, but I’m pragmatic enough to understand that not everywhere is a shiny, happy ‘DevOps Garden.’
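To make the naming distinction concrete, here’s a small sketch (the profile names are invented for illustration):

```puppet
# Name profiles after the TECHNOLOGY they manage...
include profiles::ssh::server
include profiles::tomcat

# ...NOT after the environment. The dev/prod differences are DATA,
# and data differences belong in Hiera, not in the profile name.
# Avoid: include profiles::dev::tomcat
```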

Do all Hiera lookups in the profile

You’ll see that I declared variables and set their values with Hiera lookups. The profile is the place for these lookups because the profile collects all external data and declares all the classes you’ll need. In reality, you’ll USUALLY only see profiles looking up parameters and declaring classes (i.e. declaring users and groups like I did above will USUALLY be left to component classes).

I do the Hiera lookups first to make it easy to debug from where those values came. I don’t rely on ‘Automatic Parameter Lookup’ in Puppet 3.x.x because it can be ‘magic’ for people who aren’t aware of it (for people new to Puppet, it’s much easier to see a function call and trace back what it does rather than experience Puppet doing something unseen and wondering what the hell happened).

Finally, you’ll notice that my Hiera lookups have NO DEFAULT VALUES – this is BY DESIGN! For most people, their Hiera data is PROBABLY located in a separate repository from their Puppet module code. Imagine making a change to your profile to have it look up a bit of data from Hiera, and then imagine you FORGOT to put that data into Hiera. What happens if you provide a default value to Hiera? The catalog compiles, that default value gets passed down to the component module, and gets enforced on disk. If you have good tests, you MIGHT see that the component you configured has a bit of data that’s not correct, but what if you don’t have a great post-Puppet testing workflow? Puppet will happily set this default value, according to Puppet everything is green and worked just fine, but now your component is setup incorrectly. That’s one of the WORST failures – the ones that you don’t catch. Now, imagine you DON’T provide a default value. In THIS case, Puppet will raise a compilation error because a Hiera lookup didn’t return a value. You’ll catch your error before anything gets pushed to Production and you can catch the screwup. This is a MUCH better solution.
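Here are the two lookup styles side by side (the key name comes from the Wordpress profile above; the ‘localhost’ fallback is a hypothetical default):

```puppet
# With a fallback default: a missing or typo'd Hiera key silently becomes
# 'localhost', the catalog compiles cleanly, and a misconfigured component
# gets enforced on disk -- the failure you don't catch.
$wordpress_db_host = hiera('profiles::wordpress::wordpress_db_host', 'localhost')

# Without a default: a missing key fails catalog compilation LOUDLY,
# so the mistake is caught before anything reaches a node.
$wordpress_db_host = hiera('profiles::wordpress::wordpress_db_host')
```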

Use parameterized class declarations and explicitly pass values you care about

The parameterized class declaration syntax can be dangerous. The difference between the include function and the parameterized class syntax is that the include function is idempotent. You can do the following in a Puppet manifest, and Puppet doesn’t raise an error:

include apache
include apache
include apache

This is because the include function checks to see if the class is in the catalog. If it ISN’T, then it adds it. If it IS, then it exits cleanly. The include function is your pal.

Consider THIS manifest:

class { 'apache': }
include apache
include apache

Does this work? Yep. The parameterized class syntax adds the class to the catalog, the include function detects this and exits cleanly twice. What about THIS manifest:

include apache
class { 'apache': }
include apache

Does THIS work? Nope! Puppet raises a compilation error because a class was declared more than once in a catalog. Why? Well, consider that Puppet is ‘declarative’…all the way up until it isn’t. Puppet’s PARSER reads from the top of the file to the bottom of the file, and we have a single-pass parser when it comes to things like setting variables and declaring classes. When the parser hits the first include function, it adds the class to the catalog. The parameterized class syntax, however, is a honey badger: it doesn’t give a shit. It adds a class to the catalog regardless of whether it already exists or not. So why would we EVER use the parameterized class declaration syntax? We need to use it because the include function doesn’t allow you to pass parameters when you declare a class.

So wait – why did I spend all this time explaining why the parameterized class syntax is more dangerous than the include function ONLY to recommend its use in profiles? For two reasons:

  • We need to use it to pass parameters to classes
  • We’re wrapping its use in a class that we can IN TURN declare with the include function

Yes, we can get the best of BOTH worlds, the ability to pass parameters and the use of our pal the include function, with this wrapper class. We’ll see the latter usage when we come to roles, but for now let’s focus on passing parameter values.

In the first section, we set variables with Hiera lookups; now we can pass those variables to classes we’re declaring with the parameterized class syntax. This allows the declaration of the class to be static, but the parameters we pass to that class to change according to the Hiera hierarchy. We’ve explicitly called the hiera function, which makes it easier to debug, and we’re explicitly passing parameter values so we know definitively which parameters are being passed (and thus are overriding default values) to the component module. Finally, since our component modules do NOT use Hiera at all, we can be sure that if we’re not passing a parameter, it’s getting its value from the default set in the module’s ::params class.

Everything we do here is meant to make things easier to debug when it’s 3am and things aren’t working. Any asshole can do crazy shit in Puppet, but a seasoned sysadmin writes their code for ease of debugging during 3am pages.

An annoying Puppet bug – top-level class declarations and profiles

Oh, ticket 2053, how terrible are you? This is one of those bug numbers that I can remember by heart (like 8040 and 86). Puppet has the ability to do ‘relative namespacing’, which allows you to declare a variable called $port in a class called apache and refer to it as $port instead of fully-namespacing the variable, and thus having to call it $apache::port inside the apache class. It’s a shortcut – you can STILL refer to the variable as $apache::port in the class – but it comes in handy. The PROBLEM occurs when you create a profile, as we did above, called profiles::wordpress and you try to declare a class called wordpress. If you do the following inside the profiles::wordpress class, which class is being declared?

include wordpress

If you think you’re declaring a wordpress class from within a wordpress module in your Puppet modulepath, you would be wrong. Puppet ACTUALLY thinks you’re trying to declare profiles::wordpress, because you’re INSIDE the profiles::wordpress class and it’s doing relative namespacing (i.e. in the same way you refer to $port and ACTUALLY mean $apache::port, it thinks you’re referring to wordpress and ACTUALLY mean profiles::wordpress).

Needless to say, this causes LOTS of confusion.

The solution here is to declare a class called ::wordpress which tells Puppet to go to the top-level namespace and look for a module called wordpress which has a top-level class called wordpress. It’s the same reason that we refer to Facter Fact values as $::osfamily instead of $osfamily in class definitions (because you can declare a local variable called $osfamily in your class). This is why in the profile above you see this:

class { '::wordpress':
  install_dir => $wordpress_docroot,
  db_name     => $wordpress_db_name,
  db_host     => $wordpress_db_host,
  db_password => $wordpress_db_password,
}

When you use profiles and roles, you’ll need to do this namespacing trick when declaring classes because you’re frequently going to have a profile::<sometech> that will declare the <sometech> top-level class.

Roles: business-specific wrapper classes

How do you refer to your machines? When I ask you about that cluster over there, do you say “Oh, you mean the machines with java 1.6, apache, mysql, etc…”? I didn’t think so. You usually have names for them, like the “internal compute cluster” or “app builder nodes” or “DMZ repo machines” or whatever. These names are your Roles. Roles are just the mapping of your machine’s names to the technology that should be ON them. In the past we had descriptive hostnames that afforded us a code for what the machine ‘did’ – roles are just that mapping for Puppet.

Roles are namespaced just like profiles, but now it’s up to your organization to fill in the blanks. Some people immediately want to put environments into the roles (a la roles::uat::compute_cluster), but that’s usually not necessary (as MOST LIKELY the compute cluster nodes have the SAME technology on them when they’re in dev versus when they’re in prod, it’s just the DATA – like database names, VIP locations, usernames/passwords, etc – that’s different. Again, these data differences will come from Hiera, so there should be no reason to put the environment name in your role). You still CAN put the environment name in the role if it makes you feel better, but it’ll probably be useless.

Roles ONLY include profiles

So what exactly is in the role wrapper class? That depends on what technology is on the node that defines that role. What I can tell you for CERTAIN is that roles should ONLY use the include function and should ONLY include profiles. What does this give us? This gives us our pal the include function back! You can include the same profile 100 times if you want, and Puppet only puts it in the catalog once.

Every node is classified with one role. Period.

The beautiful thing about roles and profiles is that the GOAL is that you should be able to classify a node with a SINGLE role and THAT’S IT. This makes classification simple and static – the node gets its role, the role includes profiles, profiles call out to Hiera for data, that data is passed to component modules, and away we go. Also, since classification is static, you can use version control to see what changes were introduced to the role (i.e. what profiles were added or removed). In my opinion, if you need to apply more than one role to a node, you’ve introduced a new role (see below).
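A minimal role might look like the following sketch (the role and profile names are invented for illustration):

```puppet
class roles::app_builder {
  # A role ONLY includes profiles -- no resources, no Hiera lookups
  include profiles::security::base
  include profiles::tomcat
  include profiles::git::repo_server
}
```

Classification then collapses to a single line per node – `include roles::app_builder` in site.pp, or the equivalent assignment in your ENC.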

Roles CAN use inheritance…if you like

I’ve seen people implement roles a couple of different ways, and one of them is to use inheritance to build a catalog. For example, you can define a base roles class that includes something like a base security profile (i.e. something that EVERY node in your infrastructure should have). Moving down the line, you COULD namespace according to function like roles::app for your application server machines. The roles::app class could inherit from the roles class (which gets the base security profile), and could then include the profiles necessary to setup an application server. Next, you could subclass down to roles::app::site_foo for an application server that supports some site in your organization. That class inherits from the roles::app class, and then adds profiles that are specific to that site (maybe they use Jboss instead of Tomcat, and thus that’s where the differentiation occurs). This is great because you don’t have a lot of repeated use of the include function, but it also makes it hard to definitively look at a specific role to see exactly what’s being declared (i.e. all the profiles). You have to weigh what you value more: less typing or greater visibility. I will err on the side of greater visibility (just due to that whole 3am outage thing), but it’s up to you to decide what to optimize for.
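Sketched out with hypothetical class names, that inheritance chain might look like this:

```puppet
class roles {
  # EVERY node in the infrastructure gets the base security profile
  include profiles::security::base
}

class roles::app inherits roles {
  # Profiles common to ALL application servers
  include profiles::app_common
}

class roles::app::site_foo inherits roles::app {
  # Site foo's app servers differ here -- say, they use Jboss
  include profiles::jboss
}
```

Note the trade-off: to see everything roles::app::site_foo declares, you have to walk the whole chain.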

A role similar, yet different, from another role is: a new role

EVERYBODY says to me “Gary, I have this machine that’s an AWFUL LOT like this role over here, but…it’s different.” My answer to them is: “Great, that’s another role.” If the thing that’s different is data (i.e. which database to connect to, or what IP address to route traffic through), then that difference should be put in HIERA and the classification should remain the same. If that difference is technology-specific (i.e. this server uses JBoss instead of Tomcat) then first look and see if you can isolate how you know this machine is different (maybe it’s on a different subnet, maybe it’s at a different location, something like that). If you can figure that out and write a Fact for it (or use similar conditional logic to determine this logically), then you can just drop that conditional logic in your role and let it do the heavy lifting. If, in the end, this bit of data is totally arbitrary, then you’ll need to create another role (perhaps a subclass using the above namespacing) and assign it to your node.
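If the difference CAN be derived from a Fact, the conditional role might be sketched like this (the ‘datacenter’ custom fact and the profile names are hypothetical):

```puppet
class roles::app {
  include profiles::security::base

  # Hypothetical custom fact reporting which datacenter the node lives in;
  # the role does the heavy lifting of picking the right app server profile.
  case $::datacenter {
    'dmz':   { include profiles::jboss }
    default: { include profiles::tomcat }
  }
}
```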

The hardest thing about this setup is naming your roles. Why? Every site is different. It’s hard for me to account for differences in your setup because your workplace is dysfunctional (seriously).

Review: what does this get you?

Let’s walk through every level of this setup from the top to the bottom and see what it gets you. Every node is classified to a single role, and, for the most part, that classification isn’t going to change. Now you can take all the extra work off your classifier tool and put it back into the manifests (that are subject to version control, so you can git blame to your heart’s content and see who last changed the role/profile). Each role is going to include one or more profiles, which gives us the added idempotent protection of the include function (of course, if profiles have collisions with classes you’ll have to resolve those. Say one or more profiles tries to include an apache class – simply break that component out into a separate profile, extract the parameters from Hiera, and include that profile at a higher level). Each profile is going to do Hiera lookups, which should give you the ability to provide different data for different host types (i.e. different data on a per-environment level, or however you lay out your Hiera hierarchy), and that data will be passed directly to the class that is declared. Finally, each component module will accept parameters as variables internal to that module, default parameters/variables to sane values in the ::params class, and use those variables when declaring each resource throughout its classes.

  • Roles abstract profiles
  • Profiles abstract component modules
  • Hiera abstracts configuration data
  • Component modules abstract resources
  • Resources abstract the underlying OS implementation

Choose your level of comfortability

The roles and profiles pattern also buys you something else – the ability for less-skilled and more-skilled Puppet users to work with the same codebase. Let’s say you use some GUI classifier (like the Puppet Enterprise Console), someone who’s less skilled at Puppet looks and sees that a node is classified with a certain role, so they open the role file and see something like this:

include profiles::wordpress
include profiles::tomcat
include profiles::git::repo_server

That’s pretty legible, right? Someone who doesn’t regularly use Puppet can probably make a good guess as to what’s on the machine. Need more information? Open one of the profiles and look specifically at the classes that are being declared. Need to know the data being passed? Jump into Hiera. Need to know more information? Dig into each component module and see what’s going on there.

When you have everything abstracted correctly, you can have developers providing data (like build versions) to Hiera, junior admins grouping nodes for classification, more senior folk updating profiles, and your best Puppet people creating/updating component modules and building plugins like custom facts/functions/whatever.

Great! Now go and refactor…

If you’ve used Puppet for more than a month, you’re probably familiar with the “Oh shit, I should have done it THAT way…let me refactor this” game. I know, it sucks, and we at Puppet Labs haven’t been shy of incorporating something that we feel will help people out (but will also require some refactoring). This pattern, though, has been in use by the Professional Services team at Puppet Labs for over a year without modification. I’ve used this on sites GREAT and small, and every site with which I’ve consulted and implemented this pattern has been able to both understand its power and derive real value within a week. If you’re contemplating a refactor, you can’t go wrong with Roles and Profiles (or whatever names you decide to use).

Building a Functional Puppet Workflow Part 1: Module Structure

Working as a professional services engineer for Puppet Labs, my life consists almost entirely of either correcting some of the worst code atrocities you’ve seen in your life, or helping people get started with Puppet so that they don’t need to call us again due to: A.) Said code atrocities or B.) Refactoring the work we JUST helped them start. It wasn’t ALWAYS like this – I can remember some of my earliest gigs, and I almost feel like I should go revisit them if only to correct some of the previous ‘best practices’ that didn’t quite pan out.

This would be exactly why I’m wary of ‘Best Practices’ – because one person’s ‘Best Practice’ is another person’s ‘What the fuck did you just do?!’

Having said that, I’m finding myself repeating a story over and over again when I train/consult, and that’s the story of ‘The Usable Puppet Workflow.’ Everybody wants to know ‘The Right Way™’, and I feel like we finally have a way that survives a reasonable test of time. I’ve been promoting this workflow for over a year (which is a HELL of a long time in Startup time), and I’ve yet to really see an edge case it couldn’t handle.

(If you’re already savvy: yes, this is the Roles and Profiles talk)

I’ll be breaking this workflow down into separate blog posts for every component, and, as always, your comments are welcome…

It all starts with the component module

The first piece of a functional Puppet deployment starts with what we call ‘component modules’. Component modules are the lowest level in your deployment, and are modules that configure specific pieces of technology (like apache, ntp, mysql, etc…). Component modules are well-encapsulated, have a reasonable API, and focus on doing small, specific things really well (i.e. the *nix way).

I don’t want to write thousands of words on building component modules because I feel like others have done this better than I. As examples, check out RI’s Post on a simple module structure, Puppet Labs’ very own docs on the subject, and even Alessandro’s Puppetconf 2012 session. Instead, I’d like to provide some pointers on what I feel makes a good component module, and some ‘gotchas’ we’ve noticed.

Parameters are your API

In the current world of Puppet, you MUST define the parameters your module will accept in the Puppet DSL. Also, every parameter MUST ultimately have a value when Puppet compiles the catalog (whether by explicitly passing this parameter value when declaring the class, or by assuming a default value). Yes, it’s funny that, when writing a Puppet class, if you typo a VARIABLE Puppet will not alert you to this (in a NON use strict-ian sort of approach) and will happily accept a variable in an undefined state, but the second you don’t pass a value to your class parameter you’re in for a rude compilation error. This is the way of Puppet classes at the time of this writing, so you’re going to see Puppet classes with LINES of defined parameters. I expect this to change in the future (please let this change in the near future), but for now, it’s a necessary evil.

The parameters you expose to your top-level class (i.e. given class names like apache and apache::install, I’m talking specifically about apache) should be treated as an API to your module. IDEALLY, they’re the ONLY THING that a user needs to modify when using your module. Also, whenever possible, it should be the case that a user need ONLY interact with the top-level class when using your module (of course, defined resource types like apache::vhost are used on an ad-hoc basis, and thus are the exception here).
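To make the ‘parameters as API’ idea concrete, here’s a minimal sketch (the class layout and parameter names here are hypothetical, not from any real apache module):

```puppet
# Hypothetical top-level class: these parameters ARE the module's API
class apache (
  $port = '80',
  $user = 'apache',
) {
  # ...which delegates to subclasses the user should never touch directly
  class { 'apache::config':
    port => $port,
    user => $user,
  }
}

# A user of the module interacts ONLY with the top-level class:
class { 'apache':
  port => '8080',
}
```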

Inherit the ::params class

We’re starting to make enemies at this point. It’s been a convention for modules to use a ::params class to assign values to all variables that are going to be used for all classes inside the module. The idea is that the ::params class is the one-stop-shop to see where a variable is set. Also, to get access to a variable that’s set in a Puppet class, you have to declare the class (i.e. use the include() function or inherit from that class). When you declare a class that has both variables AND resources, those resources get put into the catalog, which means that Puppet ENFORCES THE STATE of those resources. What if you only needed a variable’s value and didn’t want to enforce the rest of the resources in that class? There’s no good way in Puppet to do that. Finally, when you inherit from a class in Puppet that has assigned variable values, you ALSO get access to those variables in the parameter definition section of your class, i.e. the following section of the class:

class apache (
  $port = $apache::params::port,
  $user = $apache::params::user,
) inherits apache::params {

See how I set the default value of $apache::port to $apache::params::port? I could only access the value of the variable $apache::params::port in that section by inheriting from the apache::params class. I couldn’t insert include apache::params below that section and be allowed access to the variable up in the parameter defaults section (due to the way that Puppet parses classes).

FOR THIS REASON, THIS IS THE ONLY RECOMMENDED USAGE OF INHERITANCE IN PUPPET!

We do NOT recommend using inheritance anywhere else in Puppet and for any other reason because there are better ways to achieve what you want to do INSTEAD of using inheritance. Inheritance is a holdover from a scarier, more lawless time.

NOTE: Data in Modules – There’s a ‘Data in Modules’ pattern out there that attempts to eliminate the ::params class. I wrote about it in a previous post, and I recommend you read that post for more info (it’s near the bottom).

Do NOT do Hiera lookups in your component modules!

This is something that’s really only RECENTLY been pushed. When Hiera was released, we quickly recognized that it would be the answer to quite a few problems in Puppet. In the rush to adopt Hiera, many people started adding Hiera calls to their modules, and suddenly you had ‘Hiera-compatible’ modules out there. This caused all kinds of compatibility problems, and it was largely because there wasn’t a better module structure and workflow by which to integrate Hiera. The pattern that I’ll be pushing DOES INDEED use Hiera, BUT it confines all Hiera calls to a higher-level wrapper class we call a ‘profile’. The reasons for NOT using Hiera in your module are:

  • By doing Hiera calls at a higher level, you have a greater visibility on exactly what parameters were set by Hiera and which were set explicitly or by default values.
  • By doing Hiera calls elsewhere, your module is backwards-compatible for those folks who are NOT using Hiera

Remember – your module should just accept a value and use it somewhere. Don’t get TOO smart with your component module – leave the logic for other places.
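As a quick illustration of where the Hiera calls DO belong, here’s a hedged sketch of the wrapper-class pattern (the profiles::apache class name and Hiera keys are made up for this example; the full pattern is covered in the next post). Note that hiera() was the lookup function at the time of this writing:

```puppet
# Hypothetical 'profile' wrapper class -- Hiera lookups happen HERE,
# never inside the component module itself
class profiles::apache {
  $port = hiera('apache_port', '80')
  $user = hiera('apache_user', 'apache')

  # The component module just accepts values -- it works identically
  # for folks who pass these parameters explicitly without Hiera
  class { 'apache':
    port => $port,
    user => $user,
  }
}
```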

Keep your component modules generic

We always get asked “How do I know if I’m writing a good module?” We USED to say “Well, does it work?” (and trust me, that was a BIG hurdle). Now, with data separation models out there like Hiera, I have a couple of other questions that I ask (you know, BEYOND asking if it compiles and actually installs the thing it’s supposed to install). The best way I’ve found to determine if your module is ‘generic enough’ is if I asked you TODAY to give me your module, would you give it to me, or would you be worried that there was some company-specific data locked in there? If you have company-specific data in your module, then you need to refactor the module, store the data in Hiera, and make your module more generic/reusable. Also, does your module focus on installing one piece of technology, or are you declaring packages for shared libraries or other components (like gcc, apache, or other common components)? You’re not going to win any prizes for having the biggest, most monolithic module out there. Rather, if your module is that large and that complex, you’re going to have a hell of a time debugging it. Err on the side of making your modules smaller and more task-specific. So what if you end up needing to declare 4 classes where you previously declared 1? In the roles and profiles pattern we will show you in the next blog post, you can abstract that away ANYHOW.

Don’t play the “what if” game

I’ve had more than a couple of gigs where the customer says something along the lines of “What if we need to introduce FreeBSD/Solaris/etc… nodes into our organization, shouldn’t I account for them now?” This leads more than a few people down a path of entirely too-complex modules that become bulky and unwieldy. Yes, your modules should be formatted so that you can simply add another case in your ::params class for another OS’s parameters, and yes, your module should be formatted so that your ::install or ::config class can handle another OS, but if you currently only manage Redhat, and you’ve only EVER managed Redhat, then don’t start adding Debian parameters RIGHT NOW just because you’re afraid you might inherit Ubuntu machines. The goal of Puppet is to automate the tasks that eat up the MAJORITY of your time so you can focus on the edge cases that really demand your time. If you can eventually automate those edge cases, then AWESOME! Until then, don’t spend the majority of your time trying to automate the edge cases only to drown under the weight of deadlines from simple work that you COULD have already automated (but didn’t, because you were so worried about the exceptions)!

Store your modules in version control

This should go without saying, but your modules should be stored in version control (a la git, svn, hg, whatever). We tend to prefer git due to its lightweight branching and merging (most of our tooling and solutions will use git because we’re big git users), but you’re free to use whatever you want. The bigger question is HOW to store your modules in version control. There are usually two schools of thought:

  • One repository per module
  • All modules in a single repository

Each model has its pros and cons, but we tend to recommend one module per repository for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With the all-modules-in-a-single-repository solution, modules you pull down from the Forge (or Github) must be committed to your repo; having a separate repository for each module allows you to keep third-party modules separate

NOTE: This becomes important in the third blog post in the series when we talk about moving changes to each Puppet Environment, but it’s important to introduce it NOW as a ‘best practice’. If you use our recommended module/environment solution, then one-module-per-repo is the best practice. If you DON’T use our solution, then a single repository for all modules will STILL work, but you’ll have to manage the above issues. Also note that even if you currently have every module in a single repository, you can STILL use our solution in part 3 of the series (you’ll just need to perform a couple of steps to conform).

Best practices are shit

In general, ‘best practices’ are only recommended if they fit into your organizational workflow. The best and worst part of Puppet is that it’s infinitely customizable, so ‘best practices’ will invariably be left wanting for a certain subset of the community. As always, take what I say under consideration; it’s quite possible that I could be entirely full of shit.

Seriously, What Is This Provider Doing?

Clarke’s third law states: “Any sufficiently advanced technology is indistinguishable from magic.” In the case of Ruby and Puppet provider interaction, I’m inclined to believe it. If you want proof, take a look at some of the native Puppet types – no amount of ‘Expecto Patronum’ will free you from the Ruby metaprogramming dementors that hover around lib/puppet/provider/exec-land.

In my first post tackling Puppet types and providers, I introduced the concept of Puppet types and the utility they provide. In the second post, I brought you to the great plain of Puppet providers and introduced the core methods necessary for creating a very basic Puppet provider with a single property (HINT: if you’ve not read either of those posts, or you’ve never dealt with basic types and providers, you might want to stop here and read up a bit on the topics). The problems with a provider like the one created in that post were:

  • puppet resource support wasn’t implemented, so you couldn’t query for existing instances of the type on the system (and their corresponding values)
  • The getter method would be called for EVERY instance of the type on the system, which would mean shelling-out multiple times during a run
  • Ditto for the setter method (if changes to multiple instances of the type were necessary)
  • That type was VERY basic (i.e. ensurable with a single property)

Unfortunately, when most of us have the need of a Puppet type and provider, we usually require multiple properties and reasonably complex system interaction. When it comes to creating both a getter and a setter method for every property (including the potential performance hit that could come from shelling-out many times during a Puppet run), ain’t nobody got time for that. And finally, puppet resource is a REALLY handy tool for querying the current state of your resources on a system. These problems all have solutions, but up until recently there was just one more problem:

Good luck finding documentation for those solutions.

NOTE: The Puppet Types and Providers book written by Nan and Dan is a great resource that provides a bit of a deeper dive than I’ll be doing in this post – DO check it out if you want to know more

Something, something, puppet resource

The puppet resource command (or ralsh, as it used to be known) is a very handy command for querying a system and returning the current state of resources for a specific Puppet type. Try it out if you never have (note that the following is being run on CentOS 6.4):

[root@linux ~]# puppet resource user
user { 'abrt':
  ensure           => 'present',
  gid              => '173',
  home             => '/etc/abrt',
  password         => '!!',
  password_max_age => '-1',
  password_min_age => '-1',
  shell            => '/sbin/nologin',
  uid              => '173',
}
user { 'adm':
  ensure           => 'present',
  comment          => 'adm',
  gid              => '4',
  groups           => ['sys', 'adm'],
  home             => '/var/adm',
  password         => '*',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/sbin/nologin',
  uid              => '3',
}
< ... and more users below ... >

The puppet resource command returns a list of all users on the system and their current property values (note you can only see the password hash if you’re running Puppet with sufficient privileges). You can even query puppet resource for the values of a specific resource:

[root@gary ~]# puppet resource user glarizza
user { 'glarizza':
  ensure           => 'present',
  gid              => '502',
  home             => '/home/glarizza',
  password         => '$1$hsUuCygh$kgLKG5epuRaXHMX5KmxrL1',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '502',
}

puppet resource seems magical, and you might think that if you create a custom type and sync it to your machine then puppet resource will automatically work for you.

And you would be wrong.

puppet resource will only work if you’ve implemented a special method in your provider called self.instances.

self.instances

The self.instances method is pretty sparsely documented, so let’s go straight to the source…code, that is:

lib/puppet/provider.rb
  # Returns a list of system resources (entities) this provider may/can manage.
  # This is a query mechanism that lists entities that the provider may manage on a given system. It is
  # is directly used in query services, but is also the foundation for other services; prefetching, and
  # purging.
  #
  # As an example, a package provider lists all installed packages. (In contrast, the File provider does
  # not list all files on the file-system as that would make execution incredibly slow). An implementation
  # of this method should be made if it is possible to quickly (with a single system call) provide all
  # instances.
  #
  # An implementation of this method should only cache the values of properties
  # if they are discovered as part of the process for finding existing resources.
  # Resource properties that require additional commands (than those used to determine existence/identity)
  # should be implemented in their respective getter method. (This is important from a performance perspective;
  # it may be expensive to compute, as well as wasteful as all discovered resources may perhaps not be managed).
  #
  # An implementation may return an empty list (naturally with the effect that it is not possible to query
  # for manageable entities).
  #
  # By implementing this method, it is possible to use the `resources´ resource type to specify purging
  # of all non managed entities.
  #
  # @note The returned instances are instance of some subclass of Provider, not resources.
  # @return [Array<Puppet::Provider>] a list of providers referencing the system entities
  # @abstract this method must be implemented by a subclass and this super method should never be called as it raises an exception.
  # @raise [Puppet::DevError] Error indicating that the method should have been implemented by subclass.
  # @see prefetch
  def self.instances
    raise Puppet::DevError, "Provider #{self.name} has not defined the 'instances' class method"
  end

You’ll find that method around lines 348 – 377 of the lib/puppet/provider.rb file in Puppet’s source code (as of this writing, which is a Friday… on a flight from DC to Seattle). To summarize, implementing self.instances in your provider means that you need to return an array of provider instances that have been discovered on the current system and all the current property values (we call these values the ‘is’ values for the properties, since each value IS the current value of the property on the system). It’s recommended to only implement self.instances if you can gather all resource property values in a reasonably ‘cheap’ manner (i.e. a single system call, read from a single file, or some similar low-IO means). Implementing self.instances not only gives you the ability to run puppet resource (which also affords you a quick-and-dirty way of testing your provider without creating unit tests by simply running puppet resource in debug mode and checking the output), but it also allows the ‘resources’ resource to work its magic (If you’ve never heard of the ‘resources’ resource, check this link for more information on this terribly/awesomely named resource type).

An important note about scope and self.instances

The self.instances method is a method of the PROVIDER, which is why it is prefixed with self. Even though it may be located in the provider file itself, and even though it sits among other methods like create, exists?, and destroy (which are methods of the INSTANCE of the provider), it does NOT have the ability to directly access or call those methods. It DOES have the ability to access other methods of the provider directly (i.e. other methods prefixed with self.). This means that if you were to define a method like:

def self.proxy_type
  'web'
end

You could access that directly from self.instances by simply calling it:

type_of_proxy = proxy_type()

Let’s say you had a method of the INSTANCE of the provider, like so:

def system_type
  'OS X'
end

You COULD NOT access this method from self.instances directly (there are always hacky ways around EVERYTHING in Ruby, sure, but there is no easy/straightforward way to access this method).

And here’s where it gets confusing…

Methods of the INSTANCE of the provider CAN access provider methods directly. Given our previous example, what if the system_type method wanted to access self.proxy_type for some reason? It could be done like so:

def system_type
  type_of_proxy = self.class.proxy_type()
  'OS X'
end

A method of the instance of the provider can access provider methods by simply calling the class method on itself (which returns the provider class). This is a one-way street that needs to be heeded when designing your provider.
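The whole scoping dance can be demonstrated outside of Puppet with a few lines of plain Ruby (the FakeProvider class and its method names are invented purely for illustration):

```ruby
# Plain-Ruby illustration of the provider scoping rules described above.
class FakeProvider
  # "Provider" (class) methods: prefixed with self.
  def self.proxy_type
    'web'
  end

  def self.describe
    # A class method can call another class method directly
    "a #{proxy_type} proxy"
  end

  # "Instance" methods: no self. prefix
  def system_type
    # An instance method reaches class methods through self.class
    "OS X (#{self.class.proxy_type})"
  end
end

FakeProvider.describe          # => "a web proxy"
FakeProvider.new.system_type   # => "OS X (web)"
# FakeProvider.system_type     # NoMethodError -- the one-way street
```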

Building a provider that uses self.instances (or: more Mac problems)

In the previous two posts on types/providers, I created a type and provider for managing bypass domains for network proxies on OS X. For this post, let’s create a provider for actually MANAGING the proxy settings for a given network interface. Here’s a quick type for managing a web proxy on a network interface on OS X:

puppet-mac_proxy/lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

This type is ensurable and has three properties, a couple of parameters, and a namevar called ‘name’. As for the provider, let’s start with self.instances and get the web proxy values for all interfaces. To do that we’re going to need to know how to get a list of all network interfaces, and also how to get the current proxy state for every interface. Fortunately, both of those tasks are accomplished with the networksetup binary:

▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

▷ networksetup -getwebproxy Ethernet
Enabled: No
Server: proxy.corp.net
Port: 1234
Authenticated Proxy Enabled: 0

Cool, so one binary will do both tasks and they’re REASONABLY low-cost to run.

Helper methods

To keep things separated and easier to test, let’s create separate helper methods for each task. Since these methods are going to be called by self.instances, they will be provider methods.

The first method will simply return an array of network interfaces:

def self.get_list_of_interfaces
  interfaces = networksetup('-listallnetworkservices').split("\n")
  interfaces.shift
  interfaces.sort
end

Remember from above that the networksetup -listallnetworkservices command prints an informational line before the list of interfaces, so this code strips that line off and returns a sorted list of interfaces based on a one-line-per-interface assumption.
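You can watch the shift/sort behavior work by hard-coding some sample networksetup output into plain Ruby:

```ruby
# Simulating self.get_list_of_interfaces with canned networksetup output
sample = "An asterisk (*) denotes that a network service is disabled.\n" \
         "Wi-Fi\nEthernet\nFireWire"

interfaces = sample.split("\n")
interfaces.shift               # discard the informational first line
sorted = interfaces.sort       # => ["Ethernet", "FireWire", "Wi-Fi"]
```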

The next method we need will accept a network interface name as an argument, will run the networksetup -getwebproxy (interface) command, and will use its output to return the current property values (including the ensure value) for that interface’s proxy settings (i.e. whether the proxy is enabled, which means the resource is ensured as ‘present’, or disabled, which means the resource is ensured as ‘absent’).

def self.get_proxy_properties(int)
  interface_properties = {}

  begin
    output = networksetup(['-getwebproxy', int])
  rescue Puppet::ExecutionFailure => e
    raise Puppet::Error, "#mac_web_proxy tried to run `networksetup -getwebproxy #{int}` and the command returned non-zero. Failing here..."
  end

  output_array = output.split("\n")
  output_array.each do |line|
    line_values = line.split(':')
    line_values.last.strip!
    case line_values.first
    when 'Enabled'
      interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
    when 'Server'
      interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
    when 'Port'
      interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
    when 'Authenticated Proxy Enabled'
      interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
    end
  end

  interface_properties[:provider] = :ruby
  interface_properties[:name]     = int.downcase
  interface_properties
end

A couple of notes on the method itself – first, the networksetup command must exit zero on success or non-zero on failure (which it does). If ever the networksetup command were to return non-zero, we’re raising our own Puppet::Error, documenting what happened, and bailing out.

This method is going to return a hash of properties and values that is going to be used by self.instances – so the case statement needs to account for that. HOWEVER you populate that hash is up to you (in my case, I’m checking for specific output that networksetup returns), but make sure that the hash has a value for the :ensure key at the VERY least.
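Tracing that case statement with the sample networksetup -getwebproxy output from earlier (hard-coded here in plain Ruby) shows the kind of hash self.instances will receive:

```ruby
# Feeding canned `networksetup -getwebproxy` output through the same
# parsing logic used in self.get_proxy_properties
output = "Enabled: No\nServer: proxy.corp.net\nPort: 1234\nAuthenticated Proxy Enabled: 0"

props = {}
output.split("\n").each do |line|
  line_values = line.split(':')
  line_values.last.strip!
  case line_values.first
  when 'Enabled'
    props[:ensure] = line_values.last == 'No' ? :absent : :present
  when 'Server'
    props[:proxy_server] = line_values.last.empty? ? nil : line_values.last
  when 'Port'
    props[:proxy_port] = line_values.last == '0' ? nil : line_values.last
  when 'Authenticated Proxy Enabled'
    props[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
  end
end

# props => {:ensure=>:absent, :proxy_server=>"proxy.corp.net",
#           :proxy_port=>"1234", :proxy_authenticated=>nil}
```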

Assembling self.instances

Once the helper provider methods have been defined, self.instances becomes reasonably simple:

def self.instances
  get_list_of_interfaces.collect do |int|
    proxy_properties = get_proxy_properties(int)
    new(proxy_properties)
  end
end

Remember that self.instances must return an array of provider instances, and each one of these instances must include the namevar and ensure value at the very least. Since self.get_proxy_properties returns a hash containing all the property ‘is’ values for a resource, declaring a new provider instance is as easy as calling the new() method on the return value of self.get_proxy_properties for every network interface. In the end, the return value of the collect method on get_list_of_interfaces will be an array of provider instances.

Existence, @property_hash, and more magical methods

Even though we have assembled a functional self.instances method, we don’t have a complete implementation that will work with puppet resource. The problem is that Puppet can’t yet determine the existence of a resource (even though the resource’s ensure value has been set by self.instances). If you were to execute the code with puppet resource mac_web_proxy, you would get the error:

Error: Could not run: No ability to determine if mac_web_proxy exists

To satisfy Puppet, we need to implement an exists?() method for the instance of the provider. Fortunately, we don’t need to re-implement any existing logic and can instead use @property_hash.

A @property_hash is born…

I’ve omitted one last thing that is borne out of self.instances, and that’s the @property_hash instance variable. @property_hash is populated by self.instances as an instance variable that’s available to methods of the INSTANCE of the provider (i.e. methods that ARE NOT prefixed with self.) containing all the ‘is’ values for a resource. Do you need to get the ‘is’ value for a property? Just use @property_hash[:property_name]. Since the exists? method is a method of the instance of the provider, and it’s essentially the same thing as the ensure value for a resource, let’s implement exists? by doing a check on the ensure value from the @property_hash variable:

def exists?
  @property_hash[:ensure] == :present
end

Perfect, now exists? will return true or false accordingly and Puppet will be satisfied.

Getter methods – the slow way

Puppet may be happy that you have an exists? method, but puppet resource won’t successfully run until you have a method that returns an ‘is’ value for every property of the type (i.e. the proxy_server, proxy_authenticated, and proxy_port attributes for the mac_web_proxy type). These ‘is value methods’ are called ‘getter’ methods: they’re methods of the instance of the provider, and are named exactly the same as the properties they represent.

You SHOULD be thinking: “Hey, we already have @property_hash, why can’t we just use it again?” We can, and you COULD implement all the getter methods like so:

def proxy_server
  @property_hash[:proxy_server]
end

If you did that, you would be TECHNICALLY correct, but it would seem to be a waste of lines in a provider (especially if you have many properties).

Getter methods – the quicker ‘method’

Because uncle Luke hated excess lines of code, he made available a method called mk_resource_methods, which works very similarly to Ruby’s attr_accessor method. Adding mk_resource_methods to your provider will AUTOMATICALLY create getter methods that pull values out of @property_hash in the same way that I just demonstrated (it will also create SETTER methods, but we’ll look at those later). Long story short – don’t make getter/setter methods if you’re using self.instances – just implement mk_resource_methods.
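This isn’t Puppet’s actual implementation, but the metaprogramming trick behind mk_resource_methods can be sketched in a few lines of plain Ruby using define_method (the SketchProvider class here is invented for illustration):

```ruby
# A rough, plain-Ruby approximation of what mk_resource_methods does:
# for every property name, generate a getter (and setter) backed by
# @property_hash
class SketchProvider
  def self.mk_resource_methods(*properties)
    properties.each do |prop|
      define_method(prop)       { @property_hash[prop] }
      define_method("#{prop}=") { |value| @property_hash[prop] = value }
    end
  end

  mk_resource_methods :proxy_server, :proxy_port

  def initialize(property_hash)
    @property_hash = property_hash
  end
end

proxy = SketchProvider.new(:proxy_server => 'proxy.corp.net', :proxy_port => '1234')
proxy.proxy_server   # => "proxy.corp.net"
proxy.proxy_port = '8080'
proxy.proxy_port     # => "8080"
```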

JUST enough for puppet resource

Putting together everything that we’ve learned up until now, we should have a provider that looks like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def exists?
    @property_hash[:ensure] == :present
  end
end

Here’s a tree of the module I’ve assembled on my machine:

└(~/src/puppet-mac_web_proxy)▷ tree .
.
└── lib
   └── puppet
       ├── provider
       │   └── mac_web_proxy
       │       └── ruby.rb
       └── type
           └── mac_web_proxy.rb

To test out puppet resource, we need to make Puppet aware of our new custom module. To do that, let’s set the $RUBYLIB environment variable. $RUBYLIB is queried by Puppet and added to its load path when looking for additional Puppet plugins. You will need to set $RUBYLIB to the path of the lib directory in the custom module that you’ve assembled. Because my custom module is located in ~/src/puppet-mac_web_proxy, I’m going to set $RUBYLIB like so:

export RUBYLIB=~/src/puppet-mac_web_proxy/lib

You can execute that command from the command line, or set it in your ~/.{bash,zsh}rc and source that file.

Finally, with all the files in place and $RUBYLIB set, it’s time to officially run puppet resource (I’m going to do it in --debug mode to see the debug output that I’ve written into the code):

└(~/src/blogtests)▷ envpuppet puppet resource mac_web_proxy --debug
Debug: Executing '/usr/sbin/networksetup -listallnetworkservices'
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth DUN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth dun"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth PAN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth pan"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Display Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"foo.bar.baz", :proxy_port=>"80", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"display ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"proxy.corp.net", :proxy_port=>"1234", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy FireWire'
Debug: Interface properties: {:ensure=>:present, :proxy_server=>"stuff.bar.blat", :proxy_port=>"8190", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"firewire"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Wi-Fi'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"wi-fi"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy iPhone USB'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"iphone usb"}
mac_web_proxy { 'bluetooth dun':
  ensure => 'absent',
}
mac_web_proxy { 'bluetooth pan':
  ensure => 'absent',
}
mac_web_proxy { 'display ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8190',
  proxy_server => 'stuff.bar.blat',
}
mac_web_proxy { 'iphone usb':
  ensure => 'absent',
}
mac_web_proxy { 'wi-fi':
  ensure => 'absent',
}

Note that you will only see ‘is’ values if you have a proxy set on any of your network interfaces (obviously, if you’ve not set up a proxy, then it will show as ‘absent’ on every interface). You can set up a proxy by opening System Preferences, clicking on the Network icon, choosing an interface from the list on the left, clicking the Advanced button in the lower right corner of the window, clicking the “Proxies” tab at the top of the window, clicking the checkbox next to the “Web Proxy (HTTP)” choice, and entering a proxy URL and port (NOW do you get why we automate this bullshit?). Also, your list of network interfaces may not match mine if you have more or fewer interfaces than I do.

TADA! puppet resource WORKS! ISN’T THAT AWESOME?! WHY AM I TYPING IN CAPS?!

Prefetching, flushing, caching, and other hard shit

Okay, so up until now we’ve implemented one half of the equation – we can query ‘is’ values and puppet resource works. What about using this ‘more efficient’ approach on the OTHER end of the spectrum? What if, instead of calling setter methods one-by-one to set values for all resources of a type in a catalog, we had a way to do it all at once? Well, such a way exists, and it’s called the flush method…but we’re getting slightly ahead of ourselves. Before we get to flushing, we need to point out that self.instances is ONLY used by puppet resource – THAT’S IT (and even then, it’s only used when you GET values from the system, not when you SET values on the system…and if you never knew that puppet resource could actually SET values on the system, well, I guess you got another surprise today). If we want puppet agent or puppet apply to use the behavior that self.instances implements, we need to create another method: self.prefetch

self.prefetch

If you thought self.instances didn’t have much documentation, wait until you see self.prefetch. After wading through the waters of self.prefetch, I’m PRETTY SURE its implementation might have come to uncle Luke after a long night in Reed’s chem lab where he might have accidentally synthesized mescaline.

Let’s look at the codebase:

lib/puppet/provider.rb
# @comment Document prefetch here as it does not exist anywhere else (called from transaction if implemented)
# @!method self.prefetch(resource_hash)
# @abstract A subclass may implement this - it is not implemented in the Provider class
# This method may be implemented by a provider in order to pre-fetch resource properties.
# If implemented it should set the provider instance of the managed resources to a provider with the
# fetched state (i.e. what is returned from the {instances} method).
# @param resources_hash [Hash<{String => Puppet::Resource}>] map from name to resource of resources to prefetch
# @return [void]
# @api public

That’s right, documentation for self.prefetch in the Puppet codebase is 9 lines of comments in lib/puppet/provider.rb, which is awesome. So when is self.prefetch used to provide information to Puppet and when is self.instances used?

Puppet Subcommand    Provider Method    Execution Mode
-----------------    ---------------    --------------
puppet resource      self.instances     getting values
puppet resource      self.prefetch      setting values
puppet agent         self.prefetch      getting values
puppet agent         self.prefetch      setting values
puppet apply         self.prefetch      getting values
puppet apply         self.prefetch      setting values


This doesn’t mean that self.instances is really only handy for puppet resource – that’s definitely not the case. In fact, you will frequently find that self.instances is used by self.prefetch to do some of the heavy lifting. Even though self.prefetch works VERY SIMILARLY to the way that self.instances works for puppet resource (by which I mean it’s going to gather a list of instances of a type on the system, and it’s also going to populate @property_hash for puppet apply, puppet agent, and puppet resource when it’s setting values), it’s not an exact one-for-one match with self.instances. The self.prefetch method for a type is called once per run when Puppet encounters a resource of that type in the catalog. The argument to self.prefetch is a hash of all managed resources of that type that are encountered in a compiled catalog for that node (the hash’s keys will be the namevars of the resources, and the values will be instances of Puppet::Type – in this case, Puppet::Type::Mac_web_proxy). Your task is to implement a self.prefetch method that gets an array of instances of the provider that were discovered on the system, iterates through the hash passed to self.prefetch (containing all the resources of the type that were found in the catalog), and passes each provider instance discovered on the system to the provider= method of the matching instance of the type from the catalog.

What the actual fuck?!

Okay, let’s break that apart to try and discover exactly what’s going on here. Assume that I’ve set up a proxy for the ‘FireWire’ interface on my laptop, and I want to try and manage that resource with puppet apply (i.e. something that uses self.prefetch). The resource in the manifest used to manage the proxy will look something like this:

mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8080',
  proxy_server => 'proxy.server.org',
}

When self.prefetch is called by Puppet, it’s going to be passed a hash looking something like this:

{ "firewire" => Mac_web_proxy[firewire] }

Because only one resource is encountered in the catalog, only one key/value pair shows up in the hash that’s passed as the argument to self.prefetch.

The job of self.prefetch is to find the current state of Mac_web_proxy['firewire'] on the system, create a new instance of the mac_web_proxy provider that contains the ‘is’ values for the Mac_web_proxy['firewire'] resource, and assign this provider instance as the value of the provider= method to the instance of the mac_web_proxy TYPE that is the VALUE of the ‘firewire’ key of the hash that’s passed to self.prefetch.

No, really, that’s what it’s supposed to do. I’m not even sure what’s real anymore

You’ll remember that self.instances gives us an array of resources that were discovered on the system, so we have THAT part of the implementation written. We also have the hash of resources that were encountered in the catalog – so we have THAT part done too. Our only job is to connect the dots (la la la la), programmatically speaking. This should just about do it:

def self.prefetch(resources)
  instances.each do |prov|
    if resource = resources[prov.name]
      resource.provider = prov
    end
  end
end

I want to make a confession right now – I’ve only ever copied and pasted this code into every provider I’ve ever written that needed self.prefetch implemented. It wasn’t until someone actually asked me what it DID that I had to walk the path of figuring out EXACTLY what it did. Based on the last couple of paragraphs – can you blame me?

This code iterates through the array of resources returned by self.instances, tries to assign a variable resource based on referencing a key in the resources hash using the name of the resource (remember, resources is a hash containing all resources in the catalog), and, if this assignment works (i.e. it isn’t nil, which is what happens when you reference a key in a Ruby hash that doesn’t exist), then we’re calling the provider= method on the instance of the type that was referenced in the resources hash, and passing it the resource that was discovered on the system by self.instances.
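If it helps, here’s that matching dance as a standalone Ruby toy. Provider and Resource below are hypothetical stand-ins (plain Structs, NOT Puppet’s real classes) purely to show the lookup-and-assign pattern – referencing a missing key returns nil, so the assignment inside the `if` doubles as the “was this resource even in the catalog?” check:

```ruby
# Hypothetical stand-ins for a provider instance (discovered on the system)
# and a type instance (found in the catalog). Struct gives us both a
# reader (.provider) and a writer (.provider=) for free.
Provider = Struct.new(:name)
Resource = Struct.new(:name, :provider)

def prefetch(resources, discovered)
  discovered.each do |prov|
    # Hash lookup returns nil for a missing key, so this assignment is
    # falsy (and skipped) for providers with no matching catalog resource
    if resource = resources[prov.name]
      resource.provider = prov
    end
  end
end

# One resource in the "catalog", two instances "discovered on the system"
catalog_resources = { 'firewire' => Resource.new('firewire', nil) }
system_instances  = [Provider.new('firewire'), Provider.new('ethernet')]

prefetch(catalog_resources, system_instances)
catalog_resources['firewire'].provider.name # => "firewire"
```

The ‘ethernet’ provider instance is discovered but silently skipped – nothing in the catalog wanted it, which is exactly what happens with the real self.prefetch.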

Wow.

Why DID we do all of that? We did it all for @property_hash. Doing this will populate @property_hash in all methods of the instance of the provider (i.e. exists?, create, destroy, etc.) just like self.instances did for puppet resource.

Flush it; Ship it

As I alluded to above, the opposite side of the coin to prefetching (which is a way to query the state for all resources at once) is flushing (or specifically the flush method). The flush method is called once per resource whenever the ‘is’ and ‘should’ values for a property differ (and synchronization needs to occur). The flush method does not take the place of property setter methods, but, rather, is used in conjunction with them to determine how to synchronize resource property values. In this vein, it’s a single trigger that can be used to set all property values for an individual resource simultaneously.

There are a couple of strategies for implementing flush, but one of the more popular ones in use is to create an instance variable that will hold values to be synchronized, and then determine inside flush how best to make as-few-as-possible calls to the system to synchronize all the property values for an individual resource.

Our resource type is unique because the networksetup binary that we’ll be using to synchronize values allows us to set most every property value with a single command. Because of this, we really only need that instance variable for one property – the ensure value. But let’s start with the initialization of that instance variable for the flush method:

1
2
3
4
def initialize(value={})
  super(value)
  @property_flush = {}
end

The initialize method is magic to Ruby – it’s invoked when you instantiate a new object. In our case, we want to create a new instance variable – @property_flush – that will be available to all methods of the instance of the provider. This instance variable will be a hash and will contain all the ‘should’ values that need to be synchronized for a resource. The super method in Ruby sends a message to the parent of the current object, asking it to invoke a method of the same name (e.g. initialize). Basically, our initialize method is doing the exact same thing it has always done, with one exception – making the instance variable available to all methods of the instance of the provider.
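To see super and initialize in isolation, here’s a throwaway example (hypothetical Parent/Child classes, nothing Puppet-specific) demonstrating that super(value) still runs the parent’s initialize before we bolt on our own instance variable:

```ruby
class Parent
  attr_reader :value

  def initialize(value = {})
    @value = value
  end
end

class Child < Parent
  attr_reader :property_flush

  def initialize(value = {})
    super(value)         # Parent#initialize still runs, so @value is set
    @property_flush = {} # extra state, visible to every instance method
  end
end

c = Child.new({ :ensure => :present })
c.value           # => {:ensure=>:present}
c.property_flush  # => {}
```

If we’d forgotten the call to super, @value would never get set – which is exactly why our provider’s initialize calls super(value) before touching @property_flush.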

The only ‘setter’ method you need

This provider is going to be unique not only because the networksetup binary will set values for ALL properties, but because to change/set ANY property values you have to change/set ALL the property values at the same time. Typically, you’ll see providers that will need to pass arguments to a binary in order to set individual values. For example, if you had a binary fooset that took arguments of --bar and --baz to set values respectively for bar and baz properties of a resource, you might see the following setter and flush methods for bar and baz:

def bar=(value)
  @property_flush[:bar] = value
end

def baz=(value)
  @property_flush[:baz] = value
end

def flush
  array_arguments = []
  if @property_flush
    array_arguments << '--bar' << @property_flush[:bar] if @property_flush[:bar]
    array_arguments << '--baz' << @property_flush[:baz] if @property_flush[:baz]
  end
  if ! array_arguments.empty?
    fooset(array_arguments, resource[:name])
  end
end

That’s not the case for networksetup – in fact, one of the ONLY places in our code where we’re going to throw a value inside @property_flush is going to be in the destroy method. If our intention is to ensure a proxy absent (or, in this case, disable the proxy for a network interface), then we can short-circuit the method we’re going to create to set proxy values by simply checking for a value in @property_flush[:ensure]. Here’s what the destroy method looks like:

def destroy
  @property_flush[:ensure] = :absent
end

Next, we need a method that will set values for our proxy. This method will handle all interaction to networksetup. So, how do you set proxy values with networksetup?

networksetup -setwebproxy <networkservice> <domain> <port number> <authenticated> <username> <password>

The three properties of our mac_web_proxy type are proxy_port, proxy_server, and proxy_authenticated, which map to the ‘<port number>’, ‘<domain>’, and ‘<authenticated>’ values in this command. To change any of these values means we have to pass ALL of these values (again, which is why our flush implementation may differ from other flush implementations). Here’s what the set_proxy method looks like:

def set_proxy
  if @property_flush[:ensure] == :absent
    networksetup(['-setwebproxystate', resource[:name], 'off'])
    return
  end

  if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
    raise Puppet::Error, "Both the proxy_server and proxy_port parameters require a value."
  end
  if resource[:proxy_authenticated] != :true
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port]
      ]
    )
  else
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port],
        'on',
        resource[:authenticated_username],
        resource[:authenticated_password]
      ]
    )
  end
  networksetup(['-setwebproxystate', resource[:name], 'on'])
end

This helper method does all the validation checks for required properties, executes the correct command, and enables the proxy. Now, let’s implement flush:

def flush
  set_proxy

  # Collect the resources again once they've been changed (that way `puppet
  # resource` will show the correct values after changes have been made).
  @property_hash = self.class.get_proxy_properties(resource[:name])
end

The last line re-populates @property_hash with the current resource values, and is necessary for puppet resource to return correct values after it makes a change to a resource during a run.

The final method

We’ve implemented logic to query the state of all resources, to prefetch those states, to make changes to all properties at once, and to destroy a resource if it exists, but we’ve yet to implement logic to CREATE a resource if it doesn’t exist and it should. Well, this is a bit of a lie – the logic is in the code, but we don’t have a create method, so Puppet’s going to complain:

def create
  @property_flush[:ensure] = :present
end

Technically, this method doesn’t have to do a DAMN thing. Why? Remember how the flush method is triggered when a resource’s ‘is’ values differ from its ‘should’ values? Also, remember how the flush method only calls the set_proxy method? And, finally, remember how set_proxy only checks if @property_flush[:ensure] == :absent (and if it doesn’t, then it goes about its merry way running networksetup)? Right, well add these things up and you’ll realize that the create method is essentially meaningless based on our implementation (but if you OMIT create, then Puppet’s going to throw a shit-fit in the shape of a Puppet::Error exception):

Error: /Mac_web_proxy[firewire]/ensure: change from absent to present failed: Could not set 'present' on ensure: undefined method `create' for Mac_web_proxy[firewire]:Puppet::Type::Mac_web_proxy

So make Puppet happy and write the goddamn create method, okay?

The complete provider:

Wow, that was a wild ride, huh? If you’ve been coding along, you should have created a file that looks something like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def initialize(value={})
    super(value)
    @property_flush = {}
  end

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def create
    @property_flush[:ensure] = :present
  end

  def exists?
    @property_hash[:ensure] == :present
  end

  def destroy
    @property_flush[:ensure] = :absent
  end

  def self.prefetch(resources)
    instances.each do |prov|
      if resource = resources[prov.name]
        resource.provider = prov
      end
    end
  end

  def set_proxy
    if @property_flush[:ensure] == :absent
      networksetup(['-setwebproxystate', resource[:name], 'off'])
      return
    end

    if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
      raise Puppet::Error, "Both the proxy_server and proxy_port parameters require a value."
    end
    if resource[:proxy_authenticated] != :true
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port]
        ]
      )
    else
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port],
          'on',
          resource[:authenticated_username],
          resource[:authenticated_password]
        ]
      )
    end
    networksetup(['-setwebproxystate', resource[:name], 'on'])
  end

  def flush
    set_proxy

    # Collect the resources again once they've been changed (that way `puppet
    # resource` will show the correct values after changes have been made).
    @property_hash = self.class.get_proxy_properties(resource[:name])
  end
end

Undoubtedly there are better ways to write this Ruby code, no? Also, I’m SURE I have some errors/bugs in that code. It’s those things that keep me in a job…

Final Thoughts

So, I write these posts not to belittle or mock anyone who works on Puppet or wrote any of its implementation (except the amazing/terrifying bastard who came up with self.prefetch). Anybody who contributes to open source and who builds a tool to save some time for a bunch of sysadmins is fucking awesome in my book.

No, I write these posts so that you can understand the ‘WHY’ piece of the puzzle. If you fuck up the ‘HOW’ of the code, you can spend some time in Google and IRB to figure it out, but if you don’t understand the ‘WHY’ then you’re probably not going to even bother.

Also, selfishly, I move from project to project so quickly that it’s REALLY easy to forget both why AND how I did what I did. Posts like these give me someplace to point people when they ask me “What’s self.prefetch?” that ISN’T just the source code or a liquor store.

This isn’t the last post in the series, by the way. I haven’t even TOUCHED on writing unit tests for this code, so that’s going to be a WHOLE other piece altogether. Also, while this provider manages a WEB proxy for a network interface, understand that there are MANY MORE kinds of proxies for OS X network interfaces (including socks and gopher!). A future post will show you how to refactor the above into a parent provider that can be inherited to allow for code re-use among all the proxy providers that I need to create.

As always, you’re more than welcome to comment, ask questions, or simply bitch at me both on this blog as well as on Twitter: @glarizza. Hopefully this post helped you out and you learned a little bit more about how Puppet providers do their dirty work…

Namaste, bitches.