Shit Gary Says

...things I don't want to forget

Some Other Beginning’s End

My first day at Puppet, James Turnbull sat me down next to Jeff McCune and waited for things to happen. That was partly because he knew that Jeff’s attention to detail would offset my attention to git commit -a -m 'blkajdfs', and partly because there were about 30 of us in the office and we all had shit to do. Jeff was kinda working on an MCollective module back then, and I had been using it pretty heavily at the school, so I decided to jump right in and start hacking it up. To make a long story short, that experience working on the MCollective module resulted in:

  1. Ticket number 8040 on class containment being filed by Jeff (AKA “How the Anchor pattern was born”)
  2. An understanding in how to use git as something more than just glorified rsync
  3. An “Odd Couple” friendship with a guy who has “right foot” and “left foot” socks

Over the next 4 years I would keep going back to Jeff whenever something new puzzled me, and he would keep giving me pointers that directed me down the right path. I learned about Pry from Jeff when I was working on the directoryservice provider and couldn’t figure out why my variables had no value, my understanding of the principles of unit testing came from completely screwing up spec tests, and my blog posts on type/provider development never would have happened if I didn’t make all those mistakes and have someone help me learn from them. So when I hit that point at Puppet where I began thinking about moving on to “the next big thing,” Jeff was a natural choice.

With that in mind, I’m happy to announce that as of September 7th, 2017 I’ll be joining Jeff at openinfrastructure.co, where we’ll be available to consult on everything from DevOps practices to Puppet deployments and module development to “how you turn ordinary socks into ‘right foot socks’ and ‘left foot socks.’” (I’m MOSTLY kidding on the last bit, but that’s not my area of expertise, soooo…..)

It’s been 6.5 years of consulting with Puppet Inc. and I’m not planning on stopping anytime soon. I’m grateful for all the opportunities and experiences that have come my way, and I’m looking forward to going back to a smaller work environment and more freedom to choose those opportunities! If you have one of those opportunities and are looking for someone to help you out, please look us up at http://www.openinfrastructure.co and let us know about it!

Profiles and the Path to Hiera Data

This blog post first appeared on the Puppet blog as Hiera, data, and Puppet code: your path to the right data decisions, and is published here with permission of the Puppet blog editor.

The subject that generates the most questions for me from the Puppet community is Hiera. Not only do people want to know what it is (a data lookup tool) and how to spell it (I before E except after C), but even saying the word causes problems (It’s HIGH-rah — two syllables, with the accent on the first).

I get so many questions about Hiera because it’s a tool for storing and accessing site-specific data, and it’s actually this problem of accessing data within reusable code that most people are trying to solve. Many people think the ONLY place data can live is within Hiera, but that’s not always the case (as we will see later with profiles). To help with these problems, I’ve identified all the ways that data can be expressed within Puppet, listed the pros and cons of each, and made recommendations as to when each method should be used.

For those people who are visual learners, here’s a simplified flow chart below, detailing the choices you need to make when deciding how to express your configuration data.

What is data and what is code?

This issue of what constitutes data is the first wrinkle in devising what I call a data escalation path. For background reading, the Puppet docs page on roles and profiles does a great job of describing the difference between a component module and a profile.

To quickly summarize: A component module is a general-purpose module designed to model the configuration of a piece of technology (e.g., Apache, Tomcat or ntpd), and a profile is an organization-specific Puppet module that describes an organization’s implementation of a piece of technology. (We also use the term “site-specific” to refer to an organization’s own particular data.)

For example, an Apache profile that an organization creates for itself might use the official Puppet Apache module to install and configure Apache. But the profile might also contain resources for an organization’s SSL certificates or credentials, layered on top of the configuration provided by the Puppet Apache module. The resource(s) modeling the SSL certificate(s) are necessary only for that particular organization, which is why they don’t show up in the official Puppet Apache module.
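
To make that concrete, here's a minimal sketch of what such a profile might look like. The class name (profile::apache), the certificate path, and the file source are hypothetical; the idea is simply that the profile wraps the generic component module and layers the organization-specific resource on top of it.

class profile::apache {
  # The generic, reusable component module does the heavy lifting...
  class { 'apache':
    default_vhost => false,
  }

  # ...while the organization-specific SSL certificate lives in the profile.
  file { '/etc/ssl/certs/www_example_com.crt':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0644',
    source => 'puppet:///modules/profile/www_example_com.crt',
  }
}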

In this example, the official Puppet Apache module itself represents the code, or the generic and reusable aspect of the configuration (as any good component module would). The profile contains the organizational (or site-specific) data that is fed to the component module (or code) when that module is used. This separation — and the fact that data can be represented within the same constructs used to represent code — is frequently a source of confusion or frustration for new Puppet users (and especially for users with a background in object-oriented programming, which is almost antithetical to the declarative approach that is core to Puppet).

Data within a profile can come in different forms:

  • A variable assigned within the profile.
  • A Hiera/data lookup (done explicitly, or by way of the automatic parameter lookup).
  • A parameter’s value when the profile is declared.

With the above items all considered to be data, which option do you choose? It’s this question that the data escalation path will answer.

NOTE: This post specifically covers data escalation paths within profiles, and NOT within component modules. Unless explicitly noted, assume that recommendations apply ONLY to profiles, and not component modules (since profiles represent site-specific data).

Why an escalation path?

The decisions you make when writing Puppet manifests will seldom be plain and obvious. Instead of agonizing over whether something is in the “right” place, it’s better to start with the simplest way to express a piece of data and escalate to a more complex method only when the situation actually calls for it.

You can absolutely put everything you would consider data inside Hiera, and that would immediately provide you a way to handle most use cases. But the legibility of your Puppet manifest suffers when you have to jump back to Hiera every time you need to retrieve or debug a data value (which is a very labor-intensive thing to do if you don’t have direct access to the Puppet masters). Plus, things like resource dependencies are particularly hard to model in Hiera data, as opposed to using resource declarations within a class.

For simpler use cases, putting data into Hiera isn’t necessary. But once you reach a certain level of complexity, Hiera becomes extremely useful. I’m going to define those “certain levels of complexity” explicitly here, as well as both the pros and the cons for each method of expressing data within your profiles.

Hardcoding variables

The term “hardcoding” is wrapped in quotes here because traditionally the term has negative connotations. When I refer to hardcoding, I’m talking about directly editing an item within a Puppet manifest, without assigning a variable. In the example below, if you opened up the Puppet manifest and changed the owner from ‘root’ to ‘puppet’, that would be considered hardcoding the value:

file { '/etc/puppetlabs/puppet/puppet.conf':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mymodule/puppet.conf',
}

Hardcoding has a negative connotation because typically, when someone hardcodes a value in a script, it represents a workaround where a data item is injected directly into the code — and mixing data and code means that your code is no longer as generic and extensible as it once was.

That concern is still valid for Puppet: If you open up the official Puppet Apache module and change or add a site-specific value within that component module, then you ARE, in fact, mixing data with code. If instead you edit the Apache profile for your organization and change a value in that profile, then you’re changing site-specific data in something that is already considered site-specific. The difference is that the official Puppet Apache module is designed to be extensible, and used where necessary, while the profile is meant to be used only by your own organization (or site, or group).

Hardcoding a value is the easiest method to understand: Something that was previously set to one value is now set to another value. It’s also the easiest change to implement — you simply change the value and move along. If done correctly, someone could change the value without needing to understand the Puppet DSL (domain specific language — i.e. the rules governing Puppet code in a Puppet manifest). Finally, because it’s simply text, a hardcoded value cannot be overridden, and the value is exactly the same for all nodes.

Pros

  • The easiest technique to understand: Something was changed from one value to another.
  • The easiest change to implement.

Cons

  • If you hardcode the same value in multiple places, then changing that value requires multiple individual changes.

Recommendations

You should hardcode a value when:

  • The value applies to EVERY NODE being managed by Puppet.
  • The value occurs once. If it occurs more than once within a manifest, use a variable instead.

Assigning a variable

The next logical step after hardcoding a value is to assign a variable within a Puppet manifest. Assigning a variable is useful when a value is going to be used in more than one place within a manifest. Because variables within the Puppet DSL cannot be reassigned, and because variables within a manifest cannot be assigned or changed by Hiera, variables are considered private to the implementation. This means they can be changed only by users with permission to change Puppet manifests, not by people who are responsible for using the console to pass data to the code written by manifest authors. So variables really assist writers of Puppet code more than they assist consumers of Puppet code.

Anytime there’s a data value that will be expressed more than once within a Puppet manifest, it’s recommended that you use a variable. In the future, if that value needs to be changed, all you need to do is change the variable’s value, and it will be updated wherever the variable was used. Below is an example of that concept in action:

$confdir = '/etc/puppetlabs/puppet'

file { "${confdir}/puppet.conf":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mymodule/puppet.conf',
}

file { "${confdir}/puppetdb.conf":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mymodule/puppetdb.conf',
}

Pros

  • Assigning a variable provides a single point within a manifest where data can be assigned or changed.
  • Assigning a variable within the DSL makes it visible to anyone reviewing the Puppet manifest. This means you don’t need to flip back and forth between Hiera and Puppet to look up data values.

Cons

  • The value applies to EVERYONE — it must be changed if a different value is desired, and that change applies to everyone.
  • No ability to override a value.

Recommendations

You should assign a variable when:

  • The data value shows up more than once within a manifest.
  • The data value applies to EVERY node.

Conditionally assigning a variable

In the previous section on assigning a variable, I recommend that variables be used only when their value applies to EVERY node. But there is a way to work around this: conditional statements.

Conditional statements in the Puppet DSL (such as if, unless, case, and the selector operator) allow you to assign a variable once, but assign it differently based on a specific condition. Using the previous example of Puppet’s configuration directory, let’s see how that would be assigned differently, based on the system’s kernel fact:

$confdir = $facts['kernel'] ? {
  'windows' => 'C:\\ProgramData\\PuppetLabs\\puppet\\etc',
  default   => '/etc/puppetlabs/puppet',
}

file { "${confdir}/puppet.conf":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mymodule/puppet.conf',
}

file { "${confdir}/puppetdb.conf":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mymodule/puppetdb.conf',
}

Conditionally assigning a variable has its own section because when people think about the choices they have for expressing data within Puppet, they usually think of Hiera. Hiera is an excellent tool for conditionally assigning a value, based on its internal hierarchy. But what if the conditional logic you need to use doesn’t follow Hiera’s configured hierarchy? Your choices are to:

  1. Edit Hiera’s hierarchy to add the logic you need (which is potentially a disruptive change to Hiera that will affect lookups), or
  2. Use conditional logic within the DSL.

Since we’re talking about an escalation path, conditionally assigning a variable is the next logical progression when complexity arises.
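
For reference, "Hiera's configured hierarchy" means the list of lookup levels in hiera.yaml. A minimal Hiera 3-style sketch might look like the following, where the location fact is a hypothetical custom fact; if the condition you need (say, keying off the kernel fact) isn't represented in a file like this, that's when conditional logic in the DSL becomes attractive.

---
:backends:
  - yaml
:hierarchy:
  - "nodes/%{trusted.certname}"
  - "location/%{facts.location}"
  - common
:yaml:
  :datadir: "/etc/puppetlabs/code/environments/%{environment}/hieradata"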

Pros

  • Values can be assigned based on whatever conditional logic is necessary.
  • Values are assigned within the Puppet DSL, and thus are more visible to Puppet code reviewers (versus reviewing Hiera data, which may be located elsewhere).
  • Reusability remains intact: The variable is assigned once, and used throughout the manifest.

Cons

  • Variables still cannot be reassigned or overridden.
  • Conditional logic can grow to become stringy and overly complex if left unchecked.
  • Conditional logic is syntax-heavy, and requires knowledge of the Puppet DSL (i.e., it’s not something easily used by people who don’t know Puppet).

Recommendations

You should use conditional logic to assign a value within a profile when:

  • The conditional logic isn’t overly complex.
  • The conditional logic is different from the Hiera hierarchy.
  • Visibility of the data value within the Puppet DSL is a priority.

Hiera lookups and class parameters

Puppet’s data lookup tool is Hiera, and Hiera is an excellent way to model data in a hierarchical manner based on layers of business logic. Demonstrating how Hiera works is the easy part; implementing it (and knowing when to do Hiera calls) is another story.

Before we get there, it’s important to understand that Hiera lookups can be done ad hoc through the use of the hiera() or lookup() functions, or through the automatic class parameter lookup functionality. The previous links will give you detailed explanations. Briefly, if a class is declared and a value is not explicitly assigned for any of the class’s parameters, Hiera will automatically do a lookup for the full parameter name. For example, if the class is called ‘apache’ and the parameter is called ‘port’, then Hiera does an automatic parameter lookup for apache::port.
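
As a minimal sketch of that behavior (the class and parameter names follow the example above), declaring the class below with include apache and no explicit value for the parameter causes Hiera to be consulted for the key apache::port before the default is used:

class apache (
  $port = 80,
) {
  # When this class is declared with `include apache`, the value of $port is,
  # in order of precedence: an explicitly passed value, the result of the
  # automatic Hiera lookup for 'apache::port', or the default of 80.
}

# The corresponding Hiera data would simply be:
#   apache::port: 8080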

We’ll get back to automatic parameter lookups in a second, but for now let’s focus on explicit lookups. Here’s an example using both the older hiera function and the newer lookup function:

$apache_port    = hiera('apache_port')
$apache_docroot = lookup('apache_docroot')

Explicit lookups using one of the above functions are easier to see and understand when you’re new to Puppet, because the automatic parameter lookup functionality is relatively hidden to you (should you not be aware of its existence). More importantly, explicit lookups within a Puppet class are considered to be private to that class. By “private,” I mean the object-oriented programming definition: The data is limited in scope to this implementation, and there’s no other external way to override or affect this value, short of changing what value Hiera ends up returning. You can’t, for example, pass in a parameter and have it take precedence over an explicit lookup — the result of the lookup stands alone.

More than anything, the determining factor for whether you use an explicit lookup or expose a class parameter to the profile should be whether the Hiera lookup is merely a shorthand for getting a value that others SHOULDN’T be able to change, or whether this value should be exposed to the profile as part of the API. If you don’t want people to be able to override this value outside of Hiera, then an explicit lookup is the correct choice.

Explicit lookup pros

  • No need for conditional logic since Hiera is configured independently. Simply do a lookup for a value, and assign it to a variable.
  • Using a lookup function is a visible indicator that the data lives outside the DSL (in Hiera).

Explicit lookup cons

  • Loss of visibility: The data is inside Hiera’s hierarchy, and determining the value requires invoking Hiera in some manner (as opposed to simply observing a value in the DSL).
  • If the lookup you want to perform doesn’t conform to Hiera’s existing hierarchy, then Hiera’s hierarchy will need to be changed, which is disruptive.

Explicit lookup recommendations

You should use an explicit data lookup when:

  • The data item is private to the implementation of the class (i.e., not exposed as an API to the profile).
  • The value from Hiera should not be overridden within the Puppet DSL.

Class parameters

API vs. internal logic

When building a profile, the implementation of the profile (i.e., anything between the open and closing curly braces {} of a class definition: class apache { … } ) is considered to be private. This means that there really are no guarantees around specific resource declarations as long as the technology is configured properly in the end. Class parameters are considered to be part of the profile’s API, and thus there’s a guarantee that existing parameters won’t be removed or have their functionality changed within a major release (if you follow semantic versioning).

More specifically, exposing a parameter indicates to your Puppet code users that this is something that can be set or changed. Think of computer hardware and the differentiation between Phillips head screws and Torx screws. The Phillips head screws usually mean that customer intervention is allowed, much the same way that parameters indicate data values that can be changed, while Torx screws usually mean that customer intervention is disallowed, much the same way as variables or explicit lookups within a profile cannot be reassigned or overridden.

As referenced in the previous section, this document on the automatic class parameter lookup functionality describes the order of precedence for setting class parameters:

  1. A value explicitly set with a resource-like class declaration.
  2. The result of an automatic Hiera lookup for (CLASS NAME)::(PARAMETER NAME).
  3. The default value set in the class definition.

By exposing a class parameter to your profile, you allow for the data to be entered into Hiera without needing an explicit lookup in the profile. Additionally, class parameters can be specified in a resource-like class declaration, which allows the user to override the Hiera lookup layer and pass in their desired value. The user understands that class parameters are Puppet’s way of allowing input and altering the way Puppet configures the piece of technology. In this way, class parameters aren’t merely another method for performing a Hiera lookup; they’re also an invitation for user input.
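
Here's a small, hypothetical sketch of that idea: the profile exposes a port parameter as part of its API, and the value can come from a resource-like declaration, from the automatic Hiera lookup, or from the default.

class profile::myapp (
  $port = 8080,
) {
  # The parameter feeds the site-specific implementation.
  file { '/etc/myapp/app.conf':
    ensure  => file,
    content => "listen_port = ${port}\n",
  }
}

# Ways the value can be supplied, highest precedence first:
#   1. class { 'profile::myapp': port => 9090 }   (resource-like declaration)
#   2. profile::myapp::port: 9090                  (automatic Hiera parameter lookup)
#   3. the default of 8080 in the class definition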

Discoverability and extensibility

One important distinction with class parameters: The Puppet Enterprise console is able to discover class parameters and present them visually. It can do this because Puppet Server has an API that exposes this data, and that means parameters and classes can be queried and enumerated. Explicit Hiera lookups are not discoverable in the same way; you will need to search through your codebase manually.

Next, class parameters can have their values assigned by an external node classifier, or ENC, but explicit Hiera lookups cannot. An ENC is an arbitrary script or application that can tell Puppet which classes a node should have. (For more information, refer to this document on ENCs.) For Puppet Enterprise, the Puppet Enterprise console acts as an ENC.
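
For a sense of what that looks like, an ENC is just a script that prints YAML to stdout; a hedged sketch (class and parameter names are hypothetical) that classifies a node and sets a class parameter might be:

---
classes:
  profile::myapp:
    port: 9090
environment: production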

Finally, consider the extensibility of explicit lookups versus class parameters. Puppet introduced the lookup() function a while back as a replacement for the hiera() function, which means that over time, all hiera() function calls will need to be converted to lookup() function calls. Class parameters have remained largely unchanged since their introduction (with data types being an additional change), so people using class parameters and the automatic parameter lookup don’t need to convert all those explicit lookups. In this case, explicit lookups may require more work than class parameters when performing an upgrade.

Because these two lookups have two fundamentally different purposes, I’m treating their usages separately.

Class parameter lookup pros

  • Signals to users of Puppet code that this data item is configurable.
  • Allows the value to be assigned either by the Puppet Enterprise console (or other configured ENC) or Hiera.
  • Classes and parameters are discoverable through the Puppet Server API.

Class parameter lookup cons

  • Automatic parameter lookup is unexpected if you don’t know it exists.
  • Loss of visibility: The data is inside Hiera’s hierarchy, and determining the value requires invoking Hiera in some manner (as opposed to simply observing a value in the DSL).
  • Each parameter is unique, so even if multiple profiles expose a parameter of the same name that requires the same value, there needs to be a value in Hiera for each unique parameter.

Class parameter recommendations

You should expose a class parameter when:

  • You require Hiera’s hierarchy and its conditional logic to determine the value of a data item.
  • You need the ability to override the value using the Puppet Enterprise console (or other configured ENC).
  • You want to indicate to users of your Puppet code that this part of the profile is configurable.

Summary

Writing extensible code and keeping configuration data separate are always in the back of every Puppet user’s mind, but the mechanics of how to achieve this goal can seem daunting. With this post, I hope you now have a clearer path for structuring your Puppet code!

Roles and Profiles in a Control Repo?

In the past, the thing that got me to make a blog post was answering a question more than once and not having a good source to point someone to after-the-fact. As the docs at docs.puppet.com have become more comprehensive, I find myself wanting to write about things less frequently. But, all it takes is a question or two from a customer to kick things in the ass and remind me that there’s still a LOT of tribal knowledge around Puppet (let alone the greater community). It’s with THAT theme that we talk about Roles & Profiles, and the Control Repo.

Like many things nowadays, there are official Puppet docs on the Control Repo. In a nutshell, the Control Repo is the repository that Puppet’s Code Manager (or R10k in the open source) uses to track Puppet Environments and the versions of all Puppet modules within each Puppet Environment. On a greater scale, the Control Repo is an organization’s implementation of Puppet to the extent that it can (and eventually will) fully represent your organization’s infrastructure. Changes to the Control Repo WILL incur changes to your Puppet Master(s), and in most cases will also bubble down to your managed nodes (i.e. if you’re changing a profile that’s being used by 1000 nodes, then that change will definitely change the file on each Puppet Master, but it will also change the enforcement of Puppet on those 1000 nodes).

Similarly, Roles & Profiles has its own official docs page! As a recap, “Roles & Profiles” is a design pattern (that’s all!) that has been employed by Puppet users for several years as a way to make sense of wiring up public Puppet modules with site-specific implementations and data. It allows organizations to share common modules while also having the ability to add their own customizations and implement a separate configuration management data layer (i.e. Hiera).

Both the Control Repo and Roles & Profiles (R&P) have undergone several evolutions to get them to the reliable state we know today, and they’ve had their shared history: we’ve implemented Roles & Profiles both inside and outside the Control Repo…

Roles and Profiles outside the Control Repo

Roles & Profiles were (was?) created before the Control Repo because the problem of disentangling data from Puppet code was a greater priority than automating code-consistency across multiple Puppet masters. When the idea of using a git repo to implement dynamic Puppet Environments came along, the success of being able to ensure module consistency across all your masters was pretty landmark. The workflow, however, needed some work – there were a LOT of steps. Git workflow automation loves it some hooks, and so the idea of a post-receive hook that would immediately update a Puppet environment was a logical landing point. The idea was that all modules would be listed and ‘pinned’ to their correct version/tag/commit within Puppetfile that lived at the root of the Control Repo. ‘Roles’ and ‘Profiles’ are Puppet modules, modules were already listed in Puppetfile, so some customers listed them there initially. During a code deploy, R10k/Code Manager would read that file, pull down all the modules at their correct versions, and then move along. That entire workflow looked like this:

  1. Create/Modify a Profile and push changes to the Profile module repo
  2. Create a branch of the Control Repo and modify Puppetfile to target the new Profile changes
  3. Push the Control Repo changes up to the remote (where the git webhook catches that change and deploys it to Puppet Masters)
  4. Classify your node (if needed) and/or test the changes
  5. If changes are necessary, go back to step 1 and repeat up to step 4
  6. Once everything works, submit a Pull Request to the Control Repo

This workflow works regardless of whether a Role or Profile was changed, but the biggest thing to understand is that ONLY the Control Repo has the git webhook that will deploy code changes to your Puppet Masters, so if you want to trigger a code deploy then you’ll need to change the Control Repo and push that change up (or have access to trigger R10k/Code Manager on the Puppet Master). Because changes to the Roles or Profiles modules (remember, they’re separate repos here) don’t get automatically replicated, even a small change to a Profile means you either have to trigger R10k/Code Manager by hand or make a small dummy commit to the Control Repo to trigger a code deploy. This resulted in a lot of ‘dummy’ changes that were necessary SOLELY to trigger a code deploy.

As I said before, some customers implemented Roles & Profiles and the Control Repo this way for a while until it was realized that you could save steps by putting both the Roles and Profiles module into the Control Repo itself…

Roles and Profiles inside the Control Repo

Since the entire contents of the Control Repo are already cloned down to disk by R10k/Code Manager, the idea came about to store the Roles and Profiles modules in a special directory of the Control Repo (usually called ‘site’ which is short for ‘site-specific modules’), and then change $modulepath within Puppet to look for the ‘site’ folder within every Puppet Environment’s directory path as another place for Puppet modules to live. This worked for two reasons:

  1. It shortened the workflow (since changes to Roles and Profiles were done within the history of the Control Repo, there was no need to change the version ‘pin’ inside Puppetfile as a separate step)
  2. Because Roles and Profiles are now within the Control Repo, changes made to Roles and Profiles will now trigger a code deploy

For the vast majority of customers, putting Roles & Profiles inside the Control Repo made sense and kept the workflow shorter than it was before. It also had the added benefit of turning the Control Repo into the important artifact that it is today (thanks to the git webhook).

Can/Should we also put other modules inside the Control Repo?

Once you add the site directory to $modulepath, it opens up that directory to be used as a place for storing ANY Puppet modules. The question then remains: should the site directory be used for anything other than Roles and Profiles?

Maybe?

Just like Puppet, though, just because you CAN do things, doesn’t immediately mean you SHOULD. It’s important to understand that the Control Repo is fundamental to ensuring code consistency across multiple Puppet Masters. For that reason, commits to the Control Repo should be scrutinized closely. If you’re a large team with many Puppet contributors and many Puppet masters, then it’s best to keep modules within their own git repositories so multiple team members can work independently and the Control Repo can be used to “tie it all together” in the end. If, on the other hand, you’re the only Puppet contributor, 80% of your modules come from the Puppet Forge, and you have 3 relatively static modules outside of Roles and Profiles that you’ve written specifically for your organization and want to keep in the site directory of the Control Repo, then you’re probably fine. See the difference?

Who owns what?

One of the biggest factors to influence where Puppet modules should be managed is the split of which teams own which decisions. Usually, Puppet infrastructure is owned by an internal operations team, which means that the Ops team is used to making changes to the Control Repo. If Puppet usage is wide enough within your organization it’s common to find application teams who own specific Profiles that are separate from the infrastructure. It’s usually easier to grant an outside team access to a separate repo than it is to try and restrict access to a specific folder or even branch of an existing repository, and so in that case it might make sense to make the Profile module its own repository. If the people that own the Puppet infrastructure are the same people that make changes to Puppet modules, then it doesn’t really matter where Roles and Profiles go.

For many organizations this is THE consideration that determines their choice, but remember to build a workflow for today with the ability to adapt to tomorrow. If you have a single person outside the ops team contributing to Puppet, it doesn’t mean that you need to upend the workflow just for them. Moving from something like having Roles & Profiles inside the Control Repo to having them outside the Control Repo is an easy switch to implement (from a technical standpoint), but the second you make that switch you’re adding steps to EVERYONE’S workflow and changing the location of the most commonly used modules within Puppet. That’s a heavy cost – don’t do it without reason.

So what are the OFFICIAL RECOMMENDATIONS THEN?!?!

We officially recommend you calm down with that punctuation. Beyond that, here it is:

  • Put Roles & Profiles within the site directory of the Control Repo unless you have a specific reason NOT to.

Do you have multiple Puppet contributors and separate modules for EACH INDIVIDUAL Profile? Then you might want to have separate repos for each Profile and put them in Puppetfile to keep the development history separate. You ALSO might want to put each individual Profile module in the site directory of the Control Repo and just run with it that way. The bottom line here would be access: who can/should access Profiles, who can/should access the Control Repo, are those people the same, and do you need to restrict access for some reason? Start with doing it this way and change WHEN YOU HIT THAT COMPLEXITY! Don’t deviate because you ‘anticipate something’ – change when you’re ready for it and don’t overarchitect early.

  • If you’re a smaller team with the same people who own the Puppet infrastructure as who own Puppet module development and you have a couple of small internal modules that don’t change very often, AND putting them inside the ‘site’ folder of the Control Repo is easier for you than managing individual git repos, then by all means do it!

Whew that was a lot. Basically, yes, I’ve outlined a narrow case because usually creating a new git repository is a very small cost for an organization. If you’re in an organization where that’s NOT the case, then the site directory solution might appeal to you. What you gain in simplicity you lose in access and security, though, so consider that ahead of time. Finally, the biggest factor HERE is that the same people own the infrastructure and module code, so you can afford to make shortcuts.

  • Have an internal Puppet Policy/Style Guide for where Puppet modules “go.”

If you’ve had the conversation and made the decision, DOCUMENT IT! Having a documented escalation path/policy matters more than which choice you made, because it keeps things consistent for new Puppet users in your organization (the last thing you want is to keep having this conversation every other month).

  • Moving a module from the ‘site’ directory to its own repository is not difficult, but it does add workflow steps.

Remember that if a module doesn’t live in the ‘site’ directory then it needs to get ‘pinned’ in Puppetfile, and that adds an extra step anytime that module needs to be updated within a Puppet Environment.
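
For illustration, a ‘pin’ in Puppetfile looks something like the following (the repo URL and tag are hypothetical); every time that module changes, this ref has to be bumped and the Control Repo updated before the change reaches your masters.

mod 'profile',
  :git => 'git@git.example.com:puppet/profile.git',
  :ref => 'v1.4.2'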

Summary

First, if you’ve read this post it’s probably because you’re Googling for material to support your cause (or someone cited this post as evidence to back their position). You might have even skipped down here for “the answer.” Guess what – shit doesn’t work like that! Storing Roles & Profiles (and/or other Puppet modules) within the ‘site’ directory is an organizational choice based on the workflow that best jives with an organization’s existing developmental cycle and ownership requirements. The costs/benefits for each choice boil down to access, security, and saving time. The majority of the time putting Roles & Profiles in the Control Repo saves time and keeps all organizational-specific information in one place. If you don’t have a great reason to change that, then don’t.

Workflows Evolved: Even Besterer Practices

It’s nearly been two years since I posted the Puppet Workflow series and several things have changed:

  • R10k now ships with Puppet Enterprise and there are docs for it!
  • There’s even a pe_r10k module that ships with Puppet Enterprise 2015.2.x and higher to configure R10k
  • Control repos are the standard and are popping up all over the place
  • Most people are bundling Hiera data with their Control repo (unless they have a very good reason not to)
  • Ditto for Roles and Profiles
  • The one-role-per-node rule is a good start, but PE’s rules-based classification engine allows us to relax that rule
  • Roles still include Profiles, but conditional logic is allowed and recommended to keep Hiera hierarchy levels minimal
  • ‘Data’ goes in Hiera, but the definition of ‘data’ changes between organizations
  • There’s now a (somewhat) defined path for whether ‘data’ is included in a profile or Hiera
  • Automatic Parameter Lookup + Hiera…it’s still hard to debug, but we’re getting there
  • I’m incredibly wary of taking Uber during peak travel times with rate multipliers

It’s been awhile since I’ve had a good rant, so let’s get right into it!

Code Management with R10k

As of PE 3.8, R10k became bundled with Puppet Enterprise (PE) and was referred to as “Code Management” which initially confused people because the only thing about PE that was changed was that the R10k gem was preinstalled into PE’s Ruby installation. The purpose of this act was twofold:

  1. The Professional Services team was installing R10k in essentially EVERY services engagement, and so it made sense to ship R10k and thus officially support its installation
  2. We’ve always had plans to keep the functionality that R10k provided but not NECESSARILY the tool-known-as-R10k, so calling the service it provided something OTHER than R10k would allow us to swap out the implementation underneath the hood while still being able to talk about the functionality it provided

Of course, if you didn’t live inside Puppet Labs it’s possible that you might not have gotten this memo, but, hey: better late than never?

For various reasons, we also never initially shipped a PE-specific module to configure R10k, so you ALSO had to either manually set up r10k.yaml or use Zack Smith’s R10k module to manage that file. Of course, that module did all kinds of OTHER things (like installing the R10k gem, setting up webhooks, and making my breakfast), which meant that if you used it with the version of PE that shipped R10k, you had to be careful to use the version of the module that didn’t ALSO try to upgrade that gem on your system (and whoops if the module actually upgraded the version of R10k that we shipped). This is why that module is Puppet Approved but not an official Puppet Labs module: it does things that we would consider “unsupported” outside of a professional services engagement (i.e. the webhook stuff). Finally, the path to r10k.yaml was changed to /etc/puppetlabs/r10k/r10k.yaml, but, in its absence, the old path of /etc/r10k.yaml would be used and a message would be displayed to inform you of the new file path (in the case that both files were present, the file at /etc/puppetlabs/r10k/r10k.yaml would win).

When PE version 2015.2.0 shipped (I’m still not used to these version numbers either, folks), we FINALLY shipped a pe_r10k module with similar structure to Zack’s R10k module – this meant you could FINALLY set up R10k immediately without having to install additional Puppet modules. Even better(er), in PE 2015.2.2 we expose a couple of PE installer answer file questions that allow you to configure R10k DURING INSTALL TIME – so now your servers could be immediately bootstrapped with a single answers file (seriously, I know, it’s about time; I do this shit every week, you have no idea). It finally feels like R10k has grown into the first-class citizen we all wanted it to be!

Which means it’s time to dump it.

I kid. Mostly. The fact of the matter is that we’re introducing a new service to manage code within Puppet Enterprise, and if you’re interested in reading more about it, check out this blog post by Lindsay Smith about Code Manager. For you, the consumer, the process will be the same: you have a control repo, you push changes, a service is triggered on your Puppet masters, and code is synchronized on the Puppet master. What WILL change is the setup of this tool (there will still be PE installer answer file questions that allow you to configure this service, don’t fret, and you’ll still be able to configure this service through a Puppet module, but the name of said module and configuration files on disk will probably be different. Welcome to IT).

Be on the lookout for this service, and, as always, check out the PE docs site for more information on the Code Management service.

Control (repo) freak

With the explosion of R10k came the explosion of “Control Repos” all over the place. Everyone had one, everyone had an opinion on what worked best, and, well, we didn’t really do a good job at offering a good startup control repo for you. Because of that, we recently posted a ‘starter’ control repo on Github in the Puppet Labs namespace that could be used to get started with R10k. Yes, it’s definitely long overdue, but there it is! I use it on all engagements I do with new customers, so you can guarantee it’ll have the use of Puppet Labs’ PS team behind it. If you’ve not started with R10k yet (or if you have but you wanna see what kinda crazy shit we’re doing now), check it out. It’s got great stuff in there like a config_version script to spit out the most recent commit of the current branch of the control repo (read: also current Puppet environment) as the “Config Version” string that Puppet prints out during every Puppet run (see here for more info on this functionality). We’re also slowly adding things like initial bootstrapping profiles that will do things like configure R10k/Code Manager, manage the SSH key necessary to contact the control repo (should you be using an internal git repository server and also require an SSH key to access that repo), and so on. Star that repo and keep checking back, especially around PE releases, to see if we’ve updated things in a way that will help you out!

“Just put it in the control repo”

Look, if there’s one thing that my blog emphasizes (other than the fact that I’ve got a hairpin trigger for cursing and an uncomfortable Harry Potter fetish) it’s that “best practices” are inversely related to the insecurities of the speaker. Fortunately, I have no problem saying when I’m wrong. If you’ve got the time, allow me my mea culpa moment. In the past I had recommended:

  • Using a separate git repo for Hiera data
  • Using separate git repos for Roles and Profiles
  • The Dave Matthews Band

Time, experience, and the legalization of recreational marijuana in Oregon have helped me see the error in my ways (though, look, #41 is a good goddamn song, especially on the Dave & Tim Live at Luther College album), so allow me to provide some insight into WHY I’ve reconsidered my message(s)…

Hiera Data

In the past, I recommended a separate git repo for Hiera data along with a separate entry in r10k.yaml that would allow R10k to clone the Hiera data repo in the same vein as the control repo. The pro was that a separate Hiera data repo would afford you different access rights to this repo than to the control repo (especially if different people needed different access to each function). The con was that now the branch structure of your Hiera data repo needed to EXACTLY MIRROR the structure of your control repo….even if certain branches had EXACTLY THE SAME Hiera data and no changes were necessary.
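
For context, that older two-repo setup meant an r10k.yaml along these lines (repo URLs and basedirs are hypothetical), where the branches of each source had to line up:

:cachedir: '/var/cache/r10k'
:sources:
  :puppet:
    remote: 'git@git.example.com:puppet/control-repo.git'
    basedir: '/etc/puppetlabs/code/environments'
  :hiera:
    remote: 'git@git.example.com:puppet/hiera-data.git'
    basedir: '/etc/puppetlabs/code/hieradata'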

Puppet has enough moving parts, why did we need to complicate this if most people didn’t care about access levels differing between the two repos? The solution was to bundle the Hiera data inside the control repo all the way up until you had a specific need to split it out. Truth be told both methods work with Puppet, so the choice is up to you (read: I DON’T CARE WHICH METHOD YOU USE OH MY GOD WILL YOU QUIT TRYING TO PICK A FIGHT WITH ME OVER THIS LOL) :)

Finally, there’s an added benefit of putting this data inside the control repo, and it’s ALSO the reason for the next recommendation…

Roles and Profiles

This is one that I actually fought when someone suggested it…I even started to recommend that a customer NOT do the thing I’m about to recommend to you until they very eloquently explained why they did it. In the end, they were right, and I’m passing this tip on to you: Unless you have a very specific reason NOT to, put your ‘roles’ and ‘profiles’ modules in your control repo.

Here’s the thing about the control repo – you can set a post-receive hook on the repository (or setup a Jenkins/Bamboo/whatever job) that will update all your Puppet masters whenever changes are pushed to the remote git repository (i.e. your git repository server). This means that anytime the control repo is updated your Puppet masters will be updated. That’s why it’s CALLED the control repo – it effectively CONTROLS your Puppet masters.

Understanding THAT, think about when you want your Puppet masters updated? Well, you usually want to update them when you’re testing something out – you made a change to a couple of modules, then a profile (and possibly also a role), and now you wanna see if that code works on more than just your local laptop. But the Puppet landscape has changed a bit as the Puppet Forge has matured – most people are using modules off the Forge and are at least TRYING not to use their own component modules. This means that changes to your infrastructure are being controlled from within roles/profiles. But even IF you’re one of those people who aren’t using the Forge or who have to update an internal component module, you’re probably not wanting to update all your Puppet masters every time you update a component module. There’s probably lots of tinkering there, and every change isn’t “update-worthy”. Conversely, changes to your profiles probably ARE “update-worthy”: “Okay, let’s pull this bit from Hiera, pass it as a parameter, and now I’m ready to check it out on a couple of machines.”

If your roles and profiles modules are separate from your control repo, you end up having to push changes to, say, a class in the profiles module, then update the Puppetfile in the control repo, then trigger an R10k run/sync. If things aren’t correct, you end up changing the profile, pushing that change to the profile repo, and THEN having to trigger an R10k run/sync (and if you don’t have SSH access to your masters, you have to make a dummy commit to the control repo so it triggers an R10k run OR do a curl to some endpoint that will update your Puppet master for you). That last step is the thing that ends up wasting a bit of your time: why do we need to push a profile and then manually do an R10k run if we’ve established that roles and profiles will pretty much ALWAYS be “update-worthy”? We don’t. If you put the roles and profiles module inside the control repo, then it will automatically update your Puppet masters every time you make a change to one or the other. Bam – step saved. ALSO, if you do this, you can take Roles/Profiles out of Puppetfile, which means you no longer need to pin them! No more will you have to tie that module to a topic branch during development time: just create a branch of the control repo and go to town! Wow, that saves even more time! I’m uncomfortable with this level of excitement!

The one thing you WILL need to do is to update environment.conf so that it knows to look for the roles/profiles modules in a different path from all the other modules (because removing it from Puppetfile means that it will no longer go to the same modulepath as every other module managed inside Puppetfile). For the purposes of cleanliness, we usually end up putting both roles/profiles inside a site folder in the control repo. If you do that, your modulepath in environment.conf looks a little something like this:

modulepath = site:modules:$basemodulepath

This means that Puppet will look for modules first in the ‘site’ directory of its current environment (this is the directory where we put roles/profiles), and then inside the ‘modules’ directory (this is where modules managed in Puppetfile are cloned by default), and then in $basemodulepath (i.e. modules common to all environments and also modules that Puppet Enterprise ships).

LOOK, BEFORE YOU FREAK OUT, YES, SITE COMES FIRST HERE, AND OTHER PEOPLE HAVE SITE COME SECOND! Basically, if you have roles/profiles in the ‘site’ directory AND you manage to still have the module in Puppetfile, then the module in the ‘site’ directory will win. Feel free to flip/flop that if you want.

TL;AR: (yes, you already read all of this so it’s futile) put roles/profiles inside the site directory of the control repo to save you time, but also don’t do it if you have a specific reason not to…or if you like being contrarian.

Dave Matthews

The “Everyday” album was the “jump the shark” moment for the Dave Matthews band, while the leaked “Lillywhite Sessions” that would largely make it to “Busted Stuff” definitely indicated where the band wanted to go. They never recovered after that, and, just like Boone’s Farm ‘wine’, I stopped partaking in them.

Also, not ONCE did being able to play most every Dave Matthews song on the acoustic guitar ever get me laid…though I can’t tell exactly whose fault that was. On second thought, that was probably me. Though Tim Reynolds is an absolute beast of a musician; I’m still #teamtim.

One role per node, until you don’t want to

Why do we even make these rules if you’re not gonna follow them? It’s getting awfully “Whose Line Is It Anyway?” up in here. Before PE 3.7, and its rules-based classification engine, we recommended not assigning more than one role to a node. Why? Well, the Puppet Enterprise Console around that time wasn’t the best at tracking changes or providing authentication around tasks like classification. This meant if you tried to manage ALL of your classification within the console you could have a hard time telling when things changed or why. Fortunately, git provides you with this functionality. Because of that, we (and when I say ‘we’ I mean ‘everyone in the field trying to design a Puppet workflow that not only made sense but also had some level of accountability’) tried to displace most classification tasks from the Console into flat files that could be managed with git. This is largely the impetus for Roles and Profiles when you think about it: Profiles connect Puppet to external data and give you a layer to express dependencies between multiple Puppet classes, and Roles is a mechanism for boiling down classification to a single unit.

Once we launched a new Node Classifier that had a rules-based classification engine AND role-based access control, we became more comfortable delegating some of these classification tasks BACK to the console. The Node Classifier ALSO made it easy to click on a node and not only see what was classified to that node, but also WHERE it got that bit of classification from (“This node is getting the JBoss profile because it was put into the App Servers nodegroup”). With that level of accountability, we could start relaxing our “One Role Per Node™” mandate, OR eliminate the roles module altogether (and use nodegroups in the Node Classifier in place of roles).

The goal has always been to err on the side of “debugability” (I like making words). I will usually try to optimize a task for tracing errors later, because I’ve been a sysadmin where the world is falling apart around you and you need to quickly determine what caused this mess. Using one role per node makes sense if you don’t use a node classifier that gives you this flexibility, but MIGHT not if you DO use a classifier that has some level of accountability.

Roles, conditional logic, Hiera, and you

Over time as I’ve talked to people that ended up building Puppet workflows based on the things I’ve written (which still feels batshit crazy to me, by the way, since I’ve known myself for over 34 years), I’ve noticed that people seem to take the things I say VERY LITERALLY. And to this I say: “You should probably send me money via Paypal.” Also – note that I’m writing these things to address the 80% of people out there using/getting started with Puppet. You don’t HAVE to do what I say, especially if you have a good reason not to, and you SHOULDN’T do what I say, especially if you’re the one that’s going to stay with that organization forever and manage the entire Puppet deployment. For everyone else out there, let’s talk some more about roles.

The talking points around roles have always been “Roles include profiles; that’s it.” Again, going back to the idea that roles exist to support classification, this makes sense – you don’t want to add resources at a very high level like a roles class because, well, honestly, there’s probably a better place for it, but any logic added to simplify classification is a win.

Consider an organization that has both Windows and Linux application servers. The question of whether to have separate roles for Linux and Windows application servers is always one of the first questions to be surfaced. At a low level, everything you do in a Puppet manifest is solely for the purpose of getting resources into the catalog (a JSON object containing a list of all the resources Puppet is to manage and their desired end-state). Whether you have two different roles matters not to Puppet so long as the right node gets the right catalog. For a Puppet developer writing code, having two separate roles also might not matter (and, in reality, based on the amount of code assigned to either role, it might be cleaner to have different roles for each). For the person in charge of classifying nodes with their assigned role, it’s probably easier to have a single role (roles::application_server, for example) that can be assigned to ALL application servers, and then logic inside the role to determine whether this will be a Windows application server using IIS or a Linux application server using JBoss (or, going further, a Linux application server running Weblogic, or Websphere, or Tomcat, whatever). Like we mentioned in the previous point, if you’re using the “One role per node” philosophy, then you probably want a single role with conditional logic to determine Windows/Linux, and then determine Tomcat/JBoss, and so on. If you’re using the Puppet Enterprise Console’s node classifier, and thus the rule-based engine, you can afford not to care about the number of node groups you create because you can create a rule to match for application servers, and then a rule to match on operating system, and create as many rules as you want to dynamically discover and classify nodes on the fly.

The point here is that the PURPOSE of the Role is to aid classification, and the focus on creating a role is to start small, use conditional logic to determine which profiles to include, and then simply include them. If that conditional logic uses Facter facts, awesome. If you need to look at a variable coming from the Console to do the job, fine – go for it! But if you’re using the Role as a substitute for a Profile (i.e. data lookups, declaring classes, even declaring resources), then you’re probably going down a path that’s gonna make it confusing for people to follow what’s going on.
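
As a sketch of that guidance (role and profile names are hypothetical), a technology-agnostic role ends up being nothing but conditional logic and profile includes:

class role::application_server {
  include profile::base

  # Pick the right stack for the platform; the role never declares resources itself.
  case $facts['kernel'] {
    'windows': {
      include profile::iis
    }
    default: {
      include profile::jboss
    }
  }
}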

Bottom line: technology-agnostic roles that utilize conditional logic around including profiles is a win, but keep tasks like declaring resources and component modules to Profiles. Doing this provides a top-down path for debugging and a cleaner overall Puppet codebase.

What the hell is ‘Data’ anyhow?

This point has single-handedly caused more people to come up and argue with me. I’m not kidding. I shit you not, I’ve had people legitimately *SCREAM* at me about how wrong I was with my opinions here. The cool thing is that people LOVE the idea of Hiera – it lets you keep the business-specific data out of your Puppet manifests, it’s expressed in YAML and not the Puppet DSL, and when it works, it’s magical.

The problem is that it’s fucking magical. Seriously.

So what IS a good use of Hiera? Anytime you have a bit of data that is subject to override (for example: the classical NTP problem where everyone should use the generic company NTP server, except nodes at this location should use a different NTP server, and this particular node should use ITSELF as its NTP server), that bit of data goes into Hiera (and by ‘that bit of data’, I mean ‘the value of the NTP server’ or ‘the NTP server’s FQDN’), which would look SOMETHING like this:

ntpserver: pool.ntp.org

What does NOT go into Hiera is a hash-based representation of the Puppet resource that would then be passed to create_resources() and used to create the resource in the catalog…which would look something like this:

ntpfiles:
  '/etc/ntp/ntpd.conf':
    ensure: file
    owner:  0
    group:  0
    mode:   '0644'
    source: 'puppet:///modules/ntp/ntpd.conf'

…which would then be passed into Puppet like this:

create_resources('file', hiera_hash('ntpfiles'))

Yes, this is an exaggeration based on a very narrow use case, but what I’m trying to highlight is that the ‘data’ bit in all that above mess is SOLELY an FQDN, and everything else is arguably the “Model”, or your Puppet code.

Organizations LOVE that you can put as much “stuff” into Hiera as you want and then Puppet can call Hiera, create resources based on what it tells you, and merrily be on your way. Well, they “love” it until it doesn’t work or does something unexpected, and then debugging Hiera is a right bastard.

Understand that the problem I have would be with unexpected Hiera behavior. If you’re skilled in the ways of the Hiera and its (sometimes cloudy) interaction with Puppet, then by ALL means use it for whatever ya like. BUT, if you’re still new to Puppet, then you may have a very loose mental map for how Hiera works and where it interacts with Puppet…and nobody should have to have that advanced level of knowledge just to debug the damn thing.

The Hiera + create_resources() use above is of particular nastiness simply because it turns your Hiera YAML files into a potential mechanized weapon of Puppet destruction. If I know that you’re doing this under the hood, I could POTENTIALLY slip data into Hiera that would end up creating resources on a node to do what I want. Frequently Puppet code is more heavily scrutinized than Hiera data, and I could see something like this getting overlooked (especially if you don’t have a ton of testing around your Puppet code before it gets deployed).

The REASON create_resources() was created is that Puppet lacked the ability to do things like recursion and loops inside the DSL, and sometimes you WANT to automate very repetitive tasks. Consider the case where you truly DON’T know how many of something are going to be on a node ahead of time – maybe you’re using VMware vRO/vRA and someone is building a node on-the-fly with the web GUI. For every checkbox someone ticks there will be another application to be installed, or another series of firewall rules, or SOMETHING like that. You can choose to model these individually with profiles, OR, if the task is repetitive, you can accept their choices as data and feed it back into Puppet like a defined resource type. In fact, most use-cases for Hiera + create_resources() are passing data into a defined resource type. As of Puppet 4.x.x, we have looping constructs inside the DSL, so we can finally AUTOMATE these tasks without having to use an extra function (of course, in THIS use case, whether you use recursion/looping in the DSL or create_resources() matters not – you get the same thing in the end).
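For example, the Hiera + create_resources() snippet from earlier could be rewritten with a Puppet 4 each loop – same data shape, no extra function (the ‘ntpfiles’ key is the same hypothetical one from above):

$ntpfiles = hiera('ntpfiles')

# Iterate over the hash: each key is a file path, each value is a hash of attributes
$ntpfiles.each |String $path, Hash $attrs| {
  file { $path:
    ensure => $attrs['ensure'],
    owner  => $attrs['owner'],
    group  => $attrs['group'],
    mode   => $attrs['mode'],   # with Puppet 4, quote the mode as '0644' in the YAML
    source => $attrs['source'],
  }
}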

For one last point, the Puppet DSL is still pretty easy to read (as of right now), and most people can follow what’s going on even if they’re NOT PuppEdumicated. Having 10 resource declarations in a row seems like a pain in the ass to write when you’re doing it, but READING it makes sense. Later on, if you need to know what’s going on with this profile, you can scan it and see exactly what’s there. If you start slipping lots of data into Hiera and looping logic into the DSL, you’re gonna force the person who manages Puppet to go back and forth between reading Hiera code, then back to Puppet code, then back to the node, and so on. Again, it’s totally possible to do now, and frequently NECESSARY when you have a more complex deployment and well-trained Puppet administrators, but initially it’s possible to build your own DSL to Puppet by slipping things into Hiera and running away laughing.

So when do I put this ‘data’ into the Profile and when is a good time to put it into Hiera? I’m glad you asked…

A path to Hiera data

These last two points I’ve written about before. I may be repeating myself, but bytes are cheap. Like I wrote above (and before), putting data directly into a Profile is the easiest and most legible way of providing “external data” into Puppet. Yes, you’ll argue, putting the data into a Profile, which is Puppet code, is ARGUABLY NOT being very “external” about it. In my opinion it is – your Profile is YOUR IMPLEMENTATION of a technology stack, and thus isn’t going to be shared outside your organization. I consider that external to all the component modules out there, but, again, potato/potato. I recommend STARTING HERE when you’re getting started with Puppet. Hiera comes in when you have a very clear-cut need for overriding data (a la: this NTP server everywhere, except here and here). The second you might need to have different data, you can either start building conditional logic inside the Profile, OR use the conditional logic that Hiera provides.
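To be clear about what “data in the Profile” looks like, here’s a minimal sketch (it assumes a component ntp class that takes a servers parameter – adjust for whatever module you actually use):

class profile::ntp {
  # The 'data' lives right here until there's an actual need to override it
  $ntp_server = 'pool.ntp.org'

  class { '::ntp':
    servers => [ $ntp_server ],
  }
}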

So – which do you use?

The point of Hiera is to solve 80% or better of all conditional choices in your organization. Consider this data organization model:

  • Everyone shares most of the same data items
  • San Francisco/London do their own things sometimes
  • Application tiers get their own level for dev/test/qa/prod-specific overrides
  • Combinations of tiers/locations/and business units want their own overrides
  • Node specific data is the most specific (and least-used) level

If you’re providing some data to Puppet that follows this model, then cool – use Hiera. What about specific “exceptions” that don’t fit this model? Do you try to create specialized layers in Hiera just for these exceptions? Certain organizations absolutely do – I see it all the time. What you find is that certain layers in Hiera go together (this location/tier/business_unit level goes right above location/tier, which goes right above location), and we start referring to those coupled layers as “Chains”. Chains are usually tied to some specific need (deploying applications, for example). Sometimes you create a chain just to solve a VERY SPECIFIC hard problem (populating /etc/sudoers in large organizations, for example).

The question is – do I create another “Chain” of layers in the hierarchy solely because deploying sudoers is hard, or do I throw a couple of case statements into the sudoers profile and keep it out of Hiera altogether?

My answer is to start with conditional logic in the sudoers profile and break it out into Hiera if you see that “Chain” being needed elsewhere. Why? Because, like I’ve said many times before, debugging Hiera kinda sucks right now – there’s no way currently to get a dump of all variables and parameters for a particular node and determine which were set by Hiera, which were set with variables in the DSL, which came out of the console, and so on. If we HAD that tool, I’d be all about using it and polluting your hierarchy all day long (I expand upon this slightly in the next point about the Automatic Parameter Lookup + Hiera).
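As a sketch of what “conditional logic in the sudoers profile” might look like (the location values and file sources are made up):

class profile::sudoers {
  # A couple of case-statement-style exceptions, kept out of the hierarchy
  $sudoers_source = $::location ? {
    'london'        => 'puppet:///modules/profile/sudoers/london',
    'san_francisco' => 'puppet:///modules/profile/sudoers/san_francisco',
    default         => 'puppet:///modules/profile/sudoers/common',
  }

  file { '/etc/sudoers':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0440',
    source => $sudoers_source,
  }
}

If that same location-based selection starts showing up in other profiles, THAT’S the signal to break it out into a “Chain” in the hierarchy.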

Bottom line: Start with the data in the Profile, then move it to Hiera when you need to override. Start with conditional logic in the Profile, then create a “Chain” in the Hierarchy if you need to use it in more than one place.

Hiera, APL, Refactoring, WTF

Like I said, I’ve written about this before. I like the Automatic Parameter Lookup functionality in Puppet – it’s ace. I like Hiera. But if you don’t know how it works, or that it exists, it feels too much like Magic. There are certain things in the product that can ONLY be set by putting data inside Hiera and running Puppet, and that is truly an awesome thing: just tell a customer “drop this bit of data somewhere in Hiera, run Puppet, and you’re all set.” But, again, if you need to know how a particular line got into a particular config file on your node, and it was set with the APL, then you’ve got some digging to do.

There’s still no tool, like I mentioned in the last item, to give me full introspection into all variables/parameters set for a node and that variable/parameter’s origin. Part of the reason as to WHY this tool doesn’t exist is because the internals of Puppet don’t necessarily make it easy for you to determine where a parameter/variable was set. That’s OUR problem, and I feel like we’re slowly making progress on marking these things internally so we can expose them to our customers. Until then, you have to trace through code and Hiera data.

I know the second I publish and tweet about this, I’m gonna get a message from R.I. Pienaar saying that I’m crazy for NOT pushing people toward using Hiera more with the Automatic Parameter Lookup, because the more we use it, the faster we can move away from things like params classes, and profiles, and everything else. But the reality is I’m ALL ABOUT PEOPLE using it if they know how it works. I’m ACTUALLY fucking happy that it works well for you – please continue to use it and do awesome Puppet things. I only recommend that people who are getting started NOT USE it FIRST – then, once you understand how it would help you after clocking some hours of Puppet code writing and debugging, do some refactoring and move to it!

Yes, refactoring is involved.

Look, refactoring is a way of life. You’re gonna re-tool your Puppet code for the purposes of legibility, or efficiency, or any of the many other reasons why you refactor code – it’s unavoidable. Also, if I come into your org and set up Puppet for the most efficient use-case, and then I leave that in your relatively-new-to-Puppet hands, it’s probably not gonna be the best situation, because you won’t have known WHY I made the decisions I did (and, even if I document them, you might have gaps in the knowledge that would help you understand the problems I’m helping you avoid).

Sometimes hitting the problem so you have first-hand knowledge of why you need to avoid it in the future isn’t the WORST thing in the world.

To move to any configuration management system means you’re gonna be refactoring. Embrace it. Start small, get things working, then clean it up. Don’t try to build the “fortress of sysadmin perfection” with your first bit of Puppet code – just get shit done! Allow yourself time during the month simply to unwind some missteps you realize after the fact, and definitely seek advice before doing something you feel might be particularly complex or overarching, but getting shit done is gonna trump “not working” any day (or whatever the manager-y buzzspeak is this week).

Bottom Line: APL if you understand it, start small, get shit done, refactor, repeat

Hopefully this leads to more posts

Holy shit, you’re still reading?! Ohh, you skimmed down this far to see how long this post was gonna be – got it. Either way, I’m glad I finally got this out there. It’s been months, yes, but that doesn’t mean I haven’t been writing. We’ve been doing lots of internal work to try and get more official docs out to you and less of “Go read Gary’s blog!” You’ll notice R10k has some official docs, right?! Yeah, that’s awesome! We want more of that. BUT, there’s still going to be times where I feel like what I’m gonna say isn’t necessarily the “party line”, and that’s what this blog is about.

Thanks to everyone at Puppetconf and beyond who approached me and told me how much they love what I write. I’m gonna be humble as fuck in person, but I really do get excited whenever someone says that. It’s also crazy as hell when someone from Wal-mart approaches you and says they built part of their deployment based on the shit you wrote. From a guy who came from a town in Ohio with a population of less than 8000 people, it’s crazy to see where you’re “recognized.”

So thank you, again, for all the support.

And sorry, Dave Matthews – it’s not you, it’s me. Actually, that’s a lie; it was you.

Puppet Workflows 4: Using Hiera in Anger

Hiera. That thing nobody is REALLY quite sure how to say (FYI: It’s pronounced ‘hiera’), the tool that everyone says you should be using, and the tool that will make you hate YAML syntax errors with a passion. It’s a data/code separation dream, (potentially) a debugging nightmare, and absolutely vital in creating a Puppet workflow that scales better than your company’s Wifi strategy (FYI: your company’s Wifi password just changed. Again. Because they’re not using certificates). I’ve already written a GOOD AMOUNT on why/how to use it, but now I’m going to give you a couple of edge cases. Call them “best practices” (and I’ll cut you), but I like to call it “shit I learned after using Hiera in anger.” Here are a couple of the most popular questions I hear, and my usual responses…

“How should I setup my hierarchy?”

This is such a subjective question because it’s specific to your organization (because it’s your data). I usually ask back “What are the things about your nodes that are different, and when are they different?” Usually I hear something back like “Well, nodes in this datacenter have different DNS settings” or “Application servers in production use one version of java, and those in dev use a different version” or “All machines in the dev environment in this datacenter need to have a specific repository”. All of these replies give me ideas for your hierarchy. When you think of Hiera as a giant conditional statement, you can start seeing how your hierarchy could be laid out. With the first response, we know we need a location fact to determine where a node is, and then we can have a hierarchy level for that location. The second response tells me we need a level for the application tier (i.e. dev/test/prod). The third response tells me we need a level that combines both the location and the application tier. When you add in that you should probably have a node-specific level at the top (for overrides) and a default level at the bottom (or not: see the next section), I’m starting to picture this:

:hierarchy:
  - "nodes/%{::clientcert}"
  - "%{::location}/%{::applicationtier}"
  - "%{::location}/common"
  - "tier/%{::applicationtier}"
  - common

Every time you have a need, you consider a level. Now, obviously, it doesn’t mean that you NEED a level for every request (sometimes if it’s an edge case you can handle it in the profile or the role). There’s a performance hit for every level of your Hiera hierarchy, so ideally keep it minimal (or around 5 levels or so), but we’re talking about flexibility here, and, if that’s more important than performance then you should go for it.

Next comes ordering. This one’s SLIGHTLY easier – your hierarchy should read from most-specific to least-specific. Note that specifying an application tier at a specific location is MORE specific than just saying “all nodes in this application tier.” Sometimes you will have levels for which it’s hard to define an order – such as location vs. application tier. You kinda just have to go with your gut here. In many cases you may find that the data you put in those two levels will be entirely different (location-based data may not ever overlap with application-tier-specific data). Do remember that any time you change the order of your hierarchy you’re going to introduce the possibility that values get flip/flopped.

If you look at level 3 of the hierarchy above, you’ll see that I have ‘common’ at the end. Some people like this syntax (where they put a ‘common’ file in a folder that matches the fact they’re checking against), and some people prefer a filename matching the fact. Do what makes you happy, but, in this case, we can unify the location folder and just put the common file underneath the application tier files.

Finally, DO MAKE USE OF FOLDERS! For the love of god, this. Putting all files in a single folder not only makes that a BIG folder, but also introduces a namespace collision (i.e. what if you have a location named ‘dev’, for example? Now you have both an application tier and a location with the same name. Oops).

How you setup your hierarchy is up to you, but this should hopefully give you somewhere to start.

Common.yaml, your organization’s common values – REVISED

UPDATE – 28 October

Previously, this section was where I presented the idea of removing the lowest level of the hierarchy as a way of ensuring that you didn’t omit a value in Hiera (the idea being that common values would be in the profile, anything higher would be in Hiera, and all your ‘defaults’, or ‘common values’ would be inside the profile). The idea of removing the lowest level of the Hiera hierarchy was always something I was kicking around in my head, but R.I. made a comment below that’s made me revise my thought process. There’s still a greater concern around definitively tracking down values pulled from Hiera, but I think we can accomplish that through other means. I’m going to revise what I wrote below to point out the relevant details.

When using Hiera, you need to define a hierarchy that Hiera uses in its search for your data. Most often, it looks something like this:

hiera.yaml
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppetlabs/puppet/hieradata
:hierarchy:
  - "nodes/%{::clientcert}"
  - "location/%{::location}"
  - "environment/%{::applicationtier}"
  - common

Notice that little “common” at the end? That means that, failing everything else, it’s going to look in common.yaml for a value. I had thought of common as the ‘defaults’ level, but the reality is that it is a list of values common across all the nodes in your infrastructure. These are the values, SPECIFIC TO YOUR ORGANIZATION, that should be the same everywhere. Barring an override at a higher level, these values are your organization’s ‘defaults’, if you will.

Previously, you may have heard me rail against Hiera’s optional second argument and how I really don’t like it. Take this example:

$foo = hiera('port', '80')

Given this code, Hiera is going to look for a parameter called ‘port’ in its hierarchy, and, if it doesn’t find one in ANY of the levels, assign back a default value of ‘80’. I don’t like using this second argument because:

  1. If you forget to enter the ‘port’ parameter into the hierarchy, or typo it in the YAML file, Hiera will gladly assign the default value of ‘80’ (which, unless you’re checking for this, might sneak into production)
  2. Where is the real ‘default’ value: the value in common.yaml or the optional second argument?

It actually depends on where you do the hiera() call as to what ‘kind’ of default value this is. Note that previously we talked about how the ‘common’ level represented values common across your infrastructure. If you do this hiera() call inside a profile (which is where I recommend it be done), providing the optional second argument ends up being redundant (i.e. the value should be inside Hiera).

The moral of this story being: values common to all nodes should be in the lowest level of the Hiera hierarchy, and all explicit hiera calls should omit the default second argument if that common value is expected to be found in the hierarchy.
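In other words, it ends up looking SOMETHING like this (the key name is illustrative):

common.yaml

ntpserver: pool.ntp.org

somewhere in a profile

$ntpserver = hiera('ntpserver')

If someone typos or deletes the key, the hiera() call fails loudly at catalog compilation instead of silently falling back to a hardcoded default.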

Data Bindings

In Puppet 3, we introduced the concept of ‘data bindings’ for parameterized classes, which meant that Puppet now had another choice for gathering parameter values. Previously, the order Puppet would look to assign a value for parameters to classes was:

  1. A value passed to the class via the parameterized class syntax
  2. A default value provided by the class

As of Puppet 3, this is the new parameter assignment order:

  1. A value passed to the class via the parameterized class syntax
  2. A Hiera lookup for classname::parametername
  3. A default value provided by the class

Data bindings is meant to be pluggable to allow for ANY data backend, but, as of this writing, there’s currently only one: Hiera. Because of this, Puppet will now automatically do a Hiera lookup for every parameter to a parameterized class that isn’t explicitly passed a value via the parameterized class syntax (which means that if you just do include classname, Puppet will do a Hiera lookup for EVERY parameter defined to the “classname” class).

This is really cool because it means that you can just add classname::parametername to your Hiera setup, and, as long as you’re not EXPLICITLY passing that parameter’s value to the class, Puppet will do a lookup and find the value.
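For example (class and parameter names are illustrative), if you have a parameterized class ntp with a $servers parameter, dropping this into Hiera:

common.yaml

ntp::servers:
  - 'ntp1.example.com'
  - 'ntp2.example.com'

…and classifying the node with a plain include ntp is all it takes – the data bindings lookup finds ntp::servers behind the scenes and hands the value to the class.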

It’s also completely transparent to you unless you know it’s happening.

The issue here is that this is new functionality to Puppet, and it feels like magic to me. You can make the argument and say “If you don’t start using it, Gary, people will never take to it,” however I feel like this kind of magical lookup in the background is always going to be a bad thing.

There’s also another problem. Consider a Hiera hierarchy that has 15 levels (they exist, TRUST ME). What happens if you don’t define ANY parameters in Hiera in the form of classname::parametername and simply want to rely on the default values for every class? Well, it means that Hiera is STILL going to be triggered for every parameter to a class that isn’t explicitly passed a value. That’s a hell of a performance hit. Fortunately, there’s a way to disable this lookup. Simply add the following to the Puppet master’s puppet.conf file:

data_binding_terminus = none

It’s going to be up to how your team needs to work as to whether you use Hiera data bindings or not. If you have a savvy team that feels they can debug these lookups, then cool – use the hell out of it. I prefer to err on the side of an explicit hiera() lookup for every value I’m querying, even if it’s a lot of extra lines of code. I prefer the visibility, especially for new members to your team. For those people with large hierarchies, you may want to weigh the performance hit. Try to disable data bindings and see if your master is more performant. If so, then explicit hiera() calls may actually buy you some rewards.

PROS:

  • Adding parameters to Hiera in the style of classname::parametername will set parameterized class values automatically
  • Simplified code – simply use the include() function everywhere (which is safer than the parameterized class syntax)

CONS:

  • Lookup is completely transparent unless you know what’s going on
  • Debugging parameter values can be difficult (especially with typos or forgetting to set values in Hiera)
  • Performance hit for values you want to be assigned the class default value

Where to put data – Hiera or Profile?

“Does this go right into the Profile or into Hiera?” I get that question repeatedly when I’m working with customers. It’s a good question, and one of the quickest ways to blow up your YAML files in Hiera. Here’s the order I use when deciding where to put data:

WHERE did that data come from?

Remember that the profile is YOUR implementation – it describes how YOU define the implementation of a piece of technology in YOUR organization. As such, it’s less about Puppet code and more about pulling data and passing it TO the Puppet code. It’s the glue-code that grabs the data and wires it up to the model that uses it. How it grabs the data is not really a big deal, so long as it grabs the RIGHT data – right? You can choose to hardcode it into the Profile, or use Hiera, or use some other magical data lookup mechanism – we don’t really care (so long as the Profile gathers the data and passes it to the correct Puppet class).

The PROBLEM here is debugging WHERE the data came from. As I said previously, Hiera has a level for all bits of data common to your organization, and, obviously, data overridden at a higher level takes precedence over the ‘common’ level at the bottom. With Hiera, unless you run the hiera binary in debug mode (-d), you can never be completely sure where the data came from. Puppet has no way of dumping out every variable and where it came from (whether Hiera or set directly in the DSL, and, if it WAS Hiera, exactly what level or file it came from).

It is THIS REASON that causes me to eschew things like data bindings in Puppet. Debugging where a value came from can be a real pain in the ass. If there were amazing tooling around this, I would 100% support using data bindings and just setting everything inside Hiera and using the include() function, but, alas, that’s not been my experience. Until then, I will continue to recommend explicit hiera calls for visibility into when Hiera is being called and when values are being set inside the DSL.

Enter the data into the Profile

One of the first choices people make is to enter the data (like ntpserver address, java version, or whatever it is) directly into the Profile. “BUT GARY! IT’S GOING TO MAKE IT HARD TO DEBUG!” Not really. You’re going to have to open the Profile anyway to see what’s going on (whether you pull the data from Hiera or hardcode it in the Profile), right? And, arguably, the Profile is legible…doing Hiera lookups gives you flexibility at a cost of abstracting away how it got that bit of data (i.e. “It used Hiera”). For newer users of Puppet, having the data in the Profile is easier to follow. So, in the end, putting the data into the Profile itself is the least-flexible and most-visible option…so consequently people consider it as the first available option. This option is good for common/default values, BUT, if you eventually want to use Hiera, you need to re-enter the data into the common level of Hiera. It also splits up your “source of truth” to include BOTH the Profile manifest and Hiera. In the end, you need to weigh your team’s goals, who has access to the Hiera repo, and how flexible you need to be with your data.

PROS:

  • Data is clearly visible and legible in the profile (no need to open additional files)

CONS:

  • Inability to redefine variables in Puppet DSL makes any settings constants by default (i.e. no overriding permitted)
  • Data outside of Hiera creates a second “source of truth”

Enter the data into Hiera

If you find that you need to have different bits of data for different nodes (i.e. a different version of Java in the dev tier instead of the prod tier), then you can look to put the data into Hiera. Where to put the data is going to depend on your own needs – I’m trusting that you can figure this part out – but the bigger piece here is that once the data is in Hiera you need to ensure you’re getting the RIGHT data (i.e. if it’s overridden at a higher level, you are certain you entered it into the right file and didn’t typo anything).
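As a quick sketch (hypothetical key name, and assuming a tier/%{::applicationtier} level like the hierarchy example earlier):

tier/prod.yaml

profile::java::version: '1.7.0_67'

tier/dev.yaml

profile::java::version: '1.8.0_20'

The java profile then makes a single hiera('profile::java::version') call and passes the result along to whatever class actually installs Java.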

This answers that “where” question, but doesn’t answer the “what” question…as in “What data should I put into Hiera?” For that, we have another section…

PROS:

  • Flexibility in returning different values based on different conditions
  • All the data is inside one ‘source of truth’ for data according to your organization

CONS:

  • Visibility – you must do a Hiera lookup to find the value (or open Hiera’s YAML files)

“What exactly goes into Hiera?”

If there were one question that, if answered incorrectly, could make or break your Puppet deployment, this would be it. The greatest strength and weakness of Hiera is its flexibility. You can truly put almost anything in Hiera, and, when combined with something like the create_resources() function, you can create your own YAML configuration language (tip: don’t actually do this).

“But, seriously, what should go into Hiera, and what shouldn’t?”

The important thing to consider here is the price you pay by putting data into Hiera. You’re gaining flexibility at a cost of visibility. This means that you can do things like enter values at all levels of the hierarchy that can be concatenated together with a single hiera_array() call, BUT, you’re losing the visibility of having the data right in front of you (i.e. you need to open up all the YAML files individually, or use the hiera binary to debug how you got those values). Hiera is REALLY COOL until you have to debug why it grabbed (or DIDN’T grab) a particular value.

Here’s what I usually tell people about what should be put into Hiera:

  • The exact data values that need to be different conditionally (i.e. a different ntp server for different sites, different java versions in dev/prod, a password hash, etc.)
  • Dynamic data expressed in multiple levels of the hierarchy (i.e. a lookup for ‘packages’ that returns back an array of all the values that were found in all the levels of the hierarchy)
  • Resources as a hash ONLY WHEN ABSOLUTELY NECESSARY

Puppet manifest vs. create_resources()

Bullets 1 and 2 above should be pretty straightforward – you either need to use Hiera to grab a specific value or return back a list of ALL the values from ALL the levels of the hierarchy. The point here is that Hiera should be returning back only the minimal amount of data that is necessary (i.e. instead of returning back a hash that contains the title of the resource, all the attributes of the resource, and all the attribute values for that resource, just return back a specific value that will be assigned to an attribute…like the password hash itself for a user). This data lookup appears to be “magic” to new users of Puppet – all they see is the magic phrase of “hiera” and a parameter to search for – and so it becomes slightly confusing. It IS, however, easier to understand that this magical phrase will return data, and that that data is going to be used to set the value for an attribute. Consider this example:

$password = hiera('garypassword')

user { 'gary':
  ensure   => present,
  uid      => '5001',
  gid      => 'gary',
  shell    => 'zsh',
  password => $password,
}

This leads us to bullet 3, which is “the Hiera + create_resources() solution.” This solution allows you to lookup data from within Hiera and pass it directly to a function where Puppet creates the individual resources as if you had typed them into a Puppet manifest itself. The previous example can be entered into a Hiera YAML file like so:

sysadmins.yaml
users:
  gary:
    ensure: 'present'
    uid: '5001'
    gid: 'gary'
    shell: 'zsh'
    password: 'biglongpasswordhash'

And then a resource can be created inside the Puppet DSL by doing the following:

$users = hiera('users')
create_resources('user', $users)

Both examples are functionally identical, except the first one only uses Hiera to get the password hash value, whereas the second one grabs both the attributes, and their values, for a specific resource. Imagine Puppet gives you an error with the ‘gary’ user resource and you were using the latter example. You grep your Puppet code looking for ‘gary’, but you won’t find that user resource in your Puppet manifest anywhere (because it’s being created with the create_resources() function). You will instead have to know to go into Hiera’s data directory, then the correct datafile, and then look for the hash of values for the ‘gary’ user.

Functional differences between the two approaches

Functionally, you COULD do this either way. When you come up with a solution using create_resources(), I challenge you to draw up another solution using Puppet code in a Puppet manifest (however lengthy it may be) that queries Hiera for ONLY the specific values necessary. Consider the example above, but, instead, you need to manage 500 users. If you use create_resources(), you would then need to add 500 more blocks to the ‘users’ parameter in your Hiera datafiles. That’s a lot of YAML. And on what level will you add these blocks? prod.yaml? dev.yaml? Are you using a common.yaml? Your YAML files suddenly got huge, and the rest of your team modifying them will not be so happy to scroll through 500 entries. Now consider the first example using Puppet code. Your Puppet manifest suddenly grew, but it didn’t affect all the OTHER manifests out there: only this file. The Hiera YAML files will still grow – but now by 500 individual lines instead of the 3000+ lines in the previous example.

Okay, now which one is more LEGIBLE? I would argue that the Puppet manifest is more legible, because I consider the Puppet DSL to be very legible (again, subject to debate versus YAML). Moreover, when debugging, you can stay inside Puppet files more often when you use Puppet manifests to define your resources. Using create_resources(), you need to jump into Hiera more often. That’s a context shift, which adds more annoyance to debugging.

It also creates multiple “sources of truth.” Suddenly you have the ability to enter data in Hiera as well as in the Puppet manifest, which may be clear to YOU, but if you leave the company, or you get another person on your team, they may choose to abuse the Hiera settings without knowing why.

Now consider an example that you might say is more tailored to create_resources(). Say you have a defined type that sets up tomcat applications. This defined type accepts things like a path to install the application, the application’s package name, the version, which tomcat installation to target, and so on. Now consider that all application servers need application1, but only a couple of servers need application2, and a very snowflake server needs application3 (in this case, we’re NOT saying that all applications are on all boxes and that their data, like the version they’re using, is different. We’re actually saying that different machines require entirely different applications).

Using Hiera + create_resources() you could enter the resource for the application1 at a low level, then, at a higher level, add the resource for application2, and finally add the resource for application3 at the node-specific level. In the end, you can do a hiera_hash() lookup to discover and concatenate all resources from all levels of the hierarchy and pipe that to create_resources.

How would you do this with Puppet code? Well, I would create profiles for every application, and either different roles for the different kinds of servers (i.e. the snowflake machine gets its own role), or conditional checks inside the role (i.e. if this node is at the London location, it gets these application profiles, etc.).
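A sketch of that approach might look like this (the profile names, the location, and the snowflake node’s certname are all made up):

class role::application_server {
  # Every application server gets application1
  include profile::tomcat::application1

  # Only the London boxes get application2
  if $::location == 'london' {
    include profile::tomcat::application2
  }

  # The one-off snowflake gets its own addition (or, better, its own sub-role)
  if $::clientcert == 'snowflake.example.com' {
    include profile::tomcat::application3
  }
}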

Now which is more legible? At this point, I’d still say that separate profiles and conditional checks in roles (or sub-roles) are more legible – including a class is a logical thing to follow, and conditionals inside Puppet code are easy to follow. The create_resources() solution just becomes magic. Suddenly, applications are on the node. If you want to know where they came from, you have to switch contexts and open Hiera data files or use the hiera binary and do a debug run. If you’re a small team that’s been using Puppet forever, then rock on and go for it. If you’re just getting started, though, I’d shy away.

Final word on create_resources?

Some people, when confronted with a problem, think “I know, I’ll use create_resources().”
Now they have two problems.

The create_resources() function is often called the “PSE Swiss Army knife” (or, Professional Services Engineer – the people who do what I do and consult with our customers) because we like to break it out when we’re painted into a corner by customer requirements. It will work ANYWHERE, but, again, at that cost of visibility. I am okay with someone using it so long as they understand the cost of visibility and the potential debugging issues they’ll hit. I will always argue against using it, however, for those reasons. More code in a Puppet manifest is not a bad thing…especially if it’s reasonably legible code that can be kept to a specific class. Consider the needs and experience level of your team before using create_resources() – if you don’t have a good reason for using it, simply don’t.

create_resources()

PROS:

  • Dynamically iterate and create resources based on Hiera data
  • Using Hiera’s hash merging capability, you can functionally override resource values at higher levels of the hierarchy

CONS:

  • Decreased visibility
  • Becomes a second ‘source of truth’ to Puppet
  • Can increase confusion about WHERE to manage resources
  • When used too much, it creates a DSL to Puppet’s DSL (DSLs all the way down)

Puppet DSL + single Hiera lookup

PROS:

  • More visible (sans the bit of data you’re looking up)
  • Using wrapper classes allows for flexibility and conditional inclusion of resources/classes

CONS:

  • Very explicit – doesn’t have the dynamic overriding capability like Hiera does

Using Hiera as an ENC

One of the early “NEAT!” moments everyone has with Hiera is using it as an External Node Classifier, or ENC. There is a function called hiera_include() that allows you to include classes into the catalog as if you were to write “include (classname)” in a Puppet manifest. It works like this:

london.yaml
classes:
  - profiles::london::base
  - profiles::london::network
dev.yaml
classes:
  - profiles::tomcat::application2
site.pp
node default {
  hiera_include('classes')
}

Given the above example, the hiera_include() function will search every level of the hierarchy looking for a parameter called ‘classes’. It returns a concatenated list of classnames, which it then passes to Puppet’s include() function (in the end, Puppet will declare the profiles::london::base, profiles::london::network, and profiles::tomcat::application2 classes). Puppet puts the contents of these classes into the catalog, and away we go. This is awesome because you can change the classification of a node conditionally according to a Hiera lookup, and it’s terrible because you can CHANGE THE CLASSIFICATION OF A NODE CONDITIONALLY ACCORDING TO A HIERA LOOKUP! This means that anyone with access to the repo holding your Hiera data files can effect changes to every node in Puppet just by modifying a magical key. It also means that in order to see the classification for a node, you need to do a Hiera lookup (i.e. you can’t just open a file and see it).

Remember that WHOLE blog post about Roles and Profiles? I do, because I wrote the damn thing. You can even go back and read it again, too, if you want to. One of the core tenets of that article was that each node get classified with a single role. If you adhere to that (and you should; it makes for a much more logical Puppet deployment), a node really only ever needs to be classified ONCE. You don’t NEED this conditional classification behavior. It’s one of those “It seemed like a good idea at the time” moments that I assure you will pass.

Now, you CAN use Roles with hiera_include() – simply create a Facter fact that returns the node’s role, add a level to the Hiera hierarchy for this role fact, and in the role’s YAML file in Hiera, simply do:

appserver.yaml
classes: role::application_server

Then you can use the same hiera_include() call in the default node definition in site.pp. The ONLY time I recommend this is if you don’t already have some other classification method. The downside of this method is that if your role fact CHANGES, for some reason or another, classification immediately changes. Facts are NOT secure – they can be overridden really easily. I don’t like to leave classification to an insecure method that anyone with root access on a machine can change. Using an ENC or site.pp for classification means that the node ABSOLUTELY CANNOT override its classification. It’s the difference between being authoritative and simply ‘suggesting’ a classification.

PROS:

  • Dynamic classification: no need to maintain a site.pp file or group in the Console
  • Fact-based: a node’s classification can change immediately when its role fact does

CONS:

  • Decreased visibility: need to do a Hiera lookup to determine classification
  • Insecure: since facts are insecure and can be overridden, so can classification

Puppetconf 2014 Talk - the Refactor Dance

This year at Puppetconf 2014, I presented a 1.5 hour talk entitled “The Refactor Dance” that comprised nearly EVERYTHING that I’ve written about in my Puppet Workflows series (from writing better component modules, to Roles/Profiles, to Workflow, and lots of stories in-between) as well as a couple of bad words, a pair of leather pants (trousers), and an Uber story that beats your Uber story. It’s long, informative, and you get to watch the sweat stains under my arms grow in an attractive grey Puppet Labs shirt. What’s not to love?

To watch the video, click here to check it out!

On Dependencies and Order

This blog post was born out of a number of conversations that I’ve had about Puppet, its dependency model, and why ‘ordering’ is not necessarily the way to think about dependencies when writing Puppet manifests. Like most everything on this site, I’m getting it down in a file so I don’t have to repeat this all over again the next time someone asks. Instead, I can point them to this page (and, when they don’t actually READ this page, I can end up explaining everything I’ve written here anyways…).

Before we go any further, let me define a couple of terms:

dependencies     - In a nutshell, what happens when you use the metaparameters of
                   'before', 'require', 'subscribe' or 'notify' on resources in a
                   Puppet manifest: it's a chain of resources that are to be
                   evaluated in a specific order every time Puppet runs. Any failure
                   of a resource in this chain stops Puppet from evaluating the
                   remaining resources in the chain.

evaluate         - When Puppet determines the 'is' value (or current state) of a
                   resource (i.e. for package resources, "is the package installed?")

remediate        - When Puppet determines that the 'is' value (or current state of
                   the resource) is different from the 'should' value (or the value
                   entered into the Puppet manifest...the way the resource SHOULD
                   end up looking on the system) and Puppet needs to make a change.

declarative(ish) - When I use the word 'declarative(ish)', I mean that the order
                   in which Puppet EVALUATES resources that do not contain
                   dependencies has no set procedure/order, but the order that
                   Puppet reads/parses manifest files IS from top-to-bottom
                   (which is why variables in Puppet manifests need to be declared
                   before they can be used).

Why Puppet doesn’t care about execution order (until it does)

The biggest shock to the system when getting started with a declarative(ish) configuration management tool like Puppet is understanding that Puppet describes the end-state of the machine, and NOT the order in which it (Puppet) is going to take you to that state. To Puppet, the order that it chooses to affect change in any resource (be it a file to be corrected, a package to be installed, or any other resource type) is entirely arbitrary, because resources that have no relationship to another resource shouldn’t CARE about the order in which they’re evaluated and remediated.

For example, imagine Puppet is going to create both /etc/sudoers and update the system’s authorized keys file to enter all the sysadmins’ SSH keys. Which one should it do first? In an imperative system like shell scripts or a runbook-style system, you are forced to choose an order. So I ask again, which one goes first? If you try to update the sudoers file in your script first, and there’s a problem with that update, then the script fails and the SSH keys aren’t installed. If you switch the order and there’s a problem with the SSH keys, then you can’t sudo up because the sudoers file hasn’t been touched.

Because of this, Puppet has always taken the stance that if there are failures, we want to get as much of the system into a working state as possible (i.e. any resources that don’t depend upon the failing resource are going to still be evaluated, or ‘inspected’, and remediated, or ‘changed if need be’). There are definitely philosophical differences here: the argument can be made that if there’s a failure somewhere, the system is bad and you should cast it off until you’ve fixed whatever the problem is (or the part of the code causing the problem). In virtualized or ‘cloud’ environments where everything is automated, this is just fine, but in environments without complete and full automation, sometimes you have to fix and deal with what you have. Puppet “believes in your system”, which is borderline marketing-doubletalk for “alert you of errors and give you time to fix the damn thing and do another Puppet run without having to spin up a whole new system.”

Once you know WHY Puppet takes the stance it does, you realize that Puppet does not give two shits about the order of resources without dependencies. If you write perfect Puppet code, you’re fine. But the majority of the known-good-world does not do that. In fact, most of us write shit code. Which was the problem…

The history of Puppet’s ordering choices

‘Random’ random order

In the early days, the only resources that were guaranteed to have a consistent order were those resources with dependencies (i.e. as I stated above, resources that used the ‘before’, ‘require’, ‘subscribe’, or ‘notify’ metaparameters to establish an evaluation order). Every other resource was evaluated at random every time that Puppet ran…which meant that you could run Puppet ten times and, theoretically, resources without dependencies could be evaluated in a different order between every Puppet run (we call this non-deterministic ordering). This made things REALLY hard to debug. Take the case where you had a catalog of thousands of resources but you forgot a SINGLE dependency between a couple of file resources. If you roll that change out to 1000 nodes, you might have 10 or fewer of them fail (because Puppet chose an evaluation order that ordered these two resources incorrectly). Imagine trying to figure out what happened and replicate the problem. You could waste lots of time just trying to REPLICATE the issue, even if it was a small fix like this.

PROS:

  • IS there a pro here?

CONS:

  • Ordering could change between runs, and thus it was very hard to debug missing dependencies

Philosophically, we were correct: resources that are to be evaluated in a certain order require dependencies. Practically, we were creating more work for ourselves.

Incidentally, I’d heard that Adam Jacob, who created Chef, had cited this reason as one of the main motivators for creating Chef. I’d heard that as a Puppet consultant, he would run into these buried dependency errors and want to flip tables. Even if it’s not a true STORY, it was absolutely true for tables where I used to work…

Title-hash, ‘Predictable’ random order

Cut to Puppet version 2.7 where we introduced deterministic ordering with ‘title-hash’ ordering. In a nutshell, resources that didn’t have dependencies would still be executed in a random order, but the order Puppet chose could be replicated (it created a SHA1 hash based on the titles of the resources without dependencies, and ordered the hashes alphabetically). This meant that if you tested out a catalog on a node, and then ran that same catalog on 1000 other nodes, Puppet would choose the same order for all 1000 of the nodes. This gave you the ability to actually TEST whether your changes would successfully run in production. If you omitted a dependency, but Puppet managed to pick the correct evaluation order, you STILL had a missing dependency, but you didn’t care about it because the code worked. The next change you made to the catalog (by adding or removing resources), the order might change, but you would discover and fix the dependency at that time.

PROS:

  • ‘Predictable’ and repeatable order made testing possible

CONS:

  • Easy to miss dependency omissions if Puppet chose the right order (but do you really care?)

Manifest ordering, the ‘bath salts’ of ordering

Title-hash ordering seemed like the best of both worlds – being opinionated about resource dependencies but also giving sysadmins a reliable, and repeatable, way to test evaluation order before it’s pushed out to production.

Buuuuuuuuuut, y’all JUST weren’t happy enough, were you?

When you move from an imperative solution like scripts to a declarative(ish) solution like Puppet, it is absolutely a new way to think about modeling your system. Frequently we heard that people were having issues with Puppet because the order that resources show up in a Puppet manifest WASN’T the order in which Puppet would evaluate the resources. I just dropped a LOT of words explaining why this isn’t the case, but who really has the time to read up on all of this? People were dismissing Puppet too quickly because their expectations of how the tool worked didn’t align with reality. The solution, then, was to align the tool with those expectations in the hopes that people wouldn’t dismiss Puppet so quickly.

Eric Sorenson wrote a blog post on our thesis and experimentation around manifest ordering that is worth a read (and, incidentally, is shorter than this damn post), but the short version is that we tested this theory out and determined that Manifest Ordering would help new users to Puppet. Because of this work, we created a feature called ‘Manifest Ordering’ that stated that resources that DID NOT HAVE DEPENDENCIES would be evaluated by Puppet in the order that they showed up in the Puppet manifest (when read top to bottom). If a resource truly does not have any dependencies, then you honestly should not care one bit what order it’s evaluated (because it doesn’t matter). Manifest Ordering made ordering of resources without dependencies VERY predictable.

But….

This doesn’t mean I think it’s the best thing in the world. In fact, I’m really wary of how I feel people will come to use Manifest Ordering. There’s a reason I called it the “bath salts of ordering” – because a little bit of it, when used correctly, can be a lovely thing, but too much of it, used in unintended circumstances, leads to hypothermia, paranoia, and the desire to gnaw someone else’s face off. We were/are giving you a way to bypass our dependency model by using the mental-model you had with scripts, but ALSO telling you NOT to rely on that mental-model (and instead set dependencies explicitly using metaparameters).

Seriously, what could go wrong?

Manifest Ordering is not a substitution for setting dependencies – that IS NOT what it was created for. Puppet Labs still maintains that you should use dependencies to order resources and NOT simply rely on Manifest Ordering as a form of setting dependencies! Again, the problem is that you need to KNOW this…and if Manifest Ordering allows you to keep the same imperative “mindset” inside a declarative(ish) language, then eventually you’re going to experience pain (if not today, but possibly later when you actually try to refactor code, or share code, or use this code on a system that ISN’T using Manifest Ordering). A declarative(ish) language like Puppet requires seeing your systems according to the way their end-state will look and worrying about WHAT the system will look like, and not necessarily HOW it will get there. Any shortcut to understanding this process means you’re going to miss key bits of what makes Puppet a good tool for modeling this state.

PROS:

  • Evaluation order of resources without dependencies is absolutely predictable

CONS:

  • If used as a substitution for setting dependencies, then refactoring code (moving around the order in which resources show up in a manifest) means changing the evaluation order

What should I actually take from this?

Okay, here’s a list of things you SHOULD be doing if you don’t want to create a problem for future-you or future-organization:

  • Use dependency metaparameters like ‘before’, ‘require’, ‘notify’, and ‘subscribe’ if resources in a catalog NEED to be evaluated in a particular order
  • Do not use Manifest Ordering as a substitute for explicitly setting dependencies (disable it if this is too tempting)
  • Use Roles and Profiles for a logical module layout (see: http://bit.ly/puppetworkflows2 for information on Roles and Profiles)
  • Order individual components inside the Profile
  • Order Profiles (if necessary) inside the Role

And, seriously, trust us with the explicit dependencies. It seems like a giant pain in the ass initially, but you’re ultimately documenting your infrastructure, and a dependency (or, saying ‘this thing MUST come before that thing’) is a pretty important decision. There’s a REASON behind it – treat it with a bit more weight than just having one line come before another, ya know? The extra time right now is absolutely going to buy you the time you spend at home with your kids (and by ‘kids’, I mean ‘XBox’).
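If you need a refresher on what explicit dependencies look like in practice, here’s a small sketch (the package/file/service names assume a stock ntp setup – substitute your own):

package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/profile/ntp.conf',
  require => Package['ntp'],   # evaluate the package before the config file
  notify  => Service['ntpd'],  # a change to the file refreshes the service
}

service { 'ntpd':
  ensure => running,
  enable => true,
}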

And don’t use bath salts, folks.

R10k + Directory Environments

If you’ve read anything I’ve posted in the past year, you know my feelings about the word ‘environments’ and about how well we tend to name things here at Puppet Labs (and if you don’t, you can check out that post here). Since then, Puppet Labs has released a new feature called directory environments (click this link for further reading) that replace the older ‘config file environments’ that we all used to use (i.e. stanzas in puppet.conf). Directory environments weren’t without their false starts and issues, but further releases of Puppet, and their inclusion in Puppet Enterprise 3.3.0, have allowed more people to ask about them. SO, I thought I’d do a quick writeup about them…

R10k had a child: Directory Environments

The Puppet platform team had a couple of problems with config file environments in puppet.conf – namely:

  • Entering them in puppet.conf meant that you couldn’t use environments named ‘master’, ‘main’, or ‘agent’
  • There was no easy/reliable way to determine all the available/used Puppet environments without making assumptions (and hacky code) – especially if someone were using R10k + dynamic environments
  • Adding more environments to puppet.conf made managing that file something of a nightmare (environments.d anyone?)

Combine this with the fact that most of the Professional Services team was rolling out R10k to create dynamic environments (which meant we were abusing $environment inside puppet.conf and creating environments…well… dynamically and on-the-fly), and they knew something needed to be done. Because R10k was so popular and widely deployed, an environment solution that was a simple step-up from an R10k deployment was made the target, and directory environments were born.

How does it work?

Directory environments, essentially, are born out of a folder on the Puppet master (typically $confdir/environments, where $confdir is /etc/puppetlabs/puppet in Puppet Enterprise) wherein every subfolder is a new Puppet environment. Every subfolder contains a couple of key items:

  • A modules folder containing all modules for that environment
  • A manifests/site.pp file containing the site.pp file for that environment
  • A new environment.conf file which can be used to set the modulepath, the environment_timeout, and, a new and often-requested feature, the ability to have environment-specific config_version settings

Basically, it’s everything that R10k ALREADY does with a couple of added goodies dropped into an environment.conf file. Feel free to read the official docs on configuring directory environments for further information on all of the goodies!
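For reference, an environment.conf might look SOMETHING like this (every setting is optional, the values are illustrative, and relative paths are resolved against the environment’s own folder):

environment.conf

# Modules for this environment, plus anything shared in the basemodulepath
modulepath = modules:$basemodulepath
# This environment's site.pp
manifest = manifests/site.pp
# How long the master caches this environment (0 = don't cache)
environment_timeout = 0
# A script (hypothetical) that prints a version string for reports
config_version = scripts/config_version.sh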

Cool, how do we set it up?

It wouldn’t be one of my blog posts if it didn’t include exact steps to configure shit, would it? For this walkthrough, I’m using a CentOS 6.5 VM with DNS working (i.e. the node can ping itself and knows its own hostname and FQDN), and I’ve already installed an all-in-one installation of Puppet Enterprise 3.3.0. For the walkthrough, we’re going to set up:

  • Directory environments based on a control repo
  • Hiera data inside a hieradata folder in the control repo
  • Hiera to use the per-environment hieradata folder

Let’s start to break down the components:

The ‘Control Repo’?

Sometime between my initial R10k post and THIS post, the Puppet Labs PS team has come to call the repository that contains the Puppetfile and is used to track Puppet environments on all Puppet masters the ‘Control Repo’ (because it ‘Controls the creation of Puppet environments’, ya dig? Zack Smith and James Sweeny are actually pretty tickled about making that name stick). For the purpose of this demonstration, I’m using a repository on Github:

https://github.com/glarizza/puppet_repository

Everything you will need for this walkthrough is in that repository, and we will refer to it frequently. You DO NOT need to use my repository – in fact, you’ll eventually want to create your OWN – but it’s there for reference purposes (and to give you a couple of Puppet manifests to make setup a bit easier).

Configuring the Puppet master

We’re going to first clone my control repo to /tmp so we can use it to configure R10k and the Puppet master itself:

[root@master ~]# cd /tmp

[root@master /tmp]# git clone https://github.com/glarizza/puppet_repository.git
Initialized empty Git repository in /tmp/puppet_repository/.git/
remote: Counting objects: 164, done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 164 (delta 54), reused 81 (delta 16)
Receiving objects: 100% (164/164), 22.68 KiB, done.
Resolving deltas: 100% (54/54), done.

[root@master /tmp]# cd puppet_repository

Great, I’ve cloned my repo. To configure R10k, we’re going to need to pull down Zack Smith’s R10k module from the forge with puppet module install zack/r10k and then use puppet apply on a manifest in my repo with puppet apply configure_r10k.pp. DO NOTE: If you want to use YOUR Control Repo, and NOT the one I use on Github, then you need to modify the configure_r10k.pp file and replace the remote property with the URL to YOUR Control Repo that’s housed on a git repository!

[root@master /tmp/puppet_repository:production]# puppet module install zack/r10k

Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forgeapi.puppetlabs.com ...
Notice: Found at least one version of puppetlabs-stdlib compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-inifile compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-vcsrepo compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-concat compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v2.2.7)
  ├─┬ gentoo-portage (v2.2.0)
  │ └── puppetlabs-concat (v1.0.3) [/opt/puppet/share/puppet/modules]
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.2.0)
  ├── puppetlabs-git (v0.2.0)
  ├── puppetlabs-inifile (v1.1.0) [/opt/puppet/share/puppet/modules]
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.2.1)
  ├── puppetlabs-stdlib (v3.2.2) [/opt/puppet/share/puppet/modules]
  └── puppetlabs-vcsrepo (v1.1.0)

[root@master /tmp/puppet_repository:production]# puppet apply configure_r10k.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.71 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/type.rb:816:in `set_default')
Notice: /Stage[main]/R10k::Install/Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}5cda58e8a01e7ff12544d30105d13a2a'
Notice: Finished catalog run in 11.24 seconds

Performing those commands will successfully set up R10k to point to my Control Repo out on Github (and, again, if you don’t WANT that, then you need to make the change to the remote property in configure_r10k.pp). We next need to configure directory environments in puppet.conf by setting two attributes:

  • environmentpath (Or the path to the folder containing environments)
  • basemodulepath (Or, the set of modules that will be shared across ALL ENVIRONMENTS)

I have created a Puppet manifest that will set these attributes, and this manifest requires the puppetlabs/inifile module from the Puppet Forge. Fortunately, since I’m using Puppet Enterprise, that module is already installed. If you’re using open source Puppet and the module is NOT installed, feel free to install it by running puppet module install puppetlabs/inifile. Once this is done, go ahead and execute the manifest by running puppet apply configure_directory_environments.pp:
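
For reference, the ini_setting resources inside that manifest look roughly like the sketch below (based on the resource titles you’ll see in the output; check the actual configure_directory_environments.pp in the control repo for the authoritative version):

ini_setting { 'Configure environmentpath':
  ensure  => present,
  path    => "${::settings::confdir}/puppet.conf",
  section => 'main',
  setting => 'environmentpath',
  value   => '$confdir/environments',
}

ini_setting { 'Configure basemodulepath':
  ensure  => present,
  path    => "${::settings::confdir}/puppet.conf",
  section => 'main',
  setting => 'basemodulepath',
  value   => '$confdir/modules:/opt/puppet/share/puppet/modules',
}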

[root@master /tmp/puppet_repository:production]# puppet apply configure_directory_environments.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.05 seconds
Notice: /Stage[main]/Main/Ini_setting[Configure environmentpath]/ensure: created
Notice: /Stage[main]/Main/Ini_setting[Configure basemodulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '$confdir/modules:/opt/puppet/share/puppet/modules'
Notice: Finished catalog run in 0.20 seconds

The last step to configuring the Puppet master is to execute an R10k run. We can do that by running r10k deploy environment -pv:

[root@master /tmp/puppet_repository:production]# r10k deploy environment -pv

[R10K::Source::Git - INFO] Determining current branches for "https://github.com/glarizza/puppet_repository.git"
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment webinar_env
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Great! Everything should be set up (if you’re using my repo)! My repository has a production branch, which is what Puppet’s default environment is named, so we can test that everything works by listing out all modules in the main production environment with the puppet module list command:

[root@master /tmp/puppet_repository:production]# puppet module list

Warning: Module 'puppetlabs-stdlib' (v3.2.2) fails to meet some dependencies:
  'puppetlabs-ntp' (v3.1.2) requires 'puppetlabs-stdlib' (>= 4.0.0)
/etc/puppetlabs/puppet/environments/production/modules
├── notifyme (???)
├── profiles (???)
├── puppetlabs-apache (v1.1.1)
└── puppetlabs-ntp (v3.1.2)
/etc/puppetlabs/puppet/modules
├── gentoo-portage (v2.2.0)
├── mhuffnagle-make (v0.0.2)
├── puppetlabs-gcc (v0.2.0)
├── puppetlabs-git (v0.2.0)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.2.1)
├── puppetlabs-vcsrepo (v1.1.0)
└── zack-r10k (v2.2.7)
/opt/puppet/share/puppet/modules
├── puppetlabs-apt (v1.5.0)
├── puppetlabs-auth_conf (v0.2.2)
├── puppetlabs-concat (v1.0.3)
├── puppetlabs-firewall (v1.1.2)
├── puppetlabs-inifile (v1.1.0)
├── puppetlabs-java_ks (v1.2.4)
├── puppetlabs-pe_accounts (v2.0.2-3-ge71b5a0)
├── puppetlabs-pe_console_prune (v0.1.1-4-g293f45b)
├── puppetlabs-pe_mcollective (v0.2.10-15-gb8343bb)
├── puppetlabs-pe_postgresql (v1.0.4-4-g0bcffae)
├── puppetlabs-pe_puppetdb (v1.1.1-7-g8cb11bf)
├── puppetlabs-pe_razor (v0.2.1-1-g80acb4d)
├── puppetlabs-pe_repo (v0.7.7-32-gfd1c97f)
├── puppetlabs-pe_staging (v0.3.3-2-g3ed56f8)
├── puppetlabs-postgresql (v2.5.0-pe2)
├── puppetlabs-puppet_enterprise (v3.2.1-27-g8f61956)
├── puppetlabs-reboot (v0.1.4)
├── puppetlabs-request_manager (v0.1.1)
└── puppetlabs-stdlib (v3.2.2)  invalid

Notice a couple of things:

  • First, I’ve got some dependency issues…oh well, nothing that’s a game-stopper
  • Second, the path to the production environment’s module is correct at: /etc/puppetlabs/puppet/environments/production/modules

Configuring Hiera

The last dinghy to be configured on this dreamboat is Hiera. Hiera is Puppet’s data lookup mechanism, and is used to gather specific bits of data (such as versions of packages, hostnames, passwords, and other business-specific data). Explaining HOW Hiera works is beyond the scope of this article, but configuring Hiera data on a per-environment basis IS absolutely a worthwhile endeavor.

In this example, I’m going to demonstrate coupling Hiera data with the Control Repo for simple replication of Hiera data across environments. You COULD also choose to put your Hiera data in a separate repository and set it up in /etc/r10k.yaml as another source, but that exercise is left to the reader (and if you’re interested, I talk about it in this post).

You’ll notice that my demonstration repository ALREADY includes Hiera data, and so that data is automatically being replicated to all environments. By default, Hiera’s configuration file (hiera.yaml) has no YAML data directory specified, so we’ll need to make that change. In my demonstration control repository, I’ve included a sample hiera.yaml, but let’s take a look at one below:

## /etc/puppetlabs/puppet/hiera.yaml

---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{application_tier}"
  - common

:yaml:
# datadir is empty here, so hiera uses its defaults:
# - /var/lib/hiera on *nix
# - %CommonAppData%\PuppetLabs\hiera\var on Windows
# When specifying a datadir, make sure the directory exists.
  :datadir: "/etc/puppetlabs/puppet/environments/%{environment}/hieradata"

This hiera.yaml file specifies a hierarchy with three levels – a node-specific level, a level for different application tiers (like ‘dev’, ‘test’, ‘prod’, etc.), and a common level – and makes the change we need: mapping the data directory to each environment’s hieradata folder. The path to hiera.yaml is Puppet’s configuration directory (which is /etc/puppetlabs/puppet for Puppet Enterprise, or /etc/puppet for the open source version of Puppet), so open the file there, make your changes, and finally restart the Puppet master service to have the changes picked up.
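
For reference, the data backing the lookups below lives in the control repo’s hieradata folder and looks something like this (a simplified sketch; the real files may contain more keys):

## hieradata/common.yaml
---
message: 'This node is using common data'

## hieradata/dev.yaml
---
message: 'You are in the development application tier'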

Next, let’s perform a test by executing the hiera binary from the command line before running puppet:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=webinar_env -d
DEBUG: 2014-08-31 19:55:44 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:55:44 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:55:44 +0000: Looking for data source common
DEBUG: 2014-08-31 19:55:44 +0000: Found message in common
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=bad_env -d
DEBUG: 2014-08-31 19:58:22 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:58:22 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:58:22 +0000: Looking for data source common
DEBUG: 2014-08-31 19:58:22 +0000: Cannot find datafile /etc/puppetlabs/puppet/environments/bad_env/hieradata/common.yaml, skipping
nil

You can see that for the first example, I passed the environment of production and did a simple lookup for a key called message – Hiera then returned me the value out of that environment’s common.yaml file. Next, I did another lookup, but added -d to enable debug mode (debug mode on the hiera binary is REALLY handy for debugging problems with Hiera – combine it with specifying values from the command line, and you can pretty quickly simulate what value a node is going to get). Notice the last example where I specified an invalid environment – Hiera logged that it couldn’t find the datafile requested and ultimately returned a nil, or empty, value.

Since we’re working on the Puppet master machine, we can even check for a value using puppet apply combined with the notice function:

[root@master /etc/puppetlabs/puppet/environments]# puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): This node is using common data
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.19 seconds

Great, it’s working, but let’s look at pulling data from a higher level in the hierarchy – like from the application_tier level. We haven’t defined an application_tier fact, however, so we’ll need to fake it. First, let’s do that with the hiera binary:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production application_tier=dev -d
DEBUG: 2014-08-31 20:04:12 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 20:04:12 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 20:04:12 +0000: Looking for data source dev
DEBUG: 2014-08-31 20:04:12 +0000: Found message in dev
You are in the development application tier

And then also with puppet apply:

[root@master /etc/puppetlabs/puppet/environments]# FACTER_application_tier=dev puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): You are in the development application tier
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.18 seconds

Tuning environment.conf

The brand-new, per-environment environment.conf file is meant to be (for the most part) a one-stop-shop for your Puppet environment tuning needs. Right now, the only things you’ll need to tune will be the modulepath, config_version, and possibly the environment_timeout.

Module path

Before directory environments, every environment had its own modulepath that needed to be tuned to allow for modules that were to be used by this machine/environment, as well as shared modules. That modulepath worked like $PATH in that it was a priority-based lookup for modules (i.e. the first directory in modulepath that had a module matching the module name you wanted won). It also previously required the FULL path to be used for every path in modulepath.

Those days are over.

As I mentioned before, the main puppet.conf configuration file has a new parameter called basemodulepath that can be used to specify modules that are to be shared across ALL ENVIRONMENTS. Paths defined here (typically $confdir/modules and /opt/puppet/share/puppet/modules) are usually put at the END of a modulepath so Puppet can search for any overridden modules that show up in earlier modulepath paths. In the previous configuration steps, we executed a manifest that set up basemodulepath to look like:

basemodulepath = $confdir/modules:/opt/puppet/share/puppet/modules

Again, feel free to add or remove paths (except don’t remove /opt/puppet/share/puppet/modules if you’re using Puppet Enterprise, because that’s where all Puppet Enterprise modules are located), especially if you’re using a giant monolithic repo of modules (which was typically done before things like R10k evolved).

With basemodulepath configured, it’s now time to configure the modulepath to be defined for every environment. My demonstration control repo contains a sample environment.conf that defines a modulepath like so:

modulepath = modules:$basemodulepath

You’ll notice, now, that there are relative paths in modulepath. This is possible because each environment contains an environment.conf, and relative paths are resolved against that environment’s directory. In this example, nodes in the production environment (/etc/puppetlabs/puppet/environments/production) will look for a module by its name FIRST in a folder called modules inside the current environment folder (i.e. /etc/puppetlabs/puppet/environments/production/modules/<module_name>). If the module isn’t found there, Puppet looks for it in the paths defined in basemodulepath above, in order. If Puppet fails to find a module in ANY of the paths, a compile error is raised.

Per-environment config_version

The config_version setting has been around for a while – hell, I remember video of Jeff McCune talking about it at the first Puppetcamp Europe in like 2010 – but the new directory environments implementation has fine-tuned it a bit. Previously, config_version was a command executed on the Puppet master at compile time to determine a string used for versioning the configuration enforced during that Puppet run. When it’s not set, it defaults to a time/date stamp generated by the parser, but it’s way more useful to make it do something like determine the most recent commit hash from a repository.

In the past when we used a giant monolithic repository containing all Puppet modules, it was SUPER easy to get a single commit hash and be done. As everyone moved their modules into individual repositories, determining WHAT you were enforcing became harder. With the birth of R10k and the control repo, we suddenly had something we could query for the state of the modules being enforced. The problem, though, was that with multiple dynamic environments using multiple git branches, config_version wasn’t easily tuned to grab the most recent commit from every branch.

Now that config_version is set in a per-environment environment.conf, we can make config_version much smarter. Again, looking in the environment.conf defined in my demonstration control repo produces this:

config_version = '/usr/bin/git --git-dir $confdir/environments/$environment/.git rev-parse HEAD'

This setting will cause the Puppet master to produce the most recent commit ID for whatever environment you’re in and embed it in the catalog and the report that is sent back to the Puppet master after a Puppet run.

I actually discovered a bug in config_version while writing this post: config_version is subject to the same relative pathing as other environment.conf settings. Relative pathing is great for things like modulepath, and it’s even fine for config_version if the script you want to run to gather the config_version string lives inside the control repo, but a one-line command that executes a system binary WITHOUT the full path to that binary causes an error (because Puppet attempts to look for the binary in the current environment path, and NOT by searching $PATH on the system). Feel free to follow or comment on the bug if the mood hits you.
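
If you need to work around that bug today, one option (just a sketch of the approach mentioned above of shipping the script inside the control repo, not an official fix) is to lean INTO the relative pathing by committing a tiny script to the control repo and pointing config_version at it:

## config_version.sh, committed at the root of the control repo (hypothetical name; make it executable)
#!/bin/bash
# Print the current commit of the environment this script lives in
cd "$(dirname "$0")" && /usr/bin/git rev-parse HEAD

## environment.conf
config_version = 'config_version.sh'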

Caching and environment_timeout

The Puppet master loads environments on-request, but it also caches data associated with each environment to make things faster. This caching is finally tunable on a per-environment basis by defining the environment_timeout setting in environment.conf. The default setting is 3 minutes, which means the Puppet master will invalidate its caches and reload environment data every 3 minutes, but that’s now tunable. Definitely read up on this setting before making changes.
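
As an example, while you’re actively iterating on an environment you might disable caching for it entirely in environment.conf (a hedged example; check the docs for the exact values your Puppet version accepts, and remember that less caching means more work for the master):

environment_timeout = 0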

Classification

One of the last new features of directory environments is the ability to include an environment-specific site.pp file for classification. You could ALWAYS do this by modifying the manifest configuration item in puppet.conf, but now each environment can have its own manifest setting. The default behavior is to have the Puppet master look for manifests/site.pp in every environment directory, and I really wouldn’t change that unless you have a good reason. DO NOTE, however, that if you’re using Puppet Enterprise, you’ll need to be careful with your site.pp file. Puppet Enterprise defines things like the filebucket and default overrides for the File resource in site.pp, so you’ll need to copy those declarations into the site.pp file you add to your control repo (as I did).
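
For reference, the Puppet Enterprise bits you’d carry over look roughly like the sketch below; copy the exact block out of YOUR master’s existing site.pp rather than trusting my memory:

# Filebucket and File defaults that Puppet Enterprise expects in site.pp
filebucket { 'main':
  server => $servername,
  path   => false,
}

File { backup => 'main' }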

It may take you a couple of times to change your thinking from looking at the main site.pp in $confdir/manifests to looking at each environment-specific site.pp file, but definitely take advantage of Puppet’s commandline tool to help you track which site.pp Puppet is monitoring:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest
/etc/puppetlabs/puppet/environments/production/manifests

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment webinar_env
/etc/puppetlabs/puppet/environments/webinar_env/manifests

You can see that puppet config print can be used to get the path to the directory that contains site.pp. Even cooler is what happens when you specify an environment that doesn’t exist:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment bad_env
no_manifest

Yep, Puppet tells you if it can’t find the manifest file. That’s pretty cool.

Wrapping Up

Even though the new implementation of directory environments is meant to map closely to a workflow most of us have been using (if you’ve been using R10k, that is), there are still some new features that may take you by surprise. Hopefully this post gets you started with just enough information to setup your own test environment and start playing. PLEASE DO make sure to file bugs on any behavior that comes as unexpected or stops you from using your existing workflow. Cheers!

On R10k and ‘Environments’

There have been more than a couple of moments where I’m on-site with a customer who asks a seemingly simple question and I’ve gone “Oh shit; that’s a great question and I’ve never thought of that…” Usually that’s followed by me changing up the workflow and immediately regretting things I’ve done on prior gigs. Some people call that ‘agile’; I call it ‘me not having the forethought to consider conditions properly’.

‘Environment’, like ‘scaling’, ‘agent’, and ‘test’, has many meanings

It’s not a secret that we’ve made some shitty decisions in the past with regard to naming things in Puppet (and anyone who asks me what puppet agent -t stands for usually gets a heavy sigh, a shaken head, and an explanation emitted in dulcet, apologetic tones). It’s also very easy to conflate certain concepts that unfortunately share very common labels (quick – what’s the difference between properties and parameters, and give me the lowdown on MCollective agents versus Puppet agents!).

And then we have ‘environments’ + Hiera + R10k.

Puppet ‘environments’

Puppet has the concept of ‘environments’, which, to me, exist to provide a means of compiling a catalog using different paths to Puppet modules on the Puppet master. Using a Puppet environment is the same as saying “I made some changes to my tomcat class, but I don’t want to push it DIRECTLY to my production machines yet because I don’t drink Dos Equis. It would be great if I could stick this code somewhere and have a couple of my nodes test how it works before merging it in!”

Puppet environments suffer some ‘seepage’ issues, which you can read about here, but do a reasonable job of quickly testing out changes you’ve made to the Puppet DSL (as opposed to custom plugins, as detailed in the bug). Puppet environments work well when you need a pipeline for testing your Puppet code (again, when you’re refactoring or adding new functionality), and using them for that purpose is great.

Internal ‘environments’

What I consider ‘internal environments’ have a couple of names – sometimes they’re referred to as application or deployment gateways, sometimes as ‘tiers’, but in general they’re long-term groupings that machines/nodes are attached to (usually for the purpose of phased-out application deployments). They frequently have names such as ‘dev’, ‘test’, ‘prod’, ‘qa’, ‘uat’, and the like.

For the purpose of distinguishing them from Puppet environments, I’m going to refer to them as ‘application tiers’ or just ‘tiers’ because, fuck it, it’s a word.

Making both of them work

The problems with having Puppet environments and application tiers are:

  • Puppet environments are usually assigned to a node for short periods of time, while application tiers are usually assigned to a node for the life of the node.
  • Application tiers usually need different bits of data (i.e. NTP server addresses, versions of packages, etc), while Puppet environments usually use/involve differences to the Puppet DSL.
  • Similarly to the first point, the goal of Puppet environments is to eventually merge code differences into the main production Puppet environment. Application tiers, however, may always have differences about them and never become unified.

You can see where this would be problematic – especially when you might want to do things like use different Hiera values between different application tiers, but you want to TEST out those values before applying them to all nodes in an application tier. If you previously didn’t have a way to separate Puppet environments from application tiers, and you used R10k to generate Puppet environments, you would have things like long-term branches in your repositories that would make it difficult/annoying to manage.

NOTE: This is all assuming you’re managing component modules, Hiera data, and Puppet environments using R10k.

The first step in making both monikers work together is to have two separate variables in Puppet – namely $environment for Puppet environments, and something ELSE (say, $tier) for the application tier. The “something else” is going to depend on how your workflow works. For example, do you have a central service that can correlate nodes to the tier to which they belong? If so, you can write a custom fact that will query that service. If you don’t have this magical service, you can always just attach an application tier to a node in your classification service (i.e. the Puppet Enterprise Console or Foreman). Failing both of those, you can look to external facts. External fact support was introduced in Facter 1.7 (but Puppet Enterprise has supported them through stdlib for quite a while). External facts give you the ability to create a text file inside the facts.d directory in the format of:

tier=qa
location=portland

Facter will read this text file and store the values as facts for a Puppet run, so $tier will be qa and $location will be portland. This is handy when you have arbitrary information that can’t be easily discovered by the node, but DOES need to be assigned to the node on a reasonably consistent basis. Usually these files are created during the provisioning process, but they can also be managed by Puppet. At any rate, having $environment and $tier available allows us to start making decisions based on their values.
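
Since I just mentioned that these files can be managed by Puppet, here’s a quick sketch of what that might look like (the facts.d path below is a common default; adjust it for your platform and Facter version, and the filename is made up):

file { '/etc/facter/facts.d/node_info.txt':
  ensure  => file,
  content => "tier=qa\nlocation=portland\n",
}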

Branch with $environment, Hiera with $tier

Like we said above, Puppet environments are frequently short-term assignments, while application tiers are usually long-term residencies. Relating those back to the R10k workflow: branches to the main puppet repo (containing the Puppetfile) are usually short-lived, while data in Hiera is usually longer-lived. It then makes sense that the names of branches to the main puppet repo become $environment (and thus the Puppet environment name), while $tier (and thus the application tier) is used in the Hiera hierarchy for looking up values that remain different across application tiers (like package versions, credentials, etc.).
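
Concretely, that means a hiera.yaml hierarchy that keys off $tier for the long-lived data differences, while the datadir still follows the Puppet environment. A sketch:

---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{tier}"
  - common

:yaml:
  :datadir: "/etc/puppetlabs/puppet/environments/%{environment}/hieradata"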

Wins:

  • Puppet environment names (like repository branch names) become relatively meaningless and are the “means” to the end of getting Puppet code merged into the PUPPET CODE’s production branch (i.e. code that has been tested to work across all application tiers)
  • Puppet environments become short lived and thus have less opportunity to deviate from the main production codebase
  • Differences across application tiers are locked in one place (Hiera)
  • Differences to Puppet DSL code (i.e. in manifests) can be pushed up to the profile level, and you have a fact ($tier) to catch those differences (see the sketch below).
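
Here’s a quick sketch of what that last point looks like in practice: a profile that uses the $tier fact to vary behavior (the class and package names are made up purely for illustration):

class profiles::someapp {
  # Hypothetical example: pick a package version per application tier
  $someapp_version = $::tier ? {
    'dev'   => 'latest',
    'qa'    => '2.1.0',
    default => '2.0.3',
  }

  package { 'someapp':
    ensure => $someapp_version,
  }
}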

The ultimate reason why I’m writing about this is because I’ve seen people try to incorporate both the Puppet environment and application tier into both the environment name and/or the Hiera hierarchy. Many times, they run into all kinds of unscalable issues (large hierarchies, many Puppet environments, confusing testing paths to ‘production’). I tend to prefer this workflow choice, but, like everything I write about, take it and model it toward what works for you (because what works now may not work 6 months from now).

Thoughts?

Like I said before, I tend to discover new corner cases that change my mind on things like this, so it’s quite possible that this theory isn’t the most solid in the world. It HAS helped out some customers to clean up their code and make for a cleaner pipeline, though, and that’s always a good thing. Feel free to comment below – I look forward to making the process better for all!

Building a Functional Puppet Workflow Part 3b: More R10k Madness

In the last workflows post, I talked about dynamic Puppet environments and introduced R10k, which is an awesome tool for mapping modules to their environments which are dynamically generated by git branches. I didn’t get out everything I wanted to say because:

  • I was tired of that post sitting stale in a Google Doc
  • It was already goddamn long

So because of that, consider this a continuation of that previous monstrosity, one that talks about additional uses of R10k beyond the ordinary.

Let’s talk Hiera

But seriously, let’s not actually talk about what Hiera does since there are better docs out there for that. I’m also not going to talk about WHEN to use Hiera because I’ve already done that before. Instead, let’s talk about a workflow for submitting changes to Hiera data and testing it out before it enters into production.

Most people store their Hiera data (if they’re using a backend that reads Hiera data from disk anyways) in a separate repo from their Puppet repo. Some DO tie the Hiera datadir folder to something like the main Puppet repo that houses their Puppetfile (if they’re using R10k), but for the most part it’s a separate repo because you may want separate permissions for accessing that data. For the purposes of this post, I’m going to refer to a repository I use for storing Hiera data that’s out on Github.

The next logical step would be to integrate that Hiera repo into R10k so R10k can track and create paths for Hiera data just like it did for Puppet.

NOTE: Fundamentally, all that R10k does is checkout modules to a specific path whose folder name comes from a git branch. PUPPET ties its environment to this folder name with some puppet.conf trickery. So, to say that R10k “creates dynamic environments” is the end-result, but not the actual job of the tool.

We COULD add Hiera’s repository to the /etc/r10k.yaml file to track and create folders for us, and if we did it EXACTLY like we did for Puppet we would most definitely run into this R10k bug (AND, it comes up again in this bug).

UPDATE: So, I originally wrote this post BEFORE R10k version 1.1.4 was released. Finch released version 1.1.4 which FIXES THESE BUGS…so the workflow I’m going to describe (i.e. using prefixing to solve the problem of using multiple repos in /etc/r10k.yaml that could possibly share branch names) TECHNICALLY does NOT need to be followed ‘to the T’, as it were. You can disable prefixing when it comes to that step, and modify /etc/puppetlabs/puppet/hiera.yaml so you don’t prepend ‘hiera_’ to the path of each environment’s folder, and you should be totally fine…you know, as long as you use version 1.1.4 or greater of R10k. So, be forewarned.

The issue with those bugs is that R10k collects the names of ALL the environments from ALL the sources at once, so if you have multiple source repositories and they share branch names, then you have clashes (since it only stores ONE branch name internally). The solution that Finch came up with was prefixing (that is, prefixing the name of the branch with the name of the source). When you prefix, however, it creates a folder on disk that matches the prefixed name (e.g. NameOfTheSource_NameOfTheBranch). This is actually fine since we’ll catch it and deal with it, but you should be aware of it. Future versions of R10k will most likely deal with this in a different manner, so make sure to check out the R10k docs before blindly copying my code, okay? (Update: See the previous, bolded paragraph where I describe how Finch DID JUST THAT.)

In the previous post I set up a file called r10k_installation.pp to configure R10k. Let’s revisit that manifest and modify it for my Hiera repo:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.4',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    },
    'hiera' => {
      'remote'  => 'https://github.com/glarizza/hiera_environment.git',
      'basedir' => "${::settings::confdir}/hiera",
      'prefix'  => true,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

NOTE: For the duration of this post, I’ll be referring to Puppet Enterprise specific paths (like /etc/puppetlabs/puppet for $confdir). Please do the translation for open source Puppet, as R10k will work just fine with either the open source edition or the Enterprise edition of Puppet

You’ll note that I added a source called ‘hiera’ that tracks my Hiera repository, creates sub-folders in /etc/puppetlabs/puppet/hiera, and enables prefixing to deal with the bug I mentioned in the previous paragraph. Now, let’s run Puppet and do an R10k synchronization:

[root@master1 garysawesomeenvironment]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 1.78 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/content: content changed '{md5}c686917fcb572861429c83f1b67cfee5' to '{md5}69d38a14b5de0d9869ebd37922e7dec4'
Notice: Finished catalog run in 1.24 seconds

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_testing
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/hiera

[root@master1 puppet]# ls /etc/puppetlabs/puppet/hiera
hiera_master  hiera_production  hiera_testing

[root@master1 puppet]# ls /etc/puppetlabs/puppet/environments/
development  garysawesomeenvironment  master  production

Great, so it configured R10k to clone the Hiera repository to /etc/puppetlabs/puppet/hiera like we wanted it to, and you can see that with prefixing enabled we have folders named “hiera_${branchname}”.

In Puppet, the magical connection that maps these subfolders to Puppet environments is in puppet.conf, but for Hiera that’s the hiera.yaml file. I’ve included that file in my Hiera repo, so let’s look at the copy at /etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml:

/etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'

The magical line is in the :datadir: setting of the :yaml: section; it uses %{environment} to evaluate the environment variable set by Puppet and set the path accordingly.

As of right now R10k is configured to clone Hiera data from a known repository to /etc/puppetlabs/puppet/hiera, to create sub-folders based on branches to that repository, and to tie data provided to each Puppet environment to the respective subfolder of /etc/puppetlabs/puppet/hiera that matches the pattern of “hiera_(environment_name)”.

The problem with hiera.yaml

You’ll notice that each subfolder to /etc/puppetlabs/puppet/hiera contains its own copy of hiera.yaml. You’re probably drawing the conclusion that each Puppet environment can read from its own hiera.yaml for Hiera configuration.

And you would be wrong.

For information on this bug, check out this link. You’ll see that we provide a ‘hiera_config’ configuration option in Puppet that allows you to specify the path to hiera.yaml, but Puppet loads that config as a singleton, which means that it’s read initially when the Puppet master process starts up and it’s NOT environment-aware. The workaround is to use one hiera.yaml for all environments on a Puppet master but to dynamically change the :datadir: path according to the current environment (in the same way that dynamic Puppet environments abuse ‘$environment’ in puppet.conf). You gain the ability to have per-environment changes to Hiera data but lose the ability to do things like using different hierarchies for different environments. As of right now, if you want a different hierarchy then you’re going to need to use a different master (or do some hacky things that I don’t even want to BEGIN to approach in this article).

In summary – there will be a hiera.yaml per environment, but they will not be consulted on a per-environment basis.

Workflow for per-environment Hiera data

Looking back on the previous post, you’ll see that the workflow for updating Hiera data is identical to the workflow for updating code to your Puppet environments. Namely, to create a new environment for testing Hiera data, you will:

  • Push a branch to the Hiera repository and name it accordingly (remembering that the name you choose will be a new environment).
  • Run R10k to synchronize the data down to the Puppet master
  • Add your node to that environment and test out the changes

For existing environments, simply push changes to that environment’s branch and repeat the last two steps.

NOTE: Puppet environments and Hiera environments are linked – both tools use the same ‘environment’ concept and so environment names MUST match for the data to be shared (i.e. if you create an environment in Puppet called ‘yellow’, you will need a Hiera environment called ‘yellow’ for that data).

This tight-coupling can cause issues, and will ultimately mean that certain branches are longer-lived than others. It’s also the reason why I don’t use defaults in my hiera() lookups inside Puppet manifests – I WANT the early failure of a compilation error to alert me of something that needs fixed.
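
In code, that just means calling hiera() with no default argument, something like this (the key name here is made up for illustration):

# No default supplied on purpose: if this environment's Hiera data is missing
# the key, compilation fails loudly instead of silently using a fallback value.
$ntp_servers = hiera('profiles::ntp_servers')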

You will need to determine whether this tight coupling is worth it for your organization: whether to tie your Hiera repository directly into R10k, or to handle it out-of-band.

R10k and monolithic module repositories

One of the first requirements you encounter when working with R10k is that your component modules need to be stored in their own repositories. That convention is still relatively new – it wasn’t so long ago that we were recommending that modules be locked away in a giant repo. Why?

  • It’s easier to clone
  • The state of module reusability was poor

The main reason was that it was easier to put everything in one repo and clone it out on all your Puppet master servers. This becomes insidious as your module count rises and people start doing lovely things like committing large binaries into modules, pulling in old versions of modules they find out on the web, and the like. It also becomes an issue when you start needing to lock committers out of specific directories due to sensitive data, and blah blah blah blah…

There are better posts out there justifying/vilifying the choice of one or multiple repositories; this section’s meant only to show you how to incorporate a single repository containing multiple modules into your R10k workflow.

From the last post you’ll remember that the Puppetfile allows you to tie a repository, and some version reference, to a directory using R10k. Incorporating a monolithic repository starts with an entry in the Puppetfile like so:

Puppetfile
mod "my_big_module_repo",
  :git => "git://github.com/glarizza/my_big_module_repo.git",
  :ref => '1.0.0'

NOTE: That git repository doesn’t exist. I don’t HAVE a monolithic repo to demonstrate, so I’ve chosen an arbitrary URI. Also note that you can use ANY name you like after the mod syntax to name the resultant folder – it doesn’t HAVE to mirror the URI of the repository.

Adding this entry to the Puppetfile would check out that repository to wherever all the other modules are checked out, with a folder name of ‘my_big_module_repo’. That folder would most likely (again, depending on how you’ve laid out your repository) contain subfolders, each containing a Puppet module. This entry gets the modules onto your Puppet master, but it doesn’t make Puppet aware of their location. For that, we’re going to need to add an entry to the ‘modulepath’ configuration item in puppet.conf.

Inside /etc/puppetlabs/puppet/puppet.conf you should see a configuration item called ‘modulepath’ that currently has a value of:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules

The modulepath itself works like the PATH environment variable in Linux – it’s a priority-based lookup mechanism that Puppet uses to find modules. Currently, Puppet will first look in /etc/puppetlabs/puppet/environments/$environment/modules for a module. If the module Puppet is looking for is found there, Puppet will use it and not inspect the second path. If the module is not found at the FIRST path, Puppet will inspect the second path. Failing to find the module at the second path results in a compilation error. Using this to our advantage, we can add the path to the monolithic repository checked out by the Puppetfile AFTER the path where all the individual modules are checked out. This should look something like this:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:/opt/puppet/share/puppet/modules

Note: This assumes all modules are in the root of the monolithic repo. If they’re in a subdirectory, you must adjust accordingly

That’s a huge line (and if you’re afraid of anything over 80 column-widths then I’m sorry…and you should probably buy a new monitor…and the 80s are over), but the gist is that we’re first going to look for modules checked out by R10k, THEN we’re going to look for modules in our monolithic repo, then we’re going to look in Puppet Enterprise’s vendored module directory, and finally, like I said above, we’ll fail if we can’t find our module. This will allow you to KEEP using your monolithic repository and also slowly cut modules inside that monolithic repo over to their own repositories (since when they gain their own repository, they will be located in a path that COMES before the monolithic repo, and thus will be given priority).

Using MCollective to perform R10k synchronizations

This section is going to be much less specific than the rest because the piece that does the ACTION is part of a module for R10k. As of the time of this writing, this agent is in one state, but that could EASILY change. I will defer to the module in question (and specifically its README file) should you need specifics (or if my module is dated). What I CAN tell you, however, is that the R10k module does come with a class that will set up and configure both an MCollective agent for R10k and also a helper application that should make doing R10k synchronizations on multiple Puppet masters much easier than doing them by hand. First, you’ll need to INSTALL the MCollective agent/application, and you can do that by pulling down the module and its dependencies, and classifying all Puppet masters with R10k enabled by doing the following:

include r10k::mcollective

Terribly difficult, huh? With that, both the MCollective agent and application should be available to MCollective on that node. The way to trigger a synchronization is to log in to an account on a machine that has MCollective client access (in Puppet Enterprise, this would be any Puppet master that’s been allowed the role – specifically the peadmin user, so doing a su - peadmin should afford you access to that user), and run the following command:

mco r10k deploy

This is where the README differs a bit, and the reason is that Finch changed the syntax that R10k uses to synchronize and deploy modules to a master. The CURRENTLY accepted command (because, knowing Finch, that shit might change) is r10k deploy environment -p, and the MCollective agent action that EXECUTES that command is ‘deploy’. The README refers to the ‘synchronize’ action, which executes the r10k synchronize command. That command MAY STILL WORK, but it’s deprecated, and so it’s NOT recommended.

Like I said before, this agent is subject to change (mainly due to R10k command deprecation and maturation), so definitely refer to the README and the code itself for more information (or file issues and pull requests on the module repo directly).

Tying R10k to CI workflows

I spent a year doing some presales work for the Puppet Labs SE team, so I can hand-wave and tapdance like a motherfucker. I’m going to need those skills for this next section, because if you thought the previous section glossed over the concepts pretty quickly and without much detail, then this section is going to feel downright vaporous (is that a word? Fuck it; I’m handwaving – it’s a word). I really debated whether to include the following sections in this post because I don’t really give you much specific information; it’s all very generic and full of “ideas” (though I do list some testing libraries below that are helpful if you’ve never heard of them). Feel free to abandon ship and skip to the FINAL section right now if you don’t want to hear about ‘ideas’.

For the record, I’m going to just pick and use the term “CI” when I’m referring to the process of automating the testing and deployment of, in this case, Puppet code. There have definitely been posts arguing about which definition is more appropriate, but, frankly, I’m just going to pick a term and go with it.

The issue at hand is that when you talk “CI” or “CD” or “Continuous (fill_in_the_blank)”, you’re talking about a workflow that’s tailored to each organization (and sometimes each DEPARTMENT of an organization). Sometimes places can agree on a specific tool to assist them with this process (be it Jenkins, Hudson, Bamboo, or whatever), but beyond that it’s anyone’s game.

Since we’re talking PUPPET code, though, you’re restricted to certain tasks that will show up in any workflow…and THAT is what I want to talk about here.

To implement some sort of CI workflow means laying down a ‘pipeline’ that takes a change to your Puppet code (a new module, a change to an existing module, some Hiera data updates, whatever) from the developer’s/operations engineer’s workstation right into production. The way we do this with R10k currently is to (a condensed shell sketch follows the list):

  • Make a change to an individual module
  • Commit/push those changes to the module’s remote repository
  • Create a test branch of the puppet_repository
  • Modify the Puppetfile and tie your module’s changes to this environment
  • Commit/push those changes to the puppet_repository
  • Perform an R10k synchronization
  • Test
  • Repeat steps 1-7 as necessary until shit works how you like it
  • Merge the changes in the test branch of the puppet_repository with the production branch
  • Perform an R10k synchronization
  • Watch code changes become active in your production environment
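
Here’s a rough shell sketch of that loop, just to make the moving parts concrete. The repository paths, module name, and branch name (add_new_feature) are all placeholders – swap in your own:

# Steps 1-2: change a component module and push it
cd ~/src/my_component_module
git checkout -b add_new_feature
# ...hack hack hack...
git commit -am 'Add new feature'
git push origin add_new_feature

# Steps 3-5: create a matching topic branch in the puppet_repository and pin the module to it
cd ~/src/puppet_repository
git checkout -b add_new_feature
$EDITOR Puppetfile        # point the module's :ref at the add_new_feature branch
git commit -am 'Test add_new_feature'
git push origin add_new_feature

# Step 6: synchronize (creates a Puppet environment named add_new_feature)
mco r10k deploy

# Steps 7-9: test against that environment, then merge when you're happy
git checkout production
git merge add_new_feature
git push origin production

# Steps 10-11: synchronize again so production picks up the change
mco r10k deploy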

Of those steps, there are arguably about three unique steps that could be automated:

  • R10k synchronizations
  • ‘Testing’ (whatever that means)
  • Merging the changes in the test branch of the puppet_repository with the production branch

NOTE: As we get progressively-more-handwavey (also probably not a word, but fuck it – let’s be thought leaders and CREATE IT), each one of these steps is going to be more and more…generic. For example – to say “test your code” is a great idea, but, seriously, defining how to do that could (and should) be multiple blog posts.

Laying down the pipeline

If I were building an automated workflow, the first thing I would do is set up something like Jenkins and configure it to watch the puppet_repository that contains the Puppetfile mapping all my modules and versions to Puppet environments. On changes to this repository, we want Jenkins to perform an R10k synchronization, run tests, and then, possibly, merge those changes into production (depending on the quality of your tests and how ‘webscale’ you think you are on that day).

R10k synchronizations

If you’re paying attention, we solved this problem in the previous section with the R10k MCollective agent. Jenkins should be running on a machine that has the ability to execute MCollective client commands (such as triggering mco r10k deploy when necessary). You’ll want to tailor your calls from Jenkins to only deploy environments it’s currently testing (remember in the puppet_repository that topic branches map to Puppet environments, so this is a per-branch action) as opposed to deploying ALL environments every time.
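
As a minimal sketch, a Jenkins build step for that per-branch deploy might look like the following – assuming the agent’s deploy action accepts an environment name (check the module’s README for the exact syntax) and that Jenkins exposes the branch under test as GIT_BRANCH:

# Deploy only the environment that matches the branch being tested
ENVIRONMENT="${GIT_BRANCH##*/}"   # strip the 'origin/' prefix from the branch name
mco r10k deploy "${ENVIRONMENT}"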

Also, if you’re building a pipeline, you might not want to do R10k synchronizations on ALL of your Puppet masters at this point. Why not? Well, if your testing framework is good enough and has sufficient coverage that you’re COMPLETELY trusting it to determine whether code is acceptable or not, then this is just the FIRST step – making the code available to be tested. It hasn’t passed tests yet, so pushing it out to all of your Puppet masters is a bit wasteful. You’ll probably want to synchronize with only a single master that’s been identified for testing (and a master that has the ability to spin up fresh nodes, enforce the Puppet code on them, submit those nodes to a battery of tests, and then tear them down when everything has been completed).

If you’re like the VAST majority of Puppet users out there who DON’T have a completely automated testing framework with coverage complete enough to trust it to decide whether code changes are acceptable, then you’re probably ‘testing’ changes manually. In that case, you’ll want to synchronize code to whichever Puppet master(s) are suitable.

The cool thing about these scenarios is that MCollective is flexible enough to handle this. MCollective has the ability to filter your nodes based on things like available MCollective agents, Facter facts, Puppet classes, and even things like the MD5 hashes of arbitrary files on the filesystem…so however you want to restrict synchronization, you can do it with MCollective.
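
For example, MCollective’s standard discovery filters let you limit which masters run the deploy. A couple of hedged examples – the class name, fact, and hostname below are made up:

# Only masters classified with a (hypothetical) testing role
mco r10k deploy -C role::puppet::test_master

# Only masters carrying a (hypothetical) fact you've set
mco r10k deploy -F datacenter=portland

# Or a single, named node
mco r10k deploy -I testmaster.example.com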

After all of that, the answer here is “Use MCollective to do R10k syncs/deploys.”

Testing

This section needs its own subset of blog posts. There are all kinds of tools that will allow you to test all sorts of things about your Puppet code (from basic syntax checking and linting, to integration tests that check for the presence of resources in the catalog, to acceptance-level tests that check the end-state of the system to make sure Puppet left it in a state that’s acceptable). The most common tools for these types of tests are:

  • puppet parser validate and puppet-lint for syntax and style checking
  • rspec-puppet for unit/catalog tests
  • Beaker and Serverspec for acceptance-level tests

Unfortunately, the point of this section is NOT to walk you through setting up one or more of those tools (I’d love to write those posts soon…), but rather to make you aware of their presence and identify where they fit in our Pipeline.

Once you’ve synchronized/deployed code changes to a specific machine (or subset of machines), the next step is to trigger tests.

Backing up the train a bit, certain kinds of ‘tests’ should be done WELL in advance of this step. For example, if code changes don’t even pass basic syntax checking and linting, they shouldn’t even MAKE it into your repository. Things like pre-commit hooks will allow you to trigger syntactical checks and linting before a commit is allowed. We’re assuming you’ve already set those up (and if you’ve NOT, then you should probably do that RIGHT NOW).
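
If you haven’t set one up, a bare-bones .git/hooks/pre-commit along these lines (a sketch, assuming puppet and the puppet-lint gem are installed on the workstation) will catch syntax errors and lint warnings before they ever land in the repository:

#!/bin/bash
# Check every staged Puppet manifest for syntax and style problems
rc=0
for manifest in $(git diff --cached --name-only --diff-filter=ACM | grep '\.pp$'); do
  puppet parser validate "$manifest" || rc=1
  puppet-lint "$manifest" || rc=1
done
exit $rc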

Rather, in this section, we’re talking about doing some basic integration smoke testing (i.e. running the rspec-puppet tests on all the modules to ensure that what we EXPECT in the catalog is actually IN the catalog), moving into acceptance level testing (i.e. spinning up pristine/clean nodes, actually applying the Puppet code to the nodes, and then running things like Beaker or Serverspec on the nodes to check the end-state of things like services, open ports, configuration files, and whatever to ensure that Puppet ACTUALLY left the system in a workable state), and then returning a “PASS” or “FAIL” response to Jenkins (or whatever is controlling your pipeline).
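
In terms of actual commands, the smoke-test stage often boils down to something like the following – assuming your modules use the common Gemfile/Rakefile setup from puppetlabs_spec_helper; the task names and spec layout vary from module to module:

# Unit/catalog ("is the resource actually IN the catalog?") tests for one module
cd modules/my_component_module
bundle install
bundle exec rake spec

# Acceptance tests, if the module ships them (a common convention, not a guarantee)
bundle exec rspec spec/acceptance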

These tests can be as thorough or as loose as is acceptable to you (obviously, the goal is to automate ALL of your tests so you don’t have to manually check ANY changes, but that’s the nerd-nirvana state where we’re all browsing the web all day), but they should catch the most NOTORIOUS and OBVIOUS things FIRST. Follow the same rules you did when you got started with Puppet – catch the things that are easiest to catch and start building up your cache of “Total Time Saved.”

Jenkins needs to be able to trigger these tests from wherever it’s running, so your Jenkins box needs the ability to, say, spin up nodes in ESX, or locally with something like Vagrant, or even cloud nodes in EC2 or GCE, then TRIGGER the tests, and finally get a “PASS” or “FAIL” response back. The HARDEST part here, by far, is that you have to define what level of testing you’re going to implement, how you’re going to implement it, and devise the actual process to perform the testing. Like I said before, there are other blog posts that talk about this (and I hope to tackle this topic in the very near future), so I’ll leave it to them for the moment.
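
As a very rough illustration, a Vagrant-based acceptance step driven from Jenkins might look like this. The node name, environment name, and Serverspec layout are all invented for the sketch:

# Bring up a throwaway node defined in a Vagrantfile
vagrant up testnode

# Apply the Puppet code under test (exit code 2 from the agent just means "changes were applied")
vagrant ssh testnode -c 'sudo puppet agent -t --environment add_new_feature'

# Check the end state with Serverspec, then throw the node away
bundle exec rspec spec/acceptance
result=$?
vagrant destroy -f testnode
exit $result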

To merge or not to merge

The final step for any test code is to determine whether it should be merged into production or not. Like I said before, if your tests are sufficient and are adequate at determining whether a change is ‘good’ or not, then you can look at automating the process of merging those changes into production and killing off the test branch (or, NOT merging those changes, and leaving the branch open for more changes).

Automatically merging is scary for obvious reasons, but it’s also a good ‘test’ for your test coverage. Committing to a ‘merge upon success’ workflow takes trust, and there’s absolutely no shame in leaving this step to a human, to a change review board, or to some out-of-band process.
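
If you do decide to trust your tests with the merge, the automated step itself is nothing exotic – roughly the following, with the branch names as placeholders:

# Only allow fast-forward merges: refuse anything that needs manual conflict resolution
git checkout production
git merge --ff-only add_new_feature
git push origin production

# Clean up the topic branch (its Puppet environment disappears on the next r10k run)
git push origin --delete add_new_feature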

Use your illusion

These are the most common questions I get asked after the initial shock of R10k, and its workflow, wears off. Understand that I write these posts NOT from a “Here’s what you should absolutely be doing!” standpoint, but more from a “Here’s what’s going on out there” vantage point. Every time I’m called on-site with a customer, I evaluate:

  • The size and experience level of the team involved
  • The processes that the team must adhere to
  • The Puppet experience level of the team
  • The goals of the team

Frankly, after all those observations, sometimes I ABSOLUTELY come to the conclusion that something like R10k is entirely-too-much process for not-enough benefit. For those who are a fit, though, we go down the checklists and tailor the workflow to the environment.

What more IS there on R10k?

I do have at least a couple more posts in me on some specific issues I’ve hit when consulting with companies using R10k, such as:

  • How best to use Hiera and R10k with Puppet ‘environments’ and internal, long-term ‘environments’
  • Better ideas on ‘what to branch and why’ with regard to component modules and the puppet_repository
  • To inherit or not to inherit with Roles
  • How to name things (note that I work for Puppet Labs, so I’m most likely very WRONG with this section)
  • Other random things I’ve noticed…

Also, I apologize if it’s been a while since I’ve replied to a couple of comments. I’m booked out 3 months in advance and things are pretty wild at the moment, but I’m REALLY thankful for everyone who cares enough to drop a note, and I hope I’m providing some good info you can actually use! Cheers!