Shit Gary Says

...things I don't want to forget

Workflows Evolved: Even Besterer Practices

It’s been nearly two years since I posted the Puppet Workflow series, and several things have changed:

  • R10k now ships with Puppet Enterprise and there are docs for it!
  • There’s even a pe_r10k module that ships with Puppet Enterprise 2015.2.x and higher to configure R10k
  • Control repos are the standard and are popping up all over the place
  • Most people are bundling Hiera data with their Control repo (unless they have a very good reason not to)
  • Ditto for Roles and Profiles
  • The one-role-per-node rule is a good start, but PE’s rules-based classification engine allows us to relax that rule
  • Roles still include Profiles, but conditional logic is allowed and recommended to keep Hiera hierarchy levels minimal
  • ‘Data’ goes in Hiera, but the definition of ‘data’ changes between organizations
  • There’s now a (somewhat) defined path for whether ‘data’ is included in a profile or Hiera
  • Automatic Parameter Lookup + Hiera…it’s still hard to debug, but we’re getting there
  • I’m incredibly wary of taking Uber during peak travel times with rate multipliers

It’s been awhile since I’ve had a good rant, so let’s get right into it!

Code Management with R10k

As of PE 3.8, R10k became bundled with Puppet Enterprise (PE) and was referred to as “Code Management,” which initially confused people because the only thing that changed about PE was that the R10k gem came preinstalled in PE’s Ruby installation. The purpose of this act was twofold:

  1. The Professional Services team was installing R10k in essentially EVERY services engagement, and so it made sense to ship R10k and thus officially support its installation
  2. We’ve always had plans to keep the functionality that R10k provided but not NECESSARILY the tool-known-as-R10k, so calling the service it provided something OTHER than R10k would allow us to swap out the implementation underneath the hood while still being able to talk about the functionality it provided

Of course, if you didn’t live inside Puppet Labs it’s possible that you might not have gotten this memo, but, hey: better late than never?

For various reasons, we also never initially shipped a PE-specific module to configure R10k, so you ALSO had to either manually set up r10k.yaml or use Zack Smith’s R10k module to manage that file. Of course, that module did all kinds of OTHER things (like installing the R10k gem, setting up webhooks, and making my breakfast), which meant that if you used it with the version of PE that shipped R10k, you had to be careful to use the version of the module that didn’t ALSO try to upgrade that gem on your system (and whoops if the module actually upgraded the version of R10k that we shipped). This is why that module is Puppet Approved but not an official Puppet Labs module: it does things that we would consider “unsupported” outside of a professional services engagement (i.e. the webhook stuff). Finally, the path to r10k.yaml was changed to /etc/puppetlabs/r10k/r10k.yaml, but, in its absence, the old path of /etc/r10k.yaml would be used and a message would be displayed to inform you of the new file path (in the case that both files were present, the file at /etc/puppetlabs/r10k/r10k.yaml would win).

When PE version 2015.2.0 shipped (I’m still not used to these version numbers either, folks), we FINALLY shipped a pe_r10k module with similar structure to Zack’s R10k module – this meant you could FINALLY set up R10k immediately without having to install additional Puppet modules. Even better(er), in PE 2015.2.2 we expose a couple of PE installer answer file questions that allow you to configure R10k DURING INSTALL TIME – so now your servers could be immediately bootstrapped with a single answers file (seriously, I know, it’s about time; I do this shit every week, you have no idea). It finally feels like R10k has grown into the first-class citizen we all wanted it to be!

Which means it’s time to dump it.

I kid. Mostly. The fact of the matter is that we’re introducing a new service to manage code within Puppet Enterprise, and if you’re interested in reading more about it, check out this blog post by Lindsay Smith about Code Manager. For you, the consumer, the process will be the same: you have a control repo, you push changes, a service is triggered on your Puppet masters, and code is synchronized on the Puppet master. What WILL change is the setup of this tool (there will still be PE installer answer file questions that allow you to configure this service, don’t fret, and you’ll still be able to configure this service through a Puppet module, but the name of said module and configuration files on disk will probably be different. Welcome to IT).

Be on the lookout for this service, and, as always, check out the PE docs site for more information on the Code Management service.

Control (repo) freak

With the explosion of R10k came the explosion of “Control Repos” all over the place. Everyone had one, everyone had an opinion on what worked best, and, well, we didn’t really do a good job at offering a good startup control repo for you. Because of that, we recently posted a ‘starter’ control repo on Github in the Puppet Labs namespace that could be used to get started with R10k. Yes, it’s definitely long overdue, but there it is! I use it on all engagements I do with new customers, so you can be sure it has the backing of Puppet Labs’ PS team behind it. If you’ve not started with R10k yet (or if you have but you wanna see what kinda crazy shit we’re doing now), check it out. It’s got great stuff in there like a config_version script to spit out the most recent commit of the current branch of the control repo (read: also the current Puppet environment) as the “Config Version” string that Puppet prints out during every Puppet run (see here for more info on this functionality). We’re also slowly adding things like initial bootstrapping profiles that will do things like configure R10k/Code Manager, manage the SSH key necessary to contact the control repo (should you be using an internal git repository server and also require an SSH key to access that repo), and so on. Star that repo and keep checking back, especially around PE releases, to see if we’ve updated things in a way that will help you out!

“Just put it in the control repo”

Look, if there’s one thing that my blog emphasizes (other than the fact that I’ve got a hairpin trigger for cursing and an uncomfortable Harry Potter fetish) it’s that “best practices” are inversely related to the insecurities of the speaker. Fortunately, I have no problem saying when I’m wrong. If you’ve got the time, allow me my mea culpa moment. In the past I had recommended:

  • Using a separate git repo for Hiera data
  • Using separate git repos for Roles and Profiles
  • The Dave Matthews Band

Time, experience, and the legalization of recreational marijuana in Oregon have helped me see the error in my ways (though, look, #41 is a good goddamn song, especially on the Dave & Tim Live at Luther College album), so allow me to provide some insight into WHY I’ve reconsidered my message(s)…

Hiera Data

In the past, I recommended a separate git repo for Hiera data along with a separate entry in r10k.yaml that would allow R10k to clone the Hiera data repo in the same vein as the control repo. The pro was that a separate Hiera data repo would let you grant different access rights to it than to the control repo (especially if different people needed different access to each function). The con was that now the branch structure of your Hiera data repo needed to EXACTLY MIRROR the structure of your control repo…even if certain branches had EXACTLY THE SAME Hiera data and no changes were necessary.

Puppet has enough moving parts, why did we need to complicate this if most people didn’t care about access levels differing between the two repos? The solution was to bundle the Hiera data inside the control repo all the way up until you had a specific need to split it out. Truth be told both methods work with Puppet, so the choice is up to you (read: I DON’T CARE WHICH METHOD YOU USE OH MY GOD WILL YOU QUIT TRYING TO PICK A FIGHT WITH ME OVER THIS LOL) :)

Finally, there’s an added benefit of putting this data inside the control repo, and it’s ALSO the reason for the next recommendation…

Roles and Profiles

This is one that I actually fought when someone suggested it…I even started to recommend that a customer NOT do the thing I’m about to recommend to you until they very eloquently explained why they did it. In the end, they were right, and I’m passing this tip on to you: Unless you have a very specific reason NOT to, put your ‘roles’ and ‘profiles’ modules in your control repo.

Here’s the thing about the control repo – you can set a post-receive hook on the repository (or set up a Jenkins/Bamboo/whatever job) that will update all your Puppet masters whenever changes are pushed to the remote git repository (i.e. your git repository server). This means that anytime the control repo is updated your Puppet masters will be updated. That’s why it’s CALLED the control repo – it effectively CONTROLS your Puppet masters.

Understanding THAT, think about when you want your Puppet masters updated. Well, you usually want to update them when you’re testing something out – you made a change to a couple of modules, then a profile (and possibly also a role), and now you wanna see if that code works on more than just your local laptop. But the Puppet landscape has changed a bit as the Puppet Forge has matured – most people are using modules off the Forge and are at least TRYING not to use their own component modules. This means that changes to your infrastructure are being controlled from within roles/profiles. But even IF you’re one of those people who aren’t using the Forge or who have to update an internal component module, you probably don’t want to update all your Puppet masters every time you update a component module. There’s probably lots of tinkering there, and every change isn’t “update-worthy”. Conversely, changes to your profiles probably ARE “update-worthy”: “Okay, let’s pull this bit from Hiera, pass it as a parameter, and now I’m ready to check it out on a couple of machines.”

If your roles and profiles modules are separate from your control repo, you end up having to push changes to, say, a class in the profiles module, then update the Puppetfile in the control repo, then trigger an R10k run/sync. If things aren’t correct, you end up changing the profile, pushing that change to the profile repo, and THEN having to trigger an R10k run/sync (and if you don’t have SSH access to your masters, you have to make a dummy commit to the control repo so it triggers an R10k run OR do a curl to some endpoint that will update your Puppet master for you). That last step is the thing that ends up wasting a bit of your time: why do we need to push a profile and then manually do an R10k run if we’ve established that roles and profiles will pretty much ALWAYS be “update-worthy”? We don’t. If you put the roles and profiles module inside the control repo, then it will automatically update your Puppet masters every time you make a change to one or the other. Bam – step saved. ALSO, if you do this, you can take Roles/Profiles out of the Puppetfile, which means you no longer need to pin them! No more will you have to tie that module to a topic branch during development time: just create a branch of the control repo and go to town! Wow, that saves even more time! I’m uncomfortable with this level of excitement!

The one thing you WILL need to do is update environment.conf so that it knows to look for the roles/profiles modules in a different path from all the other modules (because removing them from the Puppetfile means they will no longer be cloned into the same modulepath as every other module managed in the Puppetfile). For the purposes of cleanliness, we usually end up putting both roles/profiles inside a site folder in the control repo. If you do that, your modulepath in environment.conf looks a little something like this:

modulepath = site:modules:$basemodulepath

This means that Puppet will look for modules first in the ‘site’ directory of its current environment (this is the directory where we put roles/profiles), and then inside the ‘modules’ directory (this is where modules managed in Puppetfile are cloned by default), and then in $basemodulepath (i.e. modules common to all environments and also modules that Puppet Enterprise ships).

LOOK, BEFORE YOU FREAK OUT, YES, SITE COMES FIRST HERE, AND OTHER PEOPLE HAVE SITE COME SECOND! Basically, if you have roles/profiles in the ‘site’ directory AND you manage to still have the module in Puppetfile, then the module in the ‘site’ directory will win. Feel free to flip/flop that if you want.

TL;AR: (yes, you already read all of this so it’s futile) put roles/profiles inside the site directory of the control repo to save you time, but also don’t do it if you have a specific reason not to…or if you like being contrarian.

Dave Matthews

The “Everyday” album was the “jump the shark” moment for the Dave Matthews band, while the leaked “Lillywhite Sessions” that would largely make it to “Busted Stuff” definitely indicated where the band wanted to go. They never recovered after that, and, just like Boone’s Farm ‘wine’, I stopped partaking in them.

Also, not ONCE did being able to play most every Dave Matthews song on the acoustic guitar ever get me laid…though I can’t tell exactly whose fault that was. On second thought, that was probably me. Though Tim Reynolds is an absolute beast of a musician; I’m still #teamtim.

One role per node, until you don’t want to

Why do we even make these rules if you’re not gonna follow them? It’s getting awfully “Whose Line Is It Anyway?” up in here. Before PE 3.7, and its rules-based classification engine, we recommended not assigning more than one role to a node. Why? Well, the Puppet Enterprise Console around that time wasn’t the best at tracking changes or providing authentication around tasks like classification. This meant if you tried to manage ALL of your classification within the console you could have a hard time telling when things changed or why. Fortunately, git provides you with this functionality. Because of that, we (and when I say ‘we’ I mean ‘everyone in the field trying to design a Puppet workflow that not only made sense but also had some level of accountability’) tried to displace most classification tasks from the Console into flat files that could be managed with git. This is largely the impetus for Roles and Profiles when you think about it: Profiles connect Puppet to external data and give you a layer to express dependencies between multiple Puppet classes, and a Role is a mechanism for boiling classification down to a single unit.

Once we launched a new Node Classifier that had a rules-based classification engine AND role-based access control, we became more comfortable delegating some of these classification tasks BACK to the console. The Node Classifier ALSO made it easy to click on a node and not only see what was classified to that node, but also WHERE it got that bit of classification from (“This node is getting the JBoss profile because it was put into the App Servers nodegroup”). With that level of accountability, we could start relaxing our “One Role Per Node™” mandate, OR eliminate the roles module altogether (and use nodegroups in the Node Classifier in place of roles).

The goal has always been to err on the side of “debugability” (I like making words). I will usually try to optimize a task for tracing errors later, because I’ve been a sysadmin where the world is falling apart around you and you need to quickly determine what caused this mess. Using one role per node makes sense if you don’t use a node classifier that gives you this flexibility, but MIGHT not if you DO use a classifier that has some level of accountability.

Roles, conditional logic, Hiera, and you

Over time as I’ve talked to people that ended up building Puppet workflows based on the things I’ve written (which still feels batshit crazy to me, by the way, since I’ve known myself for over 34 years), I’ve noticed that people seem to take the things I say VERY LITERALLY. And to this I say: “You should probably send me money via Paypal.” Also – note that I’m writing these things to address the 80% of people out there using/getting started with Puppet. You don’t HAVE to do what I say, especially if you have a good reason not to, and you SHOULDN’T do what I say, especially if you’re the one that’s going to stay with that organization forever and manage the entire Puppet deployment. For everyone else out there, let’s talk some more about roles.

The talking points around roles have always been “Roles include profiles; that’s it.” Again, going back to the idea that roles exist to support classification, this makes sense – you don’t want to add resources at a very high level like a roles class because, well, honestly, there’s probably a better place for it, but any logic added to simplify classification is a win.

Consider an organization that has both Windows and Linux application servers. The question of whether to have separate roles for Linux and Windows application servers is always one of the first questions to be surfaced. At a low level, everything you do in a Puppet manifest is solely for the purpose of getting resources into the catalog (a JSON object containing a list of all resources Puppet is to manage and their desired end state). Whether you have two different roles matters not to Puppet so long as the right node gets the right catalog. For a Puppet developer writing code, having two separate roles also might not matter (and, in reality, based on the amount of code assigned to either role, it might be cleaner to have different roles for each). For the person in charge of classifying nodes with their assigned role, it’s probably easier to have a single role (roles::application_server, for example) that can be assigned to ALL application servers, and then logic inside the role to determine whether this will be a Windows application server using IIS or a Linux application server using JBoss (or, going further, a Linux application server running Weblogic, or Websphere, or Tomcat, whatever). Like we mentioned in the previous point, if you’re using the “One role per node” philosophy, then you probably want a single role with conditional logic to determine Windows/Linux, and then determine Tomcat/JBoss, and so on. If you’re using the Puppet Enterprise Console’s node classifier, and thus the rule-based engine, you can afford not to care about the number of node groups you create because you can create a rule to match for application servers, and then a rule to match on operating system, and create as many rules as you want to dynamically discover and classify nodes on the fly.

The point here is that the PURPOSE of the Role is to aid classification, and the focus on creating a role is to start small, use conditional logic to determine which profiles to include, and then simply include them. If that conditional logic uses Facter facts, awesome. If you need to look at a variable coming from the Console to do the job, fine – go for it! But if you’re using the Role as a substitute for a Profile (i.e. data lookups, declaring classes, even declaring resources), then you’re probably going down a path that’s gonna make it confusing for people to follow what’s going on.

Bottom line: technology-agnostic roles that use conditional logic around including profiles are a win, but keep tasks like declaring resources and component modules to Profiles. Doing this provides a top-down path for debugging and a cleaner overall Puppet codebase.
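
To make that concrete, here’s a rough sketch of what such a role might look like (the profile names are hypothetical, and the OS check uses the kernel fact – swap in whatever facts your organization keys off of):

# A technology-agnostic role: conditional logic only decides WHICH profiles
# get included -- no resources, no data lookups, no component modules.
class role::application_server {

  # Profiles common to every application server, regardless of OS
  include profile::base

  case $::kernel {
    'windows': { include profile::iis }
    default:   { include profile::jboss }
  }
}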

What the hell is ‘Data’ anyhow?

This point has single-handedly caused more people to come up and argue with me. I’m not kidding. I shit you not, I’ve had people legitimately *SCREAM* at me about how wrong I was with my opinions here. The cool thing is that people LOVE the idea of Hiera – it lets you keep the business-specific data out of your Puppet manifests, it’s expressed in YAML and not the Puppet DSL, and when it works, it’s magical.

The problem is that it’s fucking magical. Seriously.

So what IS a good use of Hiera? Anytime you have a bit of data that is subject to override (for example: the classical NTP problem where everyone should use the generic company NTP server, except nodes at this location should use a different NTP server, and this particular node should use ITSELF as its NTP server), that bit of data goes into Hiera (and by ‘that bit of data’, I mean ‘the value of the NTP server’ or ‘the NTP server’s FQDN’), which would look SOMETHING like this:

ntpserver: pool.ntp.org
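
For contrast, the profile that consumes that value keeps everything else – the “model” – in Puppet code. A minimal sketch (assuming a hypothetical profile::ntp wrapping an ntp component module that takes a servers parameter):

# Only the bit of data subject to override comes from Hiera; the class
# that uses it stays in the profile.
class profile::ntp {
  $ntpserver = hiera('ntpserver')   # 'pool.ntp.org' from the YAML above

  class { 'ntp':
    servers => [$ntpserver],
  }
}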

What does NOT go into Hiera is a hash-based representation of the Puppet resource that would then be passed to create_resources() and used to create the resource in the catalog…which would look something like this:

ntpfiles:
  '/etc/ntp/ntpd.conf':
    ensure: file
    owner:  0
    group:  0
    mode:   0644
    source: 'puppet:///modules/ntp/ntpd.conf'

…which would then be passed into Puppet like this:

create_resources('file', hiera_hash('ntpfiles'))

Yes, this is an exaggeration based on a very narrow use case, but what I’m trying to highlight is that the ‘data’ bit in all that above mess is SOLELY an FQDN, and everything else is arguably the “Model”, or your Puppet code.

Organizations LOVE that you can put as much “stuff” into Hiera as you want and then Puppet can call Hiera, create resources based on what it tells you, and merrily be on your way. Well, they “love” it until it doesn’t work or does something unexpected, and then debugging Hiera is a right bastard.

Understand that the problem I have would be with unexpected Hiera behavior. If you’re skilled in the ways of the Hiera and its (sometimes cloudy) interaction with Puppet, then by ALL means use it for whatever ya like. BUT, if you’re still new to Puppet, then you may have a very loose mental map for how Hiera works and where it interacts with Puppet…and nobody should have to have that advanced level of knowledge just to debug the damn thing.

The Hiera + create_resources() use above is of particular nastiness simply because it turns your Hiera YAML files into a potential mechanized weapon of Puppet destruction. If I know that you’re doing this under the hood, I could POTENTIALLY slip data into Hiera that would end up creating resources on a node to do what I want. Frequently Puppet code is more heavily scrutinized than Hiera data, and I could see something like this getting overlooked (especially if you don’t have a ton of testing around your Puppet code before it gets deployed).

The REASON why create_resources() was created is that Puppet lacked the ability to do things like recursion and loops inside the DSL, and sometimes you WANT to automate very repeated tasks. Consider the case where you truly DON’T know how many of something are going to be on a node ahead of time – maybe you’re using VMware vRO/vRA and someone is building a node on-the-fly with the web GUI. For every checkbox someone ticks there will be another application to be installed, or another series of firewall rules, or SOMETHING like that. You can choose to model these individually with profiles, OR, if the task is repetitive, you can accept their choices as data and feed it back into Puppet like a defined resource type. In fact, most use cases for Hiera + create_resources() involve passing data into a defined resource type. As of Puppet 4.x.x, we have looping constructs inside the DSL, so we can finally AUTOMATE these tasks without having to use an extra function (of course, in THIS use case, whether you use recursion/looping in the DSL or create_resources() matters not – you get the same thing in the end).
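
As a rough sketch of that equivalence (hypothetical profile::webapps class and myorg::webapp defined type), the Puppet 4 iteration and the create_resources() call produce the same resources:

class profile::webapps {
  # Hiera returns a hash of app names and their settings, e.g.
  #   { 'app1' => { 'version' => '1.2.3' }, 'app2' => { 'version' => '2.0.0' } }
  $webapps = hiera_hash('webapps')

  # Puppet 4 iteration in the DSL...
  $webapps.each |String $app, Hash $settings| {
    myorg::webapp { $app:
      version => $settings['version'],
    }
  }

  # ...is functionally the same as the older function-based approach:
  # create_resources('myorg::webapp', $webapps)
}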

For one last point, the Puppet DSL is still pretty easy to read (as of right now), and most people can follow what’s going on even if they’re NOT PuppEdumicated. Having 10 resource declarations in a row seems like a pain in the ass to write when you’re doing it, but READING it makes sense. Later on, if you need to know what’s going on with this profile, you can scan it and see exactly what’s there. If you start slipping lots of data into Hiera and looping logic into the DSL, you’re gonna force the person who manages Puppet to go back and forth between reading Hiera code, then back to Puppet code, then back to the node, and so on. Again, it’s totally possible to do now, and frequently NECESSARY when you have a more complex deployment and well-trained Puppet administrators, but initially it’s possible to build your own DSL to Puppet by slipping things into Hiera and running away laughing.

So when do I put this ‘data’ into the Profile and when is a good time to put it into Hiera? I’m glad you asked…

A path to Hiera data

These last two points I’ve written about before. I may be repeating myself, but bytes are cheap. Like I wrote above (and before), putting data directly into a Profile is the easiest and most legible way of providing “external data” into Puppet. Yes, you’ll argue, putting the data into a Profile, which is Puppet code, is ARGUABLY NOT being very “external” about it. In my opinion it is – your Profile is YOUR IMPLEMENTATION of a technology stack, and thus isn’t going to be shared outside your organization. I consider that external to all the component modules out there, but, again, potato/potato. I recommend STARTING HERE when you’re getting started with Puppet. Hiera comes in when you have a very clear-cut need for overriding data (a la: this NTP server everywhere, except here and here). The second you might need to have different data, you can either start building conditional logic inside the Profile, OR use the conditional logic that Hiera provides.

So – which do you use?

The point of Hiera is to solve 80% or better of all conditional choices in your organization. Consider this data organization model:

  • Everyone shares most of the same data items
  • San Francisco/London do their own things sometimes
  • Application tiers get their own level for dev/test/qa/prod-specific overrides
  • Combinations of tiers/locations/and business units want their own overrides
  • Node specific data is the most specific (and least-used) level

If you’re providing some data to Puppet that follows this model, then cool – use Hiera. What about specific “exceptions” that don’t fit this model? Do you try to create specialized layers in Hiera just for these exceptions? Certain organizations absolutely do – I see it all the time. What you find is that certain layers in Hiera go together (this location/tier/business_unit level goes right above location/tier, which goes right above location), and we start referring to those coupled layers as “Chains”. Chains are usually tied to some specific need (deploying applications, for example). Sometimes you create a chain just to solve a VERY SPECIFIC hard problem (populating /etc/sudoers in large organizations, for example).

The question is – do I create another “Chain” of layers in the hierarchy solely because deploying sudoers is hard, or do I throw a couple of case statements into the sudoers profile and keep it out of Hiera altogether?

My answer is to start with conditional logic in the sudoers profile and break it out into Hiera if you see that “Chain” being needed elsewhere. Why? Because, like I’ve said many times before, debugging Hiera kinda sucks right now – there’s no way currently to get a dump of all variables and parameters for a particular node and determine which were set by Hiera, which were set with variables in the DSL, which came out of the console, and so on. If we HAD that tool, I’d be all about using it and polluting your hierarchy all day long (I expand upon this slightly in the next point about the Automatic Parameter Lookup + Hiera).
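
To make that concrete, “a couple of case statements in the sudoers profile” looks roughly like this (hypothetical profile, locations, and file names):

class profile::sudoers {
  # The conditional logic lives here until the same chain of conditions is
  # needed somewhere else -- only then does it earn a spot in the hierarchy.
  case $::location {
    'london': { $sudoers_source = 'puppet:///modules/profile/sudoers.london' }
    'sf':     { $sudoers_source = 'puppet:///modules/profile/sudoers.sf' }
    default:  { $sudoers_source = 'puppet:///modules/profile/sudoers.default' }
  }

  file { '/etc/sudoers':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0440',
    source => $sudoers_source,
  }
}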

Bottom line: Start with the data in the Profile, then move it to Hiera when you need to override. Start with conditional logic in the Profile, then create a “Chain” in the Hierarchy if you need to use it in more than one place.

Hiera, APL, Refactoring, WTF

Like I said, I’ve written about this before. I like the Automatic Parameter Lookup functionality in Puppet – it’s ace. I like Hiera. But if you don’t know how it works, or that it exists, it feels too much like Magic. There are certain things in the product that can ONLY be set by putting data inside Hiera and running Puppet, and that is truly an awesome thing: just tell a customer “drop this bit of data somewhere in Hiera, run Puppet, and you’re all set.” But, again, if you need to know how a particular line got into a particular config file on your node, and it was set with the APL, then you’ve got some digging to do.

There’s still no tool, like I mentioned in the last item, to give me full introspection into all variables/parameters set for a node and that variable/parameter’s origin. Part of the reason as to WHY this tool doesn’t exist is because the internals of Puppet don’t necessarily make it easy for you to determine where a parameter/variable was set. That’s OUR problem, and I feel like we’re slowly making progress on marking these things internally so we can expose them to our customers. Until then, you have to trace through code and Hiera data.

I know the second I publish and tweet about this, I’m gonna get a message from R.I. Pienaar saying that I’m crazy for NOT pushing people toward using Hiera more with the Automatic Parameter Lookup, because the more we use it, the faster we can move away from things like params classes, and profiles, and everything else, but the reality is I’m ALL ABOUT PEOPLE using it if they know how it works. I’m ACTUALLY fucking happy that it works well for you – please continue to use it and do awesome Puppet things. I only recommend to people who are getting started to NOT USE it FIRST, and then, when you understand how it would help you by clocking some hours of Puppet code writing and debugging, do some refactoring and move to it!

Yes, refactoring is involved.

Look, refactoring is a way of life. You’re gonna re-tool your Puppet code for the purposes of legibility, or efficiency, or any of the many other reasons why you refactor code – it’s unavoidable. Also, if I come into your org and set up Puppet for the most efficient use-case, and then I leave that in your relatively-new-to-Puppet hands, it’s probably not gonna be the best situation because you won’t have known WHY I made the decisions I did (and, even if I document them, you might be missing the knowledge that would help you understand the problems I’m helping you avoid).

Sometimes hitting the problem so you have first-hand knowledge of why you need to avoid it in the future isn’t the WORST thing in the world.

Moving to any configuration management system means you’re gonna be refactoring. Embrace it. Start small, get things working, then clean it up. Don’t try to build the “fortress of sysadmin perfection” with your first bit of Puppet code – just get shit done! Allow yourself time during the month simply to unwind some missteps you realize after the fact, and definitely seek advice before doing something you feel might be particularly complex or overarching, but getting shit done is gonna trump “not working” any day (or whatever the manager-y buzzspeak is this week).

Bottom Line: APL if you understand it, start small, get shit done, refactor, repeat

Hopefully this leads to more posts

Holy shit, you’re still reading?! Ohh, you skimmed down this far to see how long this post was gonna be – got it. Either way, I’m glad I finally got this out there. It’s been months, yes, but that doesn’t mean I haven’t been writing. We’ve been doing lots of internal work to try and get more official docs out to you and less of “Go read Gary’s blog!” You’ll notice R10k has some official docs, right?! Yeah, that’s awesome! We want more of that. BUT, there’s still going to be times where I feel like what I’m gonna say isn’t necessarily the “party line”, and that’s what this blog is about.

Thanks to everyone at Puppetconf and beyond who approached me and told me how much they love what I write. I’m gonna be humble as fuck in person, but I really do get excited whenever someone says that. It’s also crazy as hell when someone from Wal-mart approaches you and says they built part of their deployment based on the shit you wrote. From a guy who came from a town in Ohio with a population of less than 8000 people, it’s crazy to see where you’re “recognized.”

So thank you, again, for all the support.

And sorry, Dave Matthews – it’s not you, it’s me. Actually, that’s a lie; it was you.

Puppet Workflows 4: Using Hiera in Anger

Hiera. That thing nobody is REALLY quite sure how to say (FYI: It’s pronounced ‘hiera’), the tool that everyone says you should be using, and the tool that will make you hate YAML syntax errors with a passion. It’s a data/code separation dream, (potentially) a debugging nightmare, and absolutely vital in creating a Puppet workflow that scales better than your company’s Wifi strategy (FYI: your company’s Wifi password just changed. Again. Because they’re not using certificates). I’ve already written a GOOD AMOUNT on why/how to use it, but now I’m going to give you a couple of edge cases. Call them “best practices” (and I’ll cut you), but I like to call it “shit I learned after using Hiera in anger.” Here are a couple of the most popular questions I hear, and my usual responses…

“How should I setup my hierarchy?”

This is such a subjective question because it’s specific to your organization (because it’s your data). I usually ask back “What are the things about your nodes that are different, and when are they different?” Usually I hear something back like “Well, nodes in this datacenter have different DNS settings” or “Application servers in production use one version of java, and those in dev use a different version” or “All machines in the dev environment in this datacenter need to have a specific repository”. All of these replies give me clues about your hierarchy. When you think of Hiera as a giant conditional statement, you can start seeing how your hierarchy could be laid out. With the first response, we know we need a location fact to determine where a node is, and then we can have a hierarchy level for that location. The second response tells me we need a level for the application tier (i.e. dev/test/prod). The third response tells me we need a level that combines both the location and the application tier. When you add in that you should probably have a node-specific level at the top (for overrides) and a default level at the bottom (or not: see the next section), I’m starting to picture this:

:hierarchy:
  - "nodes/%{::clientcert}"
  - "%{::location}/%{::applicationtier}"
  - "%{::location}/common"
  - "tier/%{::applicationtier}"
  - common

Every time you have a need, you consider a level. Now, obviously, it doesn’t mean that you NEED a level for every request (sometimes if it’s an edge case you can handle it in the profile or the role). There’s a performance hit for every level of your Hiera hierarchy, so ideally keep it minimal (around 5 levels or so), but we’re talking about flexibility here, and, if that’s more important than performance, then you should go for it.

Next comes ordering. This one’s SLIGHTLY easier – your hierarchy should read from most-specific to least-specific. Note that specifying an application tier at a specific location is MORE specific than just saying “all nodes in this application tier.” Sometimes you will have levels whose order is hard to define – such as location vs. application tier. You kinda just have to go with your gut here. In many cases you may find that the data you put in those two levels will be entirely different (location-based data may not ever overlap with application-tier-specific data). Do remember that any time you change the order of your hierarchy you’re going to introduce the possibility that values get flip/flopped.

If you look at level 3 of the hierarchy above, you’ll see that I have ‘common’ at the end. Some people like this syntax (where they put a ‘common’ file in a folder that matches the fact they’re checking against), and some people prefer a filename matching the fact. Do what makes you happy, but, in this case, we can unify the location folder and just put the common file underneath the application tier files.

Finally, DO MAKE USE OF FOLDERS! For the love of god, this. Putting all files in a single folder both makes that a BIG folder and introduces a namespace collision (i.e. what if you have a location named ‘dev’, for example? Now you have both an application tier and a location with the same name. Oops).

How you setup your hierarchy is up to you, but this should hopefully give you somewhere to start.

Common.yaml, your organization’s common values – REVISED

UPDATE – 28 October

Previously, this section was where I presented the idea of removing the lowest level of the hierarchy as a way of ensuring that you didn’t omit a value in Hiera (the idea being that common values would be in the profile, anything higher would be in Hiera, and all your ‘defaults’, or ‘common values’ would be inside the profile). The idea of removing the lowest level of the Hiera hierarchy was always something I was kicking around in my head, but R.I. made a comment below that’s made me revise my thought process. There’s still a greater concern around definitively tracking down values pulled from Hiera, but I think we can accomplish that through other means. I’m going to revise what I wrote below to point out the relevant details.

When using Hiera, you need to define a hierarchy that Hiera uses in its search for your data. Most often, it looks something like this:

hiera.yaml
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppetlabs/puppet/hieradata
:hierarchy:
  - "nodes/%{::clientcert}"
  - "location/%{::location}"
  - "environment/%{::applicationtier}"
  - common

Notice that little “common” at the end? That means that, failing everything else, it’s going to look in common.yaml for a value. I had thought of common as the ‘defaults’ level, but the reality is that it is a list of values common across all the nodes in your infrastructure. These are the values, SPECIFIC TO YOUR ORGANIZATION, that should be the same everywhere. Barring an override at a higher level, these values are your organization’s ‘defaults’, if you will.

Previously, you may have heard me rail against Hiera’s optional second argument and how I really don’t like it. Take this example:

$foo = hiera('port', '80')

Given this code, Hiera is going to look for a parameter called ‘port’ in its hierarchy, and, if it doesn’t find one in ANY of the levels, assign back a default value of ‘80’. I don’t like using this second argument because:

  1. If you forget to enter the ‘port’ parameter into the hierarchy, or typo it in the YAML file, Hiera will gladly assign the default value of ‘80’ (which, unless you’re checking for this, might sneak into production)
  2. Where is the real ‘default’ value: the value in common.yaml or the optional second argument?

It actually depends on where you do the hiera() call as to what ‘kind’ of default value this is. Note that previously we talked about how the ‘common’ level represented values common across your infrastructure. If you do this hiera() call inside a profile (which is where I recommend it be done), providing the optional second argument ends up being redundant (i.e. the value should be inside Hiera).

The moral of this story being: values common to all nodes should be in the lowest level of the Hiera hierarchy, and all explicit hiera calls should omit the default second argument if that common value is expected to be found in the hierarchy.
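
In practice (sticking with the hypothetical ‘port’ parameter), that looks like this:

# common.yaml (the lowest level of the hierarchy) holds the organization-wide value:
#   port: 80
#
# ...and the profile looks it up with NO second argument:
$port = hiera('port')
# If 'port' is missing (or typo'd) everywhere in the hierarchy, this run fails
# loudly instead of quietly falling back to a hard-coded default.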

Data Bindings

In Puppet 3, we introduced the concept of ‘data bindings’ for parameterized classes, which meant that Puppet now had another choice for gathering parameter values. Previously, the order Puppet would look to assign a value for parameters to classes was:

  1. A value passed to the class via the parameterized class syntax
  2. A default value provided by the class

As of Puppet 3, this is the new parameter assignment order:

  1. A value passed to the class via the parameterized class syntax
  2. A Hiera lookup for classname::parametername
  3. A default value provided by the class

Data bindings is meant to be pluggable to allow for ANY data backend, but, as of this writing, there’s currently only one: Hiera. Because of this, Puppet will now automatically do a Hiera lookup for every parameter to a parameterized class that isn’t explicitly passed a value via the parameterized class syntax (which means that if you just do include classname, Puppet will do a Hiera lookup for EVERY parameter defined to the “classname” class).

This is really cool because it means that you can just add classname::parametername to your Hiera setup, and, as long as you’re not EXPLICITLY passing that parameter’s value to the class, Puppet will do a lookup and find the value.
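
Here’s a quick sketch with a hypothetical class to show what that automatic lookup is doing:

# A parameterized class somewhere in your modulepath:
class myapp (
  $port = '8080',   # class default, used only if nothing else provides a value
) {
  # ...resources that use $port...
}

# Classification is just:
include myapp

# With data bindings enabled, Puppet automatically does a Hiera lookup for
# 'myapp::port' behind the scenes, so a Hiera entry like
#   myapp::port: '9090'
# beats the class default of '8080' without any explicit hiera() call.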

It’s also completely transparent to you unless you know it’s happening.

The issue here is that this is new functionality to Puppet, and it feels like magic to me. You can make the argument and say “If you don’t start using it, Gary, people will never take to it,” however I feel like this kind of magical lookup in the background is always going to be a bad thing.

There’s also another problem. Consider a Hiera hierarchy that has 15 levels (they exist, TRUST ME). What happens if you don’t define ANY parameters in Hiera in the form of classname::parametername and simply want to rely on the default values for every class? Well, it means that Hiera is STILL going to be triggered for every parameter to a class that isn’t explicitly passed a value. That’s a hell of a performance hit. Fortunately, there’s a way to disable this lookup. Simply add the following to the Puppet master’s puppet.conf file:

data_binding_terminus = none

It’s going to be up to how your team needs to work as to whether you use Hiera data bindings or not. If you have a savvy team that feels they can debug these lookups, then cool – use the hell out of it. I prefer to err on the side of an explicit hiera() lookup for every value I’m querying, even if it’s a lot of extra lines of code. I prefer the visibility, especially for new members to your team. For those people with large hierarchies, you may want to weigh the performance hit. Try to disable data bindings and see if your master is more performant. If so, then explicit hiera() calls may actually buy you some rewards.

PROS:

  • Adding parameters to Hiera in the style of classname::parametername will set parameterized class values automatically
  • Simplified code – simply use the include() function everywhere (which is safer than the parameterized class syntax)

CONS:

  • Lookup is completely transparent unless you know what’s going on
  • Debugging parameter values can be difficult (especially with typos or forgetting to set values in Hiera)
  • Performance hit for values you want to be assigned the class default value

Where to data – Hiera or Profile?

“Does this go right into the Profile or into Hiera?” I get that question repeatedly when I’m working with customers. It’s a good question, and one of the quickest ways to blow up your YAML files in Hiera. Here’s the order I use when deciding where to put data:

WHERE did that data come from?

Remember that the profile is YOUR implementation – it describes how YOU define the implementation of a piece of technology in YOUR organization. As such, it’s less about Puppet code and more about pulling data and passing it TO the Puppet code. It’s the glue-code that grabs the data and wires it up to the model that uses it. How it grabs the data is not really a big deal, so long as it grabs the RIGHT data – right? You can choose to hardcode it into the Profile, or use Hiera, or use some other magical data lookup mechanism – we don’t really care (so long as the Profile gathers the data and passes it to the correct Puppet class).

The PROBLEM here is debugging WHERE the data came from. As I said previously, Hiera has a level for all bits of data common to your organization, and, obviously, data overridden at a higher level takes precedence over the ‘common’ level at the bottom. With Hiera, unless you run the hiera binary in debug mode (-d), you can never be completely sure where the data came from. Puppet has no way of dumping out every variable and where it came from (whether Hiera or set directly in the DSL, and, if it WAS Hiera, exactly what level or file it came from).

It is THIS REASON that causes me to eschew things like data bindings in Puppet. Debugging where a value came from can be a real pain in the ass. If there were amazing tooling around this, I would 100% support using data bindings and just setting everything inside Hiera and using the include() function, but, alas, that’s not been my experience. Until then, I will continue to recommend explicit hiera calls for visibility into when Hiera is being called and when values are being set inside the DSL.

Enter the data into the Profile

One of the first choices people make is to enter the data (like ntpserver address, java version, or whatever it is) directly into the Profile. “BUT GARY! IT’S GOING TO MAKE IT HARD TO DEBUG!” Not really. You’re going to have to open the Profile anyway to see what’s going on (whether you pull the data from Hiera or hardcode it in the Profile), right? And, arguably, the Profile is legible…doing Hiera lookups gives you flexibility at a cost of abstracting away how it got that bit of data (i.e. “It used Hiera”). For newer users of Puppet, having the data in the Profile is easier to follow. So, in the end, putting the data into the Profile itself is the least-flexible and most-visible option…so consequently people consider it as the first available option. This option is good for common/default values, BUT, if you eventually want to use Hiera, you need to re-enter the data into the common level of Hiera. It also splits up your “source of truth” to include BOTH the Profile manifest and Hiera. In the end, you need to weigh your team’s goals, who has access to the Hiera repo, and how flexible you need to be with your data.
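
A minimal example of that “least-flexible, most-visible” option (hypothetical profile, assuming a java component module that takes a version parameter):

class profile::java {
  # The data lives right here in the profile -- easy to read, but it is
  # effectively a constant: changing it later means editing this file or
  # moving the value out to Hiera.
  class { 'java':
    version => '1.8.0_66',
  }
}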

PROS:

  • Data is clearly visible and legible in the profile (no need to open additional files)

CONS:

  • Inability to redefine variables in Puppet DSL makes any settings constants by default (i.e. no overriding permitted)
  • Data outside of Hiera creates a second “source of truth”

Enter the data into Hiera

If you find that you need to have different bits of data for different nodes (i.e. a different version of Java in the dev tier instead of the prod tier), then you can look to put the data into Hiera. Where to put the data is going to depend on your own needs – I’m trusting that you can figure this part out – but the bigger piece here is that once the data is in Hiera you need to ensure you’re getting the RIGHT data (i.e. if it’s overridden at a higher level, you are certain you entered it into the right file and didn’t typo anything).

This answers that “where” question, but doesn’t answer the “what” question…as in “What data should I put into Hiera?” For that, we have another section…

PROS:

  • Flexibility in returning different values based on different conditions
  • All the data is inside one ‘source of truth’ for data according to your organization

CONS:

  • Visibility – you must do a Hiera lookup to find the value (or open Hiera’s YAML files)

“What exactly goes into Hiera?”

If there were one question that, if answered incorrectly, could make or break your Puppet deployment, this would be it. The greatest strength and weakness of Hiera is its flexibility. You can truly put almost anything in Hiera, and, when combined with something like the create_resources() function, you can create your own YAML configuration language (tip: don’t actually do this).

“But, seriously, what should go into Hiera, and what shouldn’t?”

The important thing to consider here is the price you pay by putting data into Hiera. You’re gaining flexibility at a cost of visibility. This means that you can do things like enter values at all levels of the hierarchy that can be concatenated together with a single hiera_array() call, BUT, you’re losing the visibility of having the data right in front of you (i.e. you need to open up all the YAML files individually, or use the hiera binary to debug how you got those values). Hiera is REALLY COOL until you have to debug why it grabbed (or DIDN’T grab) a particular value.
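
For example (hypothetical ‘packages’ key), values scattered across the hierarchy come back as one array:

# common.yaml:    packages: ['vim', 'curl']
# tier/dev.yaml:  packages: ['strace']
#
# A single array-merge lookup gathers every 'packages' value found at any level:
$packages = hiera_array('packages')

package { $packages:
  ensure => installed,
}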

Here’s what I usually tell people about what should be put into Hiera:

  • The exact data values that need to be different conditionally (i.e. a different ntp server for different sites, different java versions in dev/prod, a password hash, etc.)
  • Dynamic data expressed in multiple levels of the hierarchy (i.e. a lookup for ‘packages’ that returns back an array of all the values that were found in all the levels of the hierarchy)
  • Resources as a hash ONLY WHEN ABSOLUTELY NECESSARY

Puppet manifest vs. create_resources()

Bullets 1 and 2 above should be pretty straightforward – you either need to use Hiera to grab a specific value or return back a list of ALL the values from ALL the levels of the hierarchy. The point here is that Hiera should be returning back only the minimal amount of data that is necessary (i.e. instead of returning back a hash that contains the title of the resource, all the attributes of the resource, and all the attribute values for that resource, just return back a specific value that will be assigned to an attribute…like the password hash itself for a user). This data lookup appears to be “magic” to new users of Puppet – all they see is the magic phrase of “hiera” and a parameter to search for – and so it becomes slightly confusing. It IS, however, easier to understand that this magical phrase will return data, and that that data is going to be used to set the value for an attribute. Consider this example:

$password = hiera('garypassword')

user { 'gary':
  ensure   => present,
  uid      => '5001',
  gid      => 'gary',
  shell    => 'zsh',
  password => $password,
}

This leads us to bullet 3, which is “the Hiera + create_resources() solution.” This solution allows you to lookup data from within Hiera and pass it directly to a function where Puppet creates the individual resources as if you had typed them into a Puppet manifest itself. The previous example can be entered into a Hiera YAML file like so:

sysadmins.yaml
users:
  gary:
    ensure: 'present'
    uid: '5001'
    gid: 'gary'
    shell: 'zsh'
    password: 'biglongpasswordhash'

And then a resource can be created inside the Puppet DSL by doing the following:

$users = hiera('users')
create_resources('user', $users)

Both examples are functionally identical, except the first one only uses Hiera to get the password hash value, whereas the second one grabs both the attributes, and their values, for a specific resource. Imagine Puppet gives you an error with the ‘gary’ user resource and you were using the latter example. You grep your Puppet code looking for ‘gary’, but you won’t find that user resource in your Puppet manifest anywhere (because it’s being created with the create_resources() function). You will instead have to know to go into Hiera’s data directory, then the correct datafile, and then look for the hash of values for the ‘gary’ user.

Functional differences between the two approaches

Functionally, you COULD do this either way. When you come up with a solution using create_resources(), I challenge you to draw up another solution using Puppet code in a Puppet manifest (however lengthy it may be) that queries Hiera for ONLY the specific values necessary. Consider this example, but, instead, you need to manage 500 users. If you use create_resources(), you would then need to add 500 more blocks to the ‘users’ parameter in your Hiera datafiles. That’s a lot of YAML. And on what level will you add these blocks? prod.yaml? dev.yaml? Are you using a common.yaml? Your YAML files suddenly got huge, and the rest of your team modifying them will not be so happy to scroll through 500 entries. Now consider the first example using Puppet code. Your Puppet manifest suddenly grew, but it didn’t affect all the OTHER manifests out there: only this file. The Hiera YAML files will still grow – but now by 500 individual lines instead of the 3000 lines in the previous example. Okay, now which one is more LEGIBLE? I would argue that the Puppet manifest is more legible, because I consider the Puppet DSL to be very legible (again, subject to debate versus YAML). Moreover, when debugging, you can stay inside Puppet files more often using Puppet manifests to define your resources. Using create_resources, you need to jump into Hiera more often. That’s a context shift, which adds more annoyance to debugging. Also, it creates multiple “sources of truth.” Suddenly you have the ability to enter data in Hiera as well as in the Puppet manifest, which may be clear to YOU, but if you leave the company, or you get another person on your team, they may choose to abuse the Hiera settings without knowing why.

Now consider an example that you might say is more tailored to create_resources(). Say you have a defined type that sets up tomcat applications. This defined type accepts things like a path to install the application, the application’s package name, the version, which tomcat installation to target, etc. Now consider that all application servers need application1, but only a couple of servers need application2, and a very snowflake server needs application3 (in this case, we’re NOT saying that all applications are on all boxes and that their data, like the version they’re using, is different. We’re actually saying that different machines require entirely different applications).

Using Hiera + create_resources() you could enter the resource for the application1 at a low level, then, at a higher level, add the resource for application2, and finally add the resource for application3 at the node-specific level. In the end, you can do a hiera_hash() lookup to discover and concatenate all resources from all levels of the hierarchy and pipe that to create_resources.

How would you do this with Puppet code? Well, I would create profiles for every application, and either different roles for the different kinds of servers (i.e. the snowflake machine gets its own role), or conditional checks inside the role (i.e. if this node is at the London location, it gets these application profiles, etc.).
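
Roughly, that ends up looking like this (hypothetical profile, role, and location names):

# One profile per application; the role (or a sub-role) decides which
# profiles a given kind of server gets.
class role::application_server {
  # every application server gets application1
  include profile::tomcat::application1

  # only the couple of servers at this location get application2
  if $::location == 'london' {
    include profile::tomcat::application2
  }
}

# The snowflake machine gets its own sub-role instead of a node-level override:
class role::application_server::snowflake {
  include role::application_server
  include profile::tomcat::application3
}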

Now which is more legible? At this point, I’d still say that separate profiles and conditional checks in roles (or sub-roles) are more legible – including a class is a logical thing to follow, and conditionals inside Puppet code are easy to follow. The create_resources() solution just becomes magic. Suddenly, applications are on the node. If you want to know where they came from, you have to switch contexts and open Hiera data files or use the hiera binary and do a debug run. If you’re a small team that’s been using Puppet forever, then rock on and go for it. If you’re just getting started, though, I’d shy away.

Final word on create_resources?

Some people, when confronted with a problem, think “I know, I’ll use create_resources().”
Now they have two problems.

The create_resources() function is often called the “PSE Swiss Army knife” (or, Professional Services Engineer – the people who do what I do and consult with our customers) because we like to break it out when we’re painted into a corner by customer requirements. It will work ANYWHERE, but, again, at that cost of visibility. I am okay with someone using it so long as they understand the cost of visibility and the potential debugging issues they’ll hit. I will always argue against using it, however, for those reasons. More code in a Puppet manifest is not a bad thing…especially if it’s reasonably legible code that can be kept to a specific class. Consider the needs and experience level of your team before using create_resources() – if you don’t have a good reason for using it, simply don’t.

create_resources()

PROS:

  • Dynamically iterate and create resources based on Hiera data
  • Using Hiera’s hash merging capability, you can functionally override resource values at higher levels of the hierarchy

CONS:

  • Decreased visibility
  • Becomes a second ‘source of truth’ to Puppet
  • Can increase confusion about WHERE to manage resources
  • When used too much, it creates a DSL to Puppet’s DSL (DSLs all the way down)

Puppet DSL + single Hiera lookup

PROS:

  • More visible (sans the bit of data you’re looking up)
  • Using wrapper classes allows for flexibility and conditional inclusion of resources/classes

CONS:

  • Very explicit – doesn’t have the dynamic overriding capability like Hiera does

Using Hiera as an ENC

One of the early “NEAT!” moments everyone has with Hiera is using it as an External Node Classifier, or ENC. There is a function called hiera_include() that allows you to include classes into the catalog as if you were to write “include (classname)” in a Puppet manifest. It works like this:

london.yaml
classes:
  - profiles::london::base
  - profiles::london::network
dev.yaml
classes:
  - profiles::tomcat::application2
site.pp
node default {
  hiera_include('classes')
}

Given the above example, the hiera_include() function will search every level of the hierarchy looking for a parameter called ‘classes’. It returns a concatenated list of classnames, which it then passes to Puppet’s include() function (in the end, Puppet will declare the profiles::london::base, profiles::london::network, and profiles::tomcat::application2 classes). Puppet puts the contents of these classes into the catalog, and away we go. This is awesome because you can change the classification of a node conditionally according to a Hiera lookup, and it’s terrible because you can CHANGE THE CLASSIFICATION OF A NODE CONDITIONALLY ACCORDING TO A HIERA LOOKUP! This means that anyone with access to the repo holding your Hiera data files can effect changes to every node in Puppet just by modifying a magical key. It also means that in order to see the classification for a node, you need to do a Hiera lookup (i.e. you can’t just open a file and see it).

Remember that WHOLE blog post about Roles and Profiles? I do, because I wrote the damn thing. You can even go back and read it again, too, if you want to. One of the core tenets of that article was that each node gets classified with a single role. If you adhere to that (and you should; it makes for a much more logical Puppet deployment), a node really only ever needs to be classified ONCE. You don’t NEED this conditional classification behavior. It’s one of those “It seemed like a good idea at the time” moments that I assure you will pass.

Now, you CAN use Roles with hiera_include() – simply create a Facter fact that returns the node’s role, add a level to the Hiera hierarchy for this role fact, and in the role’s YAML file in Hiera, simply do:

appserver.yaml
classes: role::application_server

Then you can use the same hiera_include() call in the default node definition in site.pp. The ONLY time I recommend this is if you don’t already have some other classification method. The downside of this method is that if your role fact CHANGES, for some reason or another, classification immediately changes. Facts are NOT secure – they can be overridden really easily. I don’t like to leave classification to an insecure method that anyone with root access on a machine can change. Using an ENC or site.pp for classification means that the node ABSOLUTELY CANNOT override its classification. It’s the difference between being authoritative and simply ‘suggesting’ a classification.
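If you do go this route despite those caveats, a minimal sketch looks something like this (the facts.d path and the role value are illustrative – adjust for your platform and naming conventions):

# /etc/facter/facts.d/role.txt -- external fact supplying the node's role
role=appserver

# hiera.yaml hierarchy snippet with a level keyed on that fact
:hierarchy:
  - "%{clientcert}"
  - "%{role}"
  - common

With that in place, the appserver.yaml data file shown above supplies the ‘classes’ key, and the default node definition in site.pp just calls hiera_include('classes').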

PROS:

  • Dynamic classification: no need to maintain a site.pp file or group in the Console
  • Fact-based: a node’s classification can change immediately when its role fact does

CONS:

  • Decreased visibility: need to do a Hiera lookup to determine classification
  • Insecure: since facts are insecure and can be overridden, so can classification

Puppetconf 2014 Talk - the Refactor Dance

This year at Puppetconf 2014, I presented a 1.5 hour talk entitled “The Refactor Dance” that comprised nearly EVERYTHING that I’ve written about in my Puppet Workflows series (from writing better component modules, to Roles/Profiles, to Workflow, and lots of stories in-between) as well as a couple of bad words, a pair of leather pants (trousers), and an Uber story that beats your Uber story. It’s long, informative, and you get to watch the sweat stains under my arms grow in an attractive grey Puppet Labs shirt. What’s not to love?

To watch the video, click here to check it out!

On Dependencies and Order

This blog post was born out of a number of conversations that I’ve had about Puppet, its dependency model, and why ‘ordering’ is not necessarily the way to think about dependencies when writing Puppet manifests. Like most everything on this site, I’m getting it down in a file so I don’t have to repeat this all over again the next time someone asks. Instead, I can point them to this page (and, when they don’t actually READ this page, I can end up explaining everything I’ve written here anyways…).

Before we go any further, let me define a couple of terms:

dependencies     - In a nutshell, what happens when you use the metaparameters of
                   'before', 'require', 'subscribe' or 'notify' on resources in a
                   Puppet manifest: it's a chain of resources that are to be
                   evaluated in a specific order every time Puppet runs. Any failure
                   of a resource in this chain stops Puppet from evaluating the
                   remaining resources in the chain.

evaluate         - When Puppet determines the 'is' value (or current state) of a
                   resource (i.e. for package resources, "is the package installed?")

remediate        - When Puppet determines that the 'is' value (or current state of
                   the resource) is different from the 'should' value (or the value
                   entered into the Puppet manifest...the way the resource SHOULD
                   end up looking on the system) and Puppet needs to make a change.

declarative(ish) - When I use the word 'declarative(ish)', I mean that the order in
                   which Puppet EVALUATES resources that do not contain dependencies
                   has no set procedure/order, but the order that Puppet reads/parses
                   manifest files IS top-to-bottom (which is why variables in Puppet
                   manifests need to be declared before they can be used).

Why Puppet doesn’t care about execution order (until it does)

The biggest shock to the system when getting started with a declarative(ish) configuration management tool like Puppet is understanding that Puppet describes the end-state of the machine, and NOT the order in which it (Puppet) is going to take you to that state. To Puppet, the order in which it chooses to affect change on any resource (be it a file to be corrected, a package to be installed, or any other resource type) is entirely arbitrary, because resources that have no relationship to another resource shouldn’t CARE about the order in which they’re evaluated and remediated.

For example, imagine Puppet is going to both create /etc/sudoers and update the system’s authorized keys file to enter all the sysadmins’ SSH keys. Which one should it do first? In an imperative system like shell scripts or a runbook-style system, you are forced to choose an order. So I ask again, which one goes first? If you try to update the sudoers file in your script first, and there’s a problem with that update, then the script fails and the SSH keys aren’t installed. If you switch the order and there’s a problem with the SSH keys, then you can’t sudo up because the sudoers file hasn’t been touched.
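In Puppet code, that scenario is just two resources with no relationship between them – here’s a rough sketch (the module source paths are hypothetical):

# Neither resource depends on the other, so Puppet is free to pick the
# evaluation order -- and a failure in one won't block the other.
file { '/etc/sudoers':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0440',
  source => 'puppet:///modules/sudo/sudoers',
}

file { '/root/.ssh/authorized_keys':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
  source => 'puppet:///modules/ssh/root_authorized_keys',
}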

Because of this, Puppet has always taken the stance that if there are failures, we want to get as much of the system into a working state as possible (i.e. any resources that don’t depend upon the failing resource are going to still be evaluated, or ‘inspected’, and remediated, or ‘changed if need be’). There are definitely philosophical differences here: the argument can be made that if there’s a failure somewhere, the system is bad and you should cast it off until you’ve fixed whatever the problem is (or the part of the code causing the problem). In virtualized or ‘cloud’ environments where everything is automated, this is just fine, but in environments without complete and full automation, sometimes you have to fix and deal with what you have. Puppet “believes in your system”, which is borderline marketing-doubletalk for “alert you of errors and give you time to fix the damn thing and do another Puppet run without having to spin up a whole new system.”

Once you know WHY Puppet takes the stance it does, you realize that Puppet does not give two shits about the order of resources without dependencies. If you write perfect Puppet code, you’re fine. But the majority of the known-good-world does not do that. In fact, most of us write shit code. Which was the problem…

The history of Puppet’s ordering choices

‘Random’ random order

In the early days, the only resources that were guaranteed to have a consistent order were those resources with dependencies (i.e. as I stated above, resources that used the ‘before’, ‘require’, ‘subscribe’, or ‘notify’ metaparameters to establish an evaluation order). Every other resource was evaluated at random every time that Puppet ran…which meant that you could run Puppet ten times and, theoretically, resources without dependencies could be evaluated in a different order between every Puppet run (we call this non-deterministic ordering). This made things REALLY hard to debug. Take the case where you had a catalog of thousands of resources but you forgot a SINGLE dependency between a couple of file resources. If you roll that change out to 1000 nodes, you might have 10 or fewer of them fail (because Puppet chose an evaluation order that ordered these two resources incorrectly). Imagine trying to figure out what happened and replicate the problem. You could waste lots of time just trying to REPLICATE the issue, even if it was a small fix like this.

PROS:

  • IS there a pro here?

CONS:

  • Ordering could change between runs, and thus it was very hard to debug missing dependencies

Philosophically, we were correct: resources that are to be evaluated in a certain order require dependencies. Practically, we were creating more work for ourselves.

Incidentally, I’d heard that Adam Jacob, who created Chef, had cited this reason as one of the main motivators for creating Chef. I’d heard that as a Puppet consultant, he would run into these buried dependency errors and want to flip tables. Even if it’s not a true STORY, it was absolutely true for tables where I used to work…

Title-hash, ‘Predictable’ random order

Cut to Puppet version 2.7, where we introduced deterministic ordering with ‘title-hash’ ordering. In a nutshell, resources that didn’t have dependencies would still be executed in a random order, but the order Puppet chose could be replicated (it created a SHA1 hash based on the titles of the resources without dependencies, and ordered the hashes alphabetically). This meant that if you tested out a catalog on a node, and then ran that same catalog on 1000 other nodes, Puppet would choose the same order for all 1000 of the nodes. This gave you the ability to actually TEST whether your changes would successfully run in production. If you omitted a dependency, but Puppet managed to pick the correct evaluation order, you STILL had a missing dependency, but you didn’t care about it because the code worked. The next time you changed the catalog (by adding or removing resources), the order might change, but you would discover and fix the missing dependency at that time.

PROS:

  • ‘Predictable’ and repeatable order made testing possible

CONS:

  • Easy to miss dependency omissions if Puppet chose the right order (but do you really care?)

Manifest ordering, the ‘bath salts’ of ordering

Title-hash ordering seemed like the best of both worlds – being opinionated about resource dependencies but also giving sysadmins a reliable, and repeatable, way to test evaluation order before it’s pushed out to production.

Buuuuuuuuuut, y’all JUST weren’t happy enough, were you?

When you move from an imperative solution like scripts to a declarative(ish) solution like Puppet, it is absolutely a new way to think about modeling your system. Frequently we heard that people were having issues with Puppet because the order in which resources show up in a Puppet manifest WASN’T the order in which Puppet would evaluate those resources. I just dropped a LOT of words explaining why this isn’t the case, but who really has the time to read up on all of this? People were dismissing Puppet too quickly because their expectations of how the tool worked didn’t align with reality. The idea, then, was to align Puppet’s behavior with those expectations in the hopes that people wouldn’t dismiss it so quickly.

Eric Sorenson wrote a blog post on our thesis and experimentation around manifest ordering that is worth a read (and, incidentally, is shorter than this damn post), but the short version is that we tested this theory out and determined that Manifest Ordering would help users who are new to Puppet. Because of this work, we created a feature called ‘Manifest Ordering’ that stated that resources that DID NOT HAVE DEPENDENCIES would be evaluated by Puppet in the order that they showed up in the Puppet manifest (when read top to bottom). If a resource truly does not have any dependencies, then you honestly should not care one bit what order it’s evaluated in (because it doesn’t matter). Manifest Ordering made the ordering of resources without dependencies VERY predictable.

But….

This doesn’t mean I think it’s the best thing in the world. In fact, I’m really wary of how I feel people will come to use Manifest Ordering. There’s a reason I called it the “bath salts of ordering” – because a little bit of it, when used correctly, can be a lovely thing, but too much of it, used in unintended circumstances, leads to hypothermia, paranoia, and the desire to gnaw someone else’s face off. We were/are giving you a way to bypass our dependency model by using the mental-model you had with scripts, but ALSO telling you NOT to rely on that mental-model (and instead set dependencies explicitly using metaparameters).

Seriously, what could go wrong?

Manifest Ordering is not a substitute for setting dependencies – that IS NOT what it was created for. Puppet Labs still maintains that you should use dependencies to order resources and NOT simply rely on Manifest Ordering as a form of setting dependencies! Again, the problem is that you need to KNOW this…and if Manifest Ordering allows you to keep the same imperative “mindset” inside a declarative(ish) language, then eventually you’re going to experience pain (if not today, then possibly later when you actually try to refactor code, or share code, or use this code on a system that ISN’T using Manifest Ordering). A declarative(ish) language like Puppet requires seeing your systems according to the way their end-state will look and worrying about WHAT the system will look like, and not necessarily HOW it will get there. Any shortcut to understanding this process means you’re going to miss key bits of what makes Puppet a good tool for modeling this state.

PROS:

  • Evaluation order of resources without dependencies is absolutely predictable

CONS:

  • If used as a substitute for setting dependencies, then refactoring code (moving around the order in which resources show up in a manifest) means changing the evaluation order

What should I actually take from this?

Okay, here’s a list of things you SHOULD be doing if you don’t want to create a problem for future-you or future-organization:

  • Use dependency metaparameters like ‘before’, ‘require’, ‘notify’, and ‘subscribe’ if resources in a catalog NEED to be evaluated in a particular order
  • Do not use Manifest Ordering as a substitute for explicitly setting dependencies (disable it if this is too tempting)
  • Use Roles and Profiles for a logical module layout (see: http://bit.ly/puppetworkflows2 for information on Roles and Profiles)
  • Order individual components inside the Profile
  • Order Profiles (if necessary) inside the Role (see the sketch below)
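To make those last two points concrete, here’s a rough sketch of ordering inside a profile and inside a role – the class names are made up, and you’d only add the arrows where a real dependency exists:

# inside a profile: order the components that genuinely depend on each other
class profiles::tomcat {
  class { '::java': }
  class { '::tomcat': }

  Class['::java'] -> Class['::tomcat']
}

# inside a role: order profiles only if one truly requires another
class role::application_server {
  include profiles::base
  include profiles::tomcat

  Class['profiles::base'] -> Class['profiles::tomcat']
}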

And, seriously, trust us with the explicit dependencies. It seems like a giant pain in the ass initially, but you’re ultimately documenting your infrastructure, and a dependency (or, saying ‘this thing MUST come before that thing’) is a pretty important decision. There’s a REASON behind it – treat it with more weight than just having one line come before another, ya know? The extra time right now is absolutely going to buy you the time you spend at home with your kids (and by ‘kids’, I mean ‘XBox’).

And don’t use bath salts, folks.

R10k + Directory Environments

If you’ve read anything I’ve posted in the past year, you know my feelings about the word ‘environments’ and about how well we tend to name things here at Puppet Labs (and if you don’t, you can check out that post here). Since then, Puppet Labs has released a new feature called directory environments (click this link for further reading) that replace the older ‘config file environments’ that we all used to use (i.e. stanzas in puppet.conf). Directory environments weren’t without their false starts and issues, but further releases of Puppet, and their inclusion in Puppet Enterprise 3.3.0, have allowed more people to ask about them. SO, I thought I’d do a quick writeup about them…

R10k had a child: Directory Environments

The Puppet platform team had a couple of problems with config file environments in puppet.conf – namely:

  • Entering them in puppet.conf meant that you couldn’t use environments named ‘master’, ‘main’, or ‘agent’
  • There was no easy/reliable way to determine all the available/used Puppet environments without making assumptions (and hacky code) – especially if someone were using R10k + dynamic environments
  • Adding more environments to puppet.conf made managing that file something of a nightmare (environments.d anyone?)

Combine this with the fact that most of the Professional Services team was rolling out R10k to create dynamic environments (which meant we were abusing $environment inside puppet.conf and creating environments…well… dynamically and on-the-fly), and they knew something needed to be done. Because R10k was so popular and widely deployed, an environment solution that was a simple step-up from an R10k deployment was made the target, and directory environments were born.

How does it work?

Directory environments, essentially, are born out of a folder on the Puppet master (typically $confdir/environments, where $confdir is /etc/puppetlabs/puppet in Puppet Enterprise) wherein every subfolder is a new Puppet environment. Every subfolder contains a couple of key items:

  • A modules folder containing all modules for that environment
  • A manifests/site.pp file containing the site.pp file for that environment
  • A new environment.conf file which can be used to set the modulepath, the environment_timeout, and, a new and often-requested feature, the ability to have environment-specific config_version settings

Basically, it’s everything that R10k ALREADY does with a couple of added goodies dropped into an environment.conf file. Feel free to read the official docs on configuring directory environments for further information on all of the goodies!
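To give you a mental picture, a single environment under the environmentpath ends up looking roughly like this (module names are placeholders):

/etc/puppetlabs/puppet/environments/
└── production/
    ├── environment.conf
    ├── manifests/
    │   └── site.pp
    └── modules/
        ├── profiles/
        └── ntp/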

Cool, how do we set it up?

It wouldn’t be one of my blog posts if it didn’t include exact steps to configure shit, would it? For this walkthrough, I’m using a CentOS 6.5 VM with DNS working (i.e. the node can ping itself and knows its own hostname and FQDN), and I’ve already installed an all-in-one installation of Puppet Enterprise 3.3.0. For the walkthrough, we’re going to setup:

  • Directory environments based on a control repo
  • Hiera data inside a hieradata folder in the control repo
  • Hiera to use the per-environment hieradata folder

Let’s start to break down the components:

The ‘Control Repo’?

Sometime between my initial R10k post and THIS post, the Puppet Labs PS team has come to call the repository that contains the Puppetfile and is used to track Puppet environments on all Puppet masters the ‘Control Repo’ (because it ‘Controls the creation of Puppet environments’, ya dig? Zack Smith and James Sweeny are actually pretty tickled about making that name stick). For the purpose of this demonstration, I’m using a repository on Github:

https://github.com/glarizza/puppet_repository

Everything you will need for this walkthrough is in that repository, and we will refer to it frequently. You DO NOT need to use my repository (you’ll definitely want to create your OWN eventually), but it’s there for reference purposes (and to give you a couple of Puppet manifests to make setup a bit easier).

Configuring the Puppet master

We’re going to first clone my control repo to /tmp so we can use it to configure R10k and the Puppet master itself:

[root@master ~]# cd /tmp

[root@master /tmp]# git clone https://github.com/glarizza/puppet_repository.git
Initialized empty Git repository in /tmp/puppet_repository/.git/
remote: Counting objects: 164, done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 164 (delta 54), reused 81 (delta 16)
Receiving objects: 100% (164/164), 22.68 KiB, done.
Resolving deltas: 100% (54/54), done.

[root@master /tmp]# cd puppet_repository

Great, I’ve cloned my repo. To configure R10k, we’re going to need to pull down Zack Smith’s R10k module from the forge with puppet module install zack/r10k and then use puppet apply on a manifest in my repo with puppet apply configure_r10k.pp. DO NOTE: If you want to use YOUR Control Repo, and NOT the one I use on Github, then you need to modify the configure_r10k.pp file and replace the remote property with the URL to YOUR Control Repo that’s housed on a git repository!
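For the curious (or if you’d rather hand-roll it), configure_r10k.pp is essentially just a declaration of the r10k class from Zack’s module, roughly along these lines – the exact contents of the file in my repo may differ, so treat this as a sketch:

class { 'r10k':
  sources => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git', # <-- swap in YOUR control repo
      'basedir' => '/etc/puppetlabs/puppet/environments',
      'prefix'  => false,
    },
  },
}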

[root@master /tmp/puppet_repository:production]# puppet module install zack/r10k

Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forgeapi.puppetlabs.com ...
Notice: Found at least one version of puppetlabs-stdlib compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-inifile compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-vcsrepo compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Found at least one version of puppetlabs-concat compatible with PE (3.3.0);
Notice: Skipping versions which don't express PE compatibility. To install
the most recent version of the module regardless of compatibility
with PE, use the '--ignore-requirements' flag.
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v2.2.7)
  ├─┬ gentoo-portage (v2.2.0)
  │ └── puppetlabs-concat (v1.0.3) [/opt/puppet/share/puppet/modules]
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.2.0)
  ├── puppetlabs-git (v0.2.0)
  ├── puppetlabs-inifile (v1.1.0) [/opt/puppet/share/puppet/modules]
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.2.1)
  ├── puppetlabs-stdlib (v3.2.2) [/opt/puppet/share/puppet/modules]
  └── puppetlabs-vcsrepo (v1.1.0)

[root@master /tmp/puppet_repository:production]# puppet apply configure_r10k.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.71 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/type.rb:816:in `set_default')
Notice: /Stage[main]/R10k::Install/Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}5cda58e8a01e7ff12544d30105d13a2a'
Notice: Finished catalog run in 11.24 seconds

Performing those commands will successfully setup R10k to point to my Control Repo out on Github (and, again, if you don’t WANT that, then you need to make the change to the remote property in configure_r10k.pp). We next need to configure Directory Environments in puppet.conf by setting two attributes:

  • environmentpath (Or the path to the folder containing environments)
  • basemodulepath (Or, the set of modules that will be shared across ALL ENVIRONMENTS)

I have created a Puppet manifest that will set these attributes, and this manifest requires the puppetlabs/inifile module from the Puppet Forge. Fortunately, since I’m using Puppet Enterprise, that module is already installed. If you’re using open source Puppet and the module is NOT installed, feel free to install it by running puppet module install puppetlabs/inifile. Once this is done, go ahead and execute the manifest by running puppet apply configure_directory_environments.pp:
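For reference, the manifest is little more than a couple of ini_setting resources (from puppetlabs/inifile) pointed at puppet.conf – the file in my repo may differ slightly, but the idea is this:

ini_setting { 'Configure environmentpath':
  ensure  => present,
  path    => '/etc/puppetlabs/puppet/puppet.conf',
  section => 'main',
  setting => 'environmentpath',
  value   => '$confdir/environments',
}

ini_setting { 'Configure basemodulepath':
  ensure  => present,
  path    => '/etc/puppetlabs/puppet/puppet.conf',
  section => 'main',
  setting => 'basemodulepath',
  value   => '$confdir/modules:/opt/puppet/share/puppet/modules',
}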

[root@master /tmp/puppet_repository:production]# puppet apply configure_directory_environments.pp

Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.05 seconds
Notice: /Stage[main]/Main/Ini_setting[Configure environmentpath]/ensure: created
Notice: /Stage[main]/Main/Ini_setting[Configure basemodulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '$confdir/modules:/opt/puppet/share/puppet/modules'
Notice: Finished catalog run in 0.20 seconds

The last step to configuring the Puppet master is to execute an R10k run. We can do that by running r10k deploy environment -pv:

[root@master /tmp/puppet_repository:production]# r10k deploy environment -pv

[R10K::Source::Git - INFO] Determining current branches for "https://github.com/glarizza/puppet_repository.git"
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment webinar_env
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying profiles into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying haproxy into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying ntp into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/webinar_env/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Great! Everything should be setup (if you’re using my repo)! My repository has a production branch, which is what Puppet’s default environment is named, so we can test that everything works by listing out all modules in the main production environment with the puppet module list command:

[root@master /tmp/puppet_repository:production]# puppet module list

Warning: Module 'puppetlabs-stdlib' (v3.2.2) fails to meet some dependencies:
  'puppetlabs-ntp' (v3.1.2) requires 'puppetlabs-stdlib' (>= 4.0.0)
/etc/puppetlabs/puppet/environments/production/modules
├── notifyme (???)
├── profiles (???)
├── puppetlabs-apache (v1.1.1)
└── puppetlabs-ntp (v3.1.2)
/etc/puppetlabs/puppet/modules
├── gentoo-portage (v2.2.0)
├── mhuffnagle-make (v0.0.2)
├── puppetlabs-gcc (v0.2.0)
├── puppetlabs-git (v0.2.0)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.2.1)
├── puppetlabs-vcsrepo (v1.1.0)
└── zack-r10k (v2.2.7)
/opt/puppet/share/puppet/modules
├── puppetlabs-apt (v1.5.0)
├── puppetlabs-auth_conf (v0.2.2)
├── puppetlabs-concat (v1.0.3)
├── puppetlabs-firewall (v1.1.2)
├── puppetlabs-inifile (v1.1.0)
├── puppetlabs-java_ks (v1.2.4)
├── puppetlabs-pe_accounts (v2.0.2-3-ge71b5a0)
├── puppetlabs-pe_console_prune (v0.1.1-4-g293f45b)
├── puppetlabs-pe_mcollective (v0.2.10-15-gb8343bb)
├── puppetlabs-pe_postgresql (v1.0.4-4-g0bcffae)
├── puppetlabs-pe_puppetdb (v1.1.1-7-g8cb11bf)
├── puppetlabs-pe_razor (v0.2.1-1-g80acb4d)
├── puppetlabs-pe_repo (v0.7.7-32-gfd1c97f)
├── puppetlabs-pe_staging (v0.3.3-2-g3ed56f8)
├── puppetlabs-postgresql (v2.5.0-pe2)
├── puppetlabs-puppet_enterprise (v3.2.1-27-g8f61956)
├── puppetlabs-reboot (v0.1.4)
├── puppetlabs-request_manager (v0.1.1)
└── puppetlabs-stdlib (v3.2.2)  invalid

Notice a couple of things:

  • First, I’ve got some dependency issues…oh well, nothing that’s a game-stopper
  • Second, the path to the production environment’s modules is correct at: /etc/puppetlabs/puppet/environments/production/modules

Configuring Hiera

The last dinghy to be configured on this dreamboat is Hiera. Hiera is Puppet’s data lookup mechanism, and is used to gather specific bits of data (such as versions of packages, hostnames, passwords, and other business-specific data). Explaining HOW Hiera works is beyond the scope of this article, but configuring Hiera data on a per-environment basis IS absolutely a worthwhile endeavor.

In this example, I’m going to demonstrate coupling Hiera data with the Control Repo for simple replication of Hiera data across environments. You COULD also choose to put your Hiera data in a separate repository and set it up in /etc/r10k.yaml as another source, but that exercise is left to the reader (and if you’re interested, I talk about it in this post).

You’ll notice that my demonstration repository ALREADY includes Hiera data, and so that data is automatically being replicated to all environments. By default, Hiera’s configuration file (hiera.yaml) has no YAML data directory specified, so we’ll need to make that change. In my demonstration control repository, I’ve included a sample hiera.yaml, but let’s take a look at one below:

## /etc/puppetlabs/puppet/hiera.yaml

---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{application_tier}"
  - common

:yaml:
# datadir is empty here, so hiera uses its defaults:
# - /var/lib/hiera on *nix
# - %CommonAppData%\PuppetLabs\hiera\var on Windows
# When specifying a datadir, make sure the directory exists.
  :datadir: "/etc/puppetlabs/puppet/environments/%{environment}/hieradata"

This hiera.yaml file specifies a hierarchy with three levels – a node-specific level, a level for different application tiers (like ‘dev’, ‘test’, ‘prod’, etc.), and finally makes the change we need: mapping the data directory to each environment’s hieradata folder. The path to hiera.yaml is Puppet’s configuration directory (which is /etc/puppetlabs/puppet for Puppet Enterprise, or /etc/puppet for the open source version of Puppet), so open the file there, make your changes, and finally restart the Puppet master service so the changes get picked up.

Next, let’s perform a test by executing the hiera binary from the command line before running puppet:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=webinar_env -d
DEBUG: 2014-08-31 19:55:44 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:55:44 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:55:44 +0000: Looking for data source common
DEBUG: 2014-08-31 19:55:44 +0000: Found message in common
This node is using common data

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=bad_env -d
DEBUG: 2014-08-31 19:58:22 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 19:58:22 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 19:58:22 +0000: Looking for data source common
DEBUG: 2014-08-31 19:58:22 +0000: Cannot find datafile /etc/puppetlabs/puppet/environments/bad_env/hieradata/common.yaml, skipping
nil

You can see that for the first example, I passed the environment of production and did a simple lookup for a key called message – Hiera then returned the value out of that environment’s common.yaml file. Next, I did another lookup, but added -d to enable debug mode (debug mode on the hiera binary is REALLY handy for debugging problems with Hiera – combine it with specifying values from the command line, and you can pretty quickly simulate what value a node is going to get). Notice the last example where I specified an invalid environment – Hiera logged that it couldn’t find the datafile requested and ultimately returned a nil, or empty, value.

Since we’re working on the Puppet master machine, we can even check for a value using puppet apply combined with the notice function:

[root@master /etc/puppetlabs/puppet/environments]# puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): This node is using common data
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.19 seconds

Great, it’s working, but let’s look at pulling data from a higher level in the hierarchy – like from the application_tier level. We haven’t defined an application_tier fact, however, so we’ll need to fake it. First, let’s do that with the hiera binary:

[root@master /etc/puppetlabs/puppet/environments]# hiera message environment=production application_tier=dev -d
DEBUG: 2014-08-31 20:04:12 +0000: Hiera YAML backend starting
DEBUG: 2014-08-31 20:04:12 +0000: Looking up message in YAML backend
DEBUG: 2014-08-31 20:04:12 +0000: Looking for data source dev
DEBUG: 2014-08-31 20:04:12 +0000: Found message in dev
You are in the development application tier

And then also with puppet apply:

[root@master /etc/puppetlabs/puppet/environments]# FACTER_application_tier=dev puppet apply -e "notice(hiera('message'))"
Notice: Scope(Class[main]): You are in the development application tier
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.09 seconds
Notice: Finished catalog run in 0.18 seconds

Tuning environment.conf

The brand-new, per-environment environment.conf file is meant to be (for the most part) a one-stop-shop for your Puppet environment tuning needs. Right now, the only things you’ll need to tune will be the modulepath, config_version, and possibly the environment_timeout.

Module path

Before directory environments, every environment had its own modulepath that needed to be tuned to allow for modules that were to be used by this machine/environment, as well as shared modules. That modulepath worked like $PATH in that it was a priority-based lookup for modules (i.e. the first directory in modulepath that had a module matching the module name you wanted won). It also previously required the FULL path to be used for every path in modulepath.

Those days are over.

As I mentioned before, the main puppet.conf configuration file has a new parameter called basemodulepath that can be used to specify modules that are to be shared across ALL ENVIRONMENTS. Paths defined here (typically $confdir/modules and /opt/puppet/share/puppet/modules) are usually put at the END of a modulepath so Puppet can search for any overridden modules that show up in earlier modulepath paths. In the previous configuration steps, we executed a manifest that set basemodulepath to look like:

basemodulepath = $confdir/modules:/opt/puppet/share/puppet/modules

Again, feel free to add or remove paths (except don’t remove /opt/puppet/share/puppet/modules if you’re using Puppet Enterprise, because that’s where all Puppet Enterprise modules are located), especially if you’re using a giant monolithic repo of modules (which was typically done before things like R10k evolved).

With basemodulepath configured, it’s now time to configure the modulepath to be defined for every environment. My demonstration control repo contains a sample environment.conf that defines a modulepath like so:

modulepath = modules:$basemodulepath

You’ll notice, now, that there are relative paths in modulepath. This is possible because each environment now contains an environment.conf, and thus relative paths make sense. In this example, nodes in the production environment (/etc/puppetlabs/puppet/environments/production) will look for a module by its name FIRST in a folder called modules inside the current environment folder (i.e. /etc/puppetlabs/puppet/environments/production/modules/<module_name>). If the module isn’t found there, Puppet looks for it in the order that paths are defined in basemodulepath above. If Puppet fails to find a module in ANY of the paths, a compile error is raised.

Per-environment config_version

Setting config_version has been around for awhile – hell, I remember video of Jeff McCune talking about it at the first Puppetcamp Europe in like 2010 – but the new directory environments implementation has fine tuned it a bit. Previously, config_version was a command executed on the Puppet master at compile time to determine a string used for versioning the configuration enforced during that Puppet run. When it’s not set it defaults to something of a time/date stamp off the parser, but it’s way more useful to make it do something like determine the most recent commit hash from a repository.

In the past when we used a giant monolithic repository containing all Puppet modules, it was SUPER easy to get a single commit hash and be done. As everyone moved their modules into individual repositories, determining WHAT you were enforcing became harder. With the birth of R10k and the control repo, we suddenly had something we could query for the state of our modules being enforced. The problem existed, though, that with multiple dynamic environments using multiple git branches, config_version wasn’t easily tuned to be able to grab the most recent commit from every branch.

Now that config_version is set in a per-environment environment.conf, we can make config_version much smarter. Again, looking in the environment.conf defined in my demonstration control repo produces this:

config_version = '/usr/bin/git --git-dir $confdir/environments/$environment/.git rev-parse HEAD'

This setting will cause the Puppet master to produce the most recent commit ID for whatever environment you’re in and embed it in the catalog and the report that is sent back to the Puppet master after a Puppet run.

I actually discovered a bug in config_version while writing this post, and it’s that config_version is subject to the relative pathing fun that other environment.conf settings are subject to. Relative pathing is great for things like modulepath, and it’s even good for config_version if you’re including the script you want to run to gather the config_version string inside the control repo, but using a one-line command that tries to execute a binary on the system that DOESN’T include the full path to the binary causes an error (because Puppet attempts to look for that binary in the current environment path, and NOT by searching $PATH on the system). Feel free to follow or comment on the bug if the mood hits you.

Caching and environment_timeout

The Puppet master loads environments on-request, but it also caches data associated with each environment to make things faster. This caching is finally tunable on a per-environment basis by defining the environment_timeout setting in environment.conf. The default setting is 3 minutes, which means the Puppet master will invalidate its caches and reload environment data every 3 minutes, but that’s now tunable. Definitely read up on this setting before making changes.
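As an illustration (the values here are arbitrary), you’d set it in that environment’s environment.conf like so:

# environment.conf
environment_timeout = 5m
# or, while actively developing, disable caching entirely:
# environment_timeout = 0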

Classification

One of the last new features of directory environments is the ability to include an environment-specific site.pp file for classification. You could ALWAYS do this by modifying the manifest configuration item in puppet.conf, but now each environment can have its own manifest setting. The default behavior is to have the Puppet master look for manifests/site.pp in every environment directory, and I really wouldn’t change that unless you have a good reason. DO NOTE, however, that if you’re using Puppet Enterprise, you’ll need to be careful with your site.pp file. Puppet Enterprise defines things like the Filebucket and overrides for the File resource in site.pp, so you’ll need to copy those changes into the site.pp file you add to your control repo (as I did).
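For reference, the bits PE drops into its stock site.pp look roughly like the following – check the site.pp on YOUR master for the authoritative version before copying anything into your control repo:

# default filebucket and File backup settings from PE's stock site.pp
filebucket { 'main':
  server => 'master.puppetlabs.vm',  # your master's certname will differ
  path   => false,
}

File { backup => 'main' }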

It may take you a couple of times to change your thinking from looking at the main site.pp in $confdir/manifests to looking at each environment-specific site.pp file, but definitely take advantage of Puppet’s commandline tool to help you track which site.pp Puppet is monitoring:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest
/etc/puppetlabs/puppet/environments/production/manifests

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment webinar_env
/etc/puppetlabs/puppet/environments/webinar_env/manifests

You can see that puppet config print can be used to get the path to the directory that contains site.pp. Even cooler is what happens when you specify an environment that doesn’t exist:

[root@master /etc/puppetlabs/puppet/environments]# puppet config print manifest --environment bad_env
no_manifest

Yep, Puppet tells you if it can’t find the manifest file. That’s pretty cool.

Wrapping Up

Even though the new implementation of directory environments is meant to map closely to a workflow most of us have been using (if you’ve been using R10k, that is), there are still some new features that may take you by surprise. Hopefully this post gets you started with just enough information to setup your own test environment and start playing. PLEASE DO make sure to file bugs on any behavior that comes as unexpected or stops you from using your existing workflow. Cheers!

On R10k and ‘Environments’

There have been more than a couple of moments where I’m on-site with a customer who asks a seemingly simple question and I’ve gone “Oh shit; that’s a great question and I’ve never thought of that…” Usually that’s followed by me changing up the workflow and immediately regretting things I’ve done on prior gigs. Some people call that ‘agile’; I call it ‘me not having the forethought to consider conditions properly’.

‘Environment’, like ‘scaling’, ‘agent’, and ‘test’, has many meanings

It’s not a secret that we’ve made some shitty decisions in the past with regard to naming things in Puppet (and anyone who asks me what puppet agent -t stands for usually gets a heavy sigh, a shaken head, and an explanation emitted in dulcet, apologetic tones). It’s also very easy to conflate certain concepts that unfortunately share very common labels (quick – what’s the difference between properties and parameters, and give me the lowdown on MCollective agents versus Puppet agents!).

And then we have ‘environments’ + Hiera + R10k.

Puppet ‘environments’

Puppet has the concept of ‘environments’, which, to me, exist to provide a means of compiling a catalog using different paths to Puppet modules on the Puppet master. Using a Puppet environment is the same as saying “I made some changes to my tomcat class, but I don’t want to push it DIRECTLY to my production machines yet because I don’t drink Dos Equis. It would be great if I could stick this code somewhere and have a couple of my nodes test how it works before merging it in!”

Puppet environments suffer some ‘seepage’ issues, which you can read about here, but do a reasonable job of quickly testing out changes you’ve made to the Puppet DSL (as opposed to custom plugins, as detailed in the bug). Puppet environments work well when you need a pipeline for testing your Puppet code (again, when you’re refactoring or adding new functionality), and using them for that purpose is great.

Internal ‘environments’

What I consider ‘internal environments’ have a couple of names – sometimes they’re referred to as application or deployment gateways, sometimes as ‘tiers’, but in general they’re long-term groupings that machines/nodes are attached to (usually for the purpose of phased-out application deployments). They frequently have names such as ‘dev’, ‘test’, ‘prod’, ‘qa’, ‘uat’, and the like.

For the purpose of distinguishing them from Puppet environments, I’m going to refer to them as ‘application tiers’ or just ‘tiers’ because, fuck it, it’s a word.

Making both of them work

The problems with having Puppet environments and application tiers are:

  • Puppet environments are usually assigned to a node for short periods of time, while application tiers are usually assigned to a node for the life of the node.
  • Application tiers usually need different bits of data (i.e. NTP server addresses, versions of packages, etc), while Puppet environments usually use/involve differences to the Puppet DSL.
  • Similarly to the first point, the goal of Puppet environments is to eventually merge code differences into the main production Puppet environment. Application tiers, however, may always have differences about them and never become unified.

You can see where this would be problematic – especially when you might want to do things like use different Hiera values between different application tiers, but you want to TEST out those values before applying them to all nodes in an application tier. If you previously didn’t have a way to separate Puppet environments from application tiers, and you used R10k to generate Puppet environments, you would have things like long-term branches in your repositories that would make it difficult/annoying to manage.

NOTE: This is all assuming you’re managing component modules, Hiera data, and Puppet environments using R10k.

The first step in making both monikers work together is to have two separate variables in Puppet – namely $environment for Puppet environments, and something ELSE (say, $tier) for the application tier. The “something else” is going to depend on how your workflow works. For example, do you have something centrally that can correlate nodes to the tier in which they belong? If so, you can write a custom fact that will query that service. If you don’t have this magical service, you can always just attach an application tier to a node in your classification service (i.e. the Puppet Enterprise Console or Foreman). Failing both of those, you can look to external facts. External Fact support was introduced into Facter 1.7 (but Puppet Enterprise has supported them through the standard lib for quite awhile). External facts give you the ability to create a text file inside the facts.d directory in the format of:

tier=qa
location=portland

Facter will read this text file and store the values as facts for a Puppet run, so $tier will be qa and $location will be portland. This is handy for when you have arbitrary information that can’t be easily discovered by the node, but DOES need to be assigned to the node on a reasonably consistent basis. Usually these files are created during the provisioning process, but they can also be managed by Puppet. At any rate, having $environment and $tier available allows us to start making decisions based on their values.

Branch with $environment, Hiera with $tier

Like we said above, Puppet environments are frequently short-term assignments, while application tiers are usually long-term residencies. Relating those back to the R10k workflow: branches to the main puppet repo (containing the Puppetfile) are usually short-lived, while data in Hiera is usually longer-lived. It would then make sense that the names of the branches to the main puppet repo resolve to $environment (and thus the Puppet environment name), while $tier (and thus the application tier) is used in the Hiera hierarchy for lookups of values that remain different across application tiers (like package versions, credentials, etc.).
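Put together, that usually boils down to a hierarchy that references the tier fact while the data directory follows the Puppet environment – something like this sketch (the hierarchy levels are illustrative):

# hiera.yaml
:hierarchy:
  - "%{clientcert}"
  - "%{tier}"        # long-lived application tier: dev, qa, prod, ...
  - common

:yaml:
  :datadir: "/etc/puppetlabs/puppet/environments/%{environment}/hieradata"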

Wins:

  • Puppet environment names (like repository branch names) become relatively meaningless and are the “means” to the end of getting Puppet code merged into the PUPPET CODE’s production branch (i.e. code that has been tested to work across all application tiers)
  • Puppet environments become short lived and thus have less opportunity to deviate from the main production codebase
  • Differences across application tiers are locked in one place (Hiera)
  • Differences to Puppet DSL code (i.e. in Manifests) can be pushed up to the profile level, and you have a fact ($tier) to catch those differences.

The ultimate reason why I’m writing about this is because I’ve seen people try to incorporate both the Puppet environment and application tier into both the environment name and/or the Hiera hierarchy. Many times, they run into all kinds of unscalable issues (large hierarchies, many Puppet environments, confusing testing paths to ‘production’). I tend to prefer this workflow choice, but, like everything I write about, take it and model it toward what works for you (because what works now may not work 6 months from now).

Thoughts?

Like I said before, I tend to discover new corner cases that change my mind on things like this, so it’s quite possible that this theory isn’t the most solid in the world. It HAS helped out some customers to clean up their code and make for a cleaner pipeline, though, and that’s always a good thing. Feel free to comment below – I look forward to making the process better for all!

Building a Functional Puppet Workflow Part 3b: More R10k Madness

In the last workflows post, I talked about dynamic Puppet environments and introduced R10k, which is an awesome tool for mapping modules to their environments which are dynamically generated by git branches. I didn’t get out everything I wanted to say because:

  • I was tired of that post sitting stale in a Google Doc
  • It was already goddamn long

So because of that, consider this a continuation of that previous monstrosity that talks about additional uses of R10k beyond the ordinary.

Let’s talk Hiera

But seriously, let’s not actually talk about what Hiera does since there are better docs out there for that. I’m also not going to talk about WHEN to use Hiera because I’ve already done that before. Instead, let’s talk about a workflow for submitting changes to Hiera data and testing it out before it enters into production.

Most people store their Hiera data (if they’re using a backend that reads Hiera data from disk, anyways) in a separate repo from their Puppet repo. Some DO tie the Hiera datadir folder to something like the main Puppet repo that houses their Puppetfile (if they’re using R10k), but for the most part it’s a separate repo because you may want separate permissions for accessing that data. For the purposes of this post, I’m going to refer to a repository I use for storing Hiera data that’s out on Github.

The next logical step would be to integrate that Hiera repo into R10k so R10k can track and create paths for Hiera data just like it did for Puppet.

NOTE: Fundamentally, all that R10k does is checkout modules to a specific path whose folder name comes from a git branch. PUPPET ties its environment to this folder name with some puppet.conf trickery. So, to say that R10k “creates dynamic environments” is the end-result, but not the actual job of the tool.

We COULD add Hiera’s repository to the /etc/r10k.yaml file to track and create folders for us, and if we did it EXACTLY like we did for Puppet we would most definitely run into this R10k bug (AND, it comes up again in this bug).

UPDATE: So, I originally wrote this post BEFORE R10k version 1.1.4 was released. Finch released version 1.1.4 which FIXES THESE BUGS…so the workflow I’m going to describe (i.e. using prefixing to solve the problem of using multiple repos in /etc/r10k.yaml that could possibly share branch names) TECHNICALLY does NOT need to be followed ‘to the T’, as it were. You can disable prefixing when it comes to that step, and modify /etc/puppetlabs/puppet/hiera.yaml so you don’t prepend ‘hiera_’ to the path of each environment’s folder, and you should be totally fine…you know, as long as you use version 1.1.4 or greater of R10k. So, be forewarned

The issue with those bugs is that R10k collects the names of ALL the environments from ALL the sources at once, so if you have multiple source repositories and they share branch names, then you have clashes (since it only stores ONE branch name internally). The solution that Finch came up with was prefixing (or, prefixing the name of the branch with the name of the source). When you prefix, however, it creates a folder on-disk that matches the prefixed name (e.g. NameOfTheSource_NameOfTheBranch). This is actually fine since we’ll catch it and deal with it, but you should be aware of it. Future versions of R10k will most likely deal with this in a different manner, so make sure to check out the R10k docs before blindly copying my code, okay? (Update: See the previous, bolded paragraph where I describe how Finch DID JUST THAT).

In the previous post I setup a file called r10k_installation.pp to setup R10k. Let’s revisit that manifest and modify it for my Hiera repo:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.4',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    },
    'hiera' => {
      'remote'  => 'https://github.com/glarizza/hiera_environment.git',
      'basedir' => "${::settings::confdir}/hiera",
      'prefix'  => true,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

NOTE: For the duration of this post, I’ll be referring to Puppet Enterprise specific paths (like /etc/puppetlabs/puppet for $confdir). Please do the translation for open source Puppet, as R10k will work just fine with either the open source edition or the Enterprise edition of Puppet.

You’ll note that I added a source called ‘hiera’ that tracks my Hiera repository, creates sub-folders in /etc/puppetlabs/puppet/hiera, and enables prefixing to deal with the bug I mentioned in the previous paragraph. Now, let’s run Puppet and do an R10k synchronization:

[root@master1 garysawesomeenvironment]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 1.78 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/content: content changed '{md5}c686917fcb572861429c83f1b67cfee5' to '{md5}69d38a14b5de0d9869ebd37922e7dec4'
Notice: Finished catalog run in 1.24 seconds

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_testing
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment hiera_master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/hiera

[root@master1 puppet]# ls /etc/puppetlabs/puppet/hiera
hiera_master  hiera_production  hiera_testing

[root@master1 puppet]# ls /etc/puppetlabs/puppet/environments/
development  garysawesomeenvironment  master  production

Great, so it configured R10k to clone the Hiera repository to /etc/puppetlabs/puppet/hiera like we wanted it to, and you can see that with prefixing enabled we have folders named “hiera_${branchname}”.

In Puppet, the magical connection that maps these subfolders to Puppet environments is in puppet.conf, but for Hiera that’s the hiera.yaml file. I’ve included that file in my Hiera repo, so let’s look at the copy at /etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml:

/etc/puppetlabs/puppet/hiera/hiera_production/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hiera/hiera_%{environment}/hieradata'

The magical line is the :datadir: setting in the :yaml: section; it uses %{environment} to interpolate the environment of the node being compiled and set the path accordingly. For a node in the ‘production’ environment, for example, the datadir resolves to /etc/puppetlabs/puppet/hiera/hiera_production/hieradata.

As of right now R10k is configured to clone Hiera data from a known repository to /etc/puppetlabs/puppet/hiera, to create sub-folders based on branches to that repository, and to tie data provided to each Puppet environment to the respective subfolder of /etc/puppetlabs/puppet/hiera that matches the pattern of “hiera_(environment_name)”.

The problem with hiera.yaml

You’ll notice that each subfolder to /etc/puppetlabs/puppet/hiera contains its own copy of hiera.yaml. You’re probably drawing the conclusion that each Puppet environment can read from its own hiera.yaml for Hiera configuration.

And you would be wrong.

For information on this bug, check out this link. You’ll see that we provide a ‘hiera_config’ configuration option in Puppet that allows you to specify the path to hiera.yaml, but Puppet loads that config as a singleton, which means that it’s read initially when the Puppet master process starts up and it’s NOT environment-aware. The workaround is to use one hiera.yaml for all environments on a Puppet master but to dynamically change the :datadir: path according to the current environment (in the same way that dynamic Puppet environments abuse ‘$environment’ in puppet.conf). You gain the ability to have per-environment changes to Hiera data but lose the ability to do things like using different hierarchies for different environments. As of right now, if you want a different hierarchy then you’re going to need to use a different master (or do some hacky things that I don’t even want to BEGIN to approach in this article).

In summary – there will be a hiera.yaml per environment, but they will not be consulted on a per-environment basis.
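To make that concrete, here’s a minimal sketch of the relevant puppet.conf setting on the master. The exact path is an assumption on my part – point it at whichever single hiera.yaml you want every environment to read (the default PE location shown below, a standalone copy, or one of the per-environment copies R10k checked out), since Puppet only loads it once when the master process starts:

# /etc/puppetlabs/puppet/puppet.conf on the master (a sketch)
[main]
  hiera_config = /etc/puppetlabs/puppet/hiera.yaml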

Workflow for per-environment Hiera data

Looking back on the previous post, you’ll see that the workflow for updating Hiera data is identical to the workflow for updating code to your Puppet environments. Namely, to create a new environment for testing Hiera data, you will:

  • Push a branch to the Hiera repository and name it accordingly (remembering that the name you choose will be a new environment).
  • Run R10k to synchronize the data down to the Puppet master
  • Add your node to that environment and test out the changes

For existing environments, simply push changes to that environment’s branch and repeat the last two steps.
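Concretely, the loop looks something like this (the branch name ‘yellow’ is hypothetical, the git commands run in a clone of the Hiera repo, and the last two commands run on a Puppet master and a test node respectively):

# 1. Create and push a new branch of the Hiera repo (the branch name becomes the environment)
git checkout -b yellow
git push origin yellow

# 2. On the master, synchronize the data down with R10k
r10k deploy environment -pv

# 3. Point a test node at the matching Puppet environment and check the results
puppet agent -t --environment yellow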

NOTE: Puppet environments and Hiera environments are linked – both tools use the same ‘environment’ concept and so environment names MUST match for the data to be shared (i.e. if you create an environment in Puppet called ‘yellow’, you will need a Hiera environment called ‘yellow’ for that data).

This tight-coupling can cause issues, and will ultimately mean that certain branches are longer-lived than others. It’s also the reason why I don’t use defaults in my hiera() lookups inside Puppet manifests – I WANT the early failure of a compilation error to alert me of something that needs fixed.
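As a quick illustration (the class and key names here are made up), this is what I mean by a hiera() lookup with no default:

# A hypothetical profile: no default on the hiera() call, so a missing key in the
# current environment's data fails catalog compilation instead of silently falling
# back to some 'safe' value.
class profile::example {
  $setting = hiera('profile::example::setting')

  file { '/etc/example.conf':
    ensure  => file,
    content => "setting = ${setting}\n",
  }
}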

You will need to determine whether this tight coupling is worth it for your organization: whether to tie your Hiera repository directly into R10k or to handle it out-of-band.

R10k and monolithic module repositories

One of the first requirements you encounter when working with R10k is that your component modules need to be stored in their own repositories. That convention is still relatively new – it wasn’t so long ago that we were recommending that modules be locked away in a giant repo. Why?

  • It’s easier to clone
  • The state of module reusability was poor

The main reason was that it was easier to put everything in one repo and clone it out on all your Puppet master servers. This becomes insidious as your module count rises and people start doing lovely things like committing large binaries into modules, pulling in old versions of modules they find out on the web, and the like. It also becomes an issue when you start needing to lock committers out of specific directories due to sensitive data, and blah blah blah blah…

There are better posts out there justifying/vilifying the choice of one or multiple repositories; this section’s meant only to show you how to incorporate a single repository containing multiple modules into your R10k workflow.

From the last post you’ll remember that the Puppetfile allows you to tie a repository, and some version reference, to a directory using R10k. Incorporating a monolithic repository starts with an entry in the Puppetfile like so:

Puppetfile
mod "my_big_module_repo",
  :git => "git://github.com/glarizza/my_big_module_repo.git",
  :ref => '1.0.0'

NOTE: That git repository doesn’t exist. I don’t HAVE a monolithic repo to demonstrate, so I’ve chosen an arbitrary URI. Also note that you can use ANY name you like after the mod syntax to name the resultant folder – it doesn’t HAVE to mirror the URI of the repository.

Adding this entry to the Puppetfile would check out that repository to wherever all the other modules are checked out, with a folder name of ‘my_big_module_repo’. That folder would most likely (again, depending on how you’ve laid out your repository) contain subfolders that are themselves Puppet modules. This entry gets the modules onto your Puppet master, but it doesn’t make Puppet aware of their location. For that, we’re going to need to add an entry to the ‘modulepath’ configuration item in puppet.conf.

Inside /etc/puppetlabs/puppet/puppet.conf you should see a configuration item called ‘modulepath’ that currently has a value of:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules

The modulepath itself works like a PATH environment variable in Linux – it’s a priority-based lookup mechanism that Puppet uses to find modules. Currently, Puppet will first look in /etc/puppetlabs/puppet/environments/$environment/modules for a module. If the module Puppet is looking for is found there, Puppet will use it and not inspect the second path. If the module was not found at the FIRST path, it will inspect the second path. Failing to find the module at the second path results in a compilation error for Puppet. Using this to our advantage, we can add the path to the monolithic repository checked-out by the Puppetfile AFTER the path to where all the individual modules are checked-out. This should look something like this:

modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/etc/puppetlabs/puppet/environments/$environment/modules/my_big_module_repo:/opt/puppet/share/puppet/modules

Note: This assumes all modules are in the root of the monolithic repo. If they’re in a subdirectory, you must adjust accordingly

That’s a huge line (and if you’re afraid of anything over 80 column-widths then I’m sorry…and you should probably buy a new monitor…and the 80s are over), but the gist is that we’re first going to look for modules checked out by R10k, THEN we’re going to look for modules in our monolithic repo, then we’re going to look in Puppet Enterprise’s vendored module directory, and finally, like I said above, we’ll fail if we can’t find our module. This will allow you to KEEP using your monolithic repository and also slowly cut modules inside that monolithic repo over to their own repositories (since when they gain their own repository, they will be located in a path that COMES before the monolithic repo, and thus will be given priority).

Using MCollective to perform R10k synchronizations

This section is going to be much less specific than the rest because the piece that does the ACTION is part of a module for R10k. As of the time of this writing, this agent is in one state, but that could EASILY change. I will defer to the module in question (and specifically its README file) should you need specifics (or if my module is dated). What I CAN tell you, however, is that the R10k module does come with a class that will set up and configure both an MCollective agent for R10k and also a helper application that should make doing R10k synchronizations on multiple Puppet masters much easier than doing them by hand. First, you’ll need to INSTALL the MCollective agent/application, and you can do that by pulling down the module and its dependencies, and classifying all Puppet masters with R10k enabled by doing the following:

include r10k::mcollective

Terribly difficult, huh? With that, both the MCollective agent and application should be available to MCollective on that node. The way to trigger a synchronization is to log in to an account on a machine that has MCollective client access (in Puppet Enterprise, this would be any Puppet master that’s allowed the role, and then, specifically, the peadmin user…so doing a su - peadmin should afford you access to that user), and perform the following command:

mco r10k deploy

This is where the README differs a bit, and the reason for that is because Finch changed the syntax that R10k uses to synchronize and deploy modules to a Master. The CURRENTLY accepted command (because, knowing Finch, that shit might change) is r10k deploy environment -p, and the action to the MCollective agent that EXECUTES that command is the ‘deploy’ action. The README refers to the ‘synchronize’ action, which executes the r10k synchronize command. This command MAY STILL WORK, but it’s deprecated, and so it’s NOT recommended to be used.

Like I said before, this agent is subject to change (mainly due to R10k command deprecation and maturation), so definitely refer to the README and the code itself for more information (or file issues and pull requests on the module repo directly).

Tying R10k to CI workflows

I spent a year doing some presales work for the Puppet Labs SE team, so I can hand-wave and tapdance like a motherfucker. I’m going to need those skills for this next section, because if you thought the previous section glossed over the concepts pretty quickly and without much detail, then this section is going to feel downright vaporous (is that a word? Fuck it; I’m handwaving – it’s a word). I really debated whether to include the following sections in this post because I don’t really give you much specific information; it’s all very generic and full of “ideas” (though I do list some testing libraries below that are helpful if you’ve never heard of them). Feel free to abandon ship and skip to the FINAL section right now if you don’t want to hear about ‘ideas’.

For the record, I’m going to just pick and use the term “CI” when I’m referring to the process of automating the testing and deployment of, in this case, Puppet code. There have definitely been posts arguing about which definition is more appropriate, but, frankly, I’m just going to pick a term and go with it.

The issue at hand is that when you talk “CI” or “CD” or “Continuous (fill_in_the_blank)”, you’re talking about a workflow that’s tailored to each organization (and sometimes each DEPARTMENT of an organization). Sometimes places can agree on a specific tool to assist them with this process (be it Jenkins, Hudson, Bamboo, or whatever), but beyond that it’s anyone’s game.

Since we’re talking PUPPET code, though, you’re restricted to certain tasks that will show up in any workflow…and THAT is what I want to talk about here.

To implement some sort of CI workflow means laying down a ‘pipeline’ that takes a change of your Puppet code (a new module, a change to an existing module, some Hiera data updates, whatever) from the developer’s/operations engineer’s workstation right into production. The way we do this with R10k currently is to:

  • Make a change to an individual module
  • Commit/push those changes to the module’s remote repository
  • Create a test branch of the puppet_repository
  • Modify the Puppetfile and tie your module’s changes to this environment
  • Commit/push those changes to the puppet_repository
  • Perform an R10k synchronization
  • Test
  • Repeat steps 1-7 as necessary until shit works how you like it
  • Merge the changes in the test branch of the puppet_repository with the production branch
  • Perform an R10k synchronization
  • Watch code changes become active in your production environment

Of those steps, there’s arguably about 3 unique steps that could be automated:

  • R10k synchronizations
  • ‘Testing’ (whatever that means)
  • Merging the changes in the test branch of the puppet_repository with the production branch

NOTE: As we get progressively-more-handwavey (also probably not a word, but fuck it – let’s be thought leaders and CREATE IT), each one of these steps is going to be more and more…generic. For example – to say “test your code” is a great idea, but, seriously, defining how to do that could (and should) be multiple blog posts.

Laying down the pipeline

If I were building an automated workflow, the first thing I would do is setup something like Jenkins and configure it to watch the puppet_repository that contains the Puppetfile mapping all my modules and versions to Puppet environments. On changes to this repository, we want Jenkins to perform an R10k synchronization, run tests, and then, possibly, merge those changes into production (depending on the quality of your tests and how ‘webscale’ you think you are on that day).

R10k synchronizations

If you’re paying attention, we solved this problem in the previous section with the R10k MCollective agent. Jenkins should be running on a machine that has the ability to execute MCollective client commands (such as triggering mco r10k deploy when necessary). You’ll want to tailor your calls from Jenkins to only deploy environments it’s currently testing (remember in the puppet_repository that topic branches map to Puppet environments, so this is a per-branch action) as opposed to deploying ALL environments every time.

Also, if you’re buiding a pipeline, you might not want to do R10k synchronizations on ALL of your Puppet Masters at this point. Why not? Well, if your testing framework is good enough and has sufficient coverage that you’re COMPLETELY trusting it to determine whether code is acceptable or not, then this is just the FIRST step – making the code available to be tested. It’s not passed tests yet, so pushing it out to all of your Puppet masters is a bit wasteful. You’ll probably want to only synchronize with a single master that’s been identified for testing (and a master that has the ability to spin up fresh nodes, enforce the Puppet code on them, submit those nodes to a battery of tests, and then tear them down when everything has been completed).

If you’re like the VAST majority of Puppet users out there that DON’T have a completely automated testing framework that has such complete coverage that you trust it to determine whether code changes are acceptable or not, then you’re probably ‘testing’ changes manually. For these people, you’ll probably want to synchronize code to whichever Puppet master(s) are suitable.

The cool thing about these scenarios is that MCollective is flexible enough to handle this. MCollective has the ability to filter your nodes based on things like available MCollective agents, Facter facts, Puppet classes, and even things like the MD5 hashes of arbitrary files on the filesystem…so however you want to restrict synchronization, you can do it with MCollective.
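As a hedged example (the class name is hypothetical, and the exact action syntax is whatever your agent’s README says), limiting a deploy to masters that carry a particular Puppet class looks something like this:

# Only target MCollective nodes classified with a (made up) testing-master class
mco r10k deploy --with-class role::puppet::testing_master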

After all of that, the answer here is “Use MCollective to do R10k syncs/deploys.”

Testing

This section needs its own subset of blog posts. There are all kinds of tools that will allow you to test all sorts of things about your Puppet code (from basic syntax checking and linting, to integration tests that check for the presence of resources in the catalog, to acceptance-level tests that check the end-state of the system to make sure Puppet left it in a state that’s acceptable). The most common tools for these types of tests include things like puppet-lint, rspec-puppet, Beaker, and Serverspec.

Unfortunately, the point of this section is NOT to walk you through setting up one or more of those tools (I’d love to write those posts soon…), but rather to make you aware of their presence and identify where they fit in our Pipeline.

Once you’ve synchronized/deployed code changes to a specific machine (or subset of machines), the next step is to trigger tests.

Backing up the train a bit, certain kinds of ‘tests’ should be done WELL in advance of this step. For example, if code changes don’t even pass basic syntax checking and linting, they shouldn’t even MAKE it into your repository. Things like pre-commit hooks will allow you to trigger syntactical checks and linting before a commit is allowed. We’re assuming you’ve already set those up (and if you’ve NOT, then you should probably do that RIGHT NOW).
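If you haven’t set one up before, a bare-bones pre-commit hook might look something like the sketch below (it assumes puppet and puppet-lint are installed on the workstation, and it only checks staged .pp files):

#!/bin/bash
# .git/hooks/pre-commit -- reject commits containing Puppet manifests that fail
# syntax validation or linting
for manifest in $(git diff --cached --name-only --diff-filter=ACM | grep '\.pp$'); do
  puppet parser validate "$manifest" || exit 1
  puppet-lint "$manifest" --fail-on-warnings || exit 1
done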

Rather, in this section, we’re talking about doing some basic integration smoke testing (i.e. running the rspec-puppet tests on all the modules to ensure that what we EXPECT in the catalog is actually IN the catalog), moving into acceptance level testing (i.e. spinning up pristine/clean nodes, actually applying the Puppet code to the nodes, and then running things like Beaker or Serverspec on the nodes to check the end-state of things like services, open ports, configuration files, and whatever to ensure that Puppet ACTUALLY left the system in a workable state), and then returning a “PASS” or “FAIL” response to Jenkins (or whatever is controlling your pipeline).

These tests can be as thorough or as loose as is acceptable to you (obviously, the goal is to automate ALL of your tests so you don’t have to manually check ANY changes, but that’s the nerd-nirvana state where we’re all browsing the web all day), but they should catch the most NOTORIOUS and OBVIOUS things FIRST. Follow the same rules you did when you got started with Puppet – catch the things that are easiest to catch and start building up your cache of “Total Time Saved.”

Jenkins needs to be able to trigger these tests from wherever it’s running, so your Jenkins box needs the ability to, say, spin up nodes in ESX, or locally with something like Vagrant, or even cloud nodes in EC2 or GCE, then TRIGGER the tests, and finally get a “PASS” or “FAIL” response back. The HARDEST part here, by far, is that you have to define what level of testing you’re going to implement, how you’re going to implement it, and devise the actual process to perform the testing. Like I said before, there are other blog posts that talk about this (and I hope to tackle this topic in the very near future), so I’ll leave it to them for the moment.

To merge or not to merge

The final step for any test code is to determine whether it should be merged into production or not. Like I said before, if your tests are sufficient and are adequate at determining whether a change is ‘good’ or not, then you can look at automating the process of merging those changes into production and killing off the test branch (or, NOT merging those changes, and leaving the branch open for more changes).

Automatically merging is scary for obvious reasons, but it’s also a good ‘test’ for your test coverage. Committing to a ‘merge upon success’ workflow takes trust, and there’s absolutely no shame in leaving this step to a human, to a change review board, or to some out-of-band process.

Use your illusion

These are the most common questions I get asked after the initial shock of R10k, and its workflow, wears off. Understand that I do these posts NOT from a “Here’s what you should absolutely be doing!” standpoint, but more from a “Here’s what’s going on out there.” vantage. Every time I’m called on-site with a customer, I evaluate:

  • The size and experience level of the team involved
  • The processes that the team must adhere to
  • The Puppet experience level of the team
  • The goals of the team

Frankly, after all those observations, sometimes I ABSOLUTELY come to the conclusion that something like R10k is entirely-too-much process for not-enough benefit. For those who are a fit, though, we go down the checklists and tailor the workflow to the environment.

What more IS there on R10k?

I do have at least a couple of more posts in me on some specific issues I’ve hit when consulting with companies using R10k, such as:

  • How best to use Hiera and R10k with Puppet ‘environments’ and internal, long-term ‘environments’
  • Better ideas on ‘what to branch and why’ with regard to component modules and the puppet_repository
  • To inherit or not to inherit with Roles
  • How to name things (note that I work for Puppet Labs, so I’m most likely very WRONG with this section)
  • Other random things I’ve noticed…

Also, I apologize if it’s been a while since I’ve replied to a couple of comments. I’m booked out 3 months in advance and things are pretty wild at the moment, but I’m REALLY thankful for everyone who cares enough to drop a note, and I hope I’m providing some good info you can actually use! Cheers!

Building a Functional Puppet Workflow Part 3: Dynamic Environments With R10k

Workflows are like kickball games: everyone knows the general idea of what’s going on, there’s an orderly progression towards an end-goal, nobody wants to be excluded, and people lose their shit when they get hit in the face by a big rubber ball. Okay, so maybe it’s not a perfect mapping but you get the idea.

The previous two posts (one and two) focused on writing modules, wrapping modules, and classification. While BOTH of these things are very important in the grand scheme of things, one of the biggest problems people get hung up on is this: how do you iterate on your modules, and, more importantly, how do you eventually get those changes pushed into production in a reasonably orderly fashion?

This post is going to be all over the place. We’re gonna cover the idea of separate environments in Puppet, touch on dynamic environments, and round it out with that mother-of-a-shell-script-turned-personal-savior, R10k. Hold on to your shit.

Puppet Environments

Puppet has the concept of ‘environments’ where you can logically separate your modules and manifest (read: site.pp) into separate folders to allow for nodes to get entirely separate bits of code based on which ‘environment’ the node belongs to.
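For reference, a node ends up in an environment either via a one-off flag on the agent or via its own puppet.conf (the environment name below is just an example):

# One-off test run against a non-default environment
puppet agent -t --environment development

# ...or persistently, in the agent's /etc/puppetlabs/puppet/puppet.conf
[agent]
  environment = development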

Puppet environments are statically set in puppet.conf, but, as other blog posts have noted, you can do some crafty things in puppet.conf to give you the solution of having ‘dynamic environments’.

NOTE: The solutions in this post are going to rely on Puppet environments; however, environments aren’t without their own shortcomings (namely, this bug on Ruby plugins in Puppet). For testing and promoting Puppet classes written in the DSL, environments will help you out greatly. For complete separation of Ruby instances and any plugins to Puppet written in Ruby, however, you’ll need separate masters (which is something that I won’t be covering in this article).

One step further – ‘dynamic’ environments

Adrien Thebo, hereafter known as ‘Finch’ – who is known for building awesome things and talking like he’s fresh from a Redbull binge – created the now-famous blog post on creating dynamic environments in Puppet with git. That post relied upon a post-commit hook to do all the jiggery-pokery necessary to check out the correct branches in the correct places, and thus it had a heavy reliance upon git.

Truly, the only magic in puppet.conf was the inclusion of ‘$environment’ in the modulepath configuration entry on the Puppet master (literally that string and not the evaluated form of your environment). By doing that, the Puppet master would replace the string ‘$environment’ with the environment of the node checking in and would look to that path for Puppet manifests and modules. If you use something OTHER than git, it would be up to you to create a post-receive hook that populated those paths, but you could still replicate the results (albeit with a little work on your part).
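In other words, the master’s puppet.conf ends up with something like the following (a sketch – the paths mirror the PE layout used later in this post, and note that $environment is the LITERAL string):

# /etc/puppetlabs/puppet/puppet.conf on the master
[main]
  modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules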

People used this pattern and it worked fairly well. Hell, it STILL works fairly well; nothing has changed to STOP you from using it. What changed, however, was the ecosystem around modules, the need for individual module testing, and the further need to automate this whole goddamn process.

Before we deliver the ‘NEW SOLUTION’, let’s provide a bit of history and context.

Module repositories: the one-to-many problem

I touched on this topic in the first post, but one of the first problems you encounter when putting your modules in version control is whether or not to have ONE GIANT REPO with all of your modules, or a repository for every module you create. In the past we recommended putting every module in one repository (namely because it was easier, the module sharing landscape was pretty barren, and teams were smaller). Now, we recommend the opposite for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs within a single repository; having multiple repos allows per-module security settings.
  • With a monolithic repository, modules you pull down from the Forge (or Github) must be committed to your repo. Having a separate repository for each module allows you to keep everything separate
  • Publishing this module to the Forge (or Github/Stash/whatever) is easier with separate repos (rather than having to split-out the module later).

The problem with having a repository for every Puppet module you create is that you need a way to map every module to every Puppet master (and also to decide which version of every module should be installed in which Puppet environment).

A project called librarian-puppet sprang up that created the ‘Puppetfile’, a file that would map modules and their versions to a specific directory. Librarian was awesome, but, as Finch noted in his post, it had some shortcomings when used in an environment with many and fast-changing modules. His solution, which he documented here, was the tool we now know as R10k.

Enter R10k

R10k is essentially a Ruby project that wraps a bunch of shell commands you would NORMALLY use to maintain an environment of ever-changing Puppet modules. Its power is in its ability to use Git branches combined with a Puppetfile to keep your Puppet environments in-sync. Because of this, R10k is CURRENTLY restricted to git. There have been rumblings of porting it to Hg or svn, but I know of no serious attempts at doing this (and if you ARE doing this, may god have mercy on your soul). Great, so how does it work?

Well, you’ll need one main repository SIMPLY for tracking the Puppetfile. I’ve got one right here, and it only has my Puppetfile and a site.pp file for classification (should you use it).

NOTE: The Puppetfile and librarian-puppet-like capabilities under the hood are going to be doing most of the work here – this repository is solely so you can create topic branches with changes to your Puppetfile that will eventually become dynamically-created Puppet environments.

Let’s take a look at the Puppetfile and see what’s going on:

Puppetfile
forge "http://forge.puppetlabs.com"

# Modules from the Puppet Forge
mod "puppetlabs/stdlib"
mod "puppetlabs/apache", "0.11.0"
mod "puppetlabs/pe_gem"
mod "puppetlabs/mysql"
mod "puppetlabs/firewall"
mod "puppetlabs/vcsrepo"
mod "puppetlabs/git"
mod "puppetlabs/inifile"
mod "zack/r10k"
mod "gentoo/portage"
mod "thias/vsftpd"


# Modules from Github using various references
mod "wordpress",
  :git => "git://github.com/hunner/puppet-wordpress.git",
  :ref => '0.4.0'

mod "property_list_key",
  :git => "git://github.com/glarizza/puppet-property_list_key.git",
  :ref => '952a65d9ea2c5809f4e18f30537925ee45548abc'

mod 'redis',
  :git => 'git://github.com/glarizza/puppet-redis',
  :ref => 'feature/debian_support'

This example lists the syntax for dealing with modules from both the Forge and Github, as well as pulling specific versions of modules (whether versions in the case of the Forge, or Github references as tags, branches, or even specific commits). The syntax is not hard to follow – just remember that we’re mapping modules and their versions to a set/known environment.

For every topic branch on this repository (containing the Puppetfile), R10k will in turn create a Puppet environment with the same name. For this reason, it’s convention to rename the ‘master’ branch to ‘production’ since that’s the default environment in Puppet (note that renaming branches locally is easy – renaming the branch on Github can sometimes be a pain in the ass). You will also note why it’s going to be somewhat hard to map R10k to subversion, for example, due to the lack of lightweight branching schemes.
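Renaming the branch locally is just a couple of git commands (getting GitHub to treat ‘production’ as the default branch is the fiddly part – flip that in the repo’s settings before deleting the old branch):

# Rename the local 'master' branch to 'production' and push it up
git branch -m master production
git push origin production

# Once the default branch on GitHub points at 'production', the old remote branch can go
git push origin :master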

Explaining any more about R10k would read just like describing its installation, so let’s quit screwing around and actually INSTALL/SETUP the damn thing.

Setting up R10k

As I mentioned before, we have the main repository that will be used to track the Puppetfile, which in turn will track the modules to be installed (whether from The Forge, Github, or some internal git repo). Like any good Puppet component, R10k itself can be set up with a Puppet module. The module I’ll be using was developed by Zack Smith, and it’s pretty simple to get started with. Let’s download it from the forge first:

[root@master1 vagrant]# puppet module install zack/r10k
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└─┬ zack-r10k (v1.0.2)
  ├─┬ gentoo-portage (v2.1.0)
  │ └── puppetlabs-concat (v1.0.1)
  ├── mhuffnagle-make (v0.0.2)
  ├── puppetlabs-gcc (v0.1.0)
  ├── puppetlabs-git (v0.0.3)
  ├── puppetlabs-inifile (v1.0.1)
  ├── puppetlabs-pe_gem (v0.0.1)
  ├── puppetlabs-ruby (v0.1.0)
  └── puppetlabs-vcsrepo (v0.2.0)

The module will be installed into the first path in your modulepath, which in the case above is /etc/puppetlabs/puppet/modules. This modulepath will change due to the way we’re going to setup our dynamic Puppet environments. For this example, I’m going to have environments dynamically generated at /etc/puppetlabs/puppet/environments, so let’s create that directory first:

[root@master1 vagrant]# mkdir -p /etc/puppetlabs/puppet/environments

Now, we need to setup R10k on this machine. The module we downloaded will allow us to do that, but we’ll need to create a small Puppet manifest that will allow us to setup R10k out-of-band from a regular Puppet run (you CAN continuously-enforce R10k configuration in-band with your regular Puppet run, but if we’re setting up a Puppet master to use R10k to serve out dynamic environments it’s possible to create a chicken-and-egg situation.). Let’s generate a file called r10k_installation.pp in /var/tmp and have it look like the following:

/var/tmp/r10k_installation.pp
class { 'r10k':
  version           => '1.1.3',
  sources           => {
    'puppet' => {
      'remote'  => 'https://github.com/glarizza/puppet_repository.git',
      'basedir' => "${::settings::confdir}/environments",
      'prefix'  => false,
    }
  },
  purgedirs         => ["${::settings::confdir}/environments"],
  manage_modulepath => true,
  modulepath        => "${::settings::confdir}/environments/\$environment/modules:/opt/puppet/share/puppet/modules",
}

So what is every section of that declaration doing?

  • version => '1.1.3' sets the version of the R10k gem to install
  • sources => {...} is a hash of sources that R10k is going to track. For now it’s only our main Puppet repo, but you can also track a Hiera installation too. This hash accepts key/value pairs for configuration settings that are going to be written to /etc/r10k.yaml, which is R10k’s main configuration file. The keys in-use are remote, which is the path to the repository to-be-checked-out by R10k, basedir, which is the path on-disk to where dynamic environments are to be created (we’re using the $::settings::confdir variable which maps to the Puppet master’s configuration directory, or /etc/puppetlabs/puppet), and prefix which is a boolean to determine whether to use R10k’s source-prefixing feature. NOTE: the false value is a BOOLEAN value, and thus SHOULD NOT BE QUOTED. Quoting it turns it into a string, which matches as a boolean TRUE value. Don’t quote false – that’s bad, mmkay.
  • purgedirs => ["${::settings::confdir}/environments"] is configuring R10k to implement purging on the environments directory (so any folders that R10k didn’t create will be deleted). This configuration MAY be moot with newer versions of R10k as I believe it implements this behavior by default.
  • manage_modulepath => true will ensure that this module sets the modulepath configuration item in /etc/puppetlabs/puppet/puppet.conf
  • modulepath => ... sets the modulepath value to be dropped into /etc/puppetlabs/puppet/puppet.conf. Note that we are interpolating variables ($::settings::confdir again), AND inserting the LITERAL string of $environment into the modulepath – this is because Puppet will replace $environment with the value of the agent’s environment at catalog compilation.

JUST IN CASE YOU MISSED IT: Don’t quote the false value for the prefix setting in the sources block. That is all.
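For reference, the /etc/r10k.yaml this declaration writes out looks roughly like the following (the cachedir value is an assumption on my part – check the file the module actually generates on your master):

# /etc/r10k.yaml as generated for the declaration above (roughly)
:cachedir: '/var/cache/r10k'
:sources:
  :puppet:
    remote: 'https://github.com/glarizza/puppet_repository.git'
    basedir: '/etc/puppetlabs/puppet/environments'
    prefix: false
:purgedirs:
  - '/etc/puppetlabs/puppet/environments'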

Okay, we have our one-time Puppet manifest, and now the only thing left to do is to run it:

[root@master1 tmp]# puppet apply /var/tmp/r10k_installation.pp
Notice: Compiled catalog for master1 in environment production in 2.05 seconds
Notice: /Stage[main]/R10k::Config/File[r10k.yaml]/ensure: defined content as '{md5}0b619d5148ea493e2d6a5bb205727f0c'
Notice: /Stage[main]/R10k::Config/Ini_setting[R10k Modulepath]/value: value changed '/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules' to '/etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules'
Notice: /Package[r10k]/ensure: created
Notice: /Stage[main]/R10k::Install::Pe_gem/File[/usr/bin/r10k]/ensure: created
Notice: Finished catalog run in 10.55 seconds

At this point, it goes without saying that git needs to be installed, but if you’re firing up a new VM that DOESN’T have git, then R10k is going to spit out an awesome error – so ensure that git is installed. After that, let’s synchronize R10k with the r10k deploy environment -pv command (-p for Puppetfile synchronization and -v for verbose mode):

[root@master1 puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying make into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying concat into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying ruby into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying redis into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/master/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying property_list_key into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/development/modules
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

I ran this first synchronization in verbose mode so you can see exactly what’s getting copied where. Further synchronizations don’t have to be in verbose mode, but it’s good for debugging. After all of that, we have an /etc/puppetlabs/puppet/environments folder containing our dynamic Puppet environments based off of the branches of the main Puppet repo:

[root@master1 puppet]# ls -lah /etc/puppetlabs/puppet/environments/
total 20K
drwxr-xr-x 5 root root 4.0K Feb 19 11:44 .
drwxr-xr-x 7 root root 4.0K Feb 19 11:25 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 puppet]# cd /etc/puppetlabs/puppet/environments/production/
[root@master1 production]# git branch -a
  master
* production
  remotes/origin/HEAD -> origin/master
  remotes/origin/development
  remotes/origin/master
  remotes/origin/production

As you can see (at the time of this writing), my main Puppet repo has three main branches: development, master, and production, and so R10k created three Puppet environments matching those names. It’s somewhat of a convention to rename the master branch to production, but in this case I left it alone to demonstrate how this works.
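
For reference, in this pre-directory-environments era the puppet.conf settings that make the master serve these directory-per-branch environments look roughly like this (the paths match the demo layout above; treat it as a sketch, not gospel):

[main]
  # $environment is interpolated at catalog-compile time, so each branch/environment
  # gets its own modulepath and site.pp
  modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules
  manifest   = /etc/puppetlabs/puppet/environments/$environment/manifests/site.pp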

ONE OTHER BIG GOTCHA: R10k does NOT resolve dependencies, and so it is UP TO YOU to track them in your Puppetfile. Check this out:

[root@master1 production]# puppet module list
Warning: Module 'puppetlabs-firewall' (v1.0.0) fails to meet some dependencies:
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-firewall' (v0.3.x)
Warning: Module 'puppetlabs-stdlib' (v4.1.0) fails to meet some dependencies:
  'puppetlabs-pe_accounts' (v2.0.1) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-pe_mcollective' (v0.1.14) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'puppetlabs-stdlib' (v3.2.x)
  'puppetlabs-request_manager' (v0.0.10) requires 'puppetlabs-stdlib' (v3.2.x)
Warning: Missing dependency 'cprice404-inifile':
  'puppetlabs-pe_puppetdb' (v0.0.11) requires 'cprice404-inifile' (>=0.9.0)
  'puppetlabs-puppet_enterprise' (v3.1.0) requires 'cprice404-inifile' (v0.10.x)
  'puppetlabs-puppetdb' (v1.5.1) requires 'cprice404-inifile' (>= 0.10.3)
Warning: Missing dependency 'puppetlabs-concat':
  'puppetlabs-apache' (v0.11.0) requires 'puppetlabs-concat' (>= 1.0.0)
  'gentoo-portage' (v2.1.0) requires 'puppetlabs-concat' (v1.0.x)
Warning: Missing dependency 'puppetlabs-gcc':
  'zack-r10k' (v1.0.2) requires 'puppetlabs-gcc' (>= 0.0.3)
/etc/puppetlabs/puppet/environments/production/modules
├── gentoo-portage (v2.1.0)
├── mhuffnagle-make (v0.0.2)
├── property_list_key (???)
├── puppetlabs-apache (v0.11.0)
├── puppetlabs-firewall (v1.0.0)  invalid
├── puppetlabs-git (v0.0.3)
├── puppetlabs-inifile (v1.0.1)
├── puppetlabs-mysql (v2.2.1)
├── puppetlabs-pe_gem (v0.0.1)
├── puppetlabs-ruby (v0.1.0)
├── puppetlabs-stdlib (v4.1.0)  invalid
├── puppetlabs-vcsrepo (v0.2.0)
├── redis (???)
├── ripienaar-concat (v0.2.0)
├── thias-vsftpd (v0.2.0)
├── wordpress (???)
└── zack-r10k (v1.0.2)
/opt/puppet/share/puppet/modules
├── cprice404-inifile (v0.10.3)
├── puppetlabs-apt (v1.1.0)
├── puppetlabs-auth_conf (v0.1.7)
├── puppetlabs-firewall (v0.3.0)  invalid
├── puppetlabs-java_ks (v1.1.0)
├── puppetlabs-pe_accounts (v2.0.1)
├── puppetlabs-pe_common (v0.1.0)
├── puppetlabs-pe_mcollective (v0.1.14)
├── puppetlabs-pe_postgresql (v0.0.5)
├── puppetlabs-pe_puppetdb (v0.0.11)
├── puppetlabs-postgresql (v2.5.0)
├── puppetlabs-puppet_enterprise (v3.1.0)
├── puppetlabs-puppetdb (v1.5.1)
├── puppetlabs-reboot (v0.1.2)
├── puppetlabs-request_manager (v0.0.10)
├── puppetlabs-stdlib (v3.2.0)  invalid
└── ripienaar-concat (v0.2.0)

I’ve installed Puppet Enterprise 3.1.0, and so /opt/puppet/share/puppet/modules reflects the state of the PE modules at that time. You can see that there are some conflicts because certain modules require certain versions of other modules. This is currently the nature of the beast with regard to Puppet modules. Some of these errors are loud and incidental (i.e. someone set a dependency on a version and forgot to update it), some are due to namespace changes (i.e. cprice404-inifile being ported over to puppetlabs-inifile), and so on. Basically, ensure that you handle the dependencies you care about inside the Puppetfile, as R10k won’t do it for you.
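
As a purely illustrative sketch (the versions here are made up), tracking those dependencies yourself in the Puppetfile might look something like this:

Puppetfile
# R10k won't resolve these for you, so declare every dependency you care about
mod 'puppetlabs/concat',  '1.0.0'   # required by apache and portage
mod 'puppetlabs/inifile', '1.0.1'   # replaces the old cprice404-inifile namespace
mod 'puppetlabs/gcc',     '0.1.0'   # required by zack-r10k

Forge-style entries like these (or git entries pinned to a ref) both work; the point is that every dependency you care about gets its own mod line.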

There – we’ve done it! We’ve configured R10k! Now how the hell do you use it?

R10k demonstration – from module iteration to environment iteration

Let’s take the environment we’ve set up in the previous steps and walk through adding a new module to your production environment, iterating on that module, pushing the changes to that module, pushing the changes to a Puppet environment, and then promoting those changes to production.

NOTES ON THE SETUP OF THIS DEMO:

  • In this demonstration, the classification method is going to be left to the user (i.e. it’s not a part of the magic). So, when I tell you to classify your node with a specific class, I don’t care if you use the Puppet Enterprise Console, site.pp, or any other manner.
  • I’m using Github for my repositories so that you folks watching and playing along at home can have something to follow. Feel free to substitute something like Atlassian Stash/Bitbucket, internal repos, or whatever for Github.

Add the module to an environment

The module we’ll be working with, a simple module called ‘notifyme’, notifies a message that will help us track the module’s progress through all phases of iteration.
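
The module itself is trivial; judging by the Puppet run output you’ll see below, its init.pp starts out as little more than this:

# notifyme/manifests/init.pp (reconstructed from the run output below)
class notifyme {
  notify { "This is the notifyme module and its master branch": }
}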

The first thing we need to do is to add the module to an environment, so let’s dynamically create a NEW environment by creating a new topic branch and pushing it up to the main puppet repo. I will perform this step on my laptop and outside of the VM I’m using to test R10k:

└(~/src/puppet_repository)▷ git branch
  master
* production

└(~/src/puppet_repository)▷ git checkout -b notifyme
Switched to a new branch 'notifyme'

└(~/src/puppet_repository)▷ vim Puppetfile

# Perform the changes to Puppetfile here

└(~/src/puppet_repository)▷ git add Puppetfile
└(~/src/puppet_repository)▷ git commit
[notifyme 5239538] Add the 'notifyme' module
 1 file changed, 3 insertions(+)

└(~/src/puppet_repository)▷ git push origin notifyme:notifyme
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      notifyme -> notifyme

The contents I added to my Puppetfile look like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git"

Perform an R10k synchronization

To pull the new dynamic environment down to the Puppet master, do another R10k synchronization with r10k deploy environment -pv:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
<snip for brevity>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment notifyme
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/notifyme/modules
<more snipping>

I only included the relevant messages, but you can see that it pulled in a new environment called ‘notifyme’ that ALSO pulled in a module called ‘notifyme’.

Rename the branch to avoid confusion

Suddenly I realize that it may get confusing having both an environment called ‘notifyme’ and a module/class called ‘notifyme’. No worries, how about we rename that branch?

└(~/src/puppet_repository)▷ git branch -m notifyme garysawesomeenvironment

└(~/src/puppet_repository)▷ git push origin :notifyme
To https://github.com/glarizza/puppet_repository.git
 - [deleted]         notifyme

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 348 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
 * [new branch]      garysawesomeenvironment -> garysawesomeenvironment

That bit of git renamed the ‘notifyme’ branch to ‘garysawesomeenvironment’. The second git command is a bit tricky – when you git push to a remote, it’s supposed to be:

git push name_of_origin local_branch_name:remote_branch_name

In our case, the name of our origin is LITERALLY ‘origin’, but we actually want to DELETE a remote branch. The way to delete a local branch is with git branch -d branch_name, but the way to delete a REMOTE branch is to push NOTHING to it. So consider the following command:

git push origin :notifyme

We’re pushing to the origin named ‘origin’, but providing NO local branch name and pushing that bit of nothing to the remote branch of ‘notifyme’. This kills (deletes) the remote branch.

Finally, we push to our origin named ‘origin’ again, sending the contents of the local branch ‘garysawesomeenvironment’ to the remote branch ‘garysawesomeenvironment’, which in turn CREATES that branch if it doesn’t exist. Whew. Let’s run another damn synchronization:

[root@master1 production]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<more snippage>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying notifyme into /etc/puppetlabs/puppet/environments/garysawesomeenvironment/modules
<more of that snipping shit>
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Cool, let’s check out our environments folder on our VM:

[root@master1 production]# ls -lah /etc/puppetlabs/puppet/environments/
total 24K
drwxr-xr-x 6 root root 4.0K Feb 19 13:34 .
drwxr-xr-x 7 root root 4.0K Feb 19 12:09 ..
drwxr-xr-x 4 root root 4.0K Feb 19 11:44 development
drwxr-xr-x 5 root root 4.0K Feb 19 13:33 garysawesomeenvironment
drwxr-xr-x 5 root root 4.0K Feb 19 11:43 master
drwxr-xr-x 5 root root 4.0K Feb 19 11:42 production

[root@master1 production]# cd /etc/puppetlabs/puppet/environments/garysawesomeenvironment/

[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

Run Puppet to test the new environment

Perfect! Now classify your node to include the ‘notifyme’ class, and let’s run Puppet to see what we get when we try to join the environment called ‘garysawesomeenvironment’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snipping facts loading for brevity>
Info: Caching catalog for master1
Info: Applying configuration version '1392845863'
Notice: This is the notifyme module and its master branch
Notice: /Stage[main]/Notifyme/Notify[This is the notifyme module and its master branch]/message: defined 'message' as 'This is the notifyme module and its master branch'
Notice: Finished catalog run in 11.10 seconds

Cool! Now let’s try to run Puppet with another environment, say ‘production’:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping facts loading for brevity>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class notifyme for master1 on node master1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

We get an error because that module hasn’t been loaded by R10k for that environment.

Tie a module version to an environment

Okay, so we added a module to a new environment, but what if we want to tie a specific commit, branch, or tag of a module to this new environment and test it out? This is frequently what you’ll be doing – making a change to an existing module, pushing your change to a topic branch of that module’s repository, tying it to an environment (or creating a new environment by branching the main Puppet repository), and then testing the change.

Let’s go back to my ‘notifyme’ module that I’ve cloned to my laptop and push a change to a BRANCH of that module’s Github repository:

└(~/src/puppet-notifyme)▷ git branch
* master

└(~/src/puppet-notifyme)▷ git checkout -b change_the_message
Switched to a new branch 'change_the_message'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the notify message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[change_the_message bc3975b] Change the Message
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin change_the_message:change_the_message
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 448 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      change_the_message -> change_the_message

└(~/src/puppet-notifyme)▷ git branch -a
* change_the_message
  master
  remotes/origin/change_the_message
  remotes/origin/master

└(~/src/puppet-notifyme)▷ git log
commit bc3975bb5c75ada86bfc2c45db628b5a156f85ce
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 13:55:26 2014 -0800

    Change the Message

    This commit changes the message to test my workflow.

What I’m showing you is the workflow that creates a new local branch called ‘change_the_message’ in the notifyme module, changes the message in my notify resource, commits the change, and pushes the changes to a remote branch ALSO called ‘change_the_message’.

Because I created a topic branch, I can provide that branch name in the Puppetfile located in the ‘garysawesomeenvironment’ branch of the main Puppet repo. THAT is the piece that ties together the specific version of the module with the Puppet environment we want on the Puppet master. Here’s that change:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'change_the_message'

Again, that change gets put into the ‘garysawesomeenvironment’ branch of the main Puppet repo and pushed up to the remote:

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment 89b139c] Update garysawesomeenvironment
 1 file changed, 2 insertions(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 411 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5239538..89b139c  garysawesomeenvironment -> garysawesomeenvironment

└(~/src/puppet_repository)▷ git log -p
commit 89b139c8c2faa888a402b98ea76e4ca138b3463d
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:04:18 2014 -0800

    Update garysawesomeenvironment

    Tie this environment to the 'change_the_message' branch of my notifyme module.

diff --git a/Puppetfile b/Puppetfile
index 5e5d091..27fc06e 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -31,4 +31,5 @@ mod 'redis',
   :ref => 'feature/debian_support'

 mod "notifyme",
-  :git => "git://github.com/glarizza/puppet-notifyme.git"
+  :git => "git://github.com/glarizza/puppet-notifyme.git",
+  :ref => 'change_the_message'

Now let’s synchronize again!!

[root@master1 garysawesomeenvironment]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
<snip>
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment garysawesomeenvironment
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
<snip>

Cool, let’s check our work on the VM:

[root@master1 garysawesomeenvironment]# pwd
/etc/puppetlabs/puppet/environments/garysawesomeenvironment
[root@master1 garysawesomeenvironment]# git branch
* garysawesomeenvironment
  master

And finally, let’s run Puppet:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392847743'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.10 seconds

TADA! We’ve successfully tied a specific version of a module to a specific dynamic environment, deployed it to a master, and tested it out! Smell that? That’s the smell of awesome. Or Jeff in the next cubicle eating a burrito. Either way, I like it.

Merge your changes with master/production

It’s green – fuck it; ship it! NOW you’re speaking ‘agile’! Assuming everything went according to plan, let’s merge our changes in with the production environment and synchronize. This is up to your company’s workflow docs (whether you use pull requests, a merge master, or poke Patrick and tell him to tell Andy to merge in your change). I’m using git and Github, so let’s merge.

First, do the Module:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge change_the_message
Updating d44a790..bc3975b
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   d44a790..bc3975b  master -> master

└(~/src/puppet-notifyme)▷ cat manifests/init.pp
class notifyme {
  notify { "This is the changed message in the change_the_message branch": }
}

So now we have an issue, and that issue is that the production environment has YET to have the ‘notifyme’ module added to it. If we merge the contents of the ‘garysawesomeenvironment’ branch with the ‘production’ branch of the main Puppet repo, then we’re going to be pointing at the ‘change_the_message’ branch of the ‘notifyme’ module (because that was our last commit).

Because of this, I can’t do a straight merge, can I? For posterity’s sake (in the event that someone in the future wants to look for that branch on my Github repo), I’m going to keep that branch alive. In a production environment, I most likely would NOT have additional branches open for all my component modules as that would get pretty annoying/confusing. Understand that this is a one-off case because I’m doing a demo. BECAUSE of this, I’m going to modify the Puppetfile in the ‘production’ branch of the main Puppet repo:

└(~/src/puppet_repository)▷ git checkout production
Switched to branch 'production'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make changes here

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[production a74f269] Add notifyme module to Production environment
 1 file changed, 4 insertions(+)

└(~/src/puppet_repository)▷ git push origin production:production
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 362 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   5ecefc8..a74f269  production -> production

└(~/src/puppet_repository)▷ git log -p
commit a74f26975102f3786eedddace89bda086162d801
Author: Gary Larizza <gary@puppetlabs.com>
Date:   Wed Feb 19 14:24:05 2014 -0800

    Add notifyme module to Production environment

diff --git a/Puppetfile b/Puppetfile
index 0b1da68..9168a81 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -29,3 +29,7 @@ mod "property_list_key",
 mod 'redis',
   :git => 'git://github.com/glarizza/puppet-redis',
   :ref => 'feature/debian_support'
+
+mod 'notifyme',
+  :git => 'git://github.com/glarizza/puppet-notifyme'
+

Alright, we’ve updated the production environment, now synchronize again (I’ll spare you and do it WITHOUT verbose mode):

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

Okay, now run Puppet with the PRODUCTION environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snipping fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 12.66 seconds

Beautiful, we’re synchronized!!!

Making a change to an EXISTING module in an environment

Okay, so we saw previously how to add a NEW module to an environment, but what if we already HAVE a module in an environment and we want to make an update/change to it? Well, it’s largely the same process:

  • Cut a branch to the module
  • Commit your code and push it up to the module’s repo
  • Cut a branch to the main Puppet repo
  • Push that branch up to the main Puppet repo
  • Perform an R10k synchronization to sync the environments
  • Test your changes
  • Merge the changes with the master branch of the module
  • DONE!

Let’s go back and change that notify message again, shall we?

└(~/src/puppet-notifyme)▷ git checkout -b 'another_change'
Switched to a new branch 'another_change'

└(~/src/puppet-notifyme)▷ vim manifests/init.pp
## Make changes to the message

└(~/src/puppet-notifyme)▷ git add manifests/init.pp

└(~/src/puppet-notifyme)▷ git commit
[another_change 608166e] Change the message that already exists!
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin another_change:another_change
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 426 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
 * [new branch]      another_change -> another_change

Okay, let’s re-use ‘garysawesomeenvironment’ because I like the name, but tie it to the new ‘another_change’ branch of the ‘notifyme’ module:

└(~/src/puppet_repository)▷ git checkout garysawesomeenvironment
Switched to branch 'garysawesomeenvironment'

└(~/src/puppet_repository)▷ vim Puppetfile
## Make change to Puppetfile to tie it to 'another_change' branch

└(~/src/puppet_repository)▷ git add Puppetfile

└(~/src/puppet_repository)▷ git commit
[garysawesomeenvironment ce84a30] Tie garysawesomeenvironment to 'another_change'
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet_repository)▷ git push origin garysawesomeenvironment:garysawesomeenvironment
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 386 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/glarizza/puppet_repository.git
   89b139c..ce84a30  garysawesomeenvironment -> garysawesomeenvironment

The Puppetfile for that branch now has an entry for the ‘notifyme’ module that looks like this:

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => 'another_change'

Okay, synchronize again!

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now run Puppet in the ‘garysawesomeenvironment’ environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment garysawesomeenvironment
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392849521'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 12.54 seconds

There’s the message that I changed in the ‘another_change’ branch of my ‘notifyme’ module! What’s it look like if I run in the ‘production’ environment, though?

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392848588'
Notice: This is the changed message in the change_the_message branch
Notice: /Stage[main]/Notifyme/Notify[This is the changed message in the change_the_message branch]/message: defined 'message' as 'This is the changed message in the change_the_message branch'
Notice: Finished catalog run in 14.11 seconds

There’s the old message that’s in the ‘master’ branch of the ‘notifyme’ module (which is where the ‘production’ branch Puppetfile is pointing). To merge the changes into the production environment, we now only have to do one thing: that’s merge the changes in the ‘another_change’ branch of the ‘notifyme’ module to the ‘master’ branch – that’s it! Why? Because the Puppetfile in the production branch of the main Puppet repo (and thus the production Puppet ENVIRONMENT) is already POINTING at the master branch of the ‘notifyme’ module. Let’s do the merge:

└(~/src/puppet-notifyme)▷ git checkout master
Switched to branch 'master'

└(~/src/puppet-notifyme)▷ git merge another_change
Updating bc3975b..608166e
Fast-forward
 manifests/init.pp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

└(~/src/puppet-notifyme)▷ git push origin master:master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/glarizza/puppet-notifyme.git
   bc3975b..608166e  master -> master

Another R10k synchronization is needed on the master:

[root@master1 garysawesomeenvironment]# r10k deploy environment -p

And now let’s run Puppet in the production environment:

[root@master1 garysawesomeenvironment]# puppet agent -t --environment production
Info: Retrieving plugin
<snip fact loading>
Info: Caching catalog for master1
Info: Applying configuration version '1392850004'
Notice: This changes the message that already exists!!!!
Notice: /Stage[main]/Notifyme/Notify[This changes the message that already exists!!!!]/message: defined 'message' as 'This changes the message that already exists!!!!'
Notice: Finished catalog run in 11.82 seconds

There’s the message that was previously in the ‘another_change’ branch that’s been merged to the ‘master’ branch (and thus is entered into the production Puppet environment).

OR, use tags

One more note – for production environments that want a BIT more stability (rather than hoping that someone follows the policy of pushing commits to a BRANCH of a module rather than pushing directly to master – by accident or otherwise – and allowing that commit to make it DIRECTLY into production), the better way is to tie all modules to some sort of release version. For modules released to the Puppet Forge, that’s a version; for modules stored in git repositories, that’s a tag. Tying all modules in your production environment (and thus production Puppetfile) to specific tags in git repositories IS a “best practice” to ensure that the code that’s executed in production has some sort of safeguard.
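
For example, the production Puppetfile entry would point at a tag instead of a branch (the tag name here is made up):

Puppetfile
mod "notifyme",
  :git => "git://github.com/glarizza/puppet-notifyme.git",
  :ref => '1.0.0'

R10k’s :ref happily accepts a tag name, so ‘promoting to production’ becomes cutting a new tag on the module and bumping this one line.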

TL;DR: Example tied to ‘master’ branch above was demo, and not necessarily recommended for your production needs.

Holy crap, that’s a lot to take in…

Yeah, tell me about it. And, believe it or not, I’m STILL not done with everything that I want to talk about regarding R10k – there’s still more info on:

  • Using R10k with a monolithic modules repo
  • Incorporating Hiera data
  • Triggering R10k with MCollective
  • Tying R10k to CI workflow

Those will come in a later post once I have time to decide how to tackle them. Until then, this should give you more than enough information to get started with R10k in your own environment.

If you have any questions/comments/corrections, PLEASE enter them in the comments below and I’ll be happy to respond when I’m not flying from gig to gig! :) Cheers!

EDIT: 2/19/2014 – correct librarian-puppet assumption thanks to Reid Vandewiele

Building a Functional Puppet Workflow Part 2: Roles and Profiles

In my first post, I talked about writing functional component modules. Well, I didn’t really do much detailing other than pointing out key bits of information that tend to cause problems. In this post, I’ll describe the next layer to the functional Puppet module workflow.

People usually stop once they have a library of component modules (whether hand-written, taken from Github, or pulled from The Forge). The idea is that you can classify all of your nodes in site.pp, the Puppet Enterprise Console, The Foreman, or with some other ENC, so why not just declare all your classes for every node when you need them?

Because that’s a lot of extra work and opportunities for fuckups.

People recognized this, so in the EARLY days of Puppet they would create node blocks in site.pp and use inheritance to inherit from those blocks. This was the right IDEA, but probably not the best PLACE for it. Eventually, ‘Profiles’ were born.

The idea of ‘Roles and Profiles’ originally came from a piece that Craig Dunn wrote while he worked for the BBC, and then Adrien Thebo also wrote a piece that documents the same sort of pattern. So why am I writing about it a THIRD time? Well, because I feel it’s only a PIECE of an overall puzzle. Hiera and other awesome tools (like R10k, which we will get to in the next post) still make Roles and Profiles VIABLE, but they also extend upon them.

One final note before we move on – the terms ‘Roles’ and ‘Profiles’ are ENTIRELY ARBITRARY. They’re not magic reserved words in Puppet, and you can call them whatever the hell you want. It’s also been pointed out that Craig MIGHT have misnamed them (a ROLE should be a model for an individual piece of tech, and a PROFILE should probably be a group of roles), but, like all good Puppet Labs employees, we suck at naming things.

Profiles: technology-specific wrapper classes

A profile is simply a wrapper class that groups Hiera lookups and class declarations into one functional unit. For example, if you wanted Wordpress installed on a machine, you’d probably need to declare the apache class to get Apache set up, declare an apache::vhost for the Wordpress directory, set up a MySQL database with the appropriate classes, and so on. There are a lot of components that go together when you set up a piece of technology; it’s not just a single class.

Because of this, a profile exists to give you a single class you can include that will setup all the necessary bits for that piece of technology (be it Wordpress, or Tomcat, or whatever).

Let’s look at a simple profile for Wordpress:

profiles/manifests/wordpress.pp
class profiles::wordpress {

  ## Hiera lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')
  $wordpress_user          = hiera('profiles::wordpress::wordpress_user')
  $wordpress_group         = hiera('profiles::wordpress::wordpress_group')
  $wordpress_docroot       = hiera('profiles::wordpress::wordpress_docroot')
  $wordpress_port          = hiera('profiles::wordpress::wordpress_port')

  ## Create user
  group { 'wordpress':
    ensure => present,
    name   => $wordpress_group,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => $wordpress_group,
    password => $wordpress_user_password,
    name     => $wordpress_user,
    home     => $wordpress_docroot,
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => $wordpress_port,
    docroot => $wordpress_docroot,
  }

  ## Configure wordpress
  class { '::wordpress':
    install_dir => $wordpress_docroot,
    db_name     => $wordpress_db_name,
    db_host     => $wordpress_db_host,
    db_password => $wordpress_db_password,
  }
}

Name your profiles according to the technology they setup

Profiles are technology-specific, so you’ll have one to setup wordpress, and tomcat, and jenkins, and…well, you get the picture. You can also namespace your profiles so that you have profiles::ssh::server and profiles::ssh::client if you want. You can even have profiles::jenkins::tomcat and profiles::jenkins::jboss or however you need to namespace according to the TECHNOLOGIES you use. You don’t need to include your environment in the profile name (a la profiles::dev::tomcat) as the bits of data that make the dev environment different from production should come from HIERA, and thus aren’t going to be different on a per-profile basis. You CAN setup profiles according to your business unit if multiple units use Puppet and have different setups (a la security::profiles::tomcat versus ops::profiles::tomcat), but the GOAL of Puppet is to have one main set of modules that every group uses (and the Hiera data being different for every group). That’s the GOAL, but I’m pragmatic enough to understand that not everywhere is a shiny, happy ‘DevOps Garden.’

Do all Hiera lookups in the profile

You’ll see that I declared variables and set their values with Hiera lookups. The profile is the place for these lookups because the profile collects all external data and declares all the classes you’ll need. In reality, you’ll USUALLY only see profiles looking up parameters and declaring classes (i.e. declaring users and groups like I did above will USUALLY be left to component classes).

I do the Hiera lookups first to make it easy to debug from where those values came. I don’t rely on ‘Automatic Parameter Lookup’ in Puppet 3.x.x because it can be ‘magic’ for people who aren’t aware of it (for people new to Puppet, it’s much easier to see a function call and trace back what it does rather than experience Puppet doing something unseen and wondering what the hell happened).

Finally, you’ll notice that my Hiera lookups have NO DEFAULT VALUES – this is BY DESIGN! For most people, their Hiera data is PROBABLY located in a separate repository from their Puppet module code. Imagine making a change to your profile to have it look up a bit of data from Hiera, and then imagine you FORGOT to put that data into Hiera. What happens if you provide a default value to Hiera? The catalog compiles, that default value gets passed down to the component module, and gets enforced on disk. If you have good tests, you MIGHT see that the component you configured has a bit of data that’s not correct, but what if you don’t have a great post-Puppet testing workflow? Puppet will happily set this default value, according to Puppet everything is green and worked just fine, but now your component is set up incorrectly. That’s one of the WORST failures – the ones that you don’t catch. Now, imagine you DON’T provide a default value. In THIS case, Puppet will raise a compilation error because a Hiera lookup didn’t return a value. You’ll catch the screwup before anything gets pushed to Production. This is a MUCH better solution.
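
To make that concrete, here are the two alternative lookups side by side (the key name is hypothetical):

## Alternative 1: no default - a missing Hiera key fails catalog compilation loudly (what we want)
$db_password = hiera('profiles::myapp::db_password')

## Alternative 2: with a default - a missing key compiles "successfully" and silently
## ships 'changeme' down to the component module
$db_password = hiera('profiles::myapp::db_password', 'changeme')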

Use parameterized class declarations and explicitly pass values you care about

The parameterized class declaration syntax can be dangerous. The difference between the include function and the parameterized class syntax is that the include function is idempotent. You can do the following in a Puppet manifest, and Puppet doesn’t raise an error:

include apache
include apache
include apache

This is because the include function checks to see if the class is in the catalog. If it ISN’T, then it adds it. If it IS, then it exits cleanly. The include function is your pal.

Consider THIS manifest:

class { 'apache': }
include apache
include apache

Does this work? Yep. The parameterized class syntax adds the class to the catalog, the include function detects this and exits cleanly twice. What about THIS manifest:

include apache
class { 'apache': }
include apache

Does THIS work? Nope! Puppet raises a compilation error because a class was declared more than once in a catalog. Why? Well, consider that Puppet is ‘declarative’…all the way up until it isn’t. Puppet’s PARSER reads from the top of the file to the bottom of the file, and we have a single-pass parser when it comes to things like setting variables and declaring classes. When the parser hits the first include function, it adds the class to the catalog. The parameterized class syntax, however, is a honey badger: it doesn’t give a shit. It adds a class to the catalog regardless of whether it already exists or not. So why would we EVER use the parameterized class declaration syntax? We need to use it because the include function doesn’t allow you to pass parameters when you declare a class.

So wait – why did I spend all this time explaining why the parameterized class syntax is more dangerous than the include function ONLY to recommend its use in profiles? For two reasons:

  • We need to use it to pass parameters to classes
  • We’re wrapping its use in a class that we can IN TURN declare with the include function

Yes, we can get the best of BOTH worlds, the ability to pass parameters and the use of our pal the include function, with this wrapper class. We’ll see the latter usage when we come to roles, but for now let’s focus on passing parameter values.

In the first section, we set variables with Hiera lookups, now we can pass those variables to classes we’re declaring with the parameterized class syntax. This allows the declaration of the class to be static, but the parameters we pass to that class to change according to the Hiera hierarchy. We’ve explicitly called the hiera function, so it makes it easier to debug, and we’re explicitly passing parameter values so we know definitively which parameters are being passed (and thus are overriding default values) to the component module. Finally, since our component modules do NOT use Hiera at all, we can be sure that if we’re not passing a parameter that it’s getting its value from the default set in the module’s ::params class.

Everything we do here is meant to make things easier to debug when it’s 3am and things aren’t working. Any asshole can do crazy shit in Puppet, but a seasoned sysadmin writes their code for ease of debugging during 3am pages.

An annoying Puppet bug – top-level class declarations and profiles

Oh, ticket 2053, how terrible are you? This is one of those bug numbers that I can remember by heart (like 8040 and 86). Puppet has the ability to do ‘relative namespacing’, which allows you to declare a variable called $port in a class called apache and refer to it as $port instead of fully-namespacing the variable, and thus having to call it $apache::port inside the apache class. It’s a shortcut – you can STILL refer to the variable as $apache::port in the class – but it comes in handy. The PROBLEM occurs when you create a profile, as we did above, called profiles::wordpress and you try to declare a class called wordpress. If you do the following inside the profiles::wordpress class, what class is being declared?

include wordpress

If you think you’re declaring a wordpress class from within a wordpress module in your Puppet modulepath, you would be wrong. Puppet ACTUALLY thinks you’re trying to declare profiles::wordpress because you’re INSIDE the profiles::wordpress class and it’s doing relative namespacing (i.e. in the same way you refer to $port and ACTUALLY mean $apache::port, it thinks you’re referring to wordpress and ACTUALLY mean profiles::wordpress).

Needless to say, this causes LOTS of confusion.

The solution here is to declare a class called ::wordpress which tells Puppet to go to the top-level namespace and look for a module called wordpress which has a top-level class called wordpress. It’s the same reason that we refer to Facter Fact values as $::osfamily instead of $osfamily in class definitions (because you can declare a local variable called $osfamily in your class). This is why in the profile above you see this:

class { '::wordpress':
  install_dir => $wordpress_docroot,
  db_name     => $wordpress_db_name,
  db_host     => $wordpress_db_host,
  db_password => $wordpress_db_password,
}

When you use profiles and roles, you’ll need to do this namespacing trick when declaring classes because you’re frequently going to have a profile::<sometech> that will declare the <sometech> top-level class.

Roles: business-specific wrapper classes

How do you refer to your machines? When I ask you about that cluster over there, do you say “Oh, you mean the machines with java 1.6, apache, mysql, etc…”? I didn’t think so. You usually have names for them, like the “internal compute cluster” or “app builder nodes” or “DMZ repo machines” or whatever. These names are your Roles. Roles are just the mapping of your machine’s names to the technology that should be ON them. In the past we had descriptive hostnames that afforded us a code for what the machine ‘did’ – roles are just that mapping for Puppet.

Roles are namespaced just like profiles, but now it’s up to your organization to fill in the blanks. Some people immediately want to put environments into the roles (a la roles::uat::compute_cluster), but that’s usually not necessary (as MOST LIKELY the compute cluster nodes have the SAME technology on them when they’re in dev versus when they’re in prod, it’s just the DATA – like database names, VIP locations, usernames/passwords, etc – that’s different. Again, these data differences will come from Hiera, so there should be no reason to put the environment name in your role). You still CAN put the environment name in the role if it makes you feel better, but it’ll probably be useless.

Roles ONLY include profiles

So what exactly is in the role wrapper class? That depends on what technology is on the node that defines that role. What I can tell you for CERTAIN is that roles should ONLY use the include function and should ONLY include profiles. What does this give us? This gives us our pal the include function back! You can include the same profile 100 times if you want, and Puppet only puts it in the catalog once.

Every node is classified with one role. Period.

The beautiful thing about roles and profiles is that the GOAL is that you should be able to classify a node with a SINGLE role and THAT’S IT. This makes classification simple and static – the node gets its role, the role includes profiles, profiles call out to Hiera for data, that data is passed to component modules, and away we go. Also, since classification is static, you can use version control to see what changes were introduced to the role (i.e. what profiles were added or removed). In my opinion, if you need to apply more than one role to a node, you’ve introduced a new role (see below).
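
A minimal sketch of a role, with hypothetical names (profiles::base standing in for ‘stuff every node gets’):

class roles::wordpress_app_server {
  include profiles::base       ## the baseline every node in the infrastructure gets
  include profiles::wordpress  ## the profile we built earlier
}

Classify the node with roles::wordpress_app_server and you’re done; everything else hangs off of that single include.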

Roles CAN use inheritance…if you like

I’ve seen people implement roles a couple of different ways, and one of them is to use inheritance to build a catalog. For example, you can define a base roles class that includes something like a base security profile (i.e. something that EVERY node in your infrastructure should have). Moving down the line, you COULD namespace according to function like roles::app for your application server machines. The roles::app class could inherit from the roles class (which gets the base security profile), and could then include the profiles necessary to setup an application server. Next, you could subclass down to roles::app::site_foo for an application server that supports some site in your organization. That class inherits from the roles::app class, and then adds profiles that are specific to that site (maybe they use Jboss instead of Tomcat, and thus that’s where the differentiation occurs). This is great because you don’t have a lot of repeated use of the include function, but it also makes it hard to definitively look at a specific role to see exactly what’s being declared (i.e. all the profiles). You have to weigh what you value more: less typing or greater visibility. I will err on the side of greater visibility (just due to that whole 3am outage thing), but it’s up to you to decide what to optimize for.
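
Here’s a rough sketch of that inheritance chain (all of the profile names are made up):

class roles {
  include profiles::security::base   ## every node gets this
}

class roles::app inherits roles {
  include profiles::appserver        ## generic application server bits
}

class roles::app::site_foo inherits roles::app {
  include profiles::jboss            ## the site-specific bit that makes site_foo different
}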

A role similar to, yet different from, another role is: a new role

EVERYBODY says to me “Gary, I have this machine that’s an AWFUL LOT like this role over here, but…it’s different.” My answer to them is: “Great, that’s another role.” If the thing that’s different is data (i.e. which database to connect to, or what IP address to route traffic through), then that difference should be put in HIERA and the classification should remain the same. If that difference is technology-specific (i.e. this server uses JBoss instead of Tomcat) then first look and see if you can isolate how you know this machine is different (maybe it’s on a different subnet, maybe it’s at a different location, something like that). If you can figure that out and write a Fact for it (or use similar conditional logic to determine this logically), then you can just drop that conditional logic in your role and let it do the heavy lifting. If, in the end, this bit of data is totally arbitrary, then you’ll need to create another role (perhaps a subclass using the above namespacing) and assign it to your node.
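
Here’s a sketch of that fact-driven conditional logic inside a role (the $::appserver_flavor Fact and the profile names are made up for illustration):

class roles::app_server {
  include profiles::base

  ## $::appserver_flavor would be a custom Fact you write to detect the difference
  case $::appserver_flavor {
    'jboss':  { include profiles::jboss }
    default:  { include profiles::tomcat }
  }
}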

The hardest thing about this setup is naming your roles. Why? Every site is different. It’s hard for me to account for differences in your setup because your workplace is dysfunctional (seriously).

Review: what does this get you?

Let’s walk through every level of this setup from the top to the bottom and see what it gets you. Every node is classified to a single role, and, for the most part, that classification isn’t going to change. Now you can take all the extra work off your classifier tool and put it back into the manifests (that are subject to version control, so you can git blame to your heart’s content and see who last changed the role/profile). Each role is going to include one or more profiles, which gives us the added idempotent protection of the include function (of course, if profiles have collisions with classes you’ll have to resolve those. Say one or more profiles tries to include an apache class – simply break that component out into a separate profile, extract the parameters from Hiera, and include that profile at a higher level). Each profile is going to do Hiera lookups which should give you the ability to provide different data for different host types (i.e. different data on a per-environment level, or however you lay out your Hiera hierarchy), and that data will be passed directly to the class that is declared. Finally, each component module will accept parameters as variables internal to that module, default parameters/variables to sane values in the ::params class, and use those variables when declaring each resource throughout its classes.

  • Roles abstract profiles
  • Profiles abstract component modules
  • Hiera abstracts configuration data
  • Component modules abstract resources
  • Resources abstract the underlying OS implementation

Choose your level of comfortability

The roles and profiles pattern also buys you something else – the ability for less-skilled and more-skilled Puppet users to work with the same codebase. Let’s say you use some GUI classifier (like the Puppet Enterprise Console), someone who’s less skilled at Puppet looks and sees that a node is classified with a certain role, so they open the role file and see something like this:

include profiles::wordpress
include profiles::tomcat
include profiles::git::repo_server

That’s pretty legible, right? Someone who doesn’t regularly use Puppet can probably make a good guess as to what’s on the machine. Need more information? Open one of the profiles and look specifically at the classes that are being declared. Need to know the data being passed? Jump into Hiera. Need to know more information? Dig into each component module and see what’s going on there.

When you have everything abstracted correctly, you can have developers providing data (like build versions) to Hiera, junior admins grouping nodes for classification, more senior folk updating profiles, and your best Puppet people creating/updating component modules and building plugins like custom facts/functions/whatever.

Great! Now go and refactor…

If you’ve used Puppet for more than a month, you’re probably familiar with the “Oh shit, I should have done it THAT way…let me refactor this” game. I know, it sucks, and we at Puppet Labs haven’t been shy about incorporating something that we feel will help people out (but will also require some refactoring). This pattern, though, has been in use by the Professional Services team at Puppet Labs for over a year without modification. I’ve used this on sites GREAT and small, and every site where I’ve consulted and implemented this pattern has been able to both understand its power and derive real value within a week. If you’re contemplating a refactor, you can’t go wrong with Roles and Profiles (or whatever names you decide to use).

Building a Functional Puppet Workflow Part 1: Module Structure

Working as a professional services engineer for Puppet Labs, my life consists almost entirely of either correcting some of the worst code atrocities you’ve seen in your life, or helping people get started with Puppet so that they don’t need to call us again due to: A.) said code atrocities or B.) needing to refactor the work we JUST helped them start. It wasn’t ALWAYS like this – I can remember some of my earliest gigs, and I almost feel like I should go revisit them if only to correct some of the previous ‘best practices’ that didn’t quite pan out.

This would be exactly why I’m wary of ‘Best Practices’ – because one person’s ‘Best Practice’ is another person’s ‘What the fuck did you just do?!’

Having said that, I’m finding myself repeating a story over and over again when I train/consult, and that’s the story of ‘The Usable Puppet Workflow.’ Everybody wants to know ‘The Right Way™’, and I feel like we finally have a way that survives a reasonable test of time. I’ve been promoting this workflow for over a year (which is a HELL of a long time in Startup time), and I’ve yet to really see an edge case it couldn’t handle.

(If you’re already savvy: yes, this is the Roles and Profiles talk)

I’ll be breaking this workflow down into separate blog posts for every component, and, as always, your comments are welcome…

It all starts with the component module

The first piece of a functional Puppet deployment starts with what we call ‘component modules’. Component modules are the lowest level in your deployment, and are modules that configure specific pieces of technology (like apache, ntp, mysql, and etc…). Component modules are well-encapsulated, have a reasonable API, and focus on doing small, specific things really well (i.e. the *nix way).

I don’t want to write thousands of words on building component modules because I feel like others have done this better than I. As examples, check out RI’s Post on a simple module structure, Puppet Labs’ very own docs on the subject, and even Alessandro’s Puppetconf 2012 session. Instead, I’d like to provide some pointers on what I feel makes a good component module, and some ‘gotchas’ we’ve noticed.

Parameters are your API

In the current world of Puppet, you MUST define the parameters your module will accept in the Puppet DSL. Also, every parameter MUST ultimately have a value when Puppet compiles the catalog (whether by explicitly passing this parameter value when declaring the class, or by assuming a default value). Yes, it’s funny that, when writing a Puppet class, if you typo a VARIABLE Puppet will not alert you to this (in a NON use strict-ian sort of approach) and will happily accept a variable in an undefined state, but the second you don’t pass a value to your class parameter you’re in for a rude compilation error. This is the way of Puppet classes at the time of this writing, so you’re going to see Puppet classes with LINES of defined parameters. I expect this to change in the future (please let this change in the near future), but for now, it’s a necessary evil.

The parameters you expose to your top-level class (i.e. given class names like apache and apache::install, I’m talking specifically about apache) should be treated as an API to your module. IDEALLY, they’re the ONLY THING that a user needs to modify when using your module. Also, whenever possible, it should be the case that a user need ONLY interact with the top-level class when using your module (of course, defined resource types like apache::vhost are used on an ad-hoc basis, and thus are the exception here).
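
To make that concrete, here’s a minimal, hypothetical sketch (the parameter names and values are made up) of what ‘using the API’ looks like from the outside:

# A hypothetical declaration: the top-level class and its parameters are
# the only things a user should need to touch.
class { 'apache':
  port => 8080,
  user => 'httpd',
}
# Defined resource types like apache::vhost are the ad-hoc exception:
apache::vhost { 'www.example.com':
  port    => 8080,
  docroot => '/var/www/example',
}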

Inherit the ::params class

We’re starting to make enemies at this point. It’s been a convention for modules to use a ::params class to assign values to all the variables that are going to be used by all the classes inside the module. The idea is that the ::params class is the one-stop-shop to see where a variable is set. Also, to get access to a variable that’s set in a Puppet class, you have to declare the class (i.e. use the include() function or inherit from that class). When you declare a class that has both variables AND resources, those resources get put into the catalog, which means that Puppet ENFORCES THE STATE of those resources. What if you only needed a variable’s value and didn’t want to enforce the rest of the resources in that class? There’s no good way in Puppet to do that. Finally, when you inherit from a class in Puppet that has assigned variable values, you ALSO get access to those variables in the parameter definition section of your class, i.e. the following section of the class:

class apache (
  $port = $apache::params::port,
  $user = $apache::params::user,
) inherits apache::params {
  # ...the rest of the class (resources, subclass declarations) goes here...
}

See how I set the default value of $apache::port to $apache::params::port? I could only access the value of the variable $apache::params::port in that section by inheriting from the apache::params class. I couldn’t insert include apache::params below that section and be allowed access to the variable up in the parameter defaults section (due to the way that Puppet parses classes).
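
For completeness, here’s a minimal sketch of what that ::params class might look like (the OS families and values here are purely illustrative) – note that it contains nothing but variable assignments, so inheriting it doesn’t sneak any resources into the catalog:

# A hypothetical apache::params class: variable assignments only, no resources.
class apache::params {
  case $::osfamily {
    'RedHat': {
      $port = 80
      $user = 'apache'
    }
    'Debian': {
      $port = 80
      $user = 'www-data'
    }
    default: {
      fail("The apache module doesn't support the ${::osfamily} OS family")
    }
  }
}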

FOR THIS REASON, THIS IS THE ONLY RECOMMENDED USAGE OF INHERITANCE IN PUPPET!

We do NOT recommend using inheritance anywhere else in Puppet and for any other reason because there are better ways to achieve what you want to do INSTEAD of using inheritance. Inheritance is a holdover from a scarier, more lawless time.

NOTE: Data in Modules – There’s a ‘Data in Modules’ pattern out there that attempts to eliminate the ::params class. I wrote about it in a previous post, and I recommend you read that post for more info (it’s near the bottom).

Do NOT do Hiera lookups in your component modules!

This is something that’s really only RECENTLY been pushed. When Hiera was released, we quickly recognized that it would be the answer to quite a few problems in Puppet. In the rush to adopt Hiera, many people started adding Hiera calls to their modules, and suddenly you had ‘Hiera-compatible’ modules out there. This caused all kinds of compatibility problems, and it was largely because there wasn’t a better module structure and workflow by which to integrate Hiera. The pattern that I’ll be pushing DOES INDEED use Hiera, BUT it confines all Hiera calls to a higher-level wrapper class we call a ‘profile’. The reasons for NOT using Hiera in your module are:

  • By doing Hiera calls at a higher level, you have greater visibility into exactly which parameters were set by Hiera and which were set explicitly or by default values.
  • By doing Hiera calls elsewhere, your module remains backwards-compatible for those folks who are NOT using Hiera.

Remember – your module should just accept a value and use it somewhere. Don’t get TOO smart with your component module – leave the logic for other places.
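
As a quick preview of the next post, here’s a rough, hypothetical sketch of where those Hiera calls end up instead – a ‘profile’ class that looks up the data and hands plain values to the component module (the Hiera key names are invented):

# A hypothetical profile: Hiera lookups live HERE, and the component
# module just receives ordinary parameter values.
class profiles::apache {
  $port = hiera('apache_port', 80)
  $user = hiera('apache_user', 'apache')
  class { 'apache':
    port => $port,
    user => $user,
  }
}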

Keep your component modules generic

We always get asked “How do I know if I’m writing a good module?” We USED to say “Well, does it work?” (and trust me, that was a BIG hurdle). Now, with data separation models out there like Hiera, I have a couple of other questions that I ask (you know, BEYOND asking if it compiles and actually installs the thing it’s supposed to install). The best test I’ve found for whether your module is ‘generic enough’ is this: if I asked you TODAY to give me your module, would you hand it over, or would you be worried that there’s some company-specific data locked in there? If you have company-specific data in your module, then you need to refactor the module, store the data in Hiera, and make your module more generic/reusable.

Also, does your module focus on installing one piece of technology, or are you declaring packages for shared libraries or other components (like gcc, apache, or other common components)? You’re not going to win any prizes for having the biggest, most monolithic module out there. Rather, if your module is that large and that complex, you’re going to have a hell of a time debugging it. Err on the side of making your modules smaller and more task-specific. So what if you end up needing to declare 4 classes where you previously declared 1? In the Roles and Profiles pattern we’ll show you in the next blog post, you can abstract that away ANYHOW.

Don’t play the “what if” game

I’ve had more than a couple of gigs where the customer says something along the lines of “What if we need to introduce FreeBSD/Solaris/etc… nodes into our organization, shouldn’t I account for them now?” This leads more than a few people down a path of entirely too-complex modules that become bulky and unwieldy. Yes, your modules should be formatted so that you can simply add another case in your ::params class for another OS’s parameters, and yes, your module should be formatted so that your ::install or ::config class can handle another OS, but if you currently only manage Redhat, and you’ve only EVER managed Redhat, then don’t start adding Debian parameters RIGHT NOW just because you’re afraid you might inherit Ubuntu machines. The goal of Puppet is to automate the tasks that eat up the MAJORITY of your time so you can focus on the edge cases that really demand your time. If you can eventually automate those edge cases, then AWESOME! Until then, don’t spend the majority of your time trying to automate the edge cases only to drown under the weight of deadlines from simple work that you COULD have already automated (but didn’t, because you were so worried about the exceptions)!

Store your modules in version control

This should go without saying, but your modules should be stored in version control (a la git, svn, hg, whatever). We tend to prefer git due to its lightweight branching and merging (most of our tooling and solutions will use git because we’re big git users), but you’re free to use whatever you want. The bigger question is HOW to store your modules in version control. There are usually two schools of thought:

  • One repository per module
  • All modules in a single repository

Each model has its pros and cons, but we tend to recommend one module per repository for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs within a single repository; having multiple repos allows per-module security settings.
  • With the all-modules-in-one-repository solution, modules you pull down from the Forge (or Github) must be committed into that repo alongside your own code. Having a separate repository for each module allows you to keep everything separate.

NOTE: This becomes important in the third blog post in the series when we talk about moving changes to each Puppet Environment, but it’s important to introduce it NOW as a ‘best practice’. If you use our recommended module/environment solution, then one-module-per-repo is the best practice. If you DON’T use our solution, then the single-repository-for-all-modules approach will STILL work, but you’ll have to manage the above issues. Also note that even if you currently have every module in a single repository, you can STILL use our solution in part 3 of the series (you’ll just need to perform a couple of steps to conform).
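
For what it’s worth, the one-module-per-repo layout also maps neatly onto a Puppetfile if you end up using R10k (more on that in part 3 of the series); here’s a rough sketch with made-up repo URLs and version pins:

# A hypothetical Puppetfile: Forge modules stay pinned and unmodified,
# and each internal module lives in its own git repository.
forge 'https://forgeapi.puppetlabs.com'
mod 'puppetlabs/apache', '1.2.0'
mod 'puppetlabs/ntp',    '3.3.0'
mod 'profiles',
  :git => 'git@yourgitserver.com:puppet/profiles.git'
mod 'roles',
  :git => 'git@yourgitserver.com:puppet/roles.git'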

Best practices are shit

In general, ‘best practices’ are only recommended if they fit into your organizational workflow. The best and worst part of Puppet is that it’s infinitely customizable, so ‘best practices’ will invariably be left wanting for a certain subset of the community. As always, take what I say under consideration; it’s quite possible that I could be entirely full of shit.