Shit Gary Says

...things I don't want to forget

Building a Functional Puppet Workflow Part 1: Module Structure

Working as a professional services engineer for Puppet Labs, my life consists almost entirely of either correcting some of the worst code atrocities you’ve seen in your life, or helping people get started with Puppet so that they don’t need to call us again due to: A.) said code atrocities, or B.) needing to refactor the work we JUST helped them start. It wasn’t ALWAYS like this – I can remember some of my earliest gigs, and I almost feel like I should go revisit them if only to correct some of the previous ‘best practices’ that didn’t quite pan out.

This would be exactly why I’m wary of ‘Best Practices’ – because one person’s ‘Best Practice’ is another person’s ‘What the fuck did you just do?!’

Having said that, I’m finding myself repeating a story over and over again when I train/consult, and that’s the story of ‘The Usable Puppet Workflow.’ Everybody wants to know ‘The Right Way™’, and I feel like we finally have a way that survives a reasonable test of time. I’ve been promoting this workflow for over a year (which is a HELL of a long time in Startup time), and I’ve yet to really see an edge case it couldn’t handle.

(If you’re already savvy: yes, this is the Roles and Profiles talk)

I’ll be breaking this workflow down into separate blog posts for every component, and, as always, your comments are welcome…

It all starts with the component module

The first piece of a functional Puppet deployment starts with what we call ‘component modules’. Component modules are the lowest level in your deployment: modules that configure specific pieces of technology (like apache, ntp, mysql, etc.). Component modules are well-encapsulated, have a reasonable API, and focus on doing small, specific things really well (i.e. the *nix way).
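
For reference, a typical component module is laid out something like this (a sketch of the common convention; the class and file names are illustrative):

apache/
├── manifests/
│   ├── init.pp      # class apache – the top-level class (the module’s API)
│   ├── install.pp   # class apache::install – package resources
│   ├── config.pp    # class apache::config – file/template resources
│   ├── service.pp   # class apache::service – service resources
│   └── params.pp    # class apache::params – OS-specific default values
└── templates/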

I don’t want to write thousands of words on building component modules because I feel like others have done this better than I. As examples, check out RI’s Post on a simple module structure, Puppet Labs’ very own docs on the subject, and even Alessandro’s Puppetconf 2012 session. Instead, I’d like to provide some pointers on what I feel makes a good component module, and some ‘gotchas’ we’ve noticed.

Parameters are your API

In the current world of Puppet, you MUST define the parameters your module will accept in the Puppet DSL, and every parameter MUST ultimately have a value when Puppet compiles the catalog (whether passed explicitly when declaring the class or assumed from a default value). Yes, it’s funny that if you typo a VARIABLE in a Puppet class, Puppet will not alert you (in a decidedly non-‘use strict’ sort of approach) and will happily accept the variable in an undefined state, but the second you don’t pass a value to a class parameter you’re in for a rude compilation error. This is the way of Puppet classes at the time of this writing, so you’re going to see Puppet classes with LINES of defined parameters. I expect this to change in the future (please let this change in the near future), but for now, it’s a necessary evil.
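
To illustrate the asymmetry, consider this contrived sketch (the class, parameter, and variable names are all made up):

class demo (
  $port,  # no default value – omit it at declaration time and compilation fails
) {
  # $prot is a typo of $port; Puppet happily interpolates it as an empty value
  notify { "port is ${port}, but prot is '${prot}'": }
}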

The parameters you expose to your top-level class (i.e. given class names like apache and apache::install, I’m talking specifically about apache) should be treated as an API to your module. IDEALLY, they’re the ONLY THING that a user needs to modify when using your module. Also, whenever possible, it should be the case that a user need ONLY interact with the top-level class when using your module (of course, defined resource types like apache::vhost are used on an ad-hoc basis, and thus are the exception here).
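
In other words, consuming the module should look something like this (a hypothetical declaration; the parameter names echo the apache example below):

# Everything a user needs is exposed as a parameter on the top-level class:
class { 'apache':
  port => '8080',
  user => 'httpd',
}

# Defined resource types like apache::vhost remain the ad-hoc exception:
apache::vhost { 'blog.example.com':
  port => '8080',
}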

Inherit the ::params class

We’re starting to make enemies at this point. It’s been a convention for modules to use a ::params class to assign values to all variables that are going to be used across all classes inside the module. The idea is that the ::params class is the one-stop-shop to see where a variable is set. Also, to get access to a variable that’s set in a Puppet class, you have to declare the class (i.e. use the include() function or inherit from that class). When you declare a class that has both variables AND resources, those resources get put into the catalog, which means that Puppet ENFORCES THE STATE of those resources. What if you only needed a variable’s value and didn’t want to enforce the rest of the resources in that class? There’s no good way in Puppet to do that. Finally, when you inherit from a class in Puppet that has assigned variable values, you ALSO get access to those variables in the parameter definition section of your class, i.e. the following section:

class apache (
  $port = $apache::params::port,
  $user = $apache::params::user,
) inherits apache::params {

See how I set the default value of $apache::port to $apache::params::port? I could only access the value of the variable $apache::params::port in that section by inheriting from the apache::params class. I couldn’t insert include apache::params below that section and be allowed access to the variable up in the parameter defaults section (due to the way that Puppet parses classes).

FOR THIS REASON, THIS IS THE ONLY RECOMMENDED USAGE OF INHERITANCE IN PUPPET!

We do NOT recommend using inheritance anywhere else in Puppet and for any other reason because there are better ways to achieve what you want to do INSTEAD of using inheritance. Inheritance is a holdover from a scarier, more lawless time.

NOTE: Data in Modules – There’s a ‘Data in Modules’ pattern out there that attempts to eliminate the ::params class. I wrote about it in a previous post, and I recommend you read that post for more info (it’s near the bottom).

Do NOT do Hiera lookups in your component modules!

This is something that’s really only RECENTLY been pushed. When Hiera was released, we quickly recognized that it would be the answer to quite a few problems in Puppet. In the rush to adopt Hiera, many people started adding Hiera calls to their modules, and suddenly you had ‘Hiera-compatible’ modules out there. This caused all kinds of compatibility problems, and it was largely because there wasn’t a better module structure and workflow by which to integrate Hiera. The pattern that I’ll be pushing DOES INDEED use Hiera, BUT it confines all Hiera calls to a higher-level wrapper class we call a ‘profile’. The reasons for NOT using Hiera in your module are:

  • By doing Hiera calls at a higher level, you have greater visibility into exactly which parameters were set by Hiera and which were set explicitly or by default values.
  • By doing Hiera calls elsewhere, your module remains backwards-compatible for those folks who are NOT using Hiera.

Remember – your module should just accept a value and use it somewhere. Don’t get TOO smart with your component module – leave the logic for other places.
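
As a preview of the next post, here’s a minimal sketch of what that higher-level ‘profile’ wrapper looks like (the class and Hiera key names are made up for illustration):

class profiles::apache {
  # Hiera lookups happen HERE, not inside the component module
  $port = hiera('apache_port', '80')
  $user = hiera('apache_user', 'apache')

  class { 'apache':
    port => $port,
    user => $user,
  }
}

The apache component module never calls hiera() itself – it just accepts whatever values it’s handed.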

Keep your component modules generic

We always get asked “How do I know if I’m writing a good module?” We USED to say “Well, does it work?” (and trust me, that was a BIG hurdle). Now, with data separation models out there like Hiera, I have a couple of other questions that I ask (you know, BEYOND asking if it compiles and actually installs the thing it’s supposed to install). The best test I’ve found for whether your module is ‘generic enough’ is this: if I asked you TODAY to give me your module, would you hand it over, or would you be worried that there was some company-specific data locked inside? If you have company-specific data in your module, then you need to refactor the module, store the data in Hiera, and make your module more generic/reusable. Also, does your module focus on installing one piece of technology, or are you declaring packages for shared libraries or other components (like gcc, apache, or other common dependencies)? You’re not going to win any prizes for having the biggest, most monolithic module out there. Rather, if your module is that large and that complex, you’re going to have a hell of a time debugging it. Err on the side of making your modules smaller and more task-specific. So what if you end up needing to declare 4 classes where you previously declared 1? In the roles and profiles pattern we will show you in the next blog post, you can abstract that away ANYHOW.

Don’t play the “what if” game

I’ve had more than a couple of gigs where the customer says something along the lines of “What if we need to introduce FreeBSD/Solaris/etc… nodes into our organization, shouldn’t I account for them now?” This leads more than a few people down a path of entirely too-complex modules that become bulky and unwieldy. Yes, your modules should be formatted so that you can simply add another case in your ::params class for another OS’s parameters, and yes, your module should be formatted so that your ::install or ::config class can handle another OS, but if you currently only manage Redhat, and you’ve only EVER managed Redhat, then don’t start adding Debian parameters RIGHT NOW just because you’re afraid you might inherit Ubuntu machines. The goal of Puppet is to automate the tasks that eat up the MAJORITY of your time so you can focus on the edge cases that really demand your time. If you can eventually automate those edge cases, then AWESOME! Until then, don’t spend the majority of your time trying to automate the edge cases only to drown under the weight of deadlines from simple work that you COULD have already automated (but didn’t, because you were so worried about the exceptions)!
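
Structure the module so that supporting a new OS is just one more branch, and fail loudly on anything you don’t actually manage yet (a sketch; the values are illustrative):

class ntp::params {
  case $::osfamily {
    'RedHat': {
      $package_name = 'ntp'
      $service_name = 'ntpd'
    }
    default: {
      fail("The ntp module does not support the ${::osfamily} OS family")
    }
  }
}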

Store your modules in version control

This should go without saying, but your modules should be stored in version control (a la git, svn, hg, whatever). We tend to prefer git due to its lightweight branching and merging (most of our tooling and solutions will use git because we’re big git users), but you’re free to use whatever you want. The bigger question is HOW to store your modules in version control. There are usually two schools of thought:

  • One repository per module
  • All modules in a single repository

Each model has its pros and cons, but we tend to recommend one module per repository for the following reasons:

  • Individual repos mean individual module development histories
  • Most VCS solutions don’t have per-folder ACLs for a single repository; having multiple repos allows per-module security settings.
  • With a single repository for all modules, anything you pull down from the Forge (or Github) must be committed to your repo. Having a repository for each module allows you to keep everything separate.

NOTE: This becomes important in the third blog post in the series when we talk about moving changes to each Puppet Environment, but it’s important to introduce it NOW as a ‘best practice’. If you use our recommended module/environment solution, then one-module-per-repo is the best practice. If you DON’T use our solution, then a single repository for all modules will STILL work, but you’ll have to manage the above issues. Also note that even if you currently have every module in a single repository, you can STILL use our solution in part 3 of the series (you’ll just need to perform a couple of steps to conform).

Best practices are shit

In general, ‘best practices’ are only recommended if they fit into your organizational workflow. The best and worst part of Puppet is that it’s infinitely customizable, so ‘best practices’ will invariably be left wanting for a certain subset of the community. As always, take what I say under consideration; it’s quite possible that I could be entirely full of shit.

Seriously, What Is This Provider Doing?

Clarke’s third law states: “Any sufficiently advanced technology is indistinguishable from magic.” In the case of Ruby and Puppet provider interaction, I’m inclined to believe it. If you want proof, take a look at some of the native Puppet types – no amount of ‘Expecto Patronum’ will free you from the Ruby metaprogramming dementors that hover around lib/puppet/provider/exec-land.

In my first post tackling Puppet types and providers, I introduced the concept of Puppet types and the utility they provide. In the second post, I brought you to the great plain of Puppet providers and introduced the core methods necessary for creating a very basic Puppet provider with a single property (HINT: if you’ve not read either of those posts, or you’ve never dealt with basic types and providers, you might want to stop here and read up a bit on the topics). The problems with a provider like the one created in that post were:

  • puppet resource support wasn’t implemented, so you couldn’t query for existing instances of the type on the system (and their corresponding values)
  • The getter method would be called for EVERY instance of the type on the system, which would mean shelling-out multiple times during a run
  • Ditto for the setter method (if changes to multiple instances of the type were necessary)
  • That type was VERY basic (i.e. ensurable with a single property)

Unfortunately, when most of us need a custom Puppet type and provider, we usually require multiple properties and reasonably complex system interaction. When it comes to creating both a getter and a setter method for every property (including the potential performance hit that could come from shelling-out many times during a Puppet run), ain’t nobody got time for that. And finally, puppet resource is a REALLY handy tool for querying the current state of your resources on a system. These problems all have solutions, but up until recently there was just one more problem:

Good luck finding documentation for those solutions.

NOTE: The Puppet Types and Providers book written by Nan and Dan is a great resource that provides a bit of a deeper dive than I’ll be doing in this post – DO check it out if you want to know more.

Something, something, puppet resource

The puppet resource command (or ralsh, as it used to be known) is a very handy command for querying a system and returning the current state of resources for a specific Puppet type. Try it out if you never have (note that the following is being run on CentOS 6.4):

[root@linux ~]# puppet resource user
user { 'abrt':
  ensure           => 'present',
  gid              => '173',
  home             => '/etc/abrt',
  password         => '!!',
  password_max_age => '-1',
  password_min_age => '-1',
  shell            => '/sbin/nologin',
  uid              => '173',
}
user { 'adm':
  ensure           => 'present',
  comment          => 'adm',
  gid              => '4',
  groups           => ['sys', 'adm'],
  home             => '/var/adm',
  password         => '*',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/sbin/nologin',
  uid              => '3',
}
< ... and more users below ... >

The puppet resource command returns a list of all users on the system and their current property values (note you can only see the password hash if you’re running Puppet with sufficient privileges). You can even query puppet resource for the values of a specific resource:

[root@gary ~]# puppet resource user glarizza
user { 'glarizza':
  ensure           => 'present',
  gid              => '502',
  home             => '/home/glarizza',
  password         => '$1$hsUuCygh$kgLKG5epuRaXHMX5KmxrL1',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '502',
}

puppet resource seems magical, and you might think that if you create a custom type and sync it to your machine then puppet resource will automatically work for you.

And you would be wrong.

puppet resource will only work if you’ve implemented a special method in your provider called self.instances.

self.instances

The self.instances method is pretty sparsely documented, so let’s go straight to the source…code, that is:

lib/puppet/provider.rb
  # Returns a list of system resources (entities) this provider may/can manage.
  # This is a query mechanism that lists entities that the provider may manage on a given system. It is
  # is directly used in query services, but is also the foundation for other services; prefetching, and
  # purging.
  #
  # As an example, a package provider lists all installed packages. (In contrast, the File provider does
  # not list all files on the file-system as that would make execution incredibly slow). An implementation
  # of this method should be made if it is possible to quickly (with a single system call) provide all
  # instances.
  #
  # An implementation of this method should only cache the values of properties
  # if they are discovered as part of the process for finding existing resources.
  # Resource properties that require additional commands (than those used to determine existence/identity)
  # should be implemented in their respective getter method. (This is important from a performance perspective;
  # it may be expensive to compute, as well as wasteful as all discovered resources may perhaps not be managed).
  #
  # An implementation may return an empty list (naturally with the effect that it is not possible to query
  # for manageable entities).
  #
  # By implementing this method, it is possible to use the `resources´ resource type to specify purging
  # of all non managed entities.
  #
  # @note The returned instances are instance of some subclass of Provider, not resources.
  # @return [Array<Puppet::Provider>] a list of providers referencing the system entities
  # @abstract this method must be implemented by a subclass and this super method should never be called as it raises an exception.
  # @raise [Puppet::DevError] Error indicating that the method should have been implemented by subclass.
  # @see prefetch
  def self.instances
    raise Puppet::DevError, "Provider #{self.name} has not defined the 'instances' class method"
  end

You’ll find that method around lines 348 – 377 of the lib/puppet/provider.rb file in Puppet’s source code (as of this writing, which is a Friday… on a flight from DC to Seattle). To summarize, implementing self.instances in your provider means that you need to return an array of provider instances that have been discovered on the current system, along with all their current property values (we call these the ‘is’ values for the properties, since each value IS the current value of the property on the system). It’s recommended to only implement self.instances if you can gather all resource property values in a reasonably ‘cheap’ manner (i.e. a single system call, a read from a single file, or some similar low-IO means). Implementing self.instances not only gives you the ability to run puppet resource (which also affords you a quick-and-dirty way of testing your provider without creating unit tests: simply run puppet resource in debug mode and check the output), but it also allows the ‘resources’ resource to work its magic (if you’ve never heard of the ‘resources’ resource, check this link for more information on this terribly/awesomely named resource type).
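
For example, once self.instances is implemented for a type, a declaration like this (purging is destructive, so treat it purely as an illustration) tells Puppet to remove every instance of the type discovered on the system that isn’t declared in the catalog:

resources { 'mac_web_proxy':
  purge => true,
}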

An important note about scope and self.instances

The self.instances method is a method of the PROVIDER, which is why it is prefixed with self. Even though it may be located in the provider file itself, and even though it sits among other methods like create, exists?, and destroy (which are methods of the INSTANCE of the provider), it does NOT have the ability to directly access or call those methods. It DOES have the ability to access other methods of the provider directly (i.e. other methods prefixed with self.). This means that if you were to define a method like:

def self.proxy_type
  'web'
end

You could access that directly from self.instances by simply calling it:

type_of_proxy = proxy_type()

Let’s say you had a method of the INSTANCE of the provider, like so:

def system_type
  'OS X'
end

You COULD NOT access this method from self.instances directly (there are always hacky ways around EVERYTHING in Ruby, sure, but there is no easy/straightforward way to access this method).

And here’s where it gets confusing…

Methods of the INSTANCE of the provider CAN access provider methods directly. Given our previous example, what if the system_type method wanted to access self.proxy_type for some reason? It could be done like so:

def system_type
  type_of_proxy = self.class.proxy_type()
  'OS X'
end

A method of the instance of the provider can access provider methods by simply calling the class method on itself (which returns the provider object). This is a one-way street for method creation that needs to be heeded when designing your provider.

Building a provider that uses self.instances (or: more Mac problems)

In the previous two posts on types/providers, I created a type and provider for managing bypass domains for network proxies on OS X. For this post, let’s create a provider for actually MANAGING the proxy settings for a given network interface. Here’s a quick type for managing a web proxy on a network interface on OS X:

puppet-mac_proxy/lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

This type has three properties, is ensurable, and has a namevar called ‘name’. As for the provider, let’s start with self.instances and get the web proxy values for all interfaces. To do that, we’re going to need to know how to get a list of all network interfaces, and also how to get the current proxy state for every interface. Fortunately, both of those tasks are accomplished with the networksetup binary:

▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

▷ networksetup -getwebproxy Ethernet
Enabled: No
Server: proxy.corp.net
Port: 1234
Authenticated Proxy Enabled: 0

Cool, so one binary will do both tasks and they’re REASONABLY low-cost to run.

Helper methods

To keep things separated and easier to test, let’s create separate helper methods for each task. Since these methods are going to be called by self.instances, they will be provider methods.

The first method will simply return an array of network interfaces:

def self.get_list_of_interfaces
  interfaces = networksetup('-listallnetworkservices').split("\n")
  interfaces.shift
  interfaces.sort
end

Remember from above that the networksetup -listallnetworkservices command prints an informational line before the list of interfaces, so this code shifts that line off and returns a sorted list of interfaces based on a one-line-per-interface assumption.

The next method we need accepts a network interface name as an argument, runs the networksetup -getwebproxy (interface) command, and uses its output to return all the current property values (including the ensure value) for that interface’s proxy resource (i.e. the interface’s proxy settings, and whether the proxy is enabled, which means the resource is ensured as ‘present’, or disabled, which means the resource is ensured as ‘absent’).

def self.get_proxy_properties(int)
  interface_properties = {}

  begin
    output = networksetup(['-getwebproxy', int])
  rescue Puppet::ExecutionFailure => e
    raise Puppet::Error, "#mac_web_proxy tried to run `networksetup -getwebproxy #{int}` and the command returned non-zero. Failing here..."
  end

  output_array = output.split("\n")
  output_array.each do |line|
    line_values = line.split(':')
    line_values.last.strip!
    case line_values.first
    when 'Enabled'
      interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
    when 'Server'
      interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
    when 'Port'
      interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
    when 'Authenticated Proxy Enabled'
      interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
    end
  end

  interface_properties[:provider] = :ruby
  interface_properties[:name]     = int.downcase
  interface_properties
end

A couple of notes on the method itself – first, the networksetup command must exit zero on success or non-zero on failure (which it does). If ever the networksetup command were to return non-zero, we’re raising our own Puppet::Error, documenting what happened, and bailing out.

This method is going to return a hash of properties and values that is going to be used by self.instances – so the case statement needs to account for that. HOWEVER you populate that hash is up to you (in my case, I’m checking for specific output that networksetup returns), but make sure that the hash has a value for the :ensure key at the VERY least.
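
As a minimal illustration, the hash returned for a disabled proxy on the Ethernet interface would look something like this (the :name and :ensure keys are the bare minimum):

{
  :ensure => :absent,
  :name   => 'ethernet',
}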

Assembling self.instances

Once the helper provider methods have been defined, self.instances becomes reasonably simple:

def self.instances
  get_list_of_interfaces.collect do |int|
    proxy_properties = get_proxy_properties(int)
    new(proxy_properties)
  end
end

Remember that self.instances must return an array of provider instances, and each one of these instances must include the namevar and ensure value at the very least. Since self.get_proxy_properties returns a hash containing all the property ‘is’ values for a resource, declaring a new provider instance is as easy as passing that hash to the new() method for every network interface. In the end, the return value of the collect method on get_list_of_interfaces will be an array of provider instances.

Existence, @property_hash, and more magical methods

Even though we have assembled a functional self.instances method, we don’t have a complete implementation that will work with puppet resource. The problem is that Puppet can’t yet determine the existence of a resource (even though the resource’s ensure value has been set by self.instances). If you were to execute the code with puppet resource mac_web_proxy, you would get the error:

Error: Could not run: No ability to determine if mac_web_proxy exists

To satisfy Puppet, we need to implement an exists?() method for the instance of the provider. Fortunately, we don’t need to re-implement any existing logic and can instead use @property_hash.

A @property_hash is born…

I’ve omitted one last thing that is borne out of self.instances, and that’s the @property_hash instance variable. @property_hash is populated by self.instances and is available to methods of the INSTANCE of the provider (i.e. methods that ARE NOT prefixed with self.); it contains all the ‘is’ values for a resource. Do you need the ‘is’ value for a property? Just use @property_hash[:property_name]. Since the exists? method is a method of the instance of the provider, and its answer is essentially the resource’s ensure value, let’s implement exists? with a check on the ensure value from the @property_hash variable:

def exists?
  @property_hash[:ensure] == :present
end

Perfect, now exists? will return true or false accordingly and Puppet will be satisfied.

Getter methods – the slow way

Puppet may be happy that you have an exists? method, but puppet resource won’t successfully run until you have a method that returns an ‘is’ value for every property of the type (i.e. the proxy_server, proxy_authenticated, and proxy_port attributes for the mac_web_proxy type). These ‘is value methods’ are called ‘getter’ methods: they’re methods of the instance of the provider, and are named exactly the same as the properties they represent.

You SHOULD be thinking: “Hey, we already have @property_hash, why can’t we just use it again?” We can, and you COULD implement all the getter methods like so:

def proxy_server
  @property_hash[:proxy_server]
end

If you did that, you would be TECHNICALLY correct, but it would seem to be a waste of lines in a provider (especially if you have many properties).

Getter methods – the quicker ‘method’

Because uncle Luke hated excess lines of code, he made available a method called mk_resource_methods that works very similarly to Ruby’s attr_accessor method. Adding mk_resource_methods to your provider will AUTOMATICALLY create getter methods that pull values out of @property_hash in a similar way to what I just demonstrated (it will also create SETTER methods too, but we’ll look at those later). Long story short – don’t hand-write getter/setter methods if you’re using self.instances – just implement mk_resource_methods.
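
Under the hood, mk_resource_methods defines something roughly equivalent to the following for each attribute of the type (a simplified sketch using the proxy_server property; the real implementation iterates over all of the type’s properties and parameters):

# Getter: read the 'is' value out of @property_hash
def proxy_server
  @property_hash[:proxy_server] || :absent
end

# Setter: stash the 'should' value back into @property_hash
def proxy_server=(value)
  @property_hash[:proxy_server] = value
end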

JUST enough for puppet resource

Putting together everything we’ve learned up until now, we should have a provider that looks like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def exists?
    @property_hash[:ensure] == :present
  end
end

Here’s a tree of the module I’ve assembled on my machine:

└(~/src/puppet-mac_web_proxy)▷ tree .
.
└── lib
   └── puppet
       ├── provider
       │   └── mac_web_proxy
       │       └── ruby.rb
       └── type
           └── mac_web_proxy.rb

To test out puppet resource, we need to make Puppet aware of our new custom module. To do that, let’s set the $RUBYLIB environment variable. $RUBYLIB is queried by Puppet and added to its load path when looking for additional Puppet plugins. You will need to set $RUBYLIB to the path of the lib directory in the custom module that you’ve assembled. Because my custom module is located in ~/src/puppet-mac_web_proxy, I’m going to set $RUBYLIB like so:

export RUBYLIB=~/src/puppet-mac_web_proxy/lib

You can execute that command from the command line, or set it in your ~/.{bash,zsh}rc and source that file.

Finally, with all the files in place and $RUBYLIB set, it’s time to officially run puppet resource (I’m going to do it in --debug mode to see the debug output that I’ve written into the code):

└(~/src/blogtests)▷ envpuppet puppet resource mac_web_proxy --debug
Debug: Executing '/usr/sbin/networksetup -listallnetworkservices'
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth DUN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth dun"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Bluetooth PAN'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"bluetooth pan"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Display Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"foo.bar.baz", :proxy_port=>"80", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"display ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Ethernet'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>"proxy.corp.net", :proxy_port=>"1234", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"ethernet"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy FireWire'
Debug: Interface properties: {:ensure=>:present, :proxy_server=>"stuff.bar.blat", :proxy_port=>"8190", :proxy_authenticated=>nil, :provider=>:ruby, :name=>"firewire"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy Wi-Fi'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"wi-fi"}
Debug: Executing '/usr/sbin/networksetup -getwebproxy iPhone USB'
Debug: Interface properties: {:ensure=>:absent, :proxy_server=>nil, :proxy_port=>nil, :proxy_authenticated=>nil, :provider=>:ruby, :name=>"iphone usb"}
mac_web_proxy { 'bluetooth dun':
  ensure => 'absent',
}
mac_web_proxy { 'bluetooth pan':
  ensure => 'absent',
}
mac_web_proxy { 'display ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'ethernet':
  ensure => 'absent',
}
mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8190',
  proxy_server => 'stuff.bar.blat',
}
mac_web_proxy { 'iphone usb':
  ensure => 'absent',
}
mac_web_proxy { 'wi-fi':
  ensure => 'absent',
}

Note that you will only see ‘is’ values if you have a proxy set on one of your network interfaces (obviously, if you’ve not set up a proxy, then it will show as ‘absent’ on every interface. You can set up a proxy by opening System Preferences, clicking on the Network icon, choosing an interface from the list on the left, clicking the Advanced button in the lower right corner of the window, clicking the “Proxies” tab at the top of the window, checking the box next to the “Web Proxy (HTTP)” choice, and entering a proxy URL and port. NOW do you get why we automate this bullshit?). Also, your list of network interfaces may not match mine if you have more or fewer interfaces than I do.

TADA! puppet resource WORKS! ISN’T THAT AWESOME?! WHY AM I TYPING IN CAPS?!

Prefetching, flushing, caching, and other hard shit

Okay, so up until now we’ve implemented one half of the equation – we can query ‘is’ values and puppet resource works. What about using this ‘more efficient’ method of getting values for a type on the OTHER end of the spectrum? What if, instead of calling setter methods one-by-one to set values for all resources of a type in a catalog, we had a way to do it all at once? Well, such a way exists, and it’s called the flush method…but we’re getting slightly ahead of ourselves. Before we get to flushing, we need to point out that self.instances is ONLY used by puppet resource – THAT’S IT (and it’s only used by puppet resource when you GET values from the system, not when you SET values on the system…and if you never knew that puppet resource could actually SET values on the system, well, I guess you got another surprise today). If we want puppet agent or puppet apply to use the behavior that self.instances implements, we need to create another method: self.prefetch

self.prefetch

If you thought self.instances didn’t have much documentation, wait until you see self.prefetch. After wading through the waters of self.prefetch, I’m PRETTY SURE its implementation might have come to uncle Luke after a long night in Reed’s chem lab where he might have accidentally synthesized mescaline.

Let’s look at the codebase:

lib/puppet/provider.rb
# @comment Document prefetch here as it does not exist anywhere else (called from transaction if implemented)
# @!method self.prefetch(resource_hash)
# @abstract A subclass may implement this - it is not implemented in the Provider class
# This method may be implemented by a provider in order to pre-fetch resource properties.
# If implemented it should set the provider instance of the managed resources to a provider with the
# fetched state (i.e. what is returned from the {instances} method).
# @param resources_hash [Hash<{String => Puppet::Resource}>] map from name to resource of resources to prefetch
# @return [void]
# @api public

That’s right, documentation for self.prefetch in the Puppet codebase is 9 lines of comments in lib/puppet/provider.rb, which is awesome. So when is self.prefetch used to provide information to Puppet and when is self.instances used?

Puppet Subcommand    Provider Method    Execution Mode
puppet resource      self.instances     getting values
puppet resource      self.prefetch      setting values
puppet agent         self.prefetch      getting values
puppet agent         self.prefetch      setting values
puppet apply         self.prefetch      getting values
puppet apply         self.prefetch      setting values

This doesn’t mean that self.instances is really only handy for puppet resource – that’s definitely not the case. In fact, you’ll frequently find that self.instances is used by self.prefetch to do some of the heavy lifting. Even though self.prefetch works VERY SIMILARLY to the way that self.instances works for puppet resource (and by that I mean it’s going to gather a list of instances of a type on the system, and it’s also going to populate @property_hash for puppet apply, puppet agent, and for when puppet resource is setting values), it’s not an exact one-for-one match with self.instances. The self.prefetch method for a type is called once per run when Puppet encounters a resource of that type in the catalog. The argument to self.prefetch is a hash of all managed resources of that type that were encountered in the compiled catalog for that node (the hash’s keys are the namevars of the resources, and the values are instances of Puppet::Type – in this case, Puppet::Type::Mac_web_proxy). Your task is to implement a self.prefetch method that gets an array of provider instances discovered on the system, iterates through the hash passed to self.prefetch (containing all the resources of the type that were found in the catalog), and passes each provider instance that was discovered on the system to the provider= method of the matching instance of the type from the catalog.

What the actual fuck?!

Okay, let’s break that apart to try and discover exactly what’s going on here. Assume that I’ve set up a proxy for the ‘FireWire’ interface on my laptop, and I want to try to manage that resource with puppet apply (i.e. something that uses self.prefetch). The resource in the manifest used to manage the proxy will look something like this:

mac_web_proxy { 'firewire':
  ensure       => 'present',
  proxy_port   => '8080',
  proxy_server => 'proxy.server.org',
}

When self.prefetch is called by Puppet, it’s going to be passed a hash looking something like this:

{ "firewire" => Mac_web_proxy[firewire] }

Because only one resource is encountered in the catalog, only one key/value pair shows up in the hash that’s passed as the argument to self.prefetch.

The job of self.prefetch is to find the current state of Mac_web_proxy['firewire'] on the system, create a new instance of the mac_web_proxy provider that contains the ‘is’ values for the Mac_web_proxy['firewire'] resource, and pass this provider instance to the provider= method of the instance of the mac_web_proxy TYPE that is the VALUE of the ‘firewire’ key of the hash that’s passed to self.prefetch.

No, really, that’s what it’s supposed to do. I’m not even sure what’s real anymore

You’ll remember that self.instances gives us an array of resources that were discovered on the system, so we have THAT part of the implementation written. We also have the hash of resources that were encountered in the catalog – so we have THAT part done too. Our only job is to connect the dots (la la la la), programmatically speaking. This should just about do it:

def self.prefetch(resources)
  instances.each do |prov|
    if resource = resources[prov.name]
      resource.provider = prov
    end
  end
end

I want to make a confession right now – I’ve only ever copied and pasted this code into every provider I’ve ever written that needed self.prefetch implemented. It wasn’t until someone actually asked me what it DID that I had to walk the path of figuring out EXACTLY what it did. Based on the last couple of paragraphs – can you blame me?

This code iterates through the array of provider instances returned by self.instances and tries to assign the variable resource by looking up the provider’s name as a key in the resources hash (remember, resources is a hash containing all managed resources of this type in the catalog). If that assignment yields a value (i.e. it isn’t nil, which is what you get when you reference a key that doesn’t exist in a Ruby hash), then we call the provider= method on the instance of the type that was referenced in the resources hash, passing it the provider instance that was discovered on the system by self.instances.

Wow.

Why DID we do all of that? We did it all for the @property_hash. Doing this populates @property_hash in all methods of the instance of the provider (i.e. exists?, create, destroy, etc.) just like self.instances did for puppet resource.

Flush it; Ship it

As I alluded to above, the opposite side of the coin to prefetching (which is a way to query the state for all resources at once) is flushing (or specifically the flush method). The flush method is called once per resource whenever the ‘is’ and ‘should’ values for a property differ (and synchronization needs to occur). The flush method does not take the place of property setter methods, but, rather, is used in conjunction with them to determine how to synchronize resource property values. In this vein, it’s a single trigger that can be used to set all property values for an individual resource simultaneously.

There are a couple of strategies for implementing flush, but one of the more popular ones in use is to create an instance variable that will hold values to be synchronized, and then determine inside flush how best to make as-few-as-possible calls to the system to synchronize all the property values for an individual resource.

Our resource type is unique because the networksetup binary that we’ll be using to synchronize values allows us to set nearly every property value with a single command. Because of this, we really only need that instance variable for one property – the ensure value. But let’s start with the initialization of that instance variable for the flush method:

def initialize(value={})
  super(value)
  @property_flush = {}
end

The initialize method is magic to Ruby – it’s invoked when you instantiate a new object. In our case, we want to create a new instance variable – @property_flush – that will be available to all methods of the instance of the provider. This instance variable will be a hash containing all the ‘should’ values that need to be synchronized for a resource. The super method in Ruby sends a message to the parent of the current object, asking it to invoke a method of the same name (e.g. initialize). Basically, this initialize method does exactly what it has always done, with one exception – making the new instance variable available to all methods of the instance of the provider.
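
If super is unfamiliar, here’s the behavior in isolation (plain Ruby, unrelated to Puppet):

class Parent
  def initialize(value = {})
    puts "Parent#initialize received #{value.inspect}"
  end
end

class Child < Parent
  def initialize(value = {})
    super(value)          # invoke Parent#initialize with the same argument
    @property_flush = {}  # then do Child-specific setup
  end
end

Child.new(:name => 'ethernet') # prints: Parent#initialize received {:name=>"ethernet"}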

The only ‘setter’ method you need

This provider is going to be unique not only because the networksetup binary will set values for ALL properties, but because to change/set ANY property values you have to change/set ALL the property values at the same time. Typically, you’ll see providers that will need to pass arguments to a binary in order to set individual values. For example, if you had a binary fooset that took arguments of --bar and --baz to set values respectively for bar and baz properties of a resource, you might see the following setter and flush methods for bar and baz:

def bar=(value)
  @property_flush[:bar] = value
end

def baz=(value)
  @property_flush[:baz] = value
end

def flush
  array_arguments = []
  if @property_flush
    array_arguments << '--bar' << @property_flush[:bar] if @property_flush[:bar]
    array_arguments << '--baz' << @property_flush[:baz] if @property_flush[:baz]
  end
  if ! array_arguments.empty?
    fooset(array_arguments, resource[:name])
  end
end

That’s not the case for networksetup – in fact, one of the ONLY places in our code where we’re going to throw a value inside @property_flush is going to be in the destroy method. If our intention is to ensure a proxy absent (or, in this case, disable the proxy for a network interface), then we can short-circuit the method we’re going to create to set proxy values by simply checking for a value in @property_flush[:ensure]. Here’s what the destroy method looks like:

def destroy
  @property_flush[:ensure] = :absent
end

Next, we need a method that will set values for our proxy. This method will handle all interaction to networksetup. So, how do you set proxy values with networksetup?

networksetup -setwebproxy <networkservice> <domain> <port number> <authenticated> <username> <password>

The three properties of our mac_web_proxy type are proxy_port, proxy_server, and proxy_authenticated, which map to the ‘<port number>’, ‘<domain>’, and ‘<authenticated>’ values in this command. Changing any of these values means we have to pass ALL of these values (again, which is why our flush implementation may differ from other flush implementations). Here’s what the set_proxy method looks like:

def set_proxy
  if @property_flush[:ensure] == :absent
      networksetup(['-setwebproxystate', resource[:name], 'off'])
      return
  end

  if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
    raise Puppet::Error, "Proxy types other than 'auto' require both a proxy_server and proxy_port setting"
  end
  if resource[:proxy_authenticated] != :true
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port]
      ]
    )
  else
    networksetup(
      [
        '-setwebproxy',
        resource[:name],
        resource[:proxy_server],
        resource[:proxy_port],
        'on',
        resource[:authenticated_username],
        resource[:authenticated_password]
      ]
    )
  end
  networksetup(['-setwebproxystate', resource[:name], 'on'])
end

This helper method does all the validation checks for required properties, executes the correct command, and enables the proxy. Now, let’s implement flush:

def flush
  set_proxy

  # Collect the resources again once they've been changed (that way `puppet
  # resource` will show the correct values after changes have been made).
  @property_hash = self.class.get_proxy_properties(resource[:name])
end

The last line re-populates @property_hash with the current resource values, and is necessary for puppet resource to return correct values after it makes a change to a resource during a run.

The final method

We’ve implemented logic to query the state of all resources, to prefetch those states, to make changes to all properties at once, and to destroy a resource if it exists, but we’ve yet to implement logic to CREATE a resource if it doesn’t exist and it should. Well, this is a bit of a lie – the logic is in the code, but we don’t have a create method, so Puppet’s going to complain:

def create
  @property_flush[:ensure] = :present
end

Technically, this method doesn’t have to do a DAMN thing. Why? Remember how the flush method is triggered when a resource’s ‘is’ values differ from its ‘should’ values? Also, remember how the flush method only calls the set_proxy method? And, finally, remember how set_proxy only checks if @property_flush[:ensure] == :absent (and if it doesn’t match, then it goes about its merry way running networksetup)? Right, well add these things up and you’ll realize that the create method is essentially meaningless based on our implementation (but if you OMIT create, then Puppet’s going to throw a shit-fit in the shape of a Puppet::Error exception):

Error: /Mac_web_proxy[firewire]/ensure: change from absent to present failed: Could not set 'present' on ensure: undefined method `create' for Mac_web_proxy[firewire]:Puppet::Type::Mac_web_proxy

So make Puppet happy and write the goddamn create method, okay?

The complete provider:

Wow, that was a wild ride, huh? If you’ve been coding along, you should have created a file that looks something like this:

lib/puppet/provider/mac_web_proxy/ruby.rb
Puppet::Type.type(:mac_web_proxy).provide(:ruby) do
  commands :networksetup => 'networksetup'

  mk_resource_methods

  def initialize(value={})
    super(value)
    @property_flush = {}
  end

  def self.get_list_of_interfaces
    interfaces = networksetup('-listallnetworkservices').split("\n")
    interfaces.shift
    interfaces.sort
  end

  def self.get_proxy_properties(int)
    interface_properties = {}

    begin
      output = networksetup(['-getwebproxy', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug "#get_proxy_properties had an error -> #{e.inspect}"
      return {}
    end

    output_array = output.split("\n")
    output_array.each do |line|
      line_values = line.split(':')
      line_values.last.strip!
      case line_values.first
      when 'Enabled'
        interface_properties[:ensure] = line_values.last == 'No' ? :absent : :present
      when 'Server'
        interface_properties[:proxy_server] = line_values.last.empty? ? nil : line_values.last
      when 'Port'
        interface_properties[:proxy_port] = line_values.last == '0' ? nil : line_values.last
      when 'Authenticated Proxy Enabled'
        interface_properties[:proxy_authenticated] = line_values.last == '0' ? nil : line_values.last
      end
    end

    interface_properties[:provider] = :ruby
    interface_properties[:name]     = int.downcase
    Puppet.debug "Interface properties: #{interface_properties.inspect}"
    interface_properties
  end

  def self.instances
    get_list_of_interfaces.collect do |int|
      proxy_properties = get_proxy_properties(int)
      new(proxy_properties)
    end
  end

  def create
    @property_flush[:ensure] = :present
  end

  def exists?
    @property_hash[:ensure] == :present
  end

  def destroy
    @property_flush[:ensure] = :absent
  end

  def self.prefetch(resources)
    instances.each do |prov|
      if resource = resources[prov.name]
        resource.provider = prov
      end
    end
  end

  def set_proxy
    if @property_flush[:ensure] == :absent
        networksetup(['-setwebproxystate', resource[:name], 'off'])
        return
    end

    if (resource[:proxy_server].nil? or resource[:proxy_port].nil?)
      raise Puppet::Error, "Both the proxy_server and proxy_port parameters require a value."
    end
    if resource[:proxy_authenticated] != :true
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port]
        ]
      )
    else
      networksetup(
        [
          '-setwebproxy',
          resource[:name],
          resource[:proxy_server],
          resource[:proxy_port],
          'on',
          resource[:authenticated_username],
          resource[:authenticated_password]
        ]
      )
    end
    networksetup(['-setwebproxystate', resource[:name], 'on'])
  end

  def flush
    set_proxy

    # Collect the resources again once they've been changed (that way `puppet
    # resource` will show the correct values after changes have been made).
    @property_hash = self.class.get_proxy_properties(resource[:name])
  end
end

Undoubtedly there are better ways to write this Ruby code, no? Also, I’m SURE I have some errors/bugs in that code. It’s those things that keep me in a job…

Final Thoughts

So, I write these posts not to belittle or mock anyone who works on Puppet or wrote any of its implementation (except the amazing/terrifying bastard who came up with self.prefetch). Anybody who contributes to open source and who builds a tool to save some time for a bunch of sysadmins is fucking awesome in my book.

No, I write these posts so that you can understand the ‘WHY’ piece of the puzzle. If you fuck up the ‘HOW’ of the code, you can spend some time in Google and IRB to figure it out, but if you don’t understand the ‘WHY’ then you’re probably not going to even bother.

Also, selfishly, I move from project to project so quickly that it’s REALLY easy to forget both why AND how I did what I did. Posts like these give me someplace to point people when they ask me “What’s self.prefetch?” that ISN’T just the source code or a liquor store.

This isn’t the last post in the series, by the way. I haven’t even TOUCHED on writing unit tests for this code, so that’s going to be a WHOLE other piece altogether. Also, while this provider manages a WEB proxy for a network interface, understand that there are MANY MORE kinds of proxies for OS X network interfaces (including socks and gopher!). A future post will show you how to refactor the above into a parent provider that can be inherited to allow for code re-use among all the proxy providers that I need to create.

As always, you’re more than welcome to comment, ask questions, or simply bitch at me both on this blog as well as on Twitter: @glarizza. Hopefully this post helped you out and you learned a little bit more about how Puppet providers do their dirty work…

Namaste, bitches.

When to Hiera (Aka: How Do I Module?)

I’m convinced that writing Puppet modules is the ultimate exercise in bikeshedding: if it works, someone’s probably going to tell you that you could have done it better, if you’re using the methods suggested today, they’re probably going to be out-of-date in about 6 months, and good luck writing something that someone else can use cleanly without needing to change it.

I can help you with the last two.

Data and Code Separation == bliss?

I wrote a blog post about 2 years ago detailing why separating your data from your Puppet code was a good idea. The idea is still valid, which means it’s probably one of the better ideas I’ve ever stolen (Does anyone want any HD-DVDs?). Hunter Haugen and I put together a quick blog post on using Hiera to solve the data/code problem because there wasn’t a great bit of documentation on Hiera at that point in time. Since then, Hiera’s been widely accepted as “a good idea” and is in use in production Puppet environments around the world. In most every environment, usage of Hiera by more than just one person eventually gives birth to the question that inspired this post:

“What the hell does and does NOT belong in Hiera?”

Puppet data models

The params class pattern

Many Puppet modules out there since Puppet 2.6 have begun using this pattern:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = $mysql::params::config_file,
  $manage_config_file      = $mysql::params::manage_config_file,
  $old_root_password       = $mysql::params::old_root_password,
  $override_options        = {},
  $package_ensure          = $mysql::params::server_package_ensure,
  $package_name            = $mysql::params::server_package_name,
  $purge_conf_dir          = $mysql::params::purge_conf_dir,
  $remove_default_accounts = false,
  $restart                 = $mysql::params::restart,
  $root_group              = $mysql::params::root_group,
  $root_password           = $mysql::params::root_password,
  $service_enabled         = $mysql::params::server_service_enabled,
  $service_manage          = $mysql::params::server_service_manage,
  $service_name            = $mysql::params::server_service_name,
  $service_provider        = $mysql::params::server_service_provider,
  # Deprecated parameters
  $enabled                 = undef,
  $manage_service          = undef
) inherits mysql::params {

  ## Puppet goodness goes here
}

If you’re not familiar, this is a Puppet class definition for mysql::server that has several parameters defined and defaulted to values that come out of the mysql::params class. The mysql::params class looks a bit like this:

puppetlabs-mysql/manifests/params.pp
class mysql::params {
  case $::osfamily {
    'RedHat': {
      if $::operatingsystem == 'Fedora' and (is_integer($::operatingsystemrelease) and $::operatingsystemrelease >= 19 or $::operatingsystemrelease == "Rawhide") {
        $client_package_name = 'mariadb'
        $server_package_name = 'mariadb-server'
      } else {
        $client_package_name = 'mysql'
        $server_package_name = 'mysql-server'
      }
      $basedir             = '/usr'
      $config_file         = '/etc/my.cnf'
      $datadir             = '/var/lib/mysql'
      $log_error           = '/var/log/mysqld.log'
      $pidfile             = '/var/run/mysqld/mysqld.pid'
      $root_group          = 'root'
    }

    'Debian': {
      ## More parameters defined here
    }
  }
}

This pattern puts all conditional logic for all the variables/parameters used in the module inside one class – the mysql::params class. It’s called the ‘params class pattern’ because we suck at naming things.

Pros:

  • All conditional logic is in a single class
  • You always know which class to seek out if you need to change any of the logic used to determine a variable’s value
  • You can use the include function because parameters for each class will be defaulted to the values that came out of the params class
  • If you need to override the value of a particular parameter, you can still use the parameterized class declaration syntax to do so (both styles are sketched just after this Pros/Cons list)
  • Anyone using Puppet version 2.6 or higher can use it (i.e. anyone who’s been using Puppet since about 2010).

Cons:

  • Conditional logic is repeated in every module
  • You will need to use inheritance to inherit parameter values in each subclass
  • It’s another place to look if you ALSO use Hiera inside the module
  • Data is inside the manifest, so business logic is also inside params.pp
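
To make those two declaration styles concrete, here’s a minimal sketch (the root_password value is made up):

# Rely entirely on the defaults that flow out of mysql::params:
include mysql::server

# ...or override a specific parameter with the parameterized syntax:
class { 'mysql::server':
  root_password => 'changeme',
}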

Hiera defaults pattern

When Hiera hit the scene, one of the first things people tried to do was to incorporate it into existing modules. The logic at that time was that you could keep all parameter defaults inside Hiera, rid yourself of the params class, and then just make Hiera calls out for your data. This pattern looks like this:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', 'default value'),
  $manage_config_file      = hiera('mysql::params::manage_config_file', 'default value'),
  $old_root_password       = hiera('mysql::params::old_root_password', 'default value'),
  ## Repeat the above pattern
) {

  ## Puppet goodness goes here
}
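
The Hiera side of that pattern might look like this (a hypothetical common.yaml in your site-wide hieradata directory – the keys just have to match the hiera() lookups above, and the values here are made up):

common.yaml
---
mysql::params::config_file: '/etc/my.cnf'
mysql::params::manage_config_file: true
mysql::params::old_root_password: ''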

Pros:

  • All data is locked up in Hiera (and its multiple backends)
  • Default values can be provided if a Hiera lookup fails

Cons:

  • You need to have Hiera installed, enabled, and configured to use this pattern
  • All data, including non-business logic, is in Hiera
  • If you use the default value, data could either come from Hiera OR the default (multiple places to look when debugging)

Hybrid data model

This pattern is for those people who want the portability of the params.pp class combined with the power of Hiera. Because it’s a hybrid, there are multiple ways that people have set it up. Here’s a general example:

puppetlabs-mysql/manifests/server.pp
class mysql::server (
  $config_file             = hiera('mysql::params::config_file', $mysql::params::config_file),
  $manage_config_file      = hiera('mysql::params::manage_config_file', $mysql::params::manage_config_file),
  $old_root_password       = hiera('mysql::params::old_root_password', $mysql::params::old_root_password),
  ## Repeat the above pattern
) inherits mysql::params {

  ## Puppet goodness goes here
}

Pros:

  • Data is sought from Hiera first and then defaulted back to the params class parameter
  • Keep non-business logic (i.e. OS specific data) in the params class and business logic in Hiera
  • Added benefits of both models

Cons:

  • Where did the variable get set – Hiera or the params class? Debugging can be hard
  • Requires Hiera to be setup to use the module
  • If you fudge a variable name in Hiera, you get the params class default – see Con #1

Hiera data bindings in Puppet 3.x.x

In Puppet 3.0.0, there was a concept introduced called Data Bindings. This created a federated data model that automatically incorporates a Hiera lookup. Previously, the order that Puppet would use to determine the value of a parameter was to first use a value passed with the parameterized class declaration syntax (see the example below).

parameterized class declaration
class { 'apache':
  package_name => 'httpd',
}

If a parameter was not passed with the parameterized class syntax (like the ‘package_name’ parameter above), Puppet would then look for a default value inside the class definition (see the example below).

parameter default in a class definition
class ntp (
  $ntpserver = 'default.ntpserver.org'
) {
  # Use $ntpserver in a file declaration...
}

If the value of ‘ntpserver’ wasn’t passed with a parameterized class declaration, then the value would be set to ‘default.ntpserver.org’, since that’s the default set in the above class definition.

Failing both of these conditions, Puppet would throw a parse error and say that it couldn’t determine a value for a class parameter.

As of Puppet 3.0.0, Puppet will now do a Hiera lookup for the fully namespaced value of a class parameter (e.g. ntp::ntpserver for the class above) BEFORE falling back to the default value in the class definition. An explicit value passed with the parameterized class declaration syntax still wins, but Hiera now sits between it and the in-class default.
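
For example, with the ntp class above and a standard YAML backend, dropping the fully namespaced key into your hieradata (a sketch – which file depends on your hierarchy, and the value is made up) means a bare include ntp will pick the value up without any parameterized declaration:

common.yaml
---
ntp::ntpserver: 'ntp1.corp.net'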

Roles and Profiles

The roles and profiles pattern has been written about a number of times and is ALSO considered to be ‘a best practice’ when setting up your Puppet environment. What roles and profiles get you is a ‘wrapper class’ that allows you to declare classes within this wrapper class:

profiles/manifests/wordpress.pp
class profiles::wordpress {
  # Data Lookups
  $site_name               = hiera('profiles::wordpress::site_name')
  $wordpress_user_password = hiera('profiles::wordpress::wordpress_user_password')
  $mysql_root_password     = hiera('profiles::wordpress::mysql_root_password')
  $wordpress_db_host       = hiera('profiles::wordpress::wordpress_db_host')
  $wordpress_db_name       = hiera('profiles::wordpress::wordpress_db_name')
  $wordpress_db_password   = hiera('profiles::wordpress::wordpress_db_password')

  ## Create user
  group { 'wordpress':
    ensure => present,
  }
  user { 'wordpress':
    ensure   => present,
    gid      => 'wordpress',
    password => $wordpress_user_password,
    home     => '/var/www/wordpress',
  }

  ## Configure mysql
  class { 'mysql::server':
    root_password => $mysql_root_password,
  }

  class { 'mysql::bindings':
    php_enable => true,
  }

  ## Configure apache
  include apache
  include apache::mod::php
}

## Continue with declarations...

Notice that any variables that might have business specific logic are set with Hiera lookups. These Hiera lookups do NOT have default values, which means the hiera() function will throw a parse error if a value is not returned. This is IDEAL because we WANT TO KNOW if a Hiera lookup fails – it means we failed to put the data in Hiera, and that should be corrected BEFORE a state that might contain invalid data is enforced with Puppet.

You then have a ‘Role’ wrapper class that simply includes many of the ‘Profile’ wrapper classes:

roles/manifests/frontend.pp
class roles::frontend {
  include profiles::mysql
  include profiles::apache
  include profiles::java
  include profiles::jboss
  # include more profiles...
}

The idea being that Profiles abstract all the technical bits that need to be declared to set up a piece of technology, and Roles abstract all the business logic for what pieces of technology should be installed on a certain ‘class’ of machine. Basically, you can say that “all our frontend infrastructure should have mysql, apache, java, jboss…”. In this statement, the Role is ‘frontend infrastructure’ and the Profiles are ‘mysql, apache, java, jboss…’.

Pros:

  • Hiera data lookups are confined to a wrapper class OUTSIDE of the component modules (like mysql, apache, java, etc…)
  • Data lookups for parameters containing business logic are done with Hiera
  • Non-business specific data is pulled from the module (i.e. the params class)
  • Wrapper modules can be ‘included’ with the include function, helping to eliminate multiple class declarations using the parameterized class declaration syntax
  • Component modules are backward-compatible to Puppet 2.6 while wrapper modules still get to use a modern data lookup mechanism (Hiera)
  • Component modules do NOT contain any business specific logic, which means they’re portable

Cons:

  • Hiera must be setup to use the wrapper modules
  • Wrapper modules add another debug path for variable data
  • Wrapper modules add another layer of abstraction

Data in Puppet Modules

R.I. Pienaar (the original author of MCollective, Hiera, and much more) published a blog post recently on implementing a folder for Puppet modules that Hiera can traverse when it does data lookups. This construct isn’t new; there was a feature request for this behavior filed in October of 2012 with a subsequent pull request that implemented this functionality (they’re both worth reads for further information). The pull request didn’t get merged, and so R.I. implemented the functionality inside a module on the Puppet Forge. In a nutshell, it’s a hiera.yaml configuration file INSIDE THE MODULE that implements a module-specific hierarchy, and a ‘data’ folder (also inside the module) that allows for individual YAML files that Hiera could read. This hierarchy is consulted AFTER the site-specific hiera.yaml file is read (i.e. /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml), and the in-module data files are consulted AFTER the site-specific Hiera data files are read (normally found in either /etc/puppet/hieradata or /etc/puppetlabs/puppet/hieradata).

The argument here is that there’s a data store for SITE-SPECIFIC Hiera data that should be kept outside of modules, but there’s not a MODULE-SPECIFIC data store that Hiera can use. The argument isn’t whether data that should be shared with other people belongs inside a site-specific Hiera datastore (protip: it doesn’t. Data that’s not business-specific should be shared with others and kept inside the module), the argument is that it shouldn’t be locked up inside the DSL where the barrier-to-entry is learning Puppet’s DSL syntax. Whereas /etc/puppet/hiera.yaml or /etc/puppetlabs/puppet/hiera.yaml sets up the hierarchy for all your site-specific data, there’s no per-module hiera.yaml file for all module-specific data, and there’s no place to put module-specific Hiera data.

But module-specific data goes inside the params class and business-specific data goes inside Hiera, right?

Sure, but for some people the Puppet DSL is a barrier. The argument is that there should be a lower barrier to entry to contribute parameter data to Puppet that doesn’t require you to learn the syntax of if/case/selector statements in the Puppet DSL. There’s also the argument that if you want to add support for an operatingsystem to your module, you have to modify the params class file and add another entry to the if/case/selector statement – wouldn’t it be easier to just add another YAML file into a data folder that doesn’t affect existing datafiles?

Great, ANOTHER hierarchy to traverse for data – that’s going to get confusing

Well, think about it right now – most EVERY params class of EVERY module (if it supports multiple operatingsystems) does some sort of conditional logic to determine values for parameters on a per-OS basis. That’s something that you need to traverse. And many modules use different conditional data to determine what parameters to use. Look at the mysql params class example above – it not only splits on $osfamily, but it also checks specific operatingsystems. That’s a conditional inside a conditional. You’re TRAVERSING conditional data right now to find a value – the only difference is that this method doesn’t use the DSL, it uses Hiera and YAML.

Sure, but this is outside of Puppet and you’re losing visibility inside Puppet with your data

You’re already doing that if you’re using the params class. In this case, visibility is moved to YAML files instead of separate Puppet classes.

Setting it up

You will first need to install R.I.’s module from the Puppet Forge. As of this writing it’s at version 0.0.1; make sure you grab the most recent version using the puppet module tool:

[root@linux modules]# puppet module install ripienaar/module_data
Notice: Preparing to install into /etc/puppetlabs/puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/puppet/modules
└── ripienaar-module_data (v0.0.1)

Next, you’ll need to setup a module to use the data-in-modules pattern. Take a look at the tree of a sample module:

[root@linux modules]# tree mysql/
mysql/
├── data
│   ├── hiera.yaml
│   └── RedHat.yaml
└── manifests
    └── init.pp

I created a sample mysql module based on the examples above. All of the module’s Hiera data (including the module-specific hiera.yaml file) goes in the data folder. This module should be placed in Puppet’s modulepath – and if you don’t know where Puppet’s modulepath is set, run the puppet config face to determine that:

[root@linux modules]# puppet config print modulepath
/etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules

In my case, I’m putting the module in /etc/puppetlabs/puppet/modules (since I’m running Puppet Enterprise). Here’s the hiera.yaml file from the sample mysql module:

mysql/data/hiera.yaml
:hierarchy:
  - "%{::osfamily}"

I’ve also included a YAML file for the $osfamily of RedHat:

mysql/data/RedHat.yaml
---
mysql::config_file: '/path/from/data_in_modules'
mysql::manage_config_file: true
mysql::old_root_password: 'password_from_data_in_modules'

Finally, here’s what the mysql class definition looks like from manifests/init.pp:

mysql/manifests/init.pp
class mysql (
  $config_file        = 'module_default',
  $manage_config_file = 'module_default',
  $old_root_password  = 'module_default'
) {
  notify { "The value of config_file: ${config_file}": }
  notify { "The value of manage_config_file: ${manage_config_file}": }
  notify { "The value of old_root_password: ${old_root_password}": }
}

Everything should be setup to notify the value of a couple of parameters. Now, to test it out…

Testing data-in-modules

Let’s include the mysql class with puppet apply to see where it’s looking for data:

[root@linux modules]# puppet apply -e 'include mysql'
Notice: The value of config_file: /path/from/data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of config_file: /path/from/data_in_modules]/message: defined 'message' as 'The value of config_file: /path/from/data_in_modules'
Notice: The value of manage_config_file: true
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: true]/message: defined 'message' as 'The value of manage_config_file: true'
Notice: The value of old_root_password: password_from_data_in_modules
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: password_from_data_in_modules]/message: defined 'message' as 'The value of old_root_password: password_from_data_in_modules'
Notice: Finished catalog run in 0.62 seconds

Since I’m running on an operatingsystem whose family is ‘RedHat’ (i.e. CentOS), you can see that the values of all the parameters were pulled from the Hiera data files inside the module. Let’s temporarily change the $osfamily fact value and see what happens:

[root@linux modules]# FACTER_osfamily=Debian puppet apply -e 'include mysql'
Notice: The value of config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of config_file: module_default]/message: defined 'message' as 'The value of config_file: module_default'
Notice: The value of old_root_password: module_default
Notice: /Stage[main]/Mysql/Notify[The value of old_root_password: module_default]/message: defined 'message' as 'The value of old_root_password: module_default'
Notice: The value of manage_config_file: module_default
Notice: /Stage[main]/Mysql/Notify[The value of manage_config_file: module_default]/message: defined 'message' as 'The value of manage_config_file: module_default'
Notice: Finished catalog run in 0.51 seconds

This time, when I specified a value of Debian for $osfamily, the parameter values were pulled from the declaration in the mysql class definition (i.e. from inside mysql/manifests/init.pp).

Testing outside of Puppet

One of the big pros of Hiera is that it comes with the hiera binary that can be run from the command line to test values. This works just fine for site-specific data that’s defined against the central hiera.yaml file (usually found in /etc/puppet or /etc/puppetlabs/puppet), but the data-in-modules pattern relies on a Puppet indirector to point to the current module’s data folder, and thus (as of right now) there’s not a simple way to run the hiera binary to pull data out of modules WITHOUT running Puppet. This is not a dealbreaker, and it doesn’t stop anybody from hacking up something that WILL look inside modules for data, but as of right now that doesn’t yet exist. It also makes debugging values that come out of modules a bit more difficult.
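
To see the difference, here’s roughly what querying SITE-level data looks like from the command line (a sketch – the key and fact value are made up, and -c points at your site-wide hiera.yaml); there’s no analogous option that consults a module’s data folder:

[root@linux modules]# hiera -c /etc/puppetlabs/puppet/hiera.yaml ntp::ntpserver ::osfamily=RedHat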

The scorecard for data-in-modules

Pros:

  • Parameters are defined in YAML and not Puppet DSL (i.e. you only need to know YAML and not the Puppet DSL)
  • Adding parameters is as simple as adding another YAML file to the module
  • Module authors provide module data that can be read by Puppet 3.x.x Hiera data bindings

Cons:

  • Must be using Puppet 3.0.0 or higher
  • Additional hierarchy and additional Hiera data file to consult when debugging
  • Not (currently) an easy/straightforward way to use the hiera binary to test values
  • Currently depends on a Puppet Forge module being installed on your system

What are you trying to say?

I am ALL ABOUT code portability, re-usability, and not building 500 apache modules. Ever since people have been building modules, they’ve been putting too much data inside modules (to the point where they can’t share them with anyone else). I can’t tell you how many times I’ve heard “We have a module for that, but I can’t share it because it has all our company-specific data in it.”

Conversely, I’ve also seen organizations put EVERYTHING in their site-specific Hiera datastore because “that’s the place for Puppet data.” They usually end up with 15+ levels in their Hiera hierarchies because they’re doing things like this:

hiera.yaml
---
:backends:
  - yaml

:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - "%{osfamily}"
  - "%{osfamily}/%{operatingsystem}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_major}"
  - "%{osfamily}/%{operatingsystem}/%{os_version_minor}"
  # Repeat until you have 15 levels of WTF

This leads us back again to “What does and DOESN’T go in Hiera?” I usually say the following:

Data in site-specific Hiera datastore

  • Business-specific data (i.e. internal NTP server, VIP address, per-environment java application versions, etc…)
  • Sensitive data
  • Data that you don’t want to share with anyone else

Data that does NOT go in the site-specific Hiera datastore

  • OS-specific data
  • Data that EVERYONE ELSE who uses this module will need to know (paths to config files, package names, etc…)

Basically, if I ask you if I can publish your module to the Puppet Forge, and you object because it has business-specific or sensitive data in it, then you probably need to pull that data out of the module and put it in Hiera.

The recommendations that I give when I go on-site with Puppet users are the following (there’s a short sketch after the list):

  • Use Roles/Profiles to create wrapper-classes for class declaration
  • Do ALL Hiera lookups for site-specific data inside your ‘Profile’ wrapper classes
  • All module-specific data (like paths to config files, names of packages to install, etc…) should be kept in the module in the params class
  • All ‘Role’ wrapper classes should just include ‘Profile’ wrapper classes – nothing else
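
Putting those recommendations together, here’s a trimmed-down, hypothetical profile (the class name and Hiera key are made up, and I’m assuming an ntp component module that exposes a server parameter):

profiles/manifests/ntp.pp
class profiles::ntp {
  ## Business-specific data comes from a site-level Hiera lookup
  ## (no default value -- we WANT a compilation failure if the data is missing)
  $ntpserver = hiera('profiles::ntp::ntpserver')

  ## OS-specific defaults stay behind in the component module's params class
  class { '::ntp':
    server => $ntpserver,
  }
}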

But what about Data in Modules?

I went through all the trouble of writing up the Data in Modules pattern, but I didn’t recommend or even MENTION it in the previous section. The reason is NOT that I don’t believe in it (I actually think the future will be data outside of the DSL inside a Puppet module); the reason is that it’s not YET in Puppet’s core and it hasn’t YET been widely tested. If you’re an existing Puppet user that’s been looking for a way to split data outside of the DSL, here is your opportunity. Use the pattern and PLEASE report back on what you like and don’t like about it. The functionality is in a module, so it’s easy to tweak. If you’re new to Puppet and are comfortable with the DSL, then the params class exists and is available to you.

To voice your opinion or to follow the progress of data in modules, follow or comment on this Puppet ticket.

Update

R.I. posted another article on the problem with params.pp that is worth reading. He gives compelling reasons on why he built Hiera, why params.pp WORKS, but also why he believes it’s not the future of Puppet. R.I. goes even further to explain that it’s not necessarily the Puppet DSL that is the barrier to entry, it’s that this sort of data belongs in a file for config data and not INSIDE THE CODE itself (i.e. inside the Puppet DSL). Providing data inside modules gives module authors a way to provide this configuration data in files that AREN’T the Puppet DSL (i.e. not inside the code).

Who Abstracted My Ruby?

Previously, on Lost, I said a lot of words about Puppet Types; you should totally check it out. In this second installment, you’re going to find out how to actually throw pure Ruby at Puppet in a way that makes you feel accomplished. And useful. And elitist. Well, possibly just elitist. Either way, read on – there’s much thought-leadership to be done…

In the last post, we learned that Types will essentially dictate the attributes that you’ll be passing in your resource declaration using the DSL. In the simplest and crudest explanation I could muster, types model how your declaration will look in the manifest. Providers are where the actual IMPLEMENTATION happens. If you’ve ever wondered how this:

package { 'httpd':
  ensure => installed,
}

eventually gets turned into this:

yum install -e 0 -d 0 -y httpd

your answer would be “It’s in the provider file”.

Dirty black magic

I’ve seen people do the craziest shit imaginable in the Puppet DSL simply because they’re:

  • Unsure how types and providers work
  • Afraid of Ruby
  • Confused by error messages
  • Afraid to ask for help

Sometimes you have a problem that can only be solved by interacting with data that’s returned by a binary (using some binary to get a value, and then using that binary to set a value, and so on…). I see people writing defined resource types with a SHIT TON of exec statements and conditional logic to model this data when a type and provider would not only BETTER model the problem but would also be shareable and re-useable by other folk. The issue is that while the DSL is REALLY easy to get started with, types and providers still feel like dirty black magic.

The reason is because they’re dirty black magic.

Hopefully, I can help get you over the hump and onto a working implementation. Let’s take a problem I had last week:

Do this if that, and then be done

I was working with a group who wanted to set a list of domains that would bypass their web proxy for a specific network interface on an OS X workstation. It sounds so simple, because it was. Due to the amount of time I had on-site, I wrote a class with some nasty exec statements, a couple of facts, and some conditional logic because that’s what you do when you’re in a hurry…but it doesn’t make it right. When I left, I hacked up a type and provider, and it’s a GREAT example because you probably have a similar problem. Let’s look at the information we have:

The list of network interfaces:

└▷ networksetup -listallnetworkservices
An asterisk (*) denotes that a network service is disabled.
Bluetooth DUN
Display Ethernet
Ethernet
FireWire
Wi-Fi
iPhone USB
Bluetooth PAN

Getting the list of bypass domains for an interface:

└▷ networksetup -getproxybypassdomains Ethernet
www.garylarizza.com
*.corp.net
10.13.1.3/24

The message displayed when no domains are set for an interface:

└▷ networksetup -getproxybypassdomains FireWire
There aren't any bypass domains set on FireWire.

Setting the list of bypass domains for an interface:

└▷ networksetup -setproxybypassdomains Ethernet '*.corp.net' '10.13.1.3/24' 'www.garylarizza.com'

Perfect – all of that is done with a single binary, and it’s pretty straightforward. Let’s look at the type I ended up creating for this problem:

lib/puppet/type/mac_proxy_bypassdomains.rb
Puppet::Type.newtype(:mac_proxy_bypassdomains) do
  desc "Puppet type that models bypass domains for a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
  end

  newproperty(:domains, :array_matching => :all) do
    desc "Domains which should bypass the proxy"
    def insync?(is)
      is.sort == should.sort
    end
  end
end

The type uses a namevar parameter called ‘name’, which is the name of the network interface. This means that we can set one list of bypass domains for every network interface. There’s a single property, ‘domains’, that accepts an array of domains that should bypass the proxy for the network interface. I’ve overridden the insync? method for the domains property to sort the array values on both ends – this means that the ORDER of the domains doesn’t matter, I only care that the domains specified exist on the system. Finally, the type is ensurable (which means that we can create a list of domains and remove/destroy the list of domains for a network interface).

Setup the provider

Okay, so we’ve defined the problem, seen how to interact with the system to get us the data that we need, setup a type to model the data, and now the last thing left to do is to wire up the provider to make the binary calls we need and return the data we want.

Typos are not your friend.

The first thing you will encounter is “Puppet’s predictable naming pattern” that is used by the Puppet autoloader. Typos are not fun, and omitting a single letter in either the filename or the provider name will render your provider (emotionally) unavailable to Puppet. Our type is called ‘mac_proxy_bypassdomains’, as types are generally named along the lines of ‘what does this data model?’ The provider name is generally the name of the underlying technology that’s doing the modeling. For the package type, the providers are named after the package management systems (e.g. yum, apt, pacman, zypper, pip), for the file type, the providers are loosely named for the operatingsystem kernel type on which files are to be created (e.g. windows, posix). In our example, I simply chose to name the provider ‘ruby’ because, as a Puppet Labs employee, I TOO suck at naming things.

Here’s a tree of my module to understand how the type and provider files are to be laid out:

Module tree
├── Modulefile
├── README.markdown
└── lib
    └── puppet
        ├── provider
        │   └── mac_proxy_bypassdomains
        │       └── ruby.rb
        └── type
            └── mac_proxy_bypassdomains.rb

As you can see from above, the name of both the type and provider must EXACTLY match the filename of their corresponding files. Also, the provider file lives in a directory named after the type. There are MANY things that can be typoed here (filenames, foldernames, type/provider names in their files), so be absolutely sure that you’ve named your files correctly.

The reason for all this naming bullshit is because of the way Puppet syncs down plugin files (coincidentally, with a process known as Pluginsync). Everything in the lib directory in a Puppet module is going to get synced down to your nodes inside the vardir directory on the node itself. The vardir is a known library path to Puppet, and all files in the vardir are treated as if they had lived in Puppet’s source code (in the same relative paths). Because the Puppet source code has all type files in the lib/puppet/type directory, all CUSTOM types must go in the module’s lib/puppet/type directory for conformity. This is repeated for EVERY custom Puppet/Facter plugin (including custom facts, custom functions, and etc…).
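
If you’re curious where those synced files land on a node, the same config face from earlier will tell you – plugins end up under the vardir’s lib subdirectory (the exact path varies between Puppet Enterprise and open source installs; the output below is from my PE box):

[root@linux modules]# puppet config print vardir
/var/opt/lib/pe-puppet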

More scaffolding

Let’s lay out the shell of our provider first, to ensure that we haven’t typoed anything. Here’s the provider declaration:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  # Provider work goes here
end

Note that the name of the type and the name of the provider are symbolized (i.e. they’re prepended with a colon). Like I mentioned above, they must be spelled EXACTLY or Puppet will complain very loudly. You may see variants on that declaration line because there are multiple ways in Ruby to extend a class object. The method I’ve listed above is the ‘generally accepted best-practice’, which is to say it’s the way we’re doing it this month.

Congrats! You have THE SHELL of a provider that has yet to do a single goddamn thing! Technically, you’re further than about 90% of other Puppet users at this point! Let’s go the additional 20% (since we’re basing this on a management metric of 110%) by wiring up the methods and making the damn thing work!

Are you (en)sure about this?

We’ve explained before that a type is ‘ensurable’ when you can check for its existence on a system, create it when it doesn’t exist (and it SHOULD exist), and destroy it when it does exist (and it SHOULDN’T exist). The bare minimum number of methods necessary to make a type ensurable is three, and they’re called exists?, create, and destroy.

Method: exists?

The exists? method is a predicate method – that means it should either return the boolean true or false value based on whether the bypass domain list exists. Puppet will always call the exists? provider method to determine if that ‘thing’ (in this case, ‘thing’ means ‘a list of domains to bypass for a specific network interface’) exists before calling any other methods. How do we know if this thing exists? Like I showed before, you need to run the networksetup -getproxybypassdomains command and pass the interface name. If it returns ‘There aren’t any bypass domains set on (interface name)’, then the list doesn’t exist. Let’s do some binary execution…

Calling binaries from Puppet

Puppet provides some helper syntax around basic actions that most providers perform. MOST providers are going to need to call out to an external binary (e.g. yum, apt, etc…) at some point, and so Puppet allows you to create your own method JUST for a system binary. The commands method abstracts all the dirtyness of making a method for each system binary you want to call. The way you use the commands method is like so:

commands :networksetup => 'networksetup'

The commands method accepts a hash whose key must be a symbolized name. The CONVENTION is to use a symbolized name that matches the binary name, but it’s not REQUIRED to do so. The value for that symbolized key MUST be the binary name. Note that I’ve not passed a full path to the binary. Why? Well, Puppet will automatically do a path lookup for that binary and store its full path for use when the binary is invoked. We don’t REQUIRE you to pass the full path because sometimes the same binary exists in different locations for different operatingsystems. Instead of creating a provider for each OS you manage with Puppet, we abstract away the path stuff. You CAN still pass a full path as a value, but if you elect to do that and the binary doesn’t exist at that path, Puppet will disqualify the provider and you’ll be quite upset.

In the event that Puppet CANNOT find this binary, it will disqualify the entire provider, and you’ll get a message saying as much in the debug output of your Puppet run. Because of that, the commands method is a good way to confine your provider to a specific system or class of system.

When the commands method is successfully invoked, you will get a new provider method named after the SYMBOLIZED key, and not necessarily the binary name (unless you made them the same). After the above command is evaluated, Puppet will now have a networksetup() method in our provider. The argument to the networksetup method should be an array of arguments that are passed to the binary. It’s exec-style: each array element is passed to the binary as its own argument, so Puppet handles the quoting for you. You can run into issues here if you pass values containing quotes as part of your argument array. Read that again – quoting your values is totally acceptable (e.g. [‘foo’, ‘bar’]), but passing a value that contains quotes can potentially cause problems (e.g. [“‘foo’”, “‘bar’”]).
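
So, with the commands declaration above in place, a call from anywhere in the provider looks like this (a sketch that mirrors the helper method we’ll write shortly):

# Puppet resolves the full path to networksetup and passes each array
# element to the binary as its own argument (no shell quoting needed):
output = networksetup(['-getproxybypassdomains', 'Ethernet'])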

You’re probably thinking “Why the hell would I go through this trouble when I can use the %x{} syntax in ruby to execute a shell command?!” And to that I would say “Quit yelling at me” and also “Because: testing.” When you write spec tests for your provider (which will be covered in a later blog post, since it’s its OWN path of WTF), you’re going to need to mock out calls to the system during your tests (i.e. sometimes you may be running the tests on a system that doesn’t have the binary you’re meant to be calling in your provider. You don’t want the tests to fail due to the absence of a binary file). The %x{} construct in Ruby is hard to mock out, but a method of our provider is a relatively easy thing to mock out. Also – see the path problem above. We don’t STOP you from doing %x{} in your code (it will still totally work), but we give you a couple of good reasons to NOT do it.

Objects are a provider’s best friend

Within your provider, you’re going to be doing lots of system calls and data manipulation. Often we’re asked whether you do that ugliness inside the main methods (i.e. inside the exists? method directly), or if you create a helper method for some of this data manipulation. The answer I usually give is that you should probably create a helper method if:

  • The code is going to be called more than once
  • The code does something that would be tricky to test (like reading from a file)
  • Complexity would be reduced by creating a helper method

The act of getting a list of domains for a specific interface is definitely going to be utilized in more than one place in our provider (we’ll use it in the exists? method as well as in a ‘getter’ method for the domains property). Also, you could argue that it might be tricky to test since it’s going to be a binary call that’s going to return some data. Because of this, let’s create a helper method that returns a list of domains for a specific interface:

def get_proxy_bypass_domains(int)
  begin
    output = networksetup(['-getproxybypassdomains', int])
  rescue Puppet::ExecutionFailure => e
    Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
    return nil
  end
  domains = output.split("\n").sort
  return nil if domains.first =~ /There aren\'t any bypass domains set/
  domains
end

Ruby convention is to use underscores (i.e. versus camelCase or hyphens) in method names. You want to give your methods very descriptive names based on what it is that they DO. In this case, get_proxy_bypass_domains seems adequately descriptive. Also, you should err on the side of readability when you’re writing code. You can get pretty creative with Ruby metaprogramming, but that can quickly become hard to follow (and then you’re just a dick). Finally, error-handling is a good thing. If you’re going to do any error-handling, though, be very specific about the errors you catch/rescue. When you have a rescue block, make sure you catch a specific exception class (in the case above, we’re catching a Puppet::ExecutionFailure – which means the binary is returning a non-zero exit code).

The code above will return an array containing all the domains, or it will return nil if domains aren’t found or the networksetup binary had an issue.

Using the helper method above, here’s what the final exists? method looks like:

def exists?
  get_proxy_bypass_domains(resource[:name]) != nil
end

All provider methods have the ability to access the ‘should’ values for the resource (and by that I mean the values that are set in the Puppet manifest on the Puppet master server, or locally if you’re using puppet apply). Those values are available through the resource method, which behaves like a hash. In the code above, resource[:name] will return the network interface name (e.g. Ethernet, FireWire, etc…) that was specified in the Puppet manifest. The exists? method will return true if a list of domains exists for an interface, or it will return false if a list of domains does not exist (i.e. get_proxy_bypass_domains returns nil).

Method: create

The create method is called when exists? returns false and a resource has an ensure value set to present. Because of this, you don’t need to call the exists? method explicitly in create – it’s already been evaluated. Remember from above that the -setproxybypassdomains argument to the networksetup binary will set a domain list, so the create method is going to be very short-and-sweet:

def create
  networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
end

In the end, the create method will call the networksetup binary with the -setproxybypassdomains argument, pass the interface name (from resource[:name]) and pass an array of domain values (which comes from resource[:domains]). That’s it; it’s done!

Method: destroy

The destroy method is easier than the create method:

def destroy
  networksetup(['-setproxybypassdomains', resource[:name], 'Empty'])
end

Here, we’re calling networksetup with the -setproxybypassdomains argument, passing the interface name, and then passing the special ‘Empty’ keyword, which is how the networksetup man page says to set an empty list of bypass domains for an interface.

Synchronizing properties

Getter method: domains

At this point our type is ensurable, which means we can create and destroy resources. What we CAN’T do, however, is change the value of any properties that are out-of-sync. A property is out-of-sync when the value discovered by Puppet on the node differs from the value in the catalog (i.e. set by the Puppet manifest using the DSL on the Puppet master). Just like exists? is called to determine if a resource exists, Puppet needs a way to get the current value for a property on a node. The method that gets this value is called the ‘getter method’ for a property, and its name must match the name of the property. Because we have a property called domains, the provider must have a domains method that returns a value (in this case, an array of domains to be bypassed by the proxy). We’ve already written a helper method that does this work for us, so the domains getter method is pretty easy:

def domains
  get_proxy_bypass_domains(resource[:name])
end

Tada! Just call the helper method and pass the interface name. Boom – instant array of values. The getter method will return the ‘is’ value, because that’s what the value IS (currently on the node). Get it? Anyone? The IS value is the other side of the coin to the ‘should’ value (that comes from the Puppet manifest), because that’s what the value SHOULD be set to on the node.

Setter method: domains=

If the getter method (e.g. domains) returns a value that doesn’t match the value in the catalog, then Puppet changes the value on the node and sets it to the value in the catalog. It does this by calling the ‘setter’ method for the property, which is the name of the property and the equals ( = ) sign. In this case, the setter method for the domains property must be called domains=. It looks like this:

def domains=(value)
  networksetup(['-setproxybypassdomains', resource[:name], value])
end

Setter methods are always passed a single argument – the ‘should’ value of the property. In our example, we’re calling the networksetup binary with the -setproxybypassdomains argument, passing the name of the interface, and then passing the ‘should’ value – or the array of domains. It’s easy, it’s one line, and I love it when a plan comes together.

Putting the whole damn thing together

I’ve broken down the provider line by line, but here’s the entire file:

lib/puppet/provider/mac_proxy_bypassdomains/ruby.rb
Puppet::Type.type(:mac_proxy_bypassdomains).provide(:ruby) do
  commands :networksetup => 'networksetup'

  def get_proxy_bypass_domains(int)
    begin
      output = networksetup(['-getproxybypassdomains', int])
    rescue Puppet::ExecutionFailure => e
      Puppet.debug("#get_proxy_bypass_domains had an error -> #{e.inspect}")
      return nil
    end
    domains = output.split("\n").sort
    return nil if domains.first =~ /There aren\'t any bypass domains set/
    domains
  end

  def exists?
    get_proxy_bypass_domains(resource[:name]) != nil
  end

  def destroy
    # 'Empty' clears the bypass domain list (per the networksetup man page)
    networksetup(['-setproxybypassdomains', resource[:name], 'Empty'])
  end

  def create
    networksetup(['-setproxybypassdomains', resource[:name], resource[:domains]])
  end

  def domains
    get_proxy_bypass_domains(resource[:name])
  end

  def domains=(value)
    networksetup(['-setproxybypassdomains', resource[:name], value])
  end
end

Testing the type/provider

And that’s it, we’re done! The last thing to do is to test it out. You can test out your provider in one of two ways: the first is to add the module to the modulepath of your Puppet master and include it that way, or you can test it locally by setting the $RUBYLIB environment variable to point to the lib directory of your module (the preferred method, since it won’t serve untested code out to all of your nodes). Because this module is on my system at /users/glarizza/src/puppet-mac_proxy, here’s how my $RUBYLIB is set:

export RUBYLIB=/users/glarizza/src/puppet-mac_proxy/lib

Next, we need to create a resource declaration to try and set a couple of bypass domains. I’ll create a tests directory and simple test file in tests/mac_proxy_bypassdomains.pp:

tests/mac_proxy_bypassdomains.pp
mac_proxy_bypassdomains { 'Ethernet':
  ensure  => 'present',
  domains => ['www.garylarizza.com','*.puppetlabs.com','10.13.1.3/24'],
}

Finally, let’s run Puppet and test it out:

└▷ puppet apply ~/src/puppet-mac_proxy/tests/mac_proxy_bypassdomains.pp
Notice: Compiled catalog for satori.local in environment production in 0.06 seconds
Notice: /Stage[main]//Mac_proxy_bypassdomains[Ethernet]/domains: domains changed [] to 'www.garylarizza.com *.puppetlabs.com 10.13.1.3/24'
Notice: Finished catalog run in 3.47 seconds

NOTE: If you run this as a local user, you will be prompted by OS X to enter an administrative password for a change. Since Puppet will ultimately be run as root on OS X when we’re NOT testing out code, this shouldn’t be required during a normal Puppet run. To test this out (i.e. that you don’t always have to enter an admin password in a pop-up window), you’ll need to sudo -s to change to root, set the $RUBYLIB as the root user, and then run Puppet again.

And that’s it – looks like our code worked! To check and make sure it will notice a change, open System Preferences, then the Network pane, click on the Ethernet interface, then the Advanced button, then the Proxies tab, and finally note the ‘Bypass proxy settings…’ text box at the bottom of the screen (now do you see why we automate this shit?!). Make a change to the entries in there and run Puppet again – it should correct it for you.

Wait…so that was it? Really? We’re done?

Yeah, that was a whole type and provider. Granted, it has only one property and it’s not too complicated, but that’s the point. We’ve still got some latent bugs (the network interface passed must be capitalized exactly like OS X expects it, we could do some better error handling, etc…), and the type doesn’t work with puppet resource (yet), but we’ll handle all of these things in the next blog post (or two…or three).

Until then, take this time to crack open a type and a provider for something that’s been pissing you off and FIX it! Better yet, push it up to Github, tweet about it, and post it up on The Forge so the rest of the community can use it!

Like always, feel free to comment, tweet me (@glarizza), email me (gary AT puppetlabs DOT com), or use the social media platform of choice to get a hold of me (Snapchats may or may not get a response. Maybe.) Cheers!

Fun With Puppet Providers - Part 1 of Whatever

I don’t know why I write blog posts – everybody in open-source software knows that the code IS the documentation. If you’ve ever tried to write a Puppet type/provider, you know this fact better than ANYONE. To this day, when someone asks me for the definitive source on this activity I usually refer them first to Nan Liu and Dan Bode’s awesome Types and Providers book (which REALLY is a fair bit of quality information), and THEN to the source code for Puppet. Everything else falls in-between those sources (sadly).

As someone who truly came from knowing absolute fuckall about Ruby and only marginally more than that about Puppet, I’ve walked through the valley of the shadow of self.instances and have survived to tell the tale. That’s what this post is about – hopefully some GOOD information if you want to start writing your own Puppet type and provider. I also wrote this because this knowledge has been passed down from Puppet employee to Puppet employee, and I wanted to break the priesthood being held on type and provider magic. If you don’t hear from me after tomorrow, well, then you know what happened…

Because 20 execs in a defined type…

What would drive someone to write a custom type and provider for Puppet anyhow? After all, you can do ANYTHING IMAGINABLE in the Puppet DSL*! After drawing back my sarcasm a bit, let me explain where the Puppet DSL tends to fall over and the idea of a custom type and provider starts becoming more than just an incredibly vivid dream:

  • You have more than a couple of exec statements in a single class/defined type that have multiple conditional properties like ‘onlyif’ and/or ‘unless’.
  • You need to use pure Ruby to manipulate data and parse it through a system binary
  • Your defined type has more conditional logic than your pre-nuptial agreement
  • Any combination of similar arguments related to the above

If the above sounds familiar to you, then you’re probably ready to build your own custom Puppet type and provider. Do note that custom types and providers are written in Ruby and not the Puppet DSL. This can initially feel very scary, but get over it (there are much scarier things coming).

* Just because you can doesn’t mean you don’t, in fact, suck.

I’m not your Type

This blog post is going to focus on types and type-interaction, while later posts will focus on providers and ultimately dirty provider tricks to win friends and influence others. Type and provider interaction can be totally daunting for newcomers, let ALONE just naming files correctly due to Puppet’s predictable (note: anytime I write the word “predictable”, just substitute the phrase “annoying pain in the ass”) naming pattern. Let’s break it down a bit for you – somebody cue Dre…

(NOTE: I’m going to ASSUME you understand the fundamentals of a Puppet run already. If you’re pretty hazy on that concept, check out docs.puppetlabs.com for more information)

Types are concerned about your looks

The type file defines all the properties and parameters that can be used by your new custom resource. Think of the type file like the opening stanza to a new Puppet class – we’re describing all the tweakable knobs and buttons to the new thing we’re creating. The type file also gives you some added validation abilities, which is very handy.

It’s important to understand that there is a BIG difference between a ‘property’ and a ‘parameter’ with regard to a type (even though they’re both assigned values identically in a resource declaration). Think of it this way: a property is something that can be inspected and changed by Puppet, while a parameter is just helper data that Puppet uses to do its job. A property would be something like a file’s mode. You can inspect a file and determine its mode, and you can even CHANGE a file’s mode on disk. The file resource type also has a parameter called ‘backup’. Its sole job is to tell Puppet whether to back up the file to the filebucket before making changes. This data is useful for Puppet during a run, but you can’t inspect a file on disk and know definitively whether Puppet is going to back it up or not (and it goes without saying that if you can’t determine this aspect about a file on disk just by inspecting it, then you also can’t CHANGE this aspect about a file on disk either). You’ll see later where the property/parameter distinction becomes very important.
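
To make the distinction concrete, here’s a quick sketch using the file type (the path and values are made up):

file { '/etc/motd':
  mode   => '0644', # property: Puppet can inspect AND change this on disk
  backup => false,  # parameter: helper data for the run; not discoverable from the file itself
}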

Recently I built a type modeling the setting of proxy data for network interfaces on OS X, so we’ll use that as a demonstration of a type. It looks like the following:

lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  desc "Puppet type that models a network interface on OS X"

  ensurable

  newparam(:name, :namevar => true) do
    desc "Interface name - currently must be 'friendly' name (e.g. Ethernet)"
    munge do |value|
      value.downcase
    end
    def insync?(is)
      is.downcase == should.downcase
    end
  end

  newproperty(:proxy_server) do
    desc "Proxy Server setting for the interface"
  end

  newparam(:authenticated_username) do
    desc "Username for proxy authentication"
  end

  newparam(:authenticated_password) do
    desc "Password for proxy authentication"
  end

  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end
end

First note the type file’s path in the grey titlebar of the graphic: lib/puppet/type/mac_web_proxy.rb. This path is relative to the module that you’re building, and it’s VERY important that it be named EXACTLY this way to appease Puppet’s predictable naming pattern. The name of the file directly correlates to the name of the type listed in the Puppet::Type.newtype() method.

Next, let’s look at a sample parameter declaration – for starters, let’s look at the ‘authenticated_password’ parameter declaration in the above type. The newparam() method is called and the lone argument passed is the symbolized name of our parameter (i.e. it’s prepended with a colon). This parameter provides the password to use when setting up an authenticated web proxy on OS X. It’s a parameter because as far as I know, there’s no way for me to query the system for this password (it’s obfuscated in the GUI and I’m not entirely certain where it’s stored on-disk). If there were a way for us to query this value from the system, then we could turn it into a property (since we could both ‘GET’ as well as ‘SET’ the value). As of right now, it exists as helper data for when I need to setup an authenticated proxy.

Having seen a parameter, let’s look at the ‘proxy_server’ property that’s declared in the type file above. We’re able to both query the system for this value, as well as change/set the value by using the networksetup binary, so it’s able to be ‘synchronized’ (according to Puppet). Because of this, it must be a property.

Just enough validation

The second major function of the type file is to provide methods to validate property and parameter data that is being passed. There are two methods to validate this data, and one method that allows you to massage the data into an acceptable format (which is called ‘munging’).

validate()

The first method, named ‘validate’, is widely believed to be the only successfully-named method in the entire Puppet codebase. Validate accepts a block and allows you to perform free-form validation in any way you prefer. For example:

lib/puppet/type/user.rb
validate do |value|
  raise ArgumentError, "Passwords cannot include ':'" if value.is_a?(String) and value.include?(":")
end

This example, pulled straight from the Puppet codebase, will raise an error if a password contains a colon. In this case, we’re checking for a specific invalid condition and raising an error accordingly.

newvalues()

The second method, named ‘newvalues’, accepts a regex that property/parameter values need to match (if you’re one of the 8 people in the world that speak regex fluently), or a list of acceptable values. From the example above:

lib/puppet/type/mac_web_proxy.rb
  newproperty(:proxy_authenticated) do
    desc "Proxy Server setting for the interface"
    newvalues(:true, :false)
  end

  newproperty(:proxy_port) do
    desc "Proxy Server setting for the interface"
    newvalues(/^\d+$/)
  end

munge()

The final method, named ‘munge’, accepts a block like validate but allows you to convert an unacceptable value into an acceptable value. Again, this is from the example above:

lib/puppet/type/mac_web_proxy.rb
munge do |value|
  value.downcase
end

In this case, we want to ensure that the parameter value is lower case. It’s not necessary to throw an error, but rather it’s acceptable to ‘munge’ the value to something that is more acceptable without alerting the user.

Important type considerations

You could write half a book just on how types work (and, again, check out the book referenced above which DOES just that), but there are a couple of final considerations that will prove helpful when developing your type.

Defaulting values

The defaultto method provides a default value should the user not provide one for your property/parameter. It’s a pretty simple construct, but it’s important to remember when you write spec tests for your type (which you ARE doing, right?) that there will ALWAYS be values for properties/parameters that utilize defaultto. Here’s a quick example:

Defaultto example
newparam(:enable_lacp) do
  defaultto :true
  newvalues(:true, :false)
end

Ensurable types

A resource is considered ‘ensurable’ when its presence can be verified (i.e. it exists on the system), it can be created when it doesn’t exist and it SHOULD, and it can be destroyed when it exists and it SHOULDN’T. The simplest way to tell Puppet that a resource type is ensurable is to call the ensurable method within the body of the type (i.e. outside of any property/parameter declarations). Doing this will automatically create an ‘ensure’ property that accepts values of ‘absent’ and ‘present’ that are automatically wired to the ‘exists?’, ‘create’ and ‘destroy’ methods of the provider (something I’ll write about in the next post). Optionally, you can choose to pass a block to the ensurable method and define acceptable property values as well as the methods of the provider that are to be called. That would look something like this:

lib/puppet/type/package.rb
ensurable do
  newvalue(:present) do
    provider.install
  end

  newvalue(:absent) do
    provider.uninstall
  end

  newvalue(:purged) do
    provider.purge
  end

  newvalue(:held) do
    provider.hold
  end
end

This means that instead of calling the create method to create a new resource that SHOULD exist (but doesn’t), Puppet is going to call the install method. Conversely, it will call the uninstall method to destroy a resource based on this type. The ensure property will also accept values of ‘purged’ and ‘held’ which will be wired up to the purge and hold methods respectively.
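
For contrast, the bare (no-block) form described above would look something like this stripped-down sketch of our example type (the desc text is mine; the real work happens in the provider):

lib/puppet/type/mac_web_proxy.rb
Puppet::Type.newtype(:mac_web_proxy) do
  # Creates an ensure property accepting :present and :absent, wired
  # automatically to the provider's exists?, create, and destroy methods.
  ensurable

  newparam(:name, :namevar => :true) do
    desc "The network service being managed"
  end
end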

Namevars are unique little snowflakes

Puppet has a concept known as the 'namevar' for a resource. If you're hazy about the concept, check out the documentation, but basically it's the parameter that describes the form of uniqueness for a resource type on the system. For the package resource type, the 'name' parameter is the namevar because the way you tell one package from another is its name. For the file resource, it's the 'path' parameter, because you can differentiate unique files from each other according to their path (and not necessarily their filename, since filenames don't have to be unique on systems).

When designing a type, it’s important to consider WHICH parameter will be the namevar (i.e. how can you tell unique resources from one another). To make a parameter the namevar, you simply set the :namevar attribute to :true like below:

newparam(:name, :namevar => :true) do
  # Type declaration attributes here...
end

Handling array values

Nearly every property/parameter value that is declared for a resource is ‘stringified’, or cast to a string. Sometimes, however, it’s necessary to accept an array of elements as the value for a property/parameter. To do this, you have to explicitly tell Puppet that you’ll be passing an array by setting the :array_matching attribute to :all (if you don’t set this attribute, it defaults to :first, which means that if you pass an array as a value for a property/parameter, Puppet will only accept the FIRST element in that array).

newproperty(:domains, :array_matching => :all) do
  # Type declaration attributes here... 
end

If you set :array_matching to :all, EVERY value passed for that parameter/property will be cast to an array (which means if you pass a value of ‘foo’, you’ll get an array with a single element – the string of ‘foo’).
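
If you want to see that casting behavior for yourself, here's a rough sketch in the style of a spec test (it assumes the mac_web_proxy type from above is on Puppet's load path):

require 'puppet'

# Instantiate the resource the way spec tests do -- no catalog needed
resource = Puppet::Type.type(:mac_web_proxy).new(
  :name    => 'Ethernet',
  :domains => 'foo'              # a bare string, not an array
)

# With :array_matching => :all, the value comes back wrapped in an array
resource.property(:domains).should  #=> ["foo"]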

Documenting your property/parameter

It’s a best-practice to document the purpose of your property or parameter declaration, and this can be done by passing a string to the desc method within the body of the property/parameter declaration.

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
# Type declaration attributes here...
end

Synchronization tricks

Puppet uses a method called insync? to determine whether a property value is synchronized (i.e. whether Puppet needs to change its value, or it's already set appropriately). You usually have no need to change the behavior of this method, since most of the properties you create for a type will have string values (and the == operator does a good job of checking string equality). For structured data types like arrays and hashes, however, things can be a bit trickier. Arrays, for example, are ordered constructs – they have a definitive idea of what the first element and the last element of the array are. Sometimes you WANT to ensure that values are in a very specific order, and sometimes you don't necessarily care about the ORDER that values for a property are set – you just want to make sure that all of them are set.

If the latter case sounds like what you need, then you'll need to override the behavior of the insync? method. Take a look at the below example:

newproperty(:domains, :array_matching => :all) do
  desc "Domains which should bypass the proxy"
  def insync?(is)
    is.sort == should.sort
  end
end

In this case, I’ve overridden the insync? method to first sort the ‘is’ value (or, the value that was discovered by Puppet on the target node) and compare it with the sorted ‘should’ value (or, the value that was specified in the Puppet manifest when the catalog was compiled by the Puppet master). You can do WHATEVER you want in here as long as insync? returns either a true or a false value. If insync? returns true, then Puppet determines that everything is in sync and no changes are necessary, whereas if it returns false then Puppet will trigger a change.

And this was the EASY part!

Wow this went longer than I expected… and types are usually the ‘easier’ bit since you’re only describing the format to be used by the Puppet admin in manifests. There are some hacky type tricks that I’ve not yet covered (i.e. features, ‘inheritance’, and other meta-bullshit), but those will be saved for a final ‘dirty tips and tricks’ post. In the next section, I’ll touch on providers (which is where all interaction with the system takes place), so stay tuned for more brain-dumping-goodness…

From the Archive: Using Crankd

Supporting laptops in a managed environment is tricky (and doubly so if you allow them to be taken off your corporate network). While you can be reasonably assured that your desktops will remain on and connected during the workday, it’s not uncommon for laptops to go to sleep, change wireless access points, and even change between an Ethernet or AirPort connection several times during the day. It’s important to have a tool that can “tweak” certain settings in response to these changes.

This is where crankd comes in.

Crankd is a cool utility that's part of the Pymacadmin (http://code.google.com/p/pymacadmin/) suite of tools co-authored by Chris Adams and Nigel Kersten. Specifically, crankd is a Python daemon that lets you trigger shell scripts, or execute Python methods, based upon state changes in SystemConfiguration, NSWorkspace, and FSEvents.

Use Cases

It’s easier to see how crankd can help you with a couple of scenarios:

  1. Your laptops, like all of the other machines in your organization, are bound to your corporate LDAP servers. When they’re on network, they will query the LDAP servers for things like authentication information. Unless your corporate LDAP directory is accessible outside your corporate network, your laptops may exhibit the “spinning wheel of death” when they attempt to contact a suddenly-unreachable LDAP directory at the neighborhood Starbucks. A solution to this is to remove the LDAP servers from your Search (and Contacts) path whenever the laptop is taken off-network and add the LDAP servers when you come back on-network.

  2. Perhaps you’re using Puppet, Munki, Chef, StarDeploy, Filewave, Absolute Manage, Casper, or any other configuration management system that needs to contact a centralized server for configuration information. Usually these tools will have your machine contact their servers once an hour or so, but this can be a problem if the machine is constantly sleeping and waking. Plus if you take your machine off-network, you don’t want it trying to contact a server that might not be reachable from the outside world. It would be nice to have your laptop “phone home” when it establishes a network connection on your corporate network, and skip this step when the laptop is taken outside your organization.

  3. OS X allows you to set a preferred order for your network connections, but it would be nice to disable the AirPort when your laptop establishes an Ethernet connection.

  4. Finally, maybe you have the need to perform an action whenever your laptop sleeps (or wakes), changes a network connection, mounts a volume, or runs a specific Application (whether it’s located in the Applications directory or anywhere else on your machine).

All of these situations can be made trivial through the help of crankd.

How do I get it working?

Crankd is a daemon, so it runs in the background while you work. It uses an XML plist file that tells it which scripts (or which Python methods) to execute in response to specific state changes (like a network connection going up or down, or a volume being mounted). Since it's a small Python library, the files aren't huge, and the entire finished installation is around 100 KB (or larger with your custom code/scripts). Let's download crankd and experiment with its settings:

  1. Download the Pymacadmin source. You can do this through Google Code or Github – I’ll demonstrate the Github method. Navigate to http://github.com/acdha/pymacadmin, click the Downloads button, and download either the .tar.gz or the .zip version of the source code. Drag it to your desktop and then double-click on the file to expand it. It should open a folder named “acdha-pymacadmin-

  2. Install crankd. Upon opening the pymacadmin folder, you should see a series of folders, readme files, and an "install-crankd.sh" installation script. Let's open Terminal.app and navigate to the pymacadmin folder that we expanded on our desktop (you can type "cd " into Terminal.app, drag and drop the folder into the Terminal window, and hit Return to change to that directory). The install-crankd.sh script is executable, so run it by typing "sudo ./install-crankd.sh" into the Terminal window and hitting Return. Enter your password when it prompts you.

  3. Set up a plist file for crankd. If you've never worked with crankd before, it's best to let it set up a configuration plist for you. If you don't specify a configuration plist with the "--config" argument, or you don't have a com.googlecode.pymacadmin.crankd.plist file in your /Users/<username>/Library/Preferences folder, crankd will automatically create a sample plist for you. Let's do that by typing "/usr/local/sbin/crankd.py" into Terminal and hitting the Return button. Take a look at the sample configuration plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>

<key>NSWorkspace</key>
<dict>
  <key>NSWorkspaceDidMountNotification</key>
  <dict>
    <key>command</key>
    <string>/bin/echo "A new volume was mounted!"</string>
  </dict>
  <key>NSWorkspaceDidWakeNotification</key>
  <dict>
    <key>command</key>
    <string>/bin/echo "The system woke from sleep!"</string>
  </dict>
  <key>NSWorkspaceWillSleepNotification</key>
  <dict>
    <key>command</key>
    <string>/bin/echo "The system is about to go to sleep!"</string>
  </dict>
</dict>
<key>SystemConfiguration</key>
<dict>
  <key>State:/Network/Global/IPv4</key>
  <dict>
    <key>command</key>
    <string>/bin/echo "Global IPv4 config changed"</string>
  </dict>
</dict>
</dict>
</plist>

This XML file has two main keys – one for NSWorkspace events (such as mounted volumes and sleeping/waking your laptop), and one for SystemConfiguration events (such as network state changes). Each of those contains a key for the specific event that we're monitoring, a key specifying whether we'll be executing a command or a Python method in response to that event, and a string (or an array of strings, as we'll see later) specifying the actual command to be executed. For all of the events in the sample plist, we're simply echoing a message to the console.

  4. Start crankd. Once crankd has been installed and your configuration plist file is set up, you're ready to let crankd monitor for state changes. Let's start crankd with the sample plist that was created in the previous step by executing the following command in Terminal: "/usr/local/sbin/crankd.py --config=/Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist". Remember to substitute your username for <username> in that command (if you don't know your username, you can type "whoami" into Terminal and hit the Return button). If everything was executed correctly, you should see the following lines displayed in Terminal:

Module directory /Users/<username>/Library/Application Support/crankd does not exist: Python handlers will need to use absolute pathnames
INFO: Loading configuration from /Users/<username>/Library/Preferences/com.googlecode.pymacadmin.crankd.plist
INFO: Listening for these NSWorkspace notifications: NSWorkspaceWillSleepNotification, NSWorkspaceDidWakeNotification, NSWorkspaceDidMountNotification
INFO: Listening for these SystemConfiguration events: State:/Network/Global/IPv4

It might look like Terminal isn't doing anything, but in actuality crankd is listening for changes. You can make crankd come to life by connecting to (or disconnecting from) an AirPort network, sleeping/waking your machine, or mounting a volume (by inserting a USB memory stick, for example). Performing any of these actions will cause crankd to echo messages to your Terminal window. Here's the message I received when I disconnected from an AirPort network:

INFO: SystemConfiguration: State:/Network/Global/IPv4: executing /bin/echo "Global IPv4 config changed"
Global IPv4 config changed

To quit this sample configuration of crankd, simply hold down the control button on your keyboard and press the C key. Congratulations, crankd is now up and running!

A more complex example

Let’s look at one of our previous situations

Puppet + Github = Laptop <3

Everybody wants to be special (I blame our moms). The price of special, when it comes to IT, is time. Consider how long you’ve spent on just your damn PROMPT and you’ll realize why automation gives any good sysadmin a case of the giggles. Like the cobbler’s kids, though, your laptop and development environment are always the last to be visited by the automation gnomes.

Until now.

Will Farrington gave a great talk at Puppetconf 2012 about managing an army of developer laptops using Puppet + some Github love that left more than a couple of people asking for his code. That request has unfortunately been denied.

Until now.

Boxen, and hand-tune no more

Enter Boxen (née ‘The Setup’), a full-fledged open source project from the guys at Github that melds Puppet with your Github credentials to create a framework for automating everything from applications, to dotfiles, and even printers and emacs extensions (that last bit’s a lie – no one should be using emacs).

How does it work? Think ‘Masterless Puppet’ (or, just a bunch of Puppet modules that are enforced by running puppet apply on your local machine). Boxen not only includes the framework ITSELF, but is a project on Github that hosts over 75 individual modules for managing things like rbenv, homebrew, git, mysql, postgres, riak, redis, npm, erlang, dropbox, skype, minecraft, heroku, 1password, iterm2, and much more. Odds are there’s a module for many of the common things you setup on your laptop. And what about things like dotfiles that are intrinsically personal? You can create your own repository and manage them like you would any other file/directory/symlink on the file system. The goal is to model every little piece of your laptop that makes it ‘unique’ from everyone else until you have your entire environment managed and the hardware becomes…well…disposable. How many times have you shied away from doing an upgrade because some component of your laptop required you to spend countless hours tinkering with it? If you’ve done the time, you should do something to make sure that you NEVER have to repeat that process manually ever again.

Boxen is also very self-contained. Packages and binaries that come out of Homebrew are installed into /opt/boxen/homebrew/bin, frameworks like Ruby and Node are installed into /opt/boxen/{rbenv,nvm}, and individual versions of those frameworks are kept separate from your system version of those frameworks. These details are important when you consider that you could purge the whole setup without having to rip out components scattered around your system.

You may be reading this and thinking "There's no way in hell I can use this to manage every laptop in my organization!", and you're right. The POINT of Boxen is that it's a tool written by developers for developers to automate THEIR systems. The goal of development is to have as little friction as possible between writing code and deploying that code into production. A tool like Boxen lets you more quickly GET to the state where your laptop is ready for you to start developing. If you want a tool to completely manage and lock down all the laptops in your organization, look to using Puppet in agent/master mode or to a tool like Munki to manage all the packages on your systems. If you're interested in giving your developers/users the freedom to manage their OWN 'boxes' because they know best what works for them, then Boxen is your tool.

There IS one catch – it’s targeted to OS X (10.8 to be exact).

Diary of an elitist

I was fortunate to have early access to Boxen in order to kick its Ruby tyres. As someone who's managed Macs with Puppet before (all the way down to the desktop/laptop level), I was embarrassed to admit that I had NOTHING about my laptop automated. Will unlocked the project and basically said "Have fun, break shit, fix it, and file pull requests", and away I went. To commit completely to the project, I did what any sane person would do.

I reformatted my laptop and started entirely from scratch.

(Let’s be clear here – you don’t have to do that. Initially Will reported problems getting Boxen running in VMs, but I never ran into an issue. I ran Boxen in VMware Fusion 5 a number of times to make sure the changes I made were going to do the right thing on a fresh install. I’d recommend going down THAT road if you’re hesitant of immediately throwing this on your pretty snowflake of a laptop.)

Installing Boxen was pretty easy – the only prerequisites were downloading the Xcode Command Line Tools (which include git), pulling down the Boxen repo, and running script/boxen. It was stupid simple. What you GOT, by default, was:

  • Homebrew
  • Git
  • Hub
  • DNSMasq w/ .dev resolver for localhost
  • NVM
  • RBenv
  • Full Disk Encryption requirement
  • NodeJS 0.4
  • NodeJS 0.6
  • NodeJS 0.8
  • Ruby 1.8.7
  • Ruby 1.9.2
  • Ruby 1.9.3
  • Ack
  • Findutils
  • GNU-Tar

Remember, this is all tunable and you don’t need to pull down ALL of these packages, but, since it was new, I decided to install everything and sort it out later. Yes, the initial setup took a good number of minutes, but think about everything that’s being installed. In the end, I had a full Ruby development environment with rbenv, multiple versions of Ruby, and a laptop that could be customized without much work at all.

Which end do I blow in?

The readme on the project page for Boxen describes how to clone the project into /opt/boxen/repo, so that’s the directory we’ll be working with. To see what will be enforced on your machine, check out manifests/site.pp to see something that looks like this:

manifests/site.pp
require boxen::environment
require homebrew::repo

Exec {
  group       => 'staff',
  logoutput   => on_failure,
  user        => $luser,

  path => [
    "${boxen::config::home}/rbenv/shims",
    "${boxen::config::home}/homebrew/bin",
    '/usr/bin',
    '/bin',
    '/usr/sbin',
    '/sbin'
  ],

  environment => [
    "HOMEBREW_CACHE=${homebrew::cachedir}",
    "HOME=/Users/${::luser}"
  ]
}

File {
  group => 'staff',
  owner => $luser
}

Package {
  provider => homebrew,
  require  => Class['homebrew']
}

Repository {
  provider => git,
  extra    => [
    '--recurse-submodules'
  ],
  require  => Class['git']
}

Service {
  provider => ghlaunchd
}

This is largely scaffolding setting up the Boxen environment and resource defaults. If you’re familiar with Puppet, this should be recognizable to you, but for everyone else, let’s dissect one of the resource defaults:

File {
  group => 'staff',
  owner => $luser
}

This block basically means that any file you declare with Puppet should default to having its owner set as your username and its group set to ‘staff’ (which is standard in OS X). You can override this explicitly with a file declaration by providing the owner or group attribute, but if you omit it then it’s going to default to these values.

The rest of the defaults are customized for Boxen's preferences (i.e. homebrew will be used to install all packages unless you specify otherwise, exec resources will log all output on failure, service resources will use Github's customized service provider, and etc…). Now let's look below:

manifests/site.pp
node default {
  # core modules, needed for most things
  include dnsmasq
  include git
  include hub
  include nginx
  include nvm
  include ruby

  # fail if FDE is not enabled
  if $::root_encrypted == false {
    fail('Please enable full disk encryption and try again')
  }

  # node versions
  include nodejs::0-4
  include nodejs::0-6
  include nodejs::0-8

  # default ruby versions
  include ruby::1-8-7
  include ruby::1-9-2
  include ruby::1-9-3

  # common, useful packages
  package {
    [
      'ack',
      'findutils',
      'gnu-tar'
    ]:
  }

  file { "${boxen::config::srcdir}/our-boxen":
    ensure => link,
    target => $boxen::config::repodir
  }
}

These are the things that Boxen has chosen to enforce ‘out of the box’. Knowing that Boxen was designed so that developers could customize their ‘boxes’ THEMSELVES, it makes sense that there’s not much that’s being enforced on everyone. In fact, the most significant thing being ‘thrust’ upon you is the fact that the machine must have full disk encryption enabled (which is a good idea anyways).

If you want to pare down what Boxen gives you by default, you can choose to comment out lines providing, for example, nvm and nodejs versions (if you don’t use node.js in your environment). I’m a Ruby developer, so all the Ruby builds (and rbenv) are very helpful to me, but you could also remove those if you were so inclined. The point is that this file contains the ‘knobs’ to dial your base Boxen setup up or down.

Customizing (or, my dotfiles are better than yours)

The whole point of Boxen is to customize your laptop and keep its customization automated. To do this, we’re going to need to make some Puppet class files.

CAUTION: PUPPET AHEAD

If you’ve not had experience with Puppet before, I can’t recommend the learning Puppet series enough. In the vein of “Puppet now, learn later”, I’m going to give you Puppet code that works for ME and only explain the trickier bits.

Boxen has some 'magic' code that will automatically look for a class called people::<github username>, so I'm going to create a file in modules/people/manifests called glarizza.pp. This file will contain Puppet code specific to MY laptop(s). Here's a snippet of that file:

modules/people/manifests/glarizza.pp
class people::glarizza {

  notify { 'class people::glarizza declared': }

  # Changes the default shell to the zsh version we get from Homebrew
  # Uses the osx_chsh type out of boxen/puppet-osx
  osx_chsh { $::luser:
    shell   => '/opt/boxen/homebrew/bin/zsh',
    require => Package['zsh'],
  }

  file_line { 'add zsh to /etc/shells':
    path    => '/etc/shells',
    line    => "${boxen::config::homebrewdir}/bin/zsh",
    require => Package['zsh'],
  }

  ##################################
  ## Facter, Puppet, and Envpuppet##
  ##################################

  repository { "${::boxen_srcdir}/puppet":
    source => 'puppetlabs/puppet',
  }

  repository { "${::boxen_srcdir}/facter":
    source => 'puppetlabs/facter',
  }

  file { '/bin/envpuppet':
    ensure  => link,
    mode    => '0755',
    target  => "${::boxen_srcdir}/puppet/ext/envpuppet",
    require => Repository["${::boxen_srcdir}/puppet"],
  }
}

The notify resource is there only to prove that this class is being declared when you run Boxen – it simply displays a message to the console when the boxen binary runs.

The osx_chsh resource is a custom type out of boxen/puppet-osx that changes the user's default shell – in this case, to the zsh that Boxen installs from Homebrew into /opt/boxen/homebrew. The file_line resource then ensures that this zsh shows up in /etc/shells as an acceptable shell. Note the syntax of $boxen::config::homebrewdir, which refers to a variable called $homebrewdir in the boxen::config class.

Next, I’ve setup a couple of resources to make sure the puppet and facter repositories are installed on my system. Github has also developed a lightweight repository resource that will simply ensure that a repo is cloned at a location on disk. $::boxen_srcdir is one of the custom Facter facts that Boxen provides in shared/boxen/lib/facter/boxen.rb in the Boxen repository.

The file resource sets up a symlink from /bin/envpuppet to /Users/glarizza/src/puppet/ext/envpuppet on my system. The attributes should be pretty self-explanatory, but the require attribute says that the repository resource must be managed BEFORE this file resource. This is a demonstration of Puppet's ordering metaparameters that are described in the Learning Puppet series.

Since we briefly touched on $::boxen_srcdir, what are some other custom facts that come out of shared/boxen/lib/facter/boxen.rb?

shared/boxen/lib/facter/boxen.rb
require "json"
require "boxen/config"

config   = Boxen::Config.load
facts    = {}
factsdir = File.join config.homedir, "config", "facts"

facts["github_login"]   = config.login
facts["github_email"]   = config.email
facts["github_name"]    = config.name

facts["boxen_home"]     = config.homedir
facts["boxen_srcdir"]   = config.srcdir

if config.respond_to? :reponame
  facts["boxen_reponame"] = config.reponame
end

facts["luser"]          = config.user

Dir["#{config.homedir}/config/facts/*.json"].each do |file|
  facts.merge! JSON.parse File.read file
end

facts.each { |k, v| Facter.add(k) { setcode { v } } }

This file will also give you $::luser, which will evaluate out to your system username, and $::github_login, which is your Github username (note that this is what Boxen uses to find your class file in modules/people/manifests). If you're looking for all the other values set by these custom facts, check out config/boxen/defaults.json after you run Boxen.
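
As an aside, that final facts.each line is just a compact loop – unrolled for a single fact, it's equivalent to something like this (with a hard-coded value standing in for the real Boxen::Config lookup):

require 'facter'

Facter.add("luser") do
  setcode { "glarizza" }  # the real file pulls this value from Boxen::Config
end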

Using modules out of the Boxen namespace

Not only is Boxen its own project, but it’s a separate organization on Github that hosts a number of Puppet modules. Some of these modules are pretty simple (a single resource to install a package), but the point is that they’ve been provided FOR you – so use, fork, and improve them (but most of all, submit pull requests). The way you use them with Boxen may not be readily clear, so let’s walk through that with a simple module for installing Google Chrome.

  1. Add the module to your Puppetfile
  2. Classify the module in your Puppet setup
  3. Run Boxen

Add the module to your Puppetfile

Boxen uses a tool called librarian-puppet to source and install Puppet modules from Github. Librarian-puppet uses the Puppetfile file in the root of the Boxen repo to install modules. Let’s look at a couple of lines in that file:

Puppetfile
mod "boxen",    "0.1.8",  :github_tarball => "boxen/puppet-boxen"
mod "dnsmasq",  "0.0.1",  :github_tarball => "boxen/puppet-dnsmasq"
mod "git",      "0.0.3",  :github_tarball => "boxen/puppet-git"
mod "hub",      "0.0.1",  :github_tarball => "boxen/puppet-hub"
mod "homebrew", "0.0.17", :github_tarball => "boxen/puppet-homebrew"
mod "inifile",  "0.0.1",  :github_tarball => "boxen/puppet-inifile"
mod "nginx",    "0.0.2",  :github_tarball => "boxen/puppet-nginx"
mod "nodejs",   "0.0.2",  :github_tarball => "boxen/puppet-nodejs"
mod "nvm",      "0.0.5",  :github_tarball => "boxen/puppet-nvm"
mod "ruby",     "0.4.0",  :github_tarball => "boxen/puppet-ruby"
mod "stdlib",   "3.0.0",  :github_tarball => "puppetlabs/puppetlabs-stdlib"
mod "sudo",     "0.0.1",  :github_tarball => "boxen/puppet-sudo"

This evaluates out to the following syntax:

mod "<module name>", "<version or tag>", <source>

The HARDEST thing about this file is finding the version number of modules on Github (HINT: it’s a tag). Once you’re given that information, it’s easy to pull up a module on Github, look at its tags, and then fill out the file. Let’s do that with a line for the Chrome module:

mod "chrome",     "0.0.2",   :github_tarball => "boxen/puppet-chrome"

Classify the module in your Puppet setup

In the previous section, we created modules/people/manifests/<github username>.pp. We COULD continue to fill this file with a ton of resources, but I tend to like to separate out resources into separate subclasses. Puppet has module naming conventions to ensure that it can FIND your subclasses, so I recommend browsing that guide before randomly naming files (HINT: Filenames ARE important and DO matter here). I want to create a people::glarizza::applications subclass, so I need to do the following:

## YES, make sure to replace YOUR USERNAME for 'glarizza'
$ cd /opt/boxen/repo
$ mkdir -p modules/people/manifests/glarizza
$ vim modules/people/manifests/glarizza/applications.pp

It’s totally fine that there’s a glarizza directory aside the glarizza.pp file – this is intentional and desired. Puppet’s not going to automatically declare anything in the people::glarizza::applications class until we TELL it to, so let’s open modules/people/manifests/glarizza.pp and add the following line at the top:

modules/people/manifests/glarizza.pp
include people::glarizza::applications

That tells Puppet to find the people::glarizza::applications class and make sure it ‘does’ everything in that file. Now, let’s create the people::glarizza::applications class:

modules/people/manifests/glarizza/applications.pp
class people::glarizza::applications {
  include chrome
}

Yep, all it takes is one line to include the module we will get from Boxen. Because of the way Boxen works, it will consult the Puppetfile FIRST, pull down any modules that are in the Puppetfile but NOT on the system, drop them into place so Puppet can find them, and then run Puppet normally.

Run Boxen

Once you have Boxen setup, you can just run boxen from the command line to have it enforce your configuration. By default, if there are any errors, it will log them as Github Issues on your fork of the main Boxen repository (this can be disabled with boxen --no-issue). As you’re just getting started, don’t worry about the errors. The good news is that once you fix things and perform a successful Boxen run, it will automatically close all open issues. If everything went well, you should now have Google Chrome in your /Applications directory!

¡Más Puppet!

You’ll find as you start customizing all the things that you’re usually managing one of the following resources:

  1. Packages
  2. Files
  3. Repositories
  4. Plist files

We’ve covered managing a repository and a file, but let’s look at a couple of the other popular resources:

Packages are annoying

I would be willing to bet that most of the things you end up managing will be packages. Using Puppet with Boxen, you have the ability to install four different kinds of packages:

  1. Applications inside a DMG
  2. Installer packages inside a DMG
  3. Homebrew Packages
  4. Applications inside a .zip file

Here’s an example of every type of package installer:

  # Application in a DMG
  package { 'Gephi':
    ensure   => installed,
    source   => 'https://launchpadlibrarian.net/98903476/gephi-0.8.1-beta.dmg',
    provider => appdmg,
  }

  # Installer in a DMG
  package { 'Virtualbox':
    ensure => installed,
    source => 'http://download.virtualbox.org/virtualbox/4.1.22/VirtualBox-4.1.23-80870-OSX.dmg',
    provider => pkgdmg,
  }

  # Homebrew Package
  package { 'tmux':
    ensure => installed,
  }

  # Application in a .zip
  package { 'Github for Mac':
    ensure   => installed,
    source   => 'https://github-central.s3.amazonaws.com/mac%2FGitHub%20for%20Mac%2069.zip',
    provider => compressed_app
  }

Notice that the only thing that changes among these resources is the provider attribute. Remember from before that Boxen sets the default package provider to be ‘homebrew’, so for ‘tmux’ I omitted the provider attribute to utilize the default. Also, the ensure attribute is defaulted to ‘installed’, so technically I could remove it…but I tend to prefer to use it for people who will be reading my code later.

There’s no provider for .pkg files. Why? Well, packages on OS X are either bundles or flat-packages. Bundles LOOK like individual files, but they’re actually folders that contain everything necessary to expand and install the package. Flat packages are just that – an actual file that ends in .pkg that can be expanded to install whatever you want. Bundle packages are pretty common, but they’re also hard for curl to download them (being that it’s just a folder full of files) – this is why most installer packages you encounter on OS X are going to be enclosed in a .dmg Disk Image.

So which provider will you use? Well, if your file ends in .dmg, then you're going to be using either the pkgdmg or appdmg provider. How do you know which? Expand the .dmg file and look inside it. If it contains an application ending in .app that simply needs to be dragged into the /Applications folder on disk, then choose the appdmg provider (that's essentially all it does – expand the .dmg file and ditto the .app file into /Applications). If the disk image contains a .pkg package installer, then you'll choose the pkgdmg provider (which expands the .dmg file and uses installer to install the contents of the .pkg file silently in the background). If your file is a .zip file containing an application (.app file), then you can use Github's custom compressed_app provider, which will unzip the file and ditto the app into /Applications. Finally, if you want to install a package from Homebrew, the homebrew provider is pretty self-explanatory.

(NOTE: There is ONE more package provider I haven’t covered here – the macports provider. It requires Macports to be installed on your system, and will use it to install a package. Macports vs. Homebrew arguments notwithstanding, if you’re into Macports then there’s a provider for you.)

Plists: because why NOT XML :\

Apple falls somewhere between "the registry" and "config files" on the spectrum of tweaking system settings. Most settings are locked up in plist files that can be managed by hand or with plistbuddy or defaults. A couple of people have saved their customizations alongside their dotfiles (Zach Holman has an example here), but Puppet is a great way to manage individual keys in your plist files. I've written a module that will manage any number of keys in a plist file. You can modify your Puppetfile to make sure Boxen picks up my module by adding the following line:

mod "property_list_key",  "0.1.0",   :github_tarball => "glarizza/puppet-property_list_key"

Next, you’ll need to add resources to your classes:

  # Disable Gatekeeper so you can install any package you want
  property_list_key { 'Disable Gatekeeper':
    ensure => present,
    path   => '/var/db/SystemPolicy-prefs.plist',
    key    => 'enabled',
    value  => 'no',
  }

  $my_homedir = "/Users/${::luser}"

  # NOTE: Dock prefs only take effect when you restart the dock
  property_list_key { 'Hide the dock':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'autohide',
    value      => true,
    value_type => 'boolean',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Align the Dock Left':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'orientation',
    value      => 'left',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Lower Right Hotcorner - Screen Saver':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'wvous-br-corner',
    value      => 10,
    value_type => 'integer',
    notify     => Exec['Restart the Dock'],
  }

  property_list_key { 'Lower Right Hotcorner - Screen Saver - modifier':
    ensure     => present,
    path       => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    key        => 'wvous-br-modifier',
    value      => 0,
    value_type => 'integer',
    notify     => Exec['Restart the Dock'],
  }

  exec { 'Restart the Dock':
    command     => '/usr/bin/killall -HUP Dock',
    refreshonly => true,
  }

  file { 'Dock Plist':
    ensure  => file,
    require => [
                 Property_list_key['Hide the dock'],
                 Property_list_key['Align the Dock Left'],
                 Property_list_key['Lower Right Hotcorner - Screen Saver'],
                 Property_list_key['Lower Right Hotcorner - Screen Saver - modifier'],
               ],
    path    => "${my_homedir}/Library/Preferences/com.apple.dock.plist",
    mode    => '0600',
    notify  => Exec['Restart the Dock'],
  }

The important attributes are:

  1. path: The path to the plist file on disk
  2. key: The individual KEY in the plist file you want to manage
  3. value: The value that the key should have in the plist file
  4. value_type: The datatype the value should be (defaults to string, but could also be array, hash, boolean, or integer)

You MUST pass a path, key, and value or Puppet will throw an error.

The first resource above disables Gatekeeper in 10.8, which allows you to install packages from the web that HAVEN'T been signed (by default, 10.8 won't let you install unsigned packages or anything from outside the App Store without changing this setting).

All of the other resources relate to making changes to the Dock. Because of the way the Dock is managed, you must HUP its process when making changes to your dock plist before they take effect. Also, the dock plist has to be owned by you or else the changes won’t take effect. Every dock plist resource has a notify metaparameter which means “any time this resource changes, run this exec resource”. That exec resource is a simple command that HUPs the dock process. It will ONLY be run if a resource notifies it – so if no changes are made in a Puppet run then the command won’t fire. Finally, the file resource to manage the dock plist ensures that permissions are set (and notifies the exec in case it needs to CHANGE permissions).

Again, this is purely dealing with Puppet – but plists are a major part of OS X and you’ll be dealing with them regularly!

But seriously, dotfiles

I know I’ve joked about it a couple of times, but getting your dotfiles into their correct location is a quick win. The secret is to lock them all up in a repository, and then symlink them where you need them. Let’s look at that:

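  # NOTE: this block assumes $my_homedir, $my_sourcedir, and $my_username were
  # set earlier in the class (e.g. $my_sourcedir = "${my_homedir}/src"), and
  # that an oh-my-zsh repository resource is declared elsewhere.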
  # My dotfile repository
  repository { "${my_sourcedir}/dotfiles":
    source => 'glarizza/dotfiles',
  }

  file { "${my_homedir}/.tmux.conf":
    ensure  => link,
    mode    => '0644',
    target  => "${my_sourcedir}/dotfiles/tmux.conf",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  file { "/Users/${my_username}/.zshrc":
    ensure  => link,
    mode    => '0644',
    target  => "${my_sourcedir}/dotfiles/zshrc",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  file { "/Users/${my_username}/.vimrc":
    ensure => link,
    mode   => '0644',
    target => "${my_sourcedir}/dotfiles/vimrc",
    require => Repository["${my_sourcedir}/dotfiles"],
  }

  # Yes, oh-my-zsh. Judge me.
  file { "/Users/${my_username}/.oh-my-zsh":
    ensure  => link,
    target  => "${my_sourcedir}/oh-my-zsh",
    require => Repository["${my_sourcedir}/oh-my-zsh"],
  }

It’s worth mentioning that Puppet does not do things procedurally. Just because the dotfiles repository is listed before every symlink DOES NOT mean that Puppet will evaluate and declare it first. You’ll need to specify order here, and that’s what the require metaparameter does.

Based on what I’ve already shown you, this code block should be very simple to follow. Because I’m using symlinks, the dotfiles should always be current. Because the dotfiles are under revision control, updating them all is as simple as making commits and updating your repository. If you’ve ever had to migrate these files to a new VM/machine, then you know how full of win this block of code is.

Don’t sweat petty (or pet sweaty)

When I show sysadmins/developers automation like this, they usually want to apply it to the HARDEST part of their day-job IMMEDIATELY. That's a somewhat rational reaction, but it's not going to give you the results you want. The cool thing ABOUT Boxen and Puppet is that they remove those little annoyances in your day that slowly sap your time. START by tackling those small annoyances to build your confidence (like the dotfiles example above). Yeah, you'll only save a couple of minutes a day, but those minutes add up quickly. Also, when you solve a problem during the course of your day, MANAGE it with Boxen by putting it in your Puppet class (then test it out on a VM or another machine to make sure it does what you expect).

Don’t worry that you’re not saving the world with a massive Puppet class – sometimes the secret to happiness is opening iTerm on a new machine and seeing your finely-crafted prompt shining in its cornflower-blue glory.

Now show me some cool stuff

So that’s a quick tour of the basics of Boxen and the kinds of things you can do from the start. I’m really excited for everyone to get their hands on Boxen and do more cool stuff with Puppet. I’ve done a bunch of work with Puppet for OS X, and that’s enough to know that there’s still PLENTY that can be improved in the codebase. A giant THANK YOU to John Barnette, Will Farrington, and the entire Github Boxen team for all their work on this tool (and letting me tinker with it before it hit the general public)! Feel free to comment below, email me (gary at puppetlabs), or yell at me on Twitter for more information!

Repeatable Puppet Development With Vagrant

I miss testing code in production. In smaller organizations, ‘testing’ and ‘development’ can sometimes consist of making changes directly on a server, walking to an active machine, and hoping things work. Once you were done, you MIGHT document what changes you made, but more often than not you kept that information in your head and referred to it later.

I lied – that is everything that sucks about manual configuration of machines.

The best way to get out of this rut is to get addicted to automating first the menial tasks on your machines, and then work your way up from there. We STILL have the problem, though, of doing this in production – that’s what this post is meant to address.

What we want is the ability to spin up a couple of test nodes for the purpose of testing our automation workflow BEFORE it gets committed and applied to our production nodes. This post details using Vagrant and Puppet to both establish a clean test environment and also test automation changes BEFORE applying them to your production environment.

Puppet is a Configuration Management tool that automates all the annoying aspects of manual configuration out of your infrastructure. The bulk of its usage is beyond the scope of THIS post, however we’re going to be using it as the means to describe the changes we want to make on our systems.

Vagrant is a magical project that uses minimal VM templates (boxes) to spin up clean virtualized environments on your workstation for the purpose of testing changes. Currently it only supports a VirtualBox backend, but its creator, Mitchell Hashimoto, has teased a preview of upcoming VMware integration that SHOULD be coming any day now. In this post, Vagrant will be the means by which we spin up new VMs for development purposes.

Getting setup

The only moving piece you need installed on your system is Vagrant. Fortunately, Mitchell provides native package installers on his website for downloading Vagrant. If you’ve never used Vagrant before, and you AREN’T a Ruby developer who maintains multiple Ruby versions on your system, then you’ll want to opt for the native package installer since it’s the easiest method to get Vagrant installed (and, on Macs, Vagrant embeds its own Ruby AND Rubygems binaries in the Package bundle…which is kind of cool).

IF, however, you are developing in Ruby and you use RVM or Rbenv to maintain multiple copies of Ruby on your system, then you’ll want to favor installing Vagrant via Rubygems a la:

$ gem install vagrant --no-ri --no-rdoc

If you have no idea how to use RVM or Rbenv – stick with the native installers :)

Puppet does NOT need to be on your workstation since we’re only going to be using it on the VMs that Vagrant spins up – so don’t worry about Puppet yet.

My kingdom for a box

Vagrant uses box files as templates from which to spin up a new virtual machine for development purposes. There are sites that host boxes available for download, OR, you could use an awesome project called Veewee to build your own. Again, building your box file is outside the scope of this article, so just make sure you download a box with an OS that’s to your liking. This box DOES NOT need to have Puppet preinstalled – in fact, it’s probably better that it doesn’t (because the version will probably be old, and we’re going to work around this anyways). I’m going to choose a CentOS 6.3 box that the SE team at Puppet Labs uses for demos, but, again, it’s up to you.

Vagrantfile, assemble!

Now that we’ve got the pieces we need, let’s start stitching together a repeatable workflow. To do that, we’ll need to create a directory for this project and a Vagrantfile to direct Vagrant on how it should setup your VM. I’m going to use ~/src/vagrant_projects for the purpose of this demo:

$ mkdir -p ~/src/vagrant_projects
$ cd ~/src/vagrant_projects
$ vim Vagrantfile

Let’s take a look at a sample Vagrantfile that I use to get Puppet installed on a box:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"
end

Stepping through this file line-by-line, the first two config.vm lines establish the box we want to use for our development VM as well as the URL to the box file where it can be downloaded (in the event that it does not exist on our system). Because, initially, this box will NOT be known to Vagrant, it will attempt to reach out to that address and download it (note that the URL to THIS PARTICULAR BOX is subject to change – please find a box file that works for you and substitute its URL in the config.vm.box_url config setting). The next three lines define the machine’s hostname, the network type, and the IP address for this VM. In this case, I’m using a host-only network and giving it an IP address on a made-up 192.168.33.0/24 subnet (feel free to use your own private IP range as long as it doesn’t conflict with anything). The next line is forwarding port 80 on the VM to port 8084 on my local laptop – this allows you to test out web services by simply navigating to http://localhost:8084 from your web browser. I’ll save explaining the last line for the next section.

NOTE: For more documentation on these settings, visit Vagrant’s documentation site as it’s quite good

Getting Puppet on your VM

The final line in the sample Vagrantfile runs what’s called the ‘Shell Provisioner’ for Vagrant. Essentially, it runs a shell script on the VM once it’s been booted and configured. What does this shell script do?

centos_6_x.sh
#!/usr/bin/env bash
# This bootstraps Puppet on CentOS 6.x
# It has been tested on CentOS 6.3 64bit

set -e

REPO_URL="http://yum.puppetlabs.com/el/6/products/i386/puppetlabs-release-6-6.noarch.rpm"

if [ "$EUID" -ne "0" ]; then
  echo "This script must be run as root." >&2
  exit 1
fi

if which puppet > /dev/null 2>&1; then
  echo "Puppet is already installed"
  exit 0
fi

# Install puppet labs repo
echo "Configuring PuppetLabs repo..."
repo_path=$(mktemp)
wget --output-document=${repo_path} ${REPO_URL} 2>/dev/null
rpm -i ${repo_path} >/dev/null

# Install Puppet...
echo "Installing puppet"
yum install -y puppet > /dev/null

echo "Puppet installed!"

As you can see, it sets up the Puppet Labs el6 repository containing the current packages for Puppet/Facter/Hiera/PuppetDB/etc and installs the most recent version of Puppet and Facter that are in the repository. This will ensure that you have the most recent version of Puppet on your VM, and you don’t need to worry about creating a new box every time Puppet releases a new version.

This code came from Mitchell’s puppet-bootstrap repo where he maintains a list of scripts that will bootstrap Puppet onto many of the common operating systems out there. This code was current as of the initial posting date of this blog, but make sure to check that repo for any updates. If you’re maintaining your OWN provisioning script, consider filing pull requests against Mitchell’s repo so we can ALL benefit from good code and don’t have to keep creating ‘another wheel’ just to provision Puppet on VMs!

Spin up your VM

Once you’ve created a Vagrantfile in a directory, the next logical thing to do is to test out Vagrant and fire up your VM. Let’s first check the status of the vm:

$ vagrant status

Current VM states:

default                  not created

The environment has not yet been created. Run `vagrant up` to
create the environment.

As expected, this VM has yet to be created, so let’s do that by doing a vagrant up

$ vagrant up

[default] Box centos-6.3-x86_64 was not found. Fetching box from specified
URL...
[vagrant] Downloading with Vagrant::Downloaders::HTTP...
[vagrant] Downloading box:
https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box
[vagrant] Extracting box...
[vagrant] Verifying box...
[vagrant] Cleaning up downloaded box...
[default] Importing base box 'centos-6.3-x86_64'...
[default] The guest additions on this VM do not match the install version of
VirtualBox! This may cause things such as forwarded ports, shared
folders, and more to not work properly. If any of those things fail on
this machine, please update the guest additions and repackage the
box.

Guest Additions Version: 4.1.18
VirtualBox Version: 4.1.23
[default] Matching MAC address for NAT networking...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] -- 80 => 8084 (adapter 1)
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Preparing network interfaces based on configuration...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] VM booted and ready for use!
[default] Configuring and enabling network interfaces...
[default] Setting host name...
[default] Mounting shared folders...
[default] -- v-root: /vagrant
[default] Running provisioner: Vagrant::Provisioners::Shell...
Configuring PuppetLabs repo...
warning: 
/tmp/tmp.FvW0K7FJWU: Header V4 RSA/SHA1 Signature, key ID 4bd6ec30: NOKEY
Installing puppet
warning: 
rpmts_HdrFromFdno: Header V4 RSA/SHA1 Signature, key ID 4bd6ec30: NOKEY
Importing GPG key 0x4BD6EC30:
 Userid : Puppet Labs Release Key (Puppet Labs Release Key) <info@puppetlabs.com>
 Package: puppetlabs-release-6-6.noarch (installed)
 From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-puppetlabs
Warning: RPMDB altered outside of yum.
Puppet installed!

Vagrant first noticed that we did not have the CentOS box on our machine, so it downloaded, extracted, and verified the box before importing it and creating our custom VM. Next, it configured the VM’s network settings according to our Vagrantfile, and finally it provisioned the box using the script we passed in the Vagrantfile.

We’ve now got a VM running and Puppet is installed. Let’s ssh to our VM and check the Puppet Version:

$ vagrant ssh

Last login: Tue Jul 10 22:56:01 2012 from 10.0.2.2
[vagrant@development ~]$ puppet --version
3.0.2
[vagrant@development ~]$ hostname
development.puppetlabs.vm
[vagrant@development ~]$ exit
logout
Connection to 127.0.0.1 closed.

$ vagrant destroy -f
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...

Cool – so we demonstrated that we could ssh into the VM, check the Puppet version, check the hostname to ensure that Vagrant had set it correctly, exit out, and then we finally destroyed the VM with vagrant destroy -f. The next step is to actually configure Puppet to DO something with this VM…

Using Puppet to setup your node

The act of GETTING a clean VM is all well and good (and is probably magic enough for most people out there), but the purpose of this post is to demonstrate a workflow for testing out Puppet code changes. In the previous step we showed how to get Puppet installed, but we’ve yet to demonstrate how to use Vagrant’s built-in Puppet provisioner to configure your VM. Let’s use the example of a developer wanting to spin up a LAMP stack. To manually configure that would require installing a number of packages, editing a number of config files, and then making sure services were installed (among other things). We’re going to use some of the Puppet modules from the Puppet Forge to tackle these tasks and make Vagrant automatically configure our VM.

Scaffolding Puppet

We need a way to pass our Puppet code to the VM Vagrant creates. Fortunately, Vagrant has a way to define Shared Folders that can be shared from your workstation and mounted on your VM at a particular mount point. Let’s modify our Vagrantfile to account for this shared folder:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"

  # Puppet Shared Folder
  config.vm.share_folder "puppet_mount", "/puppet", "puppet"
end

The syntax for the config.vm.share_folder line is that the first argument is a logical name for the shared folder mapping, the second argument is the path IN THE VM where this folder will be mounted (so, a folder called ‘puppet’ in the root of the filesystem), and the last argument is the path to the folder ON YOUR WORKSTATION that will be mounted in the VM (it can be a full or relative path – which is what we’ve done here). This folder hasn’t been created yet, so let’s create it (and a couple of subfolders):
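
Annotated, that call breaks down like so (the argument names are mine):

# config.vm.share_folder(logical_name, guest_mountpoint, host_path)
config.vm.share_folder "puppet_mount",  # logical name for the mapping
                       "/puppet",       # mountpoint inside the VM
                       "puppet"         # path on the host, relative to the Vagrantfile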

$ cd ~/src/vagrant_projects
$ mkdir -p puppet/{manifests,modules}

This command will create the puppet directory in the same directory that contains our Vagrantfile, and then two subdirectories, manifests and modules, that will be used by the Puppet provisioner later. Now that we’ve told Vagrant to create our shared folder, and we’ve created the folder structure, let’s bring up the VM with vagrant up again, ssh into the VM with vagrant ssh, and then check to see that the folder has been mounted.

$ vagrant up

<output suppressed - see above for example output>

$ vagrant ssh

Last login: Tue Jul 10 22:56:01 2012 from 10.0.2.2
[vagrant@development ~]$ ls /puppet
manifests  modules

Great! We’ve setup a shared folder. To further test it out, you can try dropping a file in the puppet directory or one of its subdirectories – it should immediately show up on the VM without having to recreate the VM (because it’s a shared folder). There are pros and cons with this workflow – the main pro is that changes you make on your workstation will immediately be reflected in the VM, and the main con is that you can’t symlink folders INSIDE the shared folder on your workstation because of the nature of symlinks.

Installing the necessary Puppet Modules

Since we’ve already spun up a new VM and ssh’d into it, let’s use our VM to download the modules we’re going to need to set up our LAMP stack:

[vagrant@development ~]$ puppet module install puppetlabs/apache --target-dir /puppet/modules/
Notice: Preparing to install into /puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/puppet/modules
└─┬ puppetlabs-apache (v0.5.0-rc1)
  ├── puppetlabs-firewall (v0.0.4)
  └── puppetlabs-stdlib (v3.2.0)

[vagrant@development ~]$ puppet module install puppetlabs/mysql --target-dir /puppet/modules/
Notice: Preparing to install into /puppet/modules ...
Notice: Downloading from https://forge.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
/puppet/modules
└── puppetlabs-mysql (v0.6.1)

[vagrant@development ~]$ ls /puppet/modules/
apache  concat  firewall  mysql  stdlib

The puppet binary has a module subcommand that will connect to the Puppet Forge to download Puppet modules and their dependencies. The commands we used will install Puppet Labs’ apache and mysql modules (and their dependencies). We’re also passing the --target-dir argument that will tell the puppet module subcommand to install the module into our shared directory (instead of Puppet’s default module path).

I’m choosing to use puppet module to install these modules, but there are a multitude of other methods you can use (from downloading the modules directly out of GitHub to using a tool like librarian-puppet). The point is that we need to ultimately get the modules into the modules directory in our shared puppet folder – however you want to do that works for me :)
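If you went the librarian-puppet route, for example, a minimal sketch might look like the following (I haven’t tested this exact invocation, and the Puppetfile contents are only an example – check the librarian-puppet docs for specifics):

$ gem install librarian-puppet
$ cd ~/src/vagrant_projects
$ cat > Puppetfile <<'EOF'
forge 'https://forge.puppetlabs.com'

mod 'puppetlabs/apache'
mod 'puppetlabs/mysql'
EOF
$ librarian-puppet install --path puppet/modules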

Once the modules are in puppet/modules, we’re good. You only ever need to do this step ONCE. Because this folder is a shared folder, you can now vagrant up and vagrant destroy to your heart’s content – Vagrant will not remove the content in our shared folder when a VM is destroyed. Remember, too, that any changes made to those modules from either the VM or your workstation will be IMMEDIATELY available to both.

Since we’re done with the VM for now, let’s destroy it with vagrant destroy:

$ vagrant destroy

Classifying your development VM

The modules we installed are a framework that we will use to configure the node. The act of directing the actions that Puppet should take on a particular node is called ‘classification’. Puppet uses a file called site.pp to map Puppet code to the corresponding ‘node’ (or, in our case, our VM) that should receive it. Let’s create a site.pp file and open it for editing:

$ cd ~/src/vagrant_projects
$ vim puppet/manifests/site.pp

Let’s create a site.pp that will set up the LAMP stack on the development.puppetlabs.vm node we create with Vagrant:

~/src/vagrant_projects/puppet/manifests/site.pp
node 'development.puppetlabs.vm' {
  # Configure mysql
  class { 'mysql::server':
    config_hash => { 'root_password' => '8ZcJZFHsvo7fINZcAvi0' }
  }
  include mysql::php

  # Configure apache
  include apache
  include apache::mod::php
  apache::vhost { $::fqdn:
    port    => '80',
    docroot => '/var/www/test',
    require => File['/var/www/test'],
  }

  # Configure Docroot and index.html
  file { ['/var/www', '/var/www/test']:
    ensure => directory
  }

  file { '/var/www/test/index.php':
    ensure  => file,
    content => '<?php echo \'<p>Hello World</p>\'; ?> ',
  }

  # Realize the Firewall Rule
  Firewall <||>
}

Again, the point of this post is not about writing Puppet code but more about testing the Puppet code you write. The above node declaration will set up MySQL with the root password we passed in the config_hash, set up Apache and a vhost for development.puppetlabs.vm with a docroot of /var/www/test, drop an index.php file into that docroot, and realize a firewall rule to allow access through to port 80 on our VM.
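As a quick sanity check before spinning anything up (and assuming you also have Puppet installed on your workstation), you can syntax-check the manifest locally:

$ puppet parser validate puppet/manifests/site.pp

No output means the manifest parses cleanly; typos will get flagged with a line number.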

Setting up the Puppet provisioner for Vagrant

We’re going to have to modify our Vagrantfile one more time to tell Vagrant to use the Puppet provisioner to execute our Puppet code and setup our VM:

Vagrantfile
Vagrant::Config.run do |config|
  config.vm.box       = "centos-6.3-x86_64"
  config.vm.box_url   = "https://saleseng.s3.amazonaws.com/boxfiles/CentOS-6.3-x86_64-minimal.box"
  config.vm.host_name = "development.puppetlabs.vm"
  config.vm.network :hostonly, "192.168.33.10"
  config.vm.forward_port 80, 8084
  config.vm.provision :shell, :path => "centos_6_x.sh"

  # Puppet Shared Folder
  config.vm.share_folder "puppet_mount", "/puppet", "puppet"

  # Puppet Provisioner setup
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "puppet/manifests"
    puppet.module_path    = "puppet/modules"
    puppet.manifest_file  = "site.pp"
  end
end

Notice the block for the Puppet provisioner that sets up the manifest path (i.e. where to find site.pp), the module path (i.e. where to find our Puppet modules), and the name of our manifest file (i.e. site.pp). Again, this is all documented on the Vagrant documentation page should you need to use it for reference.

This bumps the number of provisioners in our Vagrantfile to two, but which one runs first? Vagrant works through the Vagrantfile procedurally, so the shell provisioner always runs first and the Puppet provisioner runs second. This lets us be certain that Puppet will always be installed before the Puppet provisioner tries to use it. You can continue to add as many provisioning blocks as you like – Vagrant will run them in the order it encounters them.

Give the entire workflow a try

Now that we have our Vagrantfile finalized, our Puppet directory structure created, our Puppet modules installed, and our site.pp file set to classify our new VM, let’s actually let Vagrant do what it does best and set up our VM:

$ vagrant up

You should see Vagrant use the Shell provisioner to install Puppet, hand off to the Puppet provisioner, and then use Puppet to setup a LAMP stack on our VM. After everything completes, try visiting http://localhost:8084 in your web browser and see if you get a shiny “Hello World” staring back at you. If you do – Awesome! If you don’t, check the error messages to determine if there are typos in the Puppet code or if something went wrong in the Vagrantfile.
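If you’d rather check from the terminal, a quick curl against the forwarded port should return the contents of our index.php:

$ curl http://localhost:8084
<p>Hello World</p>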

Where do you take it from here?

The first thing to do is to take the Vagrantfile you’ve created and put it under revision control so you can track the changes you make. I personally have a couple of workflows up on Github that I use as templates when I’m testing out something new. You’ll probably find that your Vagrantfile won’t change much – just the modules you use for testing.
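If you’re new to git, getting the project under revision control is only a couple of commands – a minimal sketch (the commit message and .gitignore contents are just my suggestions):

$ cd ~/src/vagrant_projects
$ git init
$ echo '.vagrant' > .gitignore    # don't track Vagrant's per-project state
$ git add Vagrantfile centos_6_x.sh puppet .gitignore
$ git commit -m 'Vagrant + Puppet testing workflow'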

Now that you understand the pattern, you can expand it to fit your workflow. Single-vm projects are great when you’re testing a specific component, but the next logical step is to test out multi-tiered components/applications. In these instances, Vagrant has the ability to spin up multiple VMs from a single Vagrantfile. That workflow saves a TON of time and lets you create your own private network of VMs for the purpose of simulating changes. That’s a post for another time, though…

Get involved

Stay tuned to the Vagrant website for updates on the VMware provisioner. Stability with Virtualbox has notoriously been an issue, but, as of this posting, things have been relatively rock-solid for me (using Virtualbox version 4.1.23 on OS X).

If you want to keep up-to-date on all things Vagrant, follow Mitchell on Twitter, check out #vagrant on Freenode, join the Vagrant list, and check out Google for what other folks have done!

A GIANT thank you to Mitchell Hashimoto for all the work he’s done on Vagrant – I can’t count the number of hours it’s saved me personally (let ALONE everyone at Puppet Labs)!

Using Veewee to Build OS X VMs

I hate Tim Sutton in the same way I hate Steven Singer. I hate Tim because he actually IMPLEMENTS some of the things on my list of ‘shit to tinker with when you get some free time’ instead of just TALKING about them. He and Pepijn Bruienne have been working on some code for Veewee that will allow you to automate the creation of OS X VMs in VMware Fusion. For those who are interested, you can check out the pull request containing this code and comment/help them out. For everyone else, read on and entertain the idea…

Prerequisites:

  1. OS X
  2. VMware Fusion
  3. Git
  4. Ruby 1.9.3 and rbenv
  5. Mountain Lion (10.8) installer application

OS X

This walkthrough assumes that you’re running OS X on your development workstation. That’s what I use, and it’s the workflow I know.

VMware Fusion

If you’ve used Veewee before, odds are good that you used it to build baseboxes for Vagrant and/or Virtualbox. Veewee + Vagrant is a post for ANOTHER time, but in short it’s an awesome workflow for testing automation like Puppet on Linux or Windows. When I originally tried using Veewee to build OS X VMs, I had absentmindedly tried to do this using Virtualbox…which isn’t supported. As such, VMware Fusion is the only virtualization platform (as of this posting date) that is supported with this method. I’m using VMware Fusion 5 Pro, but YMMV.

Git

Because the code that supports OS X only exists in Tim’s fork of Veewee (as of this posting date), we’re going to have to install/use Veewee from source. That introduces a bit more complexity, but hold my hand – we’ll get through this together. Git comes with the XCode Command Line Tools or can be installed with native packages.

Ruby 1.9.3 and rbenv

Veewee uses the gssapi gem which requires Ruby 1.9.1 or higher. The problem, though, is that the version of Ruby that comes with OS X is 1.8.7. There are typically two camps when it comes to getting a development version of Ruby on OS X: rbenv or RVM. I recommend rbenv because it doesn’t screw with your path and it’s a bit more lightweight, so that’s the path I’m going to take in this writeup. Octopress has instructions for getting rbenv on your machine – so make sure to check those out for this step. The instructions describe using rbenv to install Ruby version 1.9.3 – and that’s the version we’ll use here.
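In short, once rbenv and the ruby-build plugin are installed, getting 1.9.3 looks something like this (the exact patch level is an assumption – run rbenv install --list to see what your ruby-build offers):

$ rbenv install 1.9.3-p327    # assumption: your available patch level may differ
$ rbenv rehash
$ rbenv versions              # confirm 1.9.3 shows up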

Mountain Lion installer application

This workflow supports creating a VM for Mountain Lion (10.8), but it SHOULD also work for Lion (10.7). The final piece of our whole puzzle is the installer application from the App Store. Get that application somehow and drop it in the /Applications directory (that’s where the App store puts it by default). We REALLY only need a single disk image from the installer, and we’ll get that next.

Some assembly required…

Now that you have all the pieces, let’s tie this FrankenVM up with some code and ugly bash, shall we? Splendid.

Copy out the installation disk image

In the last step, we downloaded the Install OS X Mountain Lion.app file from the App Store to /Applications, but we’ll want to extract the main installation disk image somewhere where we can work with it. I’m going to make a copy on the Desktop so we don’t screw up the main installer:

$ cp /Applications/Install\ OS\ X\ Mountain\ Lion.app/Contents/SharedSupport/InstallESD.dmg ~/Desktop

Beautiful. This should take a minute as it’s a sizeable file, but in the end you’ll have the installation disk image on your Desktop. For now this is fine, but we’ll be revisiting it later…
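If you want to make sure the copy is intact before investing any more time, hdiutil can verify the image’s internal checksums:

$ hdiutil verify ~/Desktop/InstallESD.dmg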

Clone Tim’s fork of Veewee

As of the writing of this blog post, the code you need is ONLY in Tim’s fork, so let’s pull that down to somewhere where we can work with it:

## I prefer to work with code out of a 'src' directory in my home directory
$ mkdir ~/src
$ cd ~/src

$ git clone http://github.com/timsutton/veewee
$ cd ~/src/veewee

Install Gems for veewee

We now have the Veewee source in ~/src/veewee, but we need to ensure all the Rubygems necessary to make Veewee work have been installed. We’re going to do this with Bundler. Let’s switch to Ruby 1.9.3 and get Bundler installed:

$ cd ~/src/veewee
$ rbenv local 1.9.3
$ gem install bundler

Next, let’s use bundler to install the rest of the gems we need to use Veewee:

$ bundle install

Once that command completes, Bundler will have installed all the necessary gems for Veewee and we can move on.

Define your new VM

Veewee has templates for most operating systems that can be used to spin up a ‘vanilla’ VM. Tim’s code provides a template called ‘OSX-10.8.2’ containing the necessary scaffolding for building a vanilla 10.8.2 VM.
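If you want to confirm the template made it into your clone (or browse what else is available), Veewee can list its templates – I believe the subcommand below works against this fork, but treat it as a sketch:

$ cd ~/src/veewee
$ bundle exec veewee fusion templates | grep -i osx

Now let’s create a new VM definition called ‘osx-vm’ based on this template: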

$ cd ~/src/veewee
$ bundle exec veewee fusion define 'osx-vm' 'OSX-10.8.2'

This will create definitions/osx-vm inside the Veewee directory with the template code from templates/OSX-10.8.2. We’re almost ready to let Veewee create our VM, but we need an installation ‘ISO’ first…

Prepare an ‘ISO’ for OS X

The prepare_veewee_iso.sh script in Veewee’s templates/OSX-10.8.2/prepare_veewee_iso directory provides awesome detail as to why we can’t use the vanilla InstallESD.dmg file to install 10.8 in our new VM. Feel free to open that file and read through the detail, or check out the details online for more information. Let’s use that script and prepare an installation ‘ISO’ for our new VM:

$ cd ~/src/veewee
$ mkdir iso
$ sudo templates/OSX-10.8.2/prepare_veewee_iso/prepare_veewee_iso.sh ~/Desktop/InstallESD.dmg iso/

You’ll need to be root to do this, but the script should handle everything necessary to prepare the ISO and drop it into the iso directory we created.
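You can confirm the result landed where we expect (the generated .dmg name varies with the OS build, so don’t worry if yours differs from mine):

$ ls iso/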

Define any additional post-installation tasks

Veewee supports post-installation tasks through the postinstall.sh script in the definitions/osx-vm folder. By default, this script will install VMware tools, setup Vagrant keys, download the XCode Command Line Tools, and install Puppet via Rubygems. Because this is all outlined in the postinstall.sh script, you’re free to modify this code or add your own steps. Here’s the current postinstall.sh script as of this posting:

date > /etc/vagrant_box_build_time
OSX_VERS=$(sw_vers -productVersion | awk -F "." '{print $2}')
# Install VMware tools if we were built with VMware
if [ -e .vmfusion_version ]; then
  TMPMOUNT=`/usr/bin/mktemp -d /tmp/vmware-tools.XXXX`
  hdiutil attach darwin.iso -mountpoint "$TMPMOUNT"
  installer -pkg "$TMPMOUNT/Install VMware Tools.app/Contents/Resources/VMware Tools.pkg" -target /
  # This usually fails
  hdiutil detach "$TMPMOUNT"
  rm -rf "$TMPMOUNT"
fi

# Installing vagrant keys
mkdir /Users/vagrant/.ssh
chmod 700 /Users/vagrant/.ssh
curl -k 'https://raw.github.com/mitchellh/vagrant/master/keys/vagrant.pub' > /Users/vagrant/.ssh/authorized_keys
chmod 600 /Users/vagrant/.ssh/authorized_keys
chown -R vagrant /Users/vagrant/.ssh

# Get Xcode CLI tools for Lion (at least to build Chef)
# https://devimages.apple.com.edgekey.net/downloads/xcode/simulators/index-3905972D-B609-49CE-8D06-51ADC78E07BC.dvtdownloadableindex
TOOLS=clitools.dmg
if [ "$OSX_VERS" -eq 7 ]; then
  DMGURL=http://devimages.apple.com/downloads/xcode/command_line_tools_for_xcode_os_x_lion_november_2012.dmg
elif [ "$OSX_VERS" -eq 8 ]; then
  DMGURL=http://devimages.apple.com/downloads/xcode/command_line_tools_for_xcode_os_x_mountain_lion_november_2012.dmg
fi
curl "$DMGURL" -o "$TOOLS"
TMPMOUNT=`/usr/bin/mktemp -d /tmp/clitools.XXXX`
hdiutil attach "$TOOLS" -mountpoint "$TMPMOUNT"
installer -pkg "$(find $TMPMOUNT -name '*.mpkg')" -target /
hdiutil detach "$TMPMOUNT"
rm -rf "$TMPMOUNT"
rm "$TOOLS"

# Get gems - we should really be installing rvm instead, since we can't even compile Chef or have a Ruby dev environment..
gem update --system
gem install puppet --no-ri --no-rdoc
# gem install chef --no-ri --no-rdoc
exit

Build the VM

Everything we’ve done has all been for this step. With the ISO built, the VM defined, and all necessary gems installed, we can finally BUILD the VM with the following command:

$ cd ~/src/veewee
$ bundle exec veewee fusion build osx-vm

This process takes the longest – you should see VMware Fusion fire up, a new VM get created, and follow the process of OS X being installed into the VM. When it completes, your VM will have been created. Just like most Vagrant workflows, the resultant VM will have a vagrant user whose password is also vagrant. Feel free to login and ensure that everything looks good.

Now what?

At this point, I would snapshot the VM before making any changes. Because Virtualbox isn’t yet supported with Veewee for building OS X VMs (and Vagrant doesn’t currently include VMware Fusion support for its workflow), this VM isn’t going to fit into a Vagrant workflow…yet. What you have, though, is a vanilla OS X VM that can be built on-demand (or reverted to a snapshot) to test whatever configuration changes you need to make (all 11.36 GB worth of it).

As you would expect, this is all pretty experimental for the moment. If you’re a Ruby developer who needs an OS X VM for testing purposes but has never managed OS X and its ‘quirky’ imaging process, this workflow is for you. For everyone else, it’s probably an academic proof-of-concept that’s more interesting from the point of view of “Look what you can do” versus “Let’s make this my primary testing workflow.”

Credit goes to Patrick Debois for creating and managing the Veewee project, and to Tim Sutton and Pepijn Bruienne – like I mentioned before – for the work they’ve done on this. You can speak with them directly in the ##osx-server chatroom on Freenode, or by checking them out on Twitter.

Managing a Blog Is Insane; Octopress FTW!

Masochists are everywhere. Managing a system by hand appeals to a certain subset of the population, but I joined Puppet Labs because I was a fan of automating the shit out of menial tasks. I have no burning desire to handcraft an artisanal blog made out of organic bits. I tried web development at one point in my life and learned an important lesson:

I am not a web developer

What I DO want is to have a platform to share information and code I’ve accumulated along the way that:

  1. Doesn’t look like hell
  2. Doesn’t take forever to update
  3. Doesn’t require me being online to write a post
  4. Allows me to post code that’s syntactically highlighted and easy to copy/paste
  5. Accepts Markdown
  6. Fits into my DVCS workflow
  7. Is free – because screw paying for A BLOG

Seriously, is this too much to ask?

Originally I waded into the waters of Posterous but found that not only did it have awkward Markdown syntax, but staging of blog posts was cumbersome. Also, while there WAS gist integration, the code you posted looked like crap. That sucks because most of what I post has code attached.

Others have sold their soul for Wordpress/Drupal/Bumblefuck CMS (whether hosted or unhosted platforms), but it still felt like too much process and not enough action for my taste.

Github to the rescue…again

The answer, it turns out, was right in front of my face the whole time. Github Pages has always been available to host web content or Jekyll sites – I’m just late to the damn party.

Hosting static web content was out; like I said before, I don’t really want to ‘manage’ this thing. Jekyll, however, intrigued me. Jekyll is essentially a static site generator that allows you to throw Markdown at it and get static content in the end. There are even Jekyll bootstrap projects aimed at making it (even more) stupid simple. I tried plain vanilla Jekyll and realized that I didn’t really want/need that level of control (again, I’m not into web development). I pulled down Jekyll Bootstrap, but this time it felt a little TOO ‘template-y’. Next, I pulled down an awesome Jekyll template called Left by Zach Holman (seen, conveniently, in the picture on the left). I REALLY liked the look of Left, but was stuck with Jekyll’s code formatting that was…less than ideal. Jekyll is pluggable (and people have made plugins to fix this sort of thing), but I still didn’t have enough experience at that time to be able to deal with the plugins in an intelligent manner.

Octopress = Jekyll + Plugins + <3

During my plugin party, I discovered Octopress, which basically had EVERYTHING I wanted wrapped with a templated bow. I loved the way it rendered code, it supported things like fenced code blocks, and it seemed REALLY simple to update and deploy. The thing I DIDN’T like was that NEARLY EVERY DAMN OCTOPRESS SITE LOOKS EXACTLY THE SAME! I know I said that I’m not a web developer and didn’t want to tweak styles a great bit, but damn – couldn’t I get a BIT of differentiation? That can be done, but you’re still editing styles (so some CSS is necessary – but not much). After evaluating all three options, I opted for Octopress.

Documentation? Who knew!?

Octopress.org has GREAT documentation for getting set up with an Octopress blog. They even have documentation for setting up Ruby 1.9.3 on rbenv or RVM (I recommend rbenv for reasons beyond this blog), so make sure to check it out if you’re unfamiliar with either. To not reinvent that documented wheel, make sure to check out that site to get Octopress set up (I recommend cloning all repositories to somewhere in your home directory like ~/src or ~/repos, but other than that their docs are solid). Normally I post the dirty technical details, but the point of this post is to outline WHY I made the decision I did and WHY it works for ME.
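That said, the condensed version from their docs boils down to something like this (the clone destination is my preference, and you should double-check their setup page in case the steps have changed):

$ cd ~/src
$ git clone git://github.com/imathis/octopress.git octopress
$ cd octopress
$ gem install bundler
$ bundle install
$ rake install    # installs the default Octopress theme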

Pretty code is pretty

I’m not gonna lie – syntactical highlighting with the Solarized theme was pretty sexy. Let’s look at some Ruby:

def create
  if resource[:value_type] == :boolean
    unless resource[:value].first.to_s =~ /(true|false)/i
      raise Puppet::Error, "Valid boolean values are 'true' or 'false', you specified '#{resource[:value].first}'"
    end
  end

  if File.file? resource[:path]
    plist = read_plist_file(resource[:path])
  else
    plist = OSX::NSMutableDictionary.alloc.init
  end

  case resource[:value_type]
  when :integer
    plist_value = Integer(resource[:value].first)
  when :boolean
    if resource[:value].to_s =~ /false/i
      plist_value = false
    else
      plist_value = true
    end
  when :hash
    plist_value = resource[:value].first
  else
    plist_value = resource[:value]
  end

  plist[resource[:key]] = plist_value

  write_plist_file(plist, resource[:path])
end

How about some Puppet code?

Plist Management
property_list_key { 'simple':
  ensure => present,
  path   => '/tmp/com.puppetlabs.puppet',
  key    => 'simple',
  value  => 'value',
}

property_list_key { 'boolean':
  ensure     => present,
  path       => '/tmp/com.puppetlabs.puppet',
  key        => 'boolean',
  value      => false,
  value_type => 'boolean',
}

property_list_key { 'hashtest':
  ensure     => present,
  path       => '/tmp/com.puppetlabs.puppet',
  key        => 'hashtest',
  value      => { 'key' => 'value' },
  value_type => 'hash',
}

Doesn’t that look awesome?

For a blog with a ton of code, something like this is pretty damn important.

Markdown formatting

Markdown is becoming more prevalent as a lightweight way to format text. If you’ve used Github to comment on code, then you’ve probably used Markdown syntax at some point in your life. You’ll notice the recurring goal of being quick and precise, and Markdown really ‘does it for me’.

Workflow++

It’s really liberating to write a blog post in (insert favorite text editor). Travelling as often as I do, I’m usually jotting down notes in vi because I’m in a plane or somewhere without internet access. Octopress not only lets you write offline, but it will let you generate and preview your site locally while being offline. That’s totally handy. My workflow typically looks like this:

  • Generate a post

To start a new post, you can create a new file by hand, or use the provided Rakefile to scaffold it out for you (NOTE: to see all commands available to the Rakefile, you can run rake -T). Here’s how to scaffold with rake:

$ rake 'new_post["Title of my post"]'
$ vi source/_posts/2013-01-17-title-of-my-post.markdown
  • Edit the post file
  • Generate the site content

Remember that Octopress/Jekyll generates static content to be uploaded to your host. With every change, you’ll need to re-generate the content. Yeah, that kinda sucks, but that action has been abstracted away in the Rakefile with the following command:

$ rake generate
  • Display and view the site
$ rake preview

The preceding command serves up your site over WEBrick in the terminal; just navigate to http://localhost:4000 in your browser to see a local copy of how your site will look once deployed. Once you’re done, just hit Control+C in your terminal to cancel the process.

  • Edit/generate/preview until done
  • Commit your code and deploy

Because everything is a series of flat files, you can use git to keep it all under version control. Did you make a mistake? Revert to a previous commit. Deploying to your blog is similarly easy, as you’ll see in the next steps.
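For the sake of completeness, committing a finished post is the same git dance as anywhere else (the file name below is the example from earlier):

$ git add source/_posts/2013-01-17-title-of-my-post.markdown
$ git commit -m 'New post: title of my post'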

Be a unique snowflake

I used this article to help change the color scheme of my blog, and checked out a list of other Octopress sites to steal a couple of other tweaks. That’s all I needed for customization, but if you need more knobs they’re there for you to twist.

Hosted by your pals at Github

Github Pages is a free service for any Github user (again free to join) and is an EXCELLENT FIT for an Octopress blog. As you would expect, Octopress has great documentation for enabling deployment to Github Pages. If you have your own custom domain, you can STILL use Github Pages (hint, the instructions are on that page too).
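The one-time setup is itself a rake task that rewires your git remotes for Pages – it will prompt you for your repository URL (the URL format below is an example):

$ rake setup_github_pages
# when prompted, enter something like:
#   git@github.com:username/username.github.com.git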

Fuck it. Ship it.

The act of updating a blog has GOT to be frictionless. Is this simple enough for you:

rake deploy

Yep. That’s all it takes to deploy code to your blog once you’ve setup Github Pages. Forget about logging in, wading through a GUI, cutting/pasting content, FTPing things up to a host, and any of that crap. I’ve done that. ‘That’ sucks.

Write your own damn blog

You now have no excuse to NOT share all the cool things you’re doing. Seriously, if you can EDIT A TEXT FILE then you can ‘maintain’ a blog.