Writing better Charm interfaces

UPDATE 13th January 2017

I was in contact with one of the developers of charms.reactive and he pointed out that the proposal suggested in this post may lead to subtle bugs if the charm author is not absolutely clear on what's going on. Essentially, it links the hook context event to a reactive state, and then that state disappears at the end of the hook invocation. This breaks the contract that reactive states are, if you think about it, supposed to represent the state at the end of the hook invocation, rather than be transient during the hook invocation. I'll be doing another post to delve into that more deeply.

Anyway, the takeaway is to use the proposed solution sparingly and perhaps, rather than have the trigger state self-destruct, to clear it manually in the charm.

Original

I've been writing Juju reactive charms for nearly a year now, and I've noticed that the way I've learnt to write interfaces (associated with charms) results in inefficient hook invocations. I explain in this post what I think the problem is, along with an example that you can try, and also a solution.


Synopsis

The charms.reactive library and the technique of writing charms with layers often result in interface .connected and .available handlers being executed for unrelated hook invocations (e.g. update-status). This means, for example, that for an update-status hook, the code for every relation that is .available will be run, possibly resulting in multiple configuration files being written, and even services being restarted.


Background

Reactive charms are charms written using the Python charms.reactive library. This library is designed to help simplify handling the hook events that Juju generates, for example, when two charms are related to each other. That is, when one charm is connected to another charm (which is a relation), Juju 'calls' a hook on each charm (in the hooks directory). This is how the charm is notified that it is being related to another charm.

Note: this is not a charm 101, so I'm going to refrain from continuing a low-level explanation of how charms work. The docs provide a thorough introduction to charms, Juju and what it's all about.

So charms.reactive helps by enabling a charm author to create states that a charm can be in. In reality, these are essentially labels that you can use to indicate that various things have happened. e.g. a relation has been made, or the installation isn't yet done.

A further tool is layers which provide a re-use and code organisation system for writing charms. A key component of this is the interface which abstracts relation handling code into its own module. In order to provide loose coupling between layers and interfaces, the intended mechanism is to use reactive states to signal between interfaces and layers that something has happened that might need to be acted on. And this is where the problem hides.

The Problem

When charms.reactive and layers are used to build a charm they largely abstract the handling of Juju hooks from the charm author, instead encouraging the use of reactive states.

Reactive states are typically:

from charms.reactive import when_not, set_state

@when_not('installed')
def do_install():
    # ... some installation stuff ...
    set_state('installed')

In this case, when the charm code is loaded, reactive will run all the handlers whose state conditions are true, i.e. when some state is set AND when_not some other state, THEN run this handler. Here, do_install() is a handler that runs while the state installed is not set.

So, and this is important, states are inspected on every invocation of the charm code (think: hook invocation), and associated handlers are run if the associated state conditions are met. And this is for every hook invocation (install, upgrade, config-changed, update-status, relation-*, etc.)
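To make that concrete, here is a minimal sketch of the dispatch idea in plain Python. It is a toy model and not the charms.reactive implementation: the names STATES, HANDLERS, CALLS, when and run_hook are all made up for illustration, and the real library does more than one pass over the handlers. The point it tries to show is that the hook name plays no part in deciding which handlers run.

```python
# Toy model only: NOT the charms.reactive implementation.
# STATES, HANDLERS, CALLS, when() and run_hook() are hypothetical names.

STATES = set()   # stands in for states persisted between hook invocations
HANDLERS = []    # (name, required states, forbidden states, function)
CALLS = []       # (hook name, handlers that ran) for each invocation


def when(*required, when_not=()):
    """Register a handler that runs while its state conditions hold."""
    def decorator(func):
        HANDLERS.append((func.__name__, set(required), set(when_not), func))
        return func
    return decorator


@when(when_not=('installed',))
def do_install():
    STATES.add('installed')


@when('interface-a.available')
def interface_available():
    pass  # e.g. rewrite configuration files, restart services


def run_hook(hook_name):
    """Simulate one hook invocation; hook_name is never used for dispatch."""
    ran = []
    for name, required, forbidden, func in HANDLERS:
        if required <= STATES and not (forbidden & STATES):
            func()
            ran.append(name)
    CALLS.append((hook_name, ran))
    return ran


print(run_hook('install'))
STATES.add('interface-a.available')   # as if set by the interface earlier
print(run_hook('update-status'))
print(run_hook('config-changed'))
```

In this sketch, once interface-a.available is set, interface_available runs for update-status and config-changed alike, because only the states are consulted.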

Why is this important? Because of how interface layers have been written to date. Essentially, interface layers tend to have two states which are {interface-name}.connected and {interface-name}.available. Generally, the .connected state is used to indicate that the relation has been made and the .available state to indicate that some set of conditions on that interface have been met such that the data passed over it is now available to be used in the charm.

Typical examples of interfaces that are written in this way are keystone, http and db2 which all follow the same sort of pattern.

An example from charm-a is:

@charms.reactive.when('interface-a.available')
def interface_available(interface):
    hookenv.log("pid({}) - Charm A: interface_available() called"
                .format(os.getpid()))
    _update_status()

That is, the _update_status() function will be called if the reactive state interface-a.available is set. The interface sets this state when the relation code is satisfied that all data on the relation is available. Note, it's the interface author that determines what this might be.

The problem is that the handlers for the states all get evaluated for every single type of hook invocation. It's as if the different types of hook no longer exist when using states, because it's now just the states which determine which handlers get executed. This means that code gets executed again and again, even when it doesn't actually need to be.

And I think that it's a bit of a code smell that unnecessary code is being executed. Now you may be thinking, "but isn't this because you've written the interface code badly?" That's true: the error is in the interface converting an event (the hook) into a static state which is persisted between charm invocations. But the whole 'point' of writing the interface is so that you don't use the @hook('{interface-name}...') decorators in the main charm layer. Which is how the problem arises.

Let's see this in action.

Example of the problem

All the code in the following examples is in the GitHub repository. If you'd like to try this on your own systems, then using ZFS and LXD is probably the easiest (and cheapest) way to achieve it.

In the repository are four charms and two interfaces. Charm A and B, along with interface AB, represent the issue, and Charm C and D, with interface CD, show a solution to the problem.

If you want to try this out, ensure that you have a Juju environment to work with (LXD/localhost is useful for this), and then build, deploy, and relate the charms. You'll also need tox available (on Ubuntu it is in the tox package: sudo apt install tox):

# in charm-a directory
tox -e build
juju deploy ./build/builds/charm-a

# in charm-b directory
tox -e build
juju deploy ./build/builds/charm-b

# relate the two charms together
juju add-relation charm-a charm-b

After everything has settled down, charm-a and charm-b will be displaying the following:

Unit        Workload  Agent  Machine  Public address  Ports  Message
charm-a/0*  active    idle   0        172.16.1.53            all is good: response:hello right back at you
charm-b/0*  active    idle   1        172.16.1.2             all is good (hello)

Now we just wait a bit for the update-status event to go by. Then we look at what got run on charm-a when the event occurs (this is from the debug-log):

unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log Reactive main running for hook update-status
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log Invoking reactive handler: reactive/handlers.py:44:update_status
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: update_status() called
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: running _update_status()
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log pid(8166) - get_requires(hello right back at you)
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log Invoking reactive handler: reactive/handlers.py:19:interface_connected
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: interface_connected() called
unit-charm-a-0: 12:01:11 INFO unit.charm-a/0.juju-log pid(8166) - set_provides(hello)
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: running _update_status()
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log pid(8166) - get_requires(hello right back at you)
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log Invoking reactive handler: reactive/handlers.py:30:interface_available
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: interface_available() called
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log pid(8166) - Charm A: running _update_status()
unit-charm-a-0: 12:01:12 INFO unit.charm-a/0.juju-log pid(8166) - get_requires(hello right back at you)

So, for charm-a, the update-status hook causes the charm code to be loaded and then the reactive framework to invoke the update_status, interface_connected and interface_available handlers.

This is because the interface-a.connected and interface-a.available states are set (as the interface is both connected and available). It also means that, naively, the _update_status() function has been called THREE times, which means that the state of the charm is checked and set three times.

This is how most interfaces and charms are written. For a charm with multiple interfaces, that could mean a whole lot of handlers are run for no good reason.

This might cause configuration files to be written, services to be (re)started, or other expensive operations.

Most interface .connected and .available handlers tend to look like this:

@charms.reactive.when('{interface-name}.connected')
def interface_connected(interface):
    # do something when the interface connects
    # and update the status
    _update_status()


@charms.reactive.when('{interface-name}.available')
def interface_available(interface):
    # do something when the interface is 'complete'
    # and update the status
    _update_status()

That is, they naively treat {interface-name}.connected as an event rather than a state. The thinking is so prevalent that most interfaces and charms seem to have been written that way. And I've been continuing the trend for the last few months!

A Solution

One possible solution is to provide an event-like state to indicate that an interface has changed. This can then be used as well as the .connected and .available states to control whether a handler is executed. That is, the handlers associated with interfaces would only ever get executed if the charm code was actually handling a hook event for that interface.

Charms C and D (and interface CD) implement one possible solution to this. The idea is to introduce an {interface-name}.triggered state that exists if, and only if, one of the interface's hook events was executed. An issue was: "how do we clear the state at the end of the hook invocation?", i.e. how do we make it transitory and ONLY for that invocation of the code, even if no handler is ever invoked.

The key to that is this piece of code in the interface-cd code:

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        hookenv.atexit(lambda: self.remove_state(self.states.triggered))

hookenv.atexit() works like the Python standard library's atexit module and provides a way for the interface to clean up the triggered state when the script invocation ends. As Juju runs a new process for each hook invocation, this means that the triggered state will only exist if a @reactive.hook(...) for an interface was the reason the script was started.
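The lifecycle described above can be sketched in plain Python without Juju. This is a toy model under stated assumptions: a set stands in for the persisted reactive states, a list stands in for the callbacks registered with hookenv.atexit(), and run_invocation() and the hook name interface-c-relation-joined are hypothetical names chosen for illustration.

```python
# Toy model only: a set stands in for the persisted reactive states and a
# list stands in for callbacks registered via hookenv.atexit().
# run_invocation() and the hook names below are hypothetical.

persisted_states = set()   # survives between simulated hook invocations


def run_invocation(hook_name):
    """Simulate one hook process and return the states visible during it."""
    exit_callbacks = []    # fresh for every process, like a new Python run

    if hook_name == 'interface-c-relation-joined':
        # The interface's hook ran: register the clean-up, then set states.
        exit_callbacks.append(
            lambda: persisted_states.discard('interface-c.triggered'))
        persisted_states.add('interface-c.connected')
        persisted_states.add('interface-c.triggered')

    visible = set(persisted_states)   # what handlers would see in this run

    for callback in exit_callbacks:   # end of process: run the clean-up
        callback()
    return visible


during_joined = run_invocation('interface-c-relation-joined')
during_update = run_invocation('update-status')

print(sorted(during_joined))
print(sorted(during_update))
```

In this sketch the triggered state is visible during the relation hook but gone by the time the update-status invocation runs, while the connected state persists.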

Thus, in the interface code, the joined hook now looks like:

    @reactive.hook('{provides:interface-cd}-relation-joined')
    def joined(self):
        # do some work and then indicate that the interface was triggered.
        self.set_state(self.states.triggered)

In the main handler code, we can now change it so that we also check to see if the interface was triggered. For example:

@charms.reactive.when_not('interface-c.available')
@charms.reactive.when('interface-c.connected')
@charms.reactive.when('interface-c.triggered')
def interface_connected(interface, *args):
    hookenv.log("pid({}) - Charm C: interface_connected() called"
                .format(os.getpid()))
    config = hookenv.config()
    option = config['the-option']
    if option:
        interface.set_provides(option)
    _update_status()

That is, only run this code if we are connected, but not yet available, and also guard it by being triggered. Note we need to include *args in the function arguments as we've mentioned two states which would put the interface-c instance into the args list of def interface_connected(...) twice. As we only actually need one of them, the *args just lets us ignore the second interface instance passed.
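The effect of *args can be seen in isolation with a small sketch. FakeInterface is a hypothetical stand-in for the interface-c instance, and the call at the bottom imitates the framework passing the same instance once per matching state; it does not use charms.reactive itself.

```python
class FakeInterface:
    """Hypothetical stand-in for the interface-c relation instance."""


def interface_connected(interface, *args):
    # 'interface' is the first instance; any repeats are collected in args
    # and can simply be ignored.
    return interface, args


instance = FakeInterface()
# Imitate being called with one argument per matching relation state.
first, extra = interface_connected(instance, instance)
print(first is instance, len(extra))
```

Without *args, the second positional argument would raise a TypeError, which is why the handler signature accepts and discards it.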

Using a triggered state does result in more checks to make, but results in a cleaner invocation of the charm which only executes the code that is needed on each execution. The update-status hook run for charm-c now looks like:

unit-charm-c-0: 18:25:13 INFO unit.charm-c/0.juju-log Reactive main running for hook update-status
unit-charm-c-0: 18:25:13 INFO unit.charm-c/0.juju-log Invoking reactive handler: reactive/handlers.py:62:update_status
unit-charm-c-0: 18:25:13 INFO unit.charm-c/0.juju-log pid(8121) - Charm C: update_status() called
unit-charm-c-0: 18:25:13 INFO unit.charm-c/0.juju-log pid(8121) - Charm C: running _update_status()
unit-charm-c-0: 18:25:13 INFO unit.charm-c/0.juju-log pid(8121) - get_requires(hello right back at you)

Only the update_status handler is invoked, and the _update_status() function is executed only once.

Summary

By using a transitory state (as an event indication) we can cleanly ensure that only the code that should run on each invocation does. The charm-helpers hookenv.atexit() function provides a simple, and effective, way to remove the event state at the end of the hook execution. This results in less resource usage and more predictable charm behaviour.

What next, then?

There are several patterns emerging around the writing of (complex) charms using charms.reactive and layers. The team I work in at Canonical has been abstracting some of these into the charms.openstack library.

I'm currently working on pulling out and simplifying some of the abstractions in that library and including more generalised support for writing cleaner interfaces in an upcoming charms.declarative library. This will be a highly opinionated library that provides a framework for writing charms with charms.reactive and layers, using conventions to simplify and remove boilerplate code.

I'm really interested in your thoughts so please feel free either to comment here, or contact me via Twitter or on IRC on Freenode in the #openstack-charms channel. I'm tinwood there.


Resources

These are some resources for exploring the concepts discussed in this post.
