
Requests-mock


Having just released v0.5 of requests-mock, and with it now used by both keystoneclient and novaclient with others in the works, I thought I’d finally do a post explaining what it is and how to use it.

Motivation

I was the person who brought HTTPretty into the OpenStack requirements.

The initial reason for this was that keystoneclient was transitioning from the httplib library to requests and I needed to prove that there were no changes to the HTTP requests during the transition. HTTPretty mocks HTTP responses at the socket level, so it is not dependent on the HTTP library you use, and for this it was fairly successful.

As part of that transition I converted all the unit tests so that they actually traversed through to the requesting layer, and found a number of edge case bugs because the responses were being mocked out above this point. I have therefore advocated that the clients convert to mocking at the request layer rather than stubbing out returned values. I’m pretty sure that this doesn’t adhere strictly to the unit testing philosophy of testing small, isolated units, but our client libraries aren’t that deep and I’d honestly prefer to just test the whole way through and find those edge cases.

Doing this has made it remarkably easier to transition the clients to using sessions as well: because all the resource tests exercise the whole path down to making HTTP requests, we again have assurance that the HTTP requests being sent are equivalent.

At the same time we’ve had a number of problems with HTTPretty:

  • It was the last lingering requirement blocking Python 3 support. Thanks to Cyril Roelandt for finally getting that fixed.
  • For various reasons it is difficult for the distributions to package.
  • It has a bad habit of doing backwards incompatible, or simply broken releases. The current requirements string is: httpretty>=0.8.0,!=0.8.1,!=0.8.2,!=0.8.3
  • Because it acts at the socket layer it doesn’t always play nicely with other things using the socket. For example it has to be disabled for live memcache tests.
  • It pins its requirements on PyPI.

Now I feel like I’m just ranting. There are additional oddities I found in trying to fix these upstream but this is not about bashing HTTPretty.

requests-mock

requests-mock follows the same concept, allowing users to stub out responses to HTTP requests; however, it specifically targets the requests library rather than stubbing the socket. All the OpenStack clients have been converted to requests at this point and, for the general audience, if you are writing HTTP code in Python you should be using requests.

Note: a lot of what is used in these examples is only available since the 0.5 release. The current OpenStack requirements still have 0.4 so you’ll need to wait for some of the new syntax.

The intention of requests-mock is to work in as similar a way to requests itself as possible, so all the variable names and conventions are as close to those of a requests.Response as possible. For example:

>>> import requests
>>> import requests_mock
>>> url = 'http://www.google.com'
>>> with requests_mock.mock() as m:
...     m.get(url, text='Not really google', status_code=218)
...     r = requests.get(url)
...
>>> r.text
u'Not really google'
>>> r.status_code
218

So text in the mock equates to text in the response and similarly for status_code. Some more advanced usage of the requests library:

>>> with requests_mock.mock() as m:
...     m.get(url, json={'hello': 'world'}, headers={'test': 'header'})
...     r = requests.get(url)
...
>>> r.text
u'{"hello": "world"}'
>>> r.json()
{u'hello': u'world'}
>>> r.status_code
200
>>> r.headers
{'test': 'header'}
>>> r.headers['test']
'header'

You can also use callbacks to create responses dynamically:

>>> def _request_callback(request, context):
...     context.status_code = 201
...     context.headers['test'] = 'header'
...     return {'request': request.body}
...
>>> with requests_mock.mock() as m:
...     m.post(url, json=_request_callback)
...     r = requests.post(url, data='data')
...
>>> r.status_code
201
>>> r.headers
{'test': 'header'}
>>> r.json()
{u'request': u'data'}

Note that because the callback was passed as the json parameter, the return type is expected to be the same as if you had passed a predefined json=blob value. If you wanted to return text, the callback would go on the text parameter.

Cool tricks

Rather than give a lot of examples, I’ll just highlight some of the interesting things you can do with the library and how to do them.

  • Queue multiple responses for a URL; each element of the list is interpreted as the **kwargs for a response. In this case every request after the first will get a 401 error:
m.get(url, [{'json': _request_callback},
            {'text': 'Not Allowed', 'status_code': 401}])
  • See the history of requests:
m.request_history  # all requests
m.last_request  # the last request
m.call_count  # number of requests
m.called  # boolean, True if called
  • match on only the URL path:
m.get('/path/only')
  • match on any method:
m.request(requests_mock.ANY, url)
  • or match on any URL:
m.get(requests_mock.ANY)
  • match on headers that are part of the request (useful for distinguishing between multiple requests to the same URL):
m.get(url, request_headers={'X-Auth-Token': 'XXXXX'})
  • be used as a function decorator:
@requests_mock.mock()
def test_a_thing(m):
    m.get(requests_mock.ANY, text='resp')
    ...

Try it!

There is a lot more it can do; if you want to know more, check out the documentation.

As a final selling point, because it was built particularly around OpenStack’s needs, it is:

  • Easily integrated with the fixtures library (see the sketch after this list).
  • Hosted on stackforge and reviewed via Gerrit.
  • Continuously tested against at least keystoneclient and novaclient to prevent backwards incompatible changes.
  • Accepted as part of OpenStack requirements.
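
For example, a minimal sketch of the fixtures integration (assuming the requests_mock.contrib.fixture module path; check the documentation for your version):

import requests
import testtools

from requests_mock.contrib import fixture


class ExampleTestCase(testtools.TestCase):

    def setUp(self):
        super(ExampleTestCase, self).setUp()
        # useFixture takes care of starting and cleaning up the mock
        self.requests_mock = self.useFixture(fixture.Fixture())

    def test_fetch(self):
        self.requests_mock.get('http://www.google.com', text='resp')
        self.assertEqual('resp', requests.get('http://www.google.com').text)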

Patches and bug reports are welcome.

Git Commands for Messy People


I am terrible at keeping my git branches in order. Particularly since I work across multiple machines and forget where things are, I will often have multiple branches with different names that are different versions of the same review.

On a project I work on frequently I currently have 71 local branches which are a mix of my code, some code reviews, and some branches that were for trialling ideas. git review at least prefixes branches it downloads with review/ but that doesn’t help to figure out what was happening with local branches labelled auth through auth-4.

However, this post isn’t about fixing my terrible habit; it’s about two git commands that help me work with the mess.

The first is an alias which I called branch-date:

[alias]
    branch-date = "!git for-each-ref --sort=committerdate --format='%1B[32m%(committerdate:iso8601) %1B[34m%(committerdate:relative) %1B[0;m%(refname:short)' refs/heads/"

This gives a nicely formatted list of branches in the project sorted by the last time they were committed to and how long ago it was. So if I know I’m looking for a branch that I last worked on last week I can quickly locate those branches.

[Image: list of branches ordered by date]

The next is a script to figure out which of my branches have made it through review and have been merged upstream which I called branch-merged.

Using git you can already call git branch --merged master to determine which branches are fully merged into the master branch. However this won’t take into account whether a later version of a review was merged, in which case I can probably get rid of that branch.

We can figure this out by using the Change-Id: field of our Gerrit reviews.

So the script prints out the branches where all the Change-Ids are also in master. It’s not greatly efficient, and if you are working with code bases with long histories you might need to limit the depth, but given that it doesn’t run often it completes quickly enough.
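
A rough sketch of the idea in Python (illustrative only, not the exact script; it assumes master is the target branch):

import subprocess


def git(*args):
    """Run a git command and return its output lines."""
    return subprocess.check_output(('git',) + args).decode().splitlines()


def change_ids(rev_range):
    """Collect the Gerrit Change-Ids from commit messages in a range."""
    lines = git('log', '--format=%B', rev_range)
    return set(l.split()[1] for l in lines if l.startswith('Change-Id:'))


master_ids = change_ids('master')

for branch in git('for-each-ref', '--format=%(refname:short)', 'refs/heads/'):
    branch_ids = change_ids('master..%s' % branch)

    # a branch is "merged" if every Change-Id on it is also in master
    if branch_ids and branch_ids <= master_ids:
        print(branch)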

There’s no guarantee that there wasn’t something new in those branches, but most likely it was an earlier review or test code that is no longer relevant. I was considering a tool that could use the Change-Id to figure out from Gerrit whether a branch is an exact match to one that was previously up for review and so contains no possibly useful experimental code, but teaching myself to clean up branches as I go is probably a better use of my time.

Identity_uri in Auth Token Middleware


As part of the 0.8 release of keystoneclient (2014-04-17) we made an update to the way that you configure auth_token middleware in OpenStack.

Previously you specified the path to the keystone server as a number of individual parameters, such as:

[keystone_authtoken]
auth_protocol = http
auth_port = 35357
auth_host = 127.0.0.1
auth_admin_prefix =

This made sense in code when using httplib for communication, where each of those pieces is used independently. However, we removed httplib a number of releases ago and now simply reconstruct the full URL in code in the form:

%(auth_protocol)s://%(auth_host)s:%(auth_port)d/%(auth_admin_prefix)s

This format is much more intuitive for configuration and so should now be used with the key identity_uri, e.g.:

[keystone_authtoken]
identity_uri = http://127.0.0.1:35357

Using the original format will continue to work but you’ll see a deprecation message like:

WARNING keystoneclient.middleware.auth_token [-] Configuring admin URI using auth fragments. This is deprecated, use 'identity_uri' instead.

Client Session Objects


Keystoneclient has recently introduced a Session object. The concept was discussed at the Hong Kong Summit, where it was generally accepted that keystoneclient, as the root of authentication (and arguably security), should be responsible for transport (HTTP) and authentication across all the clients.

The majority of the functionality in this post is written and up for review but has not yet been merged. I write this in an attempt to show the direction of the clients, as there is currently a lot of talk around projects such as the OpenStack-SDK.

When working with clients you would first create an authentication object, then create a session object with that authentication and then re-use that session object across all the clients you instantiate.

from keystoneclient.auth.identity import v2
from keystoneclient import session
from keystoneclient.v2_0 import client

auth = v2.Password(auth_url='https://localhost:5000/v2.0',
                   username='user',
                   password='pass',
                   tenant_name='demo')

sess = session.Session(auth=auth,
                       verify='/path/to/ca.pem')

ksclient = client.Client(session=sess,
                         region_name='RegionOne')
# other clients can be created sharing the sess parameter

Now whenever you want to make an authenticated request you just indicate it as part of the request call:

# requests made with authenticated=True are sent with a token
users = sess.get('http://localhost:35357/v2.0/users',
                 authenticated=True)

This was pretty much the extent of the initial proposal; however, in working with the plugins I have come to realize that authentication is responsible for much more than simply getting a token.

A large part of the data in a keystone token is the service catalog. This is a listing of the services known to an OpenStack deployment and the URLs that we should use when accessing those services. Because of the disjointed way in which clients have been developed this service catalog is parsed by each client to determine the URL with which to make API calls.

With a session object in control of authentication and the service catalog, there is no reason for a client to know its own URL, just what it wants to communicate:

users = sess.get('/users',
                 authenticated=True,
                 service_type='identity',
                 endpoint_type='admin',
                 region_name='RegionOne')

The values of service_type and endpoint_type are well known and constant for a client; region_name is generally passed in at instantiation (if required). Requests made via the client object will have these parameters added automatically, so given the client from above the following call is exactly the same:

users = ksclient.get('/users')

Where I feel that this will really begin to help though is in dealing with the transition between API versions.

Currently, OpenStack deployments put a versioned endpoint in the service catalog, e.g. http://localhost:5000/v2.0 for identity. This made sense initially; however, as we now try to transition people to the v3 identity API we find that there is no backwards-compatible way to advertise both the v2 and v3 services. The agreed long-term solution is that entries in the service catalog should not be versioned, e.g. http://localhost:5000, as the root path of a service will list the available versions. So how do we handle this transition across the 8+ clients? Easy:

try:
    users = sess.get('/users',
                     authenticated=True,
                     service_type='identity',
                     endpoint_type='admin',
                     region_name='RegionOne',
                     version=(2, 0))  # just specify the version you need
except keystoneclient.exceptions.EndpointNotFound:
    logging.error('No v2 identity endpoint available', exc_info=True)

This solution also means that when we have a suitable hack for the transition to unversioned endpoints it need only be implemented in one place.

This relies on a means of discovering the available versions of all the OpenStack services. It turns out that in general the projects are similar enough in structure that this can be done with a few minor hacks, and for newer projects there is now a definitive specification on the wiki.

A major advantage of this common approach is we now have a standard way of determining whether a version of a project is available in this cloud. Therefore we get client version discovery pretty much for free:

if sess.is_available(service_type='identity',
                     version=(2,0)):
    ksclient = v2_0.client.Client(sess)
else:
    logging.error("Can't create a v2 identity client")

That’s a little verbose as a client knows that information, so we can extract a wrapper:

if v2_0.client.Client.is_available(sess):
    ksclient = v2_0.client.Client(sess)

or simply:

ksclient = keystoneclient.client.Client(session=sess,
                                        version=(2,0))
if ksclient:
    # do stuff

So the session object has evolved beyond a pure transport-level object, and this departure is somewhat concerning, as I don’t like mixing layers of responsibility. In practice, however, we have standardized on the requests library to abstract much of the transport away, and the Session object provides helpers around it.

So, along with standardizing transport, by using the session object like this we can:

  • reduce the basic client down to an object consisting of a few variables indicating the service type and version required.
  • finally get a common service discovery mechanism for all the clients.
  • shift the problem of API version migration onto someone else - probably me.

Disclaimers and Notes

  • The examples provided above use keystoneclient and the ‘identity’ service purely because this is what has been implemented so far. In terms of CRUD operations keystoneclient is essentially the same as any other client in that it retrieves its endpoint from the service catalog and issues requests to it, so the approach will work equally well.

  • Currently none of the other clients rely upon the session object; I have been waiting on the inclusion of authentication plugins and service discovery before making this push.

  • Region handling is still a little awkward when using the clients. I blame this completely on the fact that region handling is awkward on the servers. In Juno we should have hierarchical regions and then it may make sense to allow region_name to be set on a session rather than per client.

Dealing With .pyc


I have often found that when dealing with multiple branches and refactoring patches I get caught out by left-over *.pyc files from Python files that don’t exist on the current branch. This bit me again recently, so I went looking for options.

A useful environment variable that I found via some Stack Overflow questions is PYTHONDONTWRITEBYTECODE which, when set, prevents Python from writing .pyc and .pyo files. This is not something I want to set permanently on my machine, but it is great for development.

The other tool I use for all my python projects is virtualenvwrapper which allows you to isolate project dependencies and environments in what I think is a more intuitive way than with virtualenv directly.

Armed with the simple idea that these two concepts should be able to work together I found I was not the first person to think of this. There are other guides out there but the basic concept is simply to set PYTHONDONTWRITEBYTECODE when we activate a virtualenv and reset it when we deactivate it.

Easy.

Add to ~/.virtualenvs/postactivate:

export _PYTHONDONTWRITEBYTECODE=$PYTHONDONTWRITEBYTECODE
export PYTHONDONTWRITEBYTECODE=1

Add to ~/.virtualenvs/predeactivate:

export PYTHONDONTWRITEBYTECODE=$_PYTHONDONTWRITEBYTECODE
unset _PYTHONDONTWRITEBYTECODE

Keystone Token Binding


With the Havana release of OpenStack, Keystone gains the ability to issue and verify tokens “bound” to some authentication mechanism. To understand the reason for this feature we need to first consider the security model of the current token architecture.

OpenStack tokens are what we call “bearer tokens”. The term seems to have come out of the OAuth movement, but it means that whoever has the token has all the rights associated with that person. This is not an uncommon situation on the Internet; it is the way basic auth (username and password), cookies, and session ids all work, and one of the reasons that SSL is so important when authenticating against a website. If an attacker were to get your token they would have all the rights of that token for as long as it is valid, including permission to reissue a token or change your password. While all of these mechanisms are symmetric secrets, they are usually only shared between two endpoints. Keystone tokens are shared across all of the public services in an OpenStack deployment.

As OpenStack grows and this token is presented to an ever-increasing list of services, the vulnerability of this mechanism increases. So what can we do about it? The typical answer, particularly for the enterprise, is to use Kerberos or x509 client certificates. This is a great solution, but we don’t want each service dealing with different authentication mechanisms; that’s what Keystone is for.

What is a “bound token”?

A “bound token” is a regular keystone token with some additional information that indicates that the token may only be used in conjunction with the specified external authentication mechanism. Taking the example of Kerberos: when a token is issued, Keystone embeds the name of the Kerberos principal into the token. When this token is then presented to another service, the service notices the bind information and ensures that Kerberos authentication was used and that the same user is making the request.
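
For illustration, the bind information appears as an extra section inside the token data, along these lines (a sketch, not verbatim keystone output; the exact layout varies by token version):

"bind": {
    "kerberos": "alice@ACME.COM"
}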

So how does this help to protect against token hijacking? To give an example:

  1. Alice connects to Keystone using her Kerberos credentials and gets a token. Embedded within this token is her Kerberos principal name alice@ACME.COM.
  2. Alice authenticates to HaaS (hacked as a service) using her token and Kerberos credentials and is allowed to perform her operations.
  3. Bob, who has privileged access to HaaS, records the token that Alice presented to the service (or otherwise gets Alice’s token).
  4. Bob attempts to connect to Keystone as Alice to change her password. He connects to keystone with his own Kerberos credentials, bob@ACME.COM. Because these credentials do not match the ones that were present when the token was created, his access is disallowed.

This does not necessarily mean that the user initially authenticated with their Kerberos credentials; they may have used their regular username and password. It simply means that the user who created the token has said that they are also the owner of this Kerberos principal (note that the token is tied to the principal, not a ticket, so it will survive ticket re-issuing) and that the token should not be authenticated in future without it present.

What is implemented?

Currently tokens issued from Keystone can be bound to a Kerberos principal. Extending this mechanism to x509 client certificates should be a fairly simple exercise but will not be included in the Havana release.

A patch to handle bind checking in auth_token middleware is currently under review to bring checking to other services.

There are however a number of problems with enforcing bound tokens today:

  • Kerberos authentication is not supported by the eventlet http server (the server that drives most of the OpenStack web services), and so there is no way to authenticate to the server to provide the credentials. This essentially restricts bind checking to services running in httpd, which to the best of my knowledge is currently only keystone and swift.
  • None of the clients currently support connecting with Kerberos authentication. The option was added to Keystoneclient as a proof of concept but I am hoping that this can be solved across all clients by standardizing the way they communicate rather than having to add and maintain support in each individual client. There will also be the issue of how to configure the servers to use these clients correctly.
  • Kerberos tickets are issued to users, not hosts, and typically expire after a period of time. To allow unattended servers to have valid Kerberos credentials requires a way of automatically refreshing or fetching new tickets. I am told that there is support for this scenario coming in Fedora 20 but I am not sure what it will involve.

Configuring Token Binding

The new argument to enable token binding in keystone.conf is:

[token]

# External auth mechanisms that should add bind information to token.
# eg kerberos, x509
bind = kerberos

As mentioned, only the value kerberos is currently supported here. One of the next mechanisms to be supported will be x509 client certificates.

Token bind enforcement is enabled in keystone.conf with:

[token]
# Enforcement policy on tokens presented to keystone with bind information.
# One of disabled, permissive, strict, required or a specifically required bind
# mode e.g. kerberos or x509 to require binding to that authentication.
enforce_token_bind = permissive

As illustrated by the comments the possible values here are:

  • disabled: Disables token bind checking.
  • permissive: Token bind information will be verified if present. If there is bind information for a token and the server does not know how to verify that information then it will be ignored and the token will be allowed. This is the new default value and should have no effect on existing systems.
  • strict: Like permissive but if unknown bind information is present then the token will be rejected.
  • required: Tokens will only be allowed if bind information is present and verified.
  • A specific bind mode: the named form of bind information must be present and verified. The only currently available value here is kerberos, indicating that a token must be bound to a Kerberos principal to be accepted.

In Conclusion

For a deployment with access to a Kerberos or x509 infrastructure, token binding will dramatically increase your users’ security. Unfortunately the limitations of Kerberos within OpenStack don’t really make this a viable deployment option in Havana. Watch this space, however, as we add x509 authentication and binding, and improve Kerberos handling throughout.

Keystone With HTTPd in Devstack


Keystone has been slowly pushing away from being deployed with Eventlet and the keystone-all script in favour of the more traditional httpd mod_wsgi application method. There has been discussion of Eventlet’s place in OpenStack before, and its (mis)use has led to numerous subtle bugs and problems; however, in my opinion, the most important reasons for Keystone to move away from Eventlet are:

  • Eventlet does not support Kerberos authentication.
  • pyOpenSSL only releases the GIL around some SSL verification commands. This leads to a series of hacks to prevent long running crypto commands blocking Eventlet threads and thus the entire Keystone process.
  • There are already a lot of httpd authentication/authorization plugins that we could make use of in Keystone.
  • It’s faster to have things handled by httpd modules in C than in Python.

Keystone has shipped with sample WSGI scripts and httpd configuration files since Folsom, and documentation for how to use them is available; however, most guides and service wrappers (upstart, systemd, etc.) use the keystone-all method.

To get some wider adoption and understanding of the process I’ve just added Keystone with httpd support into devstack. Simply set:

APACHE_ENABLED_SERVICES=key

in your localrc or environment variables and re-run ./stack.sh to try it out.

P.S. Swift can also be deployed this way by adding swift to the (comma separated) services list.
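
For example, to deploy both this way (an assumed value based on the same variable):

APACHE_ENABLED_SERVICES=key,swift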

APIClient Communications


There has been interest recently in porting novaclient’s authentication plugin system to the rest of the OpenStack client libraries and moving the plugins into keystoneclient. At a similar time Alessio Ababilov started trying to introduce the concept of a common base client into keystoneclient. This is a fantastic idea and one that is well supported by the Keystone and Oslo teams, and I’m sure others. I’ve been referring to this move as APIClient, as that is the name of the folder in the Oslo code. At its core is a change in how clients communicate that will result in some significant changes to the base client objects and incorporate these plugins.

Keystone is interested in handling how communication is managed within OpenStack, not just for tokens: as we bring in client certificate and kerberos authentication, it will need to have influence over the requests being sent. After discussing the situation with Alessio he agreed to let me take his base work and start the process of getting these changes into keystoneclient, with the intent that this pattern be picked up by other OpenStack clients. This has unfortunately been a slower process than I would have liked, and I think it is hampered by a lack of clear understanding of what is trying to be achieved, which I hope to address with this post. What follows is in the vein of Alessio’s ideas and definitely a result of his work, but it is my own interpretation of the problem and the implementation has been rewritten from that initial work.

Most OpenStack clients have the concept of a HTTPClient which abstracts the basic communication with a server; however, projects differ in what this object is and how it is used. Novaclient creates an instance of a HTTPClient object which it saves as self.client (yet another candidate for what a client object is). Much of what the novaclient object then does in terms of setting and using authentication plugins is simply a wrapper around calls to the HTTPClient object. Managers (the part of the client responsible for a resource, e.g. user, project, etc.) are provided with a reference to the base client object (this time saved as api) and so make requests in the form self.api.client.get. Keystoneclient subclasses HTTPClient and its managers make calls in the form self.api.get. Other projects go either way depending on which client they used as a reference.

My guess here is that when keystoneclient was initially split out from novaclient the subclassing of HTTPClient was intentional, such that keystoneclient would provide an authenticated HTTPClient that novaclient would use. Keystoneclient however has its own managers and requirements and the projects have sufficiently diverged so that it no longer fits into this role. To this day novaclient does not use keystoneclient (in any way) and introduced authentication plugins instead.

If there is going to be a common communication framework then there must be a decision between:

  • Standardizing on a common base client class that is capable of handling communication (as keystoneclient does).
  • Creating a standalone communication object that clients make use of (as novaclient does).

The APIClient design goes for the latter. We create a communication object that can be used by any type of client and be reused by different instances of clients (which novaclient does not currently allow). This communication object is passed between clients, deprecating some of the ever-increasing list of parameters passed to clients, and changes the flow from authenticating a client to authenticating a channel that clients can make use of. This centralizes authentication and token fetching (including kerberos and client certs), catalog management and endpoint selection, and will let us address caching, HTTP session management etc. in the future.

In the initial APIClient this object was the new HTTPClient; however, this term was so abused that I am currently using ClientSession (as it is built on the requests library and is similar to the requests.Session concept), but debate continues.

This is where authentication plugins will live so that any communication through a ClientSession object can request a token added from the plugin. Maintaining the plugin architecture is preferred here to simply having multiple ClientSession subclasses to allow independent storing and caching of authentication, plugin discovery, and changing or renewing authentication.

So an example of the new workflow is:

from keystoneclient.auth.identity import v3_auth
from keystoneclient import session
from keystoneclient.v3 import client as v3_client
from novaclient.v1_1 import client

auth = v3_auth.Auth(username='username',
                    password='password',
                    project_id='project_id',
                    auth_url='https://keystone.example.com')
client_session = session.ClientSession(auth)

keystone_client = v3_client.Client(client_session)
nova_client = client.Client(client_session)

It is obviously a little longer than the current method but I’m sure that the old syntax can be maintained for when you only need a single client.

Implementations of this are starting to go into review on keystoneclient. For the time being some features from nova such as authentication plugins specifying CLI arguments are not being considered until we can ensure that the new system meets at least the current functionality.

The major problem found so far is maintaining API compatibility. Much of what is currently on keystoneclient that will be moved is defined publicly and cannot simply be thrown away, even though it is typically attributes and abstract functions that a user should have no need of.

Hopefully this or something very similar will be coming to the various OpenStack clients soon.

User Access to Libvirtd


To enable access to libvirtd without sudo:

  1. Create a group for privileged users (I called mine libvirt) and add your users to it.
  2. Create a new file in /etc/polkit-1/rules.d/; I called mine 50-libvirt-group.rules.
  3. Add the following function:
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("libvirt")) {
        return polkit.Result.YES;
    }
});
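
After logging back in so the new group membership takes effect, you should be able to talk to the system libvirt daemon without sudo, e.g.:

virsh -c qemu:///system list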

Cryptographic Message Syntax


CMS is the IETF’s standardized approach to cryptographically secure messages. It provides a BER-encoded, ASN.1-defined means of communicating the parameters and data of a message between recipients.

The most recent definition comes from RFC 5652, which does a good job of explaining how each operation works and what is required; it even provides some usage examples. What is missing is a simple rundown of the different types of messages that goes into more detail than Wikipedia without having you jump straight into the RFC.

Each data section should be thought of as a way of using a cryptographic function rather than as a complete message on its own. The correlations will become obvious: EncryptedData is simply a way of portraying symmetrically encrypted data, and AuthenticatedData is essentially a way of addressing and sending a MAC. As with normal crypto functions, you will often need a combination to provide the required security, and so CMS messages are designed to contain nested data sections. A common example is a SignedData wrapping an EnvelopedData to provide confidentiality and authenticity, or an EnvelopedData around a DigestedData for confidentiality and integrity.

I realize this is by no means a comprehensive rundown, but it should frame the situation and give an overview before you get to the RFC. The meat of the post is a list of each CMS data type and the most important parts of its ASN.1 definition, as this should allow you to figure out how each segment is used and the chain of data you will want in your message.

EncryptedData

The simplest CMS message just takes a symmetric key and encrypts some data. How to give that key to someone else is outside the scope of this message. It’s defined as:

EncryptedData ::= SEQUENCE {
  version CMSVersion,
  encryptedContentInfo EncryptedContentInfo,
  unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL }

DigestedData

Provides a digest along with the plaintext. Typically this is then wrapped by an EnvelopedData or similar to provide integrity to a message.

DigestedData ::= SEQUENCE {
  version CMSVersion,
  digestAlgorithm DigestAlgorithmIdentifier,
  encapContentInfo EncapsulatedContentInfo,
  digest Digest }

SignedData

SignedData messages allow any number of certificates to sign a payload. In the usual signing fashion, each signer creates a digest of the payload and encrypts it with their private key. The message contains the certificates of the signers and can contain a store of certificates and CRLs for cert path validation by the receiver.

SignerInfo ::= SEQUENCE {
  version CMSVersion,
  sid SignerIdentifier,
  digestAlgorithm DigestAlgorithmIdentifier,
  signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
  signatureAlgorithm SignatureAlgorithmIdentifier,
  signature SignatureValue,
  unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }


SignedData ::= SEQUENCE {
  version CMSVersion,
  digestAlgorithms DigestAlgorithmIdentifiers,
  encapContentInfo EncapsulatedContentInfo,
  certificates [0] IMPLICIT CertificateSet OPTIONAL,
  crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
  signerInfos SignerInfos }
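
A minimal sketch of the SignedData flow using the Python cryptography package (illustrative only; it shows the cryptographic steps, not the CMS encoding):

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

signer_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
payload = b'the payload'

# each signer signs a digest of the payload with their private key
signature = signer_key.sign(payload, padding.PKCS1v15(), hashes.SHA256())

# the receiver verifies using the signer's certificate (public key);
# verify() raises InvalidSignature if the payload was tampered with
signer_key.public_key().verify(signature, payload,
                               padding.PKCS1v15(), hashes.SHA256())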

EnvelopedData

EnvelopedData allows you to address encrypted data to any number of specific recipients. The most common recipients are a certificate (or rather the holder of its private key), a symmetric key, or a password. The payload is encrypted with a symmetric key and then this key is encrypted with the key provided for each recipient (or via a PBKDF for passwords). So on decoding, the recipient will decrypt the symmetric key from the RecipientInfo associated with them and then use that to decrypt the plaintext message.

EnvelopedData ::= SEQUENCE {
  version CMSVersion,
  originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
  recipientInfos RecipientInfos,
  encryptedContentInfo EncryptedContentInfo,
  unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL }

RecipientInfo ::= CHOICE {
  ktri KeyTransRecipientInfo,
  kari [1] KeyAgreeRecipientInfo,
  kekri [2] KEKRecipientInfo,
  pwri [3] PasswordRecipientinfo,
  ori [4] OtherRecipientInfo }
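
To make the key-wrapping flow concrete, here is a minimal sketch of the same idea using raw primitives from the Python cryptography package (illustrative only; it follows the KeyTransRecipientInfo-style flow and does not produce a real BER-encoded CMS structure):

import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# the recipient's key pair (normally taken from their certificate)
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(),
                    label=None)

# 1. encrypt the payload with a fresh symmetric content-encryption key
cek = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(cek).encrypt(nonce, b'plaintext message', None)

# 2. wrap the content-encryption key for the recipient
wrapped_cek = recipient_key.public_key().encrypt(cek, oaep)

# 3. the recipient unwraps the key and decrypts the content
cek2 = recipient_key.decrypt(wrapped_cek, oaep)
assert AESGCM(cek2).decrypt(nonce, ciphertext, None) == b'plaintext message'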

AuthenticatedData

AuthenticatedData is the one that always trips me up: it allows you to send a message that is only verifiable by a number of specific recipients. It generates a new MAC secret key and with it generates a MAC digest for the plaintext. It then encrypts the MAC secret key to any number of recipients, similar to EnvelopedData, and includes the MAC in the message. The message itself is not encrypted and can be retrieved, but you cannot verify the message’s integrity without being one of the explicit recipients.

AuthenticatedData ::= SEQUENCE {
  version CMSVersion,
  originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
  recipientInfos RecipientInfos,
  macAlgorithm MessageAuthenticationCodeAlgorithm,
  digestAlgorithm [1] DigestAlgorithmIdentifier OPTIONAL,
  encapContentInfo EncapsulatedContentInfo,
  authAttrs [2] IMPLICIT AuthAttributes OPTIONAL,
  mac MessageAuthenticationCode,
  unauthAttrs [3] IMPLICIT UnauthAttributes OPTIONAL }