Python: Demystifying AWS’ Boto3

As the GitHub page says, “Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.”

The good news is that Boto 3 is extremely well documented. The bad news is that the documentation is quite difficult to follow. It starts with a Quickstart guide, followed by a Sample Tutorial and then Code Examples. This is all good stuff, though it doesn’t give you much of an understanding of how to actually use Boto 3. For example, the early examples create clients and resources, but we haven’t yet learned what a client or a resource is, nor do we see sessions mentioned until much later in the documentation. But I digress. Let’s go ahead and get started!

10,000 foot view

The What’s New page gives you a great, high-level view of the parts which make up Boto 3.

Boto 3 consists of the following major features:

  • Resources: a high level, object oriented interface
  • Collections: a tool to iterate and manipulate groups of resources
  • Clients: low level service connections
  • Paginators: automatic paging of responses
  • Waiters: a way to block until a certain state has been reached

Along with these major features, Boto 3 also provides sessions and per-session credentials & configuration, as well as basic components like authentication, parameter & response handling, an event system for customizations and logic to retry failed requests.

Note: You can have a read of the Resources page for more information, though I’ll expand on the above later on in this post too.
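To make the last two features in the list a little more concrete, here is a minimal sketch of a paginator and a waiter. It assumes an EC2 client is passed in, and the helper names (list_instance_ids and wait_for_running) are my own and purely illustrative.

```python
def list_instance_ids(ec2_client):
    """Collect every instance ID, letting the paginator fetch each page."""
    paginator = ec2_client.get_paginator("describe_instances")
    instance_ids = []
    for page in paginator.paginate():
        # Each page is a plain dict shaped like a describe_instances response.
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                instance_ids.append(instance["InstanceId"])
    return instance_ids


def wait_for_running(ec2_client, instance_ids):
    """Block until the given instances reach the 'running' state."""
    waiter = ec2_client.get_waiter("instance_running")
    waiter.wait(InstanceIds=instance_ids)
```

With real credentials, ec2_client would come from something like boto3.client('ec2', region_name='us-east-1').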

Botocore

Boto 3 is built atop a library called Botocore, which is shared by the AWS CLI. Botocore provides the low level clients, session, and credential & configuration data. Boto 3 builds on top of Botocore by providing its own session, resources and collections.

The Migration page, while aimed at users who are going from Boto 2.X to Boto 3, is actually quite useful for newcomers too, as per the following points:

  • Modules are typically split into two categories, those which include a high-level object-oriented interface and those which include only a low-level interface which matches the underlying Amazon Web Services API.
    • Note: These interfaces are resources and clients respectively. We’ll take a look at both of them shortly.
  • Some modules are completely high-level (like Amazon S3 or EC2), some include high-level code on top of a low-level connection (like Amazon DynamoDB), and others are 100% low-level (like Amazon Elastic Transcoder).

Credentials

Almost all of the examples I’ve come across in the documentation create a client or resource by specifying an AWS service and nothing else. However, when you try it yourself, you’ll likely get quite a lot of error output.

The good news is that the output is easy to decipher. In fact, the last line of the output gives us a pretty big hint: NoRegionError: You must specify a region.

After some searching I came across the Session Reference page, and it was exactly what I was looking for. It tells us the parameters which can be specified for sessions (which can be used to create clients and resources), as well as for clients and resources that are created directly.

I then found the Configuring Credentials section of the documentation. With these two pieces of information, I was able to put together working solutions for sessions, as well as for clients and resources.
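As a rough sketch of what such a setup can look like, the snippet below passes a region and a key pair to a session, a client and a resource. The region, the key values and the function names are placeholders of my own, so substitute your own details. boto3 is imported inside each function only so that the sketch can be read and loaded without the library installed.

```python
# Placeholder settings: every value here is illustrative, not real.
CONFIG = {
    "region_name": "us-east-1",
    "aws_access_key_id": "YOUR_KEY_ID",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
}


def make_session():
    """A session stores the region and credentials for whatever it creates."""
    import boto3  # third-party: pip install boto3

    return boto3.Session(**CONFIG)


def make_client():
    """A low-level EC2 client created directly, without an explicit session."""
    import boto3

    return boto3.client("ec2", **CONFIG)


def make_resource():
    """A high-level EC2 resource created directly, without an explicit session."""
    import boto3

    return boto3.resource("ec2", **CONFIG)
```

Supplying region_name is what avoids the NoRegionError mentioned earlier. A session built by make_session() can then hand out its own clients and resources via session.client('ec2') and session.resource('ec2').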

Note: As you can see, the code extracts above specify the Key ID and Secret Key inside the script itself. This was done to help newcomers get up and running with minimal fuss. However, it is not advisable to use this method permanently nor in a production environment. Credentials should never be stored in a location where they may accidentally fall into the wrong hands (e.g. a script which is being tracked by git).
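One way to keep keys out of the script is to let Boto 3 find them itself. When no keys are passed in, it looks in places such as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and the shared credentials file (~/.aws/credentials). Here is a minimal sketch using a named profile from that file. The function name and the default profile and region values are my own assumptions.

```python
def session_from_profile(profile_name="default", region_name="us-east-1"):
    """Build a session from a named profile instead of hard-coded keys."""
    import boto3  # third-party: pip install boto3

    # No access key or secret key appears in the script: the session reads
    # them from the chosen profile in the shared credentials file.
    return boto3.Session(profile_name=profile_name, region_name=region_name)
```

Note that Boto 3 will raise an error if the named profile does not exist, so the profile needs to be set up first (for example with the AWS CLI’s aws configure command).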

Sessions

The Sessions documentation is a little cryptic with regard to when you should use them, saying: “It is possible and recommended to maintain your own session(s) in some scenarios.” Unfortunately, it doesn’t go on to explain which scenarios.

Having said that, heading back over to the Session Reference page we see that “A session stores configuration state and allows you to create service clients and resources.” That gives us a pretty good idea of what they’re used for.
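To illustrate that quote, here is a small sketch in which each session stores a different region, and the client and resource created from a session inherit that region. The function names are hypothetical and the choice of EC2 is arbitrary.

```python
def ec2_handles(session):
    """Create a client and a resource that share the session's configuration."""
    return session.client("ec2"), session.resource("ec2")


def handles_per_region(regions):
    """Build one session per region and return its EC2 client and resource."""
    import boto3  # third-party: pip install boto3

    handles = {}
    for region in regions:
        # The region is stored once, on the session, not on each client.
        session = boto3.Session(region_name=region)
        handles[region] = ec2_handles(session)
    return handles
```

Calling handles_per_region(['us-east-1', 'eu-west-1']) would give back a (client, resource) pair for each region, which is handy when a single script needs to work across more than one region.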

If you’d like additional input on Sessions, see this StackOverflow post.

Services documentation

The Services documentation provides information on how to use Clients and Resources (where available) for each of AWS’ services. It’s important to note that while each service has its own page of documentation, the Client and Resource sections of these documents are not clearly defined.

For example, heading over to the EC2 documentation we see the following:

  • Client
  • Paginators
  • Waiters
  • Service Resource
  • ClassicAddress
  • DhcpOptions
  • Image
  • Instance
  • <omitted>

Because of the flat hierarchy of Boto 3’s Table of Contents, newcomers might not realise that only the Client, Paginators and Waiters sections pertain to Client configuration, while all other sections pertain to Resources.

Clients

The Low-level Clients page tells us that:

  • Clients support all services.
  • The outputs are returned using Python dictionaries.
  • We have to traverse these dictionaries ourselves.
  • We need to keep in mind that “responses may not always include all of the expected data.”

Let’s take a closer look at the second and third points. But first, let’s use the EC2 Client (as well as a Session) to create an instance.
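A minimal sketch of this step is shown below. It assumes a session has already been created, and the function name, the AMI ID argument and the t2.micro default are placeholders of my own. Be aware that running it against a real account launches a real (billable) instance.

```python
def launch_instance_with_client(session, image_id, instance_type="t2.micro"):
    """Launch a single EC2 instance and return the raw response dictionary."""
    ec2client = session.client("ec2")
    # MinCount and MaxCount are both required; 1 and 1 asks for one instance.
    client_instance = ec2client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return client_instance
```

Usage would look something like client_instance = launch_instance_with_client(session, 'ami-xxxxxxxx'), where the AMI ID is one that exists in your chosen region.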

The run_instances documentation gives us a large amount of useful information. For example:

  • Request Syntax: How we can customise our instance.
  • Response Syntax: What output we can expect to see.
  • Return type: The format in which the output will be provided. (Recall that Clients will always have a return type of dict. Resources on the other hand will have varying return types.)

Let’s now check the contents of client_instance. As we suspected, it is a dictionary.

Now let’s cover off what the documentation means when it says we have to traverse the output ourselves. Say we want to obtain the InstanceId from the client_instance dictionary. We would need to traverse it to the point where the instance ID resides.
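The traversal can be sketched as follows. The dictionary here is a heavily trimmed stand-in for a real run_instances response, and all of the IDs in it are made up.

```python
# A trimmed stand-in for a run_instances response; a real one has many more keys.
client_instance = {
    "Instances": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "ImageId": "ami-0123456789abcdef0",
            "InstanceType": "t2.micro",
            "State": {"Code": 0, "Name": "pending"},
        }
    ],
    "OwnerId": "123456789012",
    "ReservationId": "r-0123456789abcdef0",
}

print(type(client_instance).__name__)  # dict

# Step through the structure: top-level key, first list item, then the field.
instance_id = client_instance["Instances"][0]["InstanceId"]
print(instance_id)  # i-0123456789abcdef0

# Responses may not include every key, so .get() with a default avoids a KeyError.
public_ip = client_instance["Instances"][0].get("PublicIpAddress", "not assigned yet")
print(public_ip)  # not assigned yet
```

The .get() call at the end is one way of coping with the warning that “responses may not always include all of the expected data.”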

For more information on dictionary traversal, please see the Understanding Ansible Output Structure post.

Note: There are other ways to access the InstanceId, such as using describe_instance_status(). However, this does not negate the need to traverse the dictionary. Needless to say, it doesn’t feel all that Pythonic!
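Here is a rough sketch of that alternative, assuming an EC2 client is passed in. The function name is my own. The response is still a dictionary, this time with the instances listed under the InstanceStatuses key.

```python
def instance_ids_from_status(ec2client, instance_ids):
    """Return the instance IDs reported by describe_instance_status."""
    response = ec2client.describe_instance_status(
        InstanceIds=instance_ids,
        # Without this flag only running instances are reported, so a
        # freshly launched (pending) instance would be missing.
        IncludeAllInstances=True,
    )
    # Still a dictionary, so we still have to dig the IDs out ourselves.
    return [status["InstanceId"] for status in response.get("InstanceStatuses", [])]
```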

Resources

The Resources documentation tells us that “Resources represent an object-oriented interface to Amazon Web Services (AWS). They provide a higher-level abstraction than the raw, low-level calls made by service clients.”

In other words, Resources (where available) can do the same thing as Clients but produce outputs which are a lot easier to consume, once we get the hang of it.

The documentation then goes on to tell us that “These can conceptually be split up into identifiers, attributes, actions, references, sub-resources, and collections”. Let’s go ahead and look into what these are.

Identifiers, attributes, references, actions & collections

Where Clients are fairly easy to wrap your head around, there’s quite a lot more to Resources. Thankfully AWS’ documentation gives us a great starting point. For each item below, the first point is AWS’ take on things, while the second is my comment:

  • Identifiers
    • Properties of a resource that are set upon instantiation of the resource.
    • When a resource is created, it is given an ID. This ID can then be used in subsequent Attribute and Action calls.
  • Attributes
    • Provide access to the properties of a resource. Attributes are lazy-loaded the first time one is accessed via the load() method.
    • Used in conjunction with a resource’s ID, Attributes provide information about the specified resource. For example, the image_id attribute of an EC2 Instance resource created from an instance’s ID shows us the AMI that the instance is running.
  • References
    • Related resource instances that have a belongs-to relationship.
    • While attributes require a resource’s ID, References are accessed on the Python object that represents the instance and therefore do not need the ID to be provided again.
  • Actions
    • Call operations on resources. They may automatically handle the passing in of arguments set from identifiers and some attributes.
    • Make something happen. For example, shut down an instance.
  • Collections
    • Provide an interface to iterate over and manipulate groups of resources.
    • Outputs which need to be iterated over, for example the IP addresses in a VPC or the volumes attached to an instance.
  • Waiters
    • Provide an interface to wait for a resource to reach a specific state.
    • For more information, see the Resources page.
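The concepts above can be sketched with an EC2 Instance resource as follows. This is only an illustration under a few assumptions: an EC2 resource object is passed in, the instance lives in a VPC (otherwise its vpc reference has nothing to point at), and the function names are my own.

```python
def describe_instance(ec2resource, instance_id):
    """Touch an identifier, an attribute, a reference and a collection."""
    # The identifier is supplied when the resource object is instantiated.
    instance = ec2resource.Instance(instance_id)
    return {
        "id": instance.id,              # identifier
        "image_id": instance.image_id,  # attribute (lazy-loaded on first access)
        "vpc_id": instance.vpc.id,      # reference to a related Vpc resource
        # collection: iterate over the volumes attached to the instance
        "volume_ids": [volume.id for volume in instance.volumes.all()],
    }


def stop_and_wait(ec2resource, instance_id):
    """Use an action to stop the instance, then a waiter to block until done."""
    instance = ec2resource.Instance(instance_id)
    instance.stop()                # action: makes something happen
    instance.wait_until_stopped()  # waiter: blocks until the state is reached
    return instance
```

Notice that the ID is only supplied once, when the Instance object is created. Everything after that hangs off the object itself.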

EC2 instance using a Resource

Let’s now go ahead and use a session to create an EC2 Resource:
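A minimal sketch of this step, assuming a session already exists. As before, the function name, the AMI ID argument and the t2.micro default are placeholders of my own, and running it for real launches a billable instance.

```python
def launch_instance_with_resource(session, image_id, instance_type="t2.micro"):
    """Launch a single EC2 instance and return the list of Instance resources."""
    ec2resource = session.resource("ec2")
    # create_instances takes the same core parameters as the client's
    # run_instances, but returns a list of Instance objects, not a dict.
    resource_instance = ec2resource.create_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return resource_instance
```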

Those with a keen eye for detail may have noticed that the Client and Resource configuration is almost identical. The only differences are the following two bits:

  • session.resource as opposed to session.client
  • ec2resource.create_instances as opposed to ec2client.run_instances

However, that’s where the similarities end. For example, when we print resource_instance we see a fraction of the information we saw when we did the same with client_instance.

Wow, what a difference. The Client output a large amount of detail about the instance, whereas the Resource only shows us the instance’s ID. As a result of this, we’ll need to extract the ID to obtain more information on the instance itself.

To extract this ID we use a similar technique to the one we used when extracting the ID from client_instance. Though while in that case we traversed a dictionary, this time we’re navigating an OOP instance.
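Here is a small sketch of that navigation. It assumes resource_instance is the list returned when the instance was created with the Resource, and the function name is hypothetical.

```python
def first_instance_details(resource_instance):
    """Return the ID, AMI and type of the first Instance in the list."""
    # The Resource hands back a list, so we index into it first...
    instance = resource_instance[0]
    # ...and then read plain Python attributes rather than dictionary keys.
    return instance.id, instance.image_id, instance.instance_type
```

Compare instance.id with the client_instance['Instances'][0]['InstanceId'] style of lookup needed for the Client: the list index is still there, but the rest is ordinary attribute access.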

Congratulations!

If you’ve made it to the end of this post, you most certainly deserve a pat on the back. I didn’t intend for this to be such a lengthy post, but once I got started I couldn’t stop :)

I hope you enjoyed it. Please feel free to drop me a message if you’d like me to expand on any part or parts of Boto 3 and I’ll be happy to oblige.

Knowledge Base

See the Knowledge Base for more information.

As always, if you have any questions or have a topic that you would like me to discuss, please feel free to post a comment at the bottom of this blog entry, e-mail at will@oznetnerd.com, or drop me a message on Twitter (@OzNetNerd).

Note: This website is my personal blog. The opinions expressed in this blog are my own and not those of my employer.

One thought on “Python: Demystifying AWS’ Boto3”

  1. You want to use session objects if you are using multiple credentials, regions or accounts in a script. The default session object is tied to your default profile.
