New AutoSpotting Version

Hi there,

Just in case you haven’t been following the AutoSpotting project, a few weeks ago we released a new version that improves it a lot, and we recommend you give it a try. Below are a few highlights of the latest version and some general news about the project.

The spot bidding engine was heavily refactored, using less memory and being much more scalable on large installations

Previously, AutoSpotting launched spot instances using the traditional ec2.RequestSpotInstances API call. This API works pretty well, but it has some limitations that had to be handled by repeatedly polling for the newly launched instances, which caused dozens of API calls for each new instance. The Lambda function could also run out of memory and become unreliable immediately after being enabled for the first time on AWS accounts with lots of groups, due to API throttling. There were complaints from people running AutoSpotting on more than 100 groups and 500 instances in a single region, where initial runs failed to handle the newly launched instances and left them lying around unattached to their groups. Some people even resorted to writing their own AutoSpotting clones to better address this situation, which was an unfortunate waste of effort.

This problematic API call was recently replaced with the ordinary ec2.RunInstances API call, the same one used for launching on-demand instances. It was enhanced in December last year to also launch spot instances, and it addresses pretty much all the limitations of the previous ec2.RequestSpotInstances API call.

This offloads a lot of logic we previously needed to do ourselves, and allowed us to delete a significant amount of polling code. The code base has been significantly cleaned up and reduced in size by about a fifth, and in the process we also addressed some other issues you can see below, and slightly increased the test coverage.
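To illustrate what a spot launch through RunInstances looks like, here is a minimal Python sketch that builds the request parameters in the shape boto3's `run_instances` accepts. It is not AutoSpotting's own Go code: the helper name, the one-time request type and the price formatting are assumptions made for the example.

```python
def build_spot_run_instances_params(image_id, instance_type, max_price):
    """Build keyword arguments for an EC2 RunInstances call that asks for
    a single spot instance instead of an on-demand one.

    Hypothetical helper for illustration, not part of AutoSpotting.
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # This block is what turns an ordinary launch into a spot launch.
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                # The API expects the maximum hourly price as a string.
                "MaxPrice": f"{max_price:.4f}",
                # Assumption: a one-time request, so nothing is left
                # pending if it cannot be fulfilled.
                "SpotInstanceType": "one-time",
            },
        },
    }
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("ec2").run_instances(**params)`. The call either returns the launched instance or raises an error, so no polling for a separate spot request is needed.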


This work was sponsored by HERE Technologies, who recently rolled out AutoSpotting on more than 400 AWS accounts and needed it in order to properly handle their largest accounts. It was successfully tested on one of their largest AWS accounts, where we didn’t notice any problems, and the memory usage was much lower than before. Extrapolating from this test, we estimate that the current default configuration should safely handle more than 1000 node replacements per AWS account, and could potentially be extended to about 5000 by increasing the Lambda function’s memory allocation. We don’t have access to such a large setup, but if you do, please try it out and let us know what you see. We’d like to hear from you.

Better handling of out of capacity situations

As of December last year, AWS also fundamentally changed the way the spot market works by making the prices stable over time, and at the same time decoupling the bid price from the launch and termination of spot instances.

This change is largely beneficial, but it unfortunately had some unforeseen implications for AutoSpotting, because a spot launch can now often fail even if the bid price is much higher than the market price. This could cause AutoSpotting to launch multiple instances when the spot capacity couldn’t be fulfilled within a single Lambda function run. These instances were never tagged, so in some circumstances they would not be set up but remained running outside the group.

This has been addressed out of the box by the ec2.RunInstances API call, which automatically cancels the spot instance requests it creates on our behalf if they can’t be fulfilled immediately. AutoSpotting will simply fail after a timeout of a few seconds, and will retry launching an instance in subsequent runs until it succeeds.
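The fail-fast, retry-on-the-next-run behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `CapacityUnavailable`, `launch_spot` and `attach_to_group` are hypothetical names, not AutoSpotting's actual Go functions or AWS error types.

```python
class CapacityUnavailable(Exception):
    """Hypothetical error raised when a spot launch cannot be fulfilled."""


def replace_one_instance(launch_spot, attach_to_group):
    """Attempt a single spot launch during one scheduled run.

    Returns True if a spot instance was launched and attached to the
    group, or False if the launch failed and is left for the next run.
    """
    try:
        instance_id = launch_spot()
    except CapacityUnavailable:
        # The failed request was already cancelled on the API side, so
        # there is no pending request or stray instance to clean up.
        return False
    attach_to_group(instance_id)
    return True
```

Because a failed attempt leaves nothing behind, each run can stay stateless: the next scheduled run simply tries again.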

Better handling of VPC, DefaultVPC and EC2 Classic security groups

The previous API for launching spot instances had some issues with the way security groups have to be configured on newly launched spot instances. It sometimes needed groups to be given by ID and other times by name, so we needed some ugly workarounds in order to support VPC, DefaultVPC and EC2 Classic at the same time. Even with those, it failed to handle some edge cases, such as DefaultVPC environments created by CloudFormation code designed for EC2 Classic, as is the case with ElasticBeanstalk environments running in the Default VPC.

This has been addressed out of the box by the ec2.RunInstances API call, which can always accept security groups given by ID, so we could clean up all those workarounds. The code was tested and seemed to work reliably on all these flavors of EC2.

Support running in opt-out mode

By default AutoSpotting runs in opt-in mode, only taking over the groups tagged with a certain tag (by default “spot-enabled=true”) while ignoring all the other groups. This allows people to test it properly and gradually extend the rollout to their groups as they gain confidence in it. But eventually they may be confident enough to enable it by default on all their groups, regardless of whether this tag is set. This feature can also be enforced by companies running lots of AWS accounts, who can use it by default on all their development accounts, where the risk of suddenly enabling it is relatively small but the benefit is substantial enough to be worth it.

This can be enabled by using the latest CloudFormation or Terraform infrastructure code, where it is exposed as an additional parameter. The default tag is “spot-enabled=true”, but this is configurable.

This work was also sponsored by HERE Technologies, who plans to eventually enable it on all their 200+ development-only AWS accounts, potentially generating savings in the millions of dollars monthly.
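The difference between the two modes can be sketched in a few lines of Python. This is an illustration, not AutoSpotting's actual Go implementation: the function name and data layout are hypothetical, and the sketch assumes that in opt-out mode the configured tag (for example “spot-enabled=false”) marks the groups to skip.

```python
def select_groups(groups, mode="opt-in", tag_key="spot-enabled", tag_value="true"):
    """Return the names of the groups that should be handled.

    groups maps a group name to a dict of its tags.
    Hypothetical helper for illustration only.
    """
    if mode not in ("opt-in", "opt-out"):
        raise ValueError(f"unknown mode: {mode!r}")
    selected = []
    for name, tags in groups.items():
        matches = tags.get(tag_key) == tag_value
        # opt-in: keep only groups carrying the tag.
        # opt-out: keep every group except those carrying the tag.
        if matches == (mode == "opt-in"):
            selected.append(name)
    return selected
```

For example, with groups `web` (tagged “spot-enabled=true”), `db` (tagged “spot-enabled=false”) and `batch` (untagged), the default opt-in call selects only `web`, while `select_groups(groups, mode="opt-out", tag_value="false")` selects `web` and `batch`.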

Tagging improvements

AutoSpotting now tags all the spot instances it launches with “launched-by-autospotting=true”, and it can also tag its own Lambda function with a tag that is configurable at install time.

This work was also sponsored by HERE Technologies, who plans to use this to measure the overall runtime cost and the savings generated by AutoSpotting across their entire fleet.

Instance type expansion and price updates

AutoSpotting now has support for all the recently launched instance types, such as the C5D and M5D, including the automated compatibility checks for their storage volumes. The pricing information is now also up-to-date so we can use it to set more accurate bid prices.

Instance scale-in protection support

AutoSpotting now takes into account the scale-in protection flag that can be set on AutoScaling group members. Support for instance termination protection is still a work in progress and is expected to be released within the next few weeks.

This work was contributed by Jam ‘codejamninja’ Risser.
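As a rough illustration of a scale-in protection check, the Python sketch below filters group members using the `ProtectedFromScaleIn` field that the Auto Scaling API reports for each instance. The helper name is hypothetical, and the exact behavior is an assumption: here, protected instances are simply left out of the replacement candidates.

```python
def replaceable_instances(group_instances):
    """Return the IDs of group members that are not protected from scale-in.

    group_instances is a list of dicts shaped like the "Instances" entries
    returned by the Auto Scaling DescribeAutoScalingGroups API.
    Assumption: protected instances are never considered for replacement.
    """
    return [
        inst["InstanceId"]
        for inst in group_instances
        if not inst.get("ProtectedFromScaleIn", False)
    ]
```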

Fix compilation on macOS

AutoSpotting can now be built on Mac, without using Docker or VMs.

This work was also sponsored by HERE Technologies.

Smaller binaries

The binaries are now stripped of debugging information, which reduces their size by about 20%.

This work was also sponsored by HERE Technologies.

Terraform module in the Terraform Registry

Thanks to a contribution by Neill Turner, our slightly modified Terraform module is now published to the Terraform Registry, which makes installation much easier for Terraform users. It is now as easy as:

module "autospotting" {
  source  = "cristim/autospotting/aws"
  version = "0.0.9"
}

HERE Technologies now supports development of AutoSpotting

It is no coincidence that you noticed so many contributions sponsored by HERE Technologies. I have been a full-time employee there for quite some time, and a couple of months ago they started allowing me to work on AutoSpotting for 20% of my working time in order to roll it out widely on their large fleet of AWS accounts. They also employ a couple of other occasional contributors, Artem Nikitin and Johannes ‘lenucksi’ Tigges.

Huge thanks to HERE for their support and the trust they put in this project.

New Patreon members

We recently got a few new Patreon members. Huge thanks to Golo Roden and Sumit Sarkar, who both pledged to donate $40 to the project each month in order to get access to the official binaries, while our first backer, Ivan Kalinin, keeps the $5 pledge he started a few months ago.

If you really like this project and/or your company benefits significantly from it, please consider joining them. This makes a huge difference to its further development. You also receive better support and pre-built binaries that can be easily rolled out.

What’s next?

We’ll continue polishing AutoSpotting as part of the rollout at HERE Technologies, addressing various issues reported by their development teams. The first on the list so far is support for instance termination protection, which will hopefully land within a few more weeks.

Also, Artem has made good progress on his XRay support patch, which will enable us to instrument the runtime performance of AutoSpotting and see where the bottlenecks are on really large installations, so we can improve it further.

That’s all folks

Thanks for reading this far. I hope I convinced you to give the latest version of AutoSpotting a try.

If you have any questions or comments, please get in touch. I’ll personally answer all of you.

-Cristian
