As of July 25th, there are 135 open site reliability engineer roles on Built In New York.
We expect that number to grow very quickly in the near future.
SRE is not a new concept, but the cementation of DevOps in the tech industry has led to a surge of interest in it. It’s a natural evolution, with companies moving to incorporate both the practices that enable them to quickly develop and release software and services and the people who ensure said software and services actually stay up, are safe and function properly.
Having a dedicated team responsible for these functions both ensures a better user experience and frees up teams that used to take on these tasks to focus on their core functions. The benefits of this combination are plain to see, at least according to Greg Behrend, director of technical operations at Flex.
“As we look to scale and grow more — like any organization — we can simultaneously focus on innovation and a positive customer experience,” Behrend said. “A team of SREs using a healthy DevOps philosophy allows us to do that.”
Making the shift from DevOps to SRE is not something every company will do or has to do, nor is it a move that can be made overnight. Jason Meredith, lead DevOps engineer at Caesars Digital, said that while his team is currently in the DevOps camp, he does envision its SRE department growing in the near future. That said, there are a few considerations Meredith is taking into account before going all in on SRE.
What are the key differences between DevOps and site reliability engineering?
It was only just 15-ish years ago that DevOps didn’t exist. There were UNIX admins — or some other sort — architects and network engineers. Then it got a bit simpler as things improved, but you had to memorize more tools. That’s when DevOps started happening, at least that’s when it did for me. It coincided with the rise in popularity of CFEngine, one of the first configuration management tools released. As time went on, more continued to pop up, and as niches needed to be filled, CI/CD grew in popularity. Increased monitoring, logging and alerting platforms came, and then people realized attention needed to be paid to the app’s stability, alerting process and uptime lifecycle. This launched SRE. Some companies now have CI/CD engineers just to handle job automation.
Right now, we are very much DevOps. We want everyone to own and understand as much as possible. Not only do I believe it helps us figure out the root of the problem, but it keeps everyone happy and adds to their resume and skill set. We do have an SRM, or site reliability manager, therefore I definitely see us growing our SRE team in the very near future.
What prompted your team to shift from DevOps to SRE?
When things are simple, it’s easy for DevOps to understand the full picture and provide the support an SRE would. As we grow, things get more complex, making it difficult to master the necessary skills, tools and current best practices. You need a team whose main purpose is to help maintain stability in the platform and has the capability to fix issues, alerts and bugs as quickly and simply as possible. Next, you need to have a secondary focus on how the architecture can either fix itself or ensure the issue doesn’t happen again.
It’s important to avoid DevOps being pulled in multiple directions. Having other team leads wait for you to complete changes they have could delay sprints and requirement gathering. The problems of today tend to supersede the potential problems of tomorrow.
What impact has this shift had on your engineering organization, the tech you build or the business as a whole?
As we continue to sort this out, the complexity of this response errs on the side of inconclusive. At this moment we are all DevOps engineers, infrastructure/platform engineers and principal engineers working toward the same goal. Accepting this and applying it to our current team is a shift we look forward to and hope to see happen. That being said, there is more than one way to address the current shortcomings we have while making things better overall. We have yet to draw a line in the sand. Our DevOps team is currently performing the SRE work and for now, that still works. As with anything else in life, nothing is perfect. The future path is unknown, but we strive to make it the best one possible.
Tech can be as much about keeping up with the Joneses as it is blazing new trails, which sometimes means that companies adopt new technologies or functions without fully knowing what their benefit will be. That’s not the case for Flex and SRE, though. Greg Behrend, director of technical operations, told Built In exactly how he envisions SRE helping Flex reach its growth goals.
What are the key differences between DevOps and site reliability engineering?
DevOps and Site Reliability Engineering, or SRE, are two different but complementary concepts. While DevOps focuses on best practices for software development and delivery, SRE optimizes the implementation of those practices as well as the availability, latency, performance, capacity, security and scalability of the systems where that software is delivered. We look at DevOps as a subset of our SRE team, as sound DevOps is essential to optimal SRE performance.
What prompted your team to shift from DevOps to SRE?
As our engineering team continues to implement an ever-increasing feature set to compete in our market, scalability and availability of our product are essential to survival. It is not enough to have a good approach to software development and deployment; we also have to ensure that software is available and performs for the customers that we have and that we want. We look to SRE to be the catalyst for that growth.
What impact has this shift had on your engineering organization, the tech you build or the business as a whole?
It has had a positive impact across our development, deployment and operational processes. The most notable change has been shifting the burden of ensuring a well-functioning systems infrastructure away from our product engineers to a dedicated team of SREs. As we look to scale and grow more, like any organization, we can simultaneously focus on innovation and a positive customer experience. A team of SREs using a healthy DevOps philosophy allows us to do that.