The 2018 holiday shopping season provided marketing platform Bluecore a promising, but telling, snapshot.
According to VP of Engineering James Lewis, email volume around that time was up between 400 and 500 percent. But to accommodate future growth, the retail-focused martech company, whose clients include Sephora, Express and CVS, needed to retool its email infrastructure.
“Based on that growth, and based on that incremental cost to send emails, we were going to run into a problem delivering all the emails that we needed to, as well as the ability to pay for the infrastructure to deliver them,” Lewis said.
According to software engineer Lily Wu, the team got to work in early 2019 on designing the new pipeline, which is constructed primarily on Kubernetes. By the summer, they began rolling out static emails before graduating to more sophisticated, personalized emails. This spring, the company began the multi-month process of migrating customers’ entire email programs.
Lewis said the initial results have been promising, citing a five-fold increase in “peak capacity” for static emails and a four-fold increase for personalized emails. The true potential, he said, remains to be seen, as they’ve limited testing in order to remain cost-conscious. However, he expects to ramp up testing ahead of the 2020 holidays.
“We’re expecting an even larger order of magnitude of traffic than in previous holiday seasons,” Lewis said. “Over the next quarter, we’ll probably need to be testing up to 30-times our normal traffic to ensure that we have the capacity for the holiday season.”
To pull off the project, the team abided by agile “incremental deliverables” principles, conducted shadow testing and reviewed performance often, all of which were instrumental in building a more resilient email pipeline while adhering to cost goals.
How did you stay on track during such a time-intensive project?
Lewis: We started with a very small proof of concept to ensure that we were on the right track and to validate that our re-architecting was going to work at all. Then, we built off of those initial learnings as we implemented the first version of it.
There are different sorts of emails that we send. Some of them are “static” emails — no personalized content, just straight HTML, and as we progress up that personalization stack, we layer on more and more personalized components to drive value. The escalation of email features allowed us to have staged releases, where we could release certain types of emails through the new infrastructure to build the team’s confidence. Incremental deliverables really helped keep everyone rowing in the same direction.
Wu: We were really focused on saving costs, but at the same time, making sure that we stayed reliable and that clients didn’t notice a difference in email sends. We didn’t worry about extra features that only a few clients might have but instead focused on the parts of the pipeline that are actually the biggest drivers for cost.
Noam Sohn: “Dark window” — which locks email sends during specific times — is an example of a feature that was being used by one client. We didn’t spend our time moving it into the new system. We focused on things that were going to get the wins that we can celebrate.
How did you measure performance along the way?
Lewis: Early on in the project, one of the engineers working on the initial implementation built some tooling for running our load tests, sending millions, and eventually tens of millions, of emails through the new pipeline. That was a good baseline to refer to whenever we made substantial changes. We ran that rather frequently, compared baseline-over-baseline, figured out what the next bottleneck was, then fixed it and reran the test.
With the secondary goal of driving the cost down, without instrumentation of our cloud billing, we were able to figure out how much these tests were costing us to run. That gave us good, directionally accurate information about what our incremental costs were going to be with the new pipeline.
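The baseline-over-baseline comparison described above comes down to simple arithmetic on each load-test run. A minimal sketch in Python, assuming each run records emails sent, wall-clock duration and an attributed cloud cost; the `LoadTestRun` type, its field names and the `compare_runs` helper are hypothetical and are not Bluecore's tooling:

```python
from dataclasses import dataclass


@dataclass
class LoadTestRun:
    """One load-test run (hypothetical record shape)."""
    emails_sent: int
    duration_s: float       # wall-clock seconds for the whole send
    cloud_cost_usd: float   # cost attributed to this run, however it was estimated

    @property
    def throughput(self) -> float:
        """Emails sent per second."""
        return self.emails_sent / self.duration_s

    @property
    def cost_per_million(self) -> float:
        """Cost in USD normalized to one million emails."""
        return self.cloud_cost_usd * 1_000_000 / self.emails_sent


def compare_runs(baseline: LoadTestRun, candidate: LoadTestRun) -> dict:
    """Return candidate/baseline ratios for throughput and unit cost."""
    return {
        "throughput_ratio": candidate.throughput / baseline.throughput,
        "cost_ratio": candidate.cost_per_million / baseline.cost_per_million,
    }
```

A throughput ratio above 1 together with a cost ratio below 1 would suggest that a change made the pipeline both faster and cheaper per email.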
Sohn: We spent a lot of time shadow testing: We’ll run an email campaign in our original pipeline, then an hour later, we’ll run the exact same campaign in the new pipeline and compare them. During that, we were also able to measure performance and saw that our system completed the same send much more quickly. It was exciting to see our sending performance improving.
Wu: Shadow testing is one of those tools that we used extensively to make sure that the content in the new pipeline is the same as in the original. It also helps to ensure that we’re delivering the same number of emails and the exact same content for those emails.
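The comparison step of a shadow test like the one described above can be sketched roughly as follows. This is an illustration only: the `compare_sends` function, the `ShadowReport` type and the recipient-to-HTML dictionary shape are assumptions, not Bluecore's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowReport:
    """Differences between the original and new pipeline for one campaign."""
    missing_in_new: set = field(default_factory=set)    # sent by the original only
    extra_in_new: set = field(default_factory=set)      # sent by the new pipeline only
    content_mismatch: set = field(default_factory=set)  # same recipient, different body

    @property
    def matches(self) -> bool:
        """True when both pipelines sent the same emails with the same content."""
        return not (self.missing_in_new or self.extra_in_new or self.content_mismatch)


def compare_sends(old: dict, new: dict) -> ShadowReport:
    """Compare two sends, each a mapping of recipient address -> rendered HTML."""
    shared = old.keys() & new.keys()
    return ShadowReport(
        missing_in_new=set(old.keys() - new.keys()),
        extra_in_new=set(new.keys() - old.keys()),
        content_mismatch={r for r in shared if old[r] != new[r]},
    )


# Example with made-up data: one recipient's content differs.
old_send = {"a@example.com": "<p>Hi A</p>", "b@example.com": "<p>Hi B</p>"}
new_send = {"a@example.com": "<p>Hi A</p>", "b@example.com": "<p>Hello B</p>"}
report = compare_sends(old_send, new_send)
print(report.matches, report.content_mismatch)  # False {'b@example.com'}
```

Checking recipients and content separately mirrors the two guarantees mentioned above: the same number of emails, and the same content in each.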
Why did the team lean on Kubernetes for the project?
Wu: We decided on it because of the low computational costs compared to App Engine, which is what we were using previously. Additionally, once we started writing the new email pipeline in Kubernetes, there were a few other pieces that we thought we could also pull out of the pipeline into their own microservices.
For example, we have a service that keeps track of the minimum time between emails. We split that into its own service. Because we split that out, we can use it in this new pipeline and also in our old pipeline. That helps make sure that this new pipeline wouldn’t affect the email sending in the old pipeline. Kubernetes makes it really easy to have microservices that can talk to each other.
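The core check behind a minimum-time-between-emails service might look something like the sketch below. The `SendThrottle` class and its `try_send` method are hypothetical names, and the in-memory dictionary stands in for whatever shared storage and network API a real service would use.

```python
from datetime import datetime, timedelta


class SendThrottle:
    """Tracks the last send per recipient and enforces a minimum gap."""

    def __init__(self, min_gap: timedelta):
        self.min_gap = min_gap
        self._last_sent = {}  # recipient -> datetime of the last allowed send

    def try_send(self, recipient: str, now: datetime) -> bool:
        """Return True and record the send if the minimum gap has elapsed."""
        last = self._last_sent.get(recipient)
        if last is not None and now - last < self.min_gap:
            return False  # too soon since the previous email to this recipient
        self._last_sent[recipient] = now
        return True
```

Because both pipelines would consult the same record of last sends, an email from one pipeline counts against the gap for the other, which is the property the quote describes.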
Sohn: We weren’t just deprecating our old system; we had to also make sure that everything could work in both systems since we have clients who still use the old system. It was really a cross-functional project — different teams picked up on the things that they could get into Kubernetes and helped make that happen for us.
Lewis: As we worked with partner engineering teams to provide functionality for the new email pipeline, we saw an explosion in the number of services we were hosting on Kubernetes. At this point, there are around 15 different services that are running on Kubernetes. We built a handful of those in order to support the email pipeline, but a lot of them were part of the ecosystem around our email pipeline and other independent components of our infrastructure that were being built or rebuilt.
Infrastructure upgrade advice from Bluecore engineers:
- Lewis: “Make sure that the executives understand both the milestones and the impact.”
- Wu: “Have a consistent way of monitoring and measuring.”
- Sohn: “Build incrementally and test frequently.”
What wins have emerged from the upgrade?
Sohn: We’ve taken this opportunity to address any tech debt, simplify our pipeline and only use the features that we actually want to support. Internally, we now have more engineers who know the end-to-end system because we are the ones who wrote it. That makes it easier to build up the system and pass along knowledge from engineer to engineer.
Wu: Additionally, with our redesign, we made the pipeline more microservice-friendly. We separated the parts of the pipeline more clearly so that, in the future, we could potentially split out different parts of the pipeline into their own microservices. The personalization portion is one of those pieces.
Lewis: This infrastructure investment made client goals for email sends more feasible. Clients noticed that their daily 10 million email send was completing in half or a third of the time it took previously. Additionally, we were able to personalize substantially more emails than before, and in a shorter time period. That was something that our marketing teams were very excited about, and it helped with new business conversations and existing client upsells.