Skip to content

How We Use Zuul At Netflix

matthew hawthorne edited this page Jun 12, 2013 · 3 revisions

zuul-physical-arch.png

Overview

Zuul gives us a lot of insight, flexibility, and resiliency, in part by making use of other Netflix OSS components:

  • Hystrix is used to wrap calls to our origins, which allows us to shed and prioritize traffic when issues occur

  • Ribbon is our client for all outbound requests from Zuul, which provides detailed information into network performance and errors, as well as handles software load balancing for even load distribution

  • Turbine aggregates fine­grained metrics in real­time so that we can quickly observe and react to problems

  • Archaius handles configuration and gives the ability to dynamically change properties

Surgical Routing

We can create a filter to route a specific customer or device to a separate API cluster for debugging. Prior to using Zuul, we were using Hadoop to query through billions of logged requests to find the several thousand requests we were interested in.

Stress Testing

We have an automated process that uses dynamic Archaius configurations within a Zuul filter to steadily increase the traffic routed to a small cluster of origin servers. As the instances receive more traffic, we measure their performance characteristics and capacity. This informs us of how many EC2 instances we will need to run at peak, whether our autoscaling policies need to be modified, and whether or not a particular build has the required performance characteristics to be pushed to production.

Multi-Region Resiliency

Zuul is central to our multi-region ELB resiliency project that we call Isthmus. As part of Isthmus Zuul is used to route requests from the west coast cloud region to the east coast to help us have multi-region redundancy in our ELBs for our critical domains.