Scaling Deployments

Use cases

  1. Continuous delivery.
  2. Deployment and operations.

Current state

This discovery collects the missing pieces in Tutor that are required to dynamically increase or decrease the number of LMS and Studio workers.

This is not available in edX at the moment, since the stack is deployed on OpenStack with a fixed number of processes. The lack of this support was mitigated by deploying multiple instances of the edX installation behind a load balancer.

Now that Grove exists and uses Tutor to set up instances on Kubernetes, we have multiple options to resolve this long-standing issue. This discovery targets those options, evaluating the pros and cons of each of them.

Approaches For Scaling

Manual scaling

To scale deployments, the fastest and most obvious way is to use the kubectl scale --replicas=<N> deployment/<name> command, as described in Tutor's documentation. The biggest issue with this approach is that it requires manual intervention every time scaling is needed. Hence, this option is not evaluated in more depth.

Pros:

  • The fastest way to scale deployments up or down
  • Easy workflow

Cons:

  • Manual intervention needed
  • Secrets and credentials must be available locally to manage scaling
  • We must actively monitor resource usage to know when to scale up or down

Pre-configured scaling

Another approach is pre-configured scaling: we could set the required number of replicas in the configuration of each provisioned instance. This option is a bit more flexible, and we would also know who changed the scaling settings of a given instance, when, and why.
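
For illustration only, the per-instance configuration could carry the replica counts similar to the sketch below; the variable names are hypothetical placeholders, not existing Tutor or Grove settings.

# Hypothetical per-instance configuration sketch; the variable names are illustrative only.
LMS_REPLICAS: 3         # number of LMS pods
CMS_REPLICAS: 2         # number of Studio (CMS) pods
LMS_WORKER_REPLICAS: 2  # number of LMS Celery worker pods
CMS_WORKER_REPLICAS: 1  # number of Studio Celery worker pods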

Pros:

  • Fast way to scale deployments up or down
  • Easy workflow
  • No local credentials or secrets needed
  • Version controlled

Cons:

  • Manual config update needed
  • We must actively monitor resource usage to know when to scale up or down

Auto-scaling

The next approach we could use is “Horizontal Pod Autoscaling”, which scales resources up or down automatically, based on resource usage monitored by Kubernetes. The command to set up CPU-based auto-scaling is kubectl autoscale rs <replica set> --min=<X> --max=<Y> --cpu-percent=<Z>. Although this is the most feasible of the approaches listed so far, it still falls short, since the auto-scaling boundaries must be set manually and are not tracked anywhere.

Pros:

  • Fast way to scale deployments up or down
  • Easy workflow
  • No local credentials or secrets needed
  • Resources are monitored by Kubernetes

Cons:

  • Auto-scaling boundaries are manually set

Pre-configured auto-scaling

This approach is a mixture of the pre-configured and auto-scaling approaches, combining the benefits of both. The resources are monitored by Kubernetes, auto-scaling is set up by GitLab pipelines or Tutor, and the configuration is version controlled.

Pros:

  • Fast way to scale deployments up or down
  • Easy workflow
  • No local credentials or secrets needed
  • Resources are monitored by Kubernetes
  • Min and Max number of pods are version controlled

Cons:

  • In the case of auto-scaling managed by GitLab pipelines, we expose one more tool to the pipelines: kubectl. This logic should be hidden from the pipeline and moved behind Tutor, though that requires some deployment refactoring in Tutor, as described in the Proposed solution section.

CeleryBeat workers

The edX platform can use CeleryBeat combined with the single-beat Python package to ensure that only one beat scheduler is running at the same time.

A CeleryBeat scheduler process/pod is not implemented in Tutor yet. This makes it impossible to use periodic instructor reports, which depend on it. To resolve this issue, we need to add scheduler process support to the Kubernetes deployment, as well as to the other installation options, such as dev or local.

Proposed solution

As shown above, the most flexible and sustainable approach is auto-scaling configured by Tutor, based on the configuration tracked in the VCS.

Although the approach is not complicated, some refactoring of the deployment resource definitions is needed to include a HorizontalPodAutoscaler resource (API version: autoscaling/v2beta2). This resource definition should be optional and enabled in an opt-in manner, since not everyone needs auto-scaling. A similar conditional is already defined in Tutor, so we should follow the pattern used there.
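
As a rough illustration of that opt-in pattern, the HPA template could be wrapped in a Jinja conditional similar to the sketch below; the ENABLE_AUTOSCALING flag name is a placeholder, not an existing Tutor setting.

{# The flag name below is hypothetical; the real setting should follow Tutor's naming conventions. #}
{% if ENABLE_AUTOSCALING %}
---
# ... HorizontalPodAutoscaler definition (see the example further below) ...
{% endif %}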

Since HorizontalPodAutoscaler requires per-container resource requests/limits to be set on the target deployment, that configuration should be optional as well and depend on whether auto-scaling is enabled. If auto-scaling is not enabled, the resource limits shouldn't be set at all.
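
In the deployment template, the per-container resources could be guarded by the same hypothetical flag, roughly as in the sketch below; the request/limit values are placeholders and would need tuning, and the deployment is abridged to the relevant parts.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cms
  labels:
    app.kubernetes.io/name: cms
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cms
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cms
    spec:
      containers:
        - name: cms
          image: {{ DOCKER_IMAGE_OPENEDX }}
          {% if ENABLE_AUTOSCALING %}
          # Requests/limits are only rendered when auto-scaling is enabled,
          # since the HPA needs them to compute utilization; values are examples only.
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 2Gi
          {% endif %}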

The auto-scaler resource would look similar to the following definition, though keep in mind that it is just an example and has not been tested or fine-tuned at all.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: cms-hpa
  labels:
    app.kubernetes.io/name: cms-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cms
  minReplicas: {{ CMS_MIN_REPLICAS }}
  maxReplicas: {{ CMS_MAX_REPLICAS }}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 1536Mi  # 75% of 2Gi

By setting the HPA resource, Kubernetes will be aware of auto-scaling and we will not depend on the GitLab pipeline, which resolves the only concern against this approach.

As for the CeleryBeat workers, we need to allow an extra process running CeleryBeat. Since the auto-scaling will only scale the LMS/Studio apps and workers, we don't need to add single-beat, but we must make sure (at least by adding a comment) that the CeleryBeat process does not run multiple times per instance. Otherwise, scheduled jobs could be duplicated.
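
A minimal sketch of such a scheduler deployment is shown below. The deployment name and the Celery command line are assumptions for illustration; the important part is that replicas stays fixed at 1 and no HPA is attached to it.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: lms-beat
  labels:
    app.kubernetes.io/name: lms-beat
spec:
  # Keep exactly one beat scheduler per instance; do not attach an HPA to this
  # deployment and do not scale it manually, otherwise periodic tasks would be
  # scheduled multiple times.
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: lms-beat
  template:
    metadata:
      labels:
        app.kubernetes.io/name: lms-beat
    spec:
      containers:
        - name: lms-beat
          image: {{ DOCKER_IMAGE_OPENEDX }}
          # The exact Celery app and arguments are illustrative and need to be confirmed.
          args: ["celery", "--app=lms.celery", "beat", "--loglevel=info"]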