Jobs on Kubernetes

Kubernetes is a powerful, cloud-agnostic platform, and this extension provides a way to run batch jobs on it. Note that this extension is only for jobs, not for pipelines. Please refer to Argo or Kubeflow to run pipelines on Kubernetes.

Additional dependencies

Magnus extensions need additional packages to use this extension. Please install magnus-extensions via:

pip install "magnus_extensions[k8s]"

or

poetry add "magnus_extensions[k8s]"

Since Kubernetes is a cloud-based job scheduler, other services that are not accessible from the cloud will not work with this extension.

Configuration:

executor:
  type: "kfp"
  config:
    config_path: str # Required
    docker_image: str # Required
    namespace: str # Defaults to "default"
    cpu_limit: str # Defaults to "250m"
    memory_limit: str # Defaults to "1G"
    gpu_limit: int # Defaults to 0
    gpu_vendor: str # Defaults to "nvidia.com/gpu"
    cpu_request: str # Defaults to cpu_limit
    memory_request: str # Defaults to memory_limit
    active_deadline_seconds: int # Defaults to 2 hours
    ttl_seconds_after_finished: int # Defaults to 1 minute
    image_pull_policy: str # Defaults to "Always"
    secrets_from_k8s: dict # EnvVar=SecretName:Key
    persistent_volumes: dict # volume-name:mount_path
    labels: Dict[str, str]
  • config_path

The location of the kubeconfig file to submit jobs.

  • docker_image

The docker image to use to run the job. The docker image should be accessible from the Kubernetes cluster.

  • namespace

The namespace of the Kubernetes cluster to submit the jobs to. It defaults to "default".

  • cpu_limit

The default CPU limit for the Kubernetes job. Defaults to "250m". Please refer to the Kubernetes documentation on resource management for containers to understand more.

  • memory_limit

The default memory limit for the Kubernetes job. Defaults to "1G". Please refer to the Kubernetes documentation on resource management for containers to understand more.

  • gpu_limit

The default GPU limit for the Kubernetes job. Defaults to 0. Please refer to the Kubernetes documentation on scheduling GPUs to understand more.

  • gpu_vendor

The GPU type to use for the Kubernetes job. The cluster should support the GPU type for this to work. Defaults to "nvidia.com/gpu". Please refer to https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ to understand more.

  • cpu_request

The default CPU request for the Kubernetes job. Defaults to the value of cpu_limit. Please refer to the Kubernetes documentation on resource management for containers to understand more.

  • memory_request

The default memory request for the Kubernetes job. Defaults to the value of memory_limit. Please refer to the Kubernetes documentation on resource management for containers to understand more.
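
The interplay between the limit and request settings above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_resources` is a hypothetical helper, not a function of the extension, and the output mirrors the shape of a Kubernetes container `resources` mapping rather than what the extension builds internally.

```python
from typing import Any, Dict, Optional


def build_resources(
    cpu_limit: str = "250m",
    memory_limit: str = "1G",
    gpu_limit: int = 0,
    gpu_vendor: str = "nvidia.com/gpu",
    cpu_request: Optional[str] = None,
    memory_request: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: assemble a Kubernetes-style resources mapping."""
    limits: Dict[str, Any] = {"cpu": cpu_limit, "memory": memory_limit}
    # Requests fall back to the corresponding limit when not set explicitly.
    requests: Dict[str, Any] = {
        "cpu": cpu_request or cpu_limit,
        "memory": memory_request or memory_limit,
    }
    if gpu_limit > 0:
        # GPUs are listed under the vendor's resource name, in limits.
        limits[gpu_vendor] = gpu_limit
    return {"limits": limits, "requests": requests}


resources = build_resources(cpu_limit="500m", gpu_limit=1)
print(resources)
```

With no arguments the helper returns the documented defaults, and the GPU entry appears only when gpu_limit is greater than 0.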

  • active_deadline_seconds

The maximum amount of time, in seconds, that the job can run on the Kubernetes cluster. Defaults to 2 hours. Please set this value appropriately for your job.

Please refer to the Kubernetes documentation on job termination and cleanup to understand more.

  • ttl_seconds_after_finished

The amount of time that the job/pod is retained after the job completes. Defaults to 1 minute. Please increase this time (in seconds) if you want to look into more debugging information.
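
Both settings are given in seconds, so the documented defaults of 2 hours and 1 minute correspond to the values computed below. The 30 minute debugging window is a hypothetical value chosen only for illustration.

```python
from datetime import timedelta

# Documented defaults, converted to the seconds the config expects.
active_deadline_seconds = int(timedelta(hours=2).total_seconds())
ttl_seconds_after_finished = int(timedelta(minutes=1).total_seconds())

# Hypothetical override: keep finished jobs for 30 minutes while debugging.
debug_ttl_seconds = int(timedelta(minutes=30).total_seconds())

print(active_deadline_seconds, ttl_seconds_after_finished, debug_ttl_seconds)
```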

  • image_pull_policy

Defaults to "Always". The available options are "IfNotPresent", "Always" and "Never".

Warning

Use "IfNotPresent" cautiously, as the check happens on the tag of the docker image and an improper versioning strategy might result in wrong docker images being used.
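
One way to guard against this before submitting a job is a small check on the image reference. The sketch below is an assumption-laden illustration: `pull_policy_warnings` is a hypothetical helper, not part of the extension, and it flags only untagged or "latest" images, which is one heuristic among several possible versioning checks.

```python
from typing import List

ALLOWED_POLICIES = ("IfNotPresent", "Always", "Never")


def pull_policy_warnings(docker_image: str, image_pull_policy: str = "Always") -> List[str]:
    """Hypothetical pre-submit check; not part of the extension."""
    if image_pull_policy not in ALLOWED_POLICIES:
        raise ValueError(f"image_pull_policy must be one of {ALLOWED_POLICIES}")

    # Inspect only the last path component so a registry port is not read as a tag.
    name = docker_image.rsplit("/", 1)[-1]
    pinned_by_digest = "@" in name
    tag = name.split("@", 1)[0].partition(":")[2]

    warnings: List[str] = []
    if image_pull_policy == "IfNotPresent" and not pinned_by_digest:
        if tag in ("", "latest"):
            warnings.append(
                f"{docker_image!r} uses a mutable tag; a cached copy may be stale."
            )
    return warnings


print(pull_policy_warnings("registry:5000/team/app:latest", "IfNotPresent"))
```

An image pinned to a specific version tag or a digest produces no warning in this sketch.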

  • secrets_from_k8s

Use secrets stored in the underlying Kubernetes cluster while running the containers. The format is EnvVar=SecretName:Key, where:

- EnvVar: The name of the environment variable that exposes the secret in the container.
- SecretName: The name of the secret in Kubernetes.
- Key: The key in the secret that should be exposed in the container.
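
The EnvVar=SecretName:Key format can be illustrated with a short parsing sketch. It assumes the setting is supplied as a mapping from the environment variable name to a "SecretName:Key" string; that shape, the `parse_secrets` helper and the example secret names are all hypothetical and serve only to show how the three parts relate.

```python
from typing import Dict, NamedTuple


class SecretRef(NamedTuple):
    secret_name: str
    key: str


def parse_secrets(secrets_from_k8s: Dict[str, str]) -> Dict[str, SecretRef]:
    """Hypothetical helper: split each 'SecretName:Key' reference."""
    parsed: Dict[str, SecretRef] = {}
    for env_var, reference in secrets_from_k8s.items():
        secret_name, sep, key = reference.partition(":")
        if not sep or not secret_name or not key:
            raise ValueError(
                f"Expected 'SecretName:Key' for {env_var!r}, got {reference!r}"
            )
        parsed[env_var] = SecretRef(secret_name, key)
    return parsed


# Assumed example values, not taken from a real cluster.
example = {"DB_PASSWORD": "database-credentials:password"}
print(parse_secrets(example))
```
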
  • persistent_volumes

Volumes to mount from the underlying cluster onto the container during the execution of the job.

The format is name-of-the-volume:mountpoint.
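
As a rough illustration of the name-of-the-volume:mountpoint format, the sketch below turns such a mapping into entries shaped like Kubernetes volumeMounts. The `to_volume_mounts` helper, the absolute-path check and the example names are assumptions made for illustration; the extension may build its volume specification differently.

```python
from typing import Dict, List


def to_volume_mounts(persistent_volumes: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical helper: map volume names to volumeMounts-style entries."""
    mounts: List[Dict[str, str]] = []
    for name, mount_path in persistent_volumes.items():
        # Sanity check added for this sketch: mount points are absolute paths.
        if not mount_path.startswith("/"):
            raise ValueError(f"Mount path for {name!r} must be absolute: {mount_path!r}")
        mounts.append({"name": name, "mountPath": mount_path})
    return mounts


# Assumed example values.
volumes = {"shared-data": "/mnt/data"}
print(to_volume_mounts(volumes))
```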

  • labels

Any labels that you wish to apply to the job.
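
Kubernetes restricts label values to at most 63 characters that start and end with an alphanumeric character, with dashes, underscores and dots allowed in between; an empty value is also valid. A minimal pre-submit check along those lines might look like the sketch below. `invalid_label_values` is a hypothetical helper, and label keys are deliberately not validated here.

```python
import re
from typing import Dict

# Pattern for a Kubernetes label value (an empty value is allowed).
_LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def invalid_label_values(labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return the labels whose values Kubernetes would reject."""
    return {
        key: value
        for key, value in labels.items()
        if len(value) > 63 or not _LABEL_VALUE.fullmatch(value)
    }


# Assumed example labels; the second value contains a space and is rejected.
labels = {"team": "data-science", "owner": "jane doe"}
print(invalid_label_values(labels))
```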