
Crane, an elastic GPU cluster manager with batteries included.


Note: Crane is under active development and subject to major changes in the near future.

Crane is a cluster manager specialized for elastically scheduling GPU resources. Major strengths include:

  • Gang-Scheduling: GPU resources are scheduled in units of container groups (called cargos), which allows multiple distributed DL training jobs to run on a single cluster.
  • Elastic: Cargos can be dynamically resized. Furthermore, the resources reserved for an entire job (called a mini-cluster) can be resized, allowing AutoML jobs to run, pause, and kill individual trials.
  • Multi-tenant: Crane supports multiple users and diverse job types inside a single cluster.
  • Transparent: Crane exposes GPU usage to the apps running on it. Each app can query its own GPU usage statistics.
  • Batteries Included: Crane supports all these features out-of-the-box.
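To illustrate the all-or-nothing idea behind gang scheduling, here is a minimal sketch in Python. It is not Crane's implementation or API: `Cluster`, `Cargo`, and `gang_schedule` are hypothetical names, GPUs are modeled as a simple free count per node, and placement uses a first-fit heuristic chosen for brevity. The key property is that either every container in a cargo gets GPUs or the cluster state is left untouched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical model: free GPU count per node name (not Crane's API).
    free_gpus: dict[str, int]


@dataclass
class Cargo:
    # Hypothetical model: a gang of containers, each needing some GPUs.
    name: str
    gpus_per_container: list[int]


def gang_schedule(cluster: Cluster, cargo: Cargo) -> dict[int, str] | None:
    """Place every container of the cargo, or none of them (all-or-nothing).

    Returns a mapping from container index to node name, or None if the
    whole gang cannot fit. Uses first-fit, largest containers first.
    """
    free = dict(cluster.free_gpus)  # tentative copy; committed only on success
    placement: dict[int, str] = {}
    # Place larger containers first to reduce fragmentation.
    order = sorted(
        range(len(cargo.gpus_per_container)),
        key=lambda i: -cargo.gpus_per_container[i],
    )
    for i in order:
        need = cargo.gpus_per_container[i]
        node = next((n for n, f in free.items() if f >= need), None)
        if node is None:
            return None  # one container cannot fit: reject the whole gang
        free[node] -= need
        placement[i] = node
    cluster.free_gpus = free  # commit the reservation in one step
    return placement


if __name__ == "__main__":
    cluster = Cluster({"node-a": 4, "node-b": 2})
    print(gang_schedule(cluster, Cargo("trainer", [2, 2, 2])))
    print(cluster.free_gpus)
```

Because the reservation is computed on a copy and committed only once every container has a slot, a distributed training job never ends up holding a partial set of GPUs while waiting for the rest, which is the deadlock that gang scheduling is meant to avoid.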

Last update: March 2, 2022