Crane, an elastic GPU cluster manager with batteries included.

Crane is under active development and subject to major changes in the near future.

Crane is a cluster manager specialized for elastically scheduling GPU resources. Major strengths include:

  • Gang-Scheduling: GPU resources are scheduled in units of container groups (called cargos), allowing multiple distributed DL training jobs to run on a single cluster.
  • Elastic: Cargos can be dynamically resized. Furthermore, the resources reserved for an entire job (called a mini-cluster) can be resized, allowing AutoML jobs to run, pause, and kill separate trials.
  • Multi-tenant: Crane supports multiple users and diverse job types inside a single cluster.
  • Transparent: Crane exposes GPU usage to its apps. Each app can query its own GPU usage statistics.
  • Batteries Included: Crane supports all these features out-of-the-box.
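
To make the gang-scheduling and elastic-resizing ideas above concrete, here is a minimal Python sketch of all-or-nothing GPU allocation for cargos. It is a toy model only: the `Cluster` class and its `schedule` and `resize` methods are hypothetical names invented for illustration and are not Crane's actual API or scheduling algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a GPU pool (hypothetical; not Crane's data model)."""
    total_gpus: int
    # Maps a cargo name to the number of GPUs it currently holds.
    cargos: dict[str, int] = field(default_factory=dict)

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - sum(self.cargos.values())

    def schedule(self, cargo: str, gpus: int) -> bool:
        """Gang-schedule a cargo: grant every requested GPU or none."""
        if gpus < 1 or cargo in self.cargos or gpus > self.free_gpus:
            return False
        self.cargos[cargo] = gpus
        return True

    def resize(self, cargo: str, gpus: int) -> bool:
        """Grow or shrink an existing cargo, again all-or-nothing."""
        if gpus < 1 or cargo not in self.cargos:
            return False
        extra = gpus - self.cargos[cargo]  # negative when shrinking
        if extra > self.free_gpus:
            return False
        self.cargos[cargo] = gpus
        return True


cluster = Cluster(total_gpus=8)
cluster.schedule("job-a", 4)  # granted as a whole
cluster.schedule("job-b", 6)  # rejected: only 4 GPUs are free
cluster.resize("job-a", 2)    # shrinking frees 2 GPUs
cluster.schedule("job-b", 6)  # now fits
```

The point of the sketch is that a cargo never holds a partial allocation: a request that cannot be fully satisfied is rejected, and shrinking one cargo is what lets another be admitted.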

Last update: March 2, 2022