, and negotiate staffing plans. Manage the end-to-end cluster development cycle, including forecasting, sourcing, procurement...
personnel to create costed bills of material (BOMs) for rack and cluster level solutions. Partner with business development...
stacks, and cluster environments. This role requires good understanding and experience in ROCm, CUDA, GPU architecture, ML...
. Lead and manage interconnection applications and queue positions during the cluster study phases in NYISO/PJM/SERC...
of running AI/HPC workloads at the single-node and cluster level and develop test suites and performance automation. Lead the debug...
, and automated provisioning. Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring...
consistency. Proficiency in monitoring cluster health and resource utilization. Ability to troubleshoot complex database...
networking. Experience with PCIe, CXL, NVMe interconnects and cluster schedulers (Kubernetes, Slurm). Proven ability...
. Expertise in Databricks components such as Delta Lake, Notebooks, Pipelines, cluster management, and cloud integration (Azure...
Contribute to the evolution of OCI's infrastructure into next-gen cluster and automation frameworks. Qualifications: Disclaimer...
. Experience running workloads, especially AI models, on large-scale heterogeneous clusters. Familiarity with clusters...
, EC2, RDS, S3, CloudWatch, IAM) and Kubernetes, including multi-cluster management. Strong programming skills (Python...
, CloudWatch, IAM) and Kubernetes, including cluster management. Proficient programming skills (Python, Go, or Java...
, and reliability. Standardize Databricks workspaces: cluster policies, repos/CI/CD, secrets, cost guardrails, and operational SLAs...
, storage systems, networking components, and cluster automation. Integrate modern technologies to ensure platform components...
to validate Scale-up and Scale-out architectures, including definition of test plans at cluster-level as well as writing code... to validate, monitor and root-cause errors at cluster-level. Make improvements to system level integration test strategies...
stringent performance and functional requirements across diverse cluster environments. THE PERSON: You are accustomed.... Experience with Linux/UNIX environments and cluster-computing concepts. Familiarity with network technologies relevant to HPC...
with Infrastructure as Code, and automated provisioning. Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring...