, OpenAI Triton. Distributed systems: Ray, Megatron-LM. Performance analysis tools: Nsight Compute, nvprof, PyTorch Profiler... Proven track record of architecting large-scale distributed training systems. Expert knowledge of transformer...
and a distributed systems generalist, able to dive deep into any part of the stack and low-level systems and design broad distributed... understanding of operating systems, hardware-software integration, distributed services, and cloud-scale automation...
and highly available distributed systems. Background in operating systems, compute architecture, virtualization, CPU and GPU...-on engineers with expertise and passion for solving complex problems in distributed systems, storage infrastructure, transaction...
. We're looking for hands-on engineers with expertise in and passion for solving difficult problems in distributed systems.... You should be both a rock-solid coder and a lead-level engineer, able to dive deep into any part of the stack and low-level systems, as well...