Parasail Jobs

Senior Software Engineer, LLM Performance

Parasail

Reposted 11 Days Ago

In-Office or Remote

Hiring Remotely in CA

Senior level

Optimize and integrate LLMs across the stack from GPU kernels to Kubernetes deployments. Improve inference performance via kernel development, algorithmic techniques (quantization, speculative decoding), and contributions to open-source LLM engines like vLLM. Drive hardware utilization, profiling, and enterprise-grade scalable implementations.

Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience—free from vendor lock-in and designed for the next generation of AI workloads.

Job Description:

The Senior Software Engineer, LLM Performance plays a crucial role in delivering a competitive platform by focusing on efficiently scheduling, executing, and managing AI workloads on distributed compute systems. This role is deeply technical, spanning from low-level GPU kernels to distributed AI orchestration and Kubernetes (K8s) deployments. It is about more than optimization; it’s about pioneering efficient infrastructure that supports AI’s transformative role in reshaping productivity, revolutionizing industries, and addressing some of the world’s most challenging problems. You’ll ensure that generative AI — including large language models (LLMs), multi-modal models, and diffusion models — operates efficiently at enterprise scale while driving continuous improvements in cost, performance, and sustainability.

Responsibilities:

Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments.
Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g. use Python technologies to improve API servers or request schedulers).
Operate closer to the hardware, focusing on building and integrating solutions to boost performance and hardware utilization. For example, improve attention backends like FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating their solutions into vLLM.
Improve LLM performance using advanced algorithmic solutions such as speculative decoding, quantization, or other state-of-the-art techniques. Understand the impact of such techniques on model quality.

Qualifications:

Expertise in GPU computing, including platforms and frameworks such as CUDA, ROCm, XLA, PyTorch, and JAX.
Background in performance analysis and optimization of AI/HPC workloads (e.g., profiling or theoretical analysis of FLOPs and memory bandwidth).
Experience in writing GPU kernels using technologies like CUDA, CUTLASS, Triton.
Strength in Python and C++.
Demonstrated contributions to open-source projects. Contributions to inference engines such as vLLM are a strong plus.
A production-oriented mindset emphasizing robust, scalable code suitable for enterprise-grade applications.
A relentless curiosity about cutting-edge AI technologies combined with a passion for solving complex problems.

What You Bring to the Table:

We are looking for people who are eager to learn and master the lower-level compute concepts that are critical for the AI revolution. With us, your work will go beyond writing code: it will have a significant impact on the scalability and efficiency of AI applications at large. If you're ready for the challenge of optimizing AI performance and eager to push our technology to new heights, we're excited to welcome you aboard.
