Machine Learning Engineer - Inference
Job ID: 590690 Posted: 5/17/2026
Description
## About the Role
Together AI is seeking a skilled and performance-focused **Machine Learning Engineer** to join its Inference Engine team. This role focuses on optimizing, scaling, and enhancing AI inference systems that support state-of-the-art large language models and other advanced AI workloads.
The Machine Learning Engineer will help design and build production-grade AI infrastructure that delivers efficient, reliable, and high-performance model serving at scale. This position is ideal for an engineer who understands both machine learning systems and production infrastructure, and who enjoys working close to the runtime layer where model performance, latency, throughput, reliability, and scalability all matter.
The engineer will collaborate closely with AI researchers, infrastructure engineers, platform teams, and product stakeholders to bring new model-serving capabilities to users. This role requires strong software engineering skills, experience with machine learning systems, and the ability to develop reliable services that support demanding AI workloads in production environments.
## Key Responsibilities
* Design, build, test, and maintain production systems for large-scale AI inference.
* Develop and optimize runtime inference services for state-of-the-art large language models and related AI systems.
* Reduce inference latency and cost while improving throughput, reliability, scalability, and overall efficiency.
* Collaborate with AI researchers and engineers to bring new model-serving capabilities from research and experimentation into production.
* Build services, tools, and infrastructure that support efficient deployment, serving, monitoring, and operation of AI models.
* Implement robust systems for data ingestion, processing, transformation, and delivery to support inference workflows.
* Contribute to architecture decisions for high-performance AI infrastructure and model-serving platforms.
* Analyze system bottlenecks and improve performance across compute, memory, networking, scheduling, batching, and request handling.
* Support the deployment and scaling of large language models across production environments.
* Build tooling to improve developer productivity, observability, debugging, testing, and operational reliability.
* Conduct design reviews and code reviews to maintain high engineering standards.
* Work with cross-functional teams to understand technical requirements, prioritize improvements, and deliver reliable infrastructure.
* Monitor production systems and help troubleshoot issues related to performance, reliability, scaling, failures, and model-serving behavior.
* Document system architecture, technical decisions, implementation details, and operational procedures.
* Stay current with emerging AI infrastructure patterns, inference optimization techniques, model-serving frameworks, and machine learning systems research.
## Required Qualifications
* Strong software engineering experience with the ability to build reliable, scalable, and maintainable production systems.
* Experience working with machine learning systems, AI infrastructure, model serving, or high-performance backend services.
* Strong programming skills in Python, C++, Go, Rust, Java, or another systems or backend engineering language.
* Experience designing and building distributed systems, runtime services, APIs, data pipelines, or cloud-based infrastructure.
* Understanding of large language models, inference workloads, model deployment, and AI serving requirements.
* Ability to optimize systems for latency, throughput, reliability, efficiency, and scalability.
* Experience debugging complex production systems and identifying performance bottlenecks.
* Familiarity with data ingestion, processing, and transformation systems.
* Strong collaboration skills and the ability to work with researchers, engineers, and technical stakeholders.
* Experience participating in design reviews, code reviews, testing, documentation, and production readiness processes.
* Strong problem-solving skills and the ability to work through complex technical challenges in a fast-moving environment.
## Preferred Qualifications
* Experience building or optimizing inference systems for large language models, generative AI models, or deep learning workloads.
* Familiarity with model-serving frameworks, GPU-based computing, distributed inference, batching, caching, scheduling, or request routing.
* Experience with PyTorch, TensorFlow, JAX, Triton, CUDA, ONNX, vLLM, TensorRT, Ray, Kubernetes, Docker, or similar technologies.
* Experience working with GPUs, accelerators, memory optimization, parallelism, quantization, or inference runtime optimization.
* Familiarity with cloud platforms, container orchestration, observability tools, CI/CD pipelines, and production monitoring systems.
* Experience with high-throughput APIs, streaming systems, message queues, or large-scale data processing.
* Understanding of reliability engineering, service-level objectives, incident response, observability, and production operations.
* Experience working in an AI infrastructure company, research-oriented engineering team, cloud platform, or high-scale SaaS environment.
* Familiarity with open-source AI models, model deployment workflows, benchmark testing, or performance evaluation methods.
* Advanced degree in computer science, machine learning, artificial intelligence, distributed systems, or a related technical field.
## Ideal Candidate Profile
The ideal candidate is a practical and technically strong engineer who enjoys building the systems that make advanced AI models usable in production. This person understands that high-quality inference infrastructure must be fast, reliable, scalable, observable, and efficient.
They should be comfortable working at the intersection of machine learning, backend engineering, distributed systems, and production operations. The right candidate can collaborate with researchers to understand new model requirements, work with engineers to build robust serving systems, and improve performance through careful analysis, testing, and optimization.
This person should be curious, detail-oriented, and comfortable solving difficult technical problems. They should enjoy working on infrastructure where milliseconds, memory usage, throughput, and reliability can have a major impact on user experience and business outcomes.
## Work Environment
This role will involve close collaboration with AI researchers, infrastructure engineers, platform teams, and product stakeholders. The Machine Learning Engineer should be comfortable working in a fast-paced technical environment where priorities may evolve as models, customer needs, and infrastructure requirements change.
The position requires strong communication, self-direction, and a commitment to engineering quality. The engineer will participate in planning discussions, design reviews, code reviews, system testing, production monitoring, and ongoing infrastructure improvement.
## Compensation and Benefits
Compensation will be based on experience, qualifications, and overall fit for the role. Benefits may include:
* Competitive salary
* Full-time employment
* Remote or flexible work options, depending on company policy
* Health, dental, and vision benefits
* Paid time off
* Retirement plan options
* Professional development opportunities
* Opportunity to work on state-of-the-art AI infrastructure
* Opportunity to collaborate with AI researchers and advanced engineering teams
* Collaborative, innovative, and technically challenging work environment
## Equal Opportunity Statement
Together AI is an equal opportunity employer and welcomes applicants from all backgrounds. Employment decisions are based on qualifications, experience, skills, and business needs. The company is committed to maintaining a respectful, inclusive, and collaborative work environment.
## Preferred Skills
- Knowledge of AI inference systems such as TGI, vLLM, TensorRT-LLM, or Optimum
- Knowledge of AI inference techniques such as speculative decoding
- Knowledge of CUDA or Triton programming
- Knowledge of Rust, Cython, and compilers
## Technology / Software Requirements
- Python
- PyTorch
## Benefits Offered
- Competitive health insurance plans
- Dental and vision insurance
- Flexible time off policy
- Pre-tax health and flexible spending accounts
- Generous parental leave
- Competitive salary and equity packages
- 401(k) matching
- Life and disability protection plans
- Lunch, dinner, snacks, and coffee available in the office
- Parking and transit stipends
- Company events such as happy hours and team events
- Relocation stipend available for employees moving to the Bay Area
- Fireside chats with leading experts
- Conference expenses covered when relevant to the employee role or field