AI Infrastructure Engineer

Category: Engineering

Employment Type: Direct Hire

Reference: BH-388481

About the Position
We are looking for a senior AI Inference Infrastructure Software Engineer with strong hands-on experience building, optimizing, and deploying high-performance, scalable inference systems. This position is focused on designing, implementing, and delivering production-grade software that powers real-world applications of Large Language Models (LLMs) and Vision-Language Models (VLMs).
This is an exciting opportunity for an engineer who thrives at the intersection of AI systems, hardware acceleration, and large-scale robust deployment, and who wants to see their contributions ship in production, at scale.
In this role, you will directly shape the architecture, roadmap, and performance of the AI capabilities of our AIOS platform, driving innovations that make LLM/VLM systems fast, efficient, and scalable across cloud, edge, and hybrid edge-cloud environments. You will work closely with system, hardware, and product teams to deliver high-performance inference kernels for hardware accelerators, design scalable inference serving systems, and integrate optimizations such as tensor parallelism and custom kernels into production pipelines. Your work will have immediate impact, powering intelligent automotive systems in the next generation of electric vehicles.

Roles and Responsibilities:

  • Design and implement high-performance, scalable inference systems for LLMs and VLMs across cloud, edge, and edge-cloud hybrid platforms.
  • Develop and optimize custom kernels and operators for specific hardware accelerators (GPU, NPU, DSP, etc.), improving throughput, latency, and memory efficiency.
  • Integrate advanced optimization techniques such as KV-cache management, tensor/model parallelism, quantization, and memory-efficient execution into production inference systems.
  • Partner with system and hardware teams to ensure tight hardware-software integration and optimal performance across diverse compute environments.
  • Translate architectural requirements into robust, maintainable, production-ready software that meets performance, safety, and reliability standards.
  • Define and drive the evolution roadmap for LLM/VLM inference in the AIOS stack, ensuring scalability and adaptability to new workloads.
  • Stay ahead of industry trends and competitor solutions, applying best practices from both AI and large-scale systems engineering.

Required Qualifications:
  • 5+ years of hands-on software development experience in building and optimizing AI inference systems at scale.
  • Direct experience with LLM/VLM internals, including Transformer-based architectures, inference bottlenecks, and optimization techniques.
  • Strong expertise in performance engineering: kernel development, parallelism strategies, memory optimization, and distributed inference systems.
  • Proficiency with GPU/NPU programming (CUDA or vendor-specific SDKs), compiler toolchains, and deep learning frameworks (PyTorch or TensorFlow).
  • Strong programming skills in C/C++, with a track record of delivering high-performance, production-grade software.
  • Solid foundation in computer architecture, systems programming (CPU/GPU pipelines, memory hierarchy, scheduling), and embedded systems.
  • BS/MS in Computer Science, Computer Engineering, or related technical field.
  • Excellent communication and collaboration skills, with the ability to work across cross-functional teams.

Preferred Qualifications:
  • Master’s or PhD degree in Computer Science, Electrical/Computer Engineering, or a related field, plus 5 years of industry experience.
  • Experience building inference serving systems for large models, including batching, scheduling, caching, and load balancing.
  • Expertise in hardware-aware model optimization (e.g., kernel fusion, mixed precision, quantization, pruning).
  • Familiarity with edge and embedded AI, including real-time constraints and limited-resource optimization.
  • Contributions to widely used AI frameworks, libraries, or performance-critical software (open source or proprietary).


Estimated Min Rate: $200,000.00
Estimated Max Rate: $300,000.00


What’s In It for You?
We welcome you to be a part of one of the largest and most legendary global staffing companies to meet your career aspirations. Yoh’s network of client companies has been employing professionals like you for over 65 years in the U.S., UK, and Canada. Join Yoh’s extensive talent community to gain access to Yoh’s vast network of opportunities, including this exclusive opportunity. Benefit eligibility is in accordance with applicable laws and client requirements. Benefits include:

  • Medical, Prescription, Dental & Vision Benefits (for employees working 20+ hours per week)
  • Health Savings Account (HSA) (for employees working 20+ hours per week)
  • Life & Disability Insurance (for employees working 20+ hours per week)
  • MetLife Voluntary Benefits
  • Employee Assistance Program (EAP)
  • 401K Retirement Savings Plan
  • Direct Deposit & weekly ePayroll
  • Referral Bonus Programs
  • Certification and training opportunities

Note: Any pay ranges displayed are estimations. Actual pay is determined by an applicant's experience, technical expertise, and other qualifications as listed in the job description. All qualified applicants are welcome to apply.

Yoh, a Day & Zimmermann company, is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Visit https://www.yoh.com/applicants-with-disabilities to contact us if you are an individual with a disability and require accommodation in the application process.

For California applicants, qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. All of the material job duties described in this posting are job duties for which a criminal history may have a direct, adverse, and negative relationship potentially resulting in the withdrawal of a conditional offer of employment.


Posted on 09-04-2025
