Job Description
Are you ready to engineer the infrastructure of tomorrow? NexaCore Technologies is pioneering the future of autonomous systems and is seeking a visionary AI Infrastructure Lead to join our elite engineering team in San Francisco. We are looking for a technical expert who thrives on solving complex scalability challenges and building robust, future-proof architectures for next-generation AI models.
In this pivotal role, you will spearhead the deployment of our proprietary neural networks across global cloud ecosystems. You will define the technical roadmap for our 2026 infrastructure goals, ensuring our systems are not only efficient today but also able to scale with future demand.
Why Join Us?
- Work on cutting-edge AI technologies that define the industry standard.
- Competitive compensation and equity packages.
- Flexible remote and hybrid work options.
Responsibilities
- Architectural Strategy: Design and implement highly scalable, fault-tolerant infrastructure for AI workloads, including large language models and computer vision systems.
- Cloud Optimization: Oversee the migration of AI clusters to AWS and GCP and their ongoing management, optimizing for cost and performance.
- DevOps Integration: Establish CI/CD pipelines tailored to machine learning model training and deployment.
- Performance Tuning: Monitor system health and latency, implementing real-time optimizations to handle peak inference loads.
- Security & Compliance: Enforce rigorous security protocols to protect sensitive training data and model weights.
- Team Leadership: Mentor junior engineers and data scientists, fostering a culture of technical excellence and innovation.
Qualifications
- Education: Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field.
- Experience: 5+ years of experience in software engineering, with at least 3 years specifically in AI infrastructure or high-performance computing.
- Technical Skills: Deep expertise in Python, Kubernetes, Docker, and Linux systems administration.
- Cloud Mastery: Proven experience deploying large-scale AI workloads on AWS (SageMaker, EC2) or GCP (Vertex AI, the successor to AI Platform, and TPU Pods).
- Networking: Strong understanding of networking protocols (TCP/IP, HTTP) and distributed system architectures.
- Future-Ready Mindset: Demonstrated ability to anticipate industry trends and adapt infrastructure strategies for upcoming technological shifts (e.g., quantum-ready architectures).