About Qwerky AI
Qwerky AI is a human-centered artificial intelligence company focused on building practical and approachable AI tools for real-world use. Headquartered in Columbia, South Carolina, with a distributed team across the U.S., Qwerky AI is led by a founding team of tech entrepreneurs with over a decade of experience. The company is dedicated to creating AI that enhances, rather than replaces, human intelligence. Qwerky AI is currently developing an AI platform to empower knowledge workers, creatives, and small businesses.
Job Description
Qwerky AI seeks a highly experienced, visionary, and technically exceptional Senior Machine Learning Engineer (MLE) to spearhead critical initiatives within our Research & Development team. In this leadership role, you will architect, design, and implement highly scalable, robust, and cutting-edge machine learning systems and infrastructure. You will drive the technical strategy for MLOps and ML system design, mentor a team of talented engineers, and tackle our most complex engineering challenges in operationalizing AI. The ideal candidate is a recognized expert in ML engineering with a proven track record of delivering complex, high-impact ML systems from concept to production at scale.
Responsibilities
- Design, develop, and deploy mission-critical machine learning systems, platforms, and infrastructure, ensuring best-in-class reliability, scalability, and performance.
- Execute the organization's technical vision and strategy for MLOps practices, tools, and frameworks.
- Own and oversee the end-to-end lifecycle of complex ML systems, from requirements gathering and system design to implementation, testing, deployment, and long-term operational excellence.
- Provide technical leadership, mentorship, and guidance to machine learning engineers, fostering a culture of innovation, collaboration, and engineering excellence.
- Champion and enforce software engineering and MLOps best practices, including advanced CI/CD for ML, automated testing, infrastructure-as-code, comprehensive monitoring, and proactive incident response.
- Collaborate with data scientists to understand model intricacies and translate research prototypes into production-grade systems.
- Spearhead the optimization of machine learning models and inference pipelines for ultra-low latency, high throughput, and optimal resource utilization on various hardware platforms.
- Help lead the evaluation, selection, and integration of new technologies, tools, and methodologies to enhance our ML engineering capabilities.
- Drive initiatives to improve our ML infrastructure's scalability, reliability, and cost-effectiveness.
- Troubleshoot and resolve challenging issues in production ML systems, often requiring deep dives into complex, distributed environments.
Required Skills
- Bachelor's, Master's, or PhD in Computer Science, Software Engineering, a closely related technical field, or equivalent experience (10+ years).
- Extensive, proven experience (typically 5+ years, or 3+ years with a PhD) in machine learning engineering, software engineering with a focus on ML systems, or a similar role.
- Expert-level proficiency in Python and proficiency in at least one language relevant to high-performance systems (e.g., C++, Java, Go, Rust).
- Hands-on expertise in building and deploying complex machine learning models and systems into production environments.
- In-depth understanding of MLOps principles, tools, and platforms (e.g., MLflow, Kubeflow, TFX, Seldon Core, Docker, Kubernetes, CI/CD for ML, model registries, feature stores).
- Experience with major cloud platforms (e.g., AWS, Azure, GCP), including their advanced ML services, compute options, and infrastructure components.
- Expert understanding of the machine learning lifecycle, distributed systems, microservices, and data engineering principles.
- Demonstrated ability to deliver complex technical projects, mentor engineers, and execute technical strategy.
- Exceptional problem-solving, debugging, and system design skills, with the ability to deliver solutions to ambiguous and challenging requirements.
- Outstanding communication and interpersonal skills, with the ability to articulate complex technical designs and strategies to technical and executive audiences.
Bonus Skills
- Significant contributions to open-source MLOps, machine learning, or distributed systems projects.
- Expertise in designing and implementing solutions for real-time, low-latency ML inference at scale.
- Knowledge of specific hardware acceleration for ML (e.g., GPUs, TPUs, FPGAs) and experience with CUDA programming or similar.
- Experience building and managing large-scale data processing pipelines using technologies like Spark, Flink, Kafka, or Beam.
- Expertise in network programming, distributed consensus, or high-availability system design.
- Advanced knowledge of C++ for building and optimizing high-performance ML inference pipelines or system components.
- Experience with security best practices for ML systems and data.
- A track record of publications in top-tier engineering or ML systems conferences/journals.
Pay / Benefits
- Salary or hourly rate
- Stock option plan (we are a private company, so equity is not liquid)
- If you are in the USA: healthcare, dental, vision, 401k
- Unlimited time off policy
- Flexible working hours
Hiring Process
- Submit a resume to us for review.
- We will follow up with a technical screening (this will take approximately one hour).
- Following the successful completion of the technical screening, we will schedule an Onsite Interview.
- The Onsite Interview is a chance to meet more of the team and will consist of the following (total time: three hours):
  - Technical Screenings (1-2)
  - System Design
  - Behavioral Interview
- We'll reach out with an offer if you're a great fit.
- Once you accept, you start working with us!