Join the Ranks
ML System Administrator (m/f/d)
full-time/part-time | Lübeck
Your mission
As ML System Administrator, your main responsibilities will be administering our in-house high-performance computing (HPC) servers and supporting our machine learning engineers with any infrastructure or operational challenges they face.
You will monitor and administer a NAS with an associated NFS, a backup server and multiple GPU compute servers. The ideal candidate has proven system administration experience with a focus on GPU servers.
Responsibilities
- Day-to-day monitoring and administration of a GPU-centric ML HPC system with a focus on reliability
- Maintaining internal data sharing systems (Samba / NFS / NAS)
- Organizing and backing up critical data
- Installing and maintaining ML supporting software and frameworks
- Administering, configuring, maintaining, and building upon deployments using industry-standard tools (e.g. Slurm, Kubernetes, Docker, Jira)
- Specifying hardware requirements and environments to support ML workloads
- Providing prompt support for team members during computing, storage or software outages
Your Qualifications and Experience
- BSc in computer science or a related field
- 3 years of experience building, configuring and administering computer networks / servers
- Ideally 3+ years of experience administering machine learning HPC systems and distributed GPU-centric clusters
- Experience with orchestration and container deployment (Docker / Kubernetes)
- In-depth knowledge of the Linux operating system (certifications are a plus)
- In-depth knowledge of LAN / SAN / NAS / Fibre Channel networking (certifications are a plus)
- Good knowledge of Python, basic knowledge of ML frameworks (TensorFlow, PyTorch)
- Excellent communication skills in English and German
Interested? Here is what we need from you:
- Application with CV and a short note on what you expect from working with us
- Your earliest entry date
- Your desired salary
We are looking forward to meeting you!