Summary
The Engineering Challenge
You will be responsible for architecting and building the engine from the ground up. You must solve three specific constraints:
- The Data Reality: You will not have clean studio data. You must build a pipeline that can ingest "noisy" real-world audio (radio archives, podcasts, street interviews) and autonomously clean, align, and diarize it to create a high-fidelity training set.
- The Linguistic Complexity: The model must handle Code-Switching (fluidly mixing two languages in one sentence) and Tonal markers without breaking prosody. You must understand how to modify tokenizers to respect these nuances.
- The Inference Economics: We are not burning venture capital on infinite compute. You must quantize and optimize the model to run on consumer-grade GPUs with low latency. Efficiency is a constraint, not a nice-to-have.
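To make the quantization constraint concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is illustrative only, not our production approach: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and a real engine would likely rely on a framework's quantization tooling and per-channel scales.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest magnitude onto 127; fall back to 1.0
    # for an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is the usual one: each weight shrinks from 32 bits to 8, at the cost of a rounding error of at most half a quantization step (`scale / 2`) per weight.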
What You Will Own
- End-to-End Pipeline: From raw audio ingestion to served API response.
- Model Fine-Tuning: Adapting foundation models to highly specific, low-resource dialects.
- Inference Architecture: Building a stateless, containerized inference server that handles concurrent requests with sub-200ms latency.
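As a rough illustration of what handling concurrent requests with backpressure can look like, the sketch below caps in-flight inference calls with an `asyncio.Semaphore` and rejects new work once a small waiting line is full. Everything here is an assumption made for the example: `InferenceGate`, `Overloaded`, `fake_model`, and the limits are hypothetical names and values, not part of an existing codebase.

```python
import asyncio


class Overloaded(Exception):
    """Raised when the server should shed load instead of queueing more."""


class InferenceGate:
    """Caps concurrent model calls and bounds how many requests may wait."""

    def __init__(self, max_in_flight: int, max_waiting: int) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self._max_waiting = max_waiting
        self._waiting = 0

    async def run(self, handler, payload):
        # Reject immediately if all slots are busy and the waiting line is full.
        if self._sem.locked() and self._waiting >= self._max_waiting:
            raise Overloaded("too many pending requests")
        self._waiting += 1
        try:
            await self._sem.acquire()
        finally:
            self._waiting -= 1
        try:
            return await handler(payload)
        finally:
            self._sem.release()


async def fake_model(payload: int) -> str:
    # Stand-in for a real model call; sleeps briefly to simulate work.
    await asyncio.sleep(0.01)
    return f"ok:{payload}"


async def demo():
    gate = InferenceGate(max_in_flight=2, max_waiting=1)
    calls = [gate.run(fake_model, i) for i in range(5)]
    # Collect exceptions as values so rejected requests are visible.
    return await asyncio.gather(*calls, return_exceptions=True)


results = asyncio.run(demo())
served = [r for r in results if isinstance(r, str)]
rejected = [r for r in results if isinstance(r, Overloaded)]
```

With two slots and one waiting place, a burst of five requests leaves three served and two rejected. Failing fast like this keeps tail latency bounded, which matters when the target is a fixed latency budget rather than maximum throughput.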
The DNA We Need
- Systems Thinker: You don't just train models; you build products. You understand how the model sits inside a container, how the API handles backpressure, and how the tokenizer affects the runtime.
- Data Realist: You know that 80% of the work is in the dataset. You are comfortable writing custom scripts to slice, denoise, and filter terabytes of audio.
- First-Principles Optimizer: You understand why a model is slow. You are comfortable with quantization, distillation, and kernel-level optimizations to squeeze performance out of limited hardware.