AI Pipeline Engineer (Python) — Document Intelligence

    Open
    Published 3/2/2026
    Updated 3/2/2026

    Job Description

    AI Pipeline Engineer (Python) – Production AI & GenAI Systems

    We are looking for a hands-on AI Pipeline Engineer (Python) to take full ownership of our production AI pipelines and deliver new document-analysis capabilities.

    This is a delivery-driven role: you will maintain and extend our existing extraction pipelines while building new ones from scratch. At the same time, you’ll help lay the technical foundation for next-generation GenAI features.

    You will collaborate via well-defined APIs and typed schemas so product engineers can orchestrate workflows—while you own the “AI kernel” (prompts, evaluation, quality gates, model/tool logic).

    Requirements

    Own & Extend the Production Extraction Pipeline (Python / AWS)


    • Maintain and optimize our document extraction pipeline (Dockerized components + serverless architecture).


    • Ensure reliable processing, correct outputs, and stable triggering of downstream jobs.


    • Improve observability: logging, metrics, alerting, traceability, and cost monitoring.


    Deliver New Pipelines


    • Design and implement new end-to-end pipelines for new document families.


    • Build evaluation datasets, quality metrics, and regression tests to prevent silent degradation.
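
    As an illustration of what a regression check against an evaluation dataset might look like, here is a minimal sketch. The names `field_accuracy` and `check_no_regression`, the exact-match metric, and the tolerance value are assumptions made for this example, not a description of the existing pipeline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical dataset shape: (document text, expected field values).
Example = Tuple[str, Dict[str, str]]


def field_accuracy(
    extract: Callable[[str], Dict[str, str]],
    dataset: List[Example],
) -> float:
    """Fraction of expected fields that the extractor reproduces exactly."""
    total = 0
    correct = 0
    for text, expected in dataset:
        predicted = extract(text)
        for key, value in expected.items():
            total += 1
            if predicted.get(key) == value:
                correct += 1
    return correct / total if total else 0.0


def check_no_regression(score: float, baseline: float, tolerance: float = 0.01) -> bool:
    """True when the score has not dropped more than `tolerance` below the baseline."""
    return score >= baseline - tolerance
```

    A check like this could run in CI against a stored baseline, so that a prompt or model change that lowers accuracy fails the build instead of degrading output silently.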

    LLM-Based Interpretation (Q&A / Structured Extraction / Risk Signals)


    • Improve Q&A and interpretation quality using LLMs.


    • Implement structured outputs, retrieval (RAG), deterministic validation, and post-processing.


    • Build robust failure handling (timeouts, retries, fallbacks, safe defaults).
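
    To make the failure-handling bullet concrete, the following is a rough sketch of retries, fallbacks, and a safe default around a model call. The helper `call_with_fallbacks` and its parameters are hypothetical; each provider callable is assumed to enforce its own timeout and to raise an exception on failure.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallbacks(
    providers: Sequence[Callable[[], T]],
    safe_default: T,
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order with retries, then return a safe default."""
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider()
            except Exception:
                # Back off exponentially before retrying the same provider;
                # after the last attempt, move on to the next provider.
                if attempt < retries_per_provider - 1:
                    sleep(backoff_seconds * (2 ** attempt))
    # Every provider failed on every attempt.
    return safe_default
```

    In practice the providers might be a primary model endpoint followed by a cheaper or secondary one, with the safe default being an explicit "no answer" result that downstream jobs can detect.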


    GenAI Foundations


    • Implement “agentic” building blocks in Python behind stable APIs.


    • Collaborate with product engineers to integrate AI workflows into the platform.

    Production Readiness


    • Ensure pipelines are secure, scalable, and cost-efficient.


    • Contribute to deployment practices, versioning, rollbacks, and environment management.

    What We’re Looking For

    Must-Have


    • Strong production-grade Python (APIs/services, testing, packaging, clean architecture).


    • Experience owning and maintaining production systems.


    • AWS serverless experience, especially Lambda + S3 (bonus: Step Functions, SQS, CloudWatch).


    • Docker and experience operating containerized components.


    • Proven experience debugging and maintaining automated pipelines in production.


    • Practical experience with LLMs (OpenAI/Azure OpenAI/Anthropic):


    - Structured outputs

    - Prompt iteration

    - Retrieval (RAG)

    - Evaluation approaches


    Nice-to-Have


    • Experience with document AI (OCR, layout/text extraction, noisy PDFs).


    • Evaluation-driven development (test sets, regression checks, quality metrics, cost/latency budgets).


    • Familiarity with TypeScript/Node and/or agent orchestration frameworks.


    • Experience integrating with RESTful services.



    First 90 Days – What Success Looks Like

    Weeks 1–2: Takeover & Stability


    • Set up local + staging environments for the existing pipeline.


    • Document architecture, failure modes, and operational playbooks.


    • Establish baseline observability (logs/metrics).


    • Define quality and cost targets (latency, cost per document, success rate).


    Weeks 3–6: New Pipeline Delivery


    • Deliver a new end-to-end pipeline for a new document family.


    • Create evaluation datasets and automated regression checks.


    • Ship a production-ready version with monitoring and rollback strategies.


    Weeks 7–12: Quality, Scale & GenAI Foundations


    • Improve extraction/Q&A accuracy through structured outputs, retrieval, and validation.


    • Deliver an initial GenAI “kernel” API (planner/drafter/risk checks) ready for orchestration.


    • Harden operations: cost controls, retries/fallbacks, production runbooks.


    Tech Stack


    Frontend: React (TypeScript)


    Backend: Java (JEE)


    AI / Document Processing: Dockerized OCR/NLP components, AWS Lambda-based Python pipelines, AWS S3


    LLMs: Azure-hosted ChatGPT / LLM APIs


    Infrastructure: AWS + Azure (hybrid)

    About Division5

    For more than 10 years, we have been building solutions that improve lives. We take on complex projects that others avoid. Our DNA: boldness and courage. We say yes when others say no.

    Remote-first company
    50+ team members
    Industry leader
