Oasis - AI Engineering Sandbox

Evaluate What Actually Matters

🛡️

Adversarial Testing

Test candidates on real-world failure modes such as recursive tool-calling loops, RAG hallucinations, and prompt injection attacks.

⚖️

LLM-as-a-Judge

Submissions are graded automatically by a locally hosted Ollama model. Candidates get instant feedback, and you get detailed AI trace logs.
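As a rough illustration of the judge step, the sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint and parses the reply into a pass/fail verdict. The model name `llama3`, the prompt wording, and the `passed`/`feedback` verdict fields are assumptions made for this example, not the prompts or schema Oasis ships with.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your deployment differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_judge_request(task: str, submission: str, model: str = "llama3") -> dict:
    """Build the JSON body for a non-streaming /api/generate call.

    The model name and prompt wording are illustrative assumptions.
    """
    prompt = (
        "You are grading a candidate submission.\n"
        f"Task:\n{task}\n\n"
        f"Submission:\n{submission}\n\n"
        'Reply with JSON only: {"passed": true or false, "feedback": "<short text>"}'
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_verdict(ollama_body: dict) -> dict:
    """Extract the verdict, failing closed if the model's reply is malformed."""
    try:
        verdict = json.loads(ollama_body.get("response", ""))
        passed = verdict["passed"]
        if not isinstance(passed, bool):
            raise ValueError("'passed' must be a boolean")
        return {"passed": passed, "feedback": str(verdict.get("feedback", ""))}
    except (ValueError, KeyError, TypeError):
        return {"passed": False, "feedback": "Judge returned an unparseable verdict."}


def grade(task: str, submission: str) -> dict:
    """Send the request to a running Ollama instance and return the verdict."""
    body = json.dumps(build_judge_request(task, submission)).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return parse_verdict(json.load(response))
```

Failing closed on a malformed reply means a confused model cannot accidentally pass a submission; a human can still review the raw trace.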

📦

Dynamic Docker Sandboxes

Provide a rich development environment. The orchestrator dynamically spawns an air-gapped VS Code container for every candidate.

🛡️

Async Secure Evaluation

Grader scripts run in disposable, isolated containers on background workers, so untrusted candidate code never executes on the orchestrator itself and the risk of remote code execution is sharply reduced.

📊

Recruiter Intelligence

A beautiful dashboard to visualize pass rates, evaluate raw code submissions, and generate secure one-time invite links.
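One common way to implement one-time invite links is to hand out a random token, store only its hash, and delete the record on first use. The sketch below shows that pattern with an in-memory store; the class name `InviteStore`, the `/invite/` URL path, and the 24-hour default lifetime are hypothetical and are not taken from the Oasis codebase.

```python
import hashlib
import secrets
import time


class InviteStore:
    """In-memory stand-in for an invite table; only token hashes are stored."""

    def __init__(self):
        self._invites = {}  # sha256(token) -> expiry timestamp (seconds)

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def create(self, ttl_seconds: int = 86400, now: float = None) -> str:
        """Create an invite and return the raw token (shown to the recruiter once)."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._invites[self._digest(token)] = now + ttl_seconds
        return token

    def redeem(self, token: str, now: float = None) -> bool:
        """Return True only on the first redemption of an unexpired token."""
        now = time.time() if now is None else now
        # pop() removes the record, so a second attempt always fails.
        expires_at = self._invites.pop(self._digest(token), None)
        return expires_at is not None and now < expires_at


def invite_link(base_url: str, token: str) -> str:
    """Build the link sent to the candidate (the path is a hypothetical example)."""
    return f"{base_url.rstrip('/')}/invite/{token}"
```

Storing the hash rather than the token means a leaked database dump does not expose usable invite links.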

COMING NEXT

From Sandbox MVP to Enterprise Hiring OS

The next wave of Oasis turns realistic AI engineering simulations into a governed, audit-ready assessment platform for hiring teams.

Enterprise roadmap

AI assists evaluation. Humans make the hiring decision.

Upcoming workflows focus on structured evidence, reviewer calibration, compliance exports, and isolated candidate environments built for security reviews.

01 Structured Scorecards

Rubric dimensions, evidence trails, confidence, and reviewer override.

02 Secure Sandboxes

Per-session workspaces, hardened orchestration, and artifact snapshots.

03 Compliance Center

Candidate notices, audit packets, adverse-impact reporting, and retention controls.

04 ATS Integrations

SSO, SCIM, webhooks, and hiring-system sync for enterprise workflows.

1

Harden the MVP

Structured grader JSON, per-session workspace cloning, invite expiry, safer secrets, and core regression tests.

2

Enterprise Control Plane

Organizations, jobs, candidates, scoped RBAC, Postgres migrations, artifact storage, and audit logs.

3

Hiring-Grade Evaluation

Rubric builder, deterministic checks, trace analysis, LLM-assisted review, calibration, and human decision records.
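To make "structured grader JSON" and "human decision records" concrete, here is one possible shape for a scorecard in which the AI only recommends and a reviewer records the final decision. Every field name, the 0.0 to 1.0 score range, and the 0.7 threshold are assumptions for illustration; the roadmap does not specify a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DimensionScore:
    """One rubric dimension with the evidence that supports its score."""
    name: str
    score: float        # assumed range: 0.0 to 1.0
    confidence: float   # grader's self-reported confidence, 0.0 to 1.0
    evidence: List[str] = field(default_factory=list)


@dataclass
class Scorecard:
    session_id: str
    dimensions: List[DimensionScore]
    reviewer_decision: Optional[bool] = None  # set by a human, never by the grader

    def ai_recommendation(self, threshold: float = 0.7) -> bool:
        """Advisory only: True when every dimension meets the threshold."""
        return all(d.score >= threshold for d in self.dimensions)

    def decision(self) -> Optional[bool]:
        """The hiring decision is whatever the reviewer recorded, or None if pending."""
        return self.reviewer_decision

    def to_json(self) -> str:
        """Serialize the whole card, nested dimensions included."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping `ai_recommendation()` separate from `decision()` mirrors the stated principle that AI assists evaluation while humans make the hiring decision.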

6 Pre-Loaded Domains

Domain A: Agentic MCP

Debug a LangGraph financial agent stuck in a recursive tool-calling loop.

Domain B: RAG Systems

Fix a ChromaDB-backed RAG pipeline that produces hallucinated answers.

Domain C: LLM Security

Defend a customer support bot against malicious jailbreaks and prompt injections.

Domain D: MLOps

Refactor a blocking PyTorch endpoint into an optimized async endpoint backed by a global inference cache.

Domain E: AI SWE

Debug core Python algorithms using an AI coding assistant.

Domain F: Multi-Agent Systems

Evaluate orchestration, delegation, and failure recovery across cooperating AI agents.

Enterprise Architecture

Oasis is built to be secure, scalable, and fully air-gapped.

🧠

FastAPI Orchestrator

A high-performance asynchronous backend that handles RBAC, JWT authentication, and session management seamlessly.
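The mechanics behind JWT authentication with role checks can be sketched using only the standard library: sign a small claims payload with HS256, verify the signature and expiry, then compare the `role` claim against an allow-list. This is not the orchestrator's actual code; the claim names and the example roles are assumptions, and a real service would normally rely on a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def issue_token(secret: bytes, subject: str, role: str, ttl: int = 3600, now: int = None) -> str:
    """Create an HS256-signed JWT carrying 'sub', 'role', and 'exp' claims."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "role": role, "exp": now + ttl}).encode())
    signature = _b64url(_sign(secret, f"{header}.{payload}"))
    return f"{header}.{payload}.{signature}"


def verify_token(secret: bytes, token: str, now: int = None) -> dict:
    """Return the claims if signature, algorithm, and expiry all check out."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        signature = _b64url_decode(sig_b64)
        # Check the signature before trusting anything inside the token.
        if not hmac.compare_digest(_sign(secret, f"{header_b64}.{payload_b64}"), signature):
            raise PermissionError("bad signature")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError as exc:  # malformed structure, base64, or JSON
        raise PermissionError("malformed token") from exc
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    return claims


def require_role(claims: dict, allowed: set) -> None:
    """Role check: raise unless the token's role is in the allowed set."""
    if claims.get("role") not in allowed:
        raise PermissionError("insufficient role")
```

In a deployment, the signing secret would plausibly be the `SECRET_KEY` generated during Quick Start, read from the environment rather than hard-coded.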

🐳

Dynamic Sandboxing

Uses the Docker SDK to spin up completely isolated, ephemeral code-server IDE containers natively on the host.
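A minimal sketch of what spawning such a sandbox with the Docker SDK for Python might look like. The image tag, the network name `oasis-internal`, the container naming scheme, and the resource limits are all assumptions chosen for illustration; the actual orchestrator may configure its containers differently.

```python
def sandbox_run_kwargs(
    session_id: str,
    image: str = "codercom/code-server:latest",  # assumed image
    network: str = "oasis-internal",             # hypothetical internal-only network
) -> dict:
    """Keyword arguments for docker's containers.run(), one sandbox per session."""
    return {
        "image": image,
        "name": f"oasis-sandbox-{session_id}",
        "detach": True,                 # return immediately with a Container handle
        "auto_remove": True,            # ephemeral: delete the container when it stops
        "network": network,             # no route to the internet if the network is internal
        "mem_limit": "2g",
        "nano_cpus": 1_000_000_000,     # 1 CPU, expressed in units of 1e-9 CPUs
        "pids_limit": 256,              # blunt fork bombs
        "cap_drop": ["ALL"],
        "security_opt": ["no-new-privileges"],
        "labels": {"oasis.session": session_id},
    }


def spawn_sandbox(session_id: str):
    """Start the container; requires the 'docker' package and a running daemon."""
    import docker  # imported lazily so the helper above works without the SDK

    client = docker.from_env()
    return client.containers.run(**sandbox_run_kwargs(session_id))
```

The air gap in this sketch comes from the network rather than the container: it assumes `oasis-internal` was created beforehand with `client.networks.create("oasis-internal", internal=True)`, which gives attached containers no outbound route.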

⚡

Async Background Evaluation

Grader scripts run asynchronously inside strictly isolated, disposable containers, so untrusted code never executes in the orchestrator process and the risk of Remote Code Execution (RCE) is sharply reduced.

💾

Persistent Storage

Metadata and candidate trace logs are stored in SQLite databases on mounted volumes, so they survive container restarts and crashes.
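For a sense of how trace logs could be persisted, the sketch below uses Python's built-in `sqlite3` module with write-ahead logging, a common choice when a database file must tolerate abrupt shutdowns. The table and column names are hypothetical, and nothing here is taken from the real Oasis schema.

```python
import json
import sqlite3
import time


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the database; 'path' would point into the mounted volume."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # write-ahead log for safer recovery
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sessions (
            id        TEXT PRIMARY KEY,
            candidate TEXT NOT NULL,
            domain    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS trace_logs (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL REFERENCES sessions(id),
            created_at REAL NOT NULL,
            event      TEXT NOT NULL  -- JSON-encoded trace event
        );
        """
    )
    return conn


def log_trace(conn: sqlite3.Connection, session_id: str, event: dict) -> None:
    """Append one trace event; 'with conn' commits on success, rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO trace_logs (session_id, created_at, event) VALUES (?, ?, ?)",
            (session_id, time.time(), json.dumps(event)),
        )


def traces_for(conn: sqlite3.Connection, session_id: str) -> list:
    """Return a session's trace events in insertion order."""
    rows = conn.execute(
        "SELECT event FROM trace_logs WHERE session_id = ? ORDER BY id", (session_id,)
    )
    return [json.loads(row[0]) for row in rows]
```

Committing each event in its own transaction trades some write throughput for the guarantee that any event acknowledged before a crash is still on disk afterward.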

Roadmap architecture

Production deployments will move toward Postgres, object storage, queue-backed evaluation runners, managed secrets, OpenTelemetry, and sandbox orchestration through ECS/Fargate, Kubernetes, or another hardened worker layer.

Quick Start

Terminal

# 1. Clone the repository
git clone https://github.com/sumanthp/oasis.git
cd oasis

# 2. Generate a secure secret key
export SECRET_KEY=$(openssl rand -hex 32)

# 3. Spin up the orchestrator and AI Judge
docker-compose up -d --build

# 4. Open http://localhost:8000 and log in with admin/admin