Judge API Documentation - LLM as a Judge

Overview

Judge is an LLM-as-a-Judge SaaS platform that enables you to test web applications using AI agents running in secure E2B sandboxes. The AI agent browses your application, captures screenshots at each step, and uses Vision LLMs (like GPT-4 Vision) to evaluate quality, accessibility, UX, and more.

AI-Powered Testing

GPT-4 Vision & Claude 3 evaluate your UI

Secure Sandboxes

Tests run in isolated E2B environments

Visual Testing

Screenshots captured at every step

Pro Tip

Use Judge to test platform.lum.tools or any web application. Perfect for automated accessibility checks, regression testing, and UX validation.

Quick Start

1. Get Your API Key

Visit platform.lum.tools/account/api-keys to create an API key.

Bash

export JUDGE_API_KEY="lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export JUDGE_API="https://judge.lum.tools/api/v1"

2. Create Your First Test

cURL

# Create evaluation
EVAL_ID=$(curl -s -X POST $JUDGE_API/evaluations \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First Test",
    "model_provider": "openai",
    "model_name": "gpt-4-vision-preview"
  }' | jq -r '.id')

# Add browser test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/test-cases \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Homepage Test",
    "test_type": "browser",
    "browser_steps": [
      {"action": "navigate", "url": "https://platform.lum.tools"},
      {"action": "screenshot", "name": "homepage"}
    ],
    "evaluation_criteria": {
      "criteria": ["Is the homepage accessible?"]
    }
  }'

# Run test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/run \
  -H "Authorization: Bearer $JUDGE_API_KEY"

# Get results (after 30-60 seconds)
curl $JUDGE_API/evaluations/$EVAL_ID/results \
  -H "Authorization: Bearer $JUDGE_API_KEY" | jq

Authentication

All API requests require authentication using your platform.lum.tools API key.

Authorization Header

Authorization: Bearer lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Access Requirements

Judge API requires a Lum Pro subscription. Free tier users will receive a 403 Forbidden error.

API Endpoints

Evaluations

GET /api/v1/evaluations

List all evaluations for the authenticated user.

POST /api/v1/evaluations

Create a new evaluation suite.

GET /api/v1/evaluations/{id}

Get evaluation summary and status.

GET /api/v1/evaluations/{id}/results New

Get detailed results including LLM feedback, screenshots, and scores.

Test Cases

POST /api/v1/evaluations/{id}/test-cases

Add a browser test case to an evaluation.

Execution

POST /api/v1/evaluations/{id}/run

Queue evaluation for execution. Tests run in E2B sandbox.

Full API Reference (Swagger) Alternative View (ReDoc)

Code Examples

cURL - Complete Workflow

#!/bin/bash

# Step 1: Create evaluation
EVAL_ID=$(curl -s -X POST https://judge.lum.tools/api/v1/evaluations \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Homepage Accessibility Test",
    "model_provider": "openai",
    "model_name": "gpt-4-vision-preview"
  }' | jq -r '.id')

echo "Created evaluation: $EVAL_ID"

# Step 2: Add test case
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/test-cases \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Visual Accessibility Check",
    "test_type": "browser",
    "browser_steps": [
      {"action": "navigate", "url": "https://platform.lum.tools"},
      {"action": "wait", "duration": 2000},
      {"action": "screenshot", "name": "homepage"}
    ],
    "evaluation_criteria": {
      "criteria": [
        "Is the navigation accessible?",
        "Are all elements labeled?",
        "Is the contrast sufficient?"
      ]
    }
  }'

# Step 3: Run test
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/run \
  -H "Authorization: Bearer $JUDGE_API_KEY"

# Step 4: Wait and get results
sleep 60
curl https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/results \
  -H "Authorization: Bearer $JUDGE_API_KEY" | jq

Python - Using httpx

import httpx
import asyncio
import os

JUDGE_API_KEY = os.getenv("JUDGE_API_KEY")
JUDGE_API = "https://judge.lum.tools/api/v1"

async def run_test():
    async with httpx.AsyncClient() as client:
        # Create evaluation
        response = await client.post(
            f"{JUDGE_API}/evaluations",
            headers={"Authorization": f"Bearer {JUDGE_API_KEY}"},
            json={
                "name": "Homepage Test",
                "model_provider": "openai",
                "model_name": "gpt-4-vision-preview"
            }
        )
        eval_id = response.json()["id"]
        print(f"Created evaluation: {eval_id}")

        # Add test case
        await client.post(
            f"{JUDGE_API}/evaluations/{eval_id}/test-cases",
            headers={"Authorization": f"Bearer {JUDGE_API_KEY}"},
            json={
                "name": "Visual Check",
                "test_type": "browser",
                "browser_steps": [
                    {"action": "navigate", "url": "https://platform.lum.tools"},
                    {"action": "screenshot", "name": "page"}
                ],
                "evaluation_criteria": {
                    "criteria": ["Is it accessible?"]
                }
            }
        )

        # Run test
        await client.post(
            f"{JUDGE_API}/evaluations/{eval_id}/run",
            headers={"Authorization": f"Bearer {JUDGE_API_KEY}"}
        )

        # Wait for results
        await asyncio.sleep(60)

        # Get results
        results = await client.get(
            f"{JUDGE_API}/evaluations/{eval_id}/results",
            headers={"Authorization": f"Bearer {JUDGE_API_KEY}"}
        )
        print(results.json())

asyncio.run(run_test())

JavaScript - Using fetch

const JUDGE_API_KEY = process.env.JUDGE_API_KEY;
const JUDGE_API = "https://judge.lum.tools/api/v1";

async function runTest() {
  const headers = {
    "Authorization": `Bearer ${JUDGE_API_KEY}`,
    "Content-Type": "application/json"
  };

  // Create evaluation
  const evalResponse = await fetch(`${JUDGE_API}/evaluations`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      name: "Homepage Test",
      model_provider: "openai",
      model_name: "gpt-4-vision-preview"
    })
  });
  const { id: evalId } = await evalResponse.json();
  console.log(`Created evaluation: ${evalId}`);

  // Add test case
  await fetch(`${JUDGE_API}/evaluations/${evalId}/test-cases`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      name: "Visual Check",
      test_type: "browser",
      browser_steps: [
        { action: "navigate", url: "https://platform.lum.tools" },
        { action: "screenshot", name: "page" }
      ],
      evaluation_criteria: {
        criteria: ["Is it accessible?"]
      }
    })
  });

  // Run test
  await fetch(`${JUDGE_API}/evaluations/${evalId}/run`, {
    method: "POST",
    headers
  });

  // Wait for results
  await new Promise(resolve => setTimeout(resolve, 60000));

  // Get results
  const resultsResponse = await fetch(
    `${JUDGE_API}/evaluations/${evalId}/results`,
    { headers }
  );
  const results = await resultsResponse.json();
  console.log(results);
}

runTest();

Browser Actions

Define test steps using these browser actions:

Action	Parameters	Example
`navigate`	`url`	`{"action": "navigate", "url": "https://example.com"}`
`click`	`selector, wait`	`{"action": "click", "selector": "#button"}`
`type`	`selector, value`	`{"action": "type", "selector": "#email", "value": "test@example.com"}`
`screenshot`	`name`	`{"action": "screenshot", "name": "homepage"}`
`wait`	`duration (ms)`	`{"action": "wait", "duration": 2000}`

Error Handling

401

Unauthorized

Invalid or missing API key

403

Forbidden

Requires Lum Pro subscription

404

Not Found

Evaluation ID does not exist or doesn't belong to you

400

Bad Request

Invalid JSON or missing required fields