Judge API

LLM as a Judge - AI-Powered Web Testing

Overview

Judge is an LLM-as-a-Judge SaaS platform that enables you to test web applications using AI agents running in secure E2B sandboxes. The AI agent browses your application, captures screenshots at each step, and uses Vision LLMs (like GPT-4 Vision) to evaluate quality, accessibility, UX, and more.

AI-Powered Testing

GPT-4 Vision & Claude 3 evaluate your UI

Secure Sandboxes

Tests run in isolated E2B environments

Visual Testing

Screenshots captured at every step

Pro Tip

Use Judge to test platform.lum.tools or any web application. Perfect for automated accessibility checks, regression testing, and UX validation.

Quick Start

1. Get Your API Key

Visit platform.lum.tools/account/api-keys to create an API key.

Bash
export JUDGE_API_KEY="lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export JUDGE_API="https://judge.lum.tools/api/v1"

2. Create Your First Test

cURL
# Create evaluation
EVAL_ID=$(curl -s -X POST $JUDGE_API/evaluations \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First Test",
    "model_provider": "openai",
    "model_name": "gpt-4-vision-preview"
  }' | jq -r '.id')

# Add browser test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/test-cases \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Homepage Test",
    "test_type": "browser",
    "browser_steps": [
      {"action": "navigate", "url": "https://platform.lum.tools"},
      {"action": "screenshot", "name": "homepage"}
    ],
    "evaluation_criteria": {
      "criteria": ["Is the homepage accessible?"]
    }
  }'

# Run test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/run \
  -H "Authorization: Bearer $JUDGE_API_KEY"

# Get results (after 30-60 seconds)
curl $JUDGE_API/evaluations/$EVAL_ID/results \
  -H "Authorization: Bearer $JUDGE_API_KEY" | jq

Authentication

All API requests require authentication using your platform.lum.tools API key.

Authorization Header

Authorization: Bearer lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Access Requirements

Judge API requires a Lum Pro subscription. Free tier users will receive a 403 Forbidden error.

API Endpoints

Evaluations

GET /api/v1/evaluations

List all evaluations for the authenticated user.

POST /api/v1/evaluations

Create a new evaluation suite.

GET /api/v1/evaluations/{id}

Get evaluation summary and status.

GET /api/v1/evaluations/{id}/results New

Get detailed results including LLM feedback, screenshots, and scores.

Test Cases

POST /api/v1/evaluations/{id}/test-cases

Add a browser test case to an evaluation.

Execution

POST /api/v1/evaluations/{id}/run

Queue evaluation for execution. Tests run in E2B sandbox.

Code Examples

cURL - Complete Workflow
#!/bin/bash

# Step 1: Create evaluation
EVAL_ID=$(curl -s -X POST https://judge.lum.tools/api/v1/evaluations \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Homepage Accessibility Test",
    "model_provider": "openai",
    "model_name": "gpt-4-vision-preview"
  }' | jq -r '.id')

echo "Created evaluation: $EVAL_ID"

# Step 2: Add test case
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/test-cases \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Visual Accessibility Check",
    "test_type": "browser",
    "browser_steps": [
      {"action": "navigate", "url": "https://platform.lum.tools"},
      {"action": "wait", "duration": 2000},
      {"action": "screenshot", "name": "homepage"}
    ],
    "evaluation_criteria": {
      "criteria": [
        "Is the navigation accessible?",
        "Are all elements labeled?",
        "Is the contrast sufficient?"
      ]
    }
  }'

# Step 3: Run test
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/run \
  -H "Authorization: Bearer $JUDGE_API_KEY"

# Step 4: Wait and get results
sleep 60
curl https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/results \
  -H "Authorization: Bearer $JUDGE_API_KEY" | jq

Browser Actions

Define test steps using these browser actions:

Action Parameters Example
navigate url {"action": "navigate", "url": "https://example.com"}
click selector, wait {"action": "click", "selector": "#button"}
type selector, value {"action": "type", "selector": "#email", "value": "test@example.com"}
screenshot name {"action": "screenshot", "name": "homepage"}
wait duration (ms) {"action": "wait", "duration": 2000}

Error Handling

401

Unauthorized

Invalid or missing API key

403

Forbidden

Requires Lum Pro subscription

404

Not Found

Evaluation ID does not exist or doesn't belong to you

400

Bad Request

Invalid JSON or missing required fields