Overview
Judge is an LLM-as-a-Judge SaaS platform that enables you to test web applications using AI agents running in secure E2B sandboxes. The AI agent browses your application, captures screenshots at each step, and uses Vision LLMs (like GPT-4 Vision) to evaluate quality, accessibility, UX, and more.
AI-Powered Testing
GPT-4 Vision & Claude 3 evaluate your UI
Secure Sandboxes
Tests run in isolated E2B environments
Visual Testing
Screenshots captured at every step
Pro Tip
Use Judge to test platform.lum.tools or any web application. Perfect for automated accessibility checks, regression testing, and UX validation.
Quick Start
1. Get Your API Key
Visit platform.lum.tools/account/api-keys to create an API key.
export JUDGE_API_KEY="lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export JUDGE_API="https://judge.lum.tools/api/v1"
2. Create Your First Test
# Create evaluation
EVAL_ID=$(curl -s -X POST $JUDGE_API/evaluations \
-H "Authorization: Bearer $JUDGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "My First Test",
"model_provider": "openai",
"model_name": "gpt-4-vision-preview"
}' | jq -r '.id')
# Add browser test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/test-cases \
-H "Authorization: Bearer $JUDGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Homepage Test",
"test_type": "browser",
"browser_steps": [
{"action": "navigate", "url": "https://platform.lum.tools"},
{"action": "screenshot", "name": "homepage"}
],
"evaluation_criteria": {
"criteria": ["Is the homepage accessible?"]
}
}'
# Run test
curl -X POST $JUDGE_API/evaluations/$EVAL_ID/run \
-H "Authorization: Bearer $JUDGE_API_KEY"
# Get results (after 30-60 seconds)
curl $JUDGE_API/evaluations/$EVAL_ID/results \
-H "Authorization: Bearer $JUDGE_API_KEY" | jq
Authentication
All API requests require authentication using your platform.lum.tools API key.
Authorization Header
Authorization: Bearer lum_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Access Requirements
Judge API requires a Lum Pro subscription. Free tier users will receive a 403 Forbidden error.
API Endpoints
Evaluations
/api/v1/evaluations
List all evaluations for the authenticated user.
/api/v1/evaluations
Create a new evaluation suite.
/api/v1/evaluations/{id}
Get evaluation summary and status.
/api/v1/evaluations/{id}/results
New
Get detailed results including LLM feedback, screenshots, and scores.
Test Cases
/api/v1/evaluations/{id}/test-cases
Add a browser test case to an evaluation.
Execution
/api/v1/evaluations/{id}/run
Queue evaluation for execution. Tests run in E2B sandbox.
Code Examples
#!/bin/bash
# Step 1: Create evaluation
EVAL_ID=$(curl -s -X POST https://judge.lum.tools/api/v1/evaluations \
-H "Authorization: Bearer $JUDGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Homepage Accessibility Test",
"model_provider": "openai",
"model_name": "gpt-4-vision-preview"
}' | jq -r '.id')
echo "Created evaluation: $EVAL_ID"
# Step 2: Add test case
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/test-cases \
-H "Authorization: Bearer $JUDGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Visual Accessibility Check",
"test_type": "browser",
"browser_steps": [
{"action": "navigate", "url": "https://platform.lum.tools"},
{"action": "wait", "duration": 2000},
{"action": "screenshot", "name": "homepage"}
],
"evaluation_criteria": {
"criteria": [
"Is the navigation accessible?",
"Are all elements labeled?",
"Is the contrast sufficient?"
]
}
}'
# Step 3: Run test
curl -X POST https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/run \
-H "Authorization: Bearer $JUDGE_API_KEY"
# Step 4: Wait and get results
sleep 60
curl https://judge.lum.tools/api/v1/evaluations/$EVAL_ID/results \
-H "Authorization: Bearer $JUDGE_API_KEY" | jq
Browser Actions
Define test steps using these browser actions:
| Action | Parameters | Example |
|---|---|---|
navigate
|
url
|
{"action": "navigate", "url": "https://example.com"}
|
click
|
selector, wait
|
{"action": "click", "selector": "#button"}
|
type
|
selector, value
|
{"action": "type", "selector": "#email", "value": "test@example.com"}
|
screenshot
|
name
|
{"action": "screenshot", "name": "homepage"}
|
wait
|
duration (ms)
|
{"action": "wait", "duration": 2000}
|
Error Handling
Unauthorized
Invalid or missing API key
Forbidden
Requires Lum Pro subscription
Not Found
Evaluation ID does not exist or doesn't belong to you
Bad Request
Invalid JSON or missing required fields