2025-12-24

Building a Plug-and-Play Digital Proctoring Solution: Scaling to 50K Concurrent Users

How we built an independent, plug-and-play proctoring solution that handles 50,000 concurrent users—and the technical decisions that made it possible.

Digital proctoring is essential for maintaining integrity in online assessments, but existing solutions often come with significant limitations: high costs, scalability issues, and lack of customization. This is how we architected a plug-and-play proctoring system that handles 50k concurrent users, costs approximately ₹0.35 per user, and provides full control over the experience.

The solution: A lightweight SDK that integrates into any web application via a simple script tag, with a scalable backend that can handle massive concurrent loads.


Use Cases

This proctoring solution is designed as a plug-and-play system that can be integrated into various educational and assessment platforms:

  1. Online Examinations: High-stakes competitive exams, certification tests, and academic assessments requiring strict integrity monitoring.
  2. Live Class Attentiveness: Monitor student engagement during live online classes by tracking presence, attention levels, and participation patterns.
  3. Remote Interviews: Ensure candidate authenticity during remote hiring processes.
  4. Training & Certification: Track completion and authenticity for professional development courses.
  5. Adaptive Learning Assessments: Monitor student behavior during adaptive learning sessions to ensure genuine engagement.

The system is designed to be non-intrusive, with minimal performance impact on the host application, making it suitable for long-duration sessions (3+ hours) without degrading user experience.


The Architecture: Client-Side Capture, Server-Side Orchestration

We designed a system with clear separation of concerns:

┌────────────────────────────────────────────────────────────────────┐
│                CANDIDATE BROWSER (Host Application)                │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │              PROCTOR SDK (Script Tag Injection)              │  │
│  │                                                              │  │
│  │  • Permission checks (camera, mic, screen)                   │  │
│  │  • Face captures every 30s (25KB JPEG)                       │  │
│  │  • Screen screenshots every 30s (25KB JPEG)                  │  │
│  │  • Audio recording (continuous, 12kbps Opus)                 │  │
│  │  • Event monitoring (tab switches, keystrokes)               │  │
│  │  • IndexedDB buffer for offline resilience                   │  │
│  │                                                              │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐              │  │
│  │  │   Media    │  │   Event    │  │  IndexedDB │              │  │
│  │  │  Capture   │─▶│ Collector  │─▶│   Buffer   │              │  │
│  │  │ (Workers)  │  │            │  │ (Offline)  │              │  │
│  │  └────────────┘  └────────────┘  └───────┬────┘              │  │
│  └──────────────────────────────────────────┼──────────────────┘  │
└──────────────────────────────────────────────┼─────────────────────┘
                                              │
                    ┌─────────────────────────┼──────────────────────────┐
                    │                         │                          │
                    ▼                         │                          ▼
┌───────────────────────────────┐             │       ┌───────────────────────────┐
│  PROCTOR BACKEND SERVICE (Go) │             │       │      S3 (Direct Upload)   │
│                               │             │       │        ap-south-1         │
│  • GET /api/v1/config         │             │       │                           │
│  • POST /api/v1/credentials   │◄────────────┘       │  /{date}/{test_id}/       │
│  • POST /api/v1/events        │                     │    {candidate_id}/        │
│  • GET /api/v1/dashboard/*    │                     │      camera/  (25KB)      │
│                               │                     │      screen/  (25KB)      │
│                               │                     │      audio/  (14KB/30s)   │
│                               │                     └───────────────────────────┘
│              │                │
│              ▼                │
│  ┌─────────────────────────┐  │
│  │        Kafka            │  │
│  │  topic: proctor_events  │  │
│  └───────────┬─────────────┘  │
│              │                │
│              ▼                │
│  ┌─────────────────────────┐  │
│  │     Kafka Connect       │  │
│  └───────────┬─────────────┘  │
│              │                │
│              ▼                │
│  ┌─────────────────────────┐  │
│  │      ClickHouse         │  │
│  │  table: proctor_events  │  │
│  └─────────────────────────┘  │
└───────────────────────────────┘

The key insight: media uploads go directly to S3, bypassing the backend entirely, while events flow through the backend, which validates them and publishes them to Kafka; Kafka Connect then streams them into ClickHouse.


Critical Design Decision: Non-Blocking by Default

This was our most important architectural principle: permission issues block the test; infrastructure failures never do.

Blocking Scenarios (Candidate Must Act)

  • Permission denied (camera, mic, screen)
  • Full screen exited
  • Screen share stopped
  • Unsupported browser

These are compliance issues. The candidate must fix them to continue.

Non-Blocking Scenarios (Graceful Degradation)

  • Backend API unavailable → Events buffered in IndexedDB
  • S3 upload failure → Retry with exponential backoff
  • Kafka/ClickHouse down → Backend buffers internally
  • STS credential failure → Retry 3x, then disable S3 uploads

The golden rule: A candidate should NEVER be blocked from taking their test due to a failure in the proctoring infrastructure.
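In code, the split boils down to a small allow-list the SDK checks before deciding whether to overlay the test. A minimal sketch in TypeScript; the event names here are illustrative, not the SDK's actual identifiers:

```typescript
// Event names are illustrative, not the SDK's actual identifiers.
const BLOCKING_EVENTS = new Set([
  "permission_denied",    // camera, mic, or screen permission missing
  "fullscreen_exited",
  "screen_share_stopped",
  "unsupported_browser",
]);

// Infrastructure failures never appear in the set, so they degrade gracefully.
function shouldBlockTest(event: string): boolean {
  return BLOCKING_EVENTS.has(event);
}
```

Anything not in the set, such as a failed S3 upload or an unreachable backend, falls through to the retry-and-buffer path instead of an overlay.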

This design choice meant we had to build robust retry logic, offline buffering, and graceful degradation at every layer.
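The retry logic itself can be sketched as capped exponential backoff with full jitter. The base delay and cap below are illustrative defaults, not the production values:

```typescript
// Capped exponential backoff with full jitter.
// baseMs and maxMs are illustrative defaults, not the production values.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped
  return Math.random() * ceiling; // full jitter avoids synchronized retry storms
}
```

Full jitter matters at this scale: if the backend blips, 50,000 clients retrying in lockstep would knock it over again the moment it recovered.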


The SDK: Performance-Critical Client-Side Code

The SDK runs in the candidate's browser for 3+ hours. It must have negligible impact on the test-taking experience.

Performance Budgets

Metric                Budget              How We Achieved It
Main thread blocking  <16ms per frame     All heavy ops in Web Workers
Memory usage          <50MB steady state  Aggressive cleanup, limit buffers
CPU usage             <5% average         Throttle captures, use requestIdleCallback
Bundle size           <100KB gzipped      Tree-shaking, lazy loading
Network per minute    <200KB              Aggressive compression, batching
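One of those budget line items, throttling captures so they never overlap, can be sketched as a small guard (class and method names are illustrative):

```typescript
// Skip a capture if the previous one is still in flight or fired too recently.
// intervalMs mirrors the 30s capture cadence; names are illustrative.
class CaptureThrottle {
  private lastStart = -Infinity;
  private inFlight = false;

  constructor(private intervalMs: number) {}

  // Returns true if a new capture may begin at timestamp `now` (ms).
  tryStart(now: number): boolean {
    if (this.inFlight || now - this.lastStart < this.intervalMs) return false;
    this.inFlight = true;
    this.lastStart = now;
    return true;
  }

  finish(): void {
    this.inFlight = false;
  }
}
```

The guard keeps a slow compression or upload from stacking a second capture on top of the first, which is what would otherwise blow the CPU and memory budgets.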

Thread Model

We moved all heavy computation off the main thread:

┌─────────────────────────────────────────────────────────┐
│                    MAIN THREAD                          │
│  • Event listeners (minimal)                            │
│  • Permission prompts                                   │
│  • Blocking overlays                                    │
│  • State coordination                                   │
│                                                         │
│  NO heavy computation here!                             │
└─────────────────────────────────────────────────────────┘
                        │
                        │ Message passing
                        │
┌─────────────────────────────────────────────────────────┐
│                    WEB WORKERS                          │
│                                                         │
│  ┌──────────────────────┐                               │
│  │  ImageWorker.js      │                               │
│  │  • JPEG compression  │                               │
│  │  • Canvas resizing   │                               │
│  │  • Face: 25KB        │                               │
│  │  • Screen: 25KB      │                               │
│  └──────────────────────┘                               │
│                                                         │
│  ┌──────────────────────┐                               │
│  │  AudioWorker.js      │                               │
│  │  • Opus encoding     │                               │
│  │  • Chunk processing  │                               │
│  └──────────────────────┘                               │
│                                                         │
│  ┌──────────────────────┐                               │
│  │  UploadWorker.js     │                               │
│  │  • S3 PUT requests   │                               │
│  │  • Retry logic       │                               │
│  └──────────────────────┘                               │
└─────────────────────────────────────────────────────────┘

Image Compression Strategy

We compress images aggressively in a Web Worker:

// ImageWorker.js - runs off the main thread
self.onmessage = async (e) => {
  const { imageData, type, quality, maxWidth } = e.data;

  // Resize maintaining aspect ratio (never upscale)
  const bitmap = await createImageBitmap(imageData);
  const scale = Math.min(maxWidth / bitmap.width, 1);

  // OffscreenCanvas: no DOM access needed inside a worker
  const canvas = new OffscreenCanvas(bitmap.width * scale, bitmap.height * scale);
  const ctx = canvas.getContext("2d");
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);
  bitmap.close(); // Release memory immediately

  // Compress to a JPEG blob
  const blob = await canvas.convertToBlob({
    type: "image/jpeg",
    quality: quality ?? (type === "face" ? 0.4 : 0.35), // both target ~25KB
  });

  // Blobs are not transferable objects, so no transfer list here;
  // structured-cloning a Blob is cheap (it does not copy the bytes)
  self.postMessage({ blob, type });
};

Result: face captures and screen screenshots are each ~25KB. For a 3-hour test with captures every 30 seconds, that's 360 capture pairs × 50KB ≈ 18MB per candidate, a fraction of what uncompressed frames would cost.

Audio Recording: Real-Time Chunks

We upload audio in real-time chunks (every 30 seconds) rather than one large file at the end. This ensures minimal data loss if the candidate closes the tab.

class AudioRecorder {
  private readonly CHUNK_INTERVAL_MS = 30000; // 30 seconds per chunk
  private mediaRecorder?: MediaRecorder;

  async start(stream: MediaStream) {
    this.mediaRecorder = new MediaRecorder(stream, {
      mimeType: "audio/webm;codecs=opus",
      audioBitsPerSecond: 12000, // 12kbps target; Opus VBR typically lands below this
    });

    this.mediaRecorder.ondataavailable = async (e) => {
      if (e.data.size > 0) {
        await this.uploadChunk(e.data); // S3 PUT with retry (implemented elsewhere)
      }
    };

    // Fire ondataavailable every 30 seconds
    this.mediaRecorder.start(this.CHUNK_INTERVAL_MS);
  }
}

Each chunk is ~14KB in practice. (The 12kbps setting is a target bitrate; Opus VBR output for speech with pauses typically comes in well under the ~45KB that 30 seconds at a constant 12kbps would produce.) For a 3-hour test: 360 chunks × 14KB ≈ 5MB total.

Benefits:

  • Only last partial chunk lost on tab close (~30s max)
  • No need for IndexedDB recovery of large files
  • Smaller retry units if upload fails
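The chunk arithmetic can be sanity-checked in a few lines, using the budgeted ~14KB-per-chunk figure from above:

```typescript
// 30-second chunks over a 3-hour test, at the budgeted ~14KB per chunk.
const TEST_SECONDS = 3 * 3600;
const CHUNK_SECONDS = 30;
const CHUNK_KB = 14;

const chunks = TEST_SECONDS / CHUNK_SECONDS; // 360 chunks per candidate
const totalKB = chunks * CHUNK_KB;           // 5,040 KB ≈ 5MB per candidate
```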

Backend: Stateless and Scalable

The backend service is written in Go and designed to be horizontally scalable.

Key Endpoints

POST /api/v1/credentials - Issues STS temporary credentials for S3 uploads

func (s *STSService) GenerateCredentials(ctx context.Context, testID, candidateID string) (*Credentials, error) {
    date := time.Now().Format("2006-01-02")
    prefix := fmt.Sprintf("%s/%s/%s/", date, testID, candidateID)

    // Session policy scopes credentials to the candidate's folder only
    sessionPolicy := fmt.Sprintf(`{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::proctor-media-bucket/%s*"
        }]
    }`, prefix)

    input := &sts.AssumeRoleInput{
        RoleArn:         aws.String("arn:aws:iam::ACCOUNT_ID:role/ProctorMediaUploadRole"),
        RoleSessionName: aws.String(fmt.Sprintf("proctor-%s-%s", testID, candidateID)),
        DurationSeconds: aws.Int32(14400), // 4 hours (covers a 3-hour test plus buffer)
        Policy:          aws.String(sessionPolicy),
    }

    result, err := s.stsClient.AssumeRole(ctx, input)
    if err != nil {
        return nil, err
    }
    // ... map result.Credentials into our Credentials type and return
}

POST /api/v1/events - Ingests batched events from SDK

The backend validates events, enriches them with server timestamps, and publishes them to Kafka. Events are then consumed by Kafka Connect, which streams them to ClickHouse for real-time analytics. The response includes shutdown signals if proctoring is disabled:

func HandleEvents(w http.ResponseWriter, r *http.Request) {
    // Extracted from the JWT by auth middleware
    candidateID := r.Context().Value("candidate_id").(string)
    testID := r.FormValue("test_id")

    // Check flags BEFORE processing events (config resolved from AppConfig)
    config, err := configStore.GetTestConfig(testID, candidateID)
    if err != nil || !config.ProctoringEnabled || config.CandidateBypass {
        json.NewEncoder(w).Encode(map[string]interface{}{
            "status": "shutdown",
            "reason": determineReason(config),
            "action": "graceful_shutdown",
        })
        return
    }

    // Decode the batched events from the request body
    var events []ProctorEvent
    if err := json.NewDecoder(r.Body).Decode(&events); err != nil {
        http.Error(w, "invalid payload", http.StatusBadRequest)
        return
    }

    // Validate, enrich with server timestamps, publish to Kafka...
    processEvents(events)
}

This approach uses the existing events channel (called every 5s) to signal shutdown, avoiding the need for separate config polling or WebSocket infrastructure.
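On the SDK side, handling that piggybacked signal might look like this; the field names mirror the JSON above, while the function name is hypothetical:

```typescript
// Field names mirror the backend's JSON response; handleEventsResponse is
// a hypothetical SDK-side hook, not the SDK's actual API.
interface EventsResponse {
  status: string;
  reason?: string;
  action?: string;
}

// Returns true when the SDK should tear down: stop capture loops, flush
// buffered events, and remove any overlays.
function handleEventsResponse(res: EventsResponse): boolean {
  return res.status === "shutdown" && res.action === "graceful_shutdown";
}
```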


Configuration: AWS AppConfig for Instant Propagation

We store all proctoring configuration in AWS AppConfig rather than a database. This gives us:

  1. Instant propagation - Config changes reflect immediately
  2. Built-in rollback - If something breaks, auto-rollback
  3. Feature flags - Native support for gradual rollouts
  4. No database dependency - One less system to manage

Configuration Profiles

Global Defaults:

{
  "defaults": {
    "proctoring_enabled": true,
    "capture_intervals": {
      "face_ms": 30000,
      "screen_ms": 30000
    },
    "blocking": {
      "on_permission_denied": true,
      "on_fullscreen_exit": true
    }
  }
}

Test Overrides:

{
  "tests": {
    "TEST123": {
      "proctoring_enabled": true,
      "capture_intervals": {
        "face_ms": 15000,
        "screen_ms": 60000
      }
    }
  }
}

Candidate Bypass:

{
  "bypasses": {
    "TEST123": {
      "CAND456": {
        "bypassed": true,
        "reason": "Technical issues with camera",
        "bypassed_by": "admin@example.com"
      }
    }
  }
}

The backend resolves config in this order: global defaults → test overrides → candidate bypass. If bypass is enabled, the SDK receives a shutdown signal on its next events call.
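That resolution order can be sketched as a small merge; the shapes mirror the JSON profiles above, and the helper name is illustrative:

```typescript
// Shapes mirror the AppConfig profiles above; resolveConfig is illustrative.
interface CaptureIntervals {
  face_ms: number;
  screen_ms: number;
}

interface ProctorConfig {
  proctoring_enabled: boolean;
  capture_intervals: CaptureIntervals;
}

interface TestOverride {
  proctoring_enabled?: boolean;
  capture_intervals?: Partial<CaptureIntervals>;
}

// Resolution order: global defaults → test overrides → candidate bypass.
function resolveConfig(
  defaults: ProctorConfig,
  override: TestOverride | undefined,
  bypassed: boolean
): ProctorConfig {
  const resolved: ProctorConfig = {
    ...defaults,
    ...override,
    capture_intervals: {
      ...defaults.capture_intervals,
      ...override?.capture_intervals,
    },
  };
  // A candidate bypass wins over everything else.
  if (bypassed) resolved.proctoring_enabled = false;
  return resolved;
}
```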


Scale: Handling 50K Concurrent Candidates

Let's break down the load:

Scenario: 50,000 candidates, 3-hour test, captures every 30 seconds

Metric                            Calculation                    Value
Face captures per candidate       3 hours × 2/min                360
Screen screenshots per candidate  3 hours × 2/min                360
Audio chunks per candidate        3 hours × 2/min (30s chunks)   360
Total face captures               50,000 × 360                   18,000,000
Total screen screenshots          50,000 × 360                   18,000,000
Total audio chunks                50,000 × 360                   18,000,000
Other events (~100/candidate)     50,000 × 100                   5,000,000
Total events                      18M + 18M + 5M                 ~41,000,000
Storage per test                  (36M × 25KB) + (50K × 5MB)     ~1.15 TB
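The count rows of the table follow directly from the capture cadence; a quick sanity check:

```typescript
// Recomputing the per-test counts from the stated assumptions.
const CANDIDATES = 50_000;
const TEST_HOURS = 3;
const CAPTURE_INTERVAL_S = 30;

const capturesPerCandidate = (TEST_HOURS * 3600) / CAPTURE_INTERVAL_S; // 360
const totalFace = CANDIDATES * capturesPerCandidate;      // 18,000,000
const totalScreen = CANDIDATES * capturesPerCandidate;    // 18,000,000
const totalOther = CANDIDATES * 100;                      // 5,000,000
const totalEvents = totalFace + totalScreen + totalOther; // 41,000,000
```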

QPS Analysis

Phase                Duration                    Requests     QPS
Credential burst     2 min (instruction window)  50,000       ~417
Steady state events  3 hours                     23M batched  ~2,130
S3 uploads (face)    3 hours                     18M          ~1,667
S3 uploads (screen)  3 hours                     18M          ~1,667
S3 uploads (audio)   3 hours                     18M          ~1,667

Total S3 uploads: ~54M PUT requests. But since uploads go directly from client to S3, the backend doesn't see this load.

Scaling Strategy

Component        Strategy
Proctor Backend  Horizontal scaling (stateless), K8s HPA
S3 Uploads       Direct client-to-S3 (infinitely scalable)
Kafka            Kafka cluster with topic partitioning
Kafka Connect    Streams events from Kafka to ClickHouse
ClickHouse       ClickHouse cluster for analytics storage

We use Kubernetes Horizontal Pod Autoscaler (HPA) to scale the backend based on CPU and memory:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: proctor-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: proctor-backend   # backend Deployment name (illustrative)
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Scaling Estimates (50k Candidates):

Phase               Duration  Pods Needed  CPU Total
Test Start (burst)  5 min     15-20        15-20 vCPU
Steady State        3 hours   8-10         4-5 vCPU
Test End (burst)    10 min    10-12        5-6 vCPU
Idle (no test)      -         3            0.15 vCPU

Storage: S3 with Lifecycle Policies

We store all media in S3 with a date-prefixed structure for easy lifecycle management:

s3://proctor-media-bucket/
└── 2024-12-22/                         # Date prefix
    └── TEST123/
        └── CAND456/
            ├── camera/                 # Face captures (25KB each)
            │   ├── base_truth_1703234567890.jpg
            │   └── 1703234597890_uuid-1.jpg
            ├── screen/                 # Screen screenshots (25KB each)
            │   └── 1703234597890_uuid-2.jpg
            └── audio/                  # Audio chunks (~14KB each)
                └── audio_1703234567890/
                    ├── chunk_00000.webm
                    ├── chunk_00001.webm
                    └── chunk_00359.webm
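A small helper for building keys in this layout might look like the following (the function name is hypothetical; the layout matches the tree above):

```typescript
// Builds an object key matching the bucket layout above.
// mediaKey is a hypothetical helper, not part of the actual SDK.
type MediaKind = "camera" | "screen" | "audio";

function mediaKey(
  date: string, // e.g. "2024-12-22"
  testId: string,
  candidateId: string,
  kind: MediaKind,
  filename: string
): string {
  return `${date}/${testId}/${candidateId}/${kind}/${filename}`;
}
```

Because the STS session policy is scoped to `{date}/{test_id}/{candidate_id}/`, every key this produces stays inside the candidate's own prefix.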

Lifecycle Policy:

  • 0-10 days: Standard storage ($0.023/GB)
  • 10-30 days: Glacier Instant Retrieval ($0.004/GB)
  • 30+ days: Deleted

This reduces storage costs by ~83% after 10 days.
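The ~83% figure falls straight out of the two storage prices:

```typescript
// S3 Standard vs Glacier Instant Retrieval, USD per GB-month.
const STANDARD_PER_GB = 0.023;
const GLACIER_IR_PER_GB = 0.004;

const savings = 1 - GLACIER_IR_PER_GB / STANDARD_PER_GB; // ≈ 0.826, i.e. ~83%
```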


Cost Estimation

Per Test (50k users):

Item                             Cost (INR)
S3 PUT requests (54M requests)   ₹15,000
S3 Storage (10 days Standard)    ₹665
S3 Storage (20 days Glacier IR)  ₹235
Kafka (marginal)                 ₹1,000
Compute (burst handling)         ₹830
Total per test                   ~₹17,730

Cost per user: Approximately ₹0.35 per test session.
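That per-user figure is just the test total divided by the cohort:

```typescript
// Per-test total from the table, divided across the cohort.
const PER_TEST_INR = 17_730;
const USERS = 50_000;

const perUserINR = PER_TEST_INR / USERS; // ≈ ₹0.35
```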

Annual Infrastructure (Monthly Baseline):

Component        Monthly Cost
Proctor Backend  ₹8,000
S3 Storage       ₹6,500
Kafka            ₹3,000
ClickHouse       ₹3,500
AWS AppConfig    ₹500
CDN (SDK)        ₹1,000
Total monthly    ₹22,500

Total Annual: ~₹88,650 (per-test costs, 5 tests × ~₹17,730) + ~₹2.7L (infrastructure, 12 × ₹22,500) ≈ ₹3.6L for 5 tests per year.

The system is designed to scale linearly, with costs primarily driven by storage and compute resources that can be optimized based on retention policies and usage patterns.


Key Learnings

1. Non-Blocking Design is Non-Negotiable

The most critical decision was making infrastructure failures non-blocking. Candidates should never be prevented from taking their test because our proctoring service is down. This required:

  • Robust retry logic with exponential backoff
  • IndexedDB buffering for offline resilience
  • Graceful degradation at every layer
  • Clear separation between "must block" (permissions) and "never block" (infrastructure)

2. Client-Side Performance Matters

When code runs in a candidate's browser for 3+ hours, every millisecond counts. We achieved <5% CPU overhead by:

  • Moving all heavy computation to Web Workers
  • Aggressive image compression (25KB per capture)
  • Throttling captures to prevent overlap
  • Using requestIdleCallback for non-critical operations

3. Direct S3 Uploads Scale Infinitely

By having clients upload directly to S3 (via STS credentials), we bypass the backend entirely for media uploads. This means:

  • No backend bottleneck for uploads
  • S3 handles the scale (it's designed for this)
  • Backend only handles lightweight event batching

4. Configuration as Code (AppConfig)

Using AWS AppConfig instead of a database for configuration gives us:

  • Instant propagation (no polling needed)
  • Built-in rollback on errors
  • Feature flag support out of the box
  • One less database to manage

5. Real-Time Chunks > Single Upload

Uploading audio in 30-second chunks instead of one large file at the end means:

  • Minimal data loss on tab close (~30s max)
  • Smaller retry units if upload fails
  • Better progress tracking
  • No need for large IndexedDB buffers

What's Next

Phase 1 is complete and handling production load. Future phases will add:

  • Phase 2: AI-based face matching and anomaly detection
  • Phase 3: Real-time proctoring with live human monitors
  • Phase 4: Advanced security (VM detection, remote desktop detection)

But the foundation is solid: a scalable, cost-effective, fully-controlled proctoring solution that handles 50k concurrent candidates without breaking a sweat.


Takeaways

If you're considering building vs. buying for a critical system:

  1. Do the math - At scale, vendor costs can be astronomical
  2. Design for failure - Non-blocking architecture is essential for user-facing systems
  3. Optimize client-side - Performance budgets matter when code runs for hours
  4. Leverage managed services - S3, AppConfig, and existing infrastructure reduce complexity
  5. Measure everything - We track CPU, memory, network, and upload success rates

The result: a plug-and-play system that scales to 50k concurrent users, provides full control over the proctoring experience, and integrates seamlessly into any web application with minimal overhead.