Behind the Scenes · May 5, 2026

How we use Diffie to test Diffie in CI

Every pull request on the Diffie repo runs through Diffie's own GitHub Actions integration. Here is the workflow file we ship today and the three tests that gate every merge.

We use Diffie to test Diffie. Every pull request opened against our repo runs a small suite of AI browser tests against our dev environment before it can be merged. The suite finished a recent run in 48 seconds with three passes, and the green check on the PR is what gives us confidence to ship.

This post is the actual setup, copy-paste ready: one workflow file, two GitHub secrets, three Diffie tests. If you already have a Diffie account, you can replicate the same gate on your repo in about ten minutes.

Why we dogfood

Two reasons. The obvious one is product quality: if Diffie cannot reliably run on Diffie, it cannot reliably run on anyone else's app. The less obvious one is that it makes us live with the same workflow our customers live with. Token rotation, flaky environments, slow previews, broken auth, surprising rate limits. Anything that hurts a customer's morning is going to hurt our morning first, which is exactly the feedback loop we want.

The CI workflow itself is intentionally boring. No custom action, no plugin, no third-party runner. Just curl and jq against two REST endpoints. We want anyone who ever has to debug it at 11pm to be able to read the whole thing in one screen.

The suite that gates every PR

We picked three tests, not thirty. The bar for inclusion was simple: if this flow breaks, no one can use the product, and whoever is on call has to roll the deploy back. Three flows met that bar.

  1. Diffie AI Login Test: The full Google OAuth round trip into the app. If this breaks, no one can sign in.
  2. Diffie Test Creation and Cleanup: Create a new test through the product UI, then delete it. Exercises the API, the database, the recording flow, and the dashboard.
  3. Diffie Run Test: Open an existing test, run it, and confirm the result page renders. Exercises the runner, the realtime stream, and the result UI.

The slowest of the three tests takes about 35 seconds of wall-clock time, and the suite finishes in under a minute. That budget matters. If gating the PR costs five minutes, people start asking to skip the gate. If it costs less than a minute, no one notices it is there until it fails.

Figure: The Diffie suite run dashboard for one of our CI runs of the Diffie Development Test suite. Three tests (Login Test, Test Creation and Cleanup, Run Test), three passes, 36 seconds.

The workflow file

Here is the file we ship at .github/workflows/ci.yml. Token and suite ID are pulled from GitHub repository secrets, never committed.

name: CI

on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    branches: ['**']
  push:
    branches:
      - main
  merge_group:

jobs:
  e2e-tests:
    name: diffie-e2e-tests
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'

    steps:
      - name: Run Diffie Test Suite
        env:
          DIFFIE_TOKEN: ${{ secrets.DIFFIE_TOKEN }}
          DIFFIE_SUITE_ID: ${{ secrets.DIFFIE_SUITE_ID }}
          PREVIEW_URL: https://dev.diffie.ai
        run: |
          echo "Running Diffie tests against: $PREVIEW_URL"

          RESPONSE=$(curl -s -X POST \
            "https://api.diffie.ai/ci/suites/$DIFFIE_SUITE_ID/execute" \
            -H "Authorization: Bearer $DIFFIE_TOKEN" \
            -H "Content-Type: application/json" \
            -d "{\"baseUrl\": \"$PREVIEW_URL\"}")

          RUN_ID=$(echo "$RESPONSE" | jq -r '.suiteRunId')
          RUN_URL=$(echo "$RESPONSE" | jq -r '.url')

          if [ "$RUN_ID" = "null" ] || [ -z "$RUN_ID" ]; then
            echo "Failed to start suite run"
            echo "$RESPONSE"
            exit 1
          fi

          echo "Suite run started: $RUN_ID"

          while true; do
            STATUS_RESPONSE=$(curl -s \
              "https://api.diffie.ai/ci/suite-runs/$RUN_ID" \
              -H "Authorization: Bearer $DIFFIE_TOKEN")

            STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
            PASSED=$(echo "$STATUS_RESPONSE" | jq -r '.passed_tests')
            TOTAL=$(echo "$STATUS_RESPONSE" | jq -r '.total_tests')

            echo "Status: $STATUS ($PASSED/$TOTAL passed)"

            if [ "$STATUS" = "passed" ]; then
              echo "All tests passed."
              exit 0
            elif [ "$STATUS" = "failed" ]; then
              echo "Tests failed. View details: $RUN_URL"
              exit 1
            elif [ "$STATUS" = "cancelled" ]; then
              echo "Suite run was cancelled."
              exit 1
            fi

            sleep 10
          done

That is the entire integration. Two API calls (start and poll), three terminal statuses (passed, failed, cancelled) mapped to exit code 0 or 1, and a sleep. The same script works on GitLab, CircleCI, Jenkins, or anywhere else that can run a shell with curl and jq.
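One thing the loop in the workflow does not have is an upper bound of its own: if the status never becomes terminal, it keeps polling until the CI job's own timeout stops it (on GitHub Actions the default job timeout is 360 minutes). A minimal sketch of a bounded variant follows. The `fetch_status` helper, the exit code 2 for a timeout, and the limits in the usage note are assumptions for illustration, not part of the workflow we ship.

```shell
# Sketch: poll for a terminal status, but give up after a deadline.
# fetch_status is a hypothetical helper that prints the current status;
# in the workflow it would wrap the existing curl + jq status call.

# poll_suite MAX_SECONDS INTERVAL_SECONDS
# Returns 0 on "passed", 1 on "failed" or "cancelled", 2 on timeout.
poll_suite() {
  local max_seconds="$1" interval="$2"
  local deadline status
  deadline=$(( $(date +%s) + max_seconds ))

  while :; do
    status=$(fetch_status)
    echo "Status: $status"

    case "$status" in
      passed) return 0 ;;
      failed | cancelled) return 1 ;;
    esac

    # Any other status (including an unparseable response) counts as
    # "not finished yet" until the deadline passes.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${max_seconds}s without a terminal status" >&2
      return 2
    fi
    sleep "$interval"
  done
}
```

With a `fetch_status` that wraps the status request, a call such as `poll_suite 300 10` would keep the ten-second interval and fail the step after roughly five minutes instead of waiting for the job timeout.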

Two GitHub secrets and you are done

Both values come from the Diffie dashboard. In your GitHub repository, add them under Settings → Secrets and variables → Actions:

  • DIFFIE_TOKEN: generate a token in Diffie under Settings → API Tokens.
  • DIFFIE_SUITE_ID: open the suite you want to gate the PR on and copy the ID from the URL.

We never check tokens into the workflow file. The placeholders above use the standard GitHub Actions secrets syntax so the values stay encrypted in your repo settings.
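A secret that is not defined reaches the step as an empty string, and GitHub does not pass repository secrets to workflows triggered by pull requests from forks. In either case the script above would fail later with a less obvious "Failed to start suite run" message. A small preflight check can make the cause explicit. This is a sketch, not something in our workflow file; the function name `check_secrets` is hypothetical.

```shell
# Sketch: fail fast when a required secret did not reach the step.
# check_secrets is a hypothetical helper name.
check_secrets() {
  # Run in a subshell so that ${VAR:?} aborts only the check, not the caller.
  (
    : "${DIFFIE_TOKEN:?DIFFIE_TOKEN is empty; add it as a repository secret}"
    : "${DIFFIE_SUITE_ID:?DIFFIE_SUITE_ID is empty; add it as a repository secret}"
  )
}
```

Calling `check_secrets || exit 1` at the top of the `run:` block would stop the step before any API call is made.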

Every PR runs through this gate

We test every pull request like this. The suite's job is not to re-derive the bug the author already fixed. It is to confirm that login, test creation, and the run path still work on the way out. That is the job of a CI gate: stop the obvious cliff, do not pretend to find every cliff.

Figure: The green check on a recent Diffie PR. The diffie-e2e-tests check finished in 48 seconds and gated the merge; the PR has no merge conflicts and the squash and merge button is enabled.

Things that surprised us once we shipped this

A few practical notes from running the gate ourselves over the last few weeks.

  1. Pin your preview URL early.  We test against a fixed dev.diffie.ai rather than per-PR previews. It keeps the suite stable and makes it obvious when a regression is the PR's fault versus an environment hiccup. If you use Vercel or Cloudflare per-PR previews, pass that URL into PREVIEW_URL instead.
  2. Three tests is the right number to start.  We almost added a fourth (billing flow). We are glad we did not. Watching three tests for two weeks taught us where the suite was actually flaky before we doubled the surface area.
  3. Sleep ten seconds, not one.  The first version of the polling loop slept one second. It worked, but it spammed the API and made the logs unreadable. Ten seconds is fine: the suite still finishes in under a minute, and the GitHub log fits on one screen.
  4. The link to the failed run is the most valuable line in the script.  When a test fails, you click through from the GitHub log to the Diffie dashboard and see the screenshots, the trace, and the agent transcript. That click takes triage from minutes to seconds.
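Since the failed-run link matters that much, it may be worth surfacing it outside the raw log as well. The sketch below uses two standard GitHub Actions mechanisms, the `::error` workflow command and the `GITHUB_STEP_SUMMARY` file, to show the link as an annotation and in the job summary. The function name `report_failure` and the wording of the messages are assumptions; our workflow only echoes the URL.

```shell
# Sketch: surface the failed-run link as an annotation and in the job summary.
# report_failure is a hypothetical helper name.

# report_failure RUN_URL
report_failure() {
  local run_url="$1"

  # Workflow command: GitHub renders this as an error annotation on the run.
  echo "::error title=Diffie suite failed::View details: $run_url"

  # GITHUB_STEP_SUMMARY points at a Markdown file that GitHub shows on the
  # run's summary page; skip it when running outside GitHub Actions.
  if [ -n "${GITHUB_STEP_SUMMARY:-}" ]; then
    printf '### Diffie suite failed\n\n[Open the failed run](%s)\n' "$run_url" \
      >> "$GITHUB_STEP_SUMMARY"
  fi
}
```

In the failed branch of the polling loop, `report_failure "$RUN_URL"` would go just before `exit 1`.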

What we would add next

The honest answer: not much, and not soon. The temptation is to grow the suite. The failure mode of growing it is a slow gate that everyone learns to bypass. We will add a fourth test the day a fourth flow becomes load-bearing for our customers. Until then, three tests, 48 seconds, green check. That is the gate.

If you want to set up the same thing on your repo, the linked guide below walks through the workflow file in more detail, including dynamic preview URL setups for Vercel and Netlify.
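For per-PR previews, the only part of the script that changes is where `PREVIEW_URL` comes from. A minimal sketch, assuming an earlier deploy step exports the preview address in a variable named `PR_PREVIEW_URL` (a hypothetical name; how that value is produced depends on your hosting provider and is not shown here):

```shell
# Sketch: choose the base URL for the suite run.
# PR_PREVIEW_URL is a hypothetical variable set by a preview deploy step;
# when it is absent, fall back to the fixed dev environment.
resolve_preview_url() {
  local url="${PR_PREVIEW_URL:-https://dev.diffie.ai}"

  case "$url" in
    https://*)
      # Drop a single trailing slash so the value is consistent.
      printf '%s\n' "${url%/}"
      ;;
    *)
      echo "Refusing non-HTTPS preview URL: $url" >&2
      return 1
      ;;
  esac
}
```

The workflow step could then set `PREVIEW_URL=$(resolve_preview_url) || exit 1` before starting the suite run.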

Written by Anand Narayan, Founder of Diffie. First engineer at HackerRank, CEO at Codebrahma.

Published May 5, 2026
