About
Web Neural Network API (WebNN) is a new web standard that enables web applications and frameworks to accelerate deep neural networks using on-device hardware such as GPUs, CPUs, or purpose-built AI accelerators.
Web AI Benchmark is a web application designed to test the performance of deep learning inference on the client side, covering the WebAssembly (Wasm), WebGL, WebGPU, WebNN CPU, WebNN GPU, and WebNN NPU backends.

It measures the performance of AI models using the following metrics:

  • ✔️ Build / Compilation Time
  • ✔️ Time to First Inference
  • ✔️ First Inference Time
  • ✔️ Average / Median Inference Time
  • ✔️ 90th Percentile Inference Time
  • ✔️ Best Inference Time
  • ✔️ Throughput
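As a rough illustration (not the site's actual implementation), the latency metrics above can be derived from a list of per-inference durations. The helper below is a hypothetical sketch using the nearest-rank percentile method; all names are assumptions:

```javascript
// Hypothetical helper: derive the benchmark's latency metrics from raw
// per-inference durations in milliseconds. Not the site's actual code.
function summarize(durationsMs) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, t) => acc + t, 0);
  // Nearest-rank percentile: smallest value covering p% of samples.
  const percentile = (p) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    first: durationsMs[0],                     // First Inference Time
    average: sum / sorted.length,              // Average Inference Time
    median: percentile(50),                    // Median Inference Time
    p90: percentile(90),                       // 90th Percentile Inference Time
    best: sorted[0],                           // Best Inference Time
    throughput: 1000 / (sum / sorted.length),  // inferences per second
  };
}
```

Build/Compilation Time and Time to First Inference are measured separately (before and around the first run) and are not derivable from the per-run durations alone.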

For language models, the following metrics are not yet supported:

  • ❌ Time to First Token (TTFT / Prefill)
  • ❌ Time Per Output Token (TPOT)
  • ❌ Decode Time
  • ❌ Tokens Per Second (TPS)
  • ❌ End-to-End Time
  • ❌ E2E Tokens Per Second
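For reference, these language-model metrics are conventionally computed from a request start time and per-token arrival timestamps. The sketch below shows one common set of definitions; the function and field names are hypothetical, not this site's API:

```javascript
// Hypothetical sketch: streaming LLM metrics from a request start time and
// per-token arrival timestamps, all in milliseconds. Not the site's code.
function llmMetrics(startMs, tokenTimesMs) {
  const n = tokenTimesMs.length;
  const last = tokenTimesMs[n - 1];
  const ttft = tokenTimesMs[0] - startMs;        // Time to First Token (prefill)
  const decodeTime = last - tokenTimesMs[0];     // Decode Time (after first token)
  return {
    ttft,
    decodeTime,
    tpot: n > 1 ? decodeTime / (n - 1) : 0,      // Time Per Output Token
    tps: n > 1 ? (1000 * (n - 1)) / decodeTime : 0, // decode Tokens Per Second
    e2e: last - startMs,                         // End-to-End Time
    e2eTps: (1000 * n) / (last - startMs),       // E2E Tokens Per Second
  };
}
```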

LiteRT.js

Select the version of LiteRT.js to be tested.
LiteRT.js 0.1.0 (Wasm · WebGPU)

ONNX Runtime Web

Select the Dev version of ONNX Runtime Web to be tested.

ONNX Runtime Web

Select the Stable version of ONNX Runtime Web to be tested.
ONNX Runtime Web (Wasm · WebGPU · WebNN)

Your CPU Model

This website does not collect any device information or user data; it is intended solely for your local use.
Estimating Memory