About
Web Neural Network API (WebNN) is a new web standard that enables web applications and frameworks to accelerate deep neural networks using on-device hardware such as GPUs, CPUs, or purpose-built AI accelerators.
Web AI Benchmark is a web application designed to test the performance of deep learning inference on the client side, covering the WebAssembly (Wasm), WebGL, WebGPU, WebNN CPU, WebNN GPU, and WebNN NPU backends.

It measures the performance of AI models using the following metrics:

  • ✔️ Build / Compilation Time
  • ✔️ Time to First Inference
  • ✔️ First Inference Time
  • ✔️ Average / Median Inference Time
  • ✔️ 90th Percentile Inference Time
  • ✔️ Best Inference Time
  • ✔️ Throughput
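As a rough illustration (not the site's actual implementation), the latency metrics above can be derived from a list of per-inference durations. The helper below is a hypothetical sketch using the nearest-rank percentile method; all names are assumptions:

```javascript
// Hypothetical helper: derive the benchmark's latency metrics from raw
// per-inference durations in milliseconds. Not the site's actual code.
function summarize(durationsMs) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, t) => acc + t, 0);
  // Nearest-rank percentile: smallest value covering p% of samples.
  const percentile = (p) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    first: durationsMs[0],                     // First Inference Time
    average: sum / sorted.length,              // Average Inference Time
    median: percentile(50),                    // Median Inference Time
    p90: percentile(90),                       // 90th Percentile Inference Time
    best: sorted[0],                           // Best Inference Time
    throughput: 1000 / (sum / sorted.length),  // inferences per second
  };
}
```

Build/Compilation Time and Time to First Inference are measured separately (before and around the first run) and are not derivable from the per-run durations alone.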

For language models, the following metrics are not yet supported:

  • ❌ Time to First Token (TTFT / Prefill)
  • ❌ Time Per Output Token (TPOT)
  • ❌ Decode Time
  • ❌ Tokens Per Second (TPS)
  • ❌ End-to-End Time
  • ❌ E2E Tokens Per Second
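For reference, these language-model metrics are conventionally computed from a request start time and per-token arrival timestamps. The sketch below shows one common set of definitions; the function and field names are hypothetical, not this site's API:

```javascript
// Hypothetical sketch: streaming LLM metrics from a request start time and
// per-token arrival timestamps, all in milliseconds. Not the site's code.
function llmMetrics(startMs, tokenTimesMs) {
  const n = tokenTimesMs.length;
  const last = tokenTimesMs[n - 1];
  const ttft = tokenTimesMs[0] - startMs;        // Time to First Token (prefill)
  const decodeTime = last - tokenTimesMs[0];     // Decode Time (after first token)
  return {
    ttft,
    decodeTime,
    tpot: n > 1 ? decodeTime / (n - 1) : 0,      // Time Per Output Token
    tps: n > 1 ? (1000 * (n - 1)) / decodeTime : 0, // decode Tokens Per Second
    e2e: last - startMs,                         // End-to-End Time
    e2eTps: (1000 * n) / (last - startMs),       // E2E Tokens Per Second
  };
}
```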

LiteRT.js

Select the version of LiteRT.js to be tested.
LiteRT.js 0.1.0 (Wasm · WebGPU)

ONNX Runtime Web

Select the Dev version of ONNX Runtime Web to be tested.

ONNX Runtime Web

Select the Stable version of ONNX Runtime Web to be tested.
ONNX Runtime Web (Wasm · WebGPU · WebNN)

Your CPU Model

This website does not collect any device information or user data; it is intended solely for your local use.
Estimating Memory