What "long-term" means here
Most consumer tech reviews are written from a one-week loan window. Almost everything that goes wrong with a product at week four is invisible at week one. Our minimum is four weeks of daily use before publication, and for products that are meant to live in your home, on your devices, or on your person (soundbars, mesh routers, doorbells, password managers, watches) we run a three-month re-test and a twelve-month follow-up.
For comparison reviews and head-to-head evaluations we use parallel testing: each product runs the same workload for the same duration, on the same network, in the same room, with the same controls.
Equipment we use
Audio measurements are taken with a calibrated MiniDSP EARS rig for headphones and a UMIK-1 + REW for soundbars. Network measurements use iPerf3 between a wired Linux machine and clients across multiple Wi-Fi standards (Wi-Fi 6, 6E, and 7); we also run wired-to-wired tests against an Aruba Instant On switch to isolate the radio. Battery measurements use an Otii Arc for low-current devices and shunt-resistor logging for higher draws. We disclose any test-rig limitation in the review where it matters.
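To make the network numbers concrete, here is a minimal sketch of how a single throughput run could be driven from the client side, assuming iperf3 is installed on both machines. The server address, duration, and stream count below are placeholders for illustration, not our exact settings.

```python
import json
import subprocess

SERVER = "192.168.1.10"  # placeholder address of the wired Linux machine running `iperf3 -s`

def throughput_mbps(duration_s: int = 60, streams: int = 4) -> float:
    """Run one iperf3 TCP test and return the mean receive rate in Mbps."""
    result = subprocess.run(
        ["iperf3", "-c", SERVER, "-t", str(duration_s), "-P", str(streams), "--json"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    # sum_received is the end-to-end figure quoted in the review
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

if __name__ == "__main__":
    print(f"{throughput_mbps():.0f} Mbps")
```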
For smartphone, watch, and doorbell battery tests we compare against the manufacturer's stated runtime and report both raw numbers and the delta. Where a product's battery life depends on a specific firmware build, we say so.
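The delta we report is simple arithmetic; a small illustration with made-up numbers:

```python
def battery_delta(measured_h: float, stated_h: float) -> tuple[float, float]:
    """Return the gap between measured and stated runtime, in hours and as a percentage."""
    delta = measured_h - stated_h
    return delta, 100.0 * delta / stated_h

# Hypothetical example: a watch rated for 36 hours that lasted 31.5 hours in our loop.
# battery_delta(31.5, 36.0) -> (-4.5, -12.5), i.e. 4.5 hours (12.5%) short of the claim.
```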
Our scoring scale (0 to 10)
We score on a continuous 0-to-10 scale, reported to one decimal place. The scale is anchored:
- 9.0 and above: exceptional, defines a category. Almost no products live here.
- 8.0 to 8.9: strong recommendation. Editor's Picks live here.
- 7.0 to 7.9: good. Worth buying for the right person.
- 6.0 to 6.9: okay. There are usually better options.
- 5.0 to 5.9: mediocre. Buy only if there's a specific reason.
- Below 5.0: avoid.
We don't believe in a 10. No product we've tested is perfect. We also don't believe a 7.4 is meaningfully different from a 7.5; ratings are an editorial signal, not a measurement.
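In code terms, the anchors above are nothing more than a lookup table. A purely illustrative sketch (the labels mirror the list; the function is not part of our tooling):

```python
BANDS = [
    (9.0, "exceptional, defines a category"),
    (8.0, "strong recommendation"),
    (7.0, "good"),
    (6.0, "okay"),
    (5.0, "mediocre"),
]

def band(score: float) -> str:
    """Map a 0-to-10 score to its anchor; the band, not the decimal, carries the signal."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "avoid"

# band(8.3) -> "strong recommendation"; band(4.9) -> "avoid"
```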
Pre-registered test plans
For comparison reviews and product re-tests, the reviewer writes the test plan before running it. The plan specifies the workload, the measurement, the expected pass/fail thresholds, and any control conditions. The plan is filed with the editor and dated. Where a result surprises us and we change the plan in flight, we disclose that change in the published review.
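As an illustration of what a filed plan contains, here is a rough sketch of its shape. The field names are invented for this example and are not a format our editorial tooling enforces.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestPlan:
    product: str
    workload: str            # e.g. "24 hours of mixed streaming and idle"
    measurement: str         # what gets recorded, and with which rig
    pass_threshold: str      # decided before the first run
    controls: list[str] = field(default_factory=list)    # room, network, firmware build, etc.
    filed_on: date = field(default_factory=date.today)
    amendments: list[str] = field(default_factory=list)  # any in-flight change we must disclose
```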
Pre-registration matters for software reviews in particular, where it's tempting to add a test after seeing a result. We don't do that without disclosure.
Retest cadence
We re-test at three months for software-driven products (apps, password managers, VPNs, smart-home hubs) because firmware and feature updates change them quickly. We re-test at twelve months for hardware (laptops, headphones, routers) because long-term reliability is something a launch review can't see. Re-tests update the review's "Last updated" date.
For products that change their pricing or business model after we publish — VPNs that pivot, fintech apps that introduce account fees, smart-home platforms that retire features — we update within fourteen days of the change.
What we don't claim to do
We don't run accelerated drop testing or mechanical stress testing. We don't run audio measurements that would require an anechoic chamber. For categories where our equipment is the limiting factor, we say so in the review and note where to look for a more rigorous treatment.
We're a small team. The strength of our reviews is time and attention, not laboratory hardware.