Methodology

How the data on this site is sourced, transformed, refreshed and validated.

Primary sources

Source | What we get | Refresh
HM Land Registry Price Paid Data (pp-complete.csv) | Full register, ~30M residential transactions since 1995 | Monthly
HM Land Registry Price Paid monthly update | Monthly delta (added/changed/deleted records) | Daily check
HMLR reuse terms | OGL-aligned licence governing reuse | n/a

Refresh process

  1. HMLR publishes the complete Price Paid CSV monthly. We pull pp-complete.csv directly from HMLR's S3 mirror within 24 hours of release.
  2. The CSV has 16 columns and no header row. Each row is parsed: dates are converted from YYYY-MM-DD HH:MM to ISO 8601, transaction UUIDs are unwrapped from their braces, and district names are slugified for clean URLs.
  3. Rows stream into a new table transactions_new via Postgres COPY. No row-by-row inserts.
  4. Indexes are built on the new table after the load completes (GIN tsvector for address full-text search, GIN trigram for fuzzy matching, btree for postcode / outcode / district / date).
  5. A single atomic ALTER TABLE ... RENAME swaps the new table in. The old table is kept for 24 hours as a rollback safeguard.
  6. Local-authority aggregates (counts, medians) are recomputed.
  7. Total runtime: roughly 20-30 minutes on the production Hetzner box. The user-facing site stays available throughout.
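
As an illustration of the parsing in step 2, here is a minimal Python sketch. It is not the production loader: the column labels, the slugify rule and the sample row are assumptions made for this example, and the sample values are invented rather than taken from the register.

```python
import csv
import io
import re
from datetime import datetime

# Labels for the 16 headerless columns. These names are our own
# (assumed for this sketch); the file itself carries no header row.
COLUMNS = [
    "transaction_id", "price", "transfer_date", "postcode",
    "property_type", "new_build", "tenure", "paon", "saon", "street",
    "locality", "town", "district", "county", "ppd_category",
    "record_status",
]


def slugify(name):
    """Lower-case the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def parse_row(row):
    """Turn one raw CSV row (a list of 16 strings) into a cleaned dict."""
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
    rec = dict(zip(COLUMNS, row))
    # Unwrap the UUID from its surrounding braces.
    rec["transaction_id"] = rec["transaction_id"].strip("{}")
    rec["price"] = int(rec["price"])
    # "YYYY-MM-DD HH:MM" becomes an ISO 8601 date string.
    rec["transfer_date"] = (
        datetime.strptime(rec["transfer_date"], "%Y-%m-%d %H:%M")
        .date()
        .isoformat()
    )
    # Keep the original district name and add a URL-friendly slug.
    rec["district_slug"] = slugify(rec["district"])
    return rec


# Invented sample row in the same shape as the published file.
SAMPLE = (
    '"{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}","250000","2021-03-15 00:00",'
    '"AB1 2CD","T","N","F","12","","HIGH STREET","","EXAMPLETOWN",'
    '"CITY OF WESTMINSTER","GREATER LONDON","A","A"\n'
)

record = parse_row(next(csv.reader(io.StringIO(SAMPLE))))
print(record["transfer_date"], record["district_slug"])
# prints: 2021-03-15 city-of-westminster
```

Rejecting rows with the wrong column count up front means a malformed line stops the load instead of silently shifting fields into the wrong columns.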

What the data contains

Per transaction we hold:

  • HMLR transaction UUID
  • Price paid (in pounds), transfer date
  • Postcode, address fields (PAON, SAON, street, locality, town, district, county)
  • Property type (detached / semi / terraced / flat / other)
  • New-build flag, freehold / leasehold
  • HMLR's "PPD category": A for standard transactions, B for additional types (right-to-buy, repossession, deeds of gift, etc.) which can distort comparisons

No owner names. No PII at record level.

Known limitations

  • Coverage is England and Wales only. Scotland (Registers of Scotland) and Northern Ireland (Land Registry of Northern Ireland) publish separately and are not in this dataset.
  • HMLR registration of new builds and shared-ownership sales can lag completion by up to about 6 months. Very recent sales may not be visible yet.
  • PPD category B (additional transactions) is included for completeness but should be excluded when computing typical-market medians. We show the category flag on the data we surface.
  • HMLR's address fields (PAON, SAON, street) follow strict capitalisation conventions but do not normalise "Flat 1" vs "1A" consistently. Some properties have multiple address representations across their sale history.
  • We do not yet cross-reference with the EPC register; that's on the build roadmap.
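
To make the category B point concrete, here is a small Python sketch of a median that keeps only category A sales by default. The function name, the dict keys and the prices are hypothetical and chosen for illustration; this is not the site's actual aggregation code.

```python
from statistics import median


def typical_median(transactions, include_additional=False):
    """Median price, restricted to PPD category A unless told otherwise.

    `transactions` is an iterable of dicts with assumed keys
    "price" (int, pounds) and "ppd_category" ("A" or "B").
    Returns None when no transaction qualifies.
    """
    prices = [
        t["price"]
        for t in transactions
        if include_additional or t["ppd_category"] == "A"
    ]
    return median(prices) if prices else None


# Invented figures: three standard sales and one low-value category B transfer.
sales = [
    {"price": 200_000, "ppd_category": "A"},
    {"price": 250_000, "ppd_category": "A"},
    {"price": 300_000, "ppd_category": "A"},
    {"price": 50_000, "ppd_category": "B"},
]

print(typical_median(sales))                           # 250000
print(typical_median(sales, include_additional=True))  # 225000.0
```

In this made-up example a single category B record pulls the median down by £25,000, which is why the default leaves those records out.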

Corrections

Spot an error in how we display a sale? Email [email protected] and we'll fix or remove it within 48 hours.

For corrections to the underlying HMLR record (an address typo, a wrong price), contact HMLR directly. Our copy refreshes monthly, so a correction made by HMLR propagates to this site automatically.
