Renter Loop · metrics

Purpose

This document defines what we measure in the Renter Loop and why — separating the north-star (signed leases at CAC < £350), the leading indicators that tell us whether the concierge mechanism is working in-flight, and the outcome indicators we cannot yet measure but will instrument later.

The design principle is honesty about sample size. At 1-10 signed leases per week, most funnel-stage rates are dominated by statistical noise. The metrics here are structured so that at each volume regime there is a single primary view, with secondary views that become trustworthy as volume grows.

See ADR 0003 — Concierge as defining feature for the framing that separates concierge process quality (measurable now) from outcome quality (measurable later).

Measurement conventions

This section defines the shared vocabulary used across the rest of this file and referenced from ADR 0003. It exists so that qualitative phrases like "sustained" or "approximately" are not left to interpretation when a tripwire is about to fire.

"Sustained"

A condition is sustained if it breaches its threshold for two consecutive measurement windows. The window length is per-metric (weekly for CAC; 8-week rolling for alert-vs-digest; per-sampling-cycle for the process-quality dimensions; the tripwires below state their windows explicitly) but the "two consecutive" rule is uniform across all tripwires in this document and in ADR 0003. A single-window breach is a yellow flag and prompts a closer read of the next cycle; two consecutive breaches fire the tripwire and trigger investigation.
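
The two-consecutive-windows rule can be expressed as a small check. This is a minimal sketch, not existing tooling: the function name and the "ok" / "yellow" / "fired" labels are illustrative assumptions, and the caller is assumed to have already decided, per metric, whether each window breached its threshold.

```python
from typing import Sequence


def tripwire_status(breaches: Sequence[bool]) -> str:
    """Classify the latest measurement window.

    `breaches` holds one boolean per measurement window, oldest first;
    True means the metric breached its threshold in that window.
    """
    if not breaches or not breaches[-1]:
        return "ok"
    # Latest window breached: the tripwire fires only if the previous one did too.
    if len(breaches) >= 2 and breaches[-2]:
        return "fired"
    # A single-window breach is a yellow flag, not a fired tripwire.
    return "yellow"


print(tripwire_status([False, True]))  # yellow
print(tripwire_status([True, True]))   # fired
```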

Tolerance bands, not "approximately"

Ratios and rate comparisons carry explicit thresholds (e.g. alert-vs-digest reply-rate ratio ≥ 2.0, tripwire fires below 1.1). "Approximately equal" or "approximately zero" is not a measurement term — it is narrative shorthand for a band that is defined where the metric is introduced.

Signposting — where each control-layer concern is addressed

This file and ADR 0003 together close the control layer. For reviewers checking how specific concerns are handled:

Concern	Where handled
Definition of "sustained"	This section, above
Alert-vs-digest tolerance band (vs. "approximately equal")	Alert-vs-digest differentiation, below
Reply-rate as a weak proxy — value vs attention	Alert-vs-digest differentiation — two-tier: reply-rate primary, positive-intent-signal secondary once sample permits
Bounding "justified extension"	ADR 0003 · Honest brokerage — the preserve-at-least-one-core-dimension and declare-the-flex rule
Escalation trigger conditions	ADR 0003 · Escalation triggers; numeric thresholds in Process-quality layer dimension 5 below
System identity (closed-loop, profile-tightening)	ADR 0003 · Context
Silence vs engagement tension	ADR 0003 · Honest brokerage · Silence clause — silence on no-brief-match days is correct behaviour, not failure, and must not be overridden by engagement targets

North-star metric

Signed leases at CAC < £350, per channel per bedroom type.

This is a gated volume metric: the £350 CAC ceiling is a binary quality gate; the volume target is the operational ambition. A lease above the ceiling still counts as a signed lease, but the channel × bedroom combination is flagged for remediation or retirement.
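
The gate can be sketched as a filter over channel × bedroom segments. This is illustrative only: the input shape, the segment labels, and the choice to skip segments with no leases yet (where CAC is undefined) are assumptions, not decisions recorded in this document.

```python
CAC_CEILING_GBP = 350  # the ceiling stated in this document


def flag_segments(segments):
    """Return the (channel, bedroom) keys whose CAC fails the < £350 gate.

    `segments` maps (channel, bedroom) -> (spend_gbp, signed_leases).
    Segments with zero leases are skipped here (assumption): CAC is undefined.
    """
    flagged = []
    for key, (spend, leases) in segments.items():
        if leases and spend / leases >= CAC_CEILING_GBP:
            flagged.append(key)
    return flagged


segments = {
    ("ppc", "1-bed"): (700, 2),   # CAC = 350, not below the ceiling
    ("seo", "2-bed"): (300, 1),   # CAC = 300, passes
    ("ppc", "studio"): (500, 0),  # no leases yet, skipped
}
print(flag_segments(segments))  # [('ppc', '1-bed')]
```

Flagged segments still contribute their leases to the volume count; the flag only marks them for remediation or retirement.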

Targets by stage:

Pre-seed: 1 signed lease per week.
Post-seed: 10 signed leases per week.
Series A: 25 signed leases per week.

CAC convention

CAC is lumpy at low volumes. Spending £500 per expected lease means that £1,000 of spend can precede the first signed lease (instantaneous CAC = £1,000), and the second lease, arriving at cumulative spend of £1,100, drops the cumulative figure to £550. Any single-window CAC number is lying until the sample is large enough to converge. The convention here is therefore to report two CAC views in parallel, with a third view added when platform capability permits.
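
The arithmetic in the paragraph above can be reproduced in a few lines. This is a sketch with an illustrative function name; the input figures are the ones quoted above.

```python
def cumulative_cac(spend_at_each_lease):
    """Return cumulative CAC after each signed lease.

    `spend_at_each_lease[i]` is the total spend (GBP) to date at the
    moment lease number i + 1 is signed.
    """
    return [spend / n for n, spend in enumerate(spend_at_each_lease, start=1)]


# First lease arrives at £1,000 cumulative spend, the second at £1,100.
print(cumulative_cac([1000, 1100]))  # [1000.0, 550.0]
```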

Primary (stability anchor): cumulative CAC

Definition: cumulative spend since inception ÷ cumulative signed leases since inception, per channel, per bedroom type.

Rationale: stable. It converges toward the true long-run CAC as leases accumulate (though not strictly monotonically) and is far less prone to whipsaw than any single-window figure. This is the headline number for investor conversations until the sample reaches roughly 50 leases per channel × bedroom combination.

Secondary (trend indicator): rolling 90-day CAC

Definition: spend in trailing 90 days ÷ signed leases in trailing 90 days, per channel, per bedroom type.

Rationale: the responsive number that tells us whether things are getting better or worse. Matches the industry default for early-stage CAC reporting and the quarterly board rhythm. Must always be reported with the lease count in the window so readers can infer sample size.
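
A minimal sketch of the rolling view, returning the lease count alongside the CAC so the sample size always travels with the number. The function name and input shapes are assumptions; inputs are assumed to be pre-filtered to one channel × bedroom combination, and the window is taken as the 90 days ending on (and including) the reporting date.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def rolling_cac(
    spend: Iterable[Tuple[date, float]],
    lease_dates: Iterable[date],
    as_of: date,
    window_days: int = 90,
) -> Tuple[Optional[float], int]:
    """Return (CAC, lease_count) for the trailing window ending on `as_of`."""
    start = as_of - timedelta(days=window_days)
    window_spend = sum(amount for day, amount in spend if start < day <= as_of)
    lease_count = sum(1 for day in lease_dates if start < day <= as_of)
    # CAC is undefined when no lease falls inside the window.
    cac = window_spend / lease_count if lease_count else None
    return cac, lease_count
```

Returning `None` rather than zero or infinity for an empty window keeps an undefined CAC from being plotted as if it were a real value.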

Future primary: cohort CAC

Definition: for the cohort of renters whose earliest RID click was in month X, spend attributable to that cohort ÷ signed leases from that cohort within a fixed window (proposed: 6 months).

Rationale: the honest number — it allocates spend to the renters that spend actually acquired, rather than to whichever signed leases happen to fall inside an arbitrary reporting window. Becomes primary once renter-loop.attribution.cohort-lease-to-cash is operational with platform support for RID-click-through-to-lease joins. See rid-resolution-for-phone-sourced-conversations below.

Click attribution rule

Clicks are attributed to a renter by earliest RID click event. A renter who clicks Google in January, returns through SEO in March, and signs in April is a January-cohort renter for cost attribution. This rule is cheap to commit to now and determinative for cohort CAC later.
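
The earliest-click rule is simple enough to sketch directly. The tuple layout, RID strings, and channel labels below are hypothetical; only the rule itself (earliest RID click wins) comes from this document.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def cohort_by_rid(clicks: Iterable[Tuple[str, date, str]]) -> Dict[str, Tuple[str, str]]:
    """Map each RID to (cohort month 'YYYY-MM', channel) of its earliest click.

    `clicks` yields (rid, clicked_on, channel) in any order.
    """
    earliest: Dict[str, Tuple[date, str]] = {}
    for rid, clicked_on, channel in clicks:
        if rid not in earliest or clicked_on < earliest[rid][0]:
            earliest[rid] = (clicked_on, channel)
    return {rid: (day.strftime("%Y-%m"), ch) for rid, (day, ch) in earliest.items()}


clicks = [
    ("rid-1", date(2025, 3, 5), "seo"),   # return visit in March
    ("rid-1", date(2025, 1, 12), "ppc"),  # first touch in January
]
print(cohort_by_rid(clicks))  # {'rid-1': ('2025-01', 'ppc')}
```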

Reporting conventions

Investor updates: cumulative CAC first, rolling 90-day alongside, sample size (lease count) always stated. Once cohort CAC is available it is added as a third view, eventually replacing cumulative as primary.
Internal dashboard: rolling 90-day first (for steering), cumulative alongside (for stability check).
Channel × bedroom segmentation is mandatory. Blended CAC across all channels and bedroom types is a vanity number and is not reported.

To formalise later. This CAC convention is a strategic commitment and should be elevated to an ADR once any element of it is challenged in practice. Candidate: ADR 0005 — CAC reporting convention for early-stage.

Funnel layer

Stage-by-stage telemetry from click to signed lease. Each indicator is annotated with measurability today and the volume threshold at which the stage-to-stage rate becomes trustworthy.

Stage 1 · Click (PPC + SEO, Google)

What it measures: top of funnel reach, by channel, by bedroom type.
Measurable today: yes — Google Ads dashboard (PPC) and Google Search Console (SEO).
Measurable reliably at: any volume — click counts are large.
Volume caveat: click-to-enquiry rate is stable enough to steer on; click-to-lease rate is too deep in the funnel to be meaningful until ~25 leases/week.
Produced by: renter-loop.demand.weekly-spend-review.

Stage 2 · Enquiry (RID created)

What it measures: renters who have completed the initial form and had an RID generated.
Measurable today: yes — website event on form submit.
Measurable reliably at: any volume.
Produced by: renter-loop.initiation.conversation-initiation.

Stage 3 · Qualified (4-field profile complete)

What it measures: renters whose profile contains the minimum qualification fields: location, bedrooms, budget, timescale.
Measurable today: yes — spreadsheet (manual entry); automatable once the database table lands.
Measurable reliably at: any volume.
Produced by: renter-loop.initiation.qualification; ongoing by renter-loop.initiation.profile-building.

Stage 4 · Conversation started

Definition: at least one substantive reply from the renter that is not a closed "not interested" response. A substantive reply is one that signals ongoing engagement (question, preference, availability, correction).
Measurable today: manually — scanning email and WhatsApp threads. Feasible at 1/week; painful at 10/week; requires platform work beyond that.
Measurable reliably at: 10/week with manual effort; 25/week requires automation via platform.conversation-infrastructure.
Produced by: renter-loop.initiation.conversation-initiation, continuously enriched via the operations in Domain C.
Note on definition quality. The definition above is deliberately broad. As volume grows, this stage should be split into sub-stages — information-seeking, qualification-responsive, viewing-intent — because a single "conversation started" count will mask composition changes that matter.

Stage 5 · Viewing scheduled

What it measures: a viewing date has been agreed between the renter and the operator for a specific unit.
Measurable today: yes — spreadsheet (manual entry); automatable when viewing coordination moves into the platform.
Measurable reliably at: ~10/week. At roughly 3-5 viewings per signed lease, that is 30-50 viewings per week, a usable sample.
Produced by: renter-loop.conversion.viewing-coordination.
Why it matters: viewing scheduled is the strongest pre-signature leading indicator. The viewing-to-signed rate tightens much faster than earlier funnel stages.

Stage 6 · Signed lease

What it measures: the terminal metric — a lease has been executed on a unit surfaced by Rentiful.
Measurable today: yes — spreadsheet; database table migration planned.
Produced by: renter-loop.conversion.lease-execution-handoff; attributed to cohort via renter-loop.attribution.monthly-lease-attribution and renter-loop.attribution.cohort-lease-to-cash.

Stage 7 · Spend (numerator for CAC)

What it measures: marketing spend attributable to clicks, by channel, by bedroom type.
Measurable today: yes — Google Ads console; manual join to signed leases via RID-to-channel attribution in the spreadsheet.
Measurable reliably at: any volume for the raw number; meaningful as a rate only when paired with signed leases per channel per bedroom.
Produced by: renter-loop.demand.weekly-spend-review; reconciled by renter-loop.demand.attribution-reconciliation.

Process-quality layer

Leading indicators for the concierge mechanism — observable in-flight, before any lease outcome is known. Each dimension is sampled from real email and WhatsApp conversations via the conversation health operation.

Each dimension carries an enforceable operational threshold. The form of the threshold is fixed by ADR 0003; the numeric values below are first-draft starting points, to be tuned from baseline data after ~4 weeks of measurement. Placeholder values are marked TBD and must be set before the first quarterly tripwire review.

Responsiveness. Median time from renter message → substantive reply across the conversation. Today: manual from email and WhatsApp timestamps. Automatable once platform.conversation-infrastructure captures timestamps uniformly across channels.
- Threshold form: median response latency < X minutes for AI-handled replies; < Y hours for human-escalated replies.
- Starting values: X = TBD (proposed 15 min); Y = TBD (proposed 4 business hours).
Profile-updating. Of each renter's corrections (budget, area, configuration, timing), what fraction produces a recorded signal that shapes the next search? Today: manual review; near-term platform dependency — the RID profile must be versioned so updates are auditable.
- Threshold form: ≥ X% of renter corrections produce a recorded signal within one profile-update cycle. Not every signal must mutate a profile attribute (some corrections are idiosyncratic) — but every correction must be recorded.
- Starting value: X = TBD (proposed 90%).
Honest brokerage. Every surfaced option carries an inspectable justification against the renter's current brief; when nothing is justifiable, the concierge stays silent (or sends a nurture touch that does not claim to be a match) rather than pushing synthetic or stretch stock. Justified brief-extension is permitted; brief-violation is not. This is the dimension most at risk under automation because templated responses default to optimism. The honest brokerage architectural constraint in ADR 0003 is the enforcement mechanism; the measurable proxies below are the tripwire tests.
- Primary proxy: of options surfaced in the sampled conversations, ≥ X% carry an inspectable per-renter justification against the renter's current brief. Perfect-fit alerts are held to 100% by ADR; digest items carry the non-alert starting value.
- Secondary proxy: on days when no new inventory meets any active renter's brief, the concierge is observed to stay silent (or issue a non-match nurture touch) rather than send a "match." Silence rate on such days ≥ F% is the honest signal; sustained below the floor is the lying signal. The floor is defined in the silence-rate tripwire below.
- Starting values: X = TBD (proposed 100% for perfect-fit alerts; ≥ 95% for digest items). Silence-rate floor F = TBD (proposed 50%; see tripwire 3 below).
Specific reasoning. Each surfaced option carries an inspectable why this unit for this renter. Today: manual review; partially automatable with AI-generated and AI-graded reasoning.
- Threshold form: 100% of perfect-fit alerts carry explicit per-renter reasoning; ≥ X% of digest items carry explicit per-renter reasoning.
- Starting value: X = TBD (proposed 80% for digest; 100% non-negotiable for alerts, per ADR).
Escalation judgement. Medium, channel, or hand-to-human change at the right moment — e.g. switching to WhatsApp when a deadline tightens, or handing to a human when the renter expresses frustration. Today: manual review. The AI must be genuinely good at this dimension or the whole concierge thesis fails. The escalation-triggers section of ADR 0003 defines when escalation must happen; the numeric thresholds for those triggers live here.
- Trigger: repeated negative signals on surfaced options. Escalate when a renter rejects ≥ K successive surfaced options without the profile tightening (no recorded preference update). Starting K: TBD (proposed 3).
- Trigger: expressed frustration or ambiguity in renter language. Escalate on any of: explicit frustration markers in the latest message, a contradictory signal in the thread (e.g. budget contracts and expands in the same conversation), or confusion markers ("I don't understand", "this isn't what I asked for"). No numeric threshold — a single detection triggers human review of the conversation before the next automated touch.
- Trigger: stuck-conversation firing. When renter-loop.engagement.stuck-conversation identifies a conversation as stuck, escalation is automatic. No threshold — the operation is the threshold.
- Threshold form — dimension quality: ≥ X% of escalations rated "correct timing" in post-hoc conversation-health review; ≥ Y% of stuck conversations include a prior escalation attempt.
- Starting values: X = TBD (proposed 80%); Y = TBD (proposed 70%).
- Failure symmetry: a system that never escalates fails this dimension. A system that escalates on every other message also fails it, but the former is the more dangerous drift under automation pressure, per ADR 0003.
Feedback-loop capture. Every surfaced option produces a recorded signal (too small, too expensive, wrong area, right direction). Today: manual review; platform dependency — option→signal→profile update must be a first-class event in the conversation infrastructure.
- Threshold form: ≥ X% of surfaced options produce a recorded signal within N days of being surfaced.
- Starting values: X = TBD (proposed 85%); N = TBD (proposed 7 days).
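
The repeated-negative-signals trigger under Escalation judgement above can be sketched as a streak counter. The event labels are hypothetical, K defaults to the proposed value of 3, and treating any non-rejection event as breaking the run is an assumption.

```python
from typing import Sequence


def should_escalate(events: Sequence[str], k: int = 3) -> bool:
    """True when the trailing run of rejections has reached `k`.

    `events` is the oldest-first history for one renter. Each item is
    "rejected", "profile_updated", or any other label (e.g. "accepted").
    """
    streak = 0
    for event in events:
        if event == "rejected":
            streak += 1
        else:
            # A recorded preference update (or any non-rejection) resets the run.
            streak = 0
    return streak >= k


print(should_escalate(["rejected", "rejected", "rejected"]))                     # True
print(should_escalate(["rejected", "profile_updated", "rejected", "rejected"]))  # False
```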

Alert-vs-digest differentiation

This is the single falsifiable test of the concierge thesis — see ADR 0003 · Perfect-fit alert as the falsifiable differentiation claim.

Two tiers. Reply is measurable today; positive-intent is the honest check on value rather than attention, and it needs more sample before it becomes load-bearing.

Tier 1 — reply rate (primary today).

Metric: reply-within-24h rate on perfect-fit alerts ÷ reply-within-24h rate on recurring listings digests.
Threshold form: ratio ≥ X, sustained (per Measurement conventions) over a rolling N-week window.
Starting values: X = TBD (proposed 2.0 — alerts must out-convert digests by at least 2×); N = TBD (proposed 8 weeks).
Tripwire band: ratio < 1.1 (from the tripwire section below). Between 1.1 and the 2.0 starting target is yellow territory — tolerable while instrumentation matures, flagged in the quarterly review.
Weakness acknowledged: reply rate is a weak proxy for value. A provocative alert can earn replies without leading anywhere. The tier below is the honest check.
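
The Tier 1 bands can be sketched as a single-window classifier. The function name, band labels, and input counts are illustrative; the 2.0 and 1.1 defaults are the proposed starting values above, and the "sustained" rule is applied separately across windows.

```python
def reply_ratio_band(alert_replies, alerts_sent, digest_replies, digests_sent,
                     target=2.0, tripwire=1.1):
    """Classify the alert-vs-digest reply-rate ratio for one window.

    Returns (ratio, band). The ratio is undefined when either surface sent
    nothing or the digest reply rate is zero.
    """
    if alerts_sent == 0 or digests_sent == 0 or digest_replies == 0:
        return None, "insufficient data"
    ratio = (alert_replies / alerts_sent) / (digest_replies / digests_sent)
    if ratio < tripwire:
        band = "tripwire"
    elif ratio < target:
        band = "yellow"
    else:
        band = "green"
    return ratio, band


print(reply_ratio_band(5, 10, 10, 40))  # (2.0, 'green')
```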

Tier 2 — positive-intent signal rate (secondary, activates with sample).

Metric: rate of positive-intent signals per surfaced option, compared alert-vs-digest. Positive-intent signals are: viewing requests, availability enquiries, specific questions about a surfaced unit (timing, amenity, tenure), and expressed intent to apply. Generic replies ("thanks", "not this one") are not positive intent.
Volume gate: not load-bearing until sample reaches ≥ 20 surfaced-option responses per week per surface type (alert, digest) — i.e. from roughly the 10/week signed-lease regime onward. Before that, report the rate but do not fire tripwires on it.
Threshold form: alert positive-intent rate ÷ digest positive-intent rate ≥ M, sustained over a rolling N-week window.
Starting values: M = TBD (proposed 2.0, matching the reply-rate differentiation target); N = TBD (proposed 8 weeks, once the volume gate is crossed).
Instrumentation: manual tag in the spreadsheet today (each surfaced option gets a positive-intent flag on review); automate once conversation infrastructure captures event-level response classification.
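
A sketch of the Tier 2 calculation with the volume gate applied. The tag names and input shapes are hypothetical stand-ins for the manual spreadsheet flags; the positive-intent categories and the gate of 20 responses per week per surface type come from the definitions above.

```python
POSITIVE_INTENT = {
    "viewing_request", "availability_enquiry", "unit_question", "intent_to_apply",
}


def positive_intent_summary(responses, surfaced, gate=20):
    """Summarise one week of alert and digest responses.

    `responses` maps surface type -> list of tags, one per response received.
    `surfaced` maps surface type -> number of options surfaced.
    """
    rates = {}
    for surface in ("alert", "digest"):
        positives = sum(1 for tag in responses[surface] if tag in POSITIVE_INTENT)
        rates[surface] = positives / surfaced[surface] if surfaced[surface] else None
    # Below the gate the ratio is still reported, but must not fire tripwires.
    load_bearing = all(len(responses[s]) >= gate for s in ("alert", "digest"))
    ratio = None
    if rates["alert"] is not None and rates["digest"]:
        ratio = rates["alert"] / rates["digest"]
    return {"rates": rates, "ratio": ratio, "load_bearing": load_bearing}
```

Keeping `load_bearing` as a separate field, rather than suppressing the ratio, matches the rule that the rate is reported before the gate but not acted on.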

Instrumentation today: reply rate is computable from timestamps in the spreadsheet; positive-intent flags are added on weekly review. Automate both once platform infrastructure captures event-level conversation data.

Sampling cadence by volume

1/week (pre-seed): every conversation reviewed. No sampling needed.
10/week (post-seed): random sample of ~5 conversations per week, weighted toward those that progressed to viewing-or-later and those that stalled at a stage.
25/week (Series A): random sample plus AI-assisted scoring — see ai-assisted-process-quality-scoring below.
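
The weighted weekly sample at 10/week could look like the sketch below. Everything here is an assumption for illustration: the 3× weight, the `is_priority` flag (standing in for "progressed to viewing-or-later" or "stalled at a stage"), and the function name.

```python
import random


def sample_conversations(conversations, k=5, priority_weight=3.0, seed=None):
    """Pick up to `k` distinct conversations, over-weighting priority ones.

    `conversations` is a list of (conversation_id, is_priority) pairs.
    """
    rng = random.Random(seed)
    pool = list(conversations)
    picked = []
    while pool and len(picked) < k:
        weights = [priority_weight if is_priority else 1.0 for _, is_priority in pool]
        # Draw one index at a time and remove it, so no conversation repeats.
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(index)[0])
    return picked
```

When fewer than `k` conversations exist, the function returns all of them, which matches the pre-seed rule of reviewing every conversation.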

Failure tripwires

The six tripwires below instrument ADR 0003 · Failure tripwires — Tier 1 and Tier 2 of the alert-vs-digest test are split into separate rows here for operational clarity. Each tripwire's numeric threshold is reviewed quarterly; if fired, the response is investigation, not optimisation. "Sustained" takes its definition from Measurement conventions above — two consecutive measurement windows. Starting values are first-draft and must be replaced with data-backed thresholds before the first quarterly tripwire review.

Response latency tripwire. Median human-escalated response latency > Y hours sustained over a rolling 4-week window. Starting Y: TBD (proposed 8 business hours; breach threshold is 2× the responsiveness dimension threshold above).
Generic-response tripwire. Share of AI-generated responses containing no renter-specific reasoning > Z% sustained over a rolling 4-week window. Starting Z: TBD (proposed 10%).
Unjustified-options tripwire. Fraction of surfaced options (alerts and digests combined) that do not carry an inspectable per-renter justification against the renter's current brief > Z% sustained over a rolling 8-week window. Starting Z: TBD (proposed 5% — alerts must be at 0%, digests may carry a small unjustified tail while the reasoning chain is maturing). Paired tripwire — silence-rate floor: on days when no active renter's brief is met by new inventory, silence rate < F% sustained over a rolling 8-week window. Starting F: TBD (proposed 50% — if fewer than half of no-brief-match days result in silence or nurture-only, the system is pushing synthetic matches). Either condition fires investigation; simultaneous breach is an honest-brokerage failure of the concierge thesis.
Alert-vs-digest tripwire (Tier 1 — reply). Alert reply-rate ÷ digest reply-rate < 1.1 sustained over a rolling 8-week window. Starting threshold: 1.1 (a stricter "must materially exceed 1.0" test; the differentiation target is 2.0).
Alert-vs-digest tripwire (Tier 2 — positive intent). Alert positive-intent-rate ÷ digest positive-intent-rate < 1.1 sustained over a rolling 8-week window, once the Tier-2 volume gate has been crossed. Before the gate, this tripwire is dormant. After the gate, Tier 1 and Tier 2 fire independently — each is its own investigation trigger.
CAC drift tripwire. Rolling 90-day CAC per channel × bedroom trending up at constant or falling spend over a rolling 8-week window. No absolute value — the tripwire is directional.
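
The CAC drift tripwire is directional, so "trending up" needs an operational definition. The sketch below assumes one: a positive least-squares slope across the weekly readings in the window, paired with a spend slope that is zero or negative. That definition and the function names are assumptions, not something this document or ADR 0003 fixes.

```python
def slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def cac_drift_breach(weekly_cac, weekly_spend):
    """True when CAC slopes up while spend is flat or falling over the window."""
    if len(weekly_cac) < 2 or len(weekly_cac) != len(weekly_spend):
        raise ValueError("need two or more paired weekly readings")
    return slope(weekly_cac) > 0 and slope(weekly_spend) <= 0


cac = [300, 310, 320, 330, 340, 350, 360, 370]  # eight weekly readings
print(cac_drift_breach(cac, [1000] * 8))  # True
```

A single `True` here is one breached window; the tripwire itself fires only under the two-consecutive-windows rule in Measurement conventions.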

Outcome-quality layer (deferred)

Held for future instrumentation once post-signature signal exists. These are not claimed as current metrics; they are named here so that the instrumentation roadmap is explicit.

Retention. 6- and 12-month retention on signed leases. Requires operator-side residency data; partially available via White Label partners' PMS.
Post-move satisfaction. Signal from the renter post-move — a targeted survey, a CSAT prompt, a reply to a "how's the first month going?" outreach.
Referral rate. Share of new renters whose earliest RID touch is attributable to a prior Rentiful renter's referral.

These become primary Renter Loop metrics once sample size is sufficient to support stable rates — likely ≥12 months after reaching 10 signed leases per week, to give the residency window time to produce signal.

Volume thresholds — what is steerable when

Volume	Quantitative metrics that become meaningful	Qualitative only
1/week (pre-seed)	Cumulative CAC per channel × bedroom; lease count; spend	Every funnel stage rate; all six process-quality dimensions (reviewed on every conversation)
10/week (post-seed)	Rolling 90-day CAC; viewing-to-signed rate; qualified-to-viewing rate; click-to-enquiry rate	Click-to-conversation rate (still noisy from creative churn); process dimensions via sampling
25/week (Series A)	Full funnel at channel × bedroom granularity with weekly precision; cohort CAC once platform join exists	Process dimensions via sampling, ideally AI-assisted

At 1/week, the quantitative layer is mostly a count exercise; the steering signal is qualitative. At 10/week, the middle and late funnel become measurable. At 25/week, the whole stack becomes a real steering instrument, and channel reallocation from measured rates becomes a responsible activity rather than a guess.

Platform specs produced by this document

Two capabilities are called out here because they are prerequisites for parts of the metric stack above, not loop-level concerns.

RID resolution for phone-sourced conversations

The RID system today is email-hash based. Conversations happen on email and WhatsApp; WhatsApp is phone-sourced. Until phone-number-to-RID resolution exists, every WhatsApp-first conversation is a split-identity problem for any funnel join between conversation-started and signed-lease. Manually reconcilable at 1/week; a blocker at 10/week. Belongs in the Platform Loop queue as a specific spec.

AI-assisted process-quality scoring

At 25/week and beyond, the six process-quality dimensions cannot be sampled manually at a rate that gives reliable signal, so automated scoring of each dimension becomes a platform capability. Responsiveness, specific reasoning, and feedback-loop capture are structured enough to be graded automatically; honest brokerage and escalation judgement are less so, because they require judgement. The capability should be specified once the sampling operation is running and a corpus of human-graded conversations exists as training/reference data.

Open items for founder review

The four-field qualification definition (location, bedrooms, budget, timescale) is correct today but will become insufficient as the product matures. A revision threshold (e.g. "add timing specificity, furnished/unfurnished, pet preferences when sample exceeds N") should be noted in the qualification operation itself.
The £350 CAC ceiling is the current guardrail. If the ceiling is ever raised or lowered, the change should be recorded in this file with a dated rationale, because CAC ceiling changes are a material strategic decision.
The CAC reporting convention here should be promoted to an ADR at the point where any element of it is challenged in practice, or before the first formal investor reporting cycle — whichever comes first.