Chapter 8

Evidence from a Census of Australian Floor Plans

8.1 Introduction
8.5 The Empirical Turn
8.10 Corpus Construction: Method and Calibration
8.14 Corpus Pipeline, Merge, and Sealing
8.19 Dimensional Evidence
8.26 Occurrence Evidence
8.33 Topological Evidence
8.39 The Census Coupling Structure
8.42 Configurational Evidence
8.47 The Partial Grammar and the Superset-Language Framing
8.50 Integration
8.57 Validation
8.63 Limitations and External Validity
8.66 Conclusion

NOTE

Length-rewrite (run 2606010800)

This chapter was rewritten under the thesis length-rewrite pathway (contract v1.4.0). The eleven sec_* source modules were redesigned into fourteen modules (Option A; the framing pair, the corpus pair, and the two-part topological and configurational sections), composed afresh from the Stage-B claim registers. Of the frozen pre-rewrite source files, only the eight tbl_* table modules remain on disk as audit-trail records; no sec_* or fig_* source modules were retained. The canonical Chapter 8 is the fourteen ch--artefact_empirical--* module files listed above.

8.1 Introduction

8.2 The problem: bridging theoretical commitment and empirical grounding

A formal notation for architectural floor plans makes implicit commitments about the world it describes, however internally consistent it is on its own terms. The nine-module taxonomy formalised in Chapter 6 and the planimetric grammar of Chapter 7 commit this study to a specific decomposition of the Australian residential dwelling. Entrance sequences, circulation corridors, sanitary zones, sleeping spaces, living areas, kitchen and service spaces, and outdoor extensions are treated as semantically distinct, spatially countable units standing in stable topological relations. The procedural generator of Chapter 9 commits further still: it must produce configurations that are simultaneously grammatically well-formed, dimensionally plausible, room-compositionally credible, adjacency-respecting, and geometrically diverse. Neither the notation nor the generator can discharge these commitments from within a purely theoretical frame. Both require an empirical substrate: a rigorously constructed, independently validated body of evidence about the dimensional, spatial-frequency, topological, and configurational regularities of Australian residential construction, at a scale sufficient to expose stable patterns rather than anecdote.

We build that substrate here. The chapter answers a question that precedes the generator and cannot be deferred to it: what are the empirically grounded dimensional, topological, and configurational regularities of Australian residential construction that a procedural floor-plan generator must respect? Two evident routes prove inadequate. Reading the regularities off design standards will not suffice, since the SDA Design Standard analysed in Chapter 5 specifies regulatory minima rather than market norms. Synthesising the existing literature is equally unworkable, because the Australian residential evidence base at the scale and specificity required here has not yet been assembled. We therefore answer the question by construction: two purpose-built corpora, four independent analytical pipelines run against them, a verdict-loop audit of each pipeline’s headline finding, and four handoff contracts packaged for Chapter 9. The chapter’s contribution is that construction. The empirical substrate is a research artefact in its own right.

The chapter’s position within the artefact suite differs structurally from the others, and the difference is worth stating plainly. The standardisation schema (Chapter 5) analysed a pre-existing regulatory document; the Governed Kernel Architecture (Chapter 6) specified a taxonomy derived from that analysis; the notation (Chapter 7) developed a formal grammar grounded in the taxonomy. Each worked from a defined, bounded source object. The empirical substrate, the present chapter, works from data that did not exist in usable form at the outset, and that had to be collected, cleaned, extracted, validated, and sealed before any analysis could begin. The generator (Chapter 9) then builds the procedural engine that consumes its outputs. In Hevner, March, Park, and Ram’s terms, the substrate contributes to the knowledge base of the design science research cycle rather than drawing on it: it populates that base with purpose-built empirical artefacts, and does so in a way that makes them available to, and auditable by, the design artefact they support.¹ In Gregor and Hevner’s contribution typology the result is an improvement: a mature theoretical frame (modular co-ordination, access-graph topology, polyomino combinatorics) applied in a new and non-obvious way to produce a concrete output with externally verifiable properties.² In Wieringa’s vocabulary it performs empirical validation: it gathers data about the world the design must fit, characterises that data with enough rigour to support formal claims, and declares the scope within which those claims hold.³ All three framings carry the same implication: the substrate must be documented so completely that an examiner, or a successor researcher, could in principle re-run the analytical pipeline and recover the same evidence. The chapter’s apparatus (sealed corpus, pinned scripts, verdict-loop record, lineage dossier) is built to that standard.

8.3 What the substrate establishes: one compound claim, ten instantiations

The chapter’s central argument is a single compound claim, and the ten numbered claims developed across Sections 8.19 to 8.50 are its instantiations rather than ten separate findings. The claim is this: Australian residential dwellings, examined at scale through four parallel evidence pipelines, expose a set of dimensional, spatial-frequency, topological, and configurational regularities that are not self-evident from design standards or theoretical models, yet are stable enough, surviving the fourfold expansion of the corpus and robust to the removal of any one product category or any single graph, to serve as the parameter base for a procedural floor-plan generator. Each pipeline supplies one face of that claim.

The dimensional pipeline (Section 8.19) establishes that the retail dimensional vocabulary is dominated by a 50 to 100 mm metric grid and by standard sheet and door sizes, with 25 mm a common sub-module; this is consistent with, but not unique to, ISO 2848:1984.⁴ The evidence is 40,342 dimension values parsed from 23,048 products in a national retail catalogue. Candidate base modules are searched with no normative floor, and each is scored by its lift over a rounded-to-5 mm reporting-convention null. On that score 50 mm carries the highest lift of any standard grain in every cohort, 100 mm the next, and 25 mm registers as a genuine but secondary sub-module. The ranking is robust to the removal of any single product cohort and to a matching-tolerance sweep. The only disturbance anywhere in the search is the fine divisor m = 4 mm, which out-lifts 50 mm by 0.002 in the building cohort at exact match alone and survives no tolerance.

The occurrence pipeline (Section 8.26) establishes the frequency structure of the space vocabulary: 14,554 individual spaces across 745 plans (room-grain convention, under which a built-in robe is recorded as a fixture rather than an enumerated zone), distributed over 72 categories in a markedly heavy-tailed pattern whose top-ten membership is largely stable across a 4.3-fold corpus expansion, from which we derive a three-tier required/common/rare classification.

The topological pipeline (Section 8.33) establishes the access-graph coupling structure: a census required-adjacency map of 49 reliably co-located category pairs and 50 common-but-optional ones, together with a residual set of pairings that a codification-development observation records as consistently avoided. That residual set is a descriptive dispreference signal from the early codification, largely uncodified in the National Construction Code, and the generator carries it as a soft bias rather than a prohibition.

The configurational pipeline (Section 8.42) establishes the combinatorial structure of the arrangement space: fifty canonical module arrangements and one hundred canonical packings on an 8×12 boundary, each characterised by three geometric indices. The integration argument (Section 8.50) encodes the four outputs as the handoff contracts HC-8A through HC-8D and audits their mutual coherence through six pairwise probes, declaring the two that the current substrate cannot test rather than absorbing them into hedging. Of the ten claims that operationalise the compound claim, six resolve as supported at high confidence and four as declared-limited at medium-to-high confidence with the specific limit named; none is unsupported.
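The lift criterion named for the dimensional pipeline can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the chapter: lift is treated here as the ratio of the observed share of exact multiples of a candidate module to the share expected if values were spread evenly over multiples of 5 mm, and the function names and toy values are hypothetical. The tolerance sweep and cohort splits of Section 8.19 are not modelled.

```python
from math import gcd


def null_share(module_mm, step_mm=5):
    """Share of values divisible by module_mm if values were spread evenly
    over multiples of step_mm (the assumed reporting-convention null)."""
    lcm = module_mm * step_mm // gcd(module_mm, step_mm)
    return step_mm / lcm


def lift(values_mm, module_mm, step_mm=5):
    """Observed share of exact multiples of module_mm, divided by the null share.
    Assumption: lift is a ratio; the chapter's own definition may differ."""
    hits = sum(1 for v in values_mm if v % module_mm == 0)
    return (hits / len(values_mm)) / null_share(module_mm, step_mm)


# Toy values in millimetres, invented for illustration only.
toy_values = [600, 900, 1200, 2400, 450, 75, 1350, 2700, 90, 35]
for m in (4, 25, 50, 100):
    print(m, round(lift(toy_values, m), 3))
```

Scoring against a null rather than against raw share is what stops a coarse module from being penalised simply because fewer numbers are divisible by it; the toy ranking carries no evidential weight.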

8.4 The research question and its four-pipeline answer

The question driving the chapter is an enabling one. The study’s primary question concerns how SDA floor plans should be formally represented for compliance assessment and procedural generation, and Chapters 5 through 7 addressed it at the level of artefact construction. Given that representational commitment, Chapter 8 asks what empirical regularities the downstream generator must be built to respect. The question carries four natural sub-questions, one per pipeline. The dimensional sub-question asks at what metric grain the generator’s spatial units should be defined so that its output is commensurable with the Australian product environment. The occurrence sub-question asks which space categories appear often and reliably enough to count as required components of a valid plan, and which are optional or exceptional. The topological sub-question asks which adjacency relationships are stably required, stably forbidden, or unconstrained in practice, irrespective of what standards prescribe. The configurational sub-question asks how many geometrically distinct two-dimensional arrangements of the module set are feasible within a residential boundary, and whether they exhibit enough metric diversity to constitute a useful design space.
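The occurrence sub-question can be made concrete with a small sketch of per-plan presence counting followed by threshold classification into the required/common/rare tiers. The cut-off values, category labels, and function names below are hypothetical placeholders, not the thresholds the chapter derives.

```python
from collections import Counter


def presence_share(plans):
    """Share of plans in which each category appears at least once."""
    counts = Counter()
    for spaces in plans:
        counts.update(set(spaces))  # count each category once per plan
    return {category: n / len(plans) for category, n in counts.items()}


def tier(share, required_at=0.90, common_at=0.30):
    """Three-tier label; both cut-offs are illustrative assumptions."""
    if share >= required_at:
        return "required"
    if share >= common_at:
        return "common"
    return "rare"


# Four invented plans, each a list of space-category labels.
toy_plans = [
    ["bedroom", "bedroom", "kitchen", "ensuite"],
    ["bedroom", "kitchen"],
    ["bedroom", "kitchen", "study"],
    ["bedroom", "kitchen", "ensuite"],
]
shares = presence_share(toy_plans)
tiers = {category: tier(share) for category, share in shares.items()}
```

Counting presence per plan rather than total instances keeps a plan with four bedrooms from weighing more than a plan with one when deciding whether a bedroom is a required component.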

Each sub-question is answered by its own data source or enumeration, a dedicated instrument, and a dedicated validation protocol: a coverage analysis over a product corpus, a frequency analysis over the 745-plan floor-plan census, a coupling-matrix classification over the access graphs of that same 745-plan census, and a systematic polyomino enumeration. The architecture of four analytically independent pipelines converging on one substrate-level claim is the chapter’s principal safeguard against a failure mode that a single-pipeline design would invite: over-committing the generator to the artefacts of one instrument. A dimensional corpus cannot establish which adjacencies are required; a topological corpus cannot establish the metric grain. The four pipelines address incommensurable questions and deliver incommensurable answers, and it is their joint coherence, which the six probes of Section 8.50 audit, that licenses the substrate to be read as a single artefact rather than four separate studies. The reader who has formed a view on whether an empirical substrate chapter is warranted may turn directly to Section 8.19; the reader approaching it fresh should proceed in order, since the methodological decisions of Sections 8.5 and 8.10 constrain the interpretation of all the evidence that follows.
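A coupling-matrix classification over per-plan access graphs might be sketched as follows. This is an illustration under stated assumptions: the coupling rate is taken to be the share of plans containing both categories in which the two are directly connected, and the thresholds, labels, and function names are hypothetical rather than the chapter's own.

```python
from collections import Counter
from itertools import combinations


def coupling_rates(plans):
    """For each category pair, the share of co-present plans with a direct edge."""
    co_present = Counter()
    connected = Counter()
    for categories, edges in plans:
        edge_pairs = {frozenset(edge) for edge in edges}
        for a, b in combinations(sorted(set(categories)), 2):
            pair = frozenset((a, b))
            co_present[pair] += 1
            if pair in edge_pairs:
                connected[pair] += 1
    return {pair: connected[pair] / n for pair, n in co_present.items()}


def classify(rate, required_at=0.90, common_at=0.30):
    """Illustrative cut-offs only."""
    if rate >= required_at:
        return "required"
    if rate >= common_at:
        return "common-optional"
    return "rare"


# Three invented plans: (categories present, category-level access edges).
toy_graphs = [
    ({"bed", "ensuite", "hall"}, [("bed", "ensuite"), ("bed", "hall")]),
    ({"bed", "ensuite", "hall"}, [("ensuite", "bed"), ("hall", "bed")]),
    ({"bed", "hall"}, [("bed", "hall")]),
]
rates = coupling_rates(toy_graphs)
```

Conditioning on co-presence is the design choice that separates this from the frequency analysis: a pair can be common in the census and still never be directly connected.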

8.5 The Empirical Turn

8.6 Why theoretical specification is not enough

The artefact suite developed in Chapters 3 through 7 establishes what a representational governance system for adaptation-heavy housing must do and how it must be formally constituted. Chapter 6 specifies the nine-module taxonomy and its library entries; Chapter 7 formalises the two-layer planimetric notation through which both module constraints and plan-level compliance can be expressed.⁵ Taken together, those chapters make a theoretical claim: that a representational governance system of the kind described is possible, internally consistent, and formally grounded in the regulatory corpus it is meant to govern. That claim is necessary, but it is not sufficient. A system that can be formally specified differs from one that can be practically instantiated, and the difference is exactly the set of values the specification leaves open. Whether the base grid should be 25 mm, 100 mm, or 300 mm; which of the nine module types appear in a high proportion of compliant dwellings and which appear only occasionally; which adjacency pairs are empirically required, which forbidden, and which neither: none of these is fixed by the modularity mechanism of Chapter 3 or the taxonomy of Chapter 6. They are settled, if at all, by the actual stock of Australian residential construction rather than by an idealised model.

This dependence follows from the design science research framework in which the study operates.⁶ Hevner and colleagues distinguish three components of the design cycle: the environment supplies the problem, the knowledge base supplies the kernel theories and methods, and the design artefact solves the problem. Chapter 3’s theoretical framework is the kernel theory; Chapters 5 to 7 are earlier artefacts in the sequence. But neither the kernel nor the earlier artefacts can supply the design’s empirical parameters; those must come from the environment, through a structured engagement with the domain’s empirical record. In March and Smith’s artefact typology the substrate is a model, a representation that abstracts and specifies the target system’s properties.⁷ Wieringa sharpens the same distinction: technical validation establishes that an artefact is internally coherent and meets its specification, whereas empirical validation establishes that the specification is drawn from, and corroborated by, the domain it claims to govern.⁸ Chapter 8 is the study’s primary empirical-validation layer. Without it, the dimensional constants, frequency thresholds, adjacency rules, and arrangement constraints of Chapter 9 would be technically valid yet empirically ungrounded, calibrated against a hypothetical dwelling rather than against the dwellings actually built and registered under the scheme. The objection has force, and the chapter is built to answer it: it converts “on theoretical plausibility” into “on the empirical record of 745 Australian residential floor plans, corroborated by product-geometry data whose metric grid is consistent with ISO 2848 among other mutually-aligned referents.”

8.7 What an empirical substrate is, and what it contributes

The term empirical substrate needs a precise definition, because it occupies a bounded position easily confused with adjacent categories. The substrate departs from a literature review in that it constructs new knowledge from original data rather than synthesising existing claims; from a design proposal in that it introduces no new grammar, taxonomy, or notation; and from an evaluation of the Chapter 9 generator in that it predates the generator and supplies its inputs. What it is, in March and Smith’s terms, is a developed artefact, a model of the domain, which required corpus construction, analysis, and validation work that could not be skipped without forfeiting the empirical-validation argument. In Gregor and Hevner’s contribution typology the result is an improvement of a specific kind: the substrate exapts established methods (building-product geometry analysis, floor-plan topology, and polyomino arrangement enumeration) and applies them, in coordination, to supply empirically grounded parameters for a novel artefact class.⁹ The novelty is the coordination, not any single method; what makes the resulting parameter values defensible as a doctoral contribution rather than informed guesses is the governance protocol under which they were derived.

That protocol, rather than administrative overhead, is the feature that distinguishes an empirical substrate from a background chapter. Three requirements carry the weight. First, every corpus plan must be extractable under a sealed, published calibration ruleset, so that the extraction is reproducible in principle.¹⁰ Second, every claim derived from the corpus must be registered, versioned, and assigned a confidence level and a verdict-loop count.¹¹ Third, every parameter passed to Chapter 9 must travel through a formal handoff contract that records the value alongside its derivation conditions, the scope limits that apply, and the verdict-loop evidence certifying its stability.¹² The protocol also defines the substrate’s failure mode. Because the four pipelines are independent triangulations rather than a sequential derivation chain, the substrate fails gracefully: if one pipeline’s data is compromised or its method is later found unsuitable, the handoff contract for that pipeline alone is withdrawn, and the other three are unaffected. That structural independence follows from treating the four parameter classes (dimensions, frequencies, adjacencies, arrangements) as logically irreducible to one another.
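The third requirement, and the graceful-failure property that follows from it, can be pictured as a small data structure. The sketch below is hypothetical throughout: the field names, the placeholder values, and the `withdraw` and `usable` helpers are assumptions for illustration, and only the contract identifiers HC-8A to HC-8D come from the chapter.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HandoffContract:
    contract_id: str      # e.g. "HC-8A"
    pipeline: str         # which evidence pipeline produced the value
    value: str            # the parameter passed to the generator (placeholder here)
    derivation: str       # conditions under which the value was derived
    scope_limits: tuple   # scope limits that apply (left empty in this sketch)
    verdict_loops: int    # verdict-loop count (placeholder here)
    withdrawn: bool = False


def withdraw(contracts, contract_id):
    """Return a new list with one contract marked withdrawn; the rest are untouched."""
    return [replace(c, withdrawn=True) if c.contract_id == contract_id else c
            for c in contracts]


def usable(contracts):
    """Identifiers of the contracts the generator may still rely on."""
    return [c.contract_id for c in contracts if not c.withdrawn]


contracts = [
    HandoffContract("HC-8A", "dimensional", "placeholder", "placeholder", (), 0),
    HandoffContract("HC-8B", "occurrence", "placeholder", "placeholder", (), 0),
    HandoffContract("HC-8C", "topological", "placeholder", "placeholder", (), 0),
    HandoffContract("HC-8D", "configurational", "placeholder", "placeholder", (), 0),
]
after_withdrawal = withdraw(contracts, "HC-8C")
```

Because no contract refers to another, withdrawing one leaves the remaining three exactly as they were, which is the independence the paragraph above describes.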

8.8 Four questions, four pipelines: the triangulation strategy

The generator of Chapter 9 requires four distinct classes of parameter, and each class is naturally answered by a different data source. Dimensional parameters are product-geometry questions: what are the actual dimensions of Australian building products, and at what grain do they converge on a common grid? Floor plans do not record the nominal dimensions of studs, plasterboard sheets, or glass panels; they record the finished room dimensions that result from assembling those products. The dimensional pipeline therefore draws on a product corpus, the Bunnings residential catalogue, analysed under a lift criterion that identifies which candidate base modules the product dimensions favour over a rounded-to-5 mm reporting-convention null, with no normative floor imposed on the search. Frequency parameters are house-type questions: what proportion of dwellings include a bedroom, an ensuite, a separate living space, a covered outdoor area? Product catalogues record component dimensions, not which room types co-occur; the frequency pipeline therefore draws on the 745-plan floor-plan corpus, counting module-type occurrences per plan and threshold-classifying them against the frequency distribution. Topological parameters are adjacency questions: which room-type pairs are consistently connected by a door or opening, which consistently are not, and which show no systematic pattern? Frequency is necessary but not sufficient (knowing that ninety-one per cent of plans contain both a bedroom and an ensuite does not say whether the two are directly connected), so the topological pipeline computes graph connectivity on per-plan access graphs and pools them into a coupling matrix. Configurational parameters are spatial-arrangement questions: in what arrangements can the module instances be placed within a valid boundary? 
A corpus records the arrangements that exist; it does not enumerate all that are feasible under the governance system’s constraints, so the configurational pipeline uses a combinatorial enumeration over the arrangement space the topological constraints define.
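To show what a combinatorial enumeration of arrangements involves, the following sketch enumerates free polyominoes, that is, edge-connected cell sets counted once up to rotation and reflection. It is a generic textbook construction offered as an assumption about the style of enumeration; the chapter's own enumeration over module arrangements and packings may use a different procedure, and all function names are hypothetical.

```python
def normalise(cells):
    """Translate a cell set so its minimum row and column are zero."""
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def canonical(cells):
    """Pick one representative among the eight rotations and reflections."""
    forms = []
    current = list(cells)
    for _ in range(4):
        current = [(c, -r) for r, c in current]                  # rotate 90 degrees
        forms.append(normalise(current))
        forms.append(normalise([(r, -c) for r, c in current]))   # mirror image
    return min(forms, key=sorted)


def free_polyominoes(n):
    """All free polyominoes with n cells, grown one cell at a time."""
    shapes = {canonical([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for r, c in shape:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (r + dr, c + dc)
                    if cell not in shape:
                        grown.add(canonical(shape | {cell}))
        shapes = grown
    return shapes


counts = [len(free_polyominoes(n)) for n in range(1, 7)]
```

Canonicalising before storing is what turns a raw list of placements into a set of geometrically distinct arrangements, the distinction the configurational sub-question depends on.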

The independence of these four data-source requirements is what motivates the four-pipeline architecture, and it must be stated with one important qualification. The pipelines are fully independent at the level of the analytical instrument, but not at the level of the data source. The Dimensional pipeline operates on the retail-product corpus and the Configurational pipeline on a synthetic enumeration; these two are independent of each other and of the floor-plan corpus. The Occurrence and Topological pipelines, however, both operate on the 745-plan floor-plan corpus, so any systematic bias in the floor-plan extraction would propagate to both. They are independent at the analytical-question level, frequency versus adjacency, but share a common data-source failure mode, and we treat them accordingly: coherence between Occurrence and Topological is necessary but not sufficient, since coherence under shared bias is not corroboration. The cross-pipeline checks that carry the real triangulation weight (Section 8.50’s P-DT-1, P-DC-1, and P-DO-1) are those whose data sources do not overlap. The reward for the architecture is examiner-grade accountability: each handoff contract can be inspected on its own, its data source and derivation method understood, and its specific scope limits assessed, without those limits contaminating the other three.

8.9 The substrate’s position in the study’s arc

Within the study’s arc, the substrate carries consequences for three downstream chapters. Its role for Chapter 9 is direct: the procedural generator inherits the four handoff contracts and must be consistent with all four. HC-8A’s 50 to 100 mm base grid, with 25 mm as its finest sub-module, sets the generator’s metric resolution; HC-8B’s frequency thresholds determine how its module library classifies each module as required, common, or rare; HC-8C’s interaction rules determine which adjacency pairs its constraint engine treats as required and which it biases against as dispreferred; HC-8D’s configurational space determines the class of arrangements it may produce. Were any of these absent, the generator would be implementable but empirically unmotivated, and an examiner reviewing Chapter 9 in isolation could not tell whether its parameters reflected Australian practice or an arbitrary specification. Its role for Chapter 10 is evaluative: the benchmarks against which generated plans are compared are not Chapter 10’s to construct but the substrate’s to supply, namely the 50 to 100 mm metric grid with its 25 mm sub-module, the frequency tiers, the required-adjacency structure and the soft dispreference signals, and the arrangement feasibility range. Its role for Chapter 11 is generalisability: the external validity of the study’s conclusions rests in part on the scope-limit declarations of Section 8.63, which state, with evidence, the boundaries beyond which each finding may not transfer. The positioning statement is therefore precise. Chapter 8 is the empirical pivot that separates the theoretical mode of Chapters 3 to 7 from the generative and evaluative modes of Chapters 9 to 11, the point at which the study moves from asking what the governance system must look like to asking what it must respect. That second question is the one that grounds the system in the material facts of Australian residential construction, and answering it is what this chapter does.

8.10 Corpus Construction: Method and Calibration

8.11 Why a corpus, and why this one

The empirical claims developed across Sections 8.19 to 8.42 depend, prior to any analysis, on an evidentially adequate substrate. A representational governance theory for Specialist Disability Accommodation cannot be defended at submission level on hand-picked exemplars or a single jurisdiction’s design-guide imagery. It needs a corpus large enough to expose distributional regularities, narrow enough to retain regulatory specificity, and documented in enough detail that an examiner, or a successor researcher, could in principle re-run the pipeline and recover the same evidence.¹³

The floor-plan substrate is a single corpus: a census of 745 Australian residential floor plans, extracted agent-manually, on which the occurrence pipeline of Section 8.26 and the topological pipeline of Section 8.33 jointly rest. It is a complete enumeration of the screened, successfully extracted stock rather than a probability sample, and it is built as one corpus in two construction strata, 572 plans from the October stratum and 173 from the August stratum, which differ only in when and from which portal collection they were drawn, not in how they were treated. A second corpus supports a different evidence stream: the Bunnings residential-product survey underpinning the dimensional evidence of Section 8.19, whose construction logic (a single-retailer crawl, deterministic parsing of title-stated dimensions, and governed arbitration of the ambiguous cases with no normative search floor) is documented in the dimensional-corpus appendix bundle. Both raw and cleaned counts appear below; the substantive methodological work of this section concerns the floor-plan census.

Corpus	Raw count	Cleaned count	Cleaning applied	Units
Bunnings product survey	49,300 products	40,342 values / 23,048 products	deterministic title-parse; governed arbitration of ambiguous values; ~48% of products state no title dimension	mm
Floor-plan census (October + August strata)	746	745 plans	−1 missing image; agent-manual extraction under v2.0 ruleset	access edges

The choice of 745 is a design decision answering to two warrants, neither of which is a claim to statistical representativeness. The first, and the operative one, is analytical adequacy: the corpus is large enough to expose the distributional regularities of Sections 8.26 to 8.42 and to stabilise the coupling structure of Section 8.33, yet small enough that each plan can be read and extracted under the per-plan agent-manual rigour the method demands, within the wave budget the critical pathway allows. The second is coverage context, offered for credibility rather than inference. As a rough indication of scale, 745 plans enumerate on the order of one in ten thousand of Australia’s separate-house stock: the 2021 Census records 10,852,208 private dwellings, of which 70.1 per cent, close to 7.6 million, are separate houses.¹⁴ That figure speaks only to the corpus’s non-triviality; it is explicitly not a basis for generalising to the national stock, because the corpus is a census of the assembled set rather than a probability sample of Australian housing, and the descriptive-only ceiling of Section 4.3, no inferential statistics anywhere in the empirical chapters, holds unchanged.
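The scale figures quoted above can be checked with a few lines of arithmetic. The inputs are the counts stated in this section; the reading of the 4.3-fold expansion as growth from the August stratum to the full census is an assumption made here, not a statement from the chapter.

```python
# Figures as stated in the text.
private_dwellings = 10_852_208      # 2021 Census private dwellings
separate_house_share = 0.701        # 70.1 per cent separate houses
october_plans, august_plans = 572, 173

census_plans = october_plans + august_plans
separate_houses = private_dwellings * separate_house_share
houses_per_plan = separate_houses / census_plans

# Assumption: the corpus expansion is measured from the August stratum.
expansion = census_plans / august_plans

print(census_plans)                 # total plans in the census
print(round(separate_houses))       # close to 7.6 million
print(round(houses_per_plan))       # on the order of ten thousand
print(round(expansion, 1))
```

The ratio is a statement of scale only; as the paragraph above makes clear, it licenses no inference to the national stock.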

The restriction to Australian listings is itself a deliberate scope decision, declared as scope limit SL-02 and treated formally in Section 8.63. Its warrant is specificity: the four Design Categories prescribed in the SDA Design Standard (Improved Liveability, Fully Accessible, Robust, and High Physical Support) are not specified in any corresponding non-Australian regime, and their distinguishing configurational signatures (separate overnight-assistance rooms, two-way circulation between living and wet areas, ceiling-hoist-compatible bedroom-ensuite couplings) are not reliably present in the international general-residential stock.¹⁵

The corpus arrived at this form through deliberate piloting, and saying so is part of the evidence rather than an aside. The extraction protocol that makes the census trustworthy (the nine-module space taxonomy, the ten-principle ruleset, and the coupling-classification method the rest of the chapter relies on) was not specified in advance and applied blind; it was worked out on small batches first, in the ordinary design-science manner of building a method, evaluating it against real plans, and refining it before committing it at scale. Three pilot stages did that work. An initial methodology pilot trialled an API-driven extraction route and found it inadequate, which prompted the pivot to agent-manual extraction that Section 8.12 sets out. A first-batch validation pilot then processed the opening plans of the corpus to confirm that the agent-manual method held and to calibrate the ruleset against the cases it had to decide. The remainder of the corpus was processed in roughly fifteen monitored waves of about fifty plans each, every wave gated by the nine-loop protocol of Section 8.13 that refined the principles from version 1 to version 2.0 and audited each wave’s closure before the next began. The 745-plan census, processed under that calibrated protocol, is the evidence the chapter carries forward.

8.12 The methodology pivot: agent-manual over API extraction

The chapter’s most consequential methodological commitment is the use of agent-manual extraction by a documentation-bearing multimodal-vision agent in place of API-driven floor-plan extraction.¹⁶ The case for it is empirical, not stylistic, and it is the first thing the piloting settled. The methodology pilot trialled an API-driven extraction route, batching plan images through an external service, and abandoned it: the API route returned roughly 18 per cent pre-repair connectivity, against roughly 62 per cent for the agent-manual route processing the same kind of plan, and it did so at non-zero cost where the agent-manual route runs at zero API cost. Under the agent-manual protocol each plan is instead processed in-session by the master agent or a sub-agent reading pre-hydrated images directly, with no API calls and no external batch service, and across the full corpus the protocol yields 745 of 745 strict-schema-valid outputs.¹⁷

The two connectivity rates are not a head-to-head measurement, and we do not read the pivot as one. The 62 per cent agent-manual figure and the API-pilot figure were computed under different opening definitions and different handling of windows and outbuildings, so they are not directly comparable as rates. The comparison that actually carried the decision is whether the extracted graphs were correct on the principles, that is, whether each space, opening, and connection in the plan was recovered as the ruleset of Section 8.13 requires; on that test the agent-manual route was the one that produced principled, auditable graphs at all. The 62 per cent figure itself warrants a framing note for the same reason. It is computed against the access-graph projection onto the canonical-nine module-type taxonomy of Chapter 6, Section 6.2, not against the full 72-category space-label vocabulary the schema admits. Under the canonical-nine frame a graph is connected when every space present is reachable from every other through the extracted opening edges; under the 72-category vocabulary the same graph would more often read as disconnected, because rare label combinations admit outlying spaces (remote outbuildings, alfresco fragments) that the canonical-nine projection absorbs into their parent aggregate. The post-repair figure is reported against the same canonical-nine frame, so any uplift is a within-frame comparison rather than a re-keyed metric. We treat the agent-manual commitment as substantive rather than expedient: the API route is recorded as an Anti-Pattern Register entry, and the full run-config, prompt, and schema bundle is published in the generator-specification appendix so that the extraction can be reproduced under the same harness.¹⁸
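The effect of the projection frame on the connectivity reading can be shown with a toy access graph. The sketch assumes that projection means merging every space into a single node per module type; that merge rule, the space labels, and the label-to-module mapping are illustrative assumptions, while the module codes CIR, KIT, and SVC come from the nine-module taxonomy.

```python
from collections import deque


def is_connected(nodes, edges):
    """True when every node is reachable from every other (breadth-first search)."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes


def project(spaces, edges, module_of):
    """Collapse spaces to module types (assumed merge rule) and drop self-loops."""
    nodes = {module_of[s] for s in spaces}
    links = {(module_of[a], module_of[b]) for a, b in edges
             if module_of[a] != module_of[b]}
    return nodes, links


# Invented plan: the shed has no extracted opening to the rest of the dwelling.
spaces = ["hall", "kitchen", "laundry", "shed"]
edges = [("hall", "kitchen"), ("hall", "laundry")]
module_of = {"hall": "CIR", "kitchen": "KIT", "laundry": "SVC", "shed": "SVC"}

label_level = is_connected(spaces, edges)
module_level = is_connected(*project(spaces, edges, module_of))
```

In this toy case the label-level graph reads as disconnected because of the outlying shed, while the module-level graph is connected once the shed is absorbed into the service aggregate, which is why the two frames must not be mixed when rates are compared.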

8.13 The ten-principle calibration ruleset

The substance of agent-manual extraction is governed by a sealed ten-principle ruleset (v2.0; approved 2026-04-26), each principle binding the extracting agent to a specific decision rule sourced to either an August-pipeline prompt, the Chapter 6 module taxonomy, or the PlaniSyn coupling specification of Chapter 7.¹⁹ In order, the principles are: every distinct labelled element is a space, with its own identifier; the nine-module taxonomy (ENT, CIR, SAN, BED, LIV, KIT, SVC, EXT, DWL) is the closed semantic vocabulary for category normalisation, derived by Jaccard clustering of the 611-clause SDA corpus at a height of 0.35 with a silhouette of 0.72; boundary types map to a wall / implied / partial / mixed lattice; spatial-type function distinguishes habitable, circulation, service, utility, storage, parking, and outdoor; outdoor spaces connect only to internal spaces, never directly to a generic exterior; every door, window, or marked aperture is an opening; connections are bilateral, so every internal space appears in at least one opening’s connection list; vertical-circulation segments are spaces; the five known schema simplifications relative to the August pipeline are documented with explicit workarounds; and print-text fidelity is preserved exactly while the lowercase semantic category is carried separately. The ruleset reached version 2.0 through the piloting rather than by fiat: the first-batch validation pilot exposed recurring patterns the version 1 rules under-determined, and three sub-principles were added at the v2.0 amendment to resolve them: the treatment of compound open-plan zones, of accessory spaces such as walk-in robes and pantries, and of the alfresco / porch / balcony distinction. The amendments were validated against the principles register before merge, with seventeen of twenty-two affected plans passing and none failing.

The ruleset is layered with a nine-loop monitoring protocol (per-plan diary, per-wave quality assurance, principles refinement, categoriser coverage, user checkpoint, drift detection, schema-revision-flag accumulation, reproducibility spot-check, and per-wave rigour-gate enforcement) and it is this protocol that governs the roughly fifteen processing waves through which the census was built. The ninth loop is the structural barrier that converts the prior eight from session-discretionary into plan-binding: a wave cannot launch until the preceding wave’s fourteen closure artefacts (the per-plan files and diaries, the wave quality log, the spot-check and reproducibility records, the verdict and decision forms, the flag counts, the convergence tracker, and the signed closure checklist) are present with no outstanding items and validated by the closure script. The presence of that gate is the feature that makes agent-manual extraction defensible at doctoral rather than research grade: it is what lets the per-plan reasoning capacity of the vision agent stand in direct, audited contact with each image without the discipline of the process resting on goodwill.

8.14 Corpus Pipeline, Merge, and Sealing

8.15 The five-stage extraction pipeline

The extraction pipeline runs in five stages, each with a defined input, transformation, output, and observed pass rate. The full data flow is shown in the figure below.

fig_corpus_construction_pipeline — **Corpus construction data flow** Data flow from raw sources to clean, analysis-ready datasets. Left branch: Bunnings product pages are scraped and parsed deterministically for title-stated dimensions; governed arbitration of ambiguous values yields 40,342 validated dimension values across 23,048 products. Right branch: floor-plan listing images are agent-manually extracted to space-and-opening records and normalised to `normalized_merged_project_data.json`; a connectivity audit retains each plan’s largest connected component, producing 745 single-component access graphs, the complete census on which the Section 8.33 topological analysis runs. Both branches feed the four evidence streams of Sections 8.19 to 8.42. Source: lineage dossier v5.0; `cp_d5_merged_corpus_topology.py`.

Stage 1: ingestion. The input is the set of source listing pages drawn from state-level and SDA portal collections: 746 manifest rows, of which 745 are paired image-and-metadata records. Files-On-Demand image stubs are pre-hydrated so the in-session read resolves the bytes locally. The single missing-image case is retained in the manifest as a skipped record, so that the manifest preserves all 746 original rows while the effective extracted corpus is 745, a pass rate of 745 of 746.²⁰

Stage 2: vision extraction. The agent reads each hydrated image, applies the ten principles and their sub-principles, and writes a strict-schema response of spaces and openings sharing one identifier space. Each plan yields a per-plan record and a per-plan diary recording the visual scan, the principles invoked, the edge cases, and any schema-revision flags raised. The pass rate is 745 of 745 strictly schema-valid.

Stage 3: schema validation. A local validator parses each record, checks identifier-space referential integrity, and records diagnostics. There are zero parse failures across the 745 plans; schema-revision flags accumulate without blocking validation.
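The kind of check Stage 3 describes can be sketched briefly. This is an illustration under stated assumptions, not the study's validator: the record layout (`spaces`, `openings`, `id`, `connects`) and the function name are hypothetical, and the sketch treats every space as internal when it looks for spaces that no opening names.

```python
def check_referential_integrity(record):
    """Return diagnostic strings for one per-plan record (hypothetical layout)."""
    diagnostics = []
    space_ids = [s["id"] for s in record["spaces"]]
    opening_ids = [o["id"] for o in record["openings"]]

    # Spaces and openings share one identifier space, so no id may repeat.
    seen = set()
    for identifier in space_ids + opening_ids:
        if identifier in seen:
            diagnostics.append(f"duplicate identifier: {identifier}")
        seen.add(identifier)

    # Every endpoint an opening names must be an extracted space.
    known_spaces = set(space_ids)
    referenced = set()
    for opening in record["openings"]:
        for target in opening["connects"]:
            if target in known_spaces:
                referenced.add(target)
            else:
                diagnostics.append(
                    f"opening {opening['id']} references unknown space {target}"
                )

    # Assumption: all spaces are internal, so each should be named by an opening.
    for identifier in sorted(known_spaces - referenced):
        diagnostics.append(f"space {identifier} appears in no opening")
    return diagnostics


# Toy record with one dangling reference and one unreferenced space.
plan = {
    "spaces": [{"id": "s1"}, {"id": "s2"}, {"id": "s3"}],
    "openings": [
        {"id": "o1", "connects": ["s1", "s2"]},
        {"id": "o2", "connects": ["s2", "s9"]},
    ],
}
diagnostics = check_referential_integrity(plan)
print(diagnostics)
```

Returning diagnostics rather than raising mirrors the description above, in which flags accumulate without blocking validation.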

Stage 4: repair by largest-connected-component retention. For plans whose access graph is multi-component before repair, the largest connected component is retained as the canonical graph. The repair preserves all extracted plans as single-component graphs; the corpus implements no random-missing exclusion category. Marginal pre-repair connectivity across the census is 57.85 per cent (431 of 745). That marginal figure is stratum-conditioned rather than a methodological regression, and the conditioning is legible in the data: the subset of plans without detached outbuildings (472 plans) reaches 90.89 per cent (429 of 472), against the first-batch validation pilot (the 0-detached subset of the first wave, 33 plans) at 93.94 per cent. The shortfall is concentrated in the rural-acreage, multi-shed, and not-in-position-outbuilding strata that the census admits in greater proportion; it is not evidence of degraded extraction. These rates are reported descriptively and are not like-for-like with the connectivity figures of the development pilots, which were computed against earlier codifications of the access graph; for that reason no rate is set against another as a difference, and none carries an interval or test.²¹
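Largest-connected-component retention is simple enough to sketch. The following is a minimal illustration, assuming an undirected access graph given as a node list and an edge list; the function names and the toy plan are hypothetical and are not taken from `cp_d5_merged_corpus_topology.py`.

```python
from collections import deque


def connected_components(nodes, edges):
    """Split an undirected graph into its connected components (breadth-first)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def retain_largest_component(nodes, edges):
    """Keep the largest component; also report whether the graph was already connected."""
    components = connected_components(nodes, edges)
    largest = max(components, key=len)  # ties resolve to the first component found
    kept_edges = [(a, b) for a, b in edges if a in largest and b in largest]
    return largest, kept_edges, len(components) == 1


# Toy plan: a dwelling core plus a detached shed-and-tank fragment.
nodes = ["entry", "living", "kitchen", "shed", "tank"]
edges = [("entry", "living"), ("living", "kitchen"), ("shed", "tank")]
largest, kept_edges, was_connected = retain_largest_component(nodes, edges)
print(largest, kept_edges, was_connected)

# The reported rates follow from the stated counts.
print(round(100 * 431 / 745, 2), round(100 * 429 / 472, 2))
```

In the toy plan the detached fragment is dropped and the dwelling core is retained, which is the behaviour the stage describes for plans with outlying outbuildings.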

Stage 5: topology computation and merge. Per-pair weights and co-occurrence counts are computed across all 745 graphs, and each realised category-pair is classified as hard (weight ≥ 100 and co-occurrence ≥ 20), soft (20 ≤ weight < 100), absent (weight = 0 with co-occurrence ≥ 20), sparse, or insufficient. Across the census the pipeline yields 72 unique space categories, 32,594 directed access edges, and 680 unique adjacency patterns; the classification returns 49 hard pairs, 50 soft pairs, and 581 insufficient pairs, with no realised absent or sparse pairs.²² Why the census records no absent class (sensitivity finding S5-3, which turns on the census instrument recording co-presence only through realised adjacency) is taken up substantively in Section 8.33.
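The stated classification rules can be written out as a small decision function. This sketch encodes only the three rules the text gives in full (hard, soft, absent); the rules separating sparse from insufficient are not stated here, so every remaining case falls through to a single combined label, which is an assumption of the sketch rather than the pipeline's behaviour. The function name is hypothetical.

```python
def classify_pair(weight, co_occurrence):
    """Classify one category pair using the three rules stated in the text."""
    if weight >= 100 and co_occurrence >= 20:
        return "hard"
    if 20 <= weight < 100:
        return "soft"
    if weight == 0 and co_occurrence >= 20:
        return "absent"
    # Assumption: the sparse/insufficient split is not specified, so it is not modelled.
    return "sparse_or_insufficient"


examples = [(150, 30), (50, 5), (0, 25), (5, 3)]
labels = [classify_pair(w, c) for w, c in examples]
print(labels)

# The reported class counts partition the 680 realised pairs.
print(49 + 50 + 581)
```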

8.16 The August-plus-October merge

The 745-plan census is constructed in two strata: the August stratum (173 plans) and the October stratum (572 plans), both extracted under the v2.0 principles, with provenance, merge identifiers, and seal timestamps enumerated in the floor-plan-corpus bundle. The two strata are a construction detail of the one census, not two corpora; their stems carry no cross-overlap, and filename-stem deduplication is effective at construction time, so the merge produced no plan-level identity collisions. The census coupling matrix is read over the space of categories the census actually realises: 680 realised category-pairs across 72 categories. An earlier codification of the coupling space (the superseded API-methodology pilot, which enumerated a 63-category, 1,953-pair grid) survives only in the back-of-house lineage record for auditability; it is not a prior corpus and not a denominator the census prose carries. The census 72-category, 680-realised-pair result is the result reported throughout Section 8.33.²³
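A stem-overlap check of the kind described can be sketched in a few lines. The file paths below are invented for illustration and the function name is hypothetical; only the idea of comparing filename stems across the two strata comes from the text. The final line notes an arithmetic fact about the superseded grid: 1,953 is the number of unordered pairs over 63 categories.

```python
from math import comb
from pathlib import PurePosixPath


def stem_collisions(august_files, october_files):
    """Return filename stems that occur in both strata."""
    august = {PurePosixPath(p).stem for p in august_files}
    october = {PurePosixPath(p).stem for p in october_files}
    return august & october


# Hypothetical listings: one stem deliberately repeated across strata.
august_files = ["aug/plan_001.png", "aug/plan_002.png"]
october_files = ["oct/plan_101.png", "oct/plan_002.jpg"]
collisions = stem_collisions(august_files, october_files)
print(collisions)

print(comb(63, 2))  # unordered pairs over 63 categories
print(173 + 572)    # the two strata sum to the census size
```

An empty result on the real file lists would correspond to the no-collision outcome reported above.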

8.17 Sealing and provenance

The census is sealed at two layers. The substrate seal records the census content (the 745 plans, their categorical and topological statistics, and the lineage blocks) as the canonical base for Sections 8.19 to 8.42. The stability-event seal overlays it without changing it, recording two consecutive verdict-loop exits: one testing whether the census topology requires any amendment to the topological handoff contract, the other testing whether the schema-revision flags or anti-patterns drift its text. Both returned upholding verdicts across all six clauses of the contract.²⁴ Provenance is preserved at three levels: every plan traces to its source-listing URL and its extraction-run identifier; every wave’s signed fourteen-artefact closure bundle is validated; and the merged canonical record is byte-identical to the input on which the Section 8.33 and Section 8.42 analyses run. A successor researcher with the same source images and harness could re-execute the pipeline from the reproduction notes and recover the v5.0 census numbers within the documented agent-manual envelope.²⁵

8.18 Residual scope-limit declarations

Four of the chapter’s seven scope limits bind the corpus constructed here. SL-02 confines the floor-plan corpus to Australian listing-site sources in a 2025 snapshot. SL-05 records that two of the cross-pipeline coherence probes are untestable under the current data, since room edge-run lengths are absent from the schema and port-attachment is unimplemented in the v2 generator. SL-06 records the coupling-matrix sensitivity at the weight-100 hard/soft boundary, declared on the census coupling: the boundary is a defensible but discretionary cut, and a plan-level census cannot adjudicate it against an external referent. SL-07 records that inter-rater reliability for the agent-manual pipeline is structurally self-asserted rather than externally corroborated: no Cohen’s κ or Krippendorff’s α has been reported across an independent re-extraction subset, and we declare that absence explicitly rather than implying a reliability the corpus has not measured. The remaining three limits (SL-01 on the dimensional corpus, SL-03 on the configurational enumeration, and SL-04 on the frequency-threshold derivation) bind other pipelines and are treated in their respective sections; all seven are declared in full in Section 8.63.²⁶ The chapter now turns to the four evidence streams the corpus supports: dimensional (Section 8.19), occurrence (Section 8.26), topological (Section 8.33), and configurational (Section 8.42).

8.19 Dimensional Evidence

8.20 The dimensional question: a metric grid and its standards

The dimensional question is inductive: it asks what dimensional grid, if any, the Australian residential product market actually exhibits, and only then how that grid relates to the standards that might explain it. Modular co-ordination is a long tradition with two routes to module selection: a deductive route that derives the module from a proportional system, and an inductive route that extracts it from observed production.²⁷ We take the inductive route deliberately and without a normative prior: the candidate range is the full integer span m ∈ [2, 300] mm with no lower bound set to any standard’s value, because anchoring the search at a standard’s sub-module would build the conclusion into the method. That is the error an earlier analysis of this corpus committed when it fixed the floor at the ISO M/4 value of 25 mm and then recovered 25 mm. The standards enter afterwards, as candidate explanations of whatever grid the data show, not as constraints on the search.

ISO 2848:1984 is the most prominent such standard: it fixes the basic module M = 100 mm and recognises sub-modules at M/2, M/4, and M/8 (50 mm, 25 mm, and 12.5 mm) for components and fittings whose dimensions cannot be expressed in multiples of the basic module.²⁸ It is not the only referent a metric grid in this market could express. AS 1684 specifies 25 mm and 50 mm dimensional increments for stud sections and embeds a 300 mm and 600 mm rhythm through framing at 600 mm centres;²⁹ and metricated-imperial sheet and lumber sizing supplies the same coarse values by a different route: 2,400 mm ≈ 8 ft, 1,200 mm ≈ 4 ft, 600 mm ≈ 2 ft. These referents are mutually consistent at the values they share, so a market grid that matches them cannot, by itself, single out which one explains it. The analysis can establish what grid the market exhibits; it cannot establish that any one standard caused it.

The Bunnings corpus is the retail expression of this standards-shaped industry rather than a random cross-section of manufactured goods, and the analysis reads it as a census of the listings as captured (Section 8.10), descriptively and without inferential generalisation to the market beyond it.

8.21 The module sweep and the 50 mm result

The dimensional measurement is rebuilt from the raw catalogue rather than inherited. Each product’s title is parsed deterministically for stated dimensions (no value is inferred by a language model on a per-row basis) and the ambiguous cases (mixed units, imperial marks, pack counts, gauge-and-length fasteners) are resolved by a governed arbitration whose rules were fixed in advance and validated against a held-out sample; the instrument passed its conformance audit at a measured precision of 1.000 with a 0.0 per cent unit-error rate on the random stratum.³⁰ Of 49,300 products parsed, 25,488, about 52 per cent, state a dimension in the title; after arbitration and cleaning, 23,048 products contributing 40,342 dimension values enter the analysis. Coverage is left sparse and honest: the 48 per cent of products that state no title dimension are absent rather than imputed. Because a dwelling-relevant claim must not be inflated by furniture, appliances, and consumables, the corpus is partitioned a priori into a building cohort (24,539 values) and a non-building cohort (15,803 values), and every module result is reported for each cohort separately.

A base module is scored not by raw coverage but by its lift: the gain in exact-divisibility coverage over a null that models the mere convention of reporting dimensions rounded to the nearest 5 mm. The null is what the earlier analysis lacked. Raw coverage is a degenerate measure: it rises monotonically as the grain shrinks, so an un-baselined sweep trivially selects the smallest candidate, here m = 2 mm, and certifies nothing. Lift over the rounded-to-5 null removes the part of any module’s coverage that reporting convention alone explains, leaving the part that reflects a genuine dimensional grid; on this measure the result is unambiguous, and it is not 25 mm. The strongest single base module is 50 mm. It carries the highest lift in the pooled and non-building cohorts. In the building cohort at exact match it is second, by 0.002, to m = 4 mm (0.339 against 0.341), the only candidate anywhere in the sweep to out-lift it, and it leads the building cohort at every non-zero matching tolerance (Section 8.22), where the 4 mm reading does not survive. 100 mm is next; 25 mm is a real but secondary sub-module:

Base module m (mm)	Building lift	Non-building lift	Pooled lift
50	0.339	0.537	0.417
100	0.305	0.466	0.368
25	0.283	0.466	0.354
20	0.280	0.379	0.319
10	0.174	0.330	0.235
5	−0.198	−0.114	−0.165

50 mm dominates 25 mm in every cohort; 100 mm out-lifts 25 mm in the building and pooled cohorts and is level with it in the non-building cohort (0.466 each). The fine grains do not merely fail to win; 5 mm carries negative lift, meaning the values are less divisible by 5 mm than pure round-number reporting predicts, because they concentrate on 50, 100, and 300 mm multiples rather than on a fine 5 mm grain. The finding is therefore a 50 mm dominant grain with 25 mm a genuine sub-module beneath it, not a 25 mm base module.
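The lift measure can be illustrated with a short sketch. It rests on two assumptions that the text does not state explicitly: that lift is the observed exact-divisibility coverage minus the coverage expected under the null, and that the rounded-to-5 null treats values as evenly spread multiples of 5 mm, for which the expected share divisible by m is gcd(5, m)/m. The function names and toy values are illustrative and are not drawn from the analysis script.

```python
from math import gcd


def coverage(values_mm, m):
    """Share of integer values exactly divisible by the module m."""
    return sum(1 for v in values_mm if v % m == 0) / len(values_mm)


def null_coverage(m, step=5):
    """Expected coverage if values were evenly spread multiples of `step` (assumed null)."""
    return gcd(step, m) / m


def lift(values_mm, m, step=5):
    """Assumed definition: observed coverage minus coverage under the null."""
    return coverage(values_mm, m) - null_coverage(m, step)


# Toy values only: five sheet-like sizes, two door sizes, one off-grid value.
toy = [600, 1200, 900, 2400, 450, 2040, 820, 42]
for m in (50, 25, 5):
    print(m, round(coverage(toy, m), 3), round(lift(toy, m), 3))
```

On the toy values the 5 mm candidate has negative lift because one value is not a multiple of 5 while the assumed null expects all values to be; this is the same mechanism by which a fine grain can score below zero.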

fig_dimension_module_lift_by_cohort — **Base-module lift over the reporting-convention null, by cohort** Lift in exact-divisibility coverage over a rounded-to-nearest-5 mm null for each candidate base module, plotted separately for the building cohort (24,539 values), the non-building cohort (15,803 values), and the pooled corpus (40,342 values), across the search range m ∈ [2, 300] mm with no normative floor. The 50 mm candidate carries the highest lift in the pooled and non-building cohorts, and the highest lift of any standard grain in the building cohort, where at exact match the fine divisor m = 4 mm edges it by 0.002; 100 mm is second; 25 mm is a secondary sub-module, level with 100 mm only in the non-building cohort. The 5 mm candidate sits below zero, confirming that the corpus concentrates on a 50 to 100 mm grid rather than a fine sub-grid. Source: dimensional-corpus data bundle, `module_curves_by_cohort.csv`.

The vocabulary that produces this grid is the standard metric sheet-and-door range. The most frequent values across the corpus are 600, 1,200, 900, 2,400, 450, 100, 50, 1,800, and 2,040 mm: panel, sheet, stud-spacing, and door dimensions, with 2,040 mm the standard Australian door height and 820 mm the standard door width both prominent in the building cohort. Most of these values are multiples of the 100 mm ISO basic module or, as with 450 mm, of its 50 mm sub-module; the door sizes 2,040 and 820 mm are the exception, taken up in Section 8.23. The modular values are equally the AS 1684 framing grid and the metricated-imperial sheet sizes, so the grid is consistent with ISO 2848 without being unique to it; a residue of genuinely imperial-marked values confirms a metric-imperial hybrid substrate rather than a pure ISO one.

fig_dimension_top_values_building — **Most frequent dimension values in the building cohort** The most frequently stated dimension values in the building cohort, in millimetres. The distribution concentrates on standard sheet, panel, stud-spacing, and door sizes: 600, 1,200, 900, 2,400, 450, 2,040 (door height), 820 (door width). The sheet, panel, and stud-spacing values are each a multiple of the 50 mm grain, and most also of 100 mm; the two door sizes are multiples of neither. Source: dimensional-corpus data bundle, `value_frequency_by_cohort.csv` (building cohort).

The relationship to ISO 2848 is accordingly one of consistency, not adoption. The corpus exhibits a metric grid whose dominant grain is 50 mm with 25 mm beneath it; ISO 2848 is one of several mutually consistent explanations for that grid, alongside AS 1684 and metricated-imperial sizing, and the census cannot single out one as causal. What the corpus does establish, and what the earlier 25 mm reading obscured, is that the market’s effective component grain is coarser than the finest sub-module the standard permits: products cluster at 50 mm, not at the 25 mm or 12.5 mm ISO also allows.

8.22 Robustness: cohort boundary and matching tolerance

The 50 mm result could in principle be an artefact of where the building/non-building boundary is drawn, so the partition is stress-tested under three cuts: a strict cut (structural shell and openings only), the default cut (adding fixed fit-out and finishes), and a broad cut (adding fixings and hardware). 50 mm and 100 mm out-lift 25 mm in every cut, so the ordering does not depend on the cohort boundary. The strict cut also supplies a face-validity check: its most frequent values are 1,200, 2,040, 40, 820, 35, and 2,400 mm, a clean structure-and-openings signature of sheet widths, the standard door height and width, and stud thicknesses, with furniture, appliances, and tools cleanly excluded.

A second check varies the matching tolerance rather than the cohort. Coverage was recomputed at an exact match, at ±1 mm, and at a proportional ±0.2 per cent tolerance, each against uniform, rounded-to-5, and rounded-to-10 nulls. Either 50 mm or 100 mm carries the highest lift at every non-zero tolerance in every cohort, including the building cohort. The test discriminates between the candidates in two respects worth stating plainly. The first is the fragility of the 4 mm reading: at exact match alone the fine divisor m = 4 mm out-lifts 50 mm in the building cohort by 0.002, and it then does not survive any tolerance at all. The second is that the same fragility is not confined to 4 mm: under the proportional ±0.2 per cent tolerance the lift at 25 mm collapses toward zero, in the building cohort from 0.282 to 0.001, while the lift at 50 mm and 100 mm holds. The 50 to 100 mm grid is robust to how forgivingly a dimension is matched; the 25 mm reading is fragile to it. We report these as plain descriptive recurrences over the complete corpus, not as interval estimates or resampling distributions: the catalogue is a census of the listings as captured, not a probability sample, and carries no sampling distribution to summarise.
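The three matching regimes can be sketched as one predicate. This is an illustration only: the function names, the choice to measure distance to the nearest positive multiple, and the toy values are assumptions, and the sketch computes tolerant coverage rather than lift, since the tolerant nulls are not specified in enough detail here to reproduce.

```python
def matches_module(value_mm, m, abs_tol_mm=0.0, rel_tol=0.0):
    """True if value lies within tolerance of the nearest positive multiple of m."""
    multiple = max(1, round(value_mm / m)) * m
    tolerance = max(abs_tol_mm, rel_tol * value_mm)
    return abs(value_mm - multiple) <= tolerance


def tolerant_coverage(values_mm, m, abs_tol_mm=0.0, rel_tol=0.0):
    """Share of values matching the module under the given tolerance."""
    hits = sum(matches_module(v, m, abs_tol_mm, rel_tol) for v in values_mm)
    return hits / len(values_mm)


toy = [2399, 2403, 2040, 600]  # illustrative values, not corpus data
print(tolerant_coverage(toy, 50))                  # exact match
print(tolerant_coverage(toy, 50, abs_tol_mm=1.0))  # within 1 mm
print(tolerant_coverage(toy, 50, rel_tol=0.002))   # within 0.2 per cent
print(tolerant_coverage(toy, 4, rel_tol=0.002))    # a fine module matches everything
```

The last line shows why a proportional tolerance is hard on fine modules: any value of 1,000 mm or more lies within 0.2 per cent of some multiple of 4 mm, so near-total coverage of this kind carries little information. Whether that is the mechanism behind the collapse reported above is not established by this sketch.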

fig_dimension_tolerance_bands_building — **Module lift across matching tolerances, building cohort** Lift over the rounded-to-5 null for the leading candidate base modules in the building cohort, recomputed at an exact match, ±1 mm, and a proportional ±0.2 per cent tolerance. The 50 mm and 100 mm bands hold their lift across tolerances; the 25 mm band collapses under the proportional tolerance, marking it as the fragile candidate. Source: dimensional-corpus data bundle, `tau_curves.csv` (building cohort).

8.23 Where the grid holds: structure across product tiers

A grain that co-ordinated only one kind of product would not support the multi-product assembly that modular co-ordination exists to enable, so the module analysis is also run separately within each of nine a-priori product tiers. 50 mm is the dominant module by lift in seven of the nine tiers. The structural shell and the fixed fit-out, the tiers whose co-ordination with the building grid matters most, sit on a clean 50 mm grid, with lift at 50 mm of roughly 0.41 to 0.49; the thin finishes tier is 100 mm-modular. The instructive exception is the openings tier, the least metric-modular of the building tiers (lift at 50 mm near 0.10): doors do not follow the 50 to 100 mm grid but a small set of bespoke standard sizes (2,040 and 2,340 mm heights, 820 mm width). That is itself a finding, since it locates the one building class where a fixed-size catalogue, rather than a grid, governs co-ordination. Fixings are only weakly modular, as their gauge-and-length sizing would predict. The pattern is coherent: the 50 mm grain is a property of the dimensioned building fabric, concentrated exactly in the tiers where dimensional co-ordination does structural work and absent exactly where catalogue-standard sizing takes over. The three thinnest tiers, each under 1,500 values, are read descriptively only.

fig_dimension_per_tier_lift — **Module lift by product tier** Lift over the rounded-to-5 null for the leading base modules within each of the nine a-priori product tiers, shown as small multiples. The 50 mm grain dominates the structural-shell and fixed-fit-out tiers; the finishes tier favours 100 mm; the openings tier is weakly modular, its co-ordination governed instead by bespoke standard door sizes. The three thinnest tiers are read descriptively only. Source: dimensional-corpus data bundle, `per_tier.csv`.

8.24 The micro grain and the three-tier grid

The product analysis fixes one tier of the dimensional system: the micro grain at which components and fittings are dimensioned. That grain is 50 mm, the dominant module the corpus exhibits, with 25 mm the finest sub-module beneath it, the value that survives as a real but secondary increment for the fine fittings ISO 2848 also dimensions at M/4. The micro grain is not, however, the grain at which a dwelling composes. A system that let every room take any multiple of 50 mm would still generate an unbounded vocabulary of room sizes, and two rooms drawn independently from it would rarely share an edge; the grain at which whole rooms and clusters of rooms are dimensioned so that they compose is a separate and coarser choice, the meso grain, and a third, coarser again, governs the planning of the envelope, the macro multimodule. The substrate fixes all three: 50 mm micro, 150 mm meso, 300 mm macro, with 600 mm and the larger ISO multimodules as macro multiples. The three nest cleanly: 150 mm is three times the micro grain and 300 mm is six times it, so a component dimensioned at the micro grain, a room at the meso grain, and an envelope at the macro grain are mutually commensurate by construction.
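The nesting of the three grains can be checked mechanically. The sketch below is illustrative: the constant and function names are hypothetical, and the snapping helper is an assumption added to show what dimensioning a room at the meso grain would mean in practice, not a procedure described in the text.

```python
MICRO_MM, MESO_MM, MACRO_MM = 50, 150, 300


def snap(value_mm, grain_mm):
    """Round an integer dimension to the nearest multiple of grain (ties round up)."""
    return (2 * value_mm + grain_mm) // (2 * grain_mm) * grain_mm


# The grains nest: each coarser grain is a whole multiple of the finer one.
nests = MESO_MM % MICRO_MM == 0 and MACRO_MM % MESO_MM == 0
print(nests, MESO_MM // MICRO_MM, MACRO_MM // MICRO_MM)

# A room edge snapped to the meso grain is automatically on the micro grain.
room_edge = snap(3620, MESO_MM)
print(room_edge, room_edge % MICRO_MM == 0)
```

Because the nesting holds, any dimension on the macro grid is also on the meso and micro grids, which is what commensurate by construction amounts to.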

The meso grain is the load-bearing one for the dwelling, and the case for fixing it at 150 mm is not made on the product catalogue (where, as Section 8.21 established, coverage always rewards the finer grain) but on the floor-plan geometry. 150 mm is the coarsest grid at which the room dimensions the national stock actually uses still compose without combinatorial misfit, within a build tolerance, and that argument is made on the geometry census at the integration of Section 8.50, from which the three-tier grid is carried forward into the generator of Chapter 9. The product corpus establishes only the micro grain and its sub-module; the meso and macro grains are geometry’s contribution, and the chapter assembles all three at integration.

8.25 Handoff to the frequency question

Dimensional analysis establishes three things that the sections to follow take as given: that the product market’s component grain is 50 mm, with 25 mm a secondary sub-module, on a metric grid consistent with, but not unique to, ISO 2848; that this grain is a corpus-wide property, holding across cohort boundaries, across matching tolerances, and in seven of nine product tiers; and that it is read descriptively over a census, with no inferential generalisation beyond the captured listings. Together these fix the metric grain of the system. They leave open the complementary question of which specific dimension values the grain must produce; a library of all multiples of 50 mm is unboundedly large, and knowing the grain does not specify its vocabulary. That the full multiple-of-50 library is unbounded is the dimensionality problem of Sections 2.6 and 2.7 in miniature: fixing the grain removes the millimetre-scale freedom but leaves a still-unmanageable enumeration, so a second enabling constraint, pruning to the sizes that recur often enough to be worth standardising, is required before the system is navigable. That second pruning is what the occurrence pipeline of Section 8.26 constructs, moving from the grain of the dimensional system to the specific sizes that grain must produce.

8.26 Occurrence Evidence

8.27 What the occurrence pipeline measures

The occurrence pipeline asks the question the dimensional pipeline cannot: which functional spaces actually appear in Australian residential plans, in what relative quantities, and with what corpus-wide stability. Where Section 8.19 worked at the level of millimetre-grain product dimensions, the present section works at the level of the labelled room or zone, the unit at which clients articulate a brief, examiners read a plan, and the National Construction Code partitions space. The pipeline takes the 745-plan merged floor-plan corpus, enumerates every space under the controlled taxonomy fixed in Chapter 6, Section 6.2, aggregates counts at the per-plan, per-stratum, and corpus-wide scales, and outputs a frequency distribution that is itself the substrate the threshold derivation of Section 8.30 operationalises.³¹

8.28 The 72-category frequency distribution

The corpus enumerates 14,554 individual spaces across the 745 plans, distributed over 72 distinct space categories, at a mean of 19.54 spaces per plan; the per-stratum split (19.75 in the October stratum, 18.83 in the August stratum) is close enough to confirm that the merge introduces no structural shift in plan complexity. The count applies the room-grain convention under which a built-in robe is recorded as a fixture of its bedroom rather than as a separately enumerated zone, so the 1,403 built-in robes the corpus records are carried as fixtures of their parent spaces and are not counted here; a walk-in robe, which is an entered compartment, remains a space.³² The distribution is markedly heavy-tailed. The top ten categories (secondary bedroom, bathroom, living, kitchen, hallway, laundry, ensuite, garage, foyer, and dining) together account for 8,882 of the 14,554 spaces, or 61.0 per cent of the corpus, while the remaining 5,672 spaces spread across 62 long-tail categories down to nine singletons.³³

fig_frequency_distribution_histogram — **Space-category frequency distribution (745-plan corpus)** The space-category frequency distribution across the 745-plan merged corpus (14,554 spaces under the room-grain convention, 72 categories). The distribution is markedly heavy-tailed: a small core of categories appears in nearly every plan, an intermediate band in many but not all, and a long tail in only a handful. The three-component structure shown is descriptive of the distribution’s shape; it is not the source of the threshold values, which are set as a design decision (Section 8.30) rather than derived from a model fit. Source: `space_category_frequencies_merged.csv`.

The shape is the recognisable signature of a heavy-tailed empirical distribution.³⁴ Whether the underlying process is strictly power-law, log-normal, or stretched-exponential is a question this study does not adjudicate: the goodness-of-fit machinery for that adjudication would require both a much larger sample and a different research question than the one Chapter 8 poses.³⁵ The qualitative claim is what the chapter needs, and it is stable: a small core appears in nearly every plan, an intermediate band in many, and a long tail in only a few. Its cross-stratum stability is a more demanding test than its shape, and the corpus meets it. The top ten share nine of their ten members between the August and October strata (the wardrobe holds the tenth place in the October stratum and the alfresco in the August one, while the other nine recur in both with only their internal order shifting), and the single substantive movement is the elevation of the ensuite from rank eight in the October stratum to rank two in the August one, attributable to the August stratum’s higher proportion of SDA listings (37.6 per cent against 3.7 per cent), where ensuites accompany every bedroom rather than only the master. Beyond the top ten the perturbation stays modest: the top-twenty sets share eighteen of twenty members, differing by only two categories across the boundary. This degree of set-stability holds as the count of plans scales roughly fourfold from the 173-plan August stratum to the full 745-plan census, which is what supports using the census frequencies as the canonical reference distribution.
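The set-stability comparison reduces to a top-k overlap between two label lists. The sketch below is a minimal illustration with invented labels and counts; the function names are hypothetical and the toy strata are far smaller than the real ones.

```python
from collections import Counter


def top_k(labels, k):
    """Set of the k most frequent categories in a list of space labels."""
    return {category for category, _ in Counter(labels).most_common(k)}


def top_k_overlap(labels_a, labels_b, k):
    """Number of categories shared by the two top-k sets."""
    return len(top_k(labels_a, k) & top_k(labels_b, k))


# Toy strata: the fourth-ranked category differs between them.
stratum_a = ["bedroom"] * 5 + ["bathroom"] * 4 + ["living"] * 3 + ["alfresco"] * 2 + ["wardrobe"]
stratum_b = ["bedroom"] * 6 + ["living"] * 5 + ["bathroom"] * 4 + ["wardrobe"] * 3 + ["alfresco"]
shared = top_k_overlap(stratum_a, stratum_b, k=4)
print(shared)
```

The measure compares set membership only, so internal reordering within the top k, such as the ensuite's change of rank, does not reduce the overlap.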

8.29 Opening types and the topological aggregate

The opening-type distribution sits beside the category frequencies because openings are what link the categories into the access graph that Section 8.33 reads. The merged corpus contains 18,644 openings, of which 16,297 (87.4 per cent) are two-element interior or attached openings and 2,347 (12.6 per cent) one-element openings to detached exterior endpoints; the split is materially identical across the strata. The directed-edge synthesis yields 32,594 directed edges, a mean of 43.75 per graph, 680 unique adjacency patterns, and zero isolated nodes after the closure validator and largest-connected-component pass. These access-graph figures are computed over the full set of enumerated zones and retain the built-in robe as a node for adjacency purposes, even though the occurrence count of Section 8.28 carries it as a fixture under the room-grain convention; the coupling structure of Sections 8.33 and 8.39 rests on this graph and is therefore unaffected by that convention. One schema gap should be flagged here rather than discovered by the Section 8.33 reader: window openings were not preserved through the v0 extraction schema, so the opening-type distribution should be read as an interior-connectivity inventory, with aperture-bearing relations to the external envelope absent.³⁶
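The opening figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this section; the variable names are ours, and the final line rests on an assumption the text does not state, namely that each two-element opening contributes one directed edge in each direction.

```python
# Figures quoted in the text for the 745-plan merged corpus.
PLANS = 745
TWO_ELEMENT_OPENINGS = 16_297   # interior or attached openings
ONE_ELEMENT_OPENINGS = 2_347    # openings to detached exterior endpoints
DIRECTED_EDGES = 32_594

total_openings = TWO_ELEMENT_OPENINGS + ONE_ELEMENT_OPENINGS
two_element_share = round(100 * TWO_ELEMENT_OPENINGS / total_openings, 1)
one_element_share = round(100 * ONE_ELEMENT_OPENINGS / total_openings, 1)
mean_edges_per_graph = round(DIRECTED_EDGES / PLANS, 2)

# Assumption (not stated in the text): every two-element opening yields
# one directed edge in each direction between its two endpoints.
edges_if_bidirectional = 2 * TWO_ELEMENT_OPENINGS

print(total_openings, two_element_share, one_element_share)
print(mean_edges_per_graph, edges_if_bidirectional == DIRECTED_EDGES)
```

Under that assumption the directed-edge total is exactly twice the two-element opening count, which is consistent with the one-element exterior openings contributing no edge to the access graph.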

8.30 The frequency-threshold derivation

The threshold derivation converts the 72-row distribution into a three-class library: a required core (Tier U), a common-but-optional band (Tier C), and a rare residual (Tier R). The thresholds (required at ≥ 90 per cent cumulative coverage of corpus instances, common at the 20 to 89 per cent band, rare below 20 per cent) are the same thresholds applied in the dimensional pipeline of Section 8.19. A direct disclosure is necessary, and it is recorded as scope limit SL-04: the numerical values are not derived from the boundaries of a model fit; they are hardcoded in the analysis script and re-used here as a methodological choice.³⁷ The 90 per cent and 20 per cent lines should be read as principled but conventional choices rather than natural breakpoints in the distribution; HC-8B inherits this status and is carried to Chapter 9 at medium rather than high confidence.

Applied to the merged corpus, the cumulative-coverage operation first reaches 90 per cent at the 22nd category, so Tier U comprises the top 22 categories (secondary bedroom, bathroom, living, kitchen, hallway, laundry, ensuite, garage, foyer, dining, wardrobe, alfresco, porch, pantry, linen, toilet, cupboard, store, patio, stairs, shed, and family). Tier C runs from roughly rank 23 to rank 50, contributing about 9.4 per cent of instances in aggregate, and Tier R is the residual long tail of roughly 22 categories below 0.6 per cent, including several singletons.³⁸ The interpretation is design-theoretic rather than statistical. Tier U is the Australian residential-vocabulary core that nearly every plan instantiates; Tier C is the optional-amenity band (studies, media rooms, courtyards) whose presence signals a particular brief; Tier R is the specialty-and-residual band of rare amenities and taxonomy-churn artefacts.
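The Tier U cut can be sketched as a cumulative-coverage scan over the ranked categories. This is a minimal illustration, not the analysis script itself: the function name `required_core`, its `coverage_pct` parameter, and the toy counts are hypothetical, and only the "first rank at which coverage reaches 90 per cent" rule is taken from the text.

```python
def required_core(counts: dict[str, int], coverage_pct: int = 90) -> list[str]:
    """Return the top-ranked categories up to and including the first rank
    at which cumulative instance coverage reaches `coverage_pct` per cent."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    core: list[str] = []
    running = 0
    for name, n in ranked:
        core.append(name)
        running += n
        # Integer comparison avoids floating-point error at the boundary.
        if running * 100 >= coverage_pct * total:
            break
    return core


# Hypothetical toy counts, not corpus figures.
toy_counts = {"bedroom": 50, "bathroom": 25, "living": 15, "study": 6, "cellar": 4}
print(required_core(toy_counts))
```

On the merged corpus the same scan would stop at the 22nd category; the Tier C and Tier R boundary is left out of the sketch because the text gives it only as a hardcoded value.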

8.31 Stability of the threshold partition

The stability of the Tier U / Tier C / Tier R partition rests on the cross-stratum evidence of Section 8.28 rather than on a within-corpus resampling test. A direct plans-resampling stability test was not executed within the writing-plan budget for the occurrence pipeline, and we do not present one: the partition should be read as a fixed-corpus computation, and its robustness is the robustness of the rank ordering it derives from. That robustness is real and quantified. Because the top-twenty rank ordering is preserved within a few positions as the plan count scales roughly fourfold from the 173-plan August stratum to the full 745-plan census, the rank at which cumulative coverage crosses 90 per cent can move only by a small number of positions, so the boundary between Tier U and Tier C is approximately ±2 categories around the rank-22 cut rather than fixed at rank 22 exactly. Re-running a plans-resampling check under the same protocol used for the topological pipeline would tighten this hedge, and it is registered as remediable future work for HC-8B rather than a structural blocker.

8.32 The meso-tier module synthesis

The meso-tier synthesis pulls the frequency analysis up one level, asking which functional groupings the 72 categories resolve into when read against the module taxonomy of Chapter 6. The grouping is not derived here; it aligns the occurrence read with the SDA Module Reference of Chapter 6 and re-validates it against the merged corpus. Each of the 72 categories projects onto one of nine functional groupings: bedroom (kernel BED), wetarea (kernel SAN), kitchen (KIT), living (LIV), circulation (CIR), entry (ENT), storage (within SVC), outdoor (EXT), and a parking grouping for garages and car spaces, plus four residual catch-alls. The grouping aligns with the Chapter 6 module taxonomy without reproducing it exactly: the kernel’s service module (SVC) spans the occurrence read’s storage and service-or-utility groupings, and the kernel’s dwelling-envelope module (DWL) is the whole-of-dwelling referent rather than a room-projection target, so the occurrence read surfaces a parking grouping in its place.

fig_meso_module_synthesis — **Meso-tier module synthesis** Aggregation of the 72 space categories onto the nine-module taxonomy of Chapter 6, with per-module instance totals. The nine canonical modules account for 13,612 of 14,554 enumerated spaces (93.5 per cent of the corpus, room-grain convention); the residual 6.5 per cent sits in the study, service, utility, and other catch-alls. Every plan instantiates spaces from at least seven of the nine modules, and the modal plan instantiates all nine. Source: `space_category_frequencies_merged.csv`, grouped by module prefix.

The meso reading underwrites two propositions Chapter 9 inherits. The first is coverage: the nine canonical modules account for 13,612 of 14,554 spaces (93.5 per cent), with the residual concentrated in service (dominated by the laundry), study, utility, and other. The second is plan-level coverage: every plan instantiates spaces from at least seven of the nine modules, and the modal plan all nine, with low-end outliers typically apartment-format plans that omit garage and outdoor in favour of car space and balcony. Together these underwrite the typology and frequency handoffs (HC-8A’s library of room types and HC-8B’s required/optional/absent thresholds) that the Chapter 9 generator consumes. The 72 categories of this section become the row-and-column index of the coupling matrix that Section 8.33 partitions, and the heavy-tailed shape established here is what makes that matrix sparse: most category pairings are between a Tier U category and a Tier R one, where co-presence is too rare to support a coupling judgement. Sections 8.26 and 8.33 may be read as a single argument in two parts.
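The projection from categories to groupings, and the coverage figure it produces, can be sketched as follows. The mapping table is illustrative: only the assignment of garages and car spaces to the parking grouping is stated in the text, and the remaining rows, the function names, and the toy counts are assumptions made for this sketch.

```python
from collections import Counter

CANONICAL_GROUPINGS = {
    "bedroom", "wetarea", "kitchen", "living", "circulation",
    "entry", "storage", "outdoor", "parking",
}

# Illustrative assignments. Only garage and car space -> parking is stated
# in the text; the other rows are assumptions made for this sketch.
CATEGORY_TO_GROUPING = {
    "secondary bedroom": "bedroom",
    "hallway": "circulation",
    "foyer": "entry",
    "garage": "parking",
    "car space": "parking",
    "study": "study",  # one of the four residual catch-alls
}


def grouping_totals(category_counts):
    """Sum instance counts per grouping; unmapped categories fall to 'other'."""
    totals = Counter()
    for category, n in category_counts.items():
        totals[CATEGORY_TO_GROUPING.get(category, "other")] += n
    return totals


def canonical_share(totals):
    """Percentage of instances that sit in the nine canonical groupings."""
    covered = sum(n for g, n in totals.items() if g in CANONICAL_GROUPINGS)
    return round(100 * covered / sum(totals.values()), 1)


# Hypothetical toy counts, not corpus figures.
toy_totals = grouping_totals(
    {"secondary bedroom": 40, "garage": 10, "car space": 5, "study": 5}
)

# The coverage figure quoted in the text, recomputed from its two totals.
quoted_share = round(100 * 13_612 / 14_554, 1)
print(canonical_share(toy_totals), quoted_share)
```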

8.33 Topological Evidence

8.34 The topological question: access graphs and coupling

The occurrence evidence of Section 8.26 establishes which categories appear, and in what proportions; it does not resolve the design problem that motivates Chapter 9. A procedural generator needs more than a catalogue of rooms: it needs a specification of which rooms must be mutually adjacent, which may be, and which, by consistent empirical practice, are not. That question is categorically distinct from occurrence: it concerns the relational geometry of the access graph rather than the frequency distribution of the node catalogue. Environmental design research has a well-established tradition of treating residential organisation as a directed graph: Hillier and Hanson’s justified graph formalises the idea that spatial organisation encodes social logic, and Turner and colleagues showed that the culturally stabilised grammar of a building type is recoverable from graph structure across a large enough corpus.³⁹ The present analysis inverts the canonical question, asking what graph structure a large empirical corpus reveals instead of what behaviours a known graph structure predicts, and that inversion is directly precedented in computational design, where Merrell and colleagues derived an adjacency grammar from a corpus of plans and used it as a generative input.⁴⁰ Our target differs from Merrell’s in purpose: a deterministic set of interaction rules for the specific sub-population of SDA-eligible Australian dwellings, expressed as the HC-8C contract that Chapter 9’s generator consumes, rather than a probabilistic model of residential plans in general.

The corpus that supplies the answer is the 745-plan census introduced in Section 8.10, the complete, agent-manually extracted stock rather than a probability sample of it. Across the census the space taxonomy resolves 72 controlled-vocabulary categories joined by 32,594 directed adjacency edges, and the distribution of that adjacency mass is markedly heavy-tailed: a small subset of high-valency categories carries most of the connections. That concentration is ecologically sensible, since dwellings organise circulation through a few high-accessibility nodes, and it is the first structural fact the coupling analysis has to read.

8.35 The circulation hallway as organising hub

The node that carries the most connective load at census is the circulation hallway, the spine. It is not the most frequent room, but it is the one through which the dwelling’s other rooms reach each other, and the realised-adjacency record makes that role plain. The hallway stands at one end of the heaviest required adjacency in the whole census, bedroom-to-hallway, realised at a connection weight of 3,850 across 651 plans; and it recurs through the upper band of the required-adjacency structure: hallway-to-bathroom (weight 1,304 across 590 plans), hallway-to-living (774 across 361), hallway-to-laundry (722 across 357), hallway-to-foyer (662 across 323), and hallway-to-toilet (488 across 232). Of the 49 required adjacencies the census resolves, the hallway participates in seventeen (more than a third), and no other category approaches its combination of connection weight, neighbour diversity, and prevalence. We read this hallway-spine centrality as a structural invariant of Australian residential circulation: the corridor is the device through which the plan keeps its private sleeping and service domains reachable without routing movement through the shared living domain, and the accessible-width circulation that SDA compliance concentrates along that corridor only deepens its load-bearing role.

The entry foyer, by contrast, is a strong entry node rather than the dominant hub. It anchors the way the dwelling meets the street (foyer-to-porch is realised at weight 704 across 341 plans, foyer-to-hallway at 662 across 323, and foyer-to-living at 570 across 281), so it governs the threshold between the semi-public approach and the interior. But ranked by how often it participates in the required-adjacency structure, the foyer sits roughly tenth, well below the hallway: its work is the entry transition, not the internal distribution of movement. The distinction matters for Chapter 9, because a generator that treated the foyer as the dwelling’s circulation centre would mis-place the spine the census actually exhibits; the corridor is the organising element, and the foyer is the gateway feeding into it.
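The hub-versus-gateway contrast can be read off a simple participation tally. The sketch below uses only the adjacencies and figures quoted in this section, so it is an excerpt rather than the full set of 49 required adjacencies; the variable names are ours.

```python
from collections import Counter

# Adjacencies and (weight, plans) figures quoted in this section only.
quoted = {
    ("bedroom", "hallway"): (3850, 651),
    ("hallway", "bathroom"): (1304, 590),
    ("hallway", "living"): (774, 361),
    ("hallway", "laundry"): (722, 357),
    ("hallway", "foyer"): (662, 323),
    ("hallway", "toilet"): (488, 232),
    ("foyer", "porch"): (704, 341),
    ("foyer", "living"): (570, 281),
}

# How many quoted pairs each category takes part in.
participation = Counter(cat for pair in quoted for cat in pair)

# Total connection weight carried by each category across the quoted pairs.
weight_by_category = Counter()
for (a, b), (weight, _plans) in quoted.items():
    weight_by_category[a] += weight
    weight_by_category[b] += weight

# Share of the 49 required adjacencies in which the hallway participates.
hallway_share_of_hard_pairs = round(100 * 17 / 49, 1)

print(participation.most_common(3))
print(weight_by_category["hallway"], weight_by_category["foyer"])
```

Even on this excerpt the hallway leads on both pair count and summed weight, with the foyer a clear second; the full ranking quoted in the text comes from the complete pair table, not from this subset.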

8.36 The coupling partition: required and preferred adjacencies

With 72 categories the undirected pair space is large (72 × 71 / 2 = 2,556 possible pairs), and not every pair can be classified with equal authority. Pairs whose two categories co-occur in fewer than 20 plans fall below the evidential floor and are designated insufficient (581 pairs); pairs whose categories are present together in at least 20 plans constitute the evidentially supported subset on which the classification operates, using the undirected connection weight (the total directed edges between the pair, summed across plans) and a hard/soft boundary set at weight 100. The result is a two-class partition of the supported pairs: 49 hard and 50 soft, with no pair classifiable as absent or sparse at census.

fig_space_category_coupling_network — **Space-category coupling network (745-plan census)** Adjacency coupling network of the space categories resolved across the 745-plan floor-plan census. Edge colour encodes coupling class: hard (49 pairs; weight ≥ 100 and co-occurrence ≥ 20) in red; soft (50 pairs; 20 ≤ weight < 100) in blue. At the census no pair is classifiable as absent or sparse; the 581 insufficient pairs (co-occurrence < 20) are omitted for want of joint evidence. Node size is proportional to degree, with the circulation hallway the highest-load node; category labels are shown for the higher-degree nodes. The weight-100 hard/soft boundary and the co-occurrence-20 floor are design decisions whose sensitivity is declared as scope limit SL-06. Source: `coupling_class_summary.csv` and `coupling_matrix_realised_pairs.csv` (745-plan census).

fig_space_category_connectivity_heatmap — **Space-category connectivity frequency heatmap** Matrix heatmap of space-category adjacency coupling across the 745-plan floor-plan census. Rows and columns are the 72 space categories; cell colour intensity indicates the proportion of plans in which the row and column categories share a direct adjacency. The diagonal is suppressed. Cell weight is 2×(directed_AB+directed_BA) summed across the census access graphs, restricted to category pairs with sufficient joint evidence (co_occur_graphs≥20). Warm, high-weight cells identify routinely realised pairs, the hard couplings, 49 in all (weight≥100), led by the circulation hallway as the organising hub (bedroom↔hallway, hallway↔bathroom); cooler, lower-weight cells identify the 50 soft, context-dependent couplings (20≤weight<100). At the census no pair is classifiable as absent or sparse; the 581 insufficient pairs fall below the joint-evidence threshold and are not weighed. The threshold-boundary sensitivity of the hard/soft split is visible as the band of weight≈100 cells along the hard boundary, flagged threshold-sensitive (SL-06) rather than asserted as a fixed cut. Source: floor-plan corpus appendix bundle (`coupling_matrix_realised_pairs.csv` and `render-coupling-census.py`).

Hard coupling (49 pairs) marks both high co-occurrence and high weight: adjacencies consistently realised wherever both categories appear; these are the required adjacencies. A weight of 100 or more across 20 or more co-occurrence plans means the adjacency is realised in essentially every dwelling where both spaces are present, and the heaviest of them are unmistakable structural staples of the Australian plan: bedroom-to-hallway (weight 3,850 across 651 plans), bedroom-to-wardrobe (3,536 across 630), hallway-to-bathroom (1,304 across 590), kitchen-to-dining (910 across 444), and kitchen-to-pantry (840 across 400). These are the spaces a plan does not leave un-joined. Soft coupling (50 pairs) marks adequate co-occurrence but moderate weight: preferred adjacencies that are common but not universal. Living-to-garage (weight 98 across 49 plans) and bedroom-to-kitchen (82 across 35) are realised often enough to register as a tendency, yet not so consistently as to read as required. The two classes are the whole of the supported structure; everything else is insufficient for want of joint evidence rather than for any positive finding about it.
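The partition rule reduces to two comparisons, sketched below. The constants are the design thresholds named in this section; the function name `coupling_class` is hypothetical, and the pairs are those quoted in the paragraph above with their (weight, co-occurrence plans) figures.

```python
HARD_WEIGHT = 100    # hard/soft boundary (design decision, SL-06)
MIN_CO_OCCUR = 20    # evidential floor on co-occurrence plans


def coupling_class(weight: int, co_occur_plans: int) -> str:
    """Classify a category pair as insufficient, hard, or soft."""
    if co_occur_plans < MIN_CO_OCCUR:
        return "insufficient"
    return "hard" if weight >= HARD_WEIGHT else "soft"


# Pairs and figures quoted in the text: (weight, co-occurrence plans).
quoted = {
    ("bedroom", "hallway"): (3850, 651),
    ("bedroom", "wardrobe"): (3536, 630),
    ("kitchen", "dining"): (910, 444),
    ("living", "garage"): (98, 49),
    ("bedroom", "kitchen"): (82, 35),
}

classes = {pair: coupling_class(*figures) for pair, figures in quoted.items()}
for pair, label in classes.items():
    print(pair, label)
```

The evidential floor is tested first, so a pair below 20 co-occurrence plans is never labelled hard or soft however large its weight.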

The census carries no absent class, and the reason is a property of the instrument rather than a discovery about dwellings. The finalised codification records co-presence only through realised adjacency: a pair of categories that a plan contains but never places beside each other leaves no realised edge, so it has no co-occurrence denominator and cannot be weighed against the opportunities it had to occur (this is sensitivity finding S5-3). An adjacency that is consistently avoided (bathroom-to-garage, bathroom-to-front-porch) therefore falls to insufficient at census rather than resolving as a measured zero. That avoidance signal is real, but it is a codification-development observation, not a census finding: the early codification recorded full co-presence and so could observe the avoidance directly, whereas the finalised v0 codification records realised adjacency only. We carry the signal forward descriptively, as a soft generator dispreference (a bias the generator is steered away from, not a prohibition it is forbidden to cross), and we are explicit that the census, as built, cannot re-confirm it.

fig_graph_connectivity_segregation — **Hard- and soft-constraint adjacency segregation subgraph** Segregated subgraph of the space-category coupling network at the 745-plan census. Left panel (red): the 49 hard-coupling pairs (weight≥100 ∧ co_occur_graphs≥20); these feed Chapter 9’s required-adjacency interaction rules, with the weight-100 boundary pairs flagged threshold-sensitive (SL-06). Right panel (blue): the 50 soft-coupling pairs (20≤weight<100; context-dependent), which feed Chapter 9’s preference-weighted (optional) interaction predicates. At the census no pair is classifiable as absent or sparse; the 581 insufficient pairs (co_occur_graphs<20) are set aside for want of joint evidence. A consistent adjacency-avoidance signal (categories that co-occur yet are never placed adjacent) was observed only during codification development: the early codification recorded full co-presence and so could weigh a never-realised pairing, whereas the finalised codification records realised adjacency alone, leaving the census instrument with no co-presence denominator to re-confirm it (sensitivity finding S5-3). It is therefore carried descriptively, as a soft generator dispreference rather than a regulatory prohibition; a sweep of the National Construction Code finds the avoidances cultural rather than codified, with adjacency essentially unregulated (CL-8-05). Source: `coupling_matrix_realised_pairs.csv`; `render-coupling-census.py`.

8.37 Threshold sensitivity and the census read directly

The weight-100 boundary between hard and soft is a design threshold, and the pairs whose census weight sits near it are the ones whose classification is least secure. The appropriate test of that sensitivity is not a resampling of the corpus: over a census treated as the enumerated stock for the strata it covers, a resampling distribution has no sampling parameter to estimate, as Section 4.3 sets out, and a per-pair stability proportion read from subsamples would be a statistic without a referent. The structure is therefore read off the census directly, with its boundary sensitivity stated plainly rather than smoothed by a resampling figure. The hard and soft interiors (pairs sitting well above and well below weight 100) are the secure part of the result; the handful of pairs lying within a narrow band of the weight-100 line are the part whose hard-or-soft assignment a different cut-off would move, and that margin is declared as scope limit SL-06. HC-8C is contracted on the interior (the hard class as required adjacencies and the soft class as preference weights), so the pairs at the threshold’s edge inform the contract’s softer band rather than its load-bearing core.
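One way to make the boundary sensitivity concrete is to reclassify pairs under neighbouring cut-offs and flag those whose class changes. The sketch below is illustrative only: the alternative cut-offs of 90 and 110 are hypothetical choices of ours, not values from the analysis, and the weights are those quoted in Section 8.36.

```python
def classify(weight: int, cutoff: int) -> str:
    """Hard at or above the cut-off, soft below it."""
    return "hard" if weight >= cutoff else "soft"


def threshold_sensitive(weight: int, cutoffs=(90, 100, 110)) -> bool:
    """True when the class changes across the alternative cut-offs.

    The default cut-offs are hypothetical and chosen only for illustration.
    """
    return len({classify(weight, c) for c in cutoffs}) > 1


# Weights quoted in Section 8.36.
quoted_weights = {"living-garage": 98, "bedroom-kitchen": 82, "kitchen-pantry": 840}
sensitive = {pair: threshold_sensitive(w) for pair, w in quoted_weights.items()}
print(sensitive)
```

Under these illustrative cut-offs only living-to-garage, at weight 98, changes class; the other two pairs sit in the interior on which HC-8C is contracted.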

8.38 Are the dispreferences regulated? The NCC sweep (CL-8-05)

The avoidance observation invites a regulatory question: are these dispreferences codified, or are they cultural? We checked rather than assumed. A comprehensive sweep of NCC 2022 Volume 2, conducted on 2026-05-03 across all thirteen Sections of the Adopted Housing Provisions, examined each Section for any rule of the form “space type X must not be adjacent to space type Y” and identified exactly one such clause: Clause 10.6.3, which prohibits a sanitary compartment opening directly into a kitchen or pantry except where access is via an airlock, hallway, or other intermediary room, or where mechanical-exhaust ventilation is provided.⁴¹ Of the adjacencies the codification-development record shows as consistently avoided, only this one (bathroom-to-pantry, to the extent that the controlled-vocabulary bathroom category encloses a sanitary compartment) is even partially codified, and even then Clause 10.6.3 permits the adjacency where mechanical-exhaust ventilation is provided; the corpus complies through physical separation rather than by relying on that exemption. We therefore state CL-8-05 descriptively rather than as a catalogue of prohibitions: the avoidances the corpus exhibits are, with this single partial exception, matters of professional and cultural practice rather than code, because NCC Volume 2 and the SDA Standard govern accessibility outcomes and sanitary-facility provision rather than room-to-room adjacency.

The cultural-norm reading is supported by the mechanism through which the norms are transmitted. Experienced residential architects carry an implicit grammar of space organisation, acquired through practice and reinforced through client expectation and design critique, and the justified-graph tradition treats that grammar as recoverable from large plan corpora precisely because it is stable enough to leave a signal in aggregate topology.⁴² The avoidance signal the codification-development stage supplies is of exactly that kind: the early codification recorded co-presence even where adjacency was never realised, and so could see the dwelling decline a pairing it had every opportunity to make. The consequence for Chapter 9 is that the dispreference signals carried to the generator are empirically derived rather than regulatory in origin, which is appropriate given that SDA compliance does not prescribe spatial organisation beyond accessibility dimensions and sanitary-facility counts. One evidential caveat travels with them: the finalised census records adjacency only where it is realised, so the strength of any individual avoidance is bounded by what the codification-development stage could observe rather than asserted as a rule, and the signals are weighted in the generator as dispreferences rather than imposed as hard exclusions.

8.39 The Census Coupling Structure

8.40 The census coupling structure and sensitivity finding S5-3

The coupling structure the census realises is the evidence the chapter carries forward. Across the 745-plan census, the classification resolves the 680 realised category-pairs into 49 hard pairs, 50 soft pairs, and 581 insufficient, over 72 space categories, with no pair classified absent or sparse. This is the positive structural result: a census-scale map of which SDA-eligible Australian space categories reliably co-locate (the 49 hard pairs, read as required adjacencies) and which commonly do (the 50 soft pairs, read as preferences). The required-adjacency interior is the secure part of the result, and it is the part HC-8C is contracted on.

The census records no avoidance class, and the honest reading of why is sensitivity finding S5-3: a property of the codification, not a finding that the avoided pairings occur. The finalised codification, in the form sealed at the close of piloting, records co-presence only through realised adjacency: a pair that is adjacent somewhere leaves an edge in the realised-pair matrix, while a pair whose two categories share a dwelling yet are never realised as adjacent leaves no row at all. Such a never-realised pair therefore carries no co-presence denominator, cannot be weighed against the opportunities it had to occur, and falls to insufficient by construction rather than to a measurable avoidance class.⁴³ S5-3 is accordingly the recognition that the census instrument, as built, cannot weigh a never-realised adjacency, not a discovery that the dispreferred pairings became common. A forbidden-adjacency claim (that two co-present categories are never adjacent) needs a co-presence denominator the census does not carry, and we decline to assert one on missing data.
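The missing denominator is easiest to see on a toy example. The sketch below contrasts a tally built from realised adjacencies alone with one built from full room lists; the two plans are hypothetical and are not corpus data, and the record layout is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy plans (not corpus data): the rooms each plan contains
# and the adjacencies it actually realises.
plans = [
    {"rooms": {"bathroom", "garage", "hallway"},
     "adjacent": {("bathroom", "hallway"), ("garage", "hallway")}},
    {"rooms": {"bathroom", "garage", "hallway"},
     "adjacent": {("bathroom", "hallway")}},
]


def pair(a, b):
    """Order-independent key for a category pair."""
    return tuple(sorted((a, b)))


# What a realised-adjacency-only codification can count.
realised = Counter(pair(a, b) for p in plans for a, b in p["adjacent"])

# What a full co-presence codification could count in addition.
co_present = Counter(
    pair(a, b) for p in plans for a, b in combinations(sorted(p["rooms"]), 2)
)

avoided = pair("bathroom", "garage")
print(avoided in realised)   # no row at all in the realised-pair tally
print(co_present[avoided])   # the denominator a co-presence record supplies
```

With the co-presence tally the bathroom-to-garage pairing would read as zero realisations out of two opportunities; with the realised tally alone it simply does not appear, which is the situation S5-3 describes.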

The conservative resolution follows the census methodology of Section 4.3: a census establishes what the stock realises, not what it forbids. The 49 hard pairs are a positive realisation finding the census supports directly, and they are the structure HC-8C rests on. The adjacency-avoidance signal (categories that share a dwelling yet are never placed adjacent, such as bathroom and garage) was observable to an earlier codification iteration that recorded full co-presence, but it is a piloting-stage observation of how the codification was worked out rather than a census finding; the finalised instrument cannot re-confirm it at scale. It is retained, descriptively and pilot-bounded, as a soft generator dispreference (a bias against the pairing, never a prohibition), with the open caveat that the census cannot re-weigh any individual avoidance against its realised opportunities. Where those per-pair co-presence counts would be needed to revisit a specific avoidance, that recomputation is registered as appendix follow-up rather than asserted here.

8.41 Handoff to the configurational pipeline

The topological evidence is the coupling-structure input to the configurational pipeline. The HC-8C contract encodes, at the level of category-pair relationships, the spatial grammar the census supports: the 49 hard pairs as required adjacencies (the spaces that reliably touch) and the 50 soft pairs as preference weights that inform the generator’s search without constraining it. The avoidance signals pass to the generator as soft dispreferences rather than hard stops, in keeping with their descriptive, pilot-bounded status; the generator is biased away from the dispreferred pairings but not forbidden to realise one, which is the appropriate strength for a signal the census cannot re-confirm. Section 8.42 asks whether a polyomino-based arrangement generator, taking the HC-8C required adjacencies as input constraints and the soft weights as biases, can reproduce and thereby descriptively validate these regularities by producing arrangements recognisable as SDA-eligible dwelling plans.

8.42 Configurational Evidence

8.43 The configurational question: polyomino grammar and enumeration

The topological substrate of Section 8.33 defines the modules as typed nodes in an adjacency graph; it does not settle a question of immediate empirical consequence: how many geometrically distinct arrangements of those modules are feasible within the prescribed boundary, and do they exhibit the metric diversity needed to support design selection? Answering this requires moving from the graph-theoretic register into a discrete geometry capable of representing two-dimensional placement, orientation, and contact: the theory of polyominoes. A polyomino is a plane figure formed by joining unit squares edge-to-edge under 4-neighbour adjacency, where two cells are connected only when they share a complete edge.⁴⁴ Golomb’s classification distinguishes free polyominoes (identified up to rotation and reflection), one-sided (up to rotation only), and fixed (each orientation distinct); for the four-cell case these yield 5, 7, and 19 distinct shapes, which shows how strongly the choice of symmetry group determines the size of the combinatorial space. The Plans Generator’s problem is structurally analogous but not identical: its tiles are not anonymous cells but labelled functional modules, denoted T1, T2, T3, T4, and the symmetry group therefore acts on labelled configurations rather than on the anonymous shapes of the free-polyomino classification, a distinction between labelled and unlabelled combinatorics that is central to interpreting the cardinality claims that follow.
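The three tetromino counts can be reproduced by brute force, which also shows how the symmetry group enters the count. The sketch below is our own minimal enumeration, not the Plans Generator's code: it grows shapes one edge-adjacent cell at a time and takes the lexicographically smallest normalised coordinate tuple as the canonical form.

```python
def normalise(cells):
    """Translate cells so the minimum x and y are zero; return a sorted tuple."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return tuple(sorted((x - min_x, y - min_y) for x, y in cells))


def rotate(cells):
    """Rotate by 90 degrees about the origin: (x, y) -> (-y, x)."""
    return [(-y, x) for x, y in cells]


def mirror(cells):
    """Reflect in the vertical axis: (x, y) -> (-x, y)."""
    return [(-x, y) for x, y in cells]


def canonical(cells, mirrors=True):
    """Smallest normalised form over four rotations, plus reflections if enabled."""
    forms, current = [], list(cells)
    for _ in range(4):
        forms.append(normalise(current))
        if mirrors:
            forms.append(normalise(mirror(current)))
        current = rotate(current)
    return min(forms)


def fixed_polyominoes(n):
    """All fixed n-cell polyominoes, grown by adding one edge-adjacent cell."""
    shapes = {((0, 0),)}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            occupied = set(shape)
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (x + dx, y + dy)
                    if cell not in occupied:
                        grown.add(normalise(occupied | {cell}))
        shapes = grown
    return shapes


fixed = fixed_polyominoes(4)
one_sided = {canonical(s, mirrors=False) for s in fixed}
free = {canonical(s, mirrors=True) for s in fixed}
print(len(free), len(one_sided), len(fixed))  # 5 7 19
```

Labelling the cells, as the generator's T1 to T4 modules do, would enlarge every one of these counts, because two configurations with the same outline but exchanged labels no longer coincide.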

The arrangement space is finite but super-polynomially large: Klarner’s analysis established that the number of n-cell polyominoes grows exponentially, with a growth constant whose modern numerical estimate is near 4.06, and Redelmeier’s cell-addition algorithm showed that systematic enumeration is feasible only with aggressive pruning and symmetry-based deduplication.⁴⁵ The generator’s two-stage pipeline applies structurally analogous strategies: it constrains the boundary to an 8×12 unit canvas, enforces 4-neighbour adjacency as the feasibility predicate, and applies dihedral-orbit deduplication to identify canonical representatives. The result is a governed, reproducible enumeration whose output is credible evidence about the structural diversity of the arrangement space, even under the cap-bounded scope condition of Section 8.48.

8.44 The feasibility rules and the symmetry group

The Plans-Generator Specification states the feasibility rules in three layers. The domain constraint requires that all configurations be contained within the 8×12 canvas with no two cells sharing a position: the non-overlap and containment predicate screens candidate placements before adjacency evaluation. The adjacency predicate requires that any two modules declared neighbours in the topology share at least one complete unit edge, with attachment constrained to edge-to-edge contact with opposing outward normals, so that modules abut flush rather than overlap or float. The canonicalisation rule determines which configurations are equivalent. The Specification labels this “D4 canonicalisation,” the conventional shorthand for the 8-element dihedral symmetry group of the square (four rotations and four reflections), and the run log records the same group under the alternative label “D8.”
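The first two layers can be sketched as three predicates over cell sets. This is an illustration under stated assumptions rather than the Specification's implementation: the function names are hypothetical, the canvas is taken as 8 cells wide by 12 high (the text does not fix the orientation), and edge contact is tested on unit cells without modelling outward normals explicitly.

```python
CANVAS_W, CANVAS_H = 8, 12  # assumed orientation of the 8x12 canvas

NEIGHBOUR_STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def contained(cells):
    """Domain constraint: every cell lies inside the canvas."""
    return all(0 <= x < CANVAS_W and 0 <= y < CANVAS_H for x, y in cells)


def non_overlapping(modules):
    """Domain constraint: no two modules occupy the same cell."""
    seen = set()
    for cells in modules.values():
        if seen & cells:
            return False
        seen |= cells
    return True


def share_edge(a, b):
    """Adjacency predicate: some cell of a shares a complete unit edge with b."""
    return any((x + dx, y + dy) in b for x, y in a for dx, dy in NEIGHBOUR_STEPS)


def feasible(modules, required_neighbours):
    """Containment and non-overlap first, then every declared adjacency."""
    all_cells = set().union(*modules.values())
    return (
        contained(all_cells)
        and non_overlapping(modules)
        and all(share_edge(modules[a], modules[b]) for a, b in required_neighbours)
    )


# Hypothetical placement of three modules, for illustration only.
placement = {
    "T1": {(0, 0), (1, 0)},
    "T2": {(2, 0), (2, 1)},
    "T3": {(4, 4)},
}
print(feasible(placement, [("T1", "T2")]))  # T1 and T2 abut along one edge
print(feasible(placement, [("T1", "T3")]))  # T3 floats free of T1
```

Diagonal contact does not satisfy `share_edge`, which matches the 4-neighbour rule: two cells that touch only at a corner are not adjacent.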

fig_polyomino_d4_canonicalisation — **The dihedral symmetry group and canonical form** The 8-element dihedral symmetry group of the square applied to a representative polyomino: four rotations (0°, 90°, 180°, 270°) and four reflections yield eight orientations. The canonical form (the lexicographically smallest normalised coordinate tuple under the group action) is highlighted; the seven non-canonical orientations are struck through. The packing generator deduplicates with the full group; the arrangement generator applies its four-rotation subgroup only (mirror transforms disabled in the stored source). Source: `_Specification/Specification.md`.

The two labels denote the same mathematical object, not different ones: the generator’s “D8” is the same 8-element group (four rotations plus four reflections) that standard notation calls D4, and it should not be confused with the 16-element dihedral group of the regular octagon. The distinction matters, because had the 16-element group been applied the deduplication would have been more aggressive and the retained count smaller; the 8-element group is the correct one for a square-grid problem whose boundary introduces a preferred long axis. Applying Burnside’s orbit-counting lemma to this group yields the basis on which the canonical representative of each equivalence class is selected.⁴⁶ One implementation caveat attaches, and the generator-output statistics table records it: the stored arrangement source applies only the rotation subgroup of this group (its mirror transforms are disabled, and its internal flag names that four-rotation subgroup “D4”), so arrangement deduplication is rotation-canonical, one-sided in Golomb’s classification, while the packing generator carries the full eight-element group under its “D8” flag. The caveat in fact extends to both deposited runs: the packing register (packing-permutations-2511041248.json) records mirrors_enabled: false exactly as the arrangement register does, so each run’s deduplication is rotation-canonical as executed, one-sided in Golomb’s classification. The retained sets are nonetheless distinct under the full group in both cases, which is the property the claim needs: no two of the 50 retained arrangements are mirror-equivalent, and no two of the 100 retained packings are either. Rotation-only deduplication can only ever retain more configurations than the full group would, never fewer, so a mirror-free retained set is canonical under the full group whatever flag produced it; both sets are mirror-free on inspection. 
Together the three layers constitute a shape grammar in the sense of Stiny and Gips, deliberately restricted to structural placement rules; the significance of that restriction is taken up in Section 8.48.⁴⁷
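
The canonicalisation described in this section can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the stored generator source: the function names are hypothetical, and the `mirrors_enabled` parameter only echoes the field name recorded in the run registers. The sketch normalises a cell set to the origin, enumerates its orientations under either the four-rotation subgroup or the full eight-element group, and takes the lexicographically smallest coordinate tuple as the canonical form.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]
Shape = Tuple[Cell, ...]


def normalise(cells: Iterable[Cell]) -> Shape:
    """Translate the cells so the minimum x and y are zero, then sort them."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return tuple(sorted((x - min_x, y - min_y) for x, y in cells))


def rotate90(cells: Iterable[Cell]) -> List[Cell]:
    """Rotate every cell a quarter turn about the origin."""
    return [(-y, x) for x, y in cells]


def mirror(cells: Iterable[Cell]) -> List[Cell]:
    """Reflect every cell across the vertical axis."""
    return [(-x, y) for x, y in cells]


def orientations(cells: Iterable[Cell], mirrors_enabled: bool) -> List[Shape]:
    """Four rotations, plus their four reflections when mirrors are enabled."""
    found: List[Shape] = []
    current = list(cells)
    for _ in range(4):
        found.append(normalise(current))
        if mirrors_enabled:
            found.append(normalise(mirror(current)))
        current = rotate90(current)
    return found


def canonical_form(cells: Iterable[Cell], mirrors_enabled: bool = True) -> Shape:
    """Lexicographically smallest normalised orientation."""
    return min(orientations(list(cells), mirrors_enabled))


# Illustrative chiral pair (an assumption for the example, not thesis data).
L_TETROMINO = [(0, 0), (0, 1), (0, 2), (1, 2)]
J_TETROMINO = mirror(L_TETROMINO)

print(canonical_form(L_TETROMINO, True) == canonical_form(J_TETROMINO, True))
print(canonical_form(L_TETROMINO, False) == canonical_form(J_TETROMINO, False))
```

With mirrors enabled the two shapes collapse to one canonical form; with rotations only they remain distinct, which is the one-sided behaviour the caveat above describes.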

8.45 Arrangements, packings, and the deduplication result

The arrangement generator traverses placement combinations for the four module types over the canvas under the four-rotation deduplication, stopping when the count of unique canonical arrangements reaches a cap of 50; the run reached that cap, producing 50 unique arrangements as a cap-bounded count rather than a complete enumeration of the feasible space (the cap is recorded as a permanent scope-limit at Section 8.63). These are labelled arrangements: T1, T2, T3, and T4 carry distinct functional identities, so an arrangement with T1 north and T2 south is combinatorially distinct from its exchange, even where the assembled outline is identical. This is Golomb’s fixed-versus-free distinction applied to a labelled tile set, and it is why the 50-arrangement count is structurally larger than the free-tetromino count of 5: the same cells, differently labelled, constitute a different configuration.

polyomino_arrangement_examples — **Representative arrangement permutations from the generator** A twelve-panel (4×3) sample drawn from the 50 rotation-canonical unique arrangements produced by the arrangement generator (v1; Minimal Input configuration, four-atomic-item input set). Each panel shows one canonical arrangement of the polyomino modules; rotation-canonical deduplication under the four-rotation subgroup (mirror transforms disabled in the stored source) ensures all 50 outputs are distinct up to rotation. The full set of 50 arrangements is available in the appendix data bundle (`appendix-data/ch8-configurational-bundle/`). Source images: `251026_Arrangement_Permutation_Generator.py` rendered JPG outputs.

The packing generator applies deduplication across the full 8-element group; it produced 100 unique packings from a feasibility filter that explored 132 placement combinations. The filter works in two passes. First, the adjacency and non-overlap predicates retain 129 of the 132 combinations (a 97.73 per cent survival rate, the three rejections being geometrically infeasible placements). Second, the dihedral-orbit deduplication retains 100 of the 129 as canonical representatives (a 22.48 per cent reduction). The reduction rate is consistent with the module set’s structure: T1 and T4 share a shape class, as do T2 and T3, and these pairings produce systematic cross-permutation symmetries that the orbit identifies and collapses. That consistency is empirical confirmation, through Burnside’s lemma, that the deduplication functions as specified, neither over-collapsing (which would signal a canonicalisation error) nor under-collapsing (which would signal an orbit-computation fault).
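
The two rates quoted for the filter follow directly from the three counts, and the arithmetic can be checked with a short sketch. The function names are illustrative only; the counts are those reported above.

```python
def survival_rate(explored: int, feasible: int) -> float:
    """Share of explored combinations that pass the feasibility predicates."""
    return feasible / explored


def reduction_rate(feasible: int, canonical: int) -> float:
    """Share of feasible combinations removed by orbit deduplication."""
    return (feasible - canonical) / feasible


# Counts as reported for the packing run.
EXPLORED, FEASIBLE, CANONICAL = 132, 129, 100

print(f"feasibility survival: {survival_rate(EXPLORED, FEASIBLE):.2%}")
print(f"orbit reduction:      {reduction_rate(FEASIBLE, CANONICAL):.2%}")
```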

polyomino_packing_examples — **Representative packing permutations showing port and attachment semantics** A nine-panel (3×3) sample drawn from the 100 canonical unique packings produced by the packing generator (v2; 8×12 boundary, four-atomic-item input set), illustrating port and attachment semantics. Each panel shows one unique canonical packing under deduplication by the eight-element dihedral group (four rotations and four reflections), annotated with its Corner, Enclosure, and Bounding Efficiency Index values (CEI, EEI, BEI) from the canonical run statistics (CEI mean = 0.9353, SD = 0.011; EEI mean = 0.5648, SD = 0.049; BEI mean = 0.6078, SD = 0.058). The full output of 100 packings is available in the appendix data bundle (`appendix-data/ch8-configurational-bundle/`). Source images: `251111_Packing_Permutation_Generator.py` rendered JPG outputs.

8.46 The three geometric indices

The packing generator computes three structural geometric indices per packing, defined in the generator source. The Corner Efficiency Index (CEI) measures a packing’s count of corner vertices relative to its bounds, so that a higher value indicates a smoother silhouette; the Enclosure Efficiency Index (EEI) is a normalised isoperimetric quotient, 16·area/perimeter², so that a higher value indicates a more compact form; and the Bounding Efficiency Index (BEI) is the ratio of filled area to bounding-box area, so that a higher value indicates less bounding-box waste. The summary across the 100 packings is reported below.

Index	n	Min	Max	Mean	SD
CEI (Corner)	100	0.9206	0.9706	0.9353	0.011
EEI (Enclosure)	100	0.4672	0.6900	0.5648	0.049
BEI (Bounding)	100	0.4679	0.7292	0.6078	0.058

The CEI distribution is near-ceiling with very low variance, which is not a metric failure but a structurally predictable consequence of canonical deduplication: by selecting the lexicographically minimal representative of each orbit, the procedure systematically favours compact configurations, which have fewer equivalent representations and so are more likely to be retained. The EEI and BEI distributions span roughly a fifth to a quarter of their possible ranges (0.22 and 0.26 respectively, against 0.05 for CEI) and so provide the design-selection discrimination that CEI does not. The three indices co-vary moderately: the corner-and-enclosure pair (CEI-EEI, r = 0.578) and the enclosure-and-bounding pair (EEI-BEI, r = 0.597) move together, while corner efficiency is only weakly tied to bounding-box tightness (CEI-BEI, r = 0.330). These coefficients are reported descriptively across the 100-packing set; we attach no interval estimate or significance test, because the packings are the deterministic output of an exact generator (a confirmed lower bound over the explored ordering subset, per the scope-limit at Section 8.63), not a probability sample drawn through a random process, so a resampling distribution or confidence interval computed on them would have no referent. The weak third pairing is precisely the evidence that BEI and CEI together carry genuinely complementary information: a configuration may fill its bounding box efficiently while presenting an irregular perimeter, or the reverse. That partial decoupling justifies retaining all three indices rather than reducing them to a single composite.
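
The EEI and BEI definitions given in this section can be illustrated on a unit-cell polyomino. This is a minimal sketch, assuming area is counted in unit cells and perimeter in exposed unit edges; the stored generator may measure either differently, and the function names are hypothetical. CEI is left out because its formula is not stated in the text.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def area(cells: Iterable[Cell]) -> int:
    """Filled area in unit cells."""
    return len(set(cells))


def perimeter(cells: Iterable[Cell]) -> int:
    """Count unit edges that border an empty neighbour (4-neighbour rule)."""
    filled: Set[Cell] = set(cells)
    exposed = 0
    for x, y in filled:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) not in filled:
                exposed += 1
    return exposed


def eei(cells: Iterable[Cell]) -> float:
    """Normalised isoperimetric quotient, 16 * area / perimeter ** 2."""
    cells = list(cells)
    return 16 * area(cells) / perimeter(cells) ** 2


def bei(cells: Iterable[Cell]) -> float:
    """Filled area divided by bounding-box area."""
    cells = list(cells)
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    box = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return area(cells) / box


# Illustrative shapes (assumptions for the example, not generator output).
SQUARE = [(0, 0), (1, 0), (0, 1), (1, 1)]
L_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 2)]

for name, shape in (("square", SQUARE), ("L", L_SHAPE)):
    print(name, round(eei(shape), 4), round(bei(shape), 4))
```

The normalising constant 16 makes a square score exactly 1 on EEI, so less compact outlines score lower on both indices.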

8.47 The Partial Grammar

8.48 The partial grammar and the superset-language framing (CL-8-06)

The v2 Plans Generator applies a proper subset of the full RecPol shape grammar. The grammar audit of 2026-04-24 confirmed the implemented rules (symmetry-orbit canonicalisation, with the full dihedral orbit in the packing generator and the rotation orbit in the stored arrangement source; 4-neighbour adjacency evaluation; edge-run attachment with opposing outward normals; and the containment and non-overlap predicates) and the rules not implemented: composite-join semantics, port-attachment expressions, and reference-modifier rules. In Stiny’s shape-grammar formalism the missing rules are primarily restrictive: they would eliminate configurations the current grammar accepts rather than generate new ones.⁴⁸ Because they are restrictive rather than generative, the v2 grammar is underconstrained relative to the full Specification; it generates a superset of the configurations the fully constrained grammar would produce. This is the correct direction of constraint error for the present claim, and it rests on a standard result in formal-language theory: a grammar that omits restrictive rules produces a language that is a superset of the target language.⁴⁹
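
The superset relation can be seen in a toy generate-and-filter sketch. Everything in it is an assumption made for illustration: the candidate space (two unit modules placed on a four-cell strip) and the `port_order` rule, which merely stands in for an unimplemented restrictive rule, are not taken from the Specification. The point is only that dropping a restrictive predicate can never shrink the accepted set.

```python
from itertools import product
from typing import Callable, Iterable, List, Set, Tuple

Placement = Tuple[int, int]
Rule = Callable[[Placement], bool]


def language(candidates: Iterable[Placement], rules: List[Rule]) -> Set[Placement]:
    """Keep the candidates that satisfy every rule."""
    return {c for c in candidates if all(rule(c) for rule in rules)}


def no_overlap(p: Placement) -> bool:
    """The two modules may not occupy the same cell."""
    return p[0] != p[1]


def adjacent(p: Placement) -> bool:
    """The two modules must touch."""
    return abs(p[0] - p[1]) == 1


def port_order(p: Placement) -> bool:
    """Hypothetical restrictive rule standing in for an unimplemented one."""
    return p[0] < p[1]


CANDIDATES = list(product(range(4), repeat=2))
PARTIAL = language(CANDIDATES, [no_overlap, adjacent])
FULL = language(CANDIDATES, [no_overlap, adjacent, port_order])

print(len(PARTIAL), len(FULL), FULL <= PARTIAL)
```

In the toy case the partial grammar accepts six placements and the fuller one three, each of which also appears in the partial set.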

The consequence for the cardinality claim is constructive rather than damaging. Because the v2 grammar accepts a superset of what the fully constrained grammar accepts, every explored packing that would survive the full grammar also passes the v2 filters, so the 100 canonical packings are a confirmed lower bound on the number of configurations the implemented grammar admits: at minimum, 100 packings valid under the implemented rules exist. The limitation, declared as CL-8-06 (declared-limited), is that the count may overstate the configurations that survive full-grammar evaluation, since some of the 100 may be eliminated by the missing restrictive rules. This is acknowledged and recorded, but it does not undermine the substrate’s empirical value: the research question requires “at least N” confirmable configurations, not an exhaustive cardinality count, and the superset framing supplies exactly that assurance. The cap-bounded nature of the exploration supports rather than contradicts this reading. The arrangement run reached its cap of 50 and the packing run its cap of 100 before exhausting the permutation space (the packing run covered only two of twenty-four orderings), so the appropriate interpretive claim is that the generator confirms at least 100 canonical packings within the boundary under the implemented grammar. That is a scope statement about the extent of enumeration, not a weakness in the evidence: Klarner’s growth bound implies the full space is vastly larger than 100, and the cap was chosen to provide a tractable, sufficient lower-bound confirmation rather than to characterise the whole space. The unexplored remainder is declared as scope limit SL-03 in Section 8.63.

8.49 Handoff to integration (HC-8D)

The configurational evidence is the generator-interface that the Chapter 9 pipeline consumes, packaged as the handoff contract HC-8D in three components: the 100 canonical configurations (the combinatorial substrate over which the generator operates), the edge-run contact predicates derived from the 4-neighbour and outward-normal rules (the attachment grammar that lets the generator assemble configurations incrementally rather than evaluating all placements at once), and the three-index evaluation layer (the ranking protocol by which it orders candidate configurations).

substrate_to_generator_handoff — **The four handoff contracts from the empirical substrate to the generator** The four handoff contracts from the empirical substrate (Chapter 8) to the generator (procedural generation, Chapter 9). HC-8A: dimensional evidence → a 50 to 100 mm base grid (50 mm dominant, 100 mm next, 25 mm a secondary sub-module), recovered over m ∈ [2, 300] mm with no normative floor and consistent with ISO 2848 among other referents. HC-8B (closed): occurrence evidence → required/optional frequency thresholds (hardcoded; SL-04). HC-8C: topological evidence → the census required-adjacency structure of 49 hard pairs (required adjacencies) and 50 soft pairs (preference weights), with legacy avoidance signals carried as soft dispreferences (SL-06 boundary margin). HC-8D: configurational evidence → the canonical polyomino space under 4-neighbour adjacency and edge-run attachment, partially realised in the v2 generator (composite joins, port-attachment, and reference modifiers specified but not yet parsed; 100-packing cap, SL-03). Source: `ch8-handoff-contracts-v0.md`.

Integrating these three components with the topological evidence of Section 8.33 establishes the empirical substrate of Chapter 8. Section 8.50 draws the full substrate together, characterises its logical structure, confirms its internal consistency, and states the interface conditions under which Chapter 9 may draw on it. The residual cap-bounded exploration (SL-03) is a scope condition Chapter 9 inherits without prejudice: the generator operates on the 100 confirmed packings as its starting population and may extend that population incrementally if evaluation reveals the lower bound to be insufficient for the generation objectives that chapter specifies.

8.50 Integration

8.51 The integration stance: triangulation, not redundancy

The four pipelines of Sections 8.19 to 8.42 were designed, as Section 8.5 set out, as four independent triangulations onto one substrate-level claim: that the Australian residential dwelling, treated as a representational object, exposes a regular and stable interface against which a downstream generator can be built without re-discovering the regularities for itself. Each pipeline addresses a different face of that claim, using a different corpus, method, and instrument, and the joint commitment of all four (that they produce mutually coherent evidence when audited against one another) is methodological strength rather than redundancy. A single-pipeline argument would stake the substrate-truth claim on whatever instrument that one pipeline uses; a four-pipeline argument that reports its cross-pipeline coherence verdicts, and its scope-limit declarations where coherence is untestable, stakes the claim against four instruments at once and reports honestly where the audit is complete and where it is not. This section synthesises the verdict-loop record, the lineage dossier, the claim bank, the handoff-contracts ledger, and the scope-limits register into a single integration statement and forwards them to Chapter 9.

8.52 Cross-pipeline coherence: the six probes

Six pairwise probes audit the pipelines along the natural cross-pipeline axes, and their verdicts are recorded in the claim bank under CL-8-08.⁵⁰ Two return COHERENT. P-DT-1 tested whether the product market’s 50 mm base grain is corroborated by floor-plan room dimensions: about 89 per cent of printed room dimensions in the topological corpus sit on a 50 mm grid (the same share as on a 25 mm grid, since only four printed values are multiples of 25 but not 50), which is the substrate confirmation that licenses HC-8A. P-DO-1 tested whether the top dimension families (600, 900, 1,200, 2,400 mm) align with the ISO 2848 preferred series: they do exactly, at 6M, 9M, 12M, and 24M, though, being equally AS 1684 and metricated-imperial values, the alignment is read as consistency rather than as evidence of ISO in particular. Two return PARTIALLY COHERENT, with the partiality declared rather than absorbed. P-DT-2 found that the topological corpus’s room dimensions align with the ISO multimodules at only 24.8 per cent density, well below the dimensional corpus’s near-uniform discipline; an honest reading is that retail product geometries are governed by manufacturer-side modular discipline more strictly than as-built room geometries, which carry cumulative tolerance and post-construction modification. P-DC-1 found that the configurational unit cell does not map cleanly onto the millimetre-scale module, because the v2 generator’s unit cell is dimensionless and the mapping is verbal rather than parsed by the generator code.

Two probes return UNTESTABLE, and this is where the integration argument is most honest. P-TC-1 (whether the corpus’s per-room edge-run length data corroborate the configurational edge-run attachment grammar) cannot be tested, because room-edge-length data is absent from the normalised corpus, which records categorical adjacencies rather than metric edge runs. P-TC-2 (whether the port-attachment grammar produces the same forbidden-pair set as the topological pipeline) cannot be tested, because port-attachment is specified but unimplemented in the v2 generator, so no port-driven enumeration exists to compare. Both are declared as scope limit SL-05, and the topological-by-configurational coherence dimension is therefore acknowledged as unverified; Chapter 9 inherits this gap as an open empirical question, not a silent omission. The aggregate finding is that no probe returns INCOHERENT: the four testable probes return COHERENT or PARTIALLY COHERENT, the partial verdicts mapping to specific named substrate facts, and the two untestable probes are declared, scoped, and referred forward. CL-8-08 is therefore declared-limited at medium confidence rather than supported at high.

The chapter’s substantive contribution is one architectonic claim, instantiated in ten numbered claims that map to their pipeline origins rather than standing as ten independent contributions. Three belong to the dimensional pipeline (CL-8-01, CL-8-02, CL-8-03: the dominant 50 mm base grain with 25 mm a secondary sub-module, the multimodule families, and the grain’s robustness across cohort cuts and matching tolerances over the unconstrained search range), two to the topological (CL-8-04, CL-8-05: the census required-adjacency structure and the descriptive observation that the pilot-stage avoidance signal is largely uncodified in the NCC), two to the configurational (CL-8-06, CL-8-07: the partial grammar and the cap-bounded enumeration), and three are cross-pipeline integration claims no single pipeline could establish (CL-8-08 coherence, CL-8-09 reproducibility, CL-8-10 bounded coverage).⁵¹ Of the ten, six register as supported at high confidence and four as declared-limited at medium-to-high with the specific bound named; none is unsupported. The declared-limited status is the chapter’s principal honesty mechanism: CL-8-04 names the weight-100 boundary-margin sensitivity of the coupling classification (SL-06), CL-8-06 the partial grammar realisation, CL-8-08 the hardcoded thresholds and untestable probes (SL-04, SL-05), and CL-8-09 the two pipelines not yet formally replayed, so that Chapter 9’s generator and the examiner alike can see exactly what the substrate supplies and what it does not.

8.54 The four handoff contracts

The four handoff contracts are the binding interface Chapter 9 consumes. HC-8A (dimensional base grid) is stable at v1: it commits the generator to a 50 mm base grain with a 25 mm sub-module, recovered over the unconstrained range m ∈ [2, 300] by lift over a reporting-convention null, consistent with ISO 2848 among other mutually-aligned referents, and with the P-DT-1 floor-plan corroboration (about 89 per cent of room dimensions on a 50 mm grid) in the evidence basis. HC-8B (frequency thresholds) is closed at v1 under the hardcoded contingency: the ≥ 90 per cent required and 20 to 89 per cent optional thresholds are methodologically chosen and validated against the dimensional corpus’s three-component structure, but hardcoded rather than derived, with SL-04 as the corresponding scope limit. HC-8C (topological interaction rules) is stable at v1 with two consecutive upholding verdicts across the corpus-expansion and schema-drift audits of 2026-05-02; it commits the generator’s interaction rules to the census required-adjacency structure of Section 8.40 (49 hard pairs as required adjacencies and 50 soft pairs as preference weights), with the pilot-stage avoidance observation carried as a soft generator dispreference, and the weight-100 boundary pairs flagged threshold-sensitive under SL-06. HC-8D (configurational search space) is stable at v1 and partially realised: symmetry-orbit canonicalisation (the rotation orbit for arrangements, the full dihedral orbit for packings), 4-neighbour adjacency, and edge-run attachment are operational, while composite-join semantics, port-attachment expressions, reference modifiers, and instance-block notation are specified but not yet parsed, and the 100-packing cap (SL-03) is permanent.

fig_generator_search_space — **Generator constraint funnel: from all polyominoes to feasible packings** Flowchart of the successive constraint filters applied in the packing generator, with cardinality annotations at each stage illustrating the feasibility-reduction ratio. Stage 1: all four-cell polyominoes (unbounded cardinality). Stage 2: rotation-canonical arrangement deduplication (four rotations; mirror transforms disabled in the stored source) yields the 50 unique arrangements (50-arrangement cap declared as SL-03). Stage 3: edge-run contact feasibility under 4-neighbour adjacency (contact constraint). Stage 4: port-attachment feasibility (composite-join semantics, reference modifiers, and instance-block notation are defined in the Specification but not yet parsed by the v2 generator; CL-8-06 declared-limited). Stage 5: dihedral-canonical deduplication under the eight-element group (four rotations and four reflections) yields the 100 unique packings. Stage 6: CEI/EEI/BEI evaluation (the optional Chapter 9 ranking layer; mean CEI = 0.9353, EEI = 0.5648, BEI = 0.6078). Final output: 50 arrangements and 100 packings. Source: `Plans Generator/_Specification/Specification.md`.

Read as one object, the funnel states the chapter’s design logic in a third register. Chapter 2 developed bounded optionality as a socio-technical target (Sections 2.6 to 2.7), and Chapter 3’s interface contracts declare degrees of freedom in the governance sense; the funnel is the same discipline in generative form. Each stage removes degrees of freedom that the substrate’s evidence forbids, and the count of surviving configurations at each stage measures how completely that evidence has been specified into the generator: enumeration proposes, constraints dispose, and what remains is the bounded option space Chapter 9 consumes. The three senses are distinct but compatible, and the generative sense is the one the cardinality annotations in the figure quantify.

8.54a The three-tier modular grid: micro, meso, macro

HC-8A fixes the base grid, but the dwelling does not compose at the base grain, and the integration argument needs the distinction the chapter has so far left implicit. Two things the word “module” runs together must be separated: the grid module (the dimensional step on which sizes are quantised) and the modules proper (the rooms and clusters of rooms that are assembled into a dwelling). The micro grain at which components and fittings are dimensioned is 50 mm, with 25 mm as the finest sub-module beneath it, established on the product corpus in Section 8.21. It is the wrong grain at which to dimension rooms: a system that lets every room take any multiple of 50 mm generates an unbounded vocabulary of distinct room sizes, and two rooms drawn independently from that vocabulary rarely share an edge, so they do not compose. The grain at which the modules themselves are dimensioned is a separate and coarser choice (the meso grid), and a third, coarser again, governs the planning of the envelope (the macro multimodule). The substrate fixes all three: 50 mm micro, 150 mm meso, 300 mm macro, with 600 mm and the larger ISO multimodules as macro multiples.

The meso grid is not selected by fit, which Section 8.21 established is a degenerate criterion that always rewards the finer grain. It is selected by inter-module composability under bounded optionality, the design logic of Sections 2.6 and 2.7: more distinct module sizes do not enlarge the space of usable dwellings; they enlarge the combinatorial space of interfaces that must be reconciled, and beyond a point that growth converts more options into more misfit. Degrees of freedom in the grain are not optionality in the dwelling. The criterion is therefore the coarsest grid that collapses the module vocabulary enough for modules to compose, without coarsening so far that it forecloses the substantive room variation the stock genuinely uses.

The floor-plan geometry census quantifies the trade-off directly.⁵² Snapping the corpus’s room dimensions to a candidate grid and counting the distinct module rectangles it produces gives the size of the module catalogue (the combinatorial vocabulary a builder, a generator, or an occupant must reason across) alongside a direct test of whether abutting rooms share a commensurate edge:

grid module	distinct room-module rectangles	effective module vocabulary	abutting rooms sharing a commensurate edge	room moved, 95th percentile
100 mm (current room grain)	1,770	349	20.2 %	30 mm
150 mm (meso module grid)	1,117	203	23.5 %	50 mm
300 mm (macro multimodule)	502	66	31.9 %	100 mm

Moving from the 100 mm room grain to the 150 mm meso grid cuts the distinct module catalogue by 37 per cent (1,770 to 1,117) and the effective module vocabulary by 42 per cent (349 to 203); because the space of possible module-to-module pairings scales as the square of the catalogue, the pairing space contracts by roughly 60 per cent. The direct adjacency test moves the same way: among the 3,655 interior adjacencies whose two rooms are both dimensioned, the proportion sharing a commensurate edge rises from 20.2 per cent at 100 mm to 23.5 per cent at 150 mm. The stock already behaves as a kit of parts at this grain (a median plan draws its rooms from about seven distinct geometric primitives at a reuse ratio of 2.4, and ten primitives account for 91.6 per cent of all enumerated spaces), which is the regularity the meso grid governs.⁵³
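
The three contraction figures quoted above follow from the table counts, and the arithmetic can be checked with a short sketch. The function names are illustrative; the square-law for the pairing space is the assumption stated in the text.

```python
def fractional_cut(before: int, after: int) -> float:
    """Proportional reduction from one count to another."""
    return 1 - after / before


def pairing_space_cut(before: int, after: int) -> float:
    """Reduction in pairings when pairings scale as the square of the catalogue."""
    return 1 - (after / before) ** 2


# Counts from the grid-module table: 100 mm grain against 150 mm meso grid.
print(f"catalogue:     {fractional_cut(1770, 1117):.1%}")
print(f"vocabulary:    {fractional_cut(349, 203):.1%}")
print(f"pairing space: {pairing_space_cut(1770, 1117):.1%}")
```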

The selection of 150 mm rather than a finer or coarser value is not a free judgement but the answer to a constrained optimisation, and stating it that way is what makes it defensible. Coarsening the grid always improves composability (there is no interior optimum on that axis alone, since the catalogue and the pairing space contract monotonically all the way to 600 mm), so the operative question is how far the grid may coarsen before it forecloses the substantive room variation it exists to preserve. The natural bound is a distortion tolerance: the largest displacement a room dimension may suffer, when snapped to the grid, while still reading as the same room. Read as a lossy quantiser, the grid admits a unique answer at each tolerance: the coarsest grid whose 95th-percentile room stays within it. Taken off the census, that grid is a 75 mm grain within a 25 mm tolerance, 150 mm within a 50 mm tolerance, 175 mm within 75 mm, and only within a 100 mm tolerance (a perceptible ten-centimetre shift) does the 300 mm grid become admissible. The 150 mm meso grid is therefore exactly the choice of a 50 mm distortion tolerance, and 50 mm is the principled value: it is the ISO half-module M/2, the coarsest sub-module the standard itself recognises, and it sits below the threshold at which a room reads to its occupant as a different size. Under that tolerance the optimisation is closed: 150 mm is the coarsest admissible grid, and it captures the composability gain in full. The effective module vocabulary falls to 203 from the 100 mm grain’s 349, a rate of 7.7 against 8.5 bits to name a module, for a 95th-percentile displacement of 50 mm. 
A 300 mm grid would name a module in 6.0 bits, but at a 100 mm displacement that discards the common room sizes the stock uses (2,900 and 3,100 mm onto 3,000 mm, 3,200 mm onto 3,300 mm, 3,500 mm onto 3,600 mm) and the 450 and 750 mm half-step family, which is the over-pruning Section 2.7 warns of: a constraint “set too coarse” contracting the option space in welfare-reducing directions.
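
The quantiser reading can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the census calibration: the function names are hypothetical, half-steps are rounded upward, the percentile uses the nearest-rank rule, and the four sample dimensions are simply the room sizes named above rather than census data.

```python
import math
from typing import Optional, Sequence


def snap(dimension_mm: int, grid_mm: int) -> int:
    """Snap a dimension to the nearest grid multiple (half-steps round up)."""
    return ((dimension_mm + grid_mm // 2) // grid_mm) * grid_mm


def displacement(dimension_mm: int, grid_mm: int) -> int:
    """Distance a dimension moves when it is snapped to the grid."""
    return abs(snap(dimension_mm, grid_mm) - dimension_mm)


def percentile_95(values: Sequence[int]) -> int:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def coarsest_admissible(
    dimensions: Sequence[int], grids: Sequence[int], tolerance_mm: int
) -> Optional[int]:
    """Coarsest grid whose 95th-percentile displacement stays within tolerance."""
    admissible = [
        g for g in grids
        if percentile_95([displacement(d, g) for d in dimensions]) <= tolerance_mm
    ]
    return max(admissible) if admissible else None


# Illustrative sample only: the four room sizes named in the text.
SAMPLE_MM = [2900, 3100, 3200, 3500]

for size in SAMPLE_MM:
    print(size, snap(size, 150), snap(size, 300))
print(coarsest_admissible(SAMPLE_MM, [100, 150, 300], tolerance_mm=50))
```

On this small sample each size moves 50 mm on the 150 mm grid and 100 mm on the 300 mm grid, so a 50 mm tolerance admits the former and excludes the latter.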

The result is stable across the census, and its one exception is informative rather than damaging. Both strata return the same 50 mm 95th-percentile displacement at 150 mm, and the grid holds within that tolerance for every habitable module type (bedroom, living, kitchen, dwelling-other, and outdoor each at a 50 mm 95th percentile). Only the small service modules (sanitary, circulation, and entry) reach 60 mm, which is the expected signature of the tightly packed wet and circulation spaces whose internal components are dimensioned at the finer 50 mm micro grain rather than at the room meso grid. The meso grid governs the habitable room modules that make up the bulk of the dwelling; the service modules sit a tier down, exactly as the micro-meso-macro separation predicts. The full calibration battery across all candidate grids and every module type is reported in Appendix H. The 150 mm meso grid is in this sense a design prescription (the grain at which the generator dimensions its room modules), not a description of current practice, which is drawn on the finer 100 mm grain; the prescription trades a within-tolerance fidelity cost for the composability the stock’s own kit-of-parts behaviour is already reaching toward.

The meso grid carries a second payoff the micro and macro grains do not, and it can be stated as a standardisation-coverage figure. At the 150 mm meso grid a range of just 318 distinct room modules covers 80 per cent of all rooms in the national stock, and 525 cover 90 per cent, against 641 and 1,042 at the 100 mm grain, roughly half the catalogue for the same reach. A stocked, prefabricated, or governed module range of that size is the difference between a standardisable kit and a bespoke one, and the network-effect logic by which a shared dimensional grammar becomes cheaper to adopt than to deviate from (set out for products in the appendix grounding of Section 8.21) applies to room modules exactly as it applies to fittings. The meso grid is on this reading at once the scaffold that minimises combinatorial misfit and the standard that makes the modules buildable at scale, and it is 150 mm (equal to 1.5M and to three times the 50 mm micro grain) that does both, seated cleanly on the micro grain below it and dividing the macro multimodules above it. Chapter 9’s generator inherits the meso grid as the grain at which it dimensions and composes its room modules.
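
A coverage figure of this kind can be computed by snapping each room to the grid, ranking the resulting module rectangles by frequency, and counting how many are needed to reach the target share. The sketch below is illustrative only: the function names are hypothetical, the room list is invented for the example, and whether the census treats a rectangle and its transpose as one module is not assumed either way.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Room = Tuple[int, int]  # (width_mm, depth_mm)


def snap(dimension_mm: int, grid_mm: int) -> int:
    """Snap a dimension to the nearest grid multiple (half-steps round up)."""
    return ((dimension_mm + grid_mm // 2) // grid_mm) * grid_mm


def modules_needed(rooms: Sequence[Room], grid_mm: int, coverage_pct: int) -> int:
    """Number of most-frequent snapped modules needed to cover the target share."""
    counts = Counter((snap(w, grid_mm), snap(d, grid_mm)) for w, d in rooms)
    total = sum(counts.values())
    covered = 0
    for needed, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered * 100 >= coverage_pct * total:
            return needed
    return len(counts)


# Invented sample of ten rooms, for illustration only.
ROOMS: List[Room] = (
    [(3000, 3000)] * 5 + [(3000, 3600)] * 3 + [(2900, 3100)] + [(4200, 3600)]
)

print(modules_needed(ROOMS, grid_mm=100, coverage_pct=90))
print(modules_needed(ROOMS, grid_mm=300, coverage_pct=90))
```

In the invented sample the coarser grid merges the 2,900 by 3,100 mm room into the dominant module, so fewer modules reach the same coverage.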

8.55 The merged-corpus coupling matrix

The integration argument is anchored at the substrate level by the cross-pipeline product no single pipeline produces in isolation: the 745-plan merged-corpus coupling matrix sealed under the v5.0 dossier. The matrix records 680 realised category-pair adjacencies across 72 categories and 32,594 directed edges; under the classification rule the realised pairs partition into 49 hard (7.2 per cent), 50 soft (7.4 per cent), and 581 insufficient (85.4 per cent), with the absent and sparse classes collapsing to zero under the expansion, corresponding to sensitivity finding S5-3 of Section 8.40.

corpus_coverage_map — **Geographic and market coverage of the merged corpus** Coverage of the merged 745-plan floor-plan corpus across Australian states and stratum types, alongside the Bunnings product corpus (40,342 dimension values across 23,048 products; national retail, one retailer, 2025 snapshot). The floor-plan corpus is Australian residential, drawn from public property listings (SL-02); the 72-category taxonomy spans the standard Australian residential space types. Both corpora are scope-limited to the Australian residential context (Section 8.63); neither is claimed to represent non-retail markets, non-residential typologies, or non-Australian jurisdictions. Source: lineage dossier v5.0.

The substrate-level reading follows the design-theoretic one of Section 8.33: hard pairs encode the required adjacencies the generator’s interaction rules must respect (the entry-foyer hub structure, comprising foyer-porch, foyer-hallway, and foyer-living, is directly visible in the merged corpus), soft pairs the optional adjacencies the generator may invoke, and the insufficient class the long tail of pairings too rare to support either inclusion or exclusion without further evidence. The threshold-boundary sensitivity that SL-06 captures is precisely the warning that the hard/soft cut and the realised/insufficient cut produce a secure interior with a more volatile margin: under HC-8C the 49 hard pairs are the reliable required-adjacency constraints and the 50 soft pairs the preferences, while the pairs sitting within the weight-100 margin (SL-06) are the declared sensitivity carried at lower confidence.

8.56 Scope limits and the forward to Chapter 9

Six scope limits bind the integration claim, each scoping a specific contract: SL-01 confines the dimensional substrate to a single Australian retail corpus; SL-02 confines the topological substrate to Australian listing-site plans; SL-03 declares the configurational cap permanent; SL-04 records the hardcoded thresholds; SL-05 declares the two untestable probes; and SL-06 declares the coupling-matrix sensitivity. The full declarations and their remediation paths are presented in Section 8.63. The writing-plan handoff gate of 2026-05-02 confirmed all four contracts stable across the last two verdict-loop exits, passing ten of ten criteria, so the chapter forwards to Chapter 9 a stable, versioned, and adversarially audited contract surface: a binding empirical interface with a formal evidence trail, rather than a provisional sketch. Chapter 9 inherits the four contracts in their final v1 state, the v5.1-sealed lineage dossier as the binding evidence record, the ten claim-bank entries with their declared statuses, and the appendix bundles that publish the underlying numbers.

8.57 Validation

8.58 Why three validation dimensions

In a design science research thesis that produces an empirical substrate rather than a runnable system, validation is the act of stating what the substrate licenses Chapter 9 to assume and what it does not. Three dimensions frame the declaration, following Wieringa’s distinction between technical, empirical, and scope validation: reproducibility asks whether the canonical scripts, re-run, return the same numbers; stability asks whether perturbing the corpus shifts the headline findings; and coverage asks whether the corpora span the population the claims ground.⁵⁴ The three rows of the scorecard below return a partial pass, a mixed verdict, and a partial pass respectively; each is honest rather than rhetorical.

Criterion	What was tested	Result	Verdict
Reproducibility	Re-run of the canonical scripts on the canonical inputs	Dimensional and Topological formally replayed to parity; Occurrence and Configurational are deterministic by construction, pending formal replay	Partial pass (2 of 4)
Stability	Re-derivation of the headline findings under cohort variation, a matching-tolerance sweep, and direct census read	Dimensional: 50 mm carries the highest lift of any standard grain in every cohort, 100 mm next, 25 mm a secondary sub-module, robust across three cohort cuts and a tolerance sweep (the fine divisor m = 4 mm out-lifts it by 0.002 in the building cohort at exact match alone, and at no non-zero tolerance); Topological: the census coupling structure resolves to 49 hard and 50 soft pairs, secure on the interior with the threshold-band pairs the only movable assignment	Stable; SL-06 margin
Coverage	Adequacy of the three corpora for the module-library and adjacency outputs	Dimensional, Topological, and Configurational each adequate within SL-01 to SL-03	Partial pass

8.59 Reproducibility: two of four pipelines verified

The reproducibility verdict is a partial pass: two of the four pipelines have been formally re-run on the canonical corpus and confirmed to parity. The Dimensional pipeline replays the lift ranking (50 mm at 0.339, 100 mm at 0.305, and 25 mm at 0.283 in the building cohort), confirming 50 mm as the dominant base module with 25 mm a secondary sub-module; the Topological pipeline replays the 745-plan census topology (the 72 space categories, the 32,594 directed adjacency edges, and the 49-hard / 50-soft coupling partition) deterministically, an exact match. Both replays are recorded in the dossier provenance block. (The dossier also retains, for back-of-house auditability, a parity replay of the superseded API-methodology pilot, the abandoned extraction iteration; that record is an audit-trail artefact of how the codification was worked out, not a reader-facing corpus, and it is not the topology the chapter carries.) The Occurrence and Configurational pipelines are strictly deterministic by construction (seeded initialisation and hardcoded intervals for the one, a fixed enumeration for the other) and their canonical outputs are audit-trailed, so a formal replay of each is administrative rather than substantive and is declared as scope-limited future work; the small fourth-decimal differences a replay would show on the dimensional figures are floating-point round-off rather than sampling variation, since the pipelines are deterministic. The corresponding claim, CL-8-09, is therefore stated at medium confidence and declared-limited: the epistemic warrant for the two pending pipelines is the determinism of their scripts plus the audit trail confirming their outputs; the formal warrant awaits replay during the Chapter 9 build phase, at which point the two-of-four count becomes four-of-four.
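The parity criterion behind the two completed replays can be made concrete. What follows is a minimal sketch, not the dossier's replay script: it assumes that parity means agreement with the reported three-decimal lifts to within half a unit in the third decimal place, together with an unchanged ranking. The function names, the tolerance, and the replayed values are hypothetical; only the three canonical lifts are taken from this section.

```python
import math

# Building-cohort lifts as reported in this section (grain in mm -> lift).
CANONICAL_LIFT = {50: 0.339, 100: 0.305, 25: 0.283}


def replay_mismatches(replayed, canonical=CANONICAL_LIFT, abs_tol=5e-4):
    """Return the grains whose replayed lift is missing or differs by more than abs_tol.

    abs_tol = 5e-4 is an assumed tolerance: it admits fourth-decimal
    round-off but rejects any change visible at three decimals.
    """
    return sorted(
        grain
        for grain, value in canonical.items()
        if grain not in replayed
        or not math.isclose(replayed[grain], value, rel_tol=0.0, abs_tol=abs_tol)
    )


def ranking(lifts):
    """Order the grains from highest lift to lowest."""
    return sorted(lifts, key=lifts.get, reverse=True)


# Hypothetical replay output, differing only in the fourth decimal place.
replayed = {50: 0.3391, 100: 0.3048, 25: 0.2832}
mismatches = replay_mismatches(replayed)
same_order = ranking(replayed) == ranking(CANONICAL_LIFT)
print(mismatches, same_order)
```

Under these assumptions a replay passes when the mismatch list is empty and the ranking is unchanged; a drift of 0.002, the size of the m = 4 mm margin reported in Section 8.60, would be flagged rather than absorbed as round-off.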

8.60 Stability: dimensional stable, topological secure on the interior

The Dimensional headline (the 50 mm dominant base module of Section 8.19, with 100 mm next and 25 mm a secondary sub-module) is stable in the sense that matters for a design threshold. Searching m ∈ [2, 300] mm with no normative floor and scoring lift over a rounded-to-5 mm reporting-convention null, 50 mm carries the highest lift of any standard grain across all three cohort cuts (building-cohort lift 0.339 against 0.305 for 100 mm and 0.283 for 25 mm), and the ranking holds under a matching-tolerance sweep. One candidate out-lifts it anywhere in the search: the fine divisor m = 4 mm, in the building cohort at exact match alone, by 0.002 (0.341 against 0.339); it leads at no non-zero tolerance and in no other cohort, so it does not disturb the ranking the sweep is testing. Only under a proportional ±0.2 per cent tolerance does the 25 mm lift collapse while 50 and 100 mm hold, which is itself diagnostic of 25 mm’s secondary status. We report this robustness as a plain cohort-and-tolerance sensitivity rather than as a resampling distribution, consistent with the descriptive-evidence commitment of Section 4.3. The Topological classification is not tested by resampling, because the 745-plan census is the enumerated stock for the strata it covers and a resampling distribution would have no sampling parameter to estimate, as Section 4.3 sets out; the appropriate stability test reads the structure off the census directly and asks which of its assignments a different design cut-off would move. Read that way the classification is secure on its interior: the census resolves the supported pairs into 49 hard required adjacencies and 50 soft preferences, and the pairs sitting well above and well below the weight-100 line are unambiguous. Only the handful of pairs whose census weight falls within a narrow band of that line have an assignment a different threshold could flip, and that margin is the SL-06 scope limit reported in Section 8.37. 
HC-8C is therefore contracted on the census hard and soft interiors (49 required adjacencies and 50 preference weights), with the threshold-band pairs informing its softer band rather than its load-bearing core, and CL-8-04 is stated at medium-to-high confidence with the boundary margin named as its limiting condition.
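The direct census read described above, asking which assignments a different design cut-off would move, can be illustrated as follows. This is a sketch under stated assumptions rather than the census script: the pair weights are invented, the band half-width of 10 is a placeholder for the unspecified "narrow band", the choice of a greater-than-or-equal comparison at the cut is an assumption, and only pairs already classed as supported (hard or soft) are considered.

```python
def classify_pairs(weights, hard_cut=100, band=10):
    """Split supported pairs at hard_cut and flag those within +/- band of it.

    hard_cut=100 follows the weight-100 line in the text; band=10 is an
    assumed half-width, not a figure from the census.
    """
    hard, soft, margin = [], [], []
    for pair, weight in sorted(weights.items()):
        (hard if weight >= hard_cut else soft).append(pair)
        if abs(weight - hard_cut) <= band:
            margin.append(pair)
    return {"hard": hard, "soft": soft, "margin": margin}


def flipped(weights, cut_a, cut_b):
    """Return the pairs whose hard/soft assignment differs between two cut-offs."""
    hard_a = set(classify_pairs(weights, cut_a)["hard"])
    hard_b = set(classify_pairs(weights, cut_b)["hard"])
    return sorted(hard_a ^ hard_b)


# Hypothetical weights for illustration only; these are not census values.
toy_weights = {
    ("foyer", "porch"): 412,
    ("foyer", "hallway"): 388,
    ("kitchen", "pantry"): 104,
    ("bedroom", "robe"): 97,
    ("laundry", "garage"): 41,
}

result = classify_pairs(toy_weights)
moved = flipped(toy_weights, 95, 105)
print(result["margin"], moved)
```

In the toy data the two pairs inside the band are exactly the two that change class when the cut-off moves from 95 to 105, while the pairs far from the line keep their assignment under either cut. This is the interior-versus-margin distinction that SL-06 records.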

8.61 Coverage and the single-author limit

The coverage verdict is a partial pass across the three evidence dimensions. The Dimensional evidence (40,342 dimension values parsed from 23,048 products, of which about 52 per cent state a title dimension; a deterministic title-parse with governed arbitration, instrument-validated at precision 1.000 and 0.0 per cent unit-error) and the Topological evidence (the 745-plan census, resolving 72 space categories across 32,594 directed adjacency edges over the standard Australian residential taxonomy) are adequate within their declared scope, and the Configurational evidence is demonstrably non-empty (at least 100 canonical packings under the SL-03 cap), which is the adequacy threshold HC-8D requires. The one residual gap that none of the scope limits absorbs is that the adequacy of the census 72-category taxonomy for the specific module list Chapter 9 will instantiate cannot be confirmed at this gate, because that list is itself an output of the Chapter 9 design cycle; this is an honest forward-pointed check rather than a defect, and it closes once Chapter 9 declares its module list.

A final acknowledgment qualifies the whole of this section. All the validation activities reported here (the replays, the stability tests, the coverage-adequacy statements, and the verdict-loop record) are conducted by the author against an internally defined rubric; no independent external auditor, co-supervisor sign-off, or domain-expert re-extraction has yet validated the assessments. This is a structural feature of single-author doctoral substrate validation rather than a methodological choice, but it remains a material constraint on the validation’s external warrant, and we record it as such rather than absorbing it into the partial-pass and mixed verdicts. The remediation is a single domain-expert cross-check (a registered architect or accessible-design specialist with SDA project experience re-extracting a five-to-ten-per-cent subset of the corpus and reviewing the four contract texts for compatibility with practitioner intuition), and it is independently actionable as post-submission work, registered in the open-work register of Section 8.66.

8.62 The verdict-loop record and the anti-fabrication posture

The validation rests on a verdict-loop, evidence-loop, and coherence-loop protocol that runs adversarial audits against each pipeline’s headline finding, and its function (rather than its full catalogue) is what matters here: it makes the record audit-traceable, so that any claim of Sections 8.19 to 8.50 can be back-traced to the loop that audited it. Sixteen loop exits stand at the v5.1 seal: the fourteen original exits plus two appended on 2026-05-02 (a census-structure audit and a schema-revision-drift audit), both returning upholding verdicts across all six clauses of HC-8C and completing the stability clock for the four contracts.⁵⁵ CL-8-09, CL-8-04, and CL-8-08 sit at the medium, declared-limited tier: not a weakness disguised as caution, but the study’s anti-fabrication regime in action. Where the warrant is partial, the claim is stated at the warrant level and the residual routed to the scope-limit register of Section 8.63 rather than absorbed into rhetorical hedging. The construct contributed here is therefore a validated empirical substrate, not a fully verified one, and the difference is exactly the partial-pass and mixed verdicts above and the four future-validation pathways Section 8.66 names. Each pathway (replaying the two deterministic scripts, re-running the topological stability test on the census, cross-checking the taxonomy against Chapter 9’s module list, and optionally lifting the packing cap) is concrete and independently testable, and each discharges one residual limit on its own; none is a prerequisite for building Chapter 9, but all are prerequisites for closing the scorecard at full pass.

8.63 Limitations and External Validity

8.64 The function of declared limitations

A limitations section in a design-science thesis is epistemic calibration rather than apology. It identifies the population each evidence stream was drawn from, the conditions under which each finding holds, and, with equal precision, what those findings permit downstream artefacts to do and what they do not. Wieringa’s scope-validation criterion frames the obligation: the substantive question is whether the corpus suffices to support the specific design claims being made, not whether it is complete.⁵⁶ A research artefact that admits its constraints is one whose claims can be tested, replicated, and superseded; an unbounded substrate would be unfalsifiable, and so methodologically weaker rather than stronger. The seven scope limits are each stated in the same four parts: what the limit is, what evidential warrant it imposes on its handoff contract, what it does not prevent Chapter 9 from doing, and the remediation that would retire or narrow it.

SL-01: the dimensional corpus: single retailer, jurisdiction, snapshot. The dimensional pipeline uses one source, the Bunnings catalogue as crawled in August 2025. Specialist trade suppliers are excluded by design, the snapshot encodes a moment in time, and the corpus is Australian-only. But HC-8A targets a base module for residential rather than bespoke construction, and the dominant 50 to 100 mm metric grid (with 25 mm a common sub-module) is interpretable across ISO-aligned jurisdictions, ISO 2848 being one of several mutually-consistent referents rather than a privileged one; a US metricated-imperial replication would be expected to recover a comparable coarse grid within measurement tolerances rather than refuting the Australian finding. The remediation is to process a successor corpus, from a different retailer, jurisdiction, or snapshot, through the same pipeline; this is a data-collection task, not a methodological revision.

A second qualification attaches to the same corpus. The building and non-building split on which every cohort result rests is assigned a priori at the level of the retail category, not the individual product: each catalogue breadcrumb is mapped once to one of nine tiers (the built-in building fabric of structure, openings, fixed fit-out, hard finishes, and fixings, set against the loose furnishings, soft-goods, appliances, and consumables that occupy a dwelling without being built into it), so a mixed category is resolved by its dominant membership and no product is reclassified case by case. The fixed-versus-loose line is the substantive basis of the split, and the categories that sit near it are disclosed rather than smoothed over (carpet, read as a soft finish against the hard flooring it abuts; free-standing wardrobes against built-in cabinetry; door hardware against the general fixings tier), and these borderline categories are exactly the ones the three robustness cuts of Section 8.22 move between cohorts, which is why the 50 mm result is reported as holding across the boundary rather than resting on any single placement. The remediation, were a finer split required, is a per-product reclassification of the borderline categories; the robustness cuts already establish that the dominant-grain finding does not depend on it.

SL-02: the topological and occurrence corpus: Australian listing-site, 2025 snapshot. The 745-plan corpus supports both the occurrence and the topological pipelines and carries three constraints: geographic and typological specificity (no warrant beyond Australian residential to commercial, institutional, or non-Australian markets); listing-site selection bias (over-representing newer, renovated, presentation-ready stock relative to permit-application or as-built plans); and SDA over-representation in the August stratum, which inflates ensuite and wet-area frequencies. The cross-corpus stability of Section 8.28 confirms the top-twenty rank order across the strata, and HC-8B is closed at medium confidence for the wet-area cell accordingly. The shape of that over-representation has now been characterised directly rather than left as a supposition: a descriptive reading of the eighty-six Specialist Disability Accommodation plans against the six hundred and fifty-nine general-market plans (held within the census and triangulated against the collection wave, the recovered state, and a single-dwelling restriction) locates the difference in internal proportion rather than scale. The Specialist Disability Accommodation plans are not larger overall, total dwelling area being indistinguishable between the strata, a reading the original listings corroborate at a near-identical building-area median of 210 against 202 square metres; they carry more sanitary spaces, however (a median of three against two), and, where those rooms are dimensioned, larger ones, with the same direction recorded independently in the listings’ own stated bathroom counts.⁵⁷ The inflation this limit records is therefore real, specifically wet-area, and not a general size effect, and the occurrence frequencies of Section 8.26 are to be read with that stratum-specific re-weighting in mind. 
The remediation is targeted corpus augmentation (a proportionate-incidence SDA sample and a supplement of permit-application plans), neither required before Chapter 9.

SL-03: the configurational enumeration: the 100-packing cap. The packing generator exited after exploring two of twenty-four orderings, yielding the 100 canonical packings of HC-8D, so the count is a confirmed lower bound rather than the cardinality. Chapter 9 does not require exhaustive cardinality; it requires a confirmed lower bound sufficient to populate the three evaluation indices that guide its design search, and the 100-packing set provides a structurally diverse sample for that purpose. The epistemic consequence is that HC-8D commits Chapter 9 to operating within the enumerated space, not to exhausting it. The remediation is an uncapped run with a dedicated pruning strategy, a well-defined v3 engineering extension bounded above by Klarner’s exponential growth result.⁵⁸

SL-04: the frequency thresholds: hardcoded, not model-derived. The 90 per cent and 20 per cent cumulative-coverage thresholds are hardcoded in the analysis script rather than extracted from the boundaries of a model fit, so HC-8B’s classification rests on a principled but conventional choice. The substantive consequence is bounded: the rank-order stability of Section 8.28 confirms that moving the threshold from 90 to 85 or 95 per cent would shift at most two or three categories between tiers without restructuring the library. The remediation is a single-script modification that extracts the model component boundaries and compares them to the hardcoded thresholds; if they coincide within a few percentile units, the concern is discharged.
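The bounded-consequence argument of SL-04 amounts to counting how many categories change tier when the cut moves. The sketch below illustrates that count under loud assumptions: it models only the upper cut, it reads "cumulative coverage" as the running share of occurrences over categories ranked by frequency, and the category names and counts are invented for the arithmetic. It is not the analysis script, and the 20 per cent threshold is not modelled.

```python
from itertools import accumulate


def tiers(counts, upper_cut=0.90):
    """Label each category as within or beyond a cumulative-coverage cut.

    Categories are ranked by descending count (ties broken by name); a
    category is 'within' if the running share up to and including it does
    not exceed upper_cut. This reading of the threshold is an assumption.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    total = sum(counts.values())
    running = accumulate(count for _, count in ranked)
    return {
        name: ("within" if cumulative / total <= upper_cut else "beyond")
        for (name, _), cumulative in zip(ranked, running)
    }


def moved_between(counts, cut_a, cut_b):
    """Return the categories whose tier differs between two cuts."""
    a, b = tiers(counts, cut_a), tiers(counts, cut_b)
    return sorted(name for name in counts if a[name] != b[name])


# Hypothetical occurrence counts (total 1000), not corpus figures.
toy_counts = {
    "bedroom": 300, "bathroom": 200, "kitchen": 150, "living": 150,
    "garage": 80, "laundry": 60, "study": 40, "store": 20,
}

to_85 = moved_between(toy_counts, 0.90, 0.85)
to_95 = moved_between(toy_counts, 0.90, 0.95)
print(to_85, to_95)
```

In the toy data a five-point move in either direction reassigns a single category. The same comparison, run on the fitted component boundaries once they are extracted, is the check the remediation describes.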

SL-05: two coherence probes untestable. Of the six cross-pipeline probes, the two at the topological-by-configurational interface are untestable: one requires per-room edge-run lengths absent from the v0 schema, the other requires port-attachment parsing unimplemented in the v2 generator. That coherence dimension is therefore unverified at v1 (the principal outstanding gap in the substrate’s internal-consistency case), though neither probe blocks Chapter 9’s construction. The remediation is an edge-run extraction pass over the corpus and a v3 port-attachment implementation, each independently fundable as post-submission validation.

SL-06: coupling-matrix boundary-margin sensitivity, and the absent class unmeasurable at census. Two distinct considerations qualify the topological confidence. The first is the genuine SL-06: the weight-100 hard/soft boundary is a design threshold, and the handful of pairs whose corpus weight sits within a narrow band of it are the part of the classification a different cut-off would reclassify, so HC-8C carries the hard and soft interiors as its secure commitments and the boundary-zone pairs at medium confidence. The second is sensitivity finding S5-3, which is a property of how the extraction codification matured rather than a census measurement. The early codification recorded full co-presence, and so could register an avoidance (categories that co-occur in a plan yet are never placed adjacent) as a development-stage observation; the finalised v0 codification records only realised adjacency, so the census instrument retains no co-presence denominator and a pair that is never adjacent falls to insufficient by construction. The avoidance signal is therefore a codification-development observation, not a census finding, and its disappearance at 745 plans is an instrument limit of the finalised codification rather than evidence that the dispreferred pairings occur. We therefore do not carry hard-stop forbidden pairs to Chapter 9. HC-8C commits the generator to the census required-adjacency structure (the 49 hard pairs as required adjacencies and the 50 soft pairs as preference weights), while the pilot-stage avoidance observation is forwarded as a soft generator dispreference, biasing the generator away from the dispreferred pairings without forbidding any of them.⁵⁹

SL-07: inter-rater reliability not established. The agent-manual extraction was executed by a single vision agent against the sealed v2.0 ruleset across all 745 plans; no second extractor re-extracts a subset, so Cohen’s κ, Krippendorff’s α, and Fleiss’ κ are unreported and the extraction’s inter-rater reliability is structurally self-asserted. The reliability of the boundary judgements is plausibly high (the principles are unusually explicit and the pre-repair connectivity rate is consistent with disciplined application), but plausibility is not a measured statistic, and we record the absence rather than implying a reliability the corpus has not measured. HC-8A is unaffected (its product dimensions are machine-readable) and HC-8D is unaffected (its enumeration is synthetic). The remediation is a five-to-ten-per-cent subset re-extraction by an independent extractor with an agreement coefficient reported per schema field.
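For the subset re-extraction, the agreement coefficient for a single categorical schema field could be computed as below. This is a minimal sketch of the standard two-rater Cohen's κ, (p_o − p_e) / (1 − p_e), assuming nominal labels and exactly two extractors; the field name and label values are hypothetical, and no agreement figure is implied for the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label shares, summed.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 'opening type' labels from two extractors on ten edges.
original = ["door"] * 5 + ["open"] * 5
second = ["door"] * 4 + ["open"] + ["open"] * 4 + ["door"]

kappa = cohens_kappa(original, second)
print(round(kappa, 3))
```

Reporting the coefficient per schema field, as the remediation proposes, would mean calling such a function once for each field over the re-extracted subset; ordinal or multi-rater fields would need Krippendorff's α or Fleiss' κ instead.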

8.65 External validity: what the substrate licenses

The aggregate question across the seven limits is whether the substrate suffices to license the four contracts Chapter 9 depends on, given that completeness was never the design intent, and the answer is a qualified affirmation: each contract is licensed at its declared confidence level, and the declared levels are honest. The substrate licenses Chapter 9 to operate on a 50 to 100 mm base grid with a 25 mm sub-module (HC-8A, high confidence), to use the 90/20 frequency thresholds (HC-8B, medium-to-high), to treat the census required-adjacency structure (49 hard pairs as required adjacencies and 50 soft pairs as preference weights) as its normative interaction constraints (HC-8C, high for the hard and soft interiors and medium for the weight-100 boundary zone), and to use the 100-packing set as its arrangement-space seed (HC-8D, high for the lower-bound claim). What it does not claim is equally explicit: not that the 50 to 100 mm grid is the universal base module for all global construction, but that it is the dominant metric grid (consistent with ISO 2848 among other referents, not uniquely caused by it) derived from one Australian retail corpus at one point in time; not that 100 packings exhaust the space, but that they are a sufficient lower bound; not that the topological adjacencies generalise to commercial or institutional types, since they are calibrated to the Australian residential corpus. The lift method is jurisdiction-agnostic and the coarse-grid finding is re-interpretable across ISO-aligned jurisdictions, but the topological and occurrence findings reflect Australian conventions and would be expected to differ in other markets. Three validation pathways remain open as post-submission work (replaying the two deterministic scripts, re-running the topological stability test on the census, and cross-checking the taxonomy against Chapter 9’s module list), with the uncapped packing run of SL-03 an optional fourth, and none blocks Chapter 9. The limits are instruments of precision: they specify the conditions under which the contracts hold, the conditions under which they would be superseded, and the work that would extend their validity, and Chapter 9 operates within those conditions by design.

8.66 Conclusion

8.67 What the chapter constructed

The empirical substrate is exactly that, an empirical substrate, distinct from a generator, a model, or a set of design rules inferred from first principles. Its single architectonic claim, stated in Section 8.5 and substantiated across Sections 8.19 to 8.42, is that the Australian residential dwelling, treated as a representational object amenable to systematic corpus analysis, exposes a regular and stable interface (dimensional, spatial-frequency, topological, and configurational) on which a downstream procedural generator can be built without the generator having to re-discover the regularities for itself. The chapter’s contribution is that interface, and four independent evidence pipelines and four formally defined handoff contracts hold it open at a stable, bounded state and deliver it to Chapter 9 as a trustworthy, scope-bounded empirical foundation. The four pipelines are independent triangulations: the Dimensional pipeline queried a product corpus of 40,342 dimension values across 23,048 products to identify a dominant 50 to 100 mm metric grid, with 25 mm a secondary sub-module, consistent with ISO 2848 among other referents; the Occurrence pipeline queried a 745-plan floor-plan corpus to derive 72 space categories and threshold-bounded frequency tiers; the Topological pipeline derived the coupling structure on the 745-plan census, yielding the required-adjacency structure of 49 hard pairs with 50 soft preferences; and the Configurational pipeline generated fifty canonical arrangements and one hundred canonical packings. The integration argument of Section 8.50 establishes that no cross-pipeline coherence probe returns incoherent (four return coherent or partially coherent, and the two untestable ones are declared as scope limit SL-05 rather than absorbed into silence), which is the evidentiary condition under which the joint triangulation claim holds.

8.68 The ten claims and their final status

The chapter’s substantive contribution is formalised in ten claims, registered in the claim bank, that are facets of the one substrate claim rather than ten independent contributions. Six are supported at high confidence: the dominant 50 to 100 mm metric grid recovered over the unconstrained search range m ∈ [2, 300] mm and its consistency with ISO 2848 among other referents (CL-8-01, CL-8-02); the lift-ranking that places 50 mm first, 100 mm next, and 25 mm as a secondary sub-module, robust across three cohort cuts and a matching-tolerance sweep (CL-8-03); the descriptive finding that the adjacency-avoidance signals observed during codification development are, bar one partial exception at Clause 10.6.3, uncodified in the National Construction Code, an observation about where residential regulation is prescriptive rather than a census forbidden-adjacency finding (CL-8-05); the confirmed lower bound of at least 100 canonical packings within the SL-03 cap (CL-8-07); and the bounded adequacy of the three corpora (CL-8-10). Four are declared-limited at medium-to-high confidence with the limiting condition named: the census required-adjacency structure under the weight-100 boundary sensitivity of SL-06 (CL-8-04); the partially realised grammar (CL-8-06); the cross-pipeline coherence under the hardcoded thresholds and the two untestable probes (CL-8-08); and the reproducibility, with two of four pipelines formally replayed (CL-8-09). No claim is unsupported, and the writing-plan handoff gate passed ten of ten criteria. The architecture is one of honesty: each declared-limited claim has an identifiable evidential address an examiner can interrogate (the census coupling summary behind CL-8-04, the two deterministic scripts behind CL-8-09), so that the claims are propositions with addresses rather than rhetorical positions.

8.69 The four handoff contracts and the open pathways

The four handoff contracts are the mechanism by which the substrate’s findings become operational commitments for Chapter 9. HC-8A is stable at v1 (a 50 to 100 mm base grid with a 25 mm sub-module, recovered over the unconstrained range, consistent with ISO 2848 among other referents); HC-8B is closed at v1 (the frequency thresholds, hardcoded, with SL-04); HC-8C is stable at v1 with two consecutive upholding verdicts (the census required-adjacency structure of 49 hard pairs as required adjacencies and 50 soft pairs as preference weights, with the adjacency-avoidance signals from codification development carried as soft dispreferences); and HC-8D is stable at v1 and partially realised (the canonical polyomino space with 4-neighbour adjacency and edge-run attachment operational, composite joins and port-attachment specified but not yet parsed, the 100-packing cap permanent). Four validation pathways would convert the declared-limited claims to supported, and none is prerequisite to Chapter 9 building. First, replaying the two pending deterministic scripts would lift CL-8-09 and close the reproducibility row at four of four. Second, a sensitivity check of the census coupling structure across the weight-100 boundary, testing the 49 hard and 50 soft pairs against the SL-06 threshold margin, would close CL-8-04. That check carries a standing caveat: the census instrument records only realised adjacency and so has no co-presence denominator against which to weigh a never-realised pairing (sensitivity finding S5-3), which is why the adjacency-avoidance observation from codification development is carried descriptively, as a soft generator dispreference, rather than as a census finding. Third, cross-checking the taxonomy against Chapter 9’s module list would close the coverage row for CL-8-10. Fourth, an optional uncapped generator run would replace the lower-bound claim with an exact count. The pathways describe what the substrate would look like under complete validation (supported rather than declared-limited), not what Chapter 9 requires before it can begin.

8.70 Design knowledge from the geometry read

Beyond the four pipelines, the geometry read of the same census (the canonical 12,849-space layer reconciled in Section 8.26) yields a layer of directly usable design knowledge that the four-pipeline substrate did not set out to produce, and which Chapter 9 and the discussion of Chapter 11 draw on. It takes three forms, each descriptive over the complete enumeration and each detailed in Appendix H.

The first is a set of numerical design references (Appendix H, Section H.5). Each room category carries an observed median floor area with its interquartile range and a quartile coefficient of dispersion: a secondary bedroom sits at a median 10.8 m² and a main bedroom at 14.1 m², a living room at 21.4 m², a bathroom at 5.5 m², a garage at 33.6 m². The dispersion measure does analytic work: it separates the categories sized to a fixed object, garage and main bedroom and media room, which are the most uniform, from the leftover and outdoor spaces, store and porch and alfresco, which absorb whatever area remains. Two further references accompany the size bands: Australian project-home room dimensions sit on a 50/100 mm grid, aligned with the dominant 50 to 100 mm product grid of Section 8.19 while staying coarser than its 25 mm sub-module, and a median plan draws its spaces from only about seven distinct geometric primitives, a kit-reuse ratio near 2.4. These two readings converge on the chapter’s organising result for the dimensional system, drawn together at the integration of Section 8.50: a three-tier grid of a 50 mm micro grain at which components are dimensioned (with 25 mm its finest sub-module), a 150 mm meso grid at which whole room modules are dimensioned so that they compose with least combinatorial misfit, and the 300 mm macro multimodule that plans the envelope, the meso and macro tiers sitting at three and six times the micro grain respectively. The 150 mm meso grid is the calibrated centre of that structure (the coarsest grid that collapses the module vocabulary enough for modules to compose while holding the displacement of any room within build tolerance), and it is a design prescription rather than the finer 100 mm grain on which the existing stock is drawn. 
Unlike a normative handbook such as Neufert’s, whose figures are prescribed targets, these are measured central tendencies and observed spreads read off what builders actually drew, describing the dimensioned subset (structurally the habitable rooms), not fitted targets.
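The quartile coefficient of dispersion that separates object-sized rooms from leftover spaces is the standard ratio (Q3 − Q1) / (Q3 + Q1). The sketch below shows the calculation on invented areas; the quantile method is an assumption (Appendix H may use a different interpolation rule), and the two samples are hypothetical rather than census values.

```python
from statistics import quantiles


def quartile_dispersion(areas_m2):
    """Return the median, interquartile range, and quartile coefficient of dispersion.

    Quartiles use the 'inclusive' interpolation of statistics.quantiles,
    which is an assumed convention rather than the appendix's stated one.
    """
    q1, median, q3 = quantiles(areas_m2, n=4, method="inclusive")
    return {
        "median": median,
        "iqr": q3 - q1,
        "qcd": (q3 - q1) / (q3 + q1),
    }


# Hypothetical floor areas in square metres, for illustration only.
tight_category = [30, 32, 34, 36, 38]   # stands in for a room sized to a fixed object
loose_category = [4, 8, 12, 16, 20]     # stands in for a leftover or outdoor space

tight = quartile_dispersion(tight_category)
loose = quartile_dispersion(loose_category)
print(round(tight["qcd"], 3), round(loose["qcd"], 3))
```

Both toy samples have evenly spaced values, yet the coefficient differs by a factor of about six because it scales the spread by the size of the room. That scaling is what lets the measure compare a garage with a bathroom on the same footing.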

The second is a small set of empirical design rules, each stated with a measured scope and an explicit falsifier, that is, the observation that would have refuted the rule had it held (Appendix H, Section H.6). There are four headline rules. First, a larger dwelling repeats standard rooms rather than enlarging them: across bedroom bands the space count rises about sixty-nine per cent while area per room barely moves. Second, containment governs size more than nominal type: enclosed, internal, and walled spaces cluster tightly near 12 m², while open and external spaces sit near 16 to 17 m² and are more variable. Third, the obligation of an adjacency predicts its opening: the doorless open-threshold share falls from 22 per cent on required couplings to 4 per cent on insufficient ones. Fourth, form follows fixed content: the rooms sized to an object are the most dimensionally uniform. The set is deliberately small and honestly bounded. Of the candidate higher-order room bundles, only the premium tier rises materially above its base rate (where a media room is present, an alfresco is present 86 per cent of the time against a 61 per cent base rate); the apparent master-suite and open-plan bundles sit at their base rates and are coincidences rather than patterns. This is the contribution’s point of contact with Alexander’s pattern language, which it extends by giving each pattern a measured scope and a falsifiable parameter rather than an author’s confidence mark.
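
The test that separates a bundle from a coincidence is a comparison of a conditional share against a base rate. A minimal sketch follows, assuming each plan is represented as a set of room-category labels; the function name and the six toy plans are hypothetical and are not census data. On the reported figures the premium tier has a ratio of roughly 0.86 / 0.61, or about 1.4, whereas a bundle sitting at its base rate has a ratio near 1.

```python
def bundle_lift(plans, trigger, target):
    """Compare P(target | trigger) with the base rate P(target).

    `plans` is a list of sets of room-category labels.
    Returns (conditional share, base rate, lift = conditional / base).
    """
    if not plans:
        raise ValueError("no plans supplied")
    base = sum(target in plan for plan in plans) / len(plans)
    with_trigger = [plan for plan in plans if trigger in plan]
    if not with_trigger or base == 0:
        raise ValueError("trigger or target never occurs; lift is undefined")
    conditional = sum(target in plan for plan in with_trigger) / len(with_trigger)
    return conditional, base, conditional / base


# Toy plans for illustration only.
toy_plans = [
    {"media", "alfresco", "living"},
    {"media", "alfresco"},
    {"living", "alfresco"},
    {"living"},
    {"media"},
    {"living", "kitchen"},
]

conditional, base, lift = bundle_lift(toy_plans, trigger="media", target="alfresco")
print(f"conditional = {conditional:.2f}, base = {base:.2f}, lift = {lift:.2f}")
```

A rule stated this way carries its own falsifier: the bundle claim fails if the conditional share does not rise materially above the base rate. What counts as material is a judgement that the sketch leaves to the analyst.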

The third is a componential space lexicon that defines room categories as bundles of distinctive features rather than by name or by what they are drawn next to (Appendix H, Section H.7). Four features (function, host, fixture signature, and edge) differentiate the realised categories into thirty-six distinct bundles, with the remaining category names collapsing into those bundles as genuine synonyms. Its analytic move is to treat access as a topology rather than a vague notion of privacy: an ensuite is a bathroom nested in a bedroom, and a walk-in robe and a walk-in pantry are the same nested access structure differing only in host. These feature definitions in turn predict size, since a category’s dimensions follow from the intersection of its features. The lexicon thus does explanatory work: it says why a bathroom is small and tight and an alfresco large and variable, and it gives the Chapter 9 generator a vocabulary whose categories carry the same meaning outside this corpus.
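
One way to picture the lexicon is as a mapping from category names to four-feature records, in which names that share a record are synonyms and two categories can be compared feature by feature. The sketch below assumes that representation; the class, the function names, and every feature value are hypothetical stand-ins, not the Appendix H coding.

```python
from collections import defaultdict
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FeatureBundle:
    """The four distinctive features named in the text."""
    function: str
    host: str
    fixture_signature: str
    edge: str


def differing_features(a, b):
    """List the feature names on which two bundles disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def collapse_synonyms(lexicon):
    """Group category names that share an identical feature bundle."""
    groups = defaultdict(list)
    for name, bundle in lexicon.items():
        groups[bundle].append(name)
    return dict(groups)


# Hypothetical feature values for illustration only.
lexicon = {
    "walk-in robe": FeatureBundle("storage", "bedroom", "shelving", "nested"),
    "wardrobe": FeatureBundle("storage", "bedroom", "shelving", "nested"),
    "walk-in pantry": FeatureBundle("storage", "kitchen", "shelving", "nested"),
    "ensuite": FeatureBundle("sanitary", "bedroom", "wet", "nested"),
}

print(differing_features(lexicon["walk-in robe"], lexicon["walk-in pantry"]))
for bundle, names in collapse_synonyms(lexicon).items():
    print(names, "->", bundle)
```

Because the records are frozen and hashable, identical bundles collapse without any name matching, which is the sense in which a category is defined by its features rather than by its label.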

These three outputs are derived design knowledge built on the validated substrate and are descriptive over the complete census throughout. They are advanced as such: they derive from Contribution 5 rather than constituting a separate contribution, and they are read under the contribution synthesis of Chapter 11.

8.71 The chapter’s position in the study

The structural position of the chapter is specific. It is preceded by the theory and artefact chapters: Chapter 3 on modularity and design science, Chapter 4 on methodology, Chapter 5 on the standardisation schema (standardised accessibility), Chapter 6 on the Governed Kernel Architecture, and Chapter 7 on the planimetric notation. It is followed by Chapter 9 (procedural generation), in which the generator consumes the substrate, and by Chapter 10, in which the consumed artefacts are evaluated against external criteria. Its role in that sequence is the role the term empirical substrate designates: it translates the dwelling’s regularity, asserted by modular theory and the SDA model, into measured, corpus-grounded, claim-banked findings that the Chapter 9 generator does not have to establish independently. The study’s design-science argument requires the gap between asserted and measured regularity to be closed, because a generator built on unverified regularities incorporates unverified assumptions, and any evaluation of its outputs inherits them. We make the regularities explicit, test them against corpus evidence, and forward them at a declared confidence level. The four handoff contracts are the mechanism for that forwarding: they attach a contract clause to each empirical commitment so that an examiner can trace an audit chain from measured fact through design constraint to evaluation criterion at any link.

The contribution of the chapter to the study is therefore twofold. First, it supplies the evidential warrant for the design decisions Chapter 9 makes: the 50 to 100 mm grid with its 25 mm sub-module, the space categories and their frequency tiers, the census required-adjacency structure of 49 hard and 50 soft pairs, and the canonical polyomino space. These are therefore empirically grounded design decisions rather than stipulations, and the grounding is publicly legible in the claim bank, the handoff contracts, and the scope-limits register. Second, it demonstrates the study’s anti-fabrication methodology in its fullest form: it does not round partial evidence up to a strong claim, absorb scope limits into rhetorical hedging, or treat pending validation as completed, but states what the evidence supports, what it does not yet support, and the roadmap between. The substrate inherited by Chapter 9 is bounded, versioned, and adversarially audited: validated but not fully verified, conditional in four named ways and unconditional in six, and honest about every distinction between those two states. That is the foundation on which a trustworthy procedural generator can be built.

Notes

A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. The design-cycle model distinguishes the knowledge base, from which design draws, from the environment, in which design is applied; Chapter 8 contributes to the knowledge base. ↩︎
S. Gregor and A. R. Hevner, “Positioning and Presenting Design Science Research for Maximum Impact,” MIS Quarterly, vol. 37, no. 2, pp. 337-355, 2013. ↩︎
R. J. Wieringa, Design Science Methodology for Information Systems and Software Engineering. Berlin: Springer, 2014, ch. 12 (“Validation Research”) and ch. 18 (“Sample-Based Generalisation”). ↩︎
International Organization for Standardization, ISO 2848:1984 — Building construction — Modular coordination — Principles and rules. Geneva: ISO, 1984. ↩︎
That a floor plan can be carried as a structured textual specification rather than a drawing is not a novel premise: Appendix C, “Text as a Building Medium”, traces historical precedents (Greek, Japanese, and Vietnamese proportional and modular systems) in which non-graphic specification carried replayable construction knowledge, and sets out the disclosure conditions under which a text-based plan remains inspectable and revisable, the same conditions the census extraction in this chapter relies on. ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. ↩︎
S. T. March and G. F. Smith, “Design and Natural Science Research on Information Technology,” Decision Support Systems, vol. 15, no. 4, pp. 251-266, 1995. ↩︎
R. J. Wieringa, Design Science Methodology for Information Systems and Software Engineering. Berlin: Springer, 2014, ch. 12 and ch. 18. ↩︎
S. Gregor and A. R. Hevner, “Positioning and Presenting Design Science Research for Maximum Impact,” MIS Quarterly, vol. 37, no. 2, pp. 337-355, 2013. ↩︎
The chapter’s extraction calibration principles (v2.0; sealed 2026-04-26) are republished verbatim in appendix data bundle appendix-data-ch8-generator-specification (held under publish-data/). ↩︎
The chapter’s claim bank (v0.1) registers ten claims CL-8-01..CL-8-10 with status, confidence, verdict-loops completed, and active scope-limit fields. ↩︎
The chapter’s handoff-contracts ledger (v0.1) holds the four contracts HC-8A through HC-8D, each with a Contract Evolution Ledger. ↩︎
The chapter’s evidence lineage dossier (v5.0; sealed 2026-04-27) records this requirement as DRQ-1 through DRQ-5: corpus size, stratum coverage, schema validity, repair audit, and reproducibility spot-check. ↩︎
Australian Bureau of Statistics, Census of Population and Housing, 2021, dwelling structure (STRD), Australia: 10,852,208 private dwellings, 70.1 per cent separate houses. The ratio is offered as a scale indication only and not as a sampling fraction. ↩︎
The chapter’s scope-limits register (v1.1) records SL-02 verbatim: “Australian residential floor-plan topology, from public property listing websites, 2025-08-23 snapshot. Findings may not generalise to non-Australian residential conventions, non-listing-site architectural plans, or floor plan styles from different periods.” ↩︎
The methodological commitment is canonical to this research; the operational rationale and per-iteration decision record are reproduced in the chapter’s lineage dossier (v5.0), Section 5 LIN-T-02. ↩︎
On large-language-model annotation and multimodal-vision extraction: Z. Tan et al., “Large language models for data annotation: A survey,” EMNLP Findings 2024, arXiv:2402.13446; F. Gilardi, M. Alizadeh, and M. Kubli, “ChatGPT outperforms crowd-workers for text-annotation tasks,” PNAS, vol. 120, no. 30, e2305016120, 2023; R. Y. Pang et al., “Understanding the LLM-as-a-Judge: Position bias in pairwise evaluation,” arXiv:2406.07791, 2024. ↩︎
Generator-specification appendix bundle index, Section 2 file manifest (fpvisionrunconfig.json, fpvisionschemav0.json, fpvisionpromptv0.md, extractioncalibrationprinciplesv2.md). ↩︎
The extraction calibration principles (v2.0; sealed 2026-04-26) are republished verbatim in appendix-data-ch8-generator-specification, file extractioncalibrationprinciples_v2.md. ↩︎
Floor-plan corpus appendix bundle index, Sections 2-3 (plancorpusmanifest.csv, the 746-row authoritative provenance ledger). ↩︎
Lineage dossier v5.0, Section 5 LIN-T-04 dual-anchor reporting under the v3 Section A-04 amendment; the comparison artefact is held with the chapter’s reproducibility evidence. The connectivity rates are reported descriptively over the census; no interval estimate or significance test is attached, since the corpus is a complete census of the extracted set rather than a probability sample. ↩︎
Lineage dossier v5.0, Section 5 LIN-T-05 (category, edge, and adjacency-pattern counts) and LIN-T-06 (coupling classification counts). ↩︎
Floor-plan corpus appendix bundle index, Section 1 sibling-bundle note: the v5.0 census bundle is the reader-cited record; the earlier API-pilot codification is retained back-of-house as superseded, per the lineage-dossier v5.0 sealing of 2026-04-27. ↩︎
The chapter’s lineage dossier and verdict-loop record (V-02 and V-03 cycles, 2026-05-02). ↩︎
Floor-plan corpus appendix bundle index, Section 4 reproduction notes. ↩︎
The chapter’s scope-limits register (v1.1) carries entries SL-01 through SL-07 in their authoritative form. ↩︎
On the tradition: N. J. Habraken, Supports: An Alternative to Mass Housing. London: Architectural Press, 1972; A. F. Bemis, The Evolving House, Volume III: Rational Design. Cambridge, MA: MIT Press, 1936; E. D. Ehrenkrantz, The Modular Number Pattern. London: Alec Tiranti, 1961, whose corpus-specific Californian school grid is the cautionary case for inductive extraction that does not generalise. ↩︎
International Organization for Standardization, ISO 2848:1984 — Building construction — Modular co-ordination — Principles and rules. Geneva: ISO, 1984; the M/2, M/4, M/8 sub-module hierarchy is set out at Section 4.2. ↩︎
J. Chapman, Timber Wall Framing. Canberra: Australian Building Research Board, 1981, p. 47; J. Jiang, L. Ottenhaus, and J. M. Gattas, “A parametric design framework for timber framing span tables,” Australian Journal of Structural Engineering, vol. 24, no. 3, pp. 226-240, 2023. ↩︎
The measurement pipeline, its pre-registration, and the instrument-conformance audit are documented in the dimensional-corpus data bundle (Appendix H). The rebuild was undertaken because an earlier pass measured dimensions by language-model inference from title strings, an un-instrumented procedure that did not survive methodological audit; the present finding supersedes it. ↩︎
Appendix data bundle appendix-data-ch8-floor-plan-corpus, index Sections 1-4, gives the field-by-field derivation from the canonical run output. ↩︎
This is decision D-RECON-1 (ratified 2026-06-14): the built-in robe is a fixture, not a space. The reconciliation (the topology read enumerates 15,957 zones, of which 1,403 are built-in robes) is recorded in the floor-plan corpus appendix bundle; applying the convention yields 14,554 enumerated spaces. A parallel geometry extraction of the same 745 plans (the read on which the dimensional-coverage and configurational analyses rest) counts 12,849 spaces under the same room-grain convention; this geometry read is the canonical thesis-wide space count (D-1, reopened 2026-06-15), while the occurrence read reported here remains the basis for the category-frequency distribution and the coupling graph. The 1,705-space difference between the two reads is counting variance over the same plans (split/merge and node-alignment between two independent extractions, ~76 per cent high-confidence match), not a separate corpus, and is reconciled in the floor-plan geometry reconciliation record. The complementary roughly 24 per cent reflects unaligned nodes between two single-agent extractions of the same plans; no external inter-rater reliability statistic (Cohen’s κ or Krippendorff’s α) has been computed for the extraction, so the substrate’s reliability rests on this inter-extraction agreement rather than on an external reliability coefficient, and an independent inter-rater exercise is declared future work under scope-limit SL-07. ↩︎
Sum of the top-ten merged counts in spacecategoryfrequencies_merged.csv (room-grain convention): 8,882 of 14,554 = 61.0 per cent. The figure is reported descriptively over the census. ↩︎
M. E. J. Newman, “Power laws, Pareto distributions and Zipf’s law,” Contemporary Physics, vol. 46, no. 5, pp. 323-351, 2005. ↩︎
A. Clauset, C. R. Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” SIAM Review, vol. 51, no. 4, pp. 661-703, 2009; the maximum-likelihood-plus-likelihood-ratio protocol is out of scope here, whose claim is qualitative. ↩︎
Schema gap I-1-13, recorded in the lineage dossier; the legacy 73.5 per cent connectivity statistic is not directly comparable to the v5.0 marginal 57.85 per cent precisely because the underlying opening definitions differ. ↩︎
Scope-Limit SL-04 (active); the O-V-01 verdict (hardcoded) was reached on 2026-04-24 by direct read of the script source. The thresholds are validated against the three-component structure of the dimensional corpus (they sit near the small/mid boundary), but they are not extracted from it. ↩︎
Computed from spacecategoryfrequencies_merged.csv (room-grain convention): the cumulative sum of merged counts first crosses 90 per cent at the 22nd-ranked category. The walk-in robe (wardrobe) sits at rank 11 once built-in robes are carried as fixtures. ↩︎
B. Hillier and J. Hanson, The Social Logic of Space. Cambridge University Press, 1984; A. Turner, M. Doxa, D. O’Sullivan, and A. Penn, “From isovists to visibility graphs,” Environment and Planning B, vol. 28, no. 1, pp. 103-121, 2001. ↩︎
D. Grierson and S. Khajehpour, “Method for conceptual design applied to office buildings,” Journal of Computing in Civil Engineering, vol. 16, no. 2, pp. 83-103, 2002; P. Merrell, E. Schkufza, and V. Koltun, “Computer-generated residential building layouts,” ACM Transactions on Graphics, vol. 29, no. 6, art. 181, 2010. ↩︎
Australian Building Codes Board, National Construction Code Volume 2: ABCB Housing Provisions, NCC 2022. The thirteen-Section sweep (Sections 1 through 13, with the per-Section findings recorded) is reproduced and archived at experiments/ch8-cw03-ncc-volume2-sweep/sweep_report.md. The earlier spot-check had used the legacy NCC 2019 numbering, reconciled to the NCC 2022 Sections in the debt register. ↩︎
A. Rapoport, House Form and Culture. Englewood Cliffs, NJ: Prentice-Hall, 1969. ↩︎
The realised-pair matrix couplingmatrixrealisedpairs.csv carries only pairs of connection weight at least two; the stable-core survival check at experiments/ch8-cw04-stable-core-survival/survival_report.md reads zero co-presence for the never-realised dispreferred pairs precisely because they leave no row in that weight-positive matrix, which is a missing co-presence denominator rather than a measured zero. ↩︎
S. W. Golomb, Polyominoes: Puzzles, Patterns, Problems, and Packings, 2nd ed. Princeton University Press, 1994. ↩︎
D. A. Klarner, “Cell growth problems,” Canadian Journal of Mathematics, vol. 17, pp. 851-863, 1965; D. H. Redelmeier, “Counting polyominoes: yet another attack,” Discrete Mathematics, vol. 36, no. 2, pp. 191-203, 1981. ↩︎
W. Burnside, Theory of Groups of Finite Order. Cambridge University Press, 1897. ↩︎
G. Stiny and J. Gips, “Shape grammars and the generative specification of painting and sculpture,” Information Processing 71, pp. 1460-1465, 1972. ↩︎
G. Stiny, “Kindergarten grammars,” Environment and Planning B, vol. 7, no. 4, pp. 409-462, 1980. ↩︎
N. Chomsky, “Three models for the description of language,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 113-124, 1956; J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd ed. Boston: Pearson Addison-Wesley, 2007. ↩︎
CL-8-08 @v1, claim bank v0.1: of the six probes, four are testable and none returns INCOHERENT; two are untestable. ↩︎
All ten claims, with version, status, confidence, verdict-loops completed, and scope-limit fields, are recorded in the claim bank (v0.1). ↩︎
All figures in this subsection are descriptive over the 745-plan geometry census (7,287 dimensioned rooms; 3,655 interior adjacencies with both endpoints dimensioned); full derivation in the floor-plan geometry appendix bundle. The corpus is a complete enumeration, not a sample, so no sampling variability attaches and the figures are reported as plain census quantities. ↩︎
Floor-plan geometry appendix bundle, kit-of-parts metrics; descriptive over the census. ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004; R. J. Wieringa, Design Science Methodology for Information Systems and Software Engineering. Berlin: Springer, 2014, ch. 12 and ch. 18. The three dimensions also map, interpretively, onto Goodman’s notational requirements of character-distinctiveness, syntactic disjointness, and semantic correspondence (the adaptation to an empirical setting being the author’s). ↩︎
CL-8-09 @v1, claim bank; the verdict-loop record (V-02 Section 3.3 and V-03 Section 4.5 verdict aggregations). ↩︎
R. J. Wieringa, Design Science Methodology for Information Systems and Software Engineering. Berlin: Springer, 2014, ch. 12 and ch. 18. ↩︎
The Specialist Disability Accommodation and general-market geometric contrast is reported in full, with its rigour audit (independent re-derivation, confound triangulation across collection wave and recovered state, cross-source validation against the original listing metadata, and a placebo control) in the supplementary Specialist Disability Accommodation and general-market geometric-contrast analysis held with the thesis’s reproducibility materials. It is descriptive over the sealed 745-plan census (86 Specialist Disability Accommodation, 659 general-market) and is advanced as an exploratory illumination rather than a load-bearing claim; its confounds are disclosed there. ↩︎
D. A. Klarner, “Cell growth problems,” Canadian Journal of Mathematics, vol. 17, pp. 851-863, 1965. ↩︎
The 2026-05-03 stable-core survival check is reproduced at experiments/ch8-cw04-stable-core-survival/survival_report.md; it is retained as the source that exposed the missing co-presence denominator, the dispreferred pairs reading a zero there because they are absent from the weight-positive realised-pair matrix rather than measured at zero, not as a confirmation of any forbidden adjacency. ↩︎
