A Queryable Schema for Accessibility Standards
5.1 Introduction
The Specialist Disability Accommodation (SDA) Design Standard carries its accessibility requirements as dispersed regulatory prose, and that form is the source of its representational burden. Many valid statements coexist whose conceptual boundaries are uneven, and the cost of reusing them stays high whenever clauses must be compared, traced, and interpreted across contexts. The standardisation schema answers that burden by transforming the prose into a controlled representational schema: a representational intervention that fixes how accessibility knowledge is segmented, normalised, and carried forward without semantic drift into the artefacts built on it. The schema is not a paraphrase of the standard but a contract over it. It governs which structural distinctions are admissible, how each clause’s role and scope are declared, and how every unit retains a stable identity once the source text has been left behind.
An external examiner is entitled to ask why a thesis that frames housing adaptation as a complex adaptive system in Chapter 2 and Chapter 3 then opens its empirical programme with a static annotation schema for a single regulatory document. The answer is structural rather than evasive. The static schema is the condition of possibility for the dynamic, feedback-aware governance the framing demands, not its replacement. Dynamic governance requires stable semantic anchors against which trajectories, feedback loops, and adaptation episodes can be assessed; without governed identity for the entities being tracked, feedback reduces to noise and adaptation cannot be distinguished from re-interpretation. The scoping is therefore protective: attempting dynamic governance before resolving the ambiguity that pervades the regulatory substrate would yield a system that adapts fluently but unreliably, because the meaning of what is being adapted would shift with each interpretation cycle. The full set of complex-adaptive properties — feedback-responsive governance, emergent configuration, adaptive trajectory management — is carried by the artefact suite and tested through the integrated demonstration in Chapter 10, not by the standardisation schema alone.
In the platform-architecture vocabulary of Baldwin and Clark, the standardisation schema occupies the governed-kernel layer: the minimal set of persistent representational commitments — stable referential identity, shared semantic anchors, a controlled interface vocabulary — that must remain invariant while complements vary across adaptation episodes.1 This layering follows Simon’s near-decomposability principle, on which complex adaptive systems achieve tractable evolution precisely because slow-changing infrastructure supplies a stable substrate for fast-changing operational variables to manoeuvre against; absent the slow-changing substrate, the fast-changing layer cannot be identified, governed, or evaluated.2 On this reading, modular stability is not an end in itself; it is the precondition for the temporal adaptability the broader argument requires, and the artefact is one evidence-bearing instantiation of the single architectural commitment that Chapter 6 then builds into a working engine.
The schema is the first operational component of the representational governance architecture that Chapter 2 identifies as the thesis’s problem class. Each ambiguous clause generates interpretive variance that compounds across assessment, producing what Chapter 3 formalises as verification debt: deferred, duplicated, or repeated assurance work arising from fragmented representations.3 By resolving or surfacing ambiguity at source, the schema lets downstream artefacts inherit governed inputs rather than reconstruct interpretations from scratch. It also constrains the interpretation space, which addresses the bounded optionality problem of Chapter 2: effective design freedom is constrained by, not opposed to, governed representational structure, so the set of plausible interpretation routes should be smaller, clearer, and more locally actionable rather than proliferating through unconstrained recombination of ambiguous terms.
Under March and Smith’s artefact taxonomy, the standardisation schema is a model with secondary method properties.4 As a model it defines representational structures: five governed layers with field-level contracts that organise how entities, relations, and qualifiers in the SDA corpus are understood and related. As a method it supplies a reproducible five-stage serialisation pipeline that practitioners can follow to generate governed clause representations from regulatory source text. Read through Gregor and Jones’s anatomy of a design theory, the artefact resolves into its components: the constructs are the stratified entity vocabulary (seven primitives and seven composites), the ten relation operators, and the five schema layers; the form and function commit it to a layered contract in which each layer encapsulates a distinct concern and exposes only governed interfaces to its neighbours; the justificatory knowledge draws on Parnas’s information-hiding, Simon’s near-decomposability, Baldwin and Clark’s option value in modular architectures, and Pustejovsky’s generative account of systematic polysemy; and the testable propositions are stated as the five success criteria of Section 5.8.5 The primitive vocabulary and relation set are derived in Chapter 6, not here.
That position is not self-contained. The standardisation schema supplies the governed accessibility substrate that the suite consumes: it stabilises concepts and clause roles so that the Governed Kernel Architecture (Chapter 6) can define reusable modules without inheriting unresolved lexical variance, after which the notation formalises a grammar over those modules and the generator packages procedural generation that depends on the prior contracts. Without the standardisation schema the downstream artefacts would encode ambiguity rather than reduce it; with it in place, they can concentrate on composition, notation, and governed reuse under declared assumptions. Figure 5.18 sets out the dependency chain and the handoff contracts that govern each transition.
Figure 5.18 — Artefact suite dependency chain with handoff contracts Four-artefact dependency chain showing how the standardisation schema (Chapter 5) provides the semantic substrate consumed by the Governed Kernel Architecture (Chapter 6), the notation (Chapter 7), and the generator (Chapter 9). Each handoff is governed by explicit contracts: primitive stability, role preservation, residual inheritance, ambiguity profile carry-forward, and provenance completeness. Downstream artefacts may reorganise module-level composition but must not redefine upstream primitive meanings, erase residual flags, or collapse role distinctions.
Three definitions are fixed here and used consistently for the rest of the chapter. Ambiguity is the observed plurality of plausible interpretation under controlled reading conditions, not stylistic complexity. Serialised text is governed clause representation with stable identifiers and recoverable provenance, not a rhetorical restatement of source prose. A standardisation schema is a representational contract that constrains permissible structure and term-role assignment across clauses and categories. The artefact claim follows from these. A governed schema reduces interpretive variance by converting diffuse clause content into stable representational units with explicit identity, scope, and relational anchoring. Here reduce does not mean eliminating all ambiguity; it means reducing ambiguity to a traceable residual that declared criteria can adjudicate, rather than implicit judgement. The artefact is accordingly assessed against clarity, traceability, and coverage conditions, not against an unrealistic expectation of total semantic closure in regulatory prose.
The artefact also discharges three of the four design constraints that Chapter 3 absorbs from rival theoretical traditions. It addresses DC-2 (institutional compatibility) directly, operating on the existing SDA Design Standard without requiring institutional reform. It addresses DC-3 (revisability) through an iterative design process and a residual-governance mechanism that preserves the capacity to refine representations as the standard or its reading evolves. It addresses DC-4 (tractability) through a constrained field structure and a confidence-labelled transparency mode that reduce rather than increase the adjudication burden a practitioner must carry. DC-1 (political negotiability) falls outside the artefact’s scope: it concerns a system-level property dependent on how the suite interacts with governance actors, and is taken up in the integrated evaluation of Chapter 10.
These commitments unfold as a staged argument across the modules listed above. Section 5.2 positions the artefact within the specialist literature on standards formalisation and lexical ambiguity. Section 5.3 sets out schema design theory, the SDA domain, and the contribution boundary. Section 5.4 declares objectives, corpus, and analytical methods, and Section 5.8 fixes the evaluation framework, the success criteria, and the design-iteration evidence. Section 5.11 establishes the empirical baseline of ambiguity and fragmentation, before Sections 5.12 and 5.13 specify the schema as a five-layer contract with its clause-representation fields, inventories, and residual governance. Section 5.14 describes the five-stage serialisation pipeline, Section 5.20 reports the evaluation results — resolution rate, deontic-force reframing, and schema comparison — and Section 5.25 consolidates the contributions and defines the handoff to Chapter 6. Primitive-level validation evidence, including holdout generalisation, corpus ablation, and IFC4 and BOT benchmarking, is reported in Chapter 6 where it supports the derivation of the primitives, as descriptive evidence over the corpus rather than an inferential claim on the present census. On that basis, Section 5.2 now positions the artefact within the literature that bears on its design decisions.
5.2 Artefact-Specific Literature: Standards Formalisation and Lexical Ambiguity
Two specialist literatures bear directly on the design decisions behind the standardisation schema. The first is the standards-formalisation lineage that has worked for more than fifteen years to render building regulation computable. The second is the word-sense disambiguation tradition, which explains why the lexical ambiguity in such regulation is systematic rather than incidental. The first establishes where the manual bottleneck in automated compliance checking sits. The second establishes how the burden of that bottleneck can be measured rather than assumed. Read together, they locate the design science research contribution developed here. That contribution is a confidence-labelled governance layer. It extends, rather than displaces, what these lineages have built.
Standards Formalisation and Regulatory Transformation
Automated compliance checking in the built environment has matured on a stable foundation: Eastman et al.’s four-stage framework — rule interpretation, model preparation, rule execution, and rule reporting.6 Surveying five deployed systems, they established rule interpretation as the critical manual bottleneck of the pipeline. Rule interpretation is the conversion of regulatory natural language into computable form. In those systems, rules are hard-coded or semi-automatically translated. The interpretive expert knowledge the conversion requires is neither recorded nor tracked. That diagnosis has organised the field ever since. The work that follows reads as a sequence of increasingly formal attempts to relieve the same bottleneck.
Hjelseth’s RASE methodology supplies the most direct antecedent for the standardisation schema’s clause-level mark-up. RASE — Requirement, Applicability, Selection, Exception — gives a four-operator scheme for decomposing regulatory text into clause-role categories. His Tx3 classification (Transcribe, Transform, Transfer) registers that some regulatory content resists direct transcription.7 RASE establishes the clause role. The standardisation schema’s clause_class field inherits that move and extends it. It adds clause modality (shall, should, may), a semantic-confidence label, and an explicit record of residual ambiguity. These are the dimensions RASE and Tx3 leave structural rather than governed. The extension is additive: it presupposes Hjelseth’s role categorisation and layers governance on top of it.
Solihin contributes the structural analogue for the standardisation schema’s field-level type contract. His BIMRL relational schema, paired with a four-class rule-complexity taxonomy, runs from explicit single-data rules through to proof-of-solution rules with multiple acceptable answers.89 The taxonomy classifies computational complexity, and its top class frames multiple valid outcomes as an implementation matter. The standardisation schema borrows the field-level contract design intact. It then reorients that contract from query efficiency toward semantic governance. What the contract records becomes interpretive uncertainty rather than execution cost.
Two formally explicit approaches mark the boundaries of the design space. Dimyadi and Amor’s LegalRuleML encodes regulatory rules with deontic modalities — obligations, permissions, prohibitions — and is the most structurally formal of the early proposals.10 It proceeds on the working assumption that regulation can be fully formalised in logic. The SDA corpus complicates that assumption, since a measurable share of its terms carries genuine polysemy. Zhang and El-Gohary’s NLP-and-logic system reaches 98.7% recall and 87.6% precision on concrete code provisions.11 That result shows how far automated interpretation can go on well-specified text. The system returns a binary compliance verdict, and the interpretive confidence behind that verdict is not itself part of the output.
Read across this lineage, ambiguity is handled in five characteristic modes. Three of them treat it as something to be removed or worked around. In the first, ambiguity is invisible, and formalised text is taken to be unambiguous.12 In the second, it is noise to eliminate, and the aim is to formalise it away.13 In the third, it is implementation difficulty, and interpretive openness is acknowledged as a barrier.14 The remaining two engage ambiguity more directly. The expert-resolved mode has domain experts settle interpretive difficulty case by case.15 The diagnosed mode identifies and classifies ambiguity as a phenomenon in the regulatory text.16 Each mode does productive work. The diagnostic mode comes nearest to treating ambiguity as a property worth recording. The standardisation schema adds a sixth, governance mode. It carries residual ambiguity forward as a tracked, confidence-labelled design object, rather than discharging it at interpretation time. Table 5.10 sets the modes out along this progression from invisible to governed, and locates the schema’s contribution as the governance step that follows diagnosis.
| Treatment of ambiguity | Representative works | Mechanism | Governance |
|---|---|---|---|
| Invisible | Eastman et al. 2009; Zhong et al. 2012; Tan et al. 2010 | Formalised text taken to be unambiguous | None |
| Noise to eliminate | Nawari 2012; Malsane et al. 2015 | Better formalisation removes ambiguity | None |
| Implementation difficulty | Solihin and Eastman 2015; Hjelseth 2015 | Interpretive openness acknowledged but not tracked | Acknowledged |
| Expert-resolved | Beach et al. 2015 | Experts settle difficulty case by case; resolution unrecorded | Implicit |
| Diagnosed | Zhang, Ma, and Nisbet 2023 | Ambiguity identified and classified (53% in healthcare building); no governance provided | Diagnostic |
| Governed (this thesis) | The standardisation schema (Chapter 5) | Confidence-labelled tracking; field-level residual governance; dual-mode resolution and transparency | Governed |
Zhang, Ma, and Nisbet’s inductive analysis of Health Building Notes is the closest existing work to the standardisation schema’s concern. They find 53% of building requirements ambiguous. They also distinguish intentional ambiguity — regulators’ deliberate flexibility — from unintentional ambiguity, which arises from language use, tacit knowledge, or the limits of automated checking. Their contribution is diagnostic: it identifies and classifies ambiguity. It thereby points toward the prescriptive, confidence-labelled, field-level schema that the standardisation schema supplies. Noardo et al.’s review of 169 Digital Building Permit publications corroborates the shape of the field.17 Its centre of gravity is checking technology, and the governance of interpretive uncertainty remains a comparatively open avenue. The standardisation schema sits on the path from diagnosis to governance. It adapts Hjelseth’s clause-role classification, extends Solihin’s field-level contract, and operationalises the prescriptive schema that Zhang et al.’s analysis anticipates.
Word-Sense Disambiguation and Lexical Ambiguity in Technical Corpora
The polysemy-measurement protocol specified in Section 5.4 rests on three traditions in word-sense disambiguation. Navigli’s survey supplies the foundational taxonomy. It separates supervised methods that need sense-annotated training data, unsupervised methods built on clustering and co-occurrence graphs, and knowledge-based methods that exploit dictionary structure.18 He documents that polysemous lemmas in WordNet average roughly 6.2 senses across benchmark datasets, against a general-English polysemy baseline near 40%. Lexical ambiguity in English is therefore pervasive rather than exceptional. Pustejovsky’s generative-lexicon theory then explains why a regulatory corpus should sit above that baseline.19 Polysemy is a systematic, qualia-governed productivity, and terms with rich qualia structures generate new senses through predictable mechanisms. Regulatory, spatial, and accessibility functions layer onto the same vocabulary here. This is precisely the setting in which the theory predicts elevated rates. Its distinction between logical and contrastive polysemy frames domain-central terms as systematically rather than randomly plural.
The evaluation tradition shapes how that polysemy should be recorded. Ide and Veronis establish that WSD evaluation is intrinsically difficult, because there is no universal agreement on what a “correct” sense assignment is.20 Some disagreement is therefore a property of the task rather than a defect of method. Kilgarriff presses the point further. He argues that word senses are abstractions over clusters of corpus citations rather than fixed ontological entities — task-relative, not pre-existing.21 On our reading, this licenses a graded confidence measure over a binary polysemous/monosemous verdict. If senses are not discrete, cross-method convergence is a more defensible evidential basis than any single dictionary inventory. The protocol treats convergence accordingly.
The embedding-dispersion component inherits a parallel rationale. Erk represents word meaning as a region in vector space rather than a single point, which enables graded sense membership.22 Camacho-Collados and Pilehvar then characterise the “meaning conflation deficiency” of standard embeddings, establishing dispersion as a recognised proxy for polysemy.23 Method choice follows Agirre and Edmonds’ principle that corpus characteristics and the availability of annotated resources should govern selection.24 Raganato et al. confirm that knowledge-based approaches are the appropriate fallback for unannotated domains.25 That is the situation of the SDA corpus, which has no pre-existing sense-annotated data. General-corpus embeddings are limited on domain-specific distinctions, so the protocol triangulates rather than relying on dispersion alone.26
These findings make the schema’s transparency mode — recording polysemy as a tracked object rather than forcing its resolution — a principled commitment rather than a methodological compromise. Kilgarriff’s account implies that forced sense assignment introduces artificial precision. Haber and Poesio show that polysemy operates along a spectrum, with contextualised language models capturing gradations earlier methods missed.27 Together these make confidence-labelled transparency the more epistemically honest response. Brun and Langseth’s review of uncertainty management in digital building-permit processing reinforces the choice. It treats interpretive uncertainty as a first-class object that yields more stable downstream decisions than premature resolution.28 The standardisation schema therefore runs a dual mode: resolution where a representational contract suffices, and transparency where external adjudication is required. That mode takes its warrant from the WSD literature’s own account of what lexical meaning is.
5.3 Schema Design Theory, the SDA Domain, and Contribution Positioning
Two further warrants underwrite the artefact, and they pull in different directions. One is design-theoretic. The five-layer standardisation schema needs a justification that the empirical resolution metrics cannot supply on their own, because two of its layers will turn out to contribute no measurable disambiguation at all. The other is domain-practical. The corpus the schema governs sits inside a regulatory instrument of direct financial and welfare consequence, which is what makes clause-level precision worth engineering. Taken together with the standards-formalisation and word-sense lineages of Section 5.2, these two warrants fix the contribution space that the standardisation schema occupies as one evidence-bearing instantiation of the thesis’s architectural claim.
Schema Design Theory: Information Hiding, Layered Architecture, and Provenance
The five-layer structure specified in Section 5.12 asks for justification beyond its evaluation results. The pointed case is Layers 4 and 5 — ontological mapping and governance audit — which contribute zero resolution delta. Why retain layers that add no measurable disambiguation? Parnas’s information-hiding criterion gives the first part of the answer.29 A system should be decomposed by the design decisions likely to change, not by its processing stages, with each module hiding one such decision behind a stable interface. Layers 4 and 5 encapsulate governance decisions — how provenance is traced, how ontological residuals are flagged — that are logically independent of the resolution decisions of Layers 1 to 3. Separating them is what keeps a change of governance policy from rippling into the validated resolution pipeline.
Simon’s near-decomposability principle supplies the second part.30 In a well-decomposed system, intra-component interactions are strong and inter-component interactions weak. Layers 1 to 3 answer to semantic-resolution pressures; Layers 4 and 5 answer to governance pressures. On this reading a zero cross-layer resolution delta is the expected signature of near decomposability, not a symptom of redundancy. The value of retaining the layers is then an option value. Baldwin and Clark show that modularity confers it: each module boundary enables independent experimentation, replacement, and evolution without system-wide redesign.31 Sullivan et al. formalise the same intuition as the real-options value inherent in information-hiding boundaries — the right, without the obligation, to change a module later, even where its present computational contribution is nil.32 Layers 4 and 5 hold that value open for future ontological interoperability and for regulatory-audit requirements.
The provenance layer carries a further, non-negotiable warrant. Buneman, Khanna, and Tan show that data provenance — the record of where data came from and how it was derived — has to be a first-class architectural element embedded at design time, because it cannot be reconstructed after the fact from results alone.33 The W3C PROV Data Model reinforces the point by treating provenance as a separate formal layer rather than as metadata bolted onto an existing schema.34 For the built-environment setting, ISO 19650-1 mandates traceability from requirements through delivery — a compliance obligation that Layer 5 is built to satisfy directly.35
Layered architecture itself has deep precedent in both systems design and knowledge representation. Dijkstra’s THE multiprogramming system showed that strict layering makes reasoning about system correctness tractable, each layer serving the one above without knowledge of higher-layer function.36 Sowa’s conceptual structures establish that knowledge-representation systems need layers separating distinct levels of meaning, from primitive semantic elements to pragmatic and contextual frames.37 Buschmann et al.’s Layers pattern prescribes that boundaries follow concern boundaries rather than metric-contribution boundaries; a layer earns its place by encapsulating responsibilities that differ in kind from those around it.38 Within the design science research frame adopted here, Gregor and Jones’s anatomy of a design theory supplies the meta-theoretical warrant.39 The resolution delta measures the performance of the testable propositions; it does not measure the adequacy of justificatory knowledge. Layers 4 and 5 are justified by kernel theories of governance and provenance, not by resolution metrics, and without them the design theory would lack the structural elements its own commitments require.
Domain Context: Specialist Disability Accommodation
The 611-clause corpus analysed here operates inside a policy ecosystem where a single clause reading carries direct financial and welfare consequence. The NDIS Specialist Disability Accommodation Design Standard was published in October 2019 and became mandatory for all new-build and new-build-refurbished SDA from 1 July 2021, replacing the former reliance on the Livable Housing Design Guidelines.40 It organises 25 design elements across four design categories — Improved Liveability, Fully Accessible, High Physical Support, and Robust — and compliance must be certified by an accredited third-party assessor independent of the provider, developer, or owner.41 That third-party-certification requirement is precisely what turns interpretive consistency across assessors into a structural requirement of the regulatory architecture rather than a matter of editorial preference.
The system these clauses govern is substantial. As of March 2025, roughly 15,099 participants actively use SDA across 11,360 enrolled dwellings, with total SDA supports expenditure reaching $592 million annually and supply growing at 21% each year.42 At full scheme rollout, approximately 28,000 participants — around 6% of the total — are expected to be eligible. Each clause in the Standard functions as a compliance gate: read one way, a clause may qualify a dwelling for a higher-funded design category; read another, it may disqualify it. The financial consequence of interpretation variance is therefore structural rather than hypothetical, and it attaches to the same terms whose polysemy Section 5.11 measures.
That precision is not an abstract concern, because SDA design quality has measurable downstream effects on tenants. Douglas et al.’s longitudinal study of adults moving into purpose-built SDA reports statistically significant improvements in wellbeing and community integration, both at a large effect, together with a trend toward reduced support hours.43 Ambiguous or inconsistently read clauses put exactly these outcomes at risk. The Standard exists in the first place because voluntary universal-design approaches did not take hold: Bringolf documents a mass-market housing industry unwilling to adopt universal design and inclined to frame accessible housing as specialised rather than mainstream, and Beer and Faulkner find that 22% of Australian households include a member affected by disability, with distinctive housing-career disruptions that follow.4445 What makes the instrument internationally distinctive is the coupling itself: it ties clause-level compliance directly to individualised disability funding through mandatory third-party certification, where comparators such as ISO 21542, BS 8300, and the ADA Standards operate instead through professional discretion or civil-rights enforcement.464748 That the ambiguity has practical bite is documented within the assessment community itself: Access Institute clarification bulletins record specific clauses on which accredited assessors reached divergent readings and that required formal remediation, and Morris Goding Access Consulting notes that ambiguity in the related provisions is generally acknowledged among practitioners.495051
Positioning Summary
The standardisation schema sits at the intersection of the four streams, and its contribution is best read as what that intersection adds. From standards formalisation it inherits Hjelseth’s clause-role classification and Solihin’s field-level contract, and it extends both with modality profiling and confidence-labelled ambiguity governance. From word-sense disambiguation it takes Navigli’s multi-method principle and Pustejovsky’s account of systematic polysemy, and it applies them to measure rather than assume the ambiguity burden of a technical regulatory corpus, generating domain-specific polysemy data where none was available before. From schema design theory it draws Parnas’s information hiding, Simon’s near-decomposability, and Baldwin and Clark’s option value to justify a five-layer architecture in which zero-delta governance layers serve an architectural purpose distinct from their resolution contribution. From the SDA domain it derives both its empirical substrate and its practical urgency, in a regulatory instrument governing $592 million in annual expenditure where clause-level variance carries financial and welfare consequence. The distinctive move is the combination: clause-role governance joined to confidence-labelled polysemy measurement, layered schema design joined to explicit residual tracking, and the whole applied to accessibility standards. Zhang, Ma, and Nisbet’s 53% ambiguity finding in healthcare building requirements is the closest diagnostic neighbour, and the standardisation schema supplies the prescriptive standardisation schema their analysis anticipates. Figure 5.14 places the artefact at this four-stream intersection.
Figure 5.14 — Literature Positioning: Four-Stream Intersection of the Standardisation Schema The standardisation schema is positioned at the intersection of four literature streams. Standards formalisation contributes clause-role classification and field-level contract design (Hjelseth, 2015; Solihin, 2015). WSD research contributes multi-method polysemy measurement and confidence-tier design (Navigli, 2009; Pustejovsky, 1995). Schema design theory contributes information hiding, near-decomposability, and option value justification for the five-layer architecture (Parnas, 1972; Simon, 1962; Baldwin and Clark, 2000). SDA domain context provides the empirical substrate and practical urgency of a $592M annual regulatory instrument. The standardisation schema’s distinctive move is the combination of all four streams. Data from Section 5.2.5.
An independent line of evidence corroborates this positioning from outside the artefact’s own design process. A systematic analysis (EXP-X1–X8) of 134 floor-plan representation publications spanning 1982 to 2026, screened for explicit treatment of accessibility concepts, finds that 15 of the 134 address accessibility at all, and it locates this governed accessibility schema in a region of the design space that the surveyed corpus does not occupy at the semantic layer. The same analysis confirms that 13 of the 14 entries in the thesis’s stratified vocabulary (seven primitives and seven composites) have supporting mentions across the surveyed publications, which grounds the vocabulary in external literature beyond the single SDA corpus from which it was derived. The result register and package inventory for this cross-check are reported in the Supplementary Ch5 Results and Data Package. On this footing, Section 5.4 specifies the artefact objectives, the corpus, and the analytical methods that operationalise these commitments.
5.4 Artefact Objectives, Corpus, and Analytical Methods
Chapter 4 establishes design science research as the thesis-level methodology and sets the evaluation strategy for the artefact suite as a whole. That strategy sets the paradigm; it does not fix the analytical instruments, the data, or the success thresholds for any single artefact. Those commitments must be declared at the level of the artefact before its evidence can be read. We therefore operationalise the Chapter 4 methodology for the standardisation schema here: declaring the objectives the schema must meet, the corpus it is built against, and the methods that measure ambiguity, ahead of the empirical baseline in Section 5.11.52
Table 5.1 carries this through measure by measure, mapping each Chapter 4 abstract specification to the concrete instrument that discharges it in this chapter, the section that reports it, and its disposition. Read together, the rows make explicit which evaluation commitments the standardisation schema settles within its own chapter and which it hands forward.
| Chapter 4 Measure | Chapter 5 Instrument | Section | Disposition |
|---|---|---|---|
| EM-4W-01 (Interpretation divergence) | Five-dimensional resolution rate (δ̄) profiler, scored per clause | Section 5.20 | Met — δ̄ = 0.2514, with class-level variation reported |
| EM-4W-02 (Verification-scope ratio) | Governed-residual tracking (residual flag, fallback count) | Section 5.25 | Partially met — residual governance demonstrated; scope ratio handed to Chapter 10 |
| EM-4W-03 (Rule-compliant variant rate) | — | — | Deferred to Chapter 10 (requires the multi-artefact suite) |
| EM-09-01 (Standards interpretability) | — | — | Deferred to Chapter 10 (summative; threshold ≥ 0.25 inherited from Section 4.4) |
| EM-09-02 (Modular-fit / bounded-check) | — | — | Deferred to Chapter 10 (summative) |
| EM-09-03 (Workflow burden delta) | — | — | Deferred to Chapter 10 (summative) |
| EM-09-04 (Exception governance quality) | — | — | Deferred to Chapter 10 (summative) |
The pattern is deliberate. The standardisation schema discharges the two within-chapter formative measures — interpretation divergence through the δ̄ profiler, verification scope through governed-residual tracking — while the four summative measures (EM-09-01 through EM-09-04) are scoped, by design, to the integrated evaluation in Chapter 10, where the artefacts can be assessed together rather than in isolation. This two-stage shape, formative within each artefact chapter and summative in the demonstration chapter, follows the evaluation strategy declared in Chapter 4 (Section 4.5).
5.5 Schema Objectives
The research objective here is to construct and validate a standardisation schema that improves the conceptual stability of Specialist Disability Accommodation (SDA) Design Standard content by enforcing consistent representation of entities, qualifiers, and clause-level role signals. This single objective is the schema’s, not the thesis’s: the standardisation schema is one evidence-bearing instantiation of the thesis’s architectural claim, and the four sub-objectives below decompose that one schema objective rather than name four independent contributions.
- Identity stabilisation — resolve referential instability in clause identifiers so that each distinct statement carries a globally unique record identity.
- Role preservation — enforce explicit clause-role classification so that requirement, rationale, and applicability frames retain their inferential force when carried into downstream artefacts.
- Ambiguity governance — represent lexical ambiguity as a tracked, confidence-labelled design object rather than suppressing or ignoring it during standardisation.
- Coverage with residual transparency — achieve complete clause coverage while explicitly declaring residual mapping burden, so that the schema never claims false semantic closure.
These are operational targets, not aspirations. Each binds at one end to a measurable baseline condition established in Section 5.11 — the ambiguity the corpus actually exhibits — and at the other to a testable acceptance criterion defined in Section 5.8. The decomposition matters because the four targets pull in different directions: identity stabilisation and role preservation are eliminable defects the schema can close, whereas ambiguity governance and residual transparency are properties the schema can only make visible and auditable. Holding both kinds under one objective is what keeps the artefact honest about where it resolves and where it merely discloses.
5.6 Corpus and Source Material
The source material for the standardisation schema is the SDA Design Standard, the regulatory instrument that sets minimum design requirements for specialist disability accommodation under the Australian National Disability Insurance Scheme. The National Disability Insurance Agency published it in October 2019, and it became mandatory for all new-build and new-build-refurbished SDA from 1 July 2021.53 We define the corpus as the complete serialised text of this standard: 611 clause records across 25 design categories — among them general requirements, external access, internal circulation, windows, sanitary facilities, kitchen and laundry, bedroom and living, controls and finishes, vertical circulation, and support systems — together with 189 design requirements extracted from 19 normative figures that fix dimensional constraints for spatial configurations.54
A controlled extraction pipeline assembled the text corpus under four governing properties. The source was Version 1 of the NDIS SDA Design Standard, the sole mandatory standard for SDA compliance certification. Extraction proceeded by clause-level decomposition, preserving each original clause identifier, the section hierarchy, and design-category scope, so that every clause survives as a complete statement with its source identifier, text, and applicability markers. Scope admitted all normative text — design requirements, rationale, applicability declarations, and supporting notes — and excluded only non-normative front matter such as the table of contents, cover page, and version history. Integrity was audited at the extraction gate, which reports zero missing required keys, zero truncation markers under the configured heuristics, and zero malformed clause headers; independent cross-checking against the source structure confirms that all 25 design categories are represented. The audit also surfaced eight duplicate clause identifiers carrying divergent text payloads. We retained these as separate records and treated the collision as a mandatory design constraint for the schema rather than an error to be quietly merged.
The figures corpus was built through a parallel pipeline over all 19 normative figures, each decomposed into individual design requirements by dimensional-value extraction — minimum clear widths, turning-circle dimensions, grab-rail heights, and the like — yielding the 189 figure-derived requirements. This channel extends rather than duplicates the text: 75.7% of figure-derived requirements have no textual equivalent in the clause corpus, which confirms that the figures carry independent normative content and that the corpus is genuinely two-channel. We treat the whole as a closed corpus for this chapter. The schema is designed against this specific regulatory text; generalisation to other standards instruments is a boundary condition, taken up as future work in Section 5.25 and Chapter 11. Because the corpus is the complete standard rather than a sample drawn from a larger population, the figures reported throughout this chapter are census quantities with explicit denominators, not estimates carrying sampling variability.
5.7 Analytical Methods
We employ six analytical methods, each targeting a distinct dimension of ambiguity, so that the four sub-objectives are tested separately rather than collapsed into a single composite score.
These six methods — integrity-gate analysis (referential identity), the multi-method polysemy protocol (lexical plurality), trigram and part-of-speech structural analysis (clause-role signalling), deontic-force classification (normative-force variation), foundational-mapping coverage (representational completeness), and figures-channel integration (cross-channel completeness) — are set out with their instruments and outputs in the Chapter 5 Reference Tables appendix, Table 1.
The multi-method polysemy protocol warrants closer description, since it is the primary instrument for the lexical-ambiguity baseline. Three independent methods are applied to each lemma at a frequency of three or more occurrences. The WordNet inventory check marks a lemma polysemous when its synset count exceeds one, capturing the lexicographic dimension against a standardised knowledge base. The Lesk check marks a lemma polysemous when simplified Lesk sense assignment, scored against WordNet glosses, varies across the lemma’s occurrences; we note the known limitation that general-domain glosses fit built-environment regulatory prose imperfectly, and report in Section 5.11 that the structural conclusion holds when the Lesk component is removed. The embedding-context dispersion check generates token-level vectors for each occurrence (spaCy’s en_core_web_md 300-dimensional GloVe model), measures dispersion as the mean pairwise cosine distance across those vectors, and marks a lemma polysemous when dispersion exceeds 0.30 and agglomerative clustering with average linkage at a distance of 0.35 returns more than one cluster.
The three thresholds — frequency of at least three, dispersion above 0.30, cluster distance of 0.35 — are transparent design decisions calibrated against the corpus-wide distribution, and the baseline analysis in Section 5.11 reports their robustness as plain counts at adjacent cut-offs rather than as an interval claim. Agreement across the three methods then assigns a confidence tier: strong where all three agree, moderate where two of three agree, and weak where one does. This tiering guards against over-claiming from any single method and converts ambiguity from an impression into a testable evidence class. On the resulting census, 121 of 204 eligible lemmas — 59.3% — are polysemous, the descriptive proportion reported in Section 5.11.55 Where the analysis later turns to whether primitive co-occurrence is structured rather than incidental, that question is one of co-occurrence structure and is taken up in the modularity argument of Chapter 6, not adjudicated here.
5.5 Evaluation Framework, Success Criteria, and DSR Iteration Evidence
Evaluation of a conceptual schema demands a method matched to the kind of object under test, and we adopt the artificial-evaluation strategy of Venable, Pries-Heje, and Baskerville’s FEDS framework for design science research.56 Artificial evaluation interrogates an artefact’s properties through analytical methods applied to the design specification and the outputs it produces, rather than through naturalistic deployment with human participants. That posture fits a standardisation schema at the present stage of development: the design commitments are testable directly against the corpus they govern, before any field study is warranted. It also fixes the artefact’s place in the thesis’s two-stage evaluation pattern, in which the formative measures are discharged within the artefact chapters and the summative measures are reserved for the integrated demonstration of Chapter 10; the present chapter therefore tests construction quality, not system-level outcome. Naturalistic evaluation drawing on practitioner judgement is positioned as priority future work in Chapter 11, not as a gap in the present argument. The framework declares three things to test — the schema’s measured effect on ambiguity, the robustness of that effect under varied design decisions, and its external grounding — each addressed in turn below and reported against in Section 5.20.
The schema’s measured effect is captured by the resolution rate, defined as the proportion of the ambiguity dimensions present in a clause that the schema resolves, averaged across all clauses. The measure is read in two modes, and the distinction between them is the core of the contribution claim. In resolution mode, the schema eliminates the ambiguity source outright, as it does for referential identity, structural role, and deontic-force collapse; the relevant evidence is an elimination rate. In transparency mode, the schema does not remove the ambiguity but renders it visible under confidence labels, as it does for lexical polysemy and mapping residuals; the relevant evidence is whether previously hidden variance is now auditable and actionable. On our reading the schema therefore claims measurable resolution where representational contracts suffice, and governed transparency where external adjudication is required — not semantic closure. Governed transparency is the schema’s commitment to carry residual ambiguity as a first-class design object rather than suppress it, and this dual-mode boundary is the definition the rest of the artefact suite inherits: the baseline of Section 5.11 measures against it, the results of Section 5.20 report against it, and the synthesis of Chapter 11 returns to it.
The robustness checks test whether the measured effect is non-trivial rather than an artefact of generous design decisions, and they are framed as design-decision sensitivity rather than as inferential testing, because the SDA corpus is a complete census of the serialised source and not a sample from a wider population. Two comparisons do the central work descriptively. A null-model comparison requires the full-schema delta to exceed an identity-only baseline by more than 0.05, establishing that the layered contract contributes beyond trivial record-numbering. A reduced-schema comparison then removes each retained layer in turn and reports its marginal contribution, so that no layer is carried without evidence that it earns its place. Two further checks examine generalisation of the primitive mappings within the corpus and the co-occurrence structure among primitives; the corresponding primitive-level validation — holdout generalisation across corpus partitions and a co-occurrence check against reshuffled assignments — is reported in Chapter 6, where it supports the derivation of the primitive vocabulary, and is not restated here as a claim on the present census. Threshold sensitivity is shown the same descriptive way: we report plain counts of polysemous lemmas and mapped terms at adjacent frequency cut-offs and confirm that the structural finding does not turn on any single cut-off, attaching no interval to the census proportions.
External grounding is examined by benchmarking the stratified vocabulary against two established spatial ontologies, the IFC4 spatial types and the Building Topology Ontology, to test whether its terms have referents outside the thesis. The pass condition is that a majority of the terms have an equivalent in at least one external ontology, which breaks the internal circularity of a vocabulary justified only by its own corpus. The primitive-level evidence for this comparison is likewise derived and reported in Chapter 6.
5.9 Success Criteria and Rejection Logic
The single artefact claim is gated by five acceptance criteria, each tied to an instrument above and each testable against reported evidence:
- Baseline is measurable. At least three of the five ambiguity dimensions exhibit quantifiable burden in the corpus, rather than zero-burden dimensions requiring no intervention.
- Schema produces a non-trivial delta. The full schema’s resolution rate exceeds the null-model baseline by a margin greater than 0.05.
- Resolution is honest. The reported contribution distinguishes resolution from transparency and does not claim elimination where only visibility has been achieved.
- Residuals are governed. Fallback and unresolved cases are explicitly tracked, not silently absorbed into apparent coverage.
- Outputs are handoff-ready. Schema outputs satisfy the field-level contracts that Chapter 6 requires, without reinterpretation.
The claim would be rejected under any of four conditions, which we state as design floors rather than statistical tests. It would fail if the polysemy burden fell below a 0.50 design floor, which would remove the primary justification for ambiguity governance; if the null-model comparison showed no meaningful delta, defined as less than 0.05 above the null baseline; if within-corpus generalisation of the primitive mappings fell below 80 percent, indicating an overfitted primitive set; or if more than 20 percent of clauses failed field-level validation, indicating an impractical specification. None of these conditions is met, as Section 5.20 demonstrates. Four threats qualify the evaluation and bound how far its conclusions travel. First, generalisation is tested only within the SDA corpus across partitions, not across distinct regulatory instruments; cross-corpus generalisation to standards such as ISO 21542, BS 8300, or the National Construction Code with AS 1428 would require independent corpora and is the highest-priority future study. Second, the polysemy protocol’s Lesk component is sensitive to NLP library version, so exact counts are environment-specific and are read as estimates rather than fixed values, though the structural finding is stable under the observed drift. Third, no domain experts or housing practitioners have yet judged the schema’s practical utility, which is the most consequential scope limitation and follows directly from the artificial-evaluation classification. Fourth, the primitive and relation sets are derived from one regulatory instrument, so their adequacy for broader accessibility domains is not established within this chapter. Read together, the five criteria and four rejection conditions form a single acceptance gate: they are the operational form of the testable propositions named in the introduction, not five separate contributions, and the artefact claim stands or falls on them as a set.
5.10 Design Science Research Iteration Evidence
The artefact was not specified in a single pass but developed through nine documented DSR iterations (Iterations 0–7b) across four analytical cycles, each recording an explicit intent, change, key finding, and design decision. This design-search history satisfies Hevner, March, Park, and Ram’s Principle P6, design as a search process, and the condensed register is set out in full in the Chapter 5 Reference Tables appendix, Table 5 and visualised in Figure 5.11 below.57
Figure 5.11 — Design Science Research Iteration History for the Standardisation Schema Nine documented DSR iterations (Iterations 0-7b) across four analytical cycles. The contribution claim evolves from ‘standardisation schema’ (Iteration 2) to ‘governed ambiguity management with explicit residuals’ (Iterations 4-6) to ‘complementary cross-channel ambiguity resolution with verified asymmetric completeness’ (Iterations 7-7b). Key design trade-offs include preserving ambiguity as tracked metadata rather than attempting elimination (Iteration 2), accepting 5-lemma environmental drift as within-bounds (Iteration 3), and integrating the figures channel as complementary rather than redundant (Iteration 7). Data from Section 5.8.
The register documents cumulative refinement rather than a fixed target pursued from the outset. The contribution claim evolves from a standardisation schema at Iteration 2, through governed ambiguity management with explicit residuals, identity controls, and role-aware representation across Iterations 4 to 6, to complementary cross-channel resolution with verified asymmetric completeness at Iterations 7 and 7b. The load-bearing design trade-offs are recorded with equal candour: the decision to preserve ambiguity as tracked metadata rather than attempt its elimination (Iteration 2), the acceptance of a five-lemma environmental drift as within-bounds rather than blocking (Iteration 3), and the integration of the figures channel as complementary to text rather than redundant with it (Iteration 7). One limitation governs every count reported in this module: the corpus is a census of a single serialised source, so these figures characterise this material exactly and are not offered as population estimates. On this basis, Section 5.11 now establishes the empirical baseline against which these evaluation criteria are applied.
5.6 Empirical Baseline: Ambiguity and Fragmentation
Two conditions co-occur when the SDA source material is read as a shared standards substrate, and their conjunction is the problem this artefact must answer. The first is high conceptual ambiguity: recurrent terms carry several senses at once. The second is representational fragmentation: what one layer states, the aligned layer fails to carry. Each condition is individually tractable. Ambiguity is manageable under stable reference, and fragmentation is manageable under stable semantics. The corpus we measure has neither. Lexical multiplicity, clause-role mixing, and partial carry-over from source text into linked ontology artefacts arrive together, and their joint effect is interpretation drift at exactly the point where the standardisation schema must supply stable concepts to the downstream module, notation, and procedural artefacts of the thesis.58 We read this baseline as the substrate-design justification: every later quality claim in this chapter is measured against the six dimensions established here.
Baseline evidence begins with corpus integrity, because ambiguity claims are void if the source text is incomplete or corrupted. The integrity gate reports 611 serialised records, zero missing required keys, zero truncation markers under the declared heuristics, and zero malformed clause headers. Observed ambiguity is therefore real semantic burden in the corpus rather than extraction noise from dropped rows or clipped text. The same audit, however, surfaces a referential instability that bears directly on schema design. Eight clause identifiers recur with divergent text payloads, and none recurs with identical wording. Nominal identifiers in the source thus cannot serve as unique semantic anchors without normalisation. A schema that assumes one identifier denotes one statement would silently collapse distinct claims and obscure the conflicts between them, so any representational intervention must encode record identity beyond the nominal clause code.59
The second dimension is lexical plurality, measured through the multi-method protocol set out in Section 5.4. Under a frequency threshold of three or more occurrences, 204 lemmas are eligible and 121 of them — 59.3 percent — are classified as polysemous. Because the SDA corpus is a complete census of the serialised source rather than a sample drawn from a wider population, we report this figure descriptively, as a proportion of the eligible lemmas, and attach no inferential interval to it. Even on a conservative reading, the burden is high enough to reject the working assumption that recurrent terms map directly to stable concept classes without adjudication logic. The reading is consistent with the literature established in Section 5.2: Pustejovsky’s generative lexicon predicts elevated polysemy in regulatory corpora where terms carry several functional roles, and the SDA proportion exceeds both the roughly 40 percent general-English baseline documented by Navigli and the 53 percent figure for healthcare building requirements reported by Zhang, Ma, and Nisbet.6061 The content of the high-confidence set explains why the burden is operationally serious. Terms such as door, wall, height, and basin are central built-environment concepts, not peripheral labels, and they appear across requirement, rationale, and applicability contexts. Concept assignment cannot rest on dictionary-like intuition; the schema must carry explicit term-role contracts, scoped relations, and residual-ambiguity notes, or downstream artefacts will encode inconsistent assumptions under a shared label.62
The third dimension is structural signalling, and here ambiguity is class-conditioned rather than purely lexical. Across 605 processed clauses, the analysis records 4,556 distinct trigrams, of which 568 are retained at the minimum-frequency threshold. The dominant patterns track clause role: clause applicable to concentrates in applicability lines, clause design requirement in requirement lines, and clause rationale for in rationale lines. Clause framing therefore performs semantic work that the tokens alone do not. A term inside a rationale frame does not carry the inferential force it carries inside a design-requirement frame, so meaning cannot be inferred without role context, and clause class becomes mandatory metadata rather than incidental annotation.63
The fourth and fifth dimensions concern carry-over and channels. Representational carry-over from the serialised corpus into linked ontology artefacts is weak: 603 unique serial identifiers exist but only 30 are ontology-linked, a coverage ratio of about 0.046, and of 165 modal-bearing clauses only 9 are covered, a ratio of about 0.055. A strong discontinuity separates what the corpus states from what the linked artefacts currently capture, and without a schema that discontinuity propagates as hidden under-coverage rather than declared scope. Cross-channel fragmentation is more severe still. The figures-channel integration analysis, recorded as Iteration 7 of the design-cycle register in Section 5.8, finds that 75.7 percent of figure-derived design requirements have no textual equivalent in the clause corpus, while 86.4 percent of text-derived requirements have no figure match. Neither channel is a subset of the other; the two carry overlapping but distinct normative content. A schema operating on text alone would miss most of the dimensional constraints that figures specify, and a schema on figures alone would miss the clause-role and normative-force structures that text carries, so the schema must be a two-channel architecture that governs textual clauses and figure-derived requirements under one representational contract.
Figure 5.10 — Cross-Channel Fragmentation: Text-Figure Normative Coverage Overlap Cross-channel fragmentation between text clauses (n = 611) and figure-derived design requirements (n = 189). Of text-derived design requirements, 86.4% have no figure match; of figure-derived design requirements, 75.7% have no text equivalent. Neither channel is a subset of the other, confirming that a schema operating on one channel alone would miss the majority of the other channel’s normative content. Data from Section 5.4 baseline and Iteration 7 of the design-cycle register (Section 5.8).
The six dimensions are summarised together — each with its measured result and interpretive risk — in the Chapter 5 Reference Tables appendix, Table 2: corpus completeness (611 text records, 189 figure requirements, no integrity defects); referential identity (8 duplicate identifiers with divergent text); lexical plurality (121 of 204 eligible lemmas polysemous, 59.3 percent); structural signalling (4,556 trigrams, class-conditioned); representational carry-over (ontology coverage 30 of 603, modal coverage 9 of 165); and cross-channel fragmentation (75.7 percent of figure requirements text-absent, 86.4 percent of text requirements figure-absent).646566
These measurements describe a multi-layer condition, not a single lexical defect, and the layers interact. Identifier instability amplifies lexical ambiguity, because occurrence-level evidence can be merged onto the wrong record. Clause-role mixing amplifies polysemy, because terms travel between requirement and rationale frames. Weak carry-over amplifies both, because partial models conceal where ambiguity remains unresolved, and cross-channel fragmentation compounds all three, because text and figures encode different normative content under one standard. We treat fragmentation throughout this chapter as discontinuity across representation layers that ought to be aligned for consistent interpretation, and the baseline shows it recurring in four forms: similar requirements restated across categories with altered phrasing; serialised clauses and ontology artefacts covering different identifier sets; source notes, extraction outputs, and prose drifting unless tied to traceable evidence; and the text-figure asymmetry that no single channel captures completely.
The baseline also exposes a design paradox that governs how success may be claimed. Mapping coverage reaches 204 of 204 at the selected high-frequency threshold, yet 54 mapped items remain fallback-to-element assignments. High coverage coexists with non-trivial residual uncertainty. Coverage alone cannot therefore serve as a sufficiency proof for semantic quality, and a schema that reported coverage without residual burden would emit a false completion signal. The standardisation schema must report both, a discipline that directly shapes the results reporting in Section 5.20.67
Taken together, the baseline establishes six measurable failure modes, and these are the mandatory design inputs for the artefact:
- Referential identity instability — duplicate identifiers carrying non-identical text.
- High lexical plurality across the eligible lemma space.
- Clause-role-conditioned phrase structures that alter interpretive force.
- Weak carry-over coverage from the serial corpus to ontology-linked artefacts.
- Coverage-versus-residual tension in term-to-concept mapping.
- Cross-channel fragmentation between text and figure normative content.
These conditions also fix the threshold for claiming improvement later in the chapter. Improvement is demonstrated only if the schema reduces avoidable ambiguity along these exact dimensions and represents the ambiguity that remains explicitly rather than hiding it. The aim is not rhetorical simplification but disciplined representation: stable identity, explicit role context, traceable mapping, declared residuals, and two-channel coverage. One limitation bounds every figure reported here. The corpus is a census of one serialised source pack, not a probability sample, so these counts and proportions characterise this material exactly and are not offered as population estimates; their force is that they are complete for the substrate on which the rest of the artefact suite is built. On that basis the problem is decision-complete — the corpus is sufficiently complete to analyse, the ambiguity burden is substantial, and fragmentation across layers and channels is quantified — and the chapter can move from diagnosis to the schema contracts that convert this baseline into a governed and reproducible representation.
5.7 Schema Design: Principles and the Five-Layer Contract
A representation that is to carry regulatory meaning forward must commit, in advance, to which distinctions it will preserve, which it will collapse, and which it will hold open as unresolved. The baseline in Section 5.11 measured five interlocking burdens in the SDA corpus — unstable referential identity, high lexical plurality, clause-role mixing, partial representational carry-over, and cross-channel fragmentation — and the design task here is to specify a standardisation schema as the formal contract that absorbs those burdens without erasing the ambiguity they expose. The schema is the representational boundary condition for the standardisation schema: it fixes what survives the move from prose standard to machine-tractable record, and once fixed it is what every later artefact and chapter inherits. A contract of this kind earns its name only if it commits in writing to the hard cases — duplicated identifiers, polysemous terms, clauses whose force is unclear — rather than legislating them away at the moment of encoding. Six field-level constructs carry its representational load, and we introduce each at the point where a measured failure demands it. The record_uid is the SHA-1 hash of clause identifier and source text — the only identity that survives duplicated clause numbering. The clause_class is a mandatory enumerated role that refuses silent defaults. The modality_profile preserves a clause’s normative force where downstream representation would otherwise flatten obligation, permission, and prohibition into undifferentiated assertion. The ambiguity_profile records what the schema declines to decide — confidence labels, identity conflicts, role conflicts — so that refusal is itself auditable. A residual_flag makes that refusal visible, and provenance_ref anchors every governed claim to its source location.68
We adopt a schema-first approach in preference to extending an existing ontology, and the reasons are both empirical and architectural. The external benchmarking reported in Section 5.20 shows that IFC4 spatial types cover 32.1 per cent of SDA clauses and BOT covers 20.8 per cent. These are descriptive coverage figures computed over the full corpus census rather than estimates drawn from a sample, so they carry no margin of error and no inferential claim; they say only how much of the assembled corpus each scheme’s vocabulary already names, and the count would shift were the corpus boundary redrawn. Both schemes establish robust spatial typing for built-environment entities, and that established coverage is what makes them the right point of comparison. The accessibility-specific concepts that constitute the remainder of the corpus fall outside their typing vocabularies, and extending one of these models would inherit its commitments about entity boundaries and relation semantics before the corpus had been understood on its own terms — fixing answers to questions the baseline had only begun to pose. A schema-first design is architecturally preferable because it governs representational structure before ontological commitment, preserving explicit residuals exactly where an ontology would force premature resolution. This choice rests on Parnas’s information-hiding criterion: the schema separates what is known — clause identity, role, modality — from what is uncertain — lexical sense, ontological placement — and hides the uncertainty behind governed interfaces rather than driving it into a resolution the evidence does not support.69 The contribution the standardisation schema adds is therefore not a competing spatial ontology but a layer of representational governance above whatever typing scheme a downstream consumer adopts.
This reasoning converts diagnosis into design commitment, and the conversion is disciplined: each of the six measured failure modes maps to exactly one design principle, one schema layer, and a set of implementing fields, so that no design decision is left unmotivated and no measured failure is left unaddressed. The mapping is held intact below as the traceability spine of the schema.
| Failure Mode (Section 5.11) | Design Principle | Schema Layer | Implementing Field(s) |
|---|---|---|---|
| 1. Referential identity instability | Identity before interpretation | Identity layer | record_uid, source_clause_id |
| 2. High lexical plurality | Ambiguity visibility | Lexical-semantic layer | ambiguity_profile, term_lemma_set |
| 3. Clause-role-conditioned structural ambiguity | Role-aware representation | Clause-role layer | clause_class, modality_profile |
| 4. Weak representational carry-over | Handoff readiness | Governance-audit layer | provenance_ref, notes |
| 5. Coverage-residual tension | Residual explicitness | Ontological-mapping layer | residual_flag, residual_reason |
| 6. Cross-channel fragmentation | Primitive discipline | All layers (unified contract) | Shared field-level type contract |
Failure modes 1–3 map to the resolution layers and 4–6 to the governance layers and the unified contract, so that every schema element is empirically motivated and every measured failure is addressed (Table 5.9).
Design Principles
Six principles govern the schema, and each answers to a failure mode that the baseline measured rather than to a designer’s preference:
- Identity before interpretation. Clause identity is anchored to
record_uid, not tosource_clause_idalone, because duplicate source identifiers can carry distinct statements that must not be merged. - Role-aware representation. Clause class is mandatory, because a requirement, a rationale, and an applicability frame do not carry equivalent inferential force, and treating them alike would corrupt every inference drawn downstream.
- Ambiguity visibility. Term-level and row-level ambiguity indicators are first-class fields, not post-hoc commentary appended after the record is built.
- Primitive discipline. Mappings must target a single canonical vocabulary of primitives and relations rather than free-form labels, so that a term means the same thing in every row that uses it.
- Residual explicitness. Unresolved cases remain represented through residual flags carrying a reason and a planned disposition, never silently dropped.
- Handoff readiness. Each row preserves the information that the implementation in Section 5.14 and the module design in Chapter 6 require, so that no consumer must reinterpret the source.
Taken together, the six principles refuse the choice between false closure and untracked ambiguity. They operationalise the chapter’s completion requirement — that schema semantics be unambiguous in contract form while evidential traceability is preserved — and they align with the artefact-suite dependency map, in which the standardisation schema is the semantic substrate on which the Governed Kernel Architecture, the notation, and the empirical substrate are built.70
Schema Layer Contract
The principles are realised as five layered contracts, each with explicit required fields, validation rules, ambiguity-handling rules, and a named handoff consumer. The layered structure is justified by the design-theoretic foundations established in Section 5.2, and the justification runs deeper than tidiness. Parnas’s information hiding separates resolution concerns, in Layers 1 to 3, from governance concerns, in Layers 4 and 5, so that a change in how meaning is resolved need not disturb how meaning is audited. Simon’s account of near decomposability predicts that a well-decomposed system will show a zero cross-layer resolution delta — adding the governance layers should change no resolved value — and we treat that zero delta as the expected signature rather than as a finding to be manufactured. Baldwin and Clark’s theory of option value then explains why layers that contribute nothing to present resolution are nonetheless retained: a governance layer that is inert today carries the option to support future extension and regulatory audit at low cost, and discarding it would forfeit that option. This is the modularity argument applied to representation, and it is what turns the schema from a descriptive table into a governed contract.
Figure 5.2 — Five-layer standardisation schema architecture Five-layer standardisation schema architecture for the schema. Layers 1–3 (Identity, Clause-Role, Lexical-Semantic) serve resolution functions; Layers 4–5 (Ontological Mapping, Governance Audit) serve governance and extensibility functions. Each layer encapsulates a distinct design concern and exposes governed interfaces to adjacent layers.
The contract is specified layer by layer in the table below, which is reproduced intact as the normative specification an examiner can check field by field.
| Schema Layer | Purpose | Required Fields | Validation Rule | Ambiguity Handling Rule | Handoff Consumer |
|---|---|---|---|---|---|
| Identity layer | Stabilise record identity across duplicated clause IDs and preserve source traceability | record_uid, source_clause_id, source_text, provenance_ref |
record_uid must be unique in dataset; source_text non-empty; source_clause_id may repeat |
If source_clause_id repeats with distinct text, keep separate rows and mark ambiguity_profile.identity_conflict=true |
Section 5.14 serialisation; Chapter 6 module library |
| Clause-role layer | Preserve inferential class of each statement for downstream interpretation | clause_class, modality_profile |
Missing/unknown class is invalid; defaulting is disallowed unless explicitly justified | Mixed or uncertain frames marked with ambiguity_profile.role_conflict=true and explanatory note |
Section 5.20 results; Chapter 7 notation |
| Lexical-semantic layer | Represent term evidence and polysemy burden at row level | term_lemma_set, ambiguity_profile |
Each mapped high-frequency lemma must include confidence label | Polysemous terms keep confidence labels (strong\|moderate\|weak) and method-evidence summary |
Section 5.20 resolution rate; Chapter 7 lexical constraints |
| Ontological mapping layer | Bind terms to primitives and relations under declared constraints | primitive_assignments, relation_assignments, residual_flag |
Mappings outside allowed primitive/relation sets are invalid; each assignment requires rationale tag | If no stable assignment, set residual_flag=true with residual_reason and review pathway |
Chapter 6 taxonomy + interaction rules; Chapter 8 governance |
| Governance audit layer | Ensure reproducible, reviewable artefact behaviour across chapter boundaries | provenance_ref, residual_flag, notes |
Every verified claim row must be traceable to source object and chapter claim bank ID | Residuals cannot be suppressed; unresolved rows require documented disposition stage | Chapter 5 closure, Chapter 6 handoff, Chapter 9 evaluation |
The division is therefore principled rather than presentational. Layers 1 to 3 resolve: they stabilise identity, fix inferential role, and record lexical evidence with its confidence intact. Layers 4 and 5 govern: they bind terms to an approved vocabulary under declared constraints and guarantee that the artefact’s behaviour stays reproducible and reviewable across chapter boundaries. The layered interfaces also prevent category leakage between structural primitives and drafting-scaffolding terms, because every assignment must pass through an explicit layer boundary rather than being asserted directly into the record.71 Each layer names its own handoff consumer, so the contract is legible to the chapters that depend on it before any of them is written. This is the sense in which the standardisation schema is the governed-kernel semantic substrate for the artefact suite: the five layers fix, once, the meaning that the Governed Kernel Architecture, the notation, and the empirical substrate and the module engine of Chapter 6 will read without renegotiation. What the approved vocabulary contains, and why it is sufficient for the corpus, is a matter for Chapter 6; the schema’s role here is only to require that assignment be constrained to it and that whatever cannot be assigned be flagged rather than forced. With the layer contract fixed, the field-level type contract that operationalises these layers, the primitive and relation inventories it constrains, and the residual-governance regime that keeps the artefact honest follow next.
5.8 The Clause-Representation Contract, Inventories, and Residual Governance
The five-layer contract of Section 5.12 fixes where each commitment lives; the row type fixes what a single governed clause is. This row type is the operational core of the standardisation schema: an eleven-field record in which each field carries a requirement level, an allowed value set or type, a validation rule, and an ambiguity-handling rule. Each field earns its place because a measured failure mode demands it, so the contract below is the schema’s spine rather than a descriptive convenience. We hold all four columns intact because an examiner reads this row type field by field, and because the serialisation pipeline of Section 5.14 must satisfy every cell or be in breach.
Reading the contract by column is instructive. The requirement column distinguishes fields that every row must populate — identity, source text, clause class, modality, ambiguity, residual flag, provenance — from those required only when a row is analytic or carries an explicit mapping. The allowed-values column pins each field to a closed type, from the hashed record_uid to the enumerated clause_class, so that the space of legal entries is fixed before any annotation begins. The validation column states the machine-checkable condition that a row must pass, and the ambiguity-handling column states what the schema does when evidence is thin rather than letting an annotator improvise. Together these four columns convert each diagnosed failure mode into an enforceable obligation: referential instability is met by a unique hashed identity, role-conditioned ambiguity by a mandatory typed class, and weak carry-over by a provenance reference that must resolve.
| Field | Requirement | Allowed Values/Types | Validation Rule | Ambiguity Handling Rule |
|---|---|---|---|---|
record_uid |
required, stable unique identity | sha1(source_clause_id|source_text) string |
unique across all rows; non-null | identity collisions are treated as blocking defects |
source_clause_id |
required, non-unique allowed | string (e.g., 07-01-17) |
non-empty | repeats allowed only with distinct record_uid |
source_text |
required | non-empty string | length > 0 after trim | text variants retained as separate evidence rows |
clause_class |
required | design_requirement|rationale|applicable_to|other |
enum membership | uncertain class requires explicit note and role_conflict marker |
term_lemma_set |
required for analytic rows | array of lemma strings | array exists; each lemma normalised | uncertain lemmas flagged in ambiguity_profile |
primitive_assignments |
required for mapped rows | array of {term, primitive, rationale} |
primitive in approved set | unresolved term uses residual path, not forced assignment |
relation_assignments |
required when semantic relation is explicit | array of canonical relation tags | relation in approved set | disputed relation marked with confidence and note |
modality_profile |
required | object (modal_terms, normative_force) |
object present | weak/ambiguous force classifications recorded explicitly |
ambiguity_profile |
required | object (polysemy_confidence, identity_conflict, role_conflict, notes) |
object present with keys | confidence labels preserved from source analyses |
residual_flag |
required | boolean | boolean present | true requires residual_reason and planned disposition |
provenance_ref |
required | source file path + location anchor | trace path must resolve | unresolved provenance is blocking for verified claims |
The contract embeds two decisions carried forward from the empirical baseline of Section 5.11. First, ambiguity is represented rather than discarded: uncertain classes acquire a role_conflict marker, repeated source identifiers survive only under distinct record_uid, and identity collisions count as blocking defects rather than tolerated noise. Second, governance metadata is integral to meaning preservation, not decoration appended after the fact. Inferential stability requires more than lexical normalisation; it requires provenance that resolves and residual transparency that travels with each row.72
Primitive Alignment
The mapping subsystem fills primitive_assignments through a constrained five-step progression, and the constraint is the point: at no step may a term acquire a label outside the approved vocabulary. The steps run as follows. We begin from occurrence-grounded lemmas read together with their clause-class context; we apply primitive assignment only through the approved primitive inventory; we apply relation assignment only through the approved relation inventory; we record an assignment rationale tag of direct, keyword, or fallback; and we preserve ambiguity confidence together with any residual markers. The progression refuses free-form labelling at every point where a human annotator might otherwise improvise.
This sequence answers a tension surfaced in the baseline. High-frequency coverage can reach 1.0 while residual uncertainty remains non-trivial: across the corpus, 54 terms reach a primitive assignment only through the fallback pathway rather than through the direct or keyword routes.73 These are census counts over the assigned corpus, not estimates from a sample, and we read them descriptively. The fallback element count establishes a design principle rather than an incidental statistic. Mapping completeness cannot stand alone as a sufficiency condition, because a coverage figure of 1.0 can coexist with a substantial residual burden. The schema therefore requires dual reporting: every coverage claim is paired with a residual-burden figure, and neither is permitted to stand without the other. Because the corpus is the full set of eligible clauses rather than a drawn sample, these figures describe the artefact’s own population exactly and support no inference beyond it; sensitivity to the working thresholds is reported, where it matters, as plain counts at adjacent cut-offs rather than as interval estimates.
The approved vocabulary itself — what the primitives and composites are, why that stratified set is complete, and how it is benchmarked — is derived in Chapter 6, not asserted here. Figure 5.15 previews that taxonomy as a forward pointer; its branch structure and its external comparison are established through the ablation and coverage evidence presented in Chapter 6, Section 6.2.
Figure 5.15 — SDA Foundational Primitive Taxonomy Taxonomy of the stratified vocabulary used in the standardisation schema, organised by construal status: seven schematic primitives (space, boundary, element, quality, activity, context, actor), seven elaborated composites (room, dwelling, opening, path, level, fixture, role) generated from the primitives through the relation operators, and the relation-operator layer. Nine of the fourteen entity terms (64.3%) have equivalents in IFC4 or BOT; five domain-specific terms (the primitives quality, activity, context, and actor, with the relation operator) are justified through ablation evidence and coverage necessity. The original candidate set included zone, removed after ablation showed zero information loss — on the cognitive account a composite (space bounded by boundary) whose work the primitives already carry. Data from Section 5.8, Supplementary Ch5 Results and Data Package, and Chapter 6.
Relation Set
The schema records semantic structure as triples drawn from ten canonical relation operators: is_a, part_of, bounded_by, opens_to, connects, located_at_level, has_quality, serves_role, supports_activity, and within_context. Two of these, is_a and part_of, are standard ontological primitives; the remaining eight are thesis-specific extensions introduced to carry spatial and regulatory structure that the standard pair leaves implicit. At this point the schema’s only duty is to constrain relation_assignments to this approved operator layer, so that no annotator coins an eleventh relation under pressure and every triple in the substrate draws from a fixed, declared set. A disputed relation is recorded with a confidence label and an explanatory note rather than silently resolved. The justification for each extension — the ontological gap it fills, its logical constraints, its falsification condition, and the has_quality overloading analysis with its associated concentration measure — is developed in Chapter 6, Section 6.2. Per-relation usage frequencies across the corpus are tabulated in the supplementary mapping coverage record.74
Residual Governance
Residuals are explicit design objects with governance duties, not noise to be swept aside. A row is residual when the available evidence is insufficient for a stable assignment — of primitive, of relation, or of clause role — under the declared thresholds. Such rows must remain visible and reviewable, because suppressing them manufactures false confidence and quietly compromises every downstream claim that rests on the substrate. The schema accordingly attaches three controls to every residual.
- Mandatory declaration:
residual_flag=trueis set wherever assignment certainty is inadequate; silence is not an option. - Mandatory rationale: each residual row carries a
residual_reason— for example, frequency below the working threshold, an unresolved role conflict, or persistent polysemy. - Mandatory disposition pathway: each residual row names a planned resolution stage — refinement during the Section 5.14 implementation pass, expert adjudication, or explicit deferral to a chapter limitation statement.75
These controls keep the artefact methodologically honest by refusing the false choice between premature closure and untracked ambiguity. A schema that forced every term to a primitive would report clean coverage at the cost of fabricated certainty; one that simply dropped hard cases would understate the residual burden it had quietly shed. Declaration, rationale, and disposition together make the unresolved fraction a first-class, inspectable quantity rather than a silent omission. They permit genuine standardisation progress without claiming total semantic closure over the source standard, and they preserve the auditability on which the proposition-level claims of Chapter 10 later depend.
Interface and Section Outcome
The implementation that follows in Section 5.14 consumes this contract as its specification. It must demonstrate reproducible row generation that satisfies every required field and validation rule above, and any omitted field is read as a schema breach rather than a stylistic liberty. Five behaviours form the minimal acceptance surface: identity uniqueness at the record_uid level; complete clause-class assignment; auditable primitive and relation mapping rationale; explicit residual declarations with disposition; and provenance completeness for every claim-bearing row.
With these commitments fixed, the schema specification is decision-complete. It translates the measured baseline burdens into enforceable contracts at both layer level and field level, and it establishes the triplet substrate from which Chapter 6 derives the primitive and relation vocabulary. That handoff is itself governed: Chapter 6 builds its modularity engine on this governed-kernel substrate, and in doing so it must not suppress residual flags, collapse role distinctions, or assert closure on matters the schema deliberately leaves open. The standardisation schema thus stands as one evidence-bearing instantiation of the thesis’s single architectural claim — a governed semantic substrate, not a contribution unto itself — and Section 5.14 now shows how that substrate is produced in a reproducible pipeline.
5.9 Implementation: The Five-Stage Serialisation Pipeline
The schema specification declares a decision-complete contract; the serialisation pipeline is the apparatus that enacts it. We apply that contract to the SDA source material and assemble, for every clause, a governed representation that satisfies each required field and validation rule. The implementation separates five concerns and arranges them as five sequential, independently-verifiable stages — ingestion, classification, term analysis, triplet extraction, and governance — each populating exactly one schema layer. This one-to-one alignment is not incidental. It operationalises the information-hiding principle from Section 5.2: each stage encapsulates one design concern behind a defined interface, so a change in (for example) the polysemy protocol does not propagate to upstream identity generation or downstream mapping.76 The pipeline processes the full corpus — 611 text clauses and 189 figure-derived requirements — and reports zero validation failures against the contract.77
Figure 5.3 — Five-stage serialisation pipeline Five-stage serialisation pipeline for the standardisation schema implementation. Each stage transforms SDA source material by populating one schema layer, maintaining strict alignment between processing stage and representational concern. Input: 611 raw clause records plus 189 figure-derived requirements. Output: governed schema rows with stable identity, clause-role classification, ambiguity profiles, primitive mappings, and provenance.
5.15 Pipeline Architecture
Each stage consumes the output of its predecessor and writes the fields of one schema layer, so the intermediate product after every stage can be audited against the contract from Section 5.12 in isolation. Table 5.6 sets out the five-stage progression.
| Stage | Input | Process | Output | Schema Layer Populated |
|---|---|---|---|---|
| 1. Ingestion | Raw clause text from the SDA source document | Record assembly with stable identity generation | 611 records with record_uid, source_clause_id, source_text |
Identity layer |
| 2. Classification | Stage 1 output | Clause-class assignment and modality profiling | Classified records with clause_class and modality_profile |
Clause-role layer |
| 3. Term analysis | Stage 2 output | Lemmatisation, POS tagging, three-method polysemy detection | Term-annotated records with term_lemma_set and ambiguity_profile |
Lexical-semantic layer |
| 4. Triplet extraction | Stage 3 output | Primitive and relation assignment with fallback governance | Mapped records with primitive_assignments, relation_assignments, residual_flag |
Ontological mapping layer |
| 5. Governance | Stage 4 output | Provenance attachment, validation, and manifest generation | Complete schema rows with provenance_ref, notes, and run manifest |
Governance audit layer |
5.16 Identity Generation and Clause Classification
Identity precedes interpretation. Each clause receives a record_uid computed as a SHA-1 hash over the concatenation of its source_clause_id and source_text, so that two records sharing a nominal clause number but carrying distinct text payloads receive distinct identities. The implementation keeps source_clause_id as a non-unique reference for human navigation while enforcing uniqueness at the record_uid level. The distinction is not cosmetic: it is operationally necessary for sixteen records spread across eight duplicated source identifiers, where each member of a pair encodes a different design-category applicability statement under the same clause number.78
Classification then assigns every record to one of four classes — design_requirement, rationale, applicable_to, or other — by pattern matching on explicit role markers in the clause framing text. A clause headed “Design Requirement”, or opening on a modal verb, becomes a requirement; one headed “Rationale”, or opening on “The purpose”, becomes a rationale; one scoping a design category becomes applicable-to. Where no pattern matches, the record is assigned to other and marked with a role-conflict flag rather than defaulted to a convenient category. This rule-based approach is reliable here because the SDA standard uses consistent clause-level formatting, and it would require adaptation for less structured standards. In the present corpus, 95.3 per cent of clauses (582 of 611) receive determinate classification; the remaining 29 (4.7 per cent) are routed to other, signalling an unresolved structural role rather than a silent default.
Modality profiling constructs a modality_profile for each record by detecting modal verbs and classifying their normative force. The implementation recognises six modal operators across four force categories — shall and must (obligatory), should (advisory), may, can, and could (permissive), and shall not (prohibitive) — whose corpus frequencies are tabulated in the Chapter 5 Reference Tables appendix, Table 3, with shall dominant at 140 occurrences.
Where a single clause carries operators of differing force, the record is classified as mixed-force and the ambiguity_profile is marked accordingly. Fifteen clauses in the corpus contain mixed deontic force — for instance, one that both requires (shall) and permits (may) different aspects of the same design condition — and these are flagged for external adjudication that the schema can identify but cannot itself perform. In aggregate the analysis finds 164 modal clauses of the 611 total, a modal rate of 0.2684. The force distribution is strongly class-conditioned: design_requirement clauses carry a 95.7 per cent modal rate with 122 obligatory clauses, rationale clauses a 21.3 per cent rate that is permissive-dominant, and applicable_to and other clauses are entirely non-modal. This conditioning confirms that the modality_profile field is empirically warranted rather than over-engineered: normative force varies, and the variation matters downstream.79
5.17 Term-Level Analysis and Polysemy Protocol
Each record is then tokenised, lemmatised, and part-of-speech tagged using spaCy’s en_core_web_md model. We adopt the medium model over the small one because the small model lacks word vectors entirely, and those vectors are required for the embedding-dispersion component below. The choice is not pedantic. During the reproducibility stress test a silent substitution of the small model for the medium one altered polysemy results by 29 lemmas before the discrepancy was caught and corrected. At a frequency threshold of three or more occurrences, 204 lemmas are eligible for assessment; this threshold is a transparent design decision that trades coverage against reliability, since terms below it lack the occurrence context for meaningful cross-method comparison.
The three-method polysemy protocol applies three independent tests to each eligible lemma. The first is a WordNet 3.1 inventory check, which marks a lemma inventory-polysemous when its synset count exceeds one and so captures the lexicographic dimension of ambiguity. The second is simplified Lesk contextual disambiguation, which selects, for each occurrence, the WordNet sense whose gloss overlaps most with a ±5-token context window, and marks the lemma Lesk-polysemous when sense assignments vary across occurrences. The third is embedding-context dispersion, computed as the mean pairwise cosine distance across a lemma’s occurrence vectors: a lemma is flagged when dispersion exceeds 0.30 (calibrated at the 75th percentile of the corpus-wide distribution) and agglomerative clustering returns more than one cluster.80 These thresholds were calibrated against the corpus distribution rather than imported from external benchmarks, because domain-specific embedding distributions differ from general-corpus ones.
The protocol is robust to its weakest component. Removing Lesk — the method least suited to built-environment regulatory prose — shifts the polysemous proportion by less than 0.04, from 0.5931 to approximately 0.56, and the 27 terms that change status involve weak-to-moderate confidence shifts rather than categorical reversals. The threshold sensitivity reported in Section 5.20 tells the same story: the polysemous proportion moves only across the range 0.593 to 0.643 over seven frequency thresholds, so the structural finding — that a majority of eligible domain terms are polysemous — does not turn on any single cut-off. The SDA corpus is a census of one standard rather than a sample, so these figures are reported descriptively, with explicit denominators, and not as estimates carrying inferential uncertainty. Cross-method agreement assigns confidence tiers, tabulated in the Chapter 5 Reference Tables appendix, Table 4: the strong tier (all three methods agreeing) holds 24 terms, the moderate tier (two of three methods) holds 97, and the weak tier (one of three) holds the residual.
The 24 strong-confidence terms are dominated by domain-central items — door, wall, height, basin, space — that recur across requirement, rationale, and applicability contexts, and so are exactly the terms where misinterpretation would carry the highest downstream cost for module design in Chapter 6.
5.18 Foundational Mapping and Fallback Governance
Triplet extraction assigns each eligible term to a primitive and, where the clause expresses one, to a canonical relation, forming subject-predicate-object triples under a constrained sequence. The pipeline examines the term’s corpus occurrences for clause-class distribution and co-occurring vocabulary, then assigns directly when the context maps unambiguously to a single primitive, by keyword overlap when it does not but shares a primitive’s defining vocabulary, and by fallback to element — with residual_flag set and residual_reason recorded as fallback:element — when neither yields a stable mapping. The stratified entity vocabulary and the ten-operator relation layer applied here are derived and validated in Chapter 6, Section 6.1; this section reports their application to the corpus, not their justification.
Mapping coverage reaches 204 of 204 eligible terms, but 54 of those assignments are fallbacks, and the dual report of coverage alongside residual count is deliberate. The fallbacks form a low-frequency tail — 38 terms at frequency three and 16 at frequency four, with none at frequency five or above — and their confidence distribution is 28 weak, 20 moderate, and 6 strong. Each carries an explicit residual flag and a fallback:element rationale. On our reading the fallback mechanism is not a mapping failure but a governed boundary: it declares precisely where corpus evidence is insufficient for confident primitive assignment, and each flagged term has a planned disposition through later refinement, expert adjudication, or an acknowledged limitation. The has_quality relation is the most frequently assigned, accounting for 67.1 per cent of all triples and reflecting the corpus’s emphasis on dimensional and performance attributes of built-environment elements; the overloading trade-off that this concentration introduces is developed in Chapter 6, Section 6.2.
5.19 Row Assembly, Validation, and Reproducibility Controls
Row assembly integrates the stage outputs into complete schema rows and validates each against the field-level contract from Section 5.12: record_uid uniqueness, non-empty source_text, clause_class membership in the declared enumeration, ambiguity_profile completeness, and residual-flag consistency, whereby every flag set true must carry a matching reason. The governance audit layer attaches a provenance_ref tracing each record to its source location together with notes recording any implementation decision. All 611 rows pass field-level validation with zero schema breaches, which confirms that the specification is implementable at corpus scale and that the pipeline satisfies every required field and rule.
Reproducibility is controlled through an execution manifest written on every run, recording the Python, spaCy, NLTK, and WordNet versions, the model identity, the frequency, dispersion, and cluster-distance thresholds, and a fixed random seed of 42. After correcting the silent model substitution noted above, residual environmental drift narrowed to five lemmas — 126 against the canonical 121 polysemous terms — attributable to minor library-version differences in disambiguation behaviour rather than to a model-architecture mismatch. We report the canonical count of 121 throughout the chapter and treat the 126 figure as a reproducibility bound that states, honestly, how far the result can shift under environment variation. The figures channel runs a parallel, adapted pipeline over the 189 pre-structured figure-derived requirements: classification is deterministic, polysemy detection operates on dimensional terms, and mapping decomposes dimensional values into entity, predicate, and dimension triples, yielding 189 schema-compatible governed records. Together the two channels convert the SDA corpus into governed triples with stable identity, explicit clause-role classification, confidence-labelled ambiguity profiles, constrained primitive and relation assignments, and transparent residual governance — the operational inputs that Section 5.20 evaluates and that Chapter 6 consumes for primitive-vocabulary derivation and module construction.
5.10 Evaluation Results: Resolution Rate, Deontic Force, and Schema Comparison
A five-dimensional ambiguity profiler scores each clause of the serialised corpus on identity, lexical, structural, deontic, and mapping ambiguity, recording for each dimension whether it is present, whether the standardisation schema fully resolves it, or whether a residual remains. The resolution rate is the proportion of present dimensions the schema resolves, as defined in Section 5.8. We report it as the chapter’s primary quantitative result, alongside its variation by clause class, an honest decomposition of deontic force, and a comparison against null and reduced-schema variants. Two cautions frame every figure below. The SDA corpus is a complete census of one serialised standard rather than a sample drawn from a wider population, so we report each result descriptively, as a proportion of an explicit denominator, and attach no inferential interval; and the per-clause delta reported here is a different computation from the aggregate comparison delta used later to contrast schema variants, a distinction we keep visible throughout.
5.21 Resolution Rate: Overall Results
Across the 611 clauses, the average number of dimensions present per clause is 2.072, of which 0.730 are fully resolved and 1.342 remain residual, giving an overall resolution rate of 0.2514.
| Metric | Value |
|---|---|
| Average dimensions present per clause | 2.072 |
| Average dimensions fully resolved | 0.730 |
| Average dimensions residual | 1.342 |
| Resolution rate | 0.2514 |
The dimension-level summary is sharply differentiated, and the differentiation is the design working as intended rather than an uneven outcome.
| Dimension | Clauses Present | Fully Resolved | Residual | Resolution Rate |
|---|---|---|---|---|
| Identity | 16 | 16 | 0 | 100.0% |
| Structural | 310 | 281 | 29 | 90.6% |
| Deontic | 164 | 149 | 15 | 90.9% |
| Lexical | 611 | 0 | 611 | 0.0% |
| Mapping | 165 | 0 | 165 | 0.0% |
The three resolution-mode dimensions clear by wide margins: identity is resolved through record_uid (16 of 16, 100%), structural through the mandatory clause_class field (281 of 310, 90.6%), and deontic through modality_profile (149 of 164, 90.9%). The two dual-mode dimensions are not resolved but made transparent. Lexical polysemy (0 of 611) and mapping residuals (0 of 165) are preserved as visible, auditable design objects through confidence labels and residual flags rather than forced to a premature semantic closure. This dual-mode reporting is the epistemically honest response to two convergent positions in computational linguistics: Kilgarriff’s argument that discrete word senses are methodological artefacts rather than cognitive realities, so forced sense assignment manufactures artificial precision, and Haber and Poesio’s recent evidence that polysemy operates along a regular-to-irregular spectrum.8182 The 29 structural residuals are the other-class clauses whose framing does not match the three primary role patterns; the 15 deontic residuals are mixed-force clauses carrying conflicting modal operators that the schema identifies but cannot itself adjudicate.
Figure 5.4 — Resolution rate resolution profile by dimension Clause counts for each ambiguity dimension showing resolved versus residual. Identity (100.0%), structural (90.6%), and deontic (90.9%) dimensions achieve high resolution. Lexical and mapping dimensions show 0% resolution by design: the schema governs these as transparent, confidence-labelled design objects rather than claiming premature semantic closure.
The per-clause delta of 0.2514 sits 0.0014 above the pre-registered design floor of 0.25 declared in Section 5.8. We report this as a transparent design decision met descriptively, not as a tested hypothesis, and so attach no confidence interval or significance claim to the margin. The margin is genuinely narrow, which warrants one honest caveat about reproducibility. The five-lemma classification drift between execution environments documented in Section 5.25 affects lexical-sense disambiguation alone, and the three resolution-mode dimensions are decoupled from it by construction — identity is a deterministic SHA-1 hash, structural classification matches syntactic patterns, and deontic profiling detects modal keywords — so the drift cannot flip a resolution-mode result. A conservative upper bound confirms the headline is environment-stable: even were all five drifting lemmas to flip an entire clause’s resolution profile, a scenario the dimensional decoupling rules out, the worst-case change to the delta is 5 of 611, or 0.00818, and the drift falls within the polysemy-and-mapping band that the dual-mode dimensions already exempt from the per-clause binding.83
5.22 Resolution Rate by Clause Class
The overall delta of 0.2514 conceals substantial variation across the four clause classes, and the variation reveals where resolution capacity concentrates.
| Clause Class | Count | Avg Dims Present | Avg Resolved | Avg Residual | Avg Delta |
|---|---|---|---|---|---|
design_requirement |
140 | 3.364 | 1.879 | 1.486 | 0.570 |
rationale |
141 | 2.539 | 1.192 | 1.348 | 0.475 |
applicable_to |
301 | 1.253 | 0.050 | 1.203 | 0.023 |
other |
29 | 2.069 | 0.000 | 2.069 | 0.000 |
Design_requirement clauses achieve the highest delta (0.570) because they carry the most ambiguity dimensions simultaneously and the schema resolves three of them — structural, deontic, and identity where applicable. Rationale clauses follow at 0.475, drawing on structural resolution and partial deontic resolution. Applicable_to clauses reach only 0.023 because their ambiguity is almost exclusively lexical and mapping-based, the two dimensions the schema governs rather than resolves. The other class scores 0.000: these 29 clauses receive an explicit class assignment that signals unresolved status rather than a forced fit, which we read as an honest null signal rather than a defect. The interpretive consequence is the one that matters for the artefact suite. The clauses with the highest delta are precisely those with the highest downstream consequence — design requirements carry obligatory normative force and fix the dimensional constraints that the module specifications of Chapter 6 must encode — so the schema’s resolution capacity is concentrated where it is most needed.84
Figure 5.5 — Resolution rate by clause class Average resolution rate by clause class (n = 611). Design requirements achieve the highest delta (0.570) because they carry the most ambiguity dimensions simultaneously and the schema resolves three of these. The dashed reference line indicates the overall corpus delta of 0.2514.
5.23 Deontic Force: Honest Reframing
An earlier analysis of the modal clauses reported an 87.3% force-collapse headline. On closer inspection that figure proved misleading, because it conflated three distinct conditions under one binary metric, and we have retracted it. The honest decomposition across the 164 modal clauses separates the conditions explicitly.
| Path | Count | Proportion | Description |
|---|---|---|---|
| Coverage gap | 75 | 45.7% | Ontology-linked artefacts never attempted to represent these clauses |
| Actual force collapse | 61 | 37.2% | Ontology represents the clause but loses deontic force information |
| Force preserved | 28 | 17.1% | Ontology uses deontic predicates that preserve normative force |
The coverage gap (45.7%, 75 clauses) reflects a scope limitation in existing ontology-linked artefacts rather than a schema failure: these clauses have no ontology representation at all, so force preservation is moot because no representational attempt was made. The actual force collapse (37.2%, 61 clauses) is the genuine problem — clauses where existing artefacts flatten obligatory, permissive, or advisory force into undifferentiated statements — and the schema addresses it through mandatory modality_profile construction, which preserves normative force in the serialised output regardless of downstream ontology coverage. The remaining 28 clauses (17.1%) are a success condition: existing representations already retain deontic predicates. This three-path decomposition replaces the retracted 87.3% headline throughout the thesis. The replacement is more informative because it separates a coverage problem, which the schema addresses by serialising all 611 clauses, from a force-preservation problem, which it addresses through explicit modality profiling, from a case where force was never at risk.85
Figure 5.12 — Deontic Force Decomposition: Three-Path Honest Reframing Honest decomposition of the retracted 87.3% force collapse headline into three distinct conditions across 164 modal clauses. Coverage gap (45.7%): 75 clauses where ontology-linked artefacts never attempted representation. Actual force collapse (37.2%): 61 clauses where existing representational artefacts flatten normative force into undifferentiated statements. Force preserved (17.1%): 28 clauses where existing representations retain deontic predicates. The schema addresses coverage gap through complete serialisation of all 611 clauses and force collapse through mandatory modality_profile construction. Data from Section 5.7.3.
5.24 Null-Model and Reduced-Schema Comparison
The null-model comparison tests whether the schema produces more than a trivial contribution. The delta values in this subsection are aggregate comparison scores used only to contrast weaker and stronger schema variants over the same 611-clause census; they are not the per-clause resolution rate of Section 5.21, and we keep the two distinct. The null model applies the identity layer alone — SHA-1 unique identifiers without clause classification, modality profiling, or ambiguity governance — while the full schema applies all five layers.
| Schema Variant | Aggregate Comparison Delta | design_requirement |
applicable_to |
rationale |
|---|---|---|---|---|
| Null (identity only) | 0.007 | 0.000 | 0.013 | 0.002 |
| Minimal (identity + clause_class) | 0.315 | 0.254 | 0.342 | 0.317 |
| Full (all 5 layers) | 0.382 | 0.493 | 0.342 | 0.370 |
The full schema reaches an aggregate comparison delta of 0.382 against the null model’s 0.007, a margin of 0.375 that comfortably exceeds the 0.05 floor for non-trivial contribution declared in Section 5.8. The failure-condition test passes in the same move: because the null model scores near zero, the evaluation framework can discriminate effective designs from ineffective ones rather than rewarding any configuration. Decomposing the contribution layer by layer locates the resolution work precisely.
| Configuration | Layers Active | Aggregate Comparison Delta | Marginal Gain |
|---|---|---|---|
| A: Identity only | 1 | 0.007 | — |
| B: Identity + clause role | 2 | 0.307 | +0.301 |
| C: Identity + clause role + deontic | 3 | 0.375 | +0.067 |
| D: Full 5-layer | 5 | 0.375 | +0.000 |
The clause_class layer is the dominant resolution instrument, contributing +0.301 — 80.2% of the total — with deontic profiling adding a further +0.067, and Layers 4 and 5 (ontological mapping and governance audit) adding +0.000 resolution delta. This finding requires careful interpretation, because zero resolution delta does not make those layers unnecessary. They discharge a governance function the resolution metric does not capture — auditability, residual tracking, and downstream handoff readiness — and the five-layer structure was justified in Section 5.2 as three resolution layers plus two governance layers, not as five equally weighted resolution instruments. Sullivan and colleagues’ real-options account of modular software design supplies the warrant: zero-delta layers carry option value for future extensibility that a resolution metric measured at one point in time cannot register.8687
Figure 5.6 — Progressive schema layer contribution Progressive addition of schema layers and their cumulative contribution to overall resolution rate. The clause-role layer contributes +0.301 (80.2% of total delta). Layers 4–5 contribute zero additional resolution delta but serve governance functions. Zero-delta governance layers carry option value for future extensibility.
Read together, these results characterise the artefact’s evaluation evidence exactly for the census on which the suite is built: a measurable, non-trivial reduction in avoidable ambiguity concentrated in the highest-consequence clause class, with the remaining ambiguity preserved as governed, auditable design objects. The robustness checks, integrated assessment against the success criteria, and handoff contracts that consolidate this evidence follow next.
5.11 Robustness, Integrated Assessment, Contributions, and Handoff to Chapter 6
A finding that holds only at one cut-point is a finding about the cut-point, not about the corpus, so the chapter’s headline numbers are tested against threshold choice before any contribution is claimed. The question is narrow and Chapter 5 in scope: whether the polysemy, mapping, and modal results that motivate the standardisation schema survive a change in the frequency threshold that selects eligible lemmas. Validation of the derived primitive vocabulary itself — its coverage and its external grounding against established ontologies — answers a different question, the internal validity of the primitive set, and is reported as Chapter 6 evidence rather than recomputed here. One descriptive anchor from that comparison is retained because it supports the schema-first argument of Section 5.12: the stratified vocabulary covers 32.1 percent of clauses under IFC4 spatial types and 20.8 percent under the BOT ontology.
5.26 Threshold Sensitivity
Polysemy is stable to threshold choice. Across seven frequency thresholds the polysemous proportion rises gently — 121 of 204 eligible lemmas at a threshold of one or more occurrences (59.3 percent), 97 of 157 at four, 86 of 137 at five, 66 of 103 at seven, and 54 of 84 at ten or more (64.3 percent) — with no cliff between any adjacent pair. Because the SDA corpus is a complete census of the serialised source rather than a sample, we report these as plain counts at adjacent cut-offs and attach no interval to them; the spread itself is the robustness evidence, and the core reading — that a majority of eligible domain terms are polysemous — does not rest on an arbitrary cut-point. The whole range sits above both the roughly 40 percent general-English polysemy baseline documented in Navigli’s survey and the 53 percent figure that Zhang, Ma, and Nisbet report for healthcare building requirements, which is consistent with Pustejovsky’s generative-lexicon prediction that regulatory domains, where terms carry several functional roles, carry an elevated burden.88899091
Figure 5.9 — Threshold Sensitivity: Polysemy Ratio Across Frequency Cutpoints Polysemy ratio at six frequency thresholds (>=1 through >=10), reported as plain counts of polysemous over eligible lemmas at each cut-off. The ratio rises gently from 121/204 (0.5931) at >=1 to 54/84 (0.6429) at >=10, with no cliff between adjacent thresholds. The stable range (0.5931-0.6429) sits above both the approximately 40% general-English polysemy baseline (Navigli, 2009) and the 53% healthcare building requirement figure (Zhang et al., 2023), consistent with Pustejovsky’s generative lexicon prediction for regulatory domains. The SDA corpus is a complete census of the serialised source, so the figures are descriptive and carry no inferential interval. Data from Section 5.26 (n = 611 clauses, 204 eligible lemmas at threshold >=1).
Mapping coverage shows a frequency-driven breakpoint of the same descriptive kind. Effective coverage — the proportion of eligible terms receiving a non-fallback primitive assignment — rises from 0.735 at a threshold of three to 1.000 at five and above, which confirms that the 54 fallback terms concentrate in the low-frequency tail at frequencies of three and four, where corpus evidence is thinnest. Modal detection is similarly insensitive: across strict, standard, and expanded keyword sets the detected proportion holds at 22.9, 26.8, and 31.4 percent respectively, with the standard set used throughout the chapter in the middle. The multi-method protocol behind these readings answers known limits of any single approach: Erk’s region model shows that embedding dispersion captures graded sense distinctions discrete methods miss, while Camacho-Collados and Pilehvar identify granularity limits in sense embeddings that require triangulation with knowledge-based methods.9293
5.27 Integrated Assessment Against Success Criteria
The results can now be read against the five success criteria and four rejection conditions declared in Section 5.8. The artefact claim is accepted.
| Criterion | Evidence | Verdict |
|---|---|---|
| 1. Baseline is measurable | All six baseline dimensions show quantifiable burden (Section 5.11) | Met |
| 2. Schema produces non-trivial delta | Aggregate comparison delta 0.382 against null 0.007; margin 0.375 far above the 0.05 threshold | Met |
| 3. Resolution is honest | Dual-mode reporting separates resolution (identity, structural, deontic) from transparency (lexical, mapping) | Met (boundary-limited) |
| 4. Residuals are governed | 54 fallback terms, 29 other-class clauses, and 15 mixed-force clauses all carry explicit flags, reasons, and disposition pathways |
Met (boundary-limited) |
| 5. Outputs are handoff-ready | All 611 rows satisfy field-level contracts with zero validation failures (Section 5.14); the triplet substrate is delivered to Chapter 6 | Met |
Criteria three and four are boundary-limited: both are met under the chapter’s single-rater calibration scope, but external multi-rater calibration of the resolution/transparency boundary and of residual-flag disposition is declared as future work below, so final acceptance at these two is conditional on that exercise. None of the four rejection conditions is triggered: the polysemy proportion of 0.5931 sits above the 0.50 design floor (condition a); the comparison delta clears the non-triviality test (condition b); schema-level output coverage reaches 100 percent, with primitive-vocabulary generalisation reported in Chapter 6 (condition c); and 0 of 611 rows fail field validation (condition d). Under Venable, Pries-Heje, and Baskerville’s FEDS framework the evaluation is artificial-summative, and in Sonnenberg and vom Brocke’s four-stage model it completes Eval 2 (design validation) and Eval 3 (instantiation testing) but does not reach Eval 4 (deployment testing).9495 Prat, Comyn-Wattiau, and Akoka’s taxonomy of evaluation methods confirms that metric-based evaluation in artificial settings is the empirical norm for conceptual artefacts at this maturity.96
Figure 5.7 — External benchmarking coverage Clause coverage comparison between the thesis stratified vocabulary and two established spatial ontologies (IFC4 spatial types, BOT ontology). The thesis achieves 100% coverage (611/611 clauses) compared to IFC4’s 32.1% and BOT’s 20.8%. At the term level, nine of the fourteen entity terms (64.3%) have equivalents in IFC4 or BOT. (SVG re-render pending.)
5.28 Contribution
The standardisation schema’s contribution is a single thing: the governed semantic substrate that downstream artefacts consume. It is one evidence-bearing instantiation of the thesis’s architectural claim, not a cluster of independent contributions, and it is evaluable along three dimensions. The first is governed ambiguity resolution — a measurable reduction of identity instability, structural-role confusion, and deontic-force collapse, concentrated in design_requirement clauses, the class with the highest downstream consequence, at a per-clause resolution rate of 0.570 (Section 5.20). The second is ambiguity as governed transparency — confidence-labelled visibility for what resists deterministic resolution, lexical polysemy across 59.3 percent of eligible terms and the 54 mapping residuals, neither suppressed into false closure nor left untracked. The third is a governed triplet-data substrate — 611 text and 189 figure-derived records in subject–predicate–object form, satisfying every field-level contract with zero validation failures, which the Governed Kernel Architecture (Chapter 6) consumes and from which the notation and the generator inherit. The primitive vocabulary derived from that substrate is a Chapter 6 artefact, not a Chapter 5 claim. Under Gregor and Hevner’s typology these outcomes place the standardisation schema in the improvement quadrant — known foundations of modularity, information hiding, and design-theory anatomy applied to a new domain — with nascent design-theory potential in the five-layer contract and the resolution rate framework.97 The architecture also preserves what Baldwin and Clark term option value: each layer boundary is an interface at which later extensions — practitioner feedback, cross-corpus generalisation, richer polysemy adjudication — can attach without reorganising the contract.98
The contribution claim is deliberately bounded. Chapter 3 frames Proposition 1 (interpretive convergence) as governance under multi-actor, multi-representation adaptation. Chapter 5 confirms its schema-construction component — the schema produces measurably more consistent clause representations than the unschematised source text, reducing interpretive variance at the regulatory-input layer — but defers its adaptation-governance component, whether governed representations improve interpretive consistency when several actors use them across adaptation scenarios, to the integrated evaluation in Chapter 10. The claim here is therefore schema-level interpretive improvement, not system-level governance improvement.
5.29 Limitations
Five limitations bound these claims. First, the evaluation is artificial. It tests the artefact analytically against its specification and outputs without naturalistic deployment, so it cannot show whether practitioners would find the schema useful or whether its clause classifications align with expert judgement — the most significant scope limitation. Second, generalisation is within-corpus only. The schema is validated against a single regulatory instrument; whether its five-layer structure, classification scheme, and polysemy findings transfer to adjacent standards such as the Livable Housing Design Guidelines, AS 1428.1, ISO 21542, or BS 8300 is untested. Third, the evaluation dimensions are self-referential. The five ambiguity dimensions are schema-defined; the null-model and reduced-schema comparisons mitigate this by showing the schema can fail its own evaluation, but the evaluation space remains the designer’s. Fourth, residual drift persists in reproducibility. A 5-lemma drift survives environment correction, so exact polysemy counts are environment-specific even though the structural conclusion is stable.
The fifth limitation is the principal methodological vulnerability, and it is named rather than absorbed into the others. The evaluation apparatus spanning Chapters 4 and 5 is self-contained in a consequential way: the evaluation questions, the metrics, the profiler that computes the resolution rate, the rejection thresholds, and the clause classifications, polysemy judgements, and primitive mappings were all produced by a single researcher. No external human participated at any stage of the chain, from metric design through data collection to interpretation. The mitigation is to specify the missing evidence precisely. A future protocol should measure Cohen’s kappa for clause-role classification on a 50-clause sample, polysemy judgement on 30 terms, and primitive assignment on 50 clauses, against a minimum acceptable agreement of 0.60 (substantial agreement); an external task-based validation, testing whether practitioners interpret clauses differently under the schema than under unstructured text, would supply the missing utility evidence; and expert review of the 54 fallback terms would be the strongest validation signal.99
5.30 Handoff Contracts to Chapter 6
The handoff to the Governed Kernel Architecture (Chapter 6) is governed by five explicit contracts. First, triplet-substrate integrity: Chapter 6 derives its primitive vocabulary from the 611 text and 189 figure-derived records — each with stable record_uid, clause_class, ambiguity_profile, and residual_flag — and may suppress no residual flag, alter no role classification, and claim no resolution of an ambiguity left open here. Second, role preservation: clauses may be reorganised into modules, but clause_class assignments and the inferential force they carry are retained, so a design_requirement keeps its obligatory force in any module. Third, residual inheritance: the 54 fallback terms, 29 other-class clauses, and 15 mixed-force clauses carry their flags, reasons, and disposition pathways forward and remain visible at module level. Fourth, ambiguity-profile carry-forward: polysemy confidence labels and identity-conflict markers are inherited, not overwritten, though Chapter 6 may extend them with module-level information. Fifth, provenance completeness: every clause entering a Chapter 6 module carries a resolved provenance_ref tracing to the SDA source, and module-level claims inherit that chain.100
5.31 Future Work
Two directions follow directly from the limitations. The first, and highest priority, is naturalistic practitioner evaluation, which would advance the work from Eval 3 to Eval 4: domain experts — accessibility consultants, certifiers, or SDA assessors — classifying clauses and assessing residual utility, testing whether schema-governed representations improve interpretive consistency over unstructured text. Douglas and colleagues’ finding of statistically significant improvements, with a large effect, in resident wellbeing and community integration in purpose-designed SDA underscores the stakes: ambiguity in clause interpretation can propagate through certification decisions to measurable tenant outcomes.101 The second is cross-corpus generalisation of the schema to a second regulatory instrument, testing whether the five-layer structure and the polysemy-governance protocol transfer to adjacent accessibility domains; whether the stratified vocabulary itself generalises is a Chapter 6 question. On this basis Chapter 6 proceeds to consume the standardisation schema’s outputs as its semantic substrate, operating under the handoff contracts defined above.
Notes
- C. Y. Baldwin and K. B. Clark, Design Rules, Volume 1: The Power of Modularity. Cambridge, MA, USA: MIT Press, 2000. ↩︎
- H. A. Simon, “The Architecture of Complexity,” Proceedings of the American Philosophical Society, vol. 106, no. 6, pp. 467–482, 1962. ↩︎
- M. P. Gallaher, A. C. O’Connor, J. L. Dettbarn Jr., and L. T. Gilday, Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry, NIST GCR 04-867, National Institute of Standards and Technology, 2004. ↩︎
- S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251–266, 1995. ↩︎
- S. Gregor and D. Jones, “The anatomy of a design theory,” Journal of the Association for Information Systems, vol. 8, no. 5, pp. 312–335, 2007. ↩︎
- C. Eastman, J. M. Lee, Y. S. Jeong, and J. K. Lee, “Automatic rule-based checking of building designs,” Automation in Construction, vol. 18, no. 8, pp. 1011–1033, 2009. ↩︎
- E. Hjelseth, “Foundations for BIM-based model checking systems” (Doctoral dissertation, Norwegian University of Life Sciences), 2015. ↩︎
- W. Solihin, “A simplified BIM data representation using a relational database schema for an efficient rule checking system” (Doctoral dissertation, Georgia Institute of Technology), 2015. ↩︎
- W. Solihin and C. Eastman, “Classification of rules for automated BIM rule checking development,” Automation in Construction, vol. 53, pp. 69–82, 2015. ↩︎
- J. Dimyadi and R. Amor, “Automated building code compliance checking. Where is it at?” in Proceedings of the 19th CIB World Building Congress, 2013. ↩︎
- J. Zhang and N. M. El-Gohary, “Integrating semantic NLP and logic reasoning into a unified system for fully-automated code checking,” Automation in Construction, vol. 73, pp. 45–57, 2017. ↩︎
- C. Eastman et al., 2009; B. Zhong, L. Ding, H. Luo, Y. Zhou, Y. Hu, and H. Hu, “Ontology-based semantic modeling of regulation constraint for automated construction quality compliance checking,” Automation in Construction, vol. 28, pp. 58–70, 2012; X. Tan, A. Hammad, and P. Fazio, “Automated code compliance checking for building envelope design,” Journal of Computing in Civil Engineering, vol. 24, no. 2, pp. 203–211, 2010. ↩︎
- N. O. Nawari, “The challenge of computerizing building codes in a BIM environment,” in Computing in Civil Engineering (2012), pp. 285–292, ASCE, 2012; S. Malsane, J. Matthews, S. Lockley, P. Love, and D. Greenwood, “Development of an object model for automated compliance checking,” Automation in Construction, vol. 49, pp. 51–58, 2015. ↩︎
- W. Solihin and C. Eastman, 2015; E. Hjelseth, 2015. ↩︎
- T. Beach, Y. Rezgui, H. Li, and T. Kasim, “A rule-based semantic approach for automated regulatory compliance in the construction sector,” Expert Systems with Applications, vol. 42, no. 12, pp. 5219–5231, 2015. ↩︎
- Z. Zhang, L. Ma, and N. Nisbet, “Unpacking ambiguity in building requirements to support automated compliance checking,” Journal of Management in Engineering, vol. 39, no. 5, 2023. ↩︎
- F. Noardo et al., “Unveiling the actual progress of Digital Building Permit: Getting awareness through a critical state of the art review,” Building and Environment, vol. 213, 108854, 2022. ↩︎
- R. Navigli, “Word sense disambiguation: A survey,” ACM Computing Surveys, vol. 41, no. 2, Article 10, pp. 1–69, 2009. ↩︎
- J. Pustejovsky, The generative lexicon, MIT Press, 1995. ↩︎
- N. Ide and J. Veronis, “Introduction to the special issue on word sense disambiguation: The state of the art,” Computational Linguistics, vol. 24, no. 1, pp. 1–40, 1998. ↩︎
- A. Kilgarriff, “‘I don’t believe in word senses,’” Computers and the Humanities, vol. 31, no. 2, pp. 91–113, 1997. ↩︎
- K. Erk, “Representing words as regions in vector space,” in Proceedings of CoNLL-2009, pp. 57–65, ACL, 2009. ↩︎
- J. Camacho-Collados and M. T. Pilehvar, “From word to sense embeddings: A survey on vector representations of meaning,” Journal of Artificial Intelligence Research, vol. 63, pp. 743–788, 2018. ↩︎
- E. Agirre and P. Edmonds (Eds.), Word sense disambiguation: Algorithms and applications, Springer, 2006. ↩︎
- A. Raganato, J. Camacho-Collados, and R. Navigli, “Word sense disambiguation: A unified evaluation framework and empirical comparison,” in Proceedings of EACL 2017, pp. 99–110, ACL, 2017. ↩︎
- E. Agirre, O. Lopez de Lacalle, C. Fellbaum et al., “SemEval-2010 Task 17: All-words word sense disambiguation on a specific domain,” in Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 75–80, ACL, 2010. ↩︎
- J. Haber and M. Poesio, “Polysemy — Evidence from linguistics, behavioural science and contextualised language models,” Computational Linguistics, vol. 50, no. 1, pp. 219–261, 2024. ↩︎
- M. Brun and H. Langseth, “Aspects of uncertainty in digital building permit processing: A systematic literature review,” Automation in Construction, vol. 161, 105357, 2024. ↩︎
- D. L. Parnas, “On the criteria to be used in decomposing systems into modules,” Communications of the ACM, vol. 15, no. 12, pp. 1053–1058, 1972. ↩︎
- H. A. Simon, “The Architecture of Complexity,” Proceedings of the American Philosophical Society, vol. 106, no. 6, pp. 467–482, 1962. ↩︎
- C. Y. Baldwin and K. B. Clark, Design rules: Vol. 1. The power of modularity, MIT Press, 2000. ↩︎
- K. J. Sullivan, W. G. Griswold, Y. Cai, and B. Hallen, “The structure and value of modularity in software design,” in Proceedings of ESEC/FSE-9, pp. 99–108, ACM, 2001. ↩︎
- P. Buneman, S. Khanna, and W.-C. Tan, “Why and where: A characterization of data provenance,” in ICDT 2001 (LNCS Vol. 1973), pp. 316–330, Springer, 2001. ↩︎
- L. Moreau and P. Missier (Eds.), PROV-DM: The PROV data model (W3C Recommendation), W3C, 2013. ↩︎
- ISO 19650-1:2018, Organization and digitization of information about buildings and civil engineering works, including BIM — Part 1: Concepts and principles, ISO, 2018. ↩︎
- E. W. Dijkstra, “The structure of the ‘THE’-multiprogramming system,” Communications of the ACM, vol. 11, no. 5, pp. 341–346, 1968. ↩︎
- J. F. Sowa, Conceptual structures: Information processing in mind and machine, Addison-Wesley, 1984. ↩︎
- F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-oriented software architecture: A system of patterns (Vol. 1), Wiley, 1996. ↩︎
- S. Gregor and D. Jones, “The anatomy of a design theory,” Journal of the Association for Information Systems, vol. 8, no. 5, pp. 312–335, 2007. ↩︎
- National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
- NDIS (Specialist Disability Accommodation) Rules 2020 (Cth). ↩︎
- National Disability Insurance Agency, Specialist Disability Accommodation (SDA) data, NDIS Data and Research, 2025. ↩︎
- J. Douglas, D. Winkler, S. Oliver, S. Liddicoat, and K. D’Cruz, “Moving into new housing designed for people with disability: Preliminary evaluation of outcomes,” Disability & Rehabilitation, vol. 45, no. 8, pp. 1370–1378, 2023. ↩︎
- J. Bringolf, “Barriers to universal design in Australian housing” (Doctoral dissertation, University of Western Sydney), 2011. ↩︎
- A. Beer and D. Faulkner, Housing transitions through the life course: Aspirations, needs and policy, Policy Press, 2011. ↩︎
- R. Imrie, Accessible housing: Quality, disability and design, Routledge, 2006. ↩︎
- ISO 21542:2021, Building construction — Accessibility and usability of the built environment, ISO, 2021. ↩︎
- British Standards Institution, BS 8300-1:2018. BS 8300-2:2018, BSI, 2018. ↩︎
- Access Institute, Accredited SDA assessor — Update 2: Clarifications re the SDA Design Standard interpretations, 2020. ↩︎
- Access Institute, Accredited SDA assessor — Update 4 (amended), 2022. ↩︎
- Morris Goding Access Consulting, Imperfections and interpretations, n.d. ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- Supplementary Ch5 Polysemy Metrics and Confidence ↩︎
- R. Venable, J. Pries-Heje, and R. Baskerville, “FEDS: A Framework for Evaluation in Design Science Research,” European Journal of Information Systems, vol. 25, no. 1, pp. 77–89, 2016. ↩︎
- A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004. ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- R. Navigli, “Word sense disambiguation: A survey,” ACM Computing Surveys, vol. 41, no. 2, Article 10, pp. 1–69, 2009. ↩︎
- Z. Zhang, L. Ma, and N. Nisbet, “Unpacking ambiguity in building requirements to support automated compliance checking,” Journal of Management in Engineering, vol. 39, no. 5, 2023. ↩︎
- Supplementary Ch5 Polysemy Metrics and Confidence ↩︎
- Supplementary Ch5 Trigram POS Structural Metrics ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- Supplementary Ch5 Polysemy Metrics and Confidence ↩︎
- Supplementary Ch5 Trigram POS Structural Metrics ↩︎
- Supplementary Ch5 Foundational Mapping Coverage ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- D. L. Parnas, “On the criteria to be used in decomposing systems into modules,” Communications of the ACM, vol. 15, no. 12, pp. 1053–1058, 1972. ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- Supplementary Ch5 Foundational Mapping Coverage ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- Supplementary Ch5 Foundational Mapping Coverage ↩︎
- Supplementary Ch5 Foundational Mapping Coverage ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- K. Erk, “Representing words as regions in vector space,” in Proceedings of CoNLL-2009, pp. 57–65, ACL, 2009. ↩︎
- A. Kilgarriff, “‘I don’t believe in word senses,’” Computers and the Humanities, vol. 31, no. 2, pp. 91–113, 1997. ↩︎
- J. Haber and M. Poesio, “Polysemy — Evidence from linguistics, behavioural science and contextualised language models,” Computational Linguistics, vol. 50, no. 1, pp. 219–261, 2024. ↩︎
- Supplementary Ch5 Results and Data Package documents the per-environment lemma classifications and the dimension-by-dimension propagation analysis in full; the present subsection summarises that analysis at the level of the headline figure. ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- Supplementary Ch5 SDA Corpus Integrity Metrics ↩︎
- K. J. Sullivan, W. G. Griswold, Y. Cai, and B. Hallen, “The structure and value of modularity in software design,” in Proceedings of ESEC/FSE-9, pp. 99–108, ACM, 2001. ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- R. Navigli, “Word sense disambiguation: A survey,” ACM Computing Surveys, vol. 41, no. 2, Article 10, pp. 1–69, 2009. ↩︎
- Z. Zhang, L. Ma, and N. Nisbet, “Unpacking ambiguity in building requirements to support automated compliance checking,” Journal of Management in Engineering, vol. 39, no. 5, 2023. ↩︎
- J. Pustejovsky, The generative lexicon, MIT Press, 1995. ↩︎
- Supplementary Ch5 Polysemy Metrics and Confidence ↩︎
- K. Erk, “Representing words as regions in vector space,” in Proceedings of CoNLL-2009, pp. 57–65, ACL, 2009. ↩︎
- J. Camacho-Collados and M. T. Pilehvar, “From word to sense embeddings: A survey on vector representations of meaning,” Journal of Artificial Intelligence Research, vol. 63, pp. 743–788, 2018. ↩︎
- R. Venable, J. Pries-Heje, and R. Baskerville, “FEDS: A Framework for Evaluation in Design Science Research,” European Journal of Information Systems, vol. 25, no. 1, pp. 77–89, 2016. ↩︎
- C. Sonnenberg and J. vom Brocke, “Evaluations in the science of the artificial — Reconsidering the build-evaluate pattern in design science research,” in DESRIST 2012 (LNCS Vol. 7286), pp. 381–397, 2012. ↩︎
- N. Prat, I. Comyn-Wattiau, and J. Akoka, “A taxonomy of evaluation methods for information systems artifacts,” Journal of Management Information Systems, vol. 32, no. 3, pp. 229–267, 2015. ↩︎
- S. Gregor and A. R. Hevner, “Positioning and presenting design science research for maximum impact,” MIS Quarterly, vol. 37, no. 2, pp. 337–355, 2013. ↩︎
- C. Y. Baldwin and K. B. Clark, Design rules: The power of modularity, vol. 1, MIT Press, 2000. ↩︎
- J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960. ↩︎
- Supplementary Ch5 Artefact Suite Handoff Contracts ↩︎
- J. Douglas, D. Winkler, S. Oliver, S. Liddicoat, and K. D’Cruz, “Moving into new housing designed for people with disability: Preliminary evaluation of outcomes,” Disability & Rehabilitation, vol. 45, no. 8, pp. 1370–1378, 2023. ↩︎