Modular Infrastructure for Inclusive Housing — Appendix: Standards Serialisation Schema Specification

Schema Architecture

The standardisation schema is structured as five governed layers, each encapsulating a distinct design concern and exposing only governed interfaces to the layers next to it. The layering is itself an enforcement mechanism: no downstream consumer can bypass the identity, role, or ambiguity conditions an upstream layer has established. The table reads from the most foundational layer (identity) to the most integrative (governance audit), and for each layer it states the concern it owns, the fields it requires, the rule that validates those fields, and the way it handles ambiguity rather than hiding it.

Schema layer	Concern	Required fields	Validation rule	Ambiguity handling
Identity	Stabilise record identity across duplicated clause IDs and preserve source traceability	`record_uid`, `source_clause_id`, `source_text`, `provenance_ref`	`record_uid` unique in the dataset; `source_text` non-empty; `source_clause_id` may repeat	If a `source_clause_id` repeats with distinct text, the rows are kept separate and marked `ambiguity_profile.identity_conflict=true`
Clause role	Preserve the inferential class of each statement for downstream interpretation	`clause_class`, `modality_profile`	A missing or unknown class is invalid; defaulting is disallowed unless explicitly justified	Mixed or uncertain frames are marked `ambiguity_profile.role_conflict=true` with an explanatory note
Lexical-semantic	Represent term evidence and polysemy burden at row level	`term_lemma_set`, `ambiguity_profile`	Each mapped high-frequency lemma carries a confidence label	Polysemous terms retain their confidence label (`strong`, `moderate`, `weak`) and a method-evidence summary
Ontological mapping	Bind terms to primitives and relations under declared constraints	`primitive_assignments`, `relation_assignments`, `residual_flag`	Mappings outside the allowed primitive or relation sets are invalid; each assignment carries a rationale tag	Where no stable assignment exists, `residual_flag` is set true with a `residual_reason` and a review pathway
Governance audit	Keep artefact behaviour reproducible and reviewable across chapter boundaries	`provenance_ref`, `residual_flag`, `notes`	Every verified row traces to its source object and its chapter claim-bank identifier	Residuals are never suppressed; an unresolved row carries a documented disposition stage
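The layer ordering can be sketched as a first-failure check. This is a minimal Python sketch under stated assumptions. It checks only that each layer's required fields are present, not each layer's full validation rule. It treats every row as an analytic, mapped row, so the conditionally required fields are always expected. It reads the governance layer's `notes` as a top-level key, as the table lists it. The names `LAYERS` and `first_failing_layer` are hypothetical.

```python
from typing import Optional

# Hypothetical encoding of the five layers, in the order the table lists them.
# Only the presence (non-None value) of each layer's required fields is checked.
LAYERS = [
    ("identity", ("record_uid", "source_clause_id", "source_text", "provenance_ref")),
    ("clause_role", ("clause_class", "modality_profile")),
    ("lexical_semantic", ("term_lemma_set", "ambiguity_profile")),
    ("ontological_mapping", ("primitive_assignments", "relation_assignments", "residual_flag")),
    ("governance_audit", ("provenance_ref", "residual_flag", "notes")),
]


def first_failing_layer(row: dict) -> Optional[str]:
    """Return the name of the first layer whose required fields are missing, else None.

    Layers are checked in order and checking stops at the first failure, so a
    downstream layer never receives a row that an upstream layer has rejected.
    """
    for layer, fields in LAYERS:
        if any(row.get(field) is None for field in fields):
            return layer
    return None
```

Because `identity` is checked first, a row missing both `record_uid` and `clause_class` is reported at the identity layer. The clause-role gap never surfaces until identity is repaired.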

Clause Representation Contract (field-level type)

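Every schema row is governed by the field-level type contract below. Each field exists because a measured failure mode requires it (the failure analysis is in Chapter 5, Section 5.4), so no field can be dropped from the contract: removing any one reintroduces the failure mode it was designed to prevent. Three fields (`term_lemma_set`, `primitive_assignments`, `relation_assignments`) are conditionally required, as their Requirement entries state; every other field is required on every row.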

Field	Requirement	Allowed values / types	Validation rule	Ambiguity handling
`record_uid`	Required; stable unique identity	String `sha1(source_clause_id + source_text)`, where `+` denotes concatenation	Unique across all rows; non-null	Identity collisions are treated as blocking defects
`source_clause_id`	Required; non-unique allowed	String (e.g. `07-01-17`)	Non-empty	Repeats are allowed only with a distinct `record_uid`
`source_text`	Required	Non-empty string	Length greater than zero after trimming	Text variants are retained as separate evidence rows
`clause_class`	Required	`design_requirement`, `rationale`, `applicable_to`, `other`	Enum membership	An uncertain class requires an explicit note and a `role_conflict` marker
`term_lemma_set`	Required for analytic rows	Array of lemma strings	Array exists; each lemma normalised	Uncertain lemmas are flagged in `ambiguity_profile`
`primitive_assignments`	Required for mapped rows	Array of `{term, primitive, rationale}`	Primitive in the approved set	An unresolved term takes the residual path rather than a forced assignment
`relation_assignments`	Required when a semantic relation is explicit	Array of canonical relation tags	Relation in the approved set	A disputed relation is marked with a confidence label and a note
`modality_profile`	Required	Object (`modal_terms`, `normative_force`)	Object present	Weak or ambiguous force classifications are recorded explicitly
`ambiguity_profile`	Required	Object (`polysemy_confidence`, `identity_conflict`, `role_conflict`, `notes`)	Object present with all four keys	Confidence labels are preserved from the source analyses
`residual_flag`	Required	Boolean	Boolean present	A true value requires a `residual_reason` and a planned disposition
`provenance_ref`	Required	Source file path plus location anchor	The trace path must resolve	Unresolved provenance is blocking for a verified claim
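The identity rules and two of the field rules can be sketched together. The identity rules are: a `record_uid` derived from `source_clause_id` and `source_text`, collisions treated as blocking defects, and repeated clause IDs with distinct text kept apart and flagged. The contract does not fix the concatenation separator, the text encoding, or the digest format. This sketch therefore assumes plain concatenation, UTF-8, and a hexadecimal SHA-1 digest. `build_rows`, `contract_errors`, and the input triple layout are hypothetical.

```python
import hashlib
from collections import defaultdict

CLAUSE_CLASSES = {"design_requirement", "rationale", "applicable_to", "other"}


def make_record_uid(source_clause_id: str, source_text: str) -> str:
    # Assumption: no separator, UTF-8 bytes, hex digest (the contract fixes none of these).
    return hashlib.sha1((source_clause_id + source_text).encode("utf-8")).hexdigest()


def build_rows(raw: list) -> list:
    """Build identity-layer rows from (source_clause_id, source_text, provenance_ref) triples."""
    rows, seen_uids = [], set()
    texts_by_clause_id = defaultdict(set)
    for clause_id, text, provenance in raw:
        if not clause_id or not text.strip():
            raise ValueError(f"empty source_clause_id or source_text: {clause_id!r}")
        uid = make_record_uid(clause_id, text)
        if uid in seen_uids:
            # Identity collisions are blocking defects, never silently merged.
            raise ValueError(f"record_uid collision for source_clause_id {clause_id!r}")
        seen_uids.add(uid)
        texts_by_clause_id[clause_id].add(text)
        rows.append({
            "record_uid": uid,
            "source_clause_id": clause_id,
            "source_text": text,
            "provenance_ref": provenance,
            "ambiguity_profile": {"identity_conflict": False},
        })
    # A clause ID that repeats with distinct text keeps separate rows, each flagged.
    for row in rows:
        if len(texts_by_clause_id[row["source_clause_id"]]) > 1:
            row["ambiguity_profile"]["identity_conflict"] = True
    return rows


def contract_errors(row: dict) -> list:
    """Check the clause_class enum and the residual rule; other fields are not covered."""
    errors = []
    if row.get("clause_class") not in CLAUSE_CLASSES:
        errors.append("clause_class missing or outside the enum")
    if not isinstance(row.get("residual_flag"), bool):
        errors.append("residual_flag must be a boolean")
    elif row["residual_flag"] and not row.get("residual_reason"):
        errors.append("residual_flag=true requires residual_reason")
    return errors
```

Under this derivation, an exact duplicate of a source row (same clause ID, same text) yields the same `record_uid`. It therefore surfaces as a blocking collision rather than as a second evidence row.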

Foundational Entity Inventory

The foundational inventory is the stratified vocabulary of spatial-semantic entities that remains after the ablation-justified removal of `zone`. It is organised into seven schematic primitives (irreducible conceptual baselines) and seven elaborated composites construed from them through the relation operators (Canonical Relation Set, below). The set is closed by empirical ablation rather than by definitional fiat: any extension requires a comparable ablation justification before a new term is admitted to the governed schema.

Schematic primitives

Primitive	Domain	Justification
`space`	Spatial entity	Core spatial concept; maps to IFC `IfcSpace` and BOT `bot:Space`
`boundary`	Spatial boundary	Containment and separation construct; maps to IFC `IfcRelSpaceBoundary`
`element`	Built component	Generic physical component; maps to IFC `IfcBuildingElement`
`quality`	Attribute	Dimensional, material, or performance attribute; thesis-specific (ablation: 16.4 per cent information loss on removal)
`activity`	Function	Occupant activity supported by an entity; thesis-specific (ablation: 30.3 per cent information loss on removal)
`context`	Governance	Environmental or regulatory scope qualifier; thesis-specific (ablation justified by coverage necessity)
`actor`	Participant	The human subject a requirement addresses; thesis-specific, made explicit under the re-stratification and inheriting the participant terms formerly recorded under `role` (ablation justified by coverage necessity)

Elaborated composites

Composite	Domain	Justification
`room`	Spatial entity	Named functional space; maps to an IFC `IfcSpace` subtype
`dwelling`	Spatial entity	Highest-level habitable unit; maps to IFC `IfcBuilding`
`opening`	Spatial boundary	Access point between bounded spaces; maps to IFC `IfcOpeningElement`
`path`	Circulation	Designed movement route; maps to IFC `IfcSpace` (circulation subtype)
`level`	Vertical position	Storey or vertical stratum; maps to IFC `IfcBuildingStorey`
`fixture`	Built component	Fixed service or fitting; maps to IFC `IfcFurnishingElement`
`role`	Function	Functional assignment of an `actor` and `activity` to an entity, expressed relationally through `serves_role`; thesis-specific (ablation justified by coverage necessity)
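The closed inventory can be combined with the mapping rule for `primitive_assignments`, as sketched below. Under that rule, targets outside the approved set are invalid, each assignment carries a rationale, and an unresolved term takes the residual path. The contract does not state whether an assignment may target a composite as well as a primitive; this sketch assumes it may. `classify_assignment` is a hypothetical helper.

```python
# Closed inventory after the ablation-justified removal of `zone`.
PRIMITIVES = frozenset({"space", "boundary", "element", "quality", "activity", "context", "actor"})
COMPOSITES = frozenset({"room", "dwelling", "opening", "path", "level", "fixture", "role"})
# Assumption: a mapping may target either stratum of the inventory.
APPROVED_TARGETS = PRIMITIVES | COMPOSITES


def classify_assignment(assignment: dict) -> str:
    """Classify one {term, primitive, rationale} assignment as valid, invalid, or residual."""
    target = assignment.get("primitive")
    if target is None:
        return "residual"  # no stable assignment: residual path, never a forced mapping
    if target not in APPROVED_TARGETS or not assignment.get("rationale"):
        return "invalid"  # outside the closed inventory, or no rationale tag
    return "valid"
```

The removed `zone` is rejected as `invalid`. A term with no stable target is routed to the residual path instead of being forced onto the nearest entity.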

The relation primitive of the earlier flat vocabulary is reconceived here as the relation-operator layer (the Canonical Relation Set, below) rather than as a standalone entity.

Canonical Relation Set

Applying to the relation set the same ablation discipline used for the entities leaves ten governed relation operators: two standard ontological relations and eight thesis-specific ones. Each is retained only because its removal produces measurable information loss in the corpus analysis, so the set is empirically bounded rather than theoretically open-ended.

Relation	Type	Symmetry	Domain	Range
`is_a`	Standard	Asymmetric	Any entity	Any entity
`part_of`	Standard	Asymmetric	Any entity	Any entity
`bounded_by`	Thesis-specific	Asymmetric	`space`, `room`, `dwelling`	`boundary`, `opening`
`opens_to`	Thesis-specific	Asymmetric	`opening`	`space`, `room`, `path`
`connects`	Thesis-specific	Symmetric	`space`, `room`, `path`	`space`, `room`, `path`
`located_at_level`	Thesis-specific	Asymmetric	`space`, `room`, `element`, `fixture`	`level`
`has_quality`	Thesis-specific	Asymmetric	Any entity	`quality`
`serves_role`	Thesis-specific	Asymmetric	`space`, `room`, `element`, `fixture`	`role`
`supports_activity`	Thesis-specific	Asymmetric	`space`, `room`, `element`, `fixture`	`activity`
`within_context`	Thesis-specific	Asymmetric	Any entity	`context`
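The relation table translates directly into typed signatures. The sketch below uses hypothetical names throughout. It checks a (subject type, relation, object type) triple against the Domain and Range columns. It adds the reverse of every `connects` edge, since `connects` is the only symmetric operator. It also reports asymmetric relations asserted in both directions. "Any entity" is represented as `None`, meaning no restriction.

```python
from typing import FrozenSet, NamedTuple, Optional


class Signature(NamedTuple):
    standard: bool                    # standard ontological vs thesis-specific
    symmetric: bool
    domain: Optional[FrozenSet[str]]  # None stands for "Any entity"
    range: Optional[FrozenSet[str]]


SPATIAL = frozenset({"space", "room", "path"})
HOSTS = frozenset({"space", "room", "element", "fixture"})

RELATIONS = {
    "is_a": Signature(True, False, None, None),
    "part_of": Signature(True, False, None, None),
    "bounded_by": Signature(False, False, frozenset({"space", "room", "dwelling"}),
                            frozenset({"boundary", "opening"})),
    "opens_to": Signature(False, False, frozenset({"opening"}), SPATIAL),
    "connects": Signature(False, True, SPATIAL, SPATIAL),
    "located_at_level": Signature(False, False, HOSTS, frozenset({"level"})),
    "has_quality": Signature(False, False, None, frozenset({"quality"})),
    "serves_role": Signature(False, False, HOSTS, frozenset({"role"})),
    "supports_activity": Signature(False, False, HOSTS, frozenset({"activity"})),
    "within_context": Signature(False, False, None, frozenset({"context"})),
}


def check_triple(subject_type: str, relation: str, object_type: str) -> bool:
    """True if the relation is canonical and both types fit its domain and range."""
    sig = RELATIONS.get(relation)
    if sig is None:
        return False  # not in the canonical relation set
    return ((sig.domain is None or subject_type in sig.domain)
            and (sig.range is None or object_type in sig.range))


def symmetric_closure(edges: set) -> set:
    """Add (o, r, s) for every asserted (s, r, o) whose relation is symmetric."""
    closed = set(edges)
    for s, r, o in edges:
        sig = RELATIONS.get(r)
        if sig is not None and sig.symmetric:
            closed.add((o, r, s))
    return closed


def asymmetry_violations(edges: set) -> set:
    """Edges of an asymmetric relation whose reverse is also asserted."""
    return {(s, r, o) for s, r, o in edges
            if r in RELATIONS and not RELATIONS[r].symmetric and (o, r, s) in edges}
```

`check_triple` works on entity types. The two closure helpers work on asserted edges between named instances, so instance endpoints would need to be typed before a domain and range check.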

Design Principles

Six principles govern how the schema is constructed.

Identity before interpretation. Clause identity is anchored to `record_uid`, not to `source_clause_id` alone, because duplicate source identifiers can carry distinct statements.
Role-aware representation. A clause class is mandatory because requirement, rationale, and applicability frames do not carry equivalent inferential force.
Ambiguity visibility. Term-level and row-level ambiguity indicators are first-class fields, not post-hoc commentary.
Primitive discipline. Mappings target the foundational primitive and relation sets rather than free-form labels.
Residual explicitness. Unresolved cases remain represented in the schema through residual flags that carry a reason and a planned disposition.
Handoff readiness. Each row preserves the information a downstream artefact needs, so that no reinterpretation is required at the boundary.

Taken together, the six principles hold completeness, transparency, and downstream usability as co-equal requirements rather than as trade-offs to be balanced. The corpus-scale figures below show the coverage achieved under those constraints.

Corpus Scale

Applied across the full SDA Design Standard, the schema governs 800 representations with no validation failure, which is the quantitative warrant that the field-level contract above is satisfiable over the entire instrument rather than only on selected clauses.

Metric	Value
Text clause records	611
Figure-derived design requirements	189
Total governed representations	800
Design categories covered	25
Validation failures against the schema contract	0
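The corpus-scale table can be recomputed from the governed rows themselves. This sketch assumes two hypothetical per-row fields that are not part of the field contract: `origin` (`"text"` or `"figure"`) and `category`. It accepts any validator that returns a list of errors per row. It also checks that the two record sources partition the corpus, as the table's 611 + 189 = 800 implies.

```python
from collections import Counter
from typing import Callable


def corpus_summary(rows: list, validate: Callable[[dict], list]) -> dict:
    """Recompute the corpus-scale metrics from governed rows.

    `origin` and `category` are hypothetical row fields used only for this tally.
    """
    origin_counts = Counter(row["origin"] for row in rows)
    summary = {
        "text_clause_records": origin_counts["text"],
        "figure_derived_requirements": origin_counts["figure"],
        "total_governed_representations": len(rows),
        "design_categories_covered": len({row["category"] for row in rows}),
        # A row counts as a failure if the validator returns any error for it.
        "validation_failures": sum(1 for row in rows if validate(row)),
    }
    # Text clauses and figure-derived requirements should together account for every row.
    if summary["text_clause_records"] + summary["figure_derived_requirements"] != len(rows):
        raise ValueError("rows with an origin other than 'text' or 'figure' are present")
    return summary
```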