Schema Architecture
The standardisation schema is structured as five governed layers, each encapsulating a distinct design concern and exposing only governed interfaces to the layers next to it. The layering is itself an enforcement mechanism: no downstream consumer can bypass the identity, role, or ambiguity conditions an upstream layer has established. The table reads from the most foundational layer (identity) to the most integrative (governance audit), and for each layer it states the concern it owns, the fields it requires, the rule that validates those fields, and the way it handles ambiguity rather than hiding it.
| Schema layer | Concern | Required fields | Validation rule | Ambiguity handling |
|---|---|---|---|---|
| Identity | Stabilise record identity across duplicated clause IDs and preserve source traceability | record_uid, source_clause_id, source_text, provenance_ref | record_uid unique in the dataset; source_text non-empty; source_clause_id may repeat | If a source_clause_id repeats with distinct text, the rows are kept separate and marked ambiguity_profile.identity_conflict=true |
| Clause role | Preserve the inferential class of each statement for downstream interpretation | clause_class, modality_profile | A missing or unknown class is invalid; defaulting is disallowed unless explicitly justified | Mixed or uncertain frames are marked ambiguity_profile.role_conflict=true with an explanatory note |
| Lexical-semantic | Represent term evidence and polysemy burden at row level | term_lemma_set, ambiguity_profile | Each mapped high-frequency lemma carries a confidence label | Polysemous terms retain their confidence label (strong, moderate, weak) and a method-evidence summary |
| Ontological mapping | Bind terms to primitives and relations under declared constraints | primitive_assignments, relation_assignments, residual_flag | Mappings outside the allowed primitive or relation sets are invalid; each assignment carries a rationale tag | Where no stable assignment exists, residual_flag is set true with a residual_reason and a review pathway |
| Governance audit | Keep artefact behaviour reproducible and reviewable across chapter boundaries | provenance_ref, residual_flag, notes | Every verified row traces to its source object and its chapter claim-bank identifier | Residuals are never suppressed; an unresolved row carries a documented disposition stage |
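The identity-layer ambiguity rule can be illustrated with a minimal sketch. It assumes rows are plain dictionaries keyed by the field names in the table; the function name mark_identity_conflicts and the choice to compare trimmed text are illustrative assumptions, not part of the governed schema.

```python
from collections import defaultdict


def mark_identity_conflicts(rows):
    """Flag rows whose source_clause_id recurs with distinct source_text.

    Rows are never merged or dropped; each keeps its own place and
    receives ambiguity_profile.identity_conflict = True or False.
    """
    # Collect the distinct (trimmed) texts seen under each clause ID.
    texts_by_clause = defaultdict(set)
    for row in rows:
        texts_by_clause[row["source_clause_id"]].add(row["source_text"].strip())

    # A clause ID with more than one distinct text is an identity conflict.
    for row in rows:
        conflict = len(texts_by_clause[row["source_clause_id"]]) > 1
        row.setdefault("ambiguity_profile", {})["identity_conflict"] = conflict
    return rows
```

Because the function only annotates, the number of rows returned always equals the number supplied, which mirrors the requirement that conflicting rows are kept separate.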
Clause Representation Contract (field-level type)
Every schema row is governed by the field-level type contract below. Each field exists because a measured failure mode requires it (the failure analysis is in Chapter 5, Section 5.4), so no field can be dropped from the contract, including those required only for particular row types: removing any one reintroduces the failure mode it was designed to prevent.
| Field | Requirement | Allowed values / types | Validation rule | Ambiguity handling |
|---|---|---|---|---|
| record_uid | Required; stable unique identity | sha1(source_clause_id concatenated with source_text) string | Unique across all rows; non-null | Identity collisions are treated as blocking defects |
| source_clause_id | Required; non-unique allowed | String (e.g. 07-01-17) | Non-empty | Repeats are allowed only with a distinct record_uid |
| source_text | Required | Non-empty string | Length greater than zero after trimming | Text variants are retained as separate evidence rows |
| clause_class | Required | design_requirement, rationale, applicable_to, other | Enum membership | An uncertain class requires an explicit note and a role_conflict marker |
| term_lemma_set | Required for analytic rows | Array of lemma strings | Array exists; each lemma normalised | Uncertain lemmas are flagged in ambiguity_profile |
| primitive_assignments | Required for mapped rows | Array of {term, primitive, rationale} | Primitive in the approved set | An unresolved term takes the residual path rather than a forced assignment |
| relation_assignments | Required when a semantic relation is explicit | Array of canonical relation tags | Relation in the approved set | A disputed relation is marked with a confidence label and a note |
| modality_profile | Required | Object (modal_terms, normative_force) | Object present | Weak or ambiguous force classifications are recorded explicitly |
| ambiguity_profile | Required | Object (polysemy_confidence, identity_conflict, role_conflict, notes) | Object present with keys | Confidence labels are preserved from the source analyses |
| residual_flag | Required | Boolean | Boolean present | A true value requires a residual_reason and a planned disposition |
| provenance_ref | Required | Source file path plus location anchor | The trace path must resolve | Unresolved provenance is blocking for a verified claim |
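A minimal sketch of how the unconditional parts of this contract might be checked is given here. Several details are assumptions rather than statements of the thesis implementation: the source does not specify a separator or encoding for the sha1 input, so direct concatenation with UTF-8 is assumed; the helper names make_record_uid, validate_row, and duplicate_uids are hypothetical; and because the contract does not name the field that holds the planned disposition, only residual_reason is checked. Conditional fields (term_lemma_set, primitive_assignments, relation_assignments) and provenance resolution are left out.

```python
import hashlib
from collections import Counter

CLAUSE_CLASSES = {"design_requirement", "rationale", "applicable_to", "other"}
AMBIGUITY_KEYS = {"polysemy_confidence", "identity_conflict", "role_conflict", "notes"}


def make_record_uid(source_clause_id, source_text):
    # Assumption: direct concatenation, UTF-8 encoded, hex digest.
    return hashlib.sha1((source_clause_id + source_text).encode("utf-8")).hexdigest()


def validate_row(row):
    """Return the list of contract violations for one row (empty means valid)."""
    errors = []
    if not row.get("record_uid"):
        errors.append("record_uid missing")
    if not row.get("source_clause_id"):
        errors.append("source_clause_id empty")
    if not str(row.get("source_text", "")).strip():
        errors.append("source_text empty after trimming")
    if row.get("clause_class") not in CLAUSE_CLASSES:
        errors.append("clause_class not in enum")
    if not isinstance(row.get("modality_profile"), dict):
        errors.append("modality_profile missing")
    profile = row.get("ambiguity_profile")
    if not isinstance(profile, dict) or not AMBIGUITY_KEYS.issubset(profile):
        errors.append("ambiguity_profile missing keys")
    flag = row.get("residual_flag")
    if not isinstance(flag, bool):
        errors.append("residual_flag must be boolean")
    elif flag and not row.get("residual_reason"):
        errors.append("residual_flag true without residual_reason")
    if not row.get("provenance_ref"):
        errors.append("provenance_ref missing")
    return errors


def duplicate_uids(rows):
    """Dataset-level check: record_uid values that occur more than once."""
    counts = Counter(row["record_uid"] for row in rows)
    return [uid for uid, n in counts.items() if n > 1]
```

Returning a list of violations rather than raising on the first one keeps every defect visible in a single pass, which is in keeping with the schema's preference for explicit over suppressed problems.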
Foundational Entity Inventory
The foundational inventory is the stratified vocabulary of spatial-semantic entities that remains after the ablation-justified removal of zone. It is organised into seven schematic primitives — irreducible conceptual baselines — and seven elaborated composites construed from them through the relation operators (Canonical Relation Set, below). The set is closed by empirical ablation rather than by definitional fiat: any extension requires a comparable ablation justification before a new term is admitted to the governed schema.
Schematic primitives
| Primitive | Domain | Justification |
|---|---|---|
| space | Spatial entity | Core spatial concept; maps to IFC IfcSpace and BOT bot:Space |
| boundary | Spatial boundary | Containment and separation construct; maps to IFC IfcRelSpaceBoundary |
| element | Built component | Generic physical component; maps to IFC IfcBuildingElement |
| quality | Attribute | Dimensional, material, or performance attribute; thesis-specific (ablation: 16.4 per cent information loss on removal) |
| activity | Function | Occupant activity supported by an entity; thesis-specific (ablation: 30.3 per cent information loss on removal) |
| context | Governance | Environmental or regulatory scope qualifier; thesis-specific (ablation justified by coverage necessity) |
| actor | Participant | The human subject a requirement addresses; thesis-specific, made explicit under the re-stratification and inheriting the participant terms formerly recorded under role (ablation justified by coverage necessity) |
Elaborated composites
| Composite | Domain | Justification |
|---|---|---|
| room | Spatial entity | Named functional space; maps to an IFC IfcSpace subtype |
| dwelling | Spatial entity | Highest-level habitable unit; maps to IFC IfcBuilding |
| opening | Spatial boundary | Access point between bounded spaces; maps to IFC IfcOpeningElement |
| path | Circulation | Designed movement route; maps to IFC IfcSpace (circulation subtype) |
| level | Vertical position | Storey or vertical stratum; maps to IFC IfcBuildingStorey |
| fixture | Built component | Fixed service or fitting; maps to IFC IfcFurnishingElement |
| role | Function | Functional assignment of an actor and activity to an entity, expressed relationally through serves_role; thesis-specific (ablation justified by coverage necessity) |
What the flat vocabulary recorded as a relation primitive is reconceived here as the relation-operator layer (the Canonical Relation Set below) rather than as a standalone entity.
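The closed character of the inventory can be illustrated with a short sketch in which a candidate label is either admitted or routed to the residual path, never forced. It assumes that the approved set for primitive_assignments is the full inventory of primitives and composites; the function name assign_entity and the wording of the residual reason are hypothetical.

```python
PRIMITIVES = frozenset(
    {"space", "boundary", "element", "quality", "activity", "context", "actor"}
)
COMPOSITES = frozenset(
    {"room", "dwelling", "opening", "path", "level", "fixture", "role"}
)
# Assumption: assignments may target either stratum of the inventory.
ENTITY_INVENTORY = PRIMITIVES | COMPOSITES


def assign_entity(term, candidate, rationale):
    """Return (assignment, residual); exactly one of the two is None."""
    if candidate in ENTITY_INVENTORY:
        assignment = {"term": term, "primitive": candidate, "rationale": rationale}
        return assignment, None
    # Labels outside the governed set are recorded as residuals, not coerced.
    residual = {
        "residual_flag": True,
        "residual_reason": f"no governed entity '{candidate}' for term '{term}'",
    }
    return None, residual
```

Under this sketch a term proposed for the removed entity zone yields a residual record rather than an assignment, so admitting zone again would require changing the inventory itself, which is where the ablation justification applies.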
Canonical Relation Set
Ten governed relation operators remain after applying to the relation set the same ablation discipline used for the entities — two standard ontological relations and eight thesis-specific ones. Each is retained only because its removal produces measurable information loss in the corpus analysis, so the set is empirically bounded rather than theoretically open-ended.
| Relation | Type | Symmetry | Domain | Range |
|---|---|---|---|---|
| is_a | Standard | Asymmetric | Any entity | Any entity |
| part_of | Standard | Asymmetric | Any entity | Any entity |
| bounded_by | Thesis-specific | Asymmetric | space, room, dwelling | boundary, opening |
| opens_to | Thesis-specific | Asymmetric | opening | space, room, path |
| connects | Thesis-specific | Symmetric | space, room, path | space, room, path |
| located_at_level | Thesis-specific | Asymmetric | space, room, element, fixture | level |
| has_quality | Thesis-specific | Asymmetric | Any entity | quality |
| serves_role | Thesis-specific | Asymmetric | space, room, element, fixture | role |
| supports_activity | Thesis-specific | Asymmetric | space, room, element, fixture | activity |
| within_context | Thesis-specific | Asymmetric | Any entity | context |
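The domain, range, and symmetry columns lend themselves to a mechanical check. The sketch below encodes the table as a lookup and tests a proposed triple against it. It assumes that "Any entity" means any member of the fourteen-entity inventory; the names RELATIONS, check_relation, and canonical_triple, and the decision to store a symmetric relation in sorted order, are illustrative choices rather than part of the schema.

```python
ENTITIES = frozenset({
    "space", "boundary", "element", "quality", "activity", "context", "actor",
    "room", "dwelling", "opening", "path", "level", "fixture", "role",
})
PLACES = frozenset({"space", "room", "path"})
HOSTS = frozenset({"space", "room", "element", "fixture"})

# relation -> (domain, range, symmetric)
RELATIONS = {
    "is_a": (ENTITIES, ENTITIES, False),
    "part_of": (ENTITIES, ENTITIES, False),
    "bounded_by": (frozenset({"space", "room", "dwelling"}),
                   frozenset({"boundary", "opening"}), False),
    "opens_to": (frozenset({"opening"}), PLACES, False),
    "connects": (PLACES, PLACES, True),
    "located_at_level": (HOSTS, frozenset({"level"}), False),
    "has_quality": (ENTITIES, frozenset({"quality"}), False),
    "serves_role": (HOSTS, frozenset({"role"}), False),
    "supports_activity": (HOSTS, frozenset({"activity"}), False),
    "within_context": (ENTITIES, frozenset({"context"}), False),
}


def check_relation(relation, subject, obj):
    """True when the relation is governed and subject/object fit its domain/range."""
    if relation not in RELATIONS:
        return False
    domain, range_, _ = RELATIONS[relation]
    return subject in domain and obj in range_


def canonical_triple(relation, subject, obj):
    """Order the arguments of a symmetric relation so both directions compare equal."""
    _, _, symmetric = RELATIONS[relation]
    if symmetric:
        subject, obj = sorted((subject, obj))
    return (relation, subject, obj)
```

For example, check_relation("opens_to", "room", "space") is False because only opening is in the domain of opens_to, whereas connects between room and path is accepted in either direction and reduces to a single canonical triple.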
Design Principles
Six principles govern how the schema is constructed.
- Identity before interpretation. Clause identity is anchored to record_uid, not to source_clause_id alone, because duplicate source identifiers can carry distinct statements.
- Role-aware representation. A clause class is mandatory because requirement, rationale, and applicability frames do not carry equivalent inferential force.
- Ambiguity visibility. Term-level and row-level ambiguity indicators are first-class fields, not post-hoc commentary.
- Primitive discipline. Mappings target the foundational primitive and relation sets rather than free-form labels.
- Residual explicitness. Unresolved cases remain represented, through residual flags carrying a reason and a planned disposition.
- Handoff readiness. Each row preserves the information a downstream artefact needs, so that no reinterpretation is required at the boundary.
Taken together, the six principles hold completeness, transparency, and downstream usability as co-equal requirements rather than as trade-offs to be balanced. The corpus-scale figures below show the coverage achieved under those constraints.
Corpus Scale
Applied across the full SDA Design Standard, the schema governs 800 representations with no validation failure, which is the quantitative warrant that the field-level contract above is satisfiable over the entire instrument rather than only on selected clauses.
| Metric | Value |
|---|---|
| Text clause records | 611 |
| Figure-derived design requirements | 189 |
| Total governed representations | 800 |
| Design categories covered | 25 |
| Validation failures against the schema contract | 0 |