modumatics Modular Infrastructure for Inclusive Housing Tran Thien Toan Ngo · PhD Dissertation

Schema Architecture

The standardisation schema is structured as five governed layers. Each layer encapsulates a distinct design concern and exposes only governed interfaces to the adjacent layers. The layering is itself an enforcement mechanism: no downstream consumer can bypass the identity, role, or ambiguity conditions that an upstream layer has established. The table reads from the most foundational layer (identity) to the most integrative (governance audit). For each layer it states the concern the layer owns, the fields it requires, the rule that validates those fields, and how the layer records ambiguity rather than hiding it.

| Schema layer | Concern | Required fields | Validation rule | Ambiguity handling |
|---|---|---|---|---|
| Identity | Stabilise record identity across duplicated clause IDs and preserve source traceability | record_uid, source_clause_id, source_text, provenance_ref | record_uid unique in the dataset; source_text non-empty; source_clause_id may repeat | If a source_clause_id repeats with distinct text, the rows are kept separate and marked ambiguity_profile.identity_conflict=true |
| Clause role | Preserve the inferential class of each statement for downstream interpretation | clause_class, modality_profile | A missing or unknown class is invalid; defaulting is disallowed unless explicitly justified | Mixed or uncertain frames are marked ambiguity_profile.role_conflict=true with an explanatory note |
| Lexical-semantic | Represent term evidence and polysemy burden at row level | term_lemma_set, ambiguity_profile | Each mapped high-frequency lemma carries a confidence label | Polysemous terms retain their confidence label (strong, moderate, weak) and a method-evidence summary |
| Ontological mapping | Bind terms to primitives and relations under declared constraints | primitive_assignments, relation_assignments, residual_flag | Mappings outside the allowed primitive or relation sets are invalid; each assignment carries a rationale tag | Where no stable assignment exists, residual_flag is set true with a residual_reason and a review pathway |
| Governance audit | Keep artefact behaviour reproducible and reviewable across chapter boundaries | provenance_ref, residual_flag, notes | Every verified row traces to its source object and its chapter claim-bank identifier | Residuals are never suppressed; an unresolved row carries a documented disposition stage |
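The identity-layer rule in the table (repeated source_clause_id values with distinct text are kept as separate rows and marked) can be illustrated with a minimal sketch. The function name `mark_identity_conflicts` and the dictionary-based row shape are assumptions made for illustration, not the dissertation's implementation.

```python
from collections import defaultdict


def mark_identity_conflicts(rows):
    """Flag rows whose source_clause_id is shared by rows with different text.

    Hypothetical helper: rows are plain dicts carrying at least
    'source_clause_id' and 'source_text'. No row is merged or dropped.
    """
    # Collect the distinct (trimmed) texts seen under each clause id.
    texts_by_clause = defaultdict(set)
    for row in rows:
        texts_by_clause[row["source_clause_id"]].add(row["source_text"].strip())

    # A clause id with more than one distinct text is an identity conflict.
    for row in rows:
        conflict = len(texts_by_clause[row["source_clause_id"]]) > 1
        row.setdefault("ambiguity_profile", {})["identity_conflict"] = conflict
    return rows
```

Rows that repeat a clause id with identical text are not flagged by this sketch; whether such exact duplicates should be treated differently is not stated in the table.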

Clause Representation Contract (field-level type)

Every schema row is governed by the field-level type contract below. Each field exists because a measured failure mode requires it (the failure analysis is in Chapter 5, Section 5.4), so no field can be dropped from the contract: removing any one reintroduces the failure mode it was designed to prevent. Three fields are required conditionally, as the Requirement column states.

| Field | Requirement | Allowed values / types | Validation rule | Ambiguity handling |
|---|---|---|---|---|
| record_uid | Required; stable unique identity | sha1(source_clause_id concatenated with source_text) string | Unique across all rows; non-null | Identity collisions are treated as blocking defects |
| source_clause_id | Required; non-unique allowed | String (e.g. 07-01-17) | Non-empty | Repeats are allowed only with a distinct record_uid |
| source_text | Required | Non-empty string | Length greater than zero after trimming | Text variants are retained as separate evidence rows |
| clause_class | Required | design_requirement, rationale, applicable_to, other | Enum membership | An uncertain class requires an explicit note and a role_conflict marker |
| term_lemma_set | Required for analytic rows | Array of lemma strings | Array exists; each lemma normalised | Uncertain lemmas are flagged in ambiguity_profile |
| primitive_assignments | Required for mapped rows | Array of {term, primitive, rationale} | Primitive in the approved set | An unresolved term takes the residual path rather than a forced assignment |
| relation_assignments | Required when a semantic relation is explicit | Array of canonical relation tags | Relation in the approved set | A disputed relation is marked with a confidence label and a note |
| modality_profile | Required | Object (modal_terms, normative_force) | Object present | Weak or ambiguous force classifications are recorded explicitly |
| ambiguity_profile | Required | Object (polysemy_confidence, identity_conflict, role_conflict, notes) | Object present with keys | Confidence labels are preserved from the source analyses |
| residual_flag | Required | Boolean | Boolean present | A true value requires a residual_reason and a planned disposition |
| provenance_ref | Required | Source file path plus location anchor | The trace path must resolve | Unresolved provenance is blocking for a verified claim |
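A minimal sketch of how the unconditional parts of this contract might be checked is given here. It assumes that record_uid is the hexadecimal SHA-1 digest of the two strings concatenated directly and encoded as UTF-8; the table does not specify a separator or an encoding. The function names and the dictionary row shape are hypothetical, and the conditionally required fields, the resolution of provenance_ref, and the planned disposition (whose field name the table does not give) are left out.

```python
import hashlib
from collections import Counter

# Enum values taken from the clause_class row of the contract.
CLAUSE_CLASSES = {"design_requirement", "rationale", "applicable_to", "other"}


def make_record_uid(source_clause_id: str, source_text: str) -> str:
    # Assumption: direct concatenation, UTF-8 bytes, hex digest.
    joined = source_clause_id + source_text
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def validate_row(row: dict) -> list[str]:
    """Return a list of contract violations for one row (empty if none)."""
    errors = []
    if not row.get("record_uid"):
        errors.append("record_uid is null or empty")
    if not row.get("source_clause_id"):
        errors.append("source_clause_id is empty")
    if not str(row.get("source_text") or "").strip():
        errors.append("source_text is empty after trimming")
    if row.get("clause_class") not in CLAUSE_CLASSES:
        errors.append("clause_class is not an allowed enum value")
    for key in ("modality_profile", "ambiguity_profile"):
        if not isinstance(row.get(key), dict):
            errors.append(f"{key} object is missing")
    if not isinstance(row.get("residual_flag"), bool):
        errors.append("residual_flag must be a boolean")
    elif row["residual_flag"] and not row.get("residual_reason"):
        errors.append("residual_flag is true but residual_reason is missing")
    if not row.get("provenance_ref"):
        errors.append("provenance_ref is missing")
    return errors


def duplicate_uids(rows: list[dict]) -> list[str]:
    """Return record_uid values that occur more than once (blocking defects)."""
    counts = Counter(row.get("record_uid") for row in rows)
    return [uid for uid, n in counts.items() if uid is not None and n > 1]
```

Returning a list of violations rather than raising on the first one keeps every defect in a row visible at once, which is in keeping with the contract's emphasis on explicit rather than suppressed problems.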

Foundational Entity Inventory

The foundational inventory is the stratified vocabulary of spatial-semantic entities that remains after the ablation-justified removal of zone. It is organised into seven schematic primitives — irreducible conceptual baselines — and seven elaborated composites construed from them through the relation operators (Canonical Relation Set, below). The set is closed by empirical ablation rather than by definitional fiat: any extension requires a comparable ablation justification before a new term is admitted to the governed schema.

Schematic primitives

| Primitive | Domain | Justification |
|---|---|---|
| space | Spatial entity | Core spatial concept; maps to IFC IfcSpace and BOT bot:Space |
| boundary | Spatial boundary | Containment and separation construct; maps to IFC IfcRelSpaceBoundary |
| element | Built component | Generic physical component; maps to IFC IfcBuildingElement |
| quality | Attribute | Dimensional, material, or performance attribute; thesis-specific (ablation: 16.4 per cent information loss on removal) |
| activity | Function | Occupant activity supported by an entity; thesis-specific (ablation: 30.3 per cent information loss on removal) |
| context | Governance | Environmental or regulatory scope qualifier; thesis-specific (ablation justified by coverage necessity) |
| actor | Participant | The human subject a requirement addresses; thesis-specific, made explicit under the re-stratification and inheriting the participant terms formerly recorded under role (ablation justified by coverage necessity) |

Elaborated composites

| Composite | Domain | Justification |
|---|---|---|
| room | Spatial entity | Named functional space; maps to an IFC IfcSpace subtype |
| dwelling | Spatial entity | Highest-level habitable unit; maps to IFC IfcBuilding |
| opening | Spatial boundary | Access point between bounded spaces; maps to IFC IfcOpeningElement |
| path | Circulation | Designed movement route; maps to IFC IfcSpace (circulation subtype) |
| level | Vertical position | Storey or vertical stratum; maps to IFC IfcBuildingStorey |
| fixture | Built component | Fixed service or fitting; maps to IFC IfcFurnishingElement |
| role | Function | Functional assignment of an actor and activity to an entity, expressed relationally through serves_role; thesis-specific (ablation justified by coverage necessity) |

What the flat vocabulary recorded as a relation primitive is reconceived here as the relation-operator layer (the Canonical Relation Set below) rather than as a standalone entity.
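The closed inventory and the residual path can be sketched together as follows. The sketch assumes that the "approved set" for primitive_assignments is the union of the seven primitives and the seven composites, which the text does not state outright. The function `map_terms`, the example terms, and the wording of residual_reason are all hypothetical.

```python
# The two strata of the foundational inventory, as listed in the tables.
PRIMITIVES = frozenset(
    {"space", "boundary", "element", "quality", "activity", "context", "actor"}
)
COMPOSITES = frozenset(
    {"room", "dwelling", "opening", "path", "level", "fixture", "role"}
)
# Assumption: a mapping may target either stratum.
ENTITY_INVENTORY = PRIMITIVES | COMPOSITES


def map_terms(candidates):
    """Build assignment fields from (term, candidate_entity, rationale) triples.

    A candidate outside the inventory is never forced into it; the term is
    routed to the residual path with a stated reason instead.
    """
    assignments, reasons = [], []
    for term, entity, rationale in candidates:
        if entity in ENTITY_INVENTORY:
            assignments.append(
                {"term": term, "primitive": entity, "rationale": rationale}
            )
        else:
            reasons.append(
                f"no stable assignment for '{term}' "
                f"(candidate '{entity}' not in inventory)"
            )
    return {
        "primitive_assignments": assignments,
        "residual_flag": bool(reasons),
        "residual_reason": "; ".join(reasons) or None,
    }
```

For example, a term whose only candidate is the removed entity zone would come back with residual_flag set to true rather than being attached to the nearest surviving entity.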

Canonical Relation Set

Applying the same ablation discipline to the relation set as to the entities leaves ten governed relation operators: two standard ontological relations and eight thesis-specific ones. Each is retained only because its removal produces measurable information loss in the corpus analysis, so the set is empirically bounded rather than theoretically open-ended.

| Relation | Type | Symmetry | Domain | Range |
|---|---|---|---|---|
| is_a | Standard | Asymmetric | Any entity | Any entity |
| part_of | Standard | Asymmetric | Any entity | Any entity |
| bounded_by | Thesis-specific | Asymmetric | space, room, dwelling | boundary, opening |
| opens_to | Thesis-specific | Asymmetric | opening | space, room, path |
| connects | Thesis-specific | Symmetric | space, room, path | space, room, path |
| located_at_level | Thesis-specific | Asymmetric | space, room, element, fixture | level |
| has_quality | Thesis-specific | Asymmetric | Any entity | quality |
| serves_role | Thesis-specific | Asymmetric | space, room, element, fixture | role |
| supports_activity | Thesis-specific | Asymmetric | space, room, element, fixture | activity |
| within_context | Thesis-specific | Asymmetric | Any entity | context |
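The domain and range constraints in the table lend themselves to a lookup-based check, sketched here. The sketch assumes that "Any entity" means any of the fourteen entities in the foundational inventory. The names `check_relation` and `normalise` are hypothetical.

```python
# Assumption: "Any entity" ranges over the fourteen inventory entities.
ENTITIES = frozenset({
    "space", "boundary", "element", "quality", "activity", "context", "actor",
    "room", "dwelling", "opening", "path", "level", "fixture", "role",
})
ANY = ENTITIES
SPATIAL = frozenset({"space", "room", "path"})
PLACED = frozenset({"space", "room", "element", "fixture"})

# relation -> (domain, range, symmetric), transcribed from the table.
RELATIONS = {
    "is_a": (ANY, ANY, False),
    "part_of": (ANY, ANY, False),
    "bounded_by": (frozenset({"space", "room", "dwelling"}),
                   frozenset({"boundary", "opening"}), False),
    "opens_to": (frozenset({"opening"}), SPATIAL, False),
    "connects": (SPATIAL, SPATIAL, True),
    "located_at_level": (PLACED, frozenset({"level"}), False),
    "has_quality": (ANY, frozenset({"quality"}), False),
    "serves_role": (PLACED, frozenset({"role"}), False),
    "supports_activity": (PLACED, frozenset({"activity"}), False),
    "within_context": (ANY, frozenset({"context"}), False),
}


def check_relation(relation: str, subject: str, obj: str) -> bool:
    """True if the relation is canonical and its endpoints fit domain and range."""
    if relation not in RELATIONS:
        return False
    domain, range_, _ = RELATIONS[relation]
    return subject in domain and obj in range_


def normalise(relation: str, subject: str, obj: str) -> tuple:
    """Order the endpoints of a symmetric relation so both directions compare equal."""
    if relation in RELATIONS and RELATIONS[relation][2]:
        subject, obj = sorted((subject, obj))
    return (relation, subject, obj)
```

Under these assumptions `check_relation("opens_to", "room", "space")` is rejected because only an opening may be the subject of opens_to, while `connects` accepts either ordering of its endpoints.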

Design Principles

Six principles govern how the schema is constructed.

  1. Identity before interpretation. Clause identity is anchored to record_uid, not to source_clause_id alone, because duplicate source identifiers can carry distinct statements.
  2. Role-aware representation. A clause class is mandatory because requirement, rationale, and applicability frames do not carry equivalent inferential force.
  3. Ambiguity visibility. Term-level and row-level ambiguity indicators are first-class fields, not post-hoc commentary.
  4. Primitive discipline. Mappings target the foundational primitive and relation sets rather than free-form labels.
  5. Residual explicitness. Unresolved cases remain represented, through residual flags carrying a reason and a planned disposition.
  6. Handoff readiness. Each row preserves the information a downstream artefact needs, so that no reinterpretation is required at the boundary.

Taken together, the six principles hold completeness, transparency, and downstream usability as co-equal requirements rather than as trade-offs to be balanced. The corpus-scale figures below show the coverage achieved under those constraints.

Corpus Scale

Applied across the full SDA Design Standard, the schema governs 800 representations (611 text clause records plus 189 figure-derived design requirements) with no validation failures. This is the quantitative warrant that the field-level contract above is satisfiable over the entire instrument rather than only on selected clauses.

| Metric | Value |
|---|---|
| Text clause records | 611 |
| Figure-derived design requirements | 189 |
| Total governed representations | 800 |
| Design categories covered | 25 |
| Validation failures against the schema contract | 0 |