modumatics Modular Infrastructure for Inclusive Housing Tran Thien Toan Ngo · PhD Dissertation

Evaluation Workbench

This workbench consolidates the evaluation measure specifications and evaluation question contracts that govern the empirical portion of the thesis. Each measure is defined by its metric, baseline, unit, and threshold, so that the Chapter 10 results tables can be read against pre-declared expectations. Environment-derived requirements are operationalised here as measurable evaluation contracts, each of which the Chapter 10 demonstration evidence can confirm or refute. Full traceability is recorded in the Requirements–Design–Evaluation Traceability Matrix.

Evaluation Measure Specifications

Seven evaluation measures are operationalised across the thesis. Measures EM-4W-01 to EM-4W-03 are defined and pre-registered in Chapter 4; measures EM-09-01 to EM-09-04 are executed and reported in Chapter 10.

| Measure ID | Chapter | Metric Definition | Baseline Definition | Unit | Threshold | Data Source |
|---|---|---|---|---|---|---|
| EM-4W-01 | Ch4w | Interpretation divergence for matched regulatory obligations | Divergence under current synchronous artefact workflow | Divergence rate | Lower than baseline with practical significance | Annotation sheets; protocol logs |
| EM-4W-02 | Ch4w | Local-to-global check scope ratio per transformation event | Scope ratio in baseline non-modular workflow | Ratio | Local scope majority in bounded edits | Change-trace logs |
| EM-4W-03 | Ch4w | Rule-compliant variation yield with exception budget | Variation yield under ungoverned complements | Compliance proportion | Meets pre-registered budget and rationale coverage | Generation logs; exception register |
| EM-09-01 | Ch9 | Standards interpretability trace completeness | Baseline trace completeness | Index | Improved completeness over baseline | Ch9 case outputs |
| EM-09-02 | Ch9 | Modular-fit evidence with bounded verification signals | Baseline modular-fit without interface contracts | Composite score | Positive bounded-check trend over baseline | Ch9 results tables |
| EM-09-03 | Ch9 | Workflow burden delta across time, cognitive, and skill proxies | Baseline burden profile for matched tasks | Delta | Net reduction with stated confidence limits | Ch9 workflow outputs |
| EM-09-04 | Ch9 | Exception governance quality in discussion synthesis | Baseline exception handling quality | Quality score | Full typing and justification coverage | Ch9 discussion evidence |
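
The measure specifications can also be held as structured records, which makes the "defined at the metric, baseline, unit, and threshold level" property mechanically checkable. The following is a minimal sketch, not part of the thesis tooling: the `MeasureSpec` class and the `incomplete_fields` and `validate` helpers are hypothetical names introduced for illustration, and only the field values are taken from the table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MeasureSpec:
    """One row of the measure specification table (hypothetical structure)."""
    measure_id: str
    chapter: str
    metric: str
    baseline: str
    unit: str
    threshold: str
    data_source: str


# Field values copied from the specification table.
MEASURES = [
    MeasureSpec("EM-4W-01", "Ch4w",
                "Interpretation divergence for matched regulatory obligations",
                "Divergence under current synchronous artefact workflow",
                "Divergence rate",
                "Lower than baseline with practical significance",
                "Annotation sheets; protocol logs"),
    MeasureSpec("EM-4W-02", "Ch4w",
                "Local-to-global check scope ratio per transformation event",
                "Scope ratio in baseline non-modular workflow",
                "Ratio",
                "Local scope majority in bounded edits",
                "Change-trace logs"),
    MeasureSpec("EM-4W-03", "Ch4w",
                "Rule-compliant variation yield with exception budget",
                "Variation yield under ungoverned complements",
                "Compliance proportion",
                "Meets pre-registered budget and rationale coverage",
                "Generation logs; exception register"),
    MeasureSpec("EM-09-01", "Ch9",
                "Standards interpretability trace completeness",
                "Baseline trace completeness",
                "Index",
                "Improved completeness over baseline",
                "Ch9 case outputs"),
    MeasureSpec("EM-09-02", "Ch9",
                "Modular-fit evidence with bounded verification signals",
                "Baseline modular-fit without interface contracts",
                "Composite score",
                "Positive bounded-check trend over baseline",
                "Ch9 results tables"),
    MeasureSpec("EM-09-03", "Ch9",
                "Workflow burden delta across time, cognitive, and skill proxies",
                "Baseline burden profile for matched tasks",
                "Delta",
                "Net reduction with stated confidence limits",
                "Ch9 workflow outputs"),
    MeasureSpec("EM-09-04", "Ch9",
                "Exception governance quality in discussion synthesis",
                "Baseline exception handling quality",
                "Quality score",
                "Full typing and justification coverage",
                "Ch9 discussion evidence"),
]


def incomplete_fields(spec: MeasureSpec) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


def validate(measures: list[MeasureSpec]) -> dict[str, list[str]]:
    """Map each incompletely specified measure ID to its missing fields."""
    ids = [m.measure_id for m in measures]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate measure IDs")
    return {m.measure_id: incomplete_fields(m)
            for m in measures if incomplete_fields(m)}
```

Under these assumptions, `validate(MEASURES)` returns an empty mapping when every measure carries all seven attributes, and it names the missing fields otherwise.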

Evaluation Question Contract

The five evaluation questions below organise the measures into clusters that correspond to the propositions tested in the thesis. Each evaluation question maps to one primary proposition (EQ-01↔︎P1; EQ-02↔︎P2; EQ-03↔︎P3; EQ-04↔︎P4; EQ-05↔︎P5), following the property↔︎proposition↔︎EQ alignment established in Chapter 2 §2.9, and is linked to the environment requirements it addresses and the measures that supply its evidence. Together, the seven measures cover all five evaluation questions, and every measure that addresses a requirement is observable in the demonstration evidence. The contracts documented here therefore establish the interpretive framework within which the Chapter 10 results tables are to be read.

| Evaluation Question | Requirement IDs | Measure IDs | Expected Chapter Output |
|---|---|---|---|
| EQ-01 | ER-01, ER-04 | EM-4W-01, EM-09-01 | Standards interpretability evidence |
| EQ-02 | ER-02, ER-05 | EM-4W-02, EM-09-02 | Modular-fit and bounded-check evidence |
| EQ-03 | ER-01, ER-02 | EM-09-01, EM-09-02 | Round-trip replay + invariant-preservation evidence (EVID-P3-REPLAY, EVID-P3-INVARIANTS) |
| EQ-04 | ER-06 | EM-4W-03, EM-09-04 | Governed variation and exception evidence |
| EQ-05 | ER-03, ER-05 | EM-09-03 | Workflow burden comparison evidence (integrated utility) |
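
The coverage claim made for this contract (that the seven measures span all five evaluation questions) can be checked directly against the table. The sketch below is illustrative only: the `CONTRACTS` mapping, the `DECLARED_MEASURES` set, and the three helper functions are hypothetical names, and the identifiers are copied from the table rather than from the Requirements Register itself.

```python
# Evaluation question contracts, transcribed from the contract table.
CONTRACTS = {
    "EQ-01": {"requirements": ["ER-01", "ER-04"],
              "measures": ["EM-4W-01", "EM-09-01"]},
    "EQ-02": {"requirements": ["ER-02", "ER-05"],
              "measures": ["EM-4W-02", "EM-09-02"]},
    "EQ-03": {"requirements": ["ER-01", "ER-02"],
              "measures": ["EM-09-01", "EM-09-02"]},
    "EQ-04": {"requirements": ["ER-06"],
              "measures": ["EM-4W-03", "EM-09-04"]},
    "EQ-05": {"requirements": ["ER-03", "ER-05"],
              "measures": ["EM-09-03"]},
}

# The seven measure IDs declared in the measure specification table.
DECLARED_MEASURES = {
    "EM-4W-01", "EM-4W-02", "EM-4W-03",
    "EM-09-01", "EM-09-02", "EM-09-03", "EM-09-04",
}


def measures_in_contracts(contracts: dict) -> set[str]:
    """Collect every measure ID referenced by at least one evaluation question."""
    return {m for c in contracts.values() for m in c["measures"]}


def requirements_in_contracts(contracts: dict) -> set[str]:
    """Collect every requirement ID referenced by at least one evaluation question."""
    return {r for c in contracts.values() for r in c["requirements"]}


def coverage_gaps(contracts: dict, declared: set[str]) -> dict[str, set[str]]:
    """Report measures that are declared but unused, or used but undeclared."""
    used = measures_in_contracts(contracts)
    return {"unused": declared - used, "undeclared": used - declared}
```

With the table as transcribed, `coverage_gaps(CONTRACTS, DECLARED_MEASURES)` reports no unused and no undeclared measures, and `requirements_in_contracts(CONTRACTS)` yields the six identifiers ER-01 to ER-06. Whether those six exhaust the Requirements Register is not something this check can establish.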

Requirement identifiers used here are defined in the Environment-Derived Requirements Register and the Environmental Grounding Dossier. Each identifier traces to at least one design feature and at least one evaluation measure. The workbench thus constitutes the pre-registration record for the thesis’s evaluation: all measures, thresholds, and question-to-requirement linkages are declared before demonstration results are interpreted, so the Chapter 10 results can be assessed against thresholds set independently of observed outcomes.