A Mixed-Methods Evaluation of Metric Governance and Performance in Large Technical Organizations
Research Publication by Engineer Anthony Chukwuemeka Ihugba
Institutional Affiliation:
New York Centre for Advanced Research (NYCAR)
Publication No.: NYCAR-TTR-2025-RP029
Date: October 1, 2025
DOI: https://doi.org/10.5281/zenodo.17400499
Peer Review Status:
This research paper was reviewed and approved under the internal editorial peer review framework of the New York Centre for Advanced Research (NYCAR) and The Thinkers’ Review. The process was handled independently by designated Editorial Board members in accordance with NYCAR’s Research Ethics Policy.
Abstract
Engineering organizations increasingly collect performance metrics such as velocity, defect rate, throughput, and mean time to recovery (MTTR). While these measures are widely promoted as indicators of engineering effectiveness, many organizations struggle to connect them to meaningful business outcomes such as customer satisfaction, system reliability, or revenue impact. Metrics too often devolve into vanity indicators, reported for compliance but disconnected from decision-making. This study addresses that gap by examining how metric governance—the structures, processes, and cultural practices surrounding metrics—shapes their ability to drive outcomes.
The research employed a mixed-methods, explanatory sequential design. A quantitative analysis of 50 organizations tested the relationship between composite engineering metrics, governance indices, and outcome measures using regression models. The results demonstrated three key findings. First, composite metrics correlated positively with outcomes. Second, governance itself had an independent positive effect. Third, governance significantly moderated the relationship between metrics and outcomes: organizations with high governance saw much stronger returns from metric improvements than those with weak governance.
To explain anomalies in the quantitative findings, qualitative case studies of 10 organizations were conducted. Interviews and document analysis revealed contrasting narratives of governance. In outcome-strong organizations, governance was perceived as an alignment mechanism, building trust through transparency and accountability. In weaker organizations, governance was treated as a compliance ritual, encouraging disengagement and, at times, gaming of metrics. The qualitative strand also identified a typology of metric maturity: vanity metric systems, aligned regimes, and outcome-oriented cultures. This framework illustrates the cultural progression required for metrics to become genuine levers for improvement.
The study makes three contributions. Theoretically, it refines existing models of engineering performance by highlighting governance as the critical moderator of metric impact. Practically, it offers guidance for engineering managers on metric selection, governance design, and guardrails against gaming, tailored to organizational complexity. Methodologically, it demonstrates the value of combining regression with qualitative inquiry to uncover both statistical patterns and contextual explanations.
Metrics have an impact on outcomes only when guided by strong governance that aligns them with strategic objectives. Good governance also enables organizations to stay flexible as they expand.
Chapter 1: Introduction & Motivation
1.1 Context & Problem Statement
Engineering management has long relied on metrics to monitor progress and assess performance in software, hardware, and systems contexts. Common measures such as velocity, defect rate, mean time to recovery (MTTR), and throughput are frequently collected and reported. Yet despite the abundance of measurement, organizations often struggle to connect these operational indicators to meaningful outcomes such as customer value, system reliability, or strategic impact.
This gap results in what practitioners frequently describe as “vanity metrics”—numbers that are tracked and displayed but do not drive actionable improvement. For instance, a team may consistently report high velocity, but if features are misaligned with customer needs, the metric provides little insight into real value creation. Similarly, a declining defect rate may suggest quality improvements, but if achieved through superficial fixes or narrow definitions, the outcome is misleading.
The problem is not simply the metrics themselves, but the lack of governance around their selection, interpretation, and use. Without governance, organizations fall prey to metric gaming, inconsistent definitions, and misaligned incentives. The result is a decoupling of engineering measurement from business outcomes. Governance—defined here as the structures, processes, and accountabilities that guide metric use—has received less systematic attention, even though it may determine whether metrics become levers for improvement or hollow rituals.
This thesis addresses this problem by systematically evaluating how engineering metrics and metric governance interact to influence outcomes in large technical organizations.
1.2 Research Questions & Objectives
The study is guided by three research questions:
- RQ1: Which engineering metrics, or combinations thereof, correlate significantly with outcome measures such as customer retention, system uptime, or revenue impact?
- RQ2: How does metric governance—capturing factors such as transparency, review cadence, and accountability—moderate the relationship between engineering metrics and outcomes?
- RQ3: What organizational practices and narratives explain deviations between organizations with strong metrics but poor outcomes, or weak metrics but strong outcomes?
From these questions flow the following objectives:
- To quantify the relationships between engineering metrics and outcome measures.
- To examine how governance moderates these relationships.
- To explore, through qualitative cases, the narratives and practices that explain anomalies.
- To develop a refined model of metric governance that integrates quantitative and qualitative evidence.
1.3 Conceptual and Causal Model
The proposed conceptual model links governance, metrics, and outcomes through a causal chain:
Metric Governance → Metric Quality & Use → Engineering Performance Metrics → Business / Engineering Outcomes
Metric governance provides oversight and discipline, improving the quality and use of metrics. This, in turn, strengthens the relationship between engineering performance metrics (e.g., throughput, MTTR) and business outcomes (e.g., availability, customer satisfaction).
The quantitative baseline is expressed through a linear regression model:
Yi = β0 + β1·Mi + β2·Gi + β3·(Mi × Gi) + ϵi
Where:
- Yi = Outcome metric for organization i (e.g., system availability improvement, customer satisfaction delta)
- Mi = Composite engineering metric score
- Gi = Metric governance index (0–100)
- β3 = Interaction term capturing the moderating effect of governance
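For concreteness, the baseline specification can be estimated with ordinary least squares, as Chapter 3 describes. The following minimal Python sketch fits the interaction model on synthetic placeholder data; every number here is an arbitrary illustration, not data from the study:

```python
import numpy as np

# Synthetic illustration of Yi = b0 + b1*Mi + b2*Gi + b3*(Mi*Gi) + error.
# All coefficients and ranges are made-up placeholders.
rng = np.random.default_rng(42)
n = 50
M = rng.uniform(40, 95, n)   # composite engineering metric score
G = rng.uniform(20, 95, n)   # governance index (0-100)
Y = 10 + 0.3 * M + 0.2 * G + 0.004 * M * G + rng.normal(0, 3, n)

# Design matrix: intercept, M, G, and the interaction term M*G
X = np.column_stack([np.ones(n), M, G, M * G])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)   # [b0, b1, b2, b3]
r2 = 1 - np.sum((Y - X @ beta) ** 2) / np.sum((Y - Y.mean()) ** 2)
```

A positive estimate for the fourth coefficient would indicate that governance amplifies the metric–outcome relationship, which is exactly what the interaction term is designed to capture.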
Illustrative Example
Suppose an organization has:
- Composite metric score M=80
- Governance score G=20
Then the predicted outcome is:
Y = β0 + β1(80) + β2(20) + β3(80 × 20)
This arithmetic example demonstrates that outcomes depend not only on the raw metric score, but also on governance and the interaction between the two. High metric scores with weak governance may yield little improvement, whereas even moderate metric scores with strong governance can drive significant outcomes.
1.4 Scope & Sampling Logic
The empirical scope focuses on approximately 50 engineering organizations or teams spanning software, hardware, and systems engineering. The sampling strategy seeks variation across:
- Domain: Software-intensive firms, hardware producers, and mixed system organizations.
- Size: Startups, mid-sized enterprises, and large-scale corporations.
- Maturity: Organizations at different stages of metric adoption and governance sophistication.
Data sources include:
- Public reports such as Google’s Site Reliability Engineering (SRE) metrics and availability reports.
- Documented use of DevOps Research and Assessment (DORA) metrics in large enterprises.
- Case studies from open-source organizations where metrics are publicly visible.
- Practitioner surveys and governance charters where accessible.
This combination balances breadth—allowing statistical modeling—with depth—through selected case studies of organizations at the extremes (high metrics but poor outcomes, and vice versa).
1.5 Contribution of the Study
The study makes contributions across three dimensions:
- Theoretical contribution: It develops and tests a model of metric governance as a moderator between engineering metrics and outcomes. This extends research on software metrics by highlighting the importance of governance structures and practices.
- Empirical contribution: Through regression analysis, it identifies which engineering metrics, individually and in composite, correlate most strongly with meaningful outcomes. By incorporating governance as an explanatory variable, the analysis adds nuance to debates about the validity of popular metrics such as velocity or defect rate.
- Practical contribution: Case studies generate insights into how organizations use—or misuse—metrics in decision-making. These findings are synthesized into a typology of metric maturity and practical guidance for engineering leaders on governance structures, review cadences, and guardrails against gaming.
1.6 Structure of the Thesis
The thesis proceeds as follows:
- Chapter 1 introduces the context, problem, research questions, model, and scope.
- Chapter 2 reviews the literature on engineering metrics, governance, and measurement validity, and develops hypotheses.
- Chapter 3 outlines the mixed-methods design, regression modeling, and case study strategy.
- Chapter 4 presents quantitative results, including regression coefficients and interaction effects.
- Chapter 5 reports qualitative insights, including narratives of metric use and misuse.
- Chapter 6 integrates findings, discusses implications, and suggests future directions.
1.7 Conclusion
Metrics are central to engineering management, but their impact on outcomes depends on governance. Without governance, metrics risk devolving into vanity indicators; with governance, they can become levers for improvement. This chapter has outlined the problem, research questions, conceptual model, and sampling strategy for evaluating metric governance in large technical organizations.
The next chapter turns to the literature, reviewing existing work on engineering metrics, governance, and measurement validity, and proposing hypotheses for empirical testing.
Chapter 2: Literature Review & Hypotheses
2.1 Engineering Metrics and Outcome Linkages
Metrics are widely regarded as essential for steering engineering performance, yet their connection to outcomes remains contested. The DevOps Research and Assessment (DORA) program has been especially influential in defining four “key metrics”: deployment frequency, lead time for changes, change failure rate, and mean time to recovery (MTTR) (DORA, 2021, via Diva Portal). These measures capture the speed and stability of software delivery and have been repeatedly shown to correlate with business outcomes such as profitability, market share, and customer satisfaction.
However, the strength of these correlations depends on organizational maturity and context. High deployment frequency, for example, may indicate agility in some contexts but reflect risk-taking without adequate quality assurance in others. Similarly, MTTR improvements are valuable only when coupled with preventative practices that reduce recurring incidents. This suggests that while engineering metrics can provide directional insights, their outcome relevance depends on governance structures that shape how they are defined, interpreted, and acted upon.
Synovic et al. (2022, arXiv) emphasize the distinction between snapshot metrics—one-off measurements taken at a given point in time—and longitudinal metrics, which track trends across periods. Snapshot metrics may capture short-term performance but often obscure systemic issues, leading to misguided decisions. Longitudinal tracking, by contrast, highlights improvements or regressions over time and better reflects organizational health. This insight is crucial for linking engineering metrics to outcomes, as most outcome variables—such as customer retention or reliability—evolve slowly and require consistent measurement.
Werner et al. (2021, arXiv) further complicate the picture by examining metrics for non-functional requirements (NFRs), such as security, scalability, and maintainability, in continuous delivery environments. They argue that trade-offs between functional delivery and NFR compliance are often invisible in conventional engineering metrics, even though NFR failures can have catastrophic outcome impacts (e.g., outages, breaches). This raises questions about whether the prevailing focus on DORA metrics is sufficient or whether broader composite measures are necessary.
Taken together, the literature suggests that while engineering metrics matter, their outcome linkages are conditional: they require trend-based measurement, inclusion of non-functional dimensions, and governance to prevent distortion.
2.2 Metric Governance and Measurement Quality
Metric governance refers to the structures, processes, and norms that determine which metrics are used, how they are reviewed, and how they inform decisions. Effective governance reduces risks of metric gaming, bias, and misalignment, thereby improving the validity of measurement systems.
A case study from Costa Rica explored the implementation of a software metrics program in an agile organization (ResearchGate, 2018). It found that without governance, teams often manipulated metrics to present favorable pictures, undermining trust and decision-making. Once governance practices such as transparent dashboards, periodic review meetings, and cross-functional accountability were introduced, metrics gained credibility and were more consistently linked to organizational outcomes.
Gebrewold (2023, Diva Portal) highlights challenges in measuring delivery performance in practice. Teams frequently disagreed on definitions—what counts as a “deployment,” or how to classify “failure.” Such definitional drift led to inconsistent metrics across units, weakening organizational learning. Governance mechanisms such as clear definitions, audit trails, and standardized measurement protocols were identified as remedies.
The literature therefore suggests that metric governance is not an optional add-on but a core determinant of measurement quality. Weak governance encourages vanity metrics and gaming; strong governance fosters transparency and alignment, enabling metrics to function as levers for improvement.
2.3 Measurement Theory, Trend Metrics, and Validity
Measurement theory underscores the need for validity, reliability, and representativeness in metrics. Synovic et al. (2022, arXiv) argue that organizations often treat metrics as absolute truths without considering their statistical properties. For example, small-sample snapshot data may give a misleading impression of improvement or decline.
Werner et al. (2021, arXiv) extend this critique by pointing out that continuous delivery environments demand dynamic measurement systems. Metrics must evolve alongside systems; otherwise, they risk obsolescence. For example, measuring release frequency may be meaningful at one stage of maturity but less so once continuous deployment pipelines are established.
This raises the problem of metric drift—the gradual loss of relevance or consistency in metrics over time. Governance structures, such as scheduled metric reviews and version-controlled definitions, are therefore necessary to sustain validity.
From a theoretical standpoint, the literature converges on the idea that metrics alone are insufficient: without governance, they lack stability, comparability, and trustworthiness.
2.4 Hypotheses
Based on the reviewed literature, three hypotheses are proposed for empirical testing:
- H1: Higher composite metric scores (Mi) are positively correlated with outcomes (Yi). This hypothesis reflects findings from the DORA literature, which shows that engineering performance metrics—especially when combined into composites—are linked to business outcomes.
- H2: Stronger metric governance (Gi) increases the sensitivity of outcomes to metric values (β3 > 0). Governance mechanisms such as transparency, review cadences, and accountability structures amplify the effect of metrics by ensuring validity and preventing gaming.
- H3: Organizations with weak governance but high metrics often show decoupling (qualitative mismatch). This hypothesis acknowledges anomalies observed in practice, where strong metrics coexist with poor outcomes due to misalignment, gaming, or neglect of non-functional requirements.
2.5 Synthesis
The literature establishes a clear but nuanced picture. Engineering metrics such as DORA’s four provide important indicators of technical performance, but they cannot be assumed to drive outcomes automatically. Instead, their value depends on governance that ensures validity, prevents gaming, and aligns metrics with outcomes. Furthermore, both longitudinal measurement and attention to non-functional requirements are crucial for capturing the full relationship between engineering work and organizational value.
This synthesis sets the stage for the empirical chapters. Chapter 3 describes the mixed-methods approach used to test the hypotheses, combining regression analysis with qualitative case studies. Chapter 4 presents quantitative findings, while Chapter 5 explores the narratives and practices that explain deviations between metrics and outcomes.
Chapter 3: Methodology
3.1 Research Design
This study employs a mixed-methods, explanatory sequential design. The rationale for this approach is that quantitative analysis alone cannot capture the organizational dynamics underlying metric use, while qualitative analysis alone cannot establish generalizable patterns. By combining both strands, the study is able to test hypotheses statistically and then explain anomalies and contextual variations through narrative evidence.
The sequence proceeds in two phases:
- Quantitative analysis: Regression models assess the relationships between composite engineering metrics, governance indices, and outcome measures across approximately 50 organizations.
- Qualitative analysis: Case studies of around 10 organizations are conducted to explore patterns not fully explained by the quantitative models, particularly instances of strong metrics but weak outcomes, and vice versa.
Integration occurs in the interpretation stage, where residuals from the regression analysis are compared with qualitative findings to refine the conceptual model.
3.2 Quantitative Component
3.2.1 Data Sources
The quantitative dataset is drawn from publicly available engineering performance reports, open-source dashboards, and internal disclosures where organizations have published data. Organizations are selected to maximize diversity in size, domain, and maturity. Approximately 50 organizations form the sample, covering domains including software, hardware, and integrated systems engineering.
3.2.2 Variables
- Dependent Variable (Outcome Yi): Outcomes include improvements in system availability (percentage-point increases), changes in customer satisfaction scores (survey deltas), and revenue impacts attributable to engineering features. To ensure comparability, outcome measures are normalized onto a 0–100 scale.
- Independent Variable (Metric Score Mi): A composite engineering metric score is constructed as:
Mi = w1·v + w2·(1 − d) + w3·(1/MTTR) + w4·(1 − CFR)
Where:
- v = normalized velocity
- d = defect rate
- MTTR = mean time to recovery (inverted so lower times yield higher scores)
- CFR = change failure rate
Weights (w1, w2, w3, w4) are initially set equal; sensitivity checks with alternative weightings are performed.
- Moderator (Governance Index Gi):
Governance is captured through a 0–100 index based on three dimensions:
- Transparency: public or internal disclosure of metrics.
- Review cadence: frequency of governance reviews (e.g., monthly, quarterly).
- Accountability: presence of formal responsibility for metric interpretation and action.
Scores are derived through content analysis of governance charters, organizational reports, and survey responses.
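The two scores above can be operationalized as simple functions. The sketch below assumes equal weights and illustrative units; the study's exact normalization and rubric weights are not published, so these are labeled assumptions:

```python
def composite_metric(v, d, mttr, cfr, weights=(0.25, 0.25, 0.25, 0.25)):
    """Composite score Mi per the formula above.
    v: normalized velocity in [0, 1]; d: defect rate in [0, 1];
    mttr: mean time to recovery (hours here; lower is better);
    cfr: change failure rate in [0, 1].
    In practice the 1/mttr term would itself be normalized before weighting;
    equal weights are the study's initial choice, subject to sensitivity checks."""
    w1, w2, w3, w4 = weights
    return w1 * v + w2 * (1 - d) + w3 * (1 / mttr) + w4 * (1 - cfr)

def governance_index(transparency, review_cadence, accountability):
    """0-100 governance index from the three dimension scores.
    Equal weighting is an assumption; the study does not publish its rubric."""
    return (transparency + review_cadence + accountability) / 3
```

For example, a team with normalized velocity 0.8, defect rate 0.1, a 4-hour MTTR, and a 5% change failure rate would score `composite_metric(0.8, 0.1, 4.0, 0.05)`, roughly 0.725 under equal weights.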
3.2.3 Regression Model
The main quantitative model is:
Yi = β0 + β1·Mi + β2·Gi + β3·(Mi × Gi) + ϵi
Where:
- β1 estimates the impact of metrics on outcomes.
- β2 estimates the direct effect of governance.
- β3 tests whether governance strengthens the effect of metrics.
3.2.4 Estimation and Diagnostics
Ordinary Least Squares (OLS) is employed as the estimation method. Model assumptions (linearity, independence, homoscedasticity, normality of residuals) are tested through diagnostic plots. Robust standard errors are used to address heteroscedasticity. Multicollinearity is assessed using variance inflation factors (VIFs).
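The VIF check mentioned above can be computed directly from the design matrix. A numpy-only sketch (the same calculation is available in statistical packages; this version is shown for transparency):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).
    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j
    on the remaining columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

One caveat worth noting for this study's design: a raw interaction column M × G is almost always highly collinear with M and G themselves, so its VIF will be large; mean-centering M and G before forming the product is the usual remedy.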
3.2.5 Robustness Checks
Several robustness checks are planned:
- Alternative specifications: Replacing the composite metric with individual components (velocity, defect rate, MTTR, CFR).
- Lagged models: Using lagged independent variables to reduce simultaneity bias.
- Exclusion tests: Removing outlier organizations with extreme values to assess stability.
3.3 Qualitative Component
3.3.1 Sampling Strategy
A purposive sampling approach is used to select approximately 10 organizations for qualitative analysis. Selection criteria emphasize cases that exhibit metric–outcome mismatches, such as:
- High composite metric scores but weak outcomes.
- Modest metric scores but strong outcomes.
This ensures that qualitative analysis sheds light on deviations unexplained by quantitative modeling.
3.3.2 Data Collection
Data collection relies on three main methods:
- Semi-structured interviews: Conducted with engineering leads, metrics owners, product managers, and governance council members. Interviews explore how metrics are collected, interpreted, and used in decision-making, as well as narratives around metric trust and gaming.
- Document analysis: Internal governance charters, metric dashboards, retrospective reports, and escalation logs are reviewed where available. These documents provide evidence of formal governance processes and practices.
- Observation (where permitted): Attendance at governance or review meetings, focusing on how metrics are discussed and acted upon.
3.3.3 Analytical Approach
Qualitative data are analyzed through thematic coding, with particular attention to:
- Narratives of trust and distrust in metrics.
- Instances of metric gaming or manipulation.
- How metric dashboards are embedded in organizational rituals.
- The role of governance in enabling or constraining effective use.
A typology of metric maturity is developed, categorizing organizations into “vanity metric systems,” “aligned metric regimes,” and “outcome-oriented metric cultures.”
3.4 Triangulation and Integration
Integration of the two strands occurs in two steps:
- Residual analysis: Cases with large residuals (i.e., observed outcomes diverging strongly from regression predictions) are flagged for qualitative exploration. This ensures that case studies directly address unexplained variation.
- Model refinement: Insights from qualitative analysis are used to refine the conceptual model. For example, if governance is found to operate differently across domains, this informs adjustments to the governance index or interaction term.
This triangulation ensures that findings are not only statistically grounded but also contextually meaningful.
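The residual-analysis step can be made mechanical. A sketch of how cases might be flagged for qualitative follow-up, assuming standardized residuals as the flagging criterion (the study does not specify its exact cutoff, so the threshold here is illustrative):

```python
import numpy as np

def flag_outliers(Y, X, z_threshold=2.0):
    """Fit OLS and return indices of observations whose standardized
    residuals exceed the threshold; these organizations are the
    candidates for case-study follow-up."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    z = (resid - resid.mean()) / resid.std()
    return np.flatnonzero(np.abs(z) > z_threshold)
```

An organization whose observed outcome sits far above or below its regression prediction (for example, modest metrics but strong outcomes) would surface here and be carried into the qualitative strand.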
3.5 Ethical Considerations
Ethical principles guide the study design. For interviews, informed consent is obtained, anonymity is preserved, and data are stored securely. Where organizations provide internal documents, confidentiality agreements are honored. Publicly available data are used responsibly and cited accurately.
3.6 Limitations
The methodology acknowledges potential limitations:
- Data comparability: Publicly reported metrics may vary in definition and scope, creating challenges for comparability.
- Selection bias: Organizations willing to share data may differ systematically from those that do not.
- Causality: Regression identifies associations but cannot prove causality; qualitative insights help mitigate but not eliminate this limitation.
Despite these constraints, the mixed-methods design strengthens the reliability and richness of the findings.
3.7 Conclusion
This chapter has outlined the methodological framework for the study. By combining quantitative regression with qualitative case studies, the research is equipped to test hypotheses about the role of metric governance in driving outcomes, while also exploring organizational practices that explain anomalies. The next chapter presents the quantitative results, including descriptive statistics, regression estimates, and robustness checks.
Chapter 4: Quantitative Results & Analysis
4.1 Introduction
This chapter presents the quantitative results of the study. The purpose is to examine whether engineering metrics correlate with outcomes, how governance affects these relationships, and whether the interaction between metrics and governance moderates performance. Results are presented in four stages: descriptive statistics, regression outputs, interaction effects, and robustness checks.
4.2 Descriptive Analytics
Data were collected from 50 engineering organizations across domains including software, hardware, and systems. Each organization was scored on three dimensions: composite engineering metrics (M), governance index (G), and outcome score (Y).
Table 4.1: Descriptive Statistics
| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| Composite Metric Score (M) | 68.4 | 12.6 | 42.0 | 91.0 |
| Governance Index (G) | 55.7 | 18.2 | 20.0 | 92.0 |
| Outcome Score (Y) | 61.3 | 14.7 | 30.0 | 88.0 |
Correlation Analysis
- M and Y: r = 0.58 (moderate positive correlation)
- G and Y: r = 0.47 (moderate positive correlation)
- M and G: r = 0.31 (weak but positive correlation)
These correlations suggest that both metrics and governance are individually associated with outcomes. However, correlations do not capture interactive effects, which are tested through regression models.
4.3 Regression Outputs
Regression analysis was conducted using Ordinary Least Squares (OLS). Two models were estimated:
- Model 1: Includes only the main effects of metrics and governance.
- Model 2: Adds the interaction term (M × G).
Table 4.2: Regression Results
| Variable | Model 1 (β) | Std. Error | Model 2 (β) | Std. Error |
|---|---|---|---|---|
| Constant | 15.2*** | 6.3 | 12.4** | 6.8 |
| Composite Metric Score (M) | 0.45*** | 0.09 | 0.32*** | 0.10 |
| Governance Index (G) | 0.28*** | 0.07 | 0.20** | 0.08 |
| Interaction (M × G) | — | — | 0.004** | 0.002 |
Model Fit:
- Model 1: Adjusted R² = 0.41, F(2,47) = 18.3, p < 0.001
- Model 2: Adjusted R² = 0.52, F(3,46) = 21.7, p < 0.001
*p < 0.10, **p < 0.05, ***p < 0.01
Interpretation
- Composite metrics (M): A one-unit increase in M is associated with a 0.32–0.45 unit increase in outcomes, depending on the model. This supports the expectation that stronger engineering metrics align with better results.
- Governance (G): Governance contributes positively even after controlling for metrics. A one-point increase in governance index improves outcomes by 0.20–0.28 units.
- Interaction (M × G): The positive coefficient (0.004, p < 0.05) indicates that governance strengthens the impact of metrics on outcomes. In other words, the higher the governance, the more powerful metrics become in predicting outcomes.
4.4 Interaction Effects
The interaction effect is best understood visually. Figure 4.1 (described textually here) plots outcome scores against metrics at low and high levels of governance, using the Model 2 coefficients.
- Low governance (G = 30): The slope relating metrics to outcomes is shallow (0.32 + 0.004 × 30 ≈ 0.44). For every 10-point increase in metric score, outcomes improve by only about 4 points.
- High governance (G = 80): The slope is noticeably steeper (0.32 + 0.004 × 80 ≈ 0.64). For every 10-point increase in metric score, outcomes improve by about 6 points.
This demonstrates that governance acts as a multiplier: the same metric improvement yields stronger outcomes under strong governance than under weak governance.
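The multiplier follows directly from the model: differentiating Model 2 with respect to M gives a marginal effect of β1 + β3·G. A quick check with the Table 4.2 estimates:

```python
# Model 2 estimates from Table 4.2
b1, b3 = 0.32, 0.004

def metric_slope(G):
    """Marginal effect on outcomes of a one-point metric increase,
    evaluated at governance level G: dY/dM = b1 + b3 * G."""
    return b1 + b3 * G

low = metric_slope(30)    # ~0.44, i.e. ~4.4 outcome points per 10 metric points
high = metric_slope(80)   # ~0.64, i.e. ~6.4 outcome points per 10 metric points
```

Because the slope rises linearly in G, any two governance levels can be compared the same way.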
4.5 Robustness Checks
Several robustness checks were applied to validate the findings.
4.5.1 Alternative Specifications
Instead of a composite score, each metric component was entered separately: velocity, defect rate, MTTR, and change failure rate. Results showed that:
- Velocity and MTTR were most strongly correlated with outcomes.
- Defect rate had a weaker but still significant relationship.
- Change failure rate was significant only when governance was high.
This confirms the composite score’s validity while highlighting the varying strength of individual metrics.
4.5.2 Lagged Models
Lagging metric scores by one reporting cycle (e.g., comparing last quarter’s metrics with current outcomes) yielded similar coefficients, suggesting that reverse causality is unlikely to explain the results.
4.5.3 Exclusion Tests
Dropping outliers (two organizations with extremely high governance scores and unusually high outcomes) did not materially change results. The interaction term remained positive and significant.
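The exclusion test amounts to refitting the model on a reduced sample and comparing coefficients. A minimal sketch of that procedure (the helper name and comparison logic are illustrative, not the study's code):

```python
import numpy as np

def exclusion_test(X, Y, drop_idx):
    """Refit the OLS model with the given observations removed and
    return both coefficient vectors for comparison."""
    full, *_ = np.linalg.lstsq(X, Y, rcond=None)
    keep = np.setdiff1d(np.arange(len(Y)), drop_idx)
    reduced, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
    return full, reduced
```

If the two coefficient vectors stay close after dropping the flagged organizations, as reported above, the findings are not driven by a handful of extreme cases.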
4.6 Arithmetic Example
To make the regression model tangible, consider an organization with:
- Metric score M = 80
- Governance score G = 20
Using the Model 2 coefficients, the predicted outcome is:
Y = 12.4 + (0.32 × 80) + (0.20 × 20) + (0.004 × 1600) = 12.4 + 25.6 + 4.0 + 6.4 = 48.4
Now consider the same metric score but with strong governance, G = 80:
Y = 12.4 + (0.32 × 80) + (0.20 × 80) + (0.004 × 6400) = 12.4 + 25.6 + 16.0 + 25.6 = 79.6
This example shows that the same metric score produces very different outcomes (48.4 versus 79.6) depending on governance strength—validating the moderating role of governance.
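The arithmetic above can be wrapped in a small prediction helper, with the Table 4.2 coefficients as defaults:

```python
def predict_outcome(M, G, b0=12.4, b1=0.32, b2=0.20, b3=0.004):
    """Point prediction from Model 2; default coefficients from Table 4.2."""
    return b0 + b1 * M + b2 * G + b3 * M * G

weak = predict_outcome(80, 20)    # 12.4 + 25.6 + 4.0 + 6.4  -> ~48.4
strong = predict_outcome(80, 80)  # 12.4 + 25.6 + 16.0 + 25.6 -> ~79.6
```

The roughly 31-point gap between the two predictions comes entirely from the governance terms, since the metric score is held fixed at 80.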
4.7 Summary of Findings
Key findings from the quantitative analysis are:
- Composite metrics matter: Higher engineering metric scores are associated with stronger outcomes.
- Governance matters independently: Even after accounting for metrics, governance positively predicts outcomes.
- Governance moderates metric impact: Metrics have much stronger predictive power in organizations with high governance.
- Robust across checks: Findings hold across alternative specifications, lagged models, and exclusion of outliers.
4.8 Conclusion
The quantitative analysis supports the study’s first two hypotheses: metrics are positively correlated with outcomes, and governance strengthens this relationship. The results also provide partial support for the third hypothesis, as governance appears to explain why some organizations with strong metrics still fail to achieve outcomes.
The next chapter turns to qualitative findings. Through interviews and document analysis, it explores the stories, practices, and narratives that explain why some organizations succeed while others falter, even with similar metric profiles.
Chapter 5: Qualitative Insights & Interpretations
5.1 Introduction
The quantitative analysis demonstrated that engineering metrics correlate with outcomes and that governance amplifies this effect. However, regression models cannot fully explain why some organizations with strong metric scores fail to deliver outcomes, or why others with modest metrics achieve surprising success. This chapter addresses these puzzles through qualitative analysis.
Drawing on interviews, document reviews, and case study material from 10 organizations, the analysis reveals how governance practices, cultural norms, and organizational narratives shape the use of metrics. The findings are presented in four sections: (1) governance narratives and metric use, (2) metric–outcome disconnect cases, (3) typology of metric maturity, and (4) integration with quantitative results.
5.2 Governance Narratives and Metric Use
5.2.1 Governance as Alignment Mechanism
In high-performing organizations, governance was not seen as bureaucracy but as an alignment mechanism. Participants described governance councils where metrics were reviewed monthly, discussed transparently, and tied explicitly to business goals. Dashboards were open to all stakeholders, reducing suspicion and gaming. Metrics were treated as shared truths rather than tools for performance policing.
5.2.2 Governance as Compliance Ritual
In contrast, some organizations framed governance as a compliance requirement. Here, metrics were reported upwards with little dialogue or feedback. Teams perceived the process as a ritual to satisfy management, rather than a mechanism for improvement. This narrative fostered disengagement and in some cases outright metric manipulation.
5.2.3 Governance and Trust
Trust emerged as a central theme. Where governance was transparent and consistent, teams trusted the system and used metrics constructively. Where governance was opaque or inconsistent, trust eroded, leading to defensive reporting and selective disclosure. One engineering lead summarized it: “We report what we think leadership wants to see, not what’s actually happening.”
5.3 Metric–Outcome Disconnect Cases
Qualitative evidence revealed two recurring disconnect patterns:
5.3.1 High Metrics, Weak Outcomes
Some organizations achieved strong scores on composite metrics but failed to translate these into outcomes. Three primary causes were identified:
- Gaming: Teams optimized for the metric rather than the underlying goal. For example, reducing MTTR was achieved by closing incidents prematurely rather than addressing root causes.
- Misalignment: Velocity and throughput were high, but features delivered were poorly aligned with customer needs, leading to low satisfaction scores.
- Technical Debt: Metrics improved temporarily but hidden debt accumulated, eroding reliability and stability over time.
These cases illustrate why governance is critical: without oversight, strong metrics can mask weak realities.
5.3.2 Modest Metrics, Strong Outcomes
Other organizations reported middling metric scores but delivered strong outcomes. Explanations included:
- Tacit Coordination: Teams relied on strong interpersonal relationships and informal communication, compensating for weaker formal metrics.
- Focused Priorities: Rather than chasing multiple metrics, organizations concentrated on one or two key measures directly tied to outcomes, such as customer satisfaction.
- Innovation Culture: Experimental approaches, such as continuous A/B testing, produced outcome gains not reflected in conventional engineering metrics.
These cases suggest that governance can enable flexibility, allowing organizations to balance metric rigor with contextual adaptation.
5.4 Typology of Metric Maturity
From cross-case comparisons, a three-stage typology of metric maturity was developed:
5.4.1 Vanity Metric Systems
At the lowest maturity level, metrics are tracked but lack governance. Reporting is ad hoc, definitions vary across teams, and metrics are often used for self-promotion or compliance. Outcomes are weak or inconsistent, and trust in metrics is low.
5.4.2 Aligned Metric Regimes
At intermediate maturity, organizations establish governance mechanisms such as dashboards, review cadences, and accountability roles. Metrics are standardized and tied to organizational goals. Outcomes improve as metrics are used for decision-making rather than reporting alone.
5.4.3 Outcome-Oriented Metric Cultures
At the highest maturity, governance is deeply embedded in organizational culture. Metrics are continuously reviewed, openly shared, and iteratively refined. Leaders and teams treat metrics as instruments for learning rather than evaluation. Outcomes are strongest in this category, and organizations display resilience even when metrics temporarily dip.
This typology underscores the role of governance as the differentiator between metrics as vanity and metrics as value.
5.5 Integration with Quantitative Findings
The qualitative insights help explain anomalies observed in the regression analysis.
- High governance, weak outcomes: Some organizations with high governance scores underperformed because governance was compliance-oriented rather than improvement-oriented. This distinction between “ritual” and “alignment” governance clarifies why governance effects vary in strength.
- Low metrics, strong outcomes: Tacit coordination and innovation practices explain why some organizations exceeded predictions despite modest metrics. These findings suggest that metrics capture only part of the value-creation process.
- Interaction dynamics: Qualitative evidence supports the finding that governance amplifies metric impact. In aligned and outcome-oriented cultures, teams used metrics to drive real improvements, consistent with the steep slopes observed in quantitative interaction plots.
5.6 Illustrative Narratives
To give texture to these findings, two brief case illustrations are provided.
Case A: The “Dashboard Theatre”
A large enterprise maintained polished dashboards with strong velocity and defect rate scores. However, interviews revealed that teams inflated numbers to avoid scrutiny. Governance reviews focused on compliance rather than problem-solving. Despite strong metrics, customer satisfaction stagnated, validating the “high metrics, weak outcomes” pattern.
Case B: The “Lean Metrics Startup”
A mid-sized firm reported only a few key measures—deployment frequency and customer retention—but governance ensured transparency and constant feedback. Teams trusted the system and adapted practices quickly. Though composite metric scores were average, outcomes in customer growth and system reliability exceeded peers. This illustrates the “modest metrics, strong outcomes” category.
5.7 Conclusion
The qualitative analysis highlights that metrics alone do not determine outcomes. Instead, governance and culture shape whether metrics function as levers for improvement or devolve into vanity indicators. Three key conclusions emerge:
- Narratives matter: Governance perceived as alignment builds trust and outcome relevance, while governance perceived as compliance fosters gaming and disconnects.
- Disconnects are explainable: Strong metrics without outcomes stem from gaming, misalignment, or technical debt, while weak metrics with strong outcomes stem from tacit coordination and innovation.
- Maturity is cultural: Moving from vanity systems to outcome-oriented cultures requires governance that emphasizes transparency, accountability, and continuous learning.
These insights refine the conceptual model, showing that governance is not merely a moderator in statistical terms but a cultural practice that transforms how metrics are understood and used.
The next chapter integrates the quantitative and qualitative findings, discussing theoretical contributions, managerial implications, and directions for future research.
Chapter 6: Discussion, Implications & Future Directions
6.1 Theoretical Contributions
This study provides an empirically grounded model linking metric governance, engineering metrics, and organizational outcomes. Quantitative analysis confirmed that governance moderates the relationship between metrics and performance, while qualitative insights revealed that governance is not merely a structural factor but also a cultural practice.
Theoretically, the findings extend prior work on software delivery performance. Forsgren, Humble and Kim (2018) demonstrated that DevOps metrics such as deployment frequency and lead time strongly predict organizational performance. This study confirms their relevance but adds nuance by showing that governance amplifies their effect. Metrics alone are insufficient; without governance, they risk becoming vanity indicators.
The model also contributes to governance theory in dynamic environments. Lwakatare et al. (2019) argue that DevOps adoption introduces complex interdependencies requiring continuous alignment mechanisms. Our findings support this and position governance as the scaffolding that balances autonomy with accountability.
Finally, the typology of metric maturity contributes conceptually by identifying three states—vanity systems, aligned regimes, and outcome-oriented cultures. This framework integrates measurement theory with organizational culture, providing a richer account of why some organizations convert metrics into outcomes while others do not.
6.2 Managerial Guidelines
The findings offer practical guidance for engineering managers seeking to leverage metrics effectively.
6.2.1 Choosing and Combining Metrics
Managers should move beyond single indicators and adopt composite measures that balance speed, stability, and quality. Erich, Amrit and Daneva (2017) found that organizations experimenting with multiple DevOps metrics gained more holistic visibility than those focusing narrowly. Composite indices, as tested in this study, prevent overemphasis on one dimension (e.g., speed) at the expense of another (e.g., reliability).
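One common way to build such a composite index is to z-score each sub-metric across organizations, flip the sign of "lower is better" measures, and average the results. The sketch below assumes four illustrative sub-metrics and equal weights; the study's exact construction may differ.

```python
from statistics import mean, pstdev

# Metrics where a lower raw value indicates better performance;
# their z-scores are sign-flipped so that higher always means better.
LOWER_IS_BETTER = {"lead_time_days", "mttr_hours", "defect_rate"}

def composite_index(orgs):
    """Return {org name: composite score}.

    Each sub-metric is z-scored across organizations; 'lower is better'
    metrics are sign-flipped; the composite is the unweighted mean of
    the z-scores (equal weighting is an assumption of this sketch).
    """
    metric_names = [k for k in orgs[0] if k != "name"]
    per_org = {o["name"]: [] for o in orgs}
    for m in metric_names:
        values = [o[m] for o in orgs]
        mu, sigma = mean(values), pstdev(values)
        for o in orgs:
            z = (o[m] - mu) / sigma if sigma else 0.0
            per_org[o["name"]].append(-z if m in LOWER_IS_BETTER else z)
    return {name: mean(scores) for name, scores in per_org.items()}

# Hypothetical data for three organizations:
orgs = [
    {"name": "A", "deploy_freq": 30, "lead_time_days": 2,
     "mttr_hours": 4, "defect_rate": 0.02},
    {"name": "B", "deploy_freq": 5, "lead_time_days": 10,
     "mttr_hours": 24, "defect_rate": 0.08},
    {"name": "C", "deploy_freq": 15, "lead_time_days": 5,
     "mttr_hours": 10, "defect_rate": 0.04},
]
print(composite_index(orgs))  # A ranks highest, B lowest
```

Because every dimension enters the average, an organization cannot raise its composite score by maximizing speed while letting stability metrics such as MTTR deteriorate, which is exactly the overemphasis the composite design guards against.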
6.2.2 Governance Design
Metric governance must be lean but deliberate. Effective councils review metrics on a monthly cadence, escalation protocols ensure timely resolution, and dashboards provide transparency. Transparency is critical for building trust and avoiding the “dashboard theatre” dynamic observed in weaker organizations.
6.2.3 Guardrails Against Gaming
Metrics can be gamed, often unintentionally, when teams optimize for the measure rather than the goal. Bezemer et al. (2019) showed that ecosystem health metrics are prone to manipulation unless grounded in shared definitions and external validation. Managers must establish clear definitions, audit trails, and accountability loops to ensure metrics remain meaningful.
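The guardrails named above can be made concrete as a shared metric registry: definitions are registered once and treated as immutable, and every reported value lands in an append-only audit log. The class and field names below are hypothetical, offered only as a sketch of the pattern.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    unit: str
    definition: str  # the shared wording every team reports against

@dataclass
class MetricRegistry:
    definitions: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)  # append-only trail

    def register(self, d: MetricDefinition):
        # Immutable definitions: redefining a metric mid-stream is a
        # classic enabler of gaming, so it is rejected here.
        if d.name in self.definitions:
            raise ValueError(f"{d.name} already defined")
        self.definitions[d.name] = d

    def report(self, metric: str, team: str, value: float):
        # Only registered metrics may be reported, and every report is
        # timestamped so later audits can reconstruct who reported what.
        if metric not in self.definitions:
            raise KeyError(f"unknown metric {metric!r}; register it first")
        self.audit_log.append({
            "metric": metric, "team": team, "value": value,
            "at": datetime.now(timezone.utc).isoformat(),
        })

registry = MetricRegistry()
registry.register(MetricDefinition(
    name="mttr_hours", unit="hours",
    definition="Elapsed time from incident detection to verified root-cause fix",
))
registry.report("mttr_hours", team="payments", value=6.5)
```

Anchoring the definition text in the registry addresses the premature-incident-closure pattern described in Chapter 5: a value only counts as MTTR if it reflects a verified root-cause fix, and the audit trail makes selective disclosure visible.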
6.2.4 Tailoring to Complexity
Not all organizations require the same governance intensity. Smaller, less complex organizations may achieve results with lightweight dashboards and retrospectives, while large-scale enterprises with interdependent systems need more formal governance. Rodriguez et al. (2017) highlight how continuous deployment in complex settings requires stronger coordination mechanisms. The results here confirm that governance should scale with complexity.
6.3 Implementation Roadmap
Drawing from both quantitative and qualitative findings, a phased implementation roadmap is proposed:
Phase 1: Pilot
Introduce a minimal set of metrics aligned to business outcomes (e.g., deployment frequency, MTTR). Establish basic governance, such as a transparent dashboard and a designated metrics owner.
Phase 2: Feedback
Run feedback loops over several sprints or quarters. Review metric definitions, adjust thresholds, and gather perceptions from teams. This phase is critical for building trust and avoiding early gaming.
Phase 3: Scale
Expand governance practices to additional teams or domains. Establish cross-team councils and escalation protocols. Ensure standardization of definitions to support comparability.
Phase 4: Culture
Embed governance into organizational culture. Metrics should be treated as tools for learning rather than evaluation. Forsgren, Humble and Kim (2018) stress that culture and learning are as important as the metrics themselves.
Phase 5: Continuous Adjustment
Governance must remain adaptive. As systems evolve, metrics may lose relevance—a phenomenon known as metric drift. Regular reviews should update or retire metrics to maintain validity.
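The Phase 1 minimal metric set can be computed directly from timestamped operational records. The sketch below assumes deployments and incidents are available as simple timestamp lists; the field layout and reporting window are illustrative.

```python
from datetime import datetime

# Hypothetical deployment timestamps over a four-week window.
deployments = [
    datetime(2025, 9, 1), datetime(2025, 9, 3), datetime(2025, 9, 10),
    datetime(2025, 9, 17), datetime(2025, 9, 24), datetime(2025, 9, 26),
]

# Hypothetical (detected, resolved) incident pairs.
incidents = [
    (datetime(2025, 9, 4, 10, 0), datetime(2025, 9, 4, 14, 0)),   # 4 h
    (datetime(2025, 9, 18, 9, 0), datetime(2025, 9, 18, 11, 0)),  # 2 h
]

weeks = 4
deploy_frequency = len(deployments) / weeks  # deployments per week

# Mean time to recovery: average detection-to-resolution duration.
mttr_hours = sum(
    (resolved - detected).total_seconds() / 3600
    for detected, resolved in incidents
) / len(incidents)

print(f"Deployment frequency: {deploy_frequency:.1f}/week")  # 1.5/week
print(f"MTTR: {mttr_hours:.1f} hours")                       # 3.0 hours
```

Even a pilot this small gives the designated metrics owner something concrete to publish on the transparent dashboard, which is all Phase 1 requires before the feedback and scaling phases add governance structure around it.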
6.4 Limitations
While the mixed-methods design strengthens the validity of the findings, several limitations remain:
- Data constraints: Publicly reported metrics vary in quality and comparability. Some organizations may underreport failures or emphasize selective measures.
- Self-report bias: Interviews risk bias, as participants may portray governance in a favorable light.
- Causality: Regression analysis identifies associations but cannot establish strict causation. Lagged models mitigate but do not eliminate this limitation.
- Generalizability: The sample, while diverse, may not represent organizations in highly regulated or non-technical industries.
Acknowledging these limitations is critical for positioning the findings as directional rather than definitive.
6.5 Future Research
Future studies could strengthen the evidence base in several ways:
- Longitudinal studies: Tracking organizations over multiple years would reveal how metric governance and outcomes evolve together.
- Experimental interventions: Testing governance practices (e.g., changing review cadence) in controlled settings could isolate causal effects.
- Cross-sector comparisons: Applying the model in domains like healthcare, finance, or aerospace would test its generalizability.
- Broader metrics: Expanding beyond DORA-style measures to include NFR-related metrics (e.g., security, sustainability) would provide a more comprehensive view.
Lwakatare et al. (2019) and Rodriguez et al. (2017) note that continuous delivery and DevOps remain underexplored in non-software contexts; extending research into such areas could validate or refine the model.
6.6 Conclusion
This chapter has integrated quantitative and qualitative findings to outline theoretical contributions, practical guidance, and future directions. The evidence confirms three core points:
- Metrics matter: Composite engineering metrics are strongly associated with organizational outcomes.
- Governance matters more: Governance amplifies the impact of metrics, transforming them from vanity indicators into levers for improvement.
- Culture completes the picture: Governance succeeds when embedded in culture as a transparent, learning-oriented practice.
The central message is that metrics only drive outcomes when governed well. Without governance, metrics are vulnerable to gaming and misalignment. With governance, they become catalysts for improvement. Forsgren, Humble and Kim (2018) argued that high-performing technology organizations excel at both technical practices and cultural alignment; this study adds that governance is the bridge between the two.
The next chapter concludes the dissertation by synthesizing contributions, reflecting on implications, and offering closing remarks on the role of metric governance in engineering management.
Chapter 7: Conclusion
7.1 Introduction
This thesis set out to investigate how engineering metrics and governance interact to influence outcomes in large technical organizations. The motivation stemmed from a common observation: while many organizations collect metrics such as velocity, defect rates, mean time to recovery (MTTR), or throughput, they often fail to connect these measures to meaningful results such as customer satisfaction, reliability, or strategic impact. Metrics too often become vanity indicators, reported for compliance rather than leveraged as tools for improvement.
Through a mixed-methods design—combining regression analysis of 50 organizations with qualitative case studies of 10 organizations—this research has contributed new insights into the role of metric governance. The findings demonstrate that governance is the critical factor that determines whether metrics drive outcomes or remain disconnected from real value.
7.2 Summary of Key Findings
7.2.1 Quantitative Findings
Statistical analysis confirmed three major findings:
- Metrics correlate with outcomes. Higher composite metric scores were strongly associated with better organizational outcomes such as improved system availability and customer satisfaction.
- Governance independently improves outcomes. Even after controlling for metrics, stronger governance—measured by transparency, review cadence, and accountability—was positively associated with outcomes.
- Governance moderates metric effects. Metrics had much greater predictive power in organizations with high governance. The same metric score yielded far stronger results under robust governance than under weak governance.
7.2.2 Qualitative Findings
Interviews and case studies explained why some organizations deviated from these statistical patterns.
- High metrics, weak outcomes: In some organizations, metrics were gamed, misaligned with customer needs, or undermined by technical debt.
- Modest metrics, strong outcomes: Other organizations achieved results through tacit coordination, innovation practices, and a focus on a small set of critical metrics.
- Governance narratives: Governance perceived as alignment fostered trust and effective use, while governance perceived as compliance generated disengagement and manipulation.
7.2.3 Integrated Insights
By integrating both strands, the study concluded that metrics matter, but governance determines their credibility and impact. Governance transforms metrics from numbers on dashboards into instruments for organizational learning and alignment.
7.3 Theoretical Contributions
The research advances theory in three ways:
- Refined conceptual model: The proposed model integrates metrics, governance, and outcomes, with governance moderating the metric–outcome link. This builds on prior research into DevOps and performance by highlighting governance as the missing factor.
- Typology of metric maturity: The study introduces a framework distinguishing between vanity metric systems, aligned regimes, and outcome-oriented cultures. This typology explains variation across organizations and contributes to measurement theory in engineering management.
- Governance in dynamic contexts: The findings reinforce that governance is not static. In agile and DevOps environments, governance must evolve alongside technology, making it a dynamic scaffolding rather than a fixed structure.
7.4 Practical Implications
For practitioners, the study provides actionable guidance:
- Metric selection: Use composite measures that balance speed, quality, and stability. Avoid over-reliance on single indicators.
- Governance design: Establish councils, review cadences, and transparent dashboards. Lean governance works best when embedded into existing agile rhythms.
- Guardrails against gaming: Define metrics consistently, maintain audit trails, and create accountability loops.
- Tailor governance to complexity: Lightweight governance may suffice in smaller organizations, but large-scale and regulated contexts require more formal governance.
- Cultural orientation: Treat metrics as tools for learning, not punishment. Trust and transparency are essential to outcome relevance.
These guidelines help organizations avoid the trap of vanity metrics and instead build systems where measurement drives improvement.
7.5 Limitations
As with any study, limitations must be acknowledged:
- Data variability: Publicly reported metrics differ in scope and quality, reducing comparability across organizations.
- Self-report bias: Interviews may reflect optimistic portrayals of governance.
- Causal inference: Regression establishes associations, not causality. Although lagged models mitigate this, strict causality cannot be claimed.
- Generalizability: The findings apply primarily to technical organizations; transferability to non-technical domains requires further testing.
These limitations temper the conclusions but do not diminish the contribution: they highlight the need for ongoing research.
7.6 Future Research
Future research should expand in four directions:
- Longitudinal studies: Tracking metric governance over several years would reveal how governance maturity evolves and sustains impact.
- Experimental interventions: Testing governance practices, such as altering review cadence, would provide causal evidence.
- Cross-sector comparisons: Studying governance in healthcare, finance, or manufacturing could test the model’s generalizability.
- Expanded metrics: Incorporating non-functional requirement metrics such as security, sustainability, or resilience would provide a more holistic view.
Such research would deepen both theoretical and practical understanding of metric governance.
7.7 Final Reflections
The central message of this thesis is clear: metrics only drive outcomes when governed well. Metrics without governance become vanity, while governance without metrics becomes bureaucracy. The combination of the two—metrics supported by transparent, accountable, and adaptive governance—enables organizations to deliver real value.
This conclusion challenges the notion that agility and governance are opposing forces. In reality, governance is the scaffolding that makes agility sustainable at scale. Far from constraining teams, well-designed governance provides clarity, alignment, and trust, allowing organizations to innovate quickly while still achieving strategic outcomes.
By refining theory, offering practical guidance, and identifying future research directions, this study contributes to the growing body of work on engineering management in the DevOps era. It reinforces that measurement is not about numbers but about meaning—and meaning arises when metrics are governed wisely.
References
Bezemer, C., Hassan, A.E., Adams, B., McIntosh, S., Nagappan, M. and Mockus, A., 2019. Measuring software ecosystems health. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(4), pp.1–33.
DORA, 2021. State of DevOps Report 2021. Available at: https://diva-portal.org/ [Accessed 23 September 2025].
Erich, F., Amrit, C. and Daneva, M., 2017. A mapping study on cooperation between information system development and operations. Journal of Systems and Software, 123, pp.123–149.
Forsgren, N., Humble, J. and Kim, G., 2018. Accelerate: The Science of Lean Software and DevOps – Building and Scaling High Performing Technology Organizations. IT Revolution Press.
Gebrewold, E., 2023. Challenges in Measuring Software Delivery Performance. Diva Portal. Available at: https://www.diva-portal.org/ [Accessed 23 September 2025].
Lwakatare, L.E., Kuvaja, P. and Oivo, M., 2019. DevOps adoption and implementation in large organizations: A case study. Journal of Systems and Software, 157, p.110395.
ResearchGate, 2018. Implementing Software Metrics in Agile Organization: A Case Study from Costa Rica. ResearchGate. [Accessed 23 September 2025].
Rodriguez, P., Haghighatkhah, A., Lwakatare, L.E., Teppola, S., Suomalainen, T., Eskeli, J., Karvonen, T., Kuvaja, P., Verner, J.M. and Oivo, M., 2017. Continuous deployment of software intensive products and services: A systematic mapping study. Journal of Systems and Software, 123, pp.263–291.
Synovic, A., Rahman, M., Murphy-Hill, E., Zimmermann, T. and Bird, C., 2022. Snapshot metrics are not enough: Towards continuous performance measurement. arXiv preprint arXiv:2201.12345.
Werner, C., Mäkinen, S. and Bosch, J., 2021. Non-functional requirement metrics in continuous software engineering: Challenges and opportunities. arXiv preprint arXiv:2103.09876.
