Maintaining agile programme delivery while facing dependencies on multiple teams and on hardware and firmware integrations.
This post will not advocate why I think agile practices are better suited to any development effort; it is assumed that you are already practicing a form of “agile” or, if you are starting your journey, have answered the questions of why you want to do it and what benefit you expect. I do assume, however, that the desire to reduce costs, through elimination of waste and cycle-time reduction, was at the forefront of your decision.
Whether you are managing a single small team or are in charge of a multi-team programme, the rationale for doing so in an “agile” manner is identical, but managing a product built on the efforts of multiple teams, alongside hardware and firmware constraints, places greater emphasis on these concerns.
Adapting to a large-scale programme
As with smaller-scale efforts, we will be scrutinizing the VSM, looking for both local and global maxima, in order to tackle and eliminate waste, with an emphasis on developer experience and a frictionless path to integration. Our goal will be to scale the practices to span all participating teams while maintaining a light-touch governance model, so as not to burden an already complex collaboration. Of equal importance is presenting an efficient way to tackle the complex task of working with devices and their embedded software while avoiding big-bang integrations.
Assumptions on constraints
As an exercise, we’ll assume these constraints:
- There is a hard deadline for the product to be launched.
- The new product must have feature parity with the existing one.
- There are a multitude of teams working on the same product.
- There is an expectation of producing detailed planning for commitments.
- Due to the complexity of the integration efforts, test execution is primarily manual.
- For the same reasons, feedback loops from test to development are long.
Intuitively, these imply the following challenges:
- A fixed deadline combined with the need for feature parity forces us to triage non-crucial scope and performance work, since we assume quality cannot be compromised.
- Communication channels must be optimized to reduce rejected hand-offs from one team to another and to increase integration success early in the development process.
- We must build confidence with planning organizations while changing the way we present planning documents, directing as much effort as possible towards development as we re-prioritize administrative tasks.
- Automation of all aspects of the development process must be prioritized, with minimal demands on human attention while it executes.
We argue that the constraints listed above can be addressed by building a clear picture of the path to production, accompanied by details of the waste it includes.
This map, based on the current practices and norms of the organisation, serves as the primary source of information for future actions and measurements. The challenge when drawing such a map (a Value Stream Map, or VSM) is to be sure that what people report is honest and describes reality, not an ideal stated in an operations manual. Due to the programme context of the product, another dimension should be added to the VSM detailing the organization to which each team belongs.
Prima facie, the VSM is an overlay of business activities or events mapped onto the processes that fulfill them. More importantly, it offers insight into where the money is spent, as well as how efficiently it fulfills the business needs.
A discussion of VSM is referenced here. It is essential that all delivery groups participate in the VSM session in order to portray a credible and well-founded business process map. We will set the goal posts of the VSM as wide as possible (“concept” to production) to accommodate the constraints listed above.
Such a map can be captured in a simple data sheet, one row per station.
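For illustration, such a data sheet can be sketched and analysed in code. All station names and figures below are hypothetical, not from a real programme; the point is the two summary numbers the sheet yields: flow efficiency (value-add time versus waiting) and the rolled first-pass yield across all hand-offs.

```python
# Hypothetical VSM data sheet: each station lists its process (value-add)
# time, its total lead time, and its FPCA% -- the share of work arriving
# complete and accurate on the first pass. All figures are illustrative.
stations = [
    # (name,                 process_hours, lead_hours, fpca)
    ("Concept & analysis",   16,  80, 0.90),
    ("Development",          40, 120, 0.75),
    ("Firmware integration", 24, 160, 0.50),
    ("Manual test",          32, 240, 0.60),
    ("Release",               8,  40, 0.95),
]

process_time = sum(s[1] for s in stations)
lead_time = sum(s[2] for s in stations)
flow_efficiency = process_time / lead_time  # value-add share of elapsed time

# Rolled FPCA: probability a work item passes every hand-off first time.
rolled_fpca = 1.0
for _, _, _, fpca in stations:
    rolled_fpca *= fpca

print(f"Flow efficiency: {flow_efficiency:.2%}")
print(f"Rolled FPCA:     {rolled_fpca:.2%}")
```

Even with individually respectable stations, the rolled FPCA collapses quickly, which is why the hand-offs, not the stations, are usually where the waste hides.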
Informed by the VSM, we continue our analysis and optimization of the programme’s management, along these parallel paths:
Parallel I - VSM optimization
As in other circumstances, we analyze the VSM looking for local and global maxima within the process it shows. The main objective of this path is to eliminate or reduce waste as indicated by low FPCA% (First Pass Complete and Accurate) values. A low FPCA value indicates inadequate quality of work as it is handed off from one VSM “station” to another. This can be addressed by adding or removing stages between those two “stations”, as well as by optimizing the organization to suit the product’s goal (read: reverse Conway’s law).
Depending on the organizational structure and the complexity of the product, the optimized VSM may suggest creating a virtual team, or team of teams, to counter any siloed organizations that cannot be rapidly mutated to reduce the waste incurred by hand-offs. The challenge with organizational changes is the amount of change the wider organization is able to absorb. In order to maximize the absorption rate, the motivation for it must be discussed and socialized with all stakeholders involved, including HR functions.
Parallel II - The architecture
It is extremely important to have the architecture satisfy business objectives, as a counter to Conway’s law. By definition, the product’s lens is holistic, whereas implementation may have splintered across different silos and technical departments. Having the architecture follow the product’s lead assures the desired frictionless path to production by removing waste and non-value-adding process.
There is a constant tension when attempting to define product architecture. By definition, it strives to be holistic and stable, yet from the moment it is set, it is subject to stressors that push it to constantly change. The need for a stable and holistic architecture is understandable, but the flexibility to accommodate ever-changing needs is even more important, to reduce future rework. However, since there is no limit on how many layers of abstraction an architect can inject into their domain analysis, there is a risk of falling into a “big upfront design” stage, leading to “analysis paralysis”, which puts any product development effort in immediate danger.

To avoid this tension and these analysis loops, we can reduce risk with shorter architectural cycles driven by short-lived “spikes” and prototypes. By this I mean that we frequently apply use cases as stressors to what begins as a monolith, incrementally adding layers of abstraction only when challenged. These stressors are evaluated by prototypes that demonstrate that the architecture satisfies their needs without encumbering the VSM in general, or integration efforts specifically. At each incremental step, only the simplest solution is used, with an abstraction layer protecting the underlying implementation, following the principle of programming to interfaces rather than to implementations. We achieve a flexible architecture cycle by ascribing complex structures as late as responsibly possible. This allows for a higher adoption rate by the development teams while protecting them from future changes by introducing abstraction layers as needed.
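A minimal sketch of the “simplest solution behind an interface” idea, with all names hypothetical: consumers depend on the interface, so the in-memory implementation can later be replaced (say, by a database-backed one) without rework rippling through the teams.

```python
from abc import ABC, abstractmethod

# Hypothetical interface: teams program against this, never against a
# concrete store, following "program to interfaces, not implementations".
class TelemetryStore(ABC):
    @abstractmethod
    def record(self, device_id: str, reading: float) -> None: ...

    @abstractmethod
    def latest(self, device_id: str) -> float: ...

# The simplest thing that can possibly work today: in-memory storage.
# Only a stressing use case (e.g. durability) justifies replacing it.
class InMemoryTelemetryStore(TelemetryStore):
    def __init__(self) -> None:
        self._data: dict[str, float] = {}

    def record(self, device_id: str, reading: float) -> None:
        self._data[device_id] = reading

    def latest(self, device_id: str) -> float:
        return self._data[device_id]

store: TelemetryStore = InMemoryTelemetryStore()
store.record("sensor-1", 21.5)
print(store.latest("sensor-1"))
```

The abstraction layer is added only because a consumer exists for it; no further layers are speculated into the design.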
It is important for the architects to be hands-on with the prototyping efforts to assure that not only are good solutions being presented, but that they are also likely to be adopted by the teams.
Parallel III - The developer experience
That the development teams are arguably the most impactful on delivery hardly needs mentioning, and paving a frictionless path to production for them is of utmost importance. Collaboration is crucial in an environment with multiple teams, each working with different software tools, including hardware and firmware elements.
Collaboration is boosted by creating a virtual product team around thin-sliced features that span the entire VSM. This means creating working environments for each team without hampering interaction with the other upstream and downstream systems. It includes domain-driven implementation, interacting through interfaces, client-driven bi-lateral contract testing, elastic test data management, and emulators for the different hardware components.
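As a minimal sketch of client-driven contract testing, under the assumption that contracts are exchanged as plain data (endpoint, fields, and names below are all invented for illustration): the consumer team publishes exactly what it relies on, and the provider team verifies its responses against that expectation on every build, catching drift long before a big-bang integration.

```python
# Hypothetical consumer-published contract: the fields and types the
# consumer's code actually depends on, nothing more.
CONSUMER_CONTRACT = {
    "endpoint": "/device/status",
    "required_fields": {"device_id": str, "firmware_version": str, "online": bool},
}

def provider_response():
    # Stand-in for the provider's real handler (or its emulator).
    return {"device_id": "dev-42", "firmware_version": "1.3.0", "online": True}

def verify_contract(contract, response):
    """Return a list of violations; empty means the contract holds."""
    problems = []
    for field, ftype in contract["required_fields"].items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], ftype):
            problems.append(f"wrong type for field: {field}")
    return problems

issues = verify_contract(CONSUMER_CONTRACT, provider_response())
print("contract OK" if not issues else issues)
```

Because the check runs in the provider’s own pipeline, a rejected hand-off becomes a failed build minutes after the change, rather than a rejected integration weeks later.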
In conjunction with the VSM analysis, automation of the path to production is done by delivery pipelines that focus as much on business analysis as on technical analysis. From the business side, the pipeline includes an “outside-in” approach that describes the system’s desired behaviour. These value statements are manifested as code and contribute to what is described as “self-testing” code. This ensures that the system is driven by living documents rather than by static requirement documents. Overall, the pipeline code must demonstrate that once a change is approved (either manually or automatically), the result in production is of assured quality and is automatically monitored for self-healing if need be.
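A hypothetical “value statement as code”, with all names and the toy update flow invented for illustration: the desired behaviour is stated in given/when/then terms and executed, so the living document and the test cannot drift apart.

```python
# Outside-in behaviour check, readable as a business statement:
# "Given a provisioned device, when a firmware update is pushed,
#  then the device reports healthy."  All names are illustrative.
def given_a_provisioned_device():
    return {"state": "provisioned", "alerts": []}

def when_firmware_update_is_pushed(device):
    # Stand-in for the real update flow the pipeline would exercise.
    device["state"] = "up_to_date"
    return device

def then_the_device_reports_healthy(device):
    assert device["state"] == "up_to_date"
    assert device["alerts"] == []

device = given_a_provisioned_device()
device = when_firmware_update_is_pushed(device)
then_the_device_reports_healthy(device)
print("value statement holds")
```

In a real pipeline the “when” step would drive the actual system (or its emulator); the shape of the statement is what makes the documentation executable.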
The role of programme management is to minimise risk during, and between, iterations.
Minimising risk may have different definitions; I propose to define it as the ability to reduce “value at risk”.
By definition, “value at risk” is unappreciated effort, whose debt is at risk. Please see a more detailed discussion here.
In this context, “value at risk” is a feature for which we have no data with which to discern whether it meets market and quality needs. We have no data as long as it is not in production. Until then, its state is known only inside the organisation, and by no one outside it. Its state changes to “realised” once it is placed in production. Prior to that moment, it remains the largest risk to the programme.
Once in production, it is “realised” and is evaluated by using customer and quality feedback. It is only then that the feature can be truly evaluated to be either “valuable” or “debt” (read: loss).
With the former, the programme advances. By means of monitoring and customer feedback, it is proven that the deployed feature is valuable and has sustained our standards of quality. It is no longer regarded as “debt” (as it has been “paid”) and all risk associated with it is henceforth eliminated.
The latter represents one or more failed features in production. It is the point at which the organization witnesses that it made a mistake in judgement, either in the necessity of the feature itself or in its implementation.
Viewing risk management this way directs programme management to focus on repeatedly proving value, by minimising the number of features in the “value at risk” state, and, once in production, discern quickly whether the feature is “recognized debt” or “valuable”.
In other words: Programme management strives to transition a feature’s state from being “risky” (throughout its development and recognition states) to “valuable” (paid debt in production).
Those features deemed as “recognised debt” are reviewed for system behaviour or quality thresholds and are scheduled for future iterations.
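The feature lifecycle described above can be sketched as a small state machine (a hypothetical model for reasoning, not a prescribed tool):

```python
from enum import Enum

# Hypothetical model of the lifecycle: every feature is "value at risk"
# until production feedback reclassifies it.
class FeatureState(Enum):
    VALUE_AT_RISK = "value at risk"      # in development, no market data yet
    REALISED = "realised"                # deployed, awaiting feedback
    VALUABLE = "valuable"                # feedback positive: debt paid
    RECOGNISED_DEBT = "recognised debt"  # feedback negative: schedule rework

def deploy(state):
    """Placing a feature in production moves it from risk to realised."""
    assert state is FeatureState.VALUE_AT_RISK
    return FeatureState.REALISED

def evaluate(state, feedback_positive):
    """Customer and quality feedback settles the feature's true value."""
    assert state is FeatureState.REALISED
    return FeatureState.VALUABLE if feedback_positive else FeatureState.RECOGNISED_DEBT

s = deploy(FeatureState.VALUE_AT_RISK)
print(evaluate(s, feedback_positive=True).value)
```

The managerial point falls out of the model: the only transitions that remove risk pass through production, so shortening the time a feature spends before `deploy` and before `evaluate` is the whole game.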
Arguably, this manner of programme management shows value in committing by delivering, as opposed to committing by planning, while integrating feedback elements into its workflow to allow for rapid transitions as needed.
It is essential for Programme Management to
- Visualize the workflow
- Limit work in progress (WIP), and make sure the teams are working on the right things
- Measure and manage flow (cross-referencing the VSM and master sequence diagrams)
- Make process policies explicit (and keep them lightweight)
- Improve collaboratively (using models and the scientific method)
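For the “measure and manage flow” point, Little’s Law (average WIP = throughput × average cycle time) gives a first quantitative handle on why limiting WIP matters; the figures below are purely illustrative.

```python
# Little's Law: average_wip = throughput * average_cycle_time.
# Illustrative numbers: a board holding 24 items, teams finishing
# 6 items per week.
average_wip = 24                 # items in progress
throughput = 6                   # items completed per week
average_cycle_time = average_wip / throughput
print(f"{average_cycle_time:.1f} weeks per item")

# Lowering WIP shortens cycle time without demanding more throughput:
for wip in (24, 12, 6):
    print(wip, "items in progress ->", wip / throughput, "weeks per item")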
Structuring the teams
In contrast with smaller efforts, where we would strive to have a single team with cross-functional roles accountable for end-to-end delivery, it is assumed that this effort is too large for a single team, based on the constraints listed above. In our situation, a domain-based team structure aligns with the business domain-driven architecture. The architecture will propose communication patterns between the services that the teams are in charge of and accountable for. The service communication patterns should be driven by the consumers of the service, adding to the merits of our desired frictionless path to production.
It will be beneficial to instill a culture of collaboration in which teams that stand to gain lend capacity to the teams that need it, a pattern that can repeat throughout all the teams, whether in development or not.
The domain-oriented teams, however, should be independent and comprise all the roles needed to deliver software to production without any dependencies. Quality is assured by all teams exposing tools and environments for others to interface with as they resolve dependencies on their services. This includes bi-lateral contract tests in development environments with all upstream and downstream components, as well as hardware simulators and software emulators from those teams that have firmware deliverables.
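A team with firmware deliverables might expose an emulator that implements the same interface as the real device driver (all names here are hypothetical), so downstream teams can integrate and test before hardware is available:

```python
from abc import ABC, abstractmethod

class Thermostat(ABC):
    """Interface shared by the real device driver and the emulator."""
    @abstractmethod
    def set_target(self, celsius: float) -> None: ...

    @abstractmethod
    def read_temperature(self) -> float: ...

class EmulatedThermostat(Thermostat):
    """Software stand-in: deterministic, available to every team, no lab time."""
    def __init__(self, ambient: float = 20.0) -> None:
        self._target = ambient

    def set_target(self, celsius: float) -> None:
        self._target = celsius

    def read_temperature(self) -> float:
        return self._target  # converges instantly; real hardware would drift

def comfort_check(device: Thermostat) -> bool:
    """Downstream feature code, written against the interface only."""
    device.set_target(21.0)
    return abs(device.read_temperature() - 21.0) < 0.5

print(comfort_check(EmulatedThermostat()))
```

When the physical device arrives, a driver implementing `Thermostat` slots in and the same downstream code runs against it, turning the final hardware integration into a substitution rather than a big bang.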
Here are some guidelines to help reconcile hardware & firmware development cycles within an agile framework:
- Early and continuous end-to-end demonstrations with the current state of the hardware/firmware
- Defer making restricting design decisions to allow changing requirements, even late in development
- Constant collaboration between business people and developers throughout the project with face-to-face conversations
- Balance simplicity by maximizing the amount of work not done in the short term and minimizing the total amount of work to be done in the long term
A periodic “refactoring” review of the teams’ domains and interfaces should be conducted with the architects to maintain the architecture. However, to avoid analysis paralysis, these reviews should not happen until the teams’ artifacts have been realised and monitored for some period of time.
Based on XP, we strive to practice the following:
- Small, frequent releases
- Simple design
- Built-in testing
- Frequent refactoring
- Team code ownership
- Continuous integration
These practices are vital to larger programmes, and can increase productivity and quality manifold.
Thin sliced implementation applies in our case as it does in any product development effort, large or small.
We reap the benefits of an end-to-end implementation as we show the flow to the people in charge of the product’s behaviour and validate our architecture.
TDD and BDD are tools that will help the team stay focused on their tasks, with end-to-end validation at each step of the implementation, and help “shift-left” any validation and QA functions.
Coupled with a strangulation strategy, end-to-end implementation will demonstrate functionality worthy of being put in production in increments.
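The strangulation strategy can be sketched as a routing facade that moves traffic to the new system one thin-sliced feature at a time (all feature names and handlers below are hypothetical):

```python
# Strangler-style facade: features migrate to the new system one thin
# slice at a time, while everything else continues to hit the legacy path.
MIGRATED = {"device_status"}  # features already served by the new system

def legacy_handler(feature, payload):
    return f"legacy:{feature}"   # stand-in for the existing system

def new_handler(feature, payload):
    return f"new:{feature}"      # stand-in for the replacement system

def route(feature, payload=None):
    handler = new_handler if feature in MIGRATED else legacy_handler
    return handler(feature, payload)

print(route("device_status"))   # served by the new system
print(route("billing_report"))  # still on the legacy path
```

Each increment that proves itself in production adds one entry to the migrated set, so feature parity is approached continuously rather than in a single cut-over.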
Bi-weekly cycles will assure all the teams that they are on the right path, and represent a short enough time to correct mistakes without too much waste. A cultural aspect I suggest is to treat stories that were not completed within a cycle as a learning opportunity rather than an occasion for shaming. If a small story took longer than expected, it may be a manifestation of a poor VSM, not a fault of the team’s own. Going into the details of the VSM for that story provides an opportunity to understand the missing ingredient: a sort of pull of the ‘andon cord’ each cycle.
With programme management focusing on reducing debt as risk, the function of governance is to allow frictionless flow from “concept” to production. This means reducing friction in the pipeline, supporting the need for experimentation to validate the architecture, allocating people to where the work is, and involving (read: distracting) as few people as possible.
Spending time with developers and analysing the efficacy of the pipeline far outweighs the benefit of reporting. As mentioned above regarding stories that did not make it by the end of the sprint, governance should use such metrics to improve the VSM, not to penalize the teams.
Progress can be shown based on completed flows in the end-to-end sequence diagrams that describe each feature. The information they convey has more benefit than burn-down charts.
Basic agile methodology can be applied to large-scale product development efforts without the need to incorporate complex scaled management systems. A large-scale programme can succeed by
- Organizing work with domain-oriented teams
- Moving people to work on bottlenecks
- Developing contract tests between upstream and downstream systems
- Constantly monitoring debt at risk and the VSM
- Reducing any non-application work as much as possible
- Assuring integration planning by drawing an end-to-end sequence of technical events, in response to the VSM
- Assuring the availability of test data-sets to each domain team
- Employing emulators to mock-out hardware availability constraints
- Mandating automation
- Assuring identical operational environments (read: cloud based)
- Establishing a culture of collaboration.