DevOps Lacked a Definition; Platform Engineering Needs a Prescriptive Roadmap
Key Takeaways
A lack of definition for DevOps enabled early adopters but didn't allow late-majority enterprises to be successful in their adoption of DevOps. The platform engineering community is in danger of repeating this mistake.
What is missing is a map for how to progressively adopt a Platform Engineering approach, not a highly specific end goal.
While Team Topologies has provided an excellent starting point, not everyone understands that it is about more than just implementing the right types of teams.
A common anti-pattern is setting an open-ended goal of increasing collaboration between teams. This tends to be highly inefficient at scale as stream-aligned teams outnumber platform teams.
Most enterprises fail at transformations like this because they implement them via project management instead of product management.
At PlatformCon 2022, Nigel Kersten, Field CTO at Puppet, spoke about the need for a prescriptive roadmap for platform engineering. He shared that in previous cultural transformations such as Agile and DevOps, the industry started to splinter in its approaches and definitions. This led to confusion about how best to approach the organizational and cultural change necessary to embrace the new paradigm.
Kersten believes that this lack of direction not only leads to poor implementations but can also have broader negative consequences. One side effect of the lack of understanding of DevOps is the ecosystem of DevOps tools and software that claim to be the means of achieving DevOps. This has led many to falsely equate DevOps with simply implementing the correct tooling.
However, as Patrick Debois, VP of Engineering at Showpad, noted back in 2010, DevOps is more than just software:
The DevOps movement is built around a group of people who believe that the application of a combination of appropriate technology and attitude can revolutionise the world of software development and delivery.
The focus on culture and processes is often underrepresented in large-scale migrations. Within the enterprise space, the number of pre-existing teams, processes, and communication channels means that the purpose of the movement can easily be lost. As Kersten notes, this is exacerbated by the numerous vendors and software companies releasing tooling into the space:
There were too many vendors who saw this huge, vibrant community of people who were really interested in trying to change things and said "That software, yup, that's totally DevOps". All of that started to distort and mutate the entire movement so that expectations and understandings became a little more difficult for people to reach.
Kersten argues that a prescriptive roadmap is necessary to ensure that this history is not repeated with the platform engineering movement. Setting up an appropriate team structure is only the beginning of laying the foundation for a successful organization. The interactions between teams are just as, if not more, important to get right. It is here that organizations tend to flounder.
This is not new to platform engineering; it has been a challenge with DevOps as well. As Adam McLaurin, Senior Engineering Leader at SoundHound, notes:
There are so many useless debates now about what DevOps *really* is. I think we should just step back and see that DevOps is just a subset of and a tangible introduction to value stream optimization.
Without a clear path forward and a shared understanding of what value we are looking to drive with platform engineering, we run the risk of this movement also falling prey to uncertainty and misunderstanding. However, it is in the application of these ideas to living organizations that many start to struggle.
Organizations, especially large enterprise companies, have pre-existing patterns and processes that make introducing any change a challenge. While ideas such as Team Topologies and Value Stream Optimization help provide an entry point to the change, they may not be enough to help ensure the change is executed fully and correctly.
InfoQ sat down with Kersten to discuss this roadmap idea and the future of platform engineering in more detail.
InfoQ: You've mentioned that it is time for a prescriptive map and model for implementing Platform Engineering. Could you elaborate on what you mean by this? And why now?
Nigel Kersten: Much of this perspective is driven by being deeply involved in the DevOps movement from near the beginning, watching this incredibly vibrant and productive space have a huge impact on small to medium-scale tech companies, and yet observing that large-scale enterprises have generally failed to see the same kinds of benefits when they tried to adopt these practices.
The fact that DevOps was loosely defined is what enabled this vibrant community, and yet that same lack of definition meant that by the time late majority enterprises started trying to “do DevOps”, they did all sorts of different things in the name of DevOps, most of which have been a failure. I talk to folks inside enterprises trying to do DevOps all the time, and it can mean anything from being a release engineer to rebadged sysadmin to an ops person doing whatever developers tell them to do, none of which are really “DevOps” in my mind, as they lack the essential characteristic of productive collaboration.
I fear we’re going to see the same thing with Platform Engineering. If the community of enthusiastic early practitioners doesn’t get together and work out more clearly what the path to success looks like, we’ll end up in the same spot, which would be a shame.
Note that I don’t believe models like SAFe are the right way to go, where you have highly specific roles and processes that feel very rigid, and the opposite of Agile. What I believe we need is a map for how to progressively adopt a Platform Engineering approach, not a highly specific end goal.
InfoQ: How is this need for a prescriptive model not being filled by current models like Team Topologies?
Kersten: The Team Topologies work is by far the best model out there and I am in awe of how good Matt and Manuel’s work is, but I’m not entirely convinced most people who claim they’re working to the TT model actually finished the book :) It’s not just about implementing the different teams that are described towards the beginning, it’s also about ensuring the right interaction patterns and practices between those teams.
The two most common mistakes I’m seeing are people not focusing on the interactions between teams, and not investing seriously in roles like the platform product owner.
Honestly, if the whole enterprise space took the Team Topologies work on platform-as-product, followed it diligently, and collectively learnt from each other on how to create that change, we’d be in a wonderful position.
InfoQ: Why do you think the enterprise space has more challenges in implementing transformations like Agile, DevOps, and now Platform Engineering? For those of us in the enterprise space what should we focus on to help make these transformations successful?
Kersten: Fundamentally the problem is that all of these transformations have a massive people-interaction component, and the bigger and older you are as an organisation, the more difficult it is to change how people interact, and the higher up the chain you have to go to create organisational change.
Having spent time at a “webscale” large tech company, a small-to-medium tech company, and then working for the last decade with a lot of very traditional enterprises, it’s striking how poor internal communication is inside most enterprises compared to tech companies. I’ve lost track of the number of meetings and working sessions I’ve had where it’s apparent that the teams I’m working with have never actually worked together before.
Ultimately success requires being very deliberate about architecting productive team-to-team interactions, with as few intermediaries as possible, and to focus on the feedback loops between the producers and consumers of systems. A common mistake I see folks make is to set an open-ended goal of “collaboration” between teams, with endless meetings and working sessions, and it turns out this is extremely inefficient at scale when your consumers outnumber your producers (which they should do in almost every situation!).
As a producer of systems that others will consume, you should absolutely focus on meaningful collaboration at the design phase, and to ensure that you can get direct and timely feedback from your consumers about their experience using your systems, but for true efficiency at scale, you want to be focusing on building self-service systems, where your consumers can operate at their speed and cadence.
I very strongly believe that the reason most enterprises fail at these sorts of initiatives is that they try to implement them via project management rather than product management. Treat your internal user base as a market of users, and the solutions you build as products for those users. This means you need to solve their problems for them in ways that work for them, rather than just delivering “capabilities”, and it means that you need to keep solving them as those problems change over time, which they always do!
InfoQ: In a recent presentation, you discussed the challenge of finding the right balance between abstracting too much of the underlying cloud infrastructure and providing too much flexibility in the platform. This seems to be a struggle for many new platform teams. Could you dig into this more, especially around how teams can set themselves up for success from the start?
Kersten: There are two common failure modes I see when you’re building a platform for developers that has a large public cloud component. The first is to build entirely new interfaces that completely abstract away the underlying public cloud services, and the problem here is that your users almost always have some knowledge of those services, will be frustrated that you’ve hidden them away, and will be unable to solve new problems on their own without your help.
The second is to go entirely in the other direction, and simply expose all the public cloud services to all of your users and simply give your “platform team” ownership over identity management and cost control. Your development teams will all come up with different solutions to the same problems, and areas like security and compliance will become increasingly expensive to deal with.
To navigate between these modes you’ve got to have someone operating as a product manager early, someone who holds the vision for the platform, just like any other product. There’s a saying in product management that “the truth lies outside the building” and it’s critical to keep this in mind when you’re getting started. If you’re forming a platform team out of folks who’ve been operating infrastructure for the rest of the company, and don’t work on creating communication channels with the folks who will actually be using the platform, then you have no idea what your users actually want and you’ll fail to build actual solutions for them that solve their problems.
The ideal situation is to start with a team that consists of folks with experience in operating infrastructure, folks who are representative of the users of the platform, a product manager to prioritise features and balance competing desires, and work on making sure you can get reliable feedback from the users of the platform.
If you’ve got all that, then you can continuously course-correct between the two failure points I referenced.
InfoQ: What are you expecting to see in the upcoming 2022 State of DevOps Survey? What trends are you expecting to see in the upcoming year?
Kersten: We’re still putting together the final report, but there are a few really interesting points emerging from the data. The sentiment around Platform Engineering amongst practitioners is much more positive than it is around DevOps, in general, these days, and a higher proportion of folks are seeing measurable, positive outcomes. It’s looking like Platform Engineering has massive potential to be a model that is easier for enterprises to adopt in order to get to the benefits of DevOps.
It’s not all beer and skittles though, a massive majority of respondents are concerned that their platform teams are not going to keep up with the needs of the product teams consuming the platform, and resistance to change and communication issues are impediments for a lot of folks.
We’re also seeing that the product manager role is critical and almost certainly understaffed across the industry, with teams wanting to hire for better communication skills, and for someone to help both evangelise and set realistic expectations across the organisation.