June 9, 2016
When you build and evolve a managed platform, there are a variety of resources to reason about. These could be:
- Local dependencies – file system (local, network based mounts)
- External dependencies – REST services (proprietary APIs, public APIs)
- State stores – data stores (firm-hosted, cloud-hosted stores,…)
- Platform services – services that manage the platform functionality (provisioning, runtime management, operational tooling,…)
- Managed Resources – resources that your platform manages on behalf of clients. These are the bread-and-butter abstraction over which you will typically have the most control
- Desired States – the list of features that must be made available during regular processing, when the platform is being upgraded, when the platform is undergoing routine maintenance, etc.
This isn’t an exhaustive list by any means, but my intention is to summarize the resources a managed platform is made up of. There are numerous benefits to making your platform machine readable. You can now:
- Think about the assumptions your team is making about non-functional requirements and characteristics of your internal services, dependencies, and managed resources. These would be things like response time, availability, etc. and can help you design and tune integration proxies.
- Catalogue features that are important for your clients and under what conditions (e.g. ability to handle a provisioned resource, upgrading, etc.). More specifically, define the relationship between features and resource dependencies.
- Define all these elements in a single, consistent, machine readable definition. This will allow your team to view resources and their state, visualize and report feature dependencies.
- Design / implement feature toggling – feature states can be derived from resource states
- Apply self-healing techniques – reset resource (e.g. close and re-initialize a corrupt connection pool, automatically start service instances in the event of a host crashing, etc.)
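As a sketch of what such a machine readable definition might enable – all resource names and states below are illustrative, not a prescribed schema – a feature’s availability can be derived from the states of the resources it depends on:

```java
import java.util.List;
import java.util.Map;

// Illustrative machine readable model: a feature declares the resources it
// depends on, and its availability is derived from their current states.
public class PlatformModel {
    public enum ResourceState { UP, DEGRADED, DOWN }

    public static boolean isFeatureAvailable(List<String> dependencies,
                                             Map<String, ResourceState> states) {
        // A feature is available only when every resource it depends on is UP.
        return dependencies.stream()
                .allMatch(dep -> states.getOrDefault(dep, ResourceState.DOWN)
                        == ResourceState.UP);
    }
}
```

A toggling framework could poll derived states like this instead of relying on hand-maintained flags.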
I will explore each of these points in much more detail in follow up posts. Needless to say, getting your platform in a machine readable state has several benefits.
June 7, 2016
You need to practice minimal design to be effective with systematic reuse. The design needs to continuously look for opportunities to align iteration goals with your systematic reuse roadmap. Too many developers mistakenly think that adopting agile means abandoning design. This couldn’t be farther from the truth. You design whether you explicitly allocate time for it or not. Your code will reflect the design and you will impact the technical debt for your codebase in one way, shape, or form. Implementing user stories and paying down technical debt should be your end goal and not avoiding design altogether.
Always design for meeting your iteration goals. Avoid designing for several weeks or months, and surely avoid putting technical elegance ahead of delivering real user needs. Design minimally – just enough to take advantage of existing reusable components, identify new ones, and plan refactoring of existing code. Specifically, this means:
1. Keep a list of short term and medium term business goals in mind when designing
2. Always look for ways to make domain-relevant software assets more reusable
3. Stay aware of which distribution channels your business is looking to grow
4. Ensure the design reflects the domain as closely as possible and that your reusable assets map to commonly occurring entities in your business domain
5. Place value on identifying the product lines your business wants to invest in and evolving your reusable assets to mirror product line needs
6. Treat design not as a pursuit of perfection but as an iterative exercise in alignment with your domain
What you decide to encapsulate, abstract, and scale are all natural byproducts of this design approach. Rather than spending a lot of effort on a one-time design approach, do just enough design.
June 5, 2016
Production incidents are one of the best avenues to accelerate the maturity of a managed platform. While incidents are stressful when we are dealing with them, they provide clear and direct feedback on gaps in the platform. First, don’t indulge in blame games and don’t waste time fretting that they happened. Second, if you step back from the heat, incidents are an excellent means to learn more about your assumptions and risks.
- Did you assume that an external dependency will always be available? More specifically, did you assume that the dependency will respond within a certain threshold latency window?
- Was there manual effort involved in identifying the problem? if so, how much time did it take to get to the root cause? what was missing in your supportability tooling? Every manual task opens the door for additional risks so examining them is key. Think about how to get to the root cause faster:
- Instrumentation about what was happening during the incident – were there pending transactions? pending events to be processed? how “busy” was your process or service and was that below or above expected thresholds?
- Is there a particular poison / rogue message that triggered a chain reaction of sorts? did your platform get overwhelmed by too many requests within a certain time window?
- Did you get alerted? if so, was the alert about a symptom or did it provide any clues to the underlying root cause? did it include enough diagnostic information for additional troubleshooting? was there an opportunity to treat the issue as an intermittent failure – instead of alerting, could the platform have automatically healed itself?
- Was the issue caused by an ill-behaved component or external dependency? If so, has this happened before (routine) or is it new behavior?
- Think about defect prevention and proactive controls. There are a variety of strategies to achieve this: load shedding, deferring non-critical maintenance activities, monitoring trends for out of band behavior, and so on. Invest in automated controls that warn threshold breaches: availability of individual services within the platform, unusual peak / drop in requests, rogue clients that hog file system or other critical platform resources, etc.
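One of the automated controls mentioned above – warning on an unusual peak in requests – can be sketched with a sliding-window counter. The thresholds and names here are hypothetical, not a prescribed design:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative threshold control: flags when the number of requests within a
// sliding time window breaches a configured limit.
public class RequestRateGuard {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final int maxRequests;
    private final long windowMillis;

    public RequestRateGuard(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Returns true when this request breaches the threshold for the window.
    public synchronized boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        while (!timestamps.isEmpty()
                && nowMillis - timestamps.peekFirst() > windowMillis) {
            timestamps.removeFirst(); // drop requests that aged out of the window
        }
        return timestamps.size() > maxRequests;
    }
}
```

A breach could feed an alert, trigger load shedding, or start a self-healing action, depending on the platform’s policy.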
The above isn’t an exhaustive list but the key message is to use the incident as an opportunity to improve the managed platform holistically. Don’t settle for a band-aid that will simply postpone a repeat incident!
June 4, 2016
Creating a managed platform is a powerful strategy – the key is to help your clients and proactively manage adoption risks. Risks are everywhere – from losing control over infrastructure, release management, and upgrades to a steeper learning curve and operational supportability concerns. Here are a few strategies to manage adoption risks – these will not only help your clients but help the platform team as well:
- Understand key technical drivers for platform adoption – what do your clients care about the most? Is it faster functional development? ease of deployment? rich tooling? testability? ability to dip into a rich developer ecosystem?
- Provide an integrated console for provisioning, runtime management, and operational support. The key word here is integrated – an integrated toolset that makes it easier for a team to provision a resource, deploy / activate it, elastically scale it, and troubleshoot problems is extremely important.
- Empathize with your client’s adoption challenges: they are losing direct control and access in exchange for a host of powerful platform benefits. But they still need answers to questions like:
- how rich and useful is the instrumentation (for transparency into transactions or events or requests being handled, for errors / warnings whilst processing, historical metrics / trends)?
- how do I get access to log messages? are the logs linked to particular request ids or transaction references? how much is the latency between actual processing and log messages reflecting them?
- can I help myself if something goes wrong during production use? e.g. what if a process or execution takes longer than expected? what if it crashes mid-way? is there support for automatic alerting? how easy or difficult is it to train my devops team members?
- Provide automated controls to reduce risk when hosting untrusted code. Let’s face it – managed platforms take on a large amount of risk by hosting code that is largely outside their control. It is therefore critical to reduce defects and address risks via automated controls. You can check for unsupported API calls in your SDK, risky or unsafe libraries being packaged, etc. to address risks while provisioning. This is a vast topic and I will author a follow up post on controls and why they are indispensable to creating stable managed platforms.
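A provisioning-time control along these lines might look like the following sketch – the blocklist entries and artifact names are invented purely for illustration:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative provisioning control: reject a deployment bundle that packages
// artifacts on a platform blocklist.
public class ProvisioningControl {
    private static final Set<String> BLOCKED =
            Set.of("unsafe-native-lib", "legacy-crypto");

    // Returns the subset of packaged artifacts that the platform disallows.
    public static List<String> violations(List<String> packagedArtifacts) {
        return packagedArtifacts.stream()
                .filter(BLOCKED::contains)
                .collect(Collectors.toList());
    }
}
```

A real control would inspect the actual bundle contents (and SDK API usage), but the shape – an automated check that gates provisioning – is the point.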
June 1, 2016
How many times have you heard someone say – “we want to implement this once so we can reuse it over and over again…” – or some variation of this theme? The underlying assumption here is that it is better to get to the right implementation of a component so the team doesn’t have to touch it again. Let’s make it perfect, is the reasoning.
I have rarely seen this work in practice. In fact, it is very difficult to create a single perfect software implementation. After all, your team’s understanding of the nuances and subtleties of your domain grows with time and experience. That experience is earned using a combination of trying out abstractions, continuously validating functional assumptions, and ensuring that your software implementation is providing the right hooks to model and accommodate variations.
Instead of trying for perfection, focus on continuous alignment between your domain and the software abstractions. Instead of trying to write once and reuse many times, focus instead on anticipating change and continuous validation of requirements and associated assumptions. Instead of pursuing the one right implementation, enable easy pluggability of behavior and back it up with a robust set of automated tests. This way, you can ensure your team’s domain understanding is reflected appropriately in the software implementation.
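Pluggability of behavior can be as simple as depending on a small interface so concrete behavior is injected rather than hard-coded. The pricing names below are hypothetical, just to illustrate the seam:

```java
// Illustrative pluggable behavior: callers depend on a small interface, so
// implementations can be swapped (and stubbed in tests) as domain
// understanding deepens.
public class PricingExample {
    interface DiscountPolicy {
        double apply(double amount);
    }

    static double checkout(double amount, DiscountPolicy policy) {
        return policy.apply(amount); // behavior is injected, not hard-coded
    }
}
```

Each rewrite of the policy then touches one implementation behind a stable seam, backed by the automated tests mentioned above.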
You won’t write once – especially if your team lacks the hard-won experience to create high quality abstractions. Embrace the idea that you will write something multiple times – not because it is desirable, but because it is inevitable. Deliver value to your business iteratively and deepen your understanding of both the problem and solution spaces. You will be pleasantly surprised with the results. Remember, pursuing reuse without continuous value is the wrong goal.
May 28, 2016
Although starting from scratch is simpler when building reusable assets, the reality is that you are probably maintaining one or more legacy applications. Refactoring existing legacy assets has several benefits for the team. Here are a few:
- Makes you more knowledgeable about what you already own
- Helps you utilize these assets to make your systematic reuse strategy successful
- Saves valuable time and effort on upfront domain analysis (assuming, of course, that what you own is relevant to your present domain)
- Makes your legacy system less intimidating and more transparent
- Provides the opportunity to iteratively make the legacy assets consistent with your new code
If you cannot readily identify which legacy module or process is reusable you have two places to get help – your customer and your internal subject matter experts. Your customer can help you with clarifying the role of a legacy process. Likewise, your team probably has members who understand the legacy system and have deep knowledge of the domain. Both of them can guide your refactoring efforts.
The act of examining a legacy module or process also has several benefits. You can understand the asset’s place in the overall system. The existing quality of documentation around this asset and usage patterns can be understood as well. Now, you can make an informed opinion of the current state of what you have and how you want to change it. Before making any changes though it helps to consider the next few moves ahead of you. Ask a few questions:
- Is the capability only available in the legacy application and not in any other system that you actively maintain and develop?
- Is it available only to a particular user group, channel, or geography?
- Is the capability critical to your business sponsor or customer? If so, are they happy with the existing behavior?
- How is the capability consumed currently? Is it invoked as a service or via a batch process?
- How decoupled is the legacy capability from other modules in the application?
These questions will help you get clarity on the role of the legacy asset, its place in the overall application, and a high level sense of the effort involved in refactoring it to suit your requirements.
November 10, 2013
In an earlier post, I listed reasons why automated tests are foundational for reuse. In this post, I want to provide some approaches that will ease automated testing of your components.
- Mock API interactions when using external dependencies. Mocking will reduce runtime dependencies and make your unit tests faster and more robust. Use JUnit with Mockito – Mockito has excellent support for a variety of mocking use cases.
- If an external dependency is required from multiple classes, you can define an Adapter that wraps the external API via an interface. The interface can then be mocked or stubbed and will provide an abstraction layer for your classes. Word of caution: abstractions are leaky, so resist the urge to wrap every single API provided by the external dependency.
- Use in-memory databases and provide a consistent API for your tests. A common class could initialize and clean up the in-memory db and can be leveraged from tests. Alternatively, it can be provided as an abstract class that your tests can extend. Take the opportunity to standardize location, naming, and directory structure of test resources – if you are using maven for instance, the db related data files can be placed under src/test/resources/db/<db-name>. Finally, this is very useful in ensuring that the database-bound code is indeed testable – forcing the in-memory db test will make technical debt apparent.
- Use db-deploy or some automated database deployment tooling to define and populate databases from tests – these can enable developers to define and execute tests without sharing / corrupting each other’s data. It will also make your database deployment repeatable and well tested eliminating a key deployment risk.
- Provide a common API for routinely used tasks for developers – e.g. APIs that can create test data in in-house / proprietary formats, parse, and populate appropriate data structures will be useful.
- Use JUnit Rule extensions for having a common API for developers – provide a custom rule that will manage the lifecycle of a legacy component or an API that is difficult to use – these are all opportunities to both facilitate testing and add value via reuse.
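The Adapter idea above can even be sketched without a mocking library – the rates gateway below is a hypothetical stand-in for a third-party API, wrapped behind an interface that tests can stub directly:

```java
// Illustrative adapter: a hypothetical third-party rates API is hidden behind
// a small interface, so the rest of the codebase (and its tests) never touch
// the external dependency directly.
public class RatesExample {
    interface RatesGateway {            // the seam tests can mock or stub
        double rateFor(String currency);
    }

    static double convert(double amount, String currency, RatesGateway gateway) {
        return amount * gateway.rateFor(currency);
    }
}
```

In a real codebase, Mockito would typically supply the stub; the interface seam is what makes either approach possible.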
November 3, 2013
Here are some tips when authoring web service clients:
- Decouple connectivity from request construction. This keeps variations in input construction and the mechanics of service invocation cleanly separated. Additionally, request construction might depend on the particular resource – e.g. it can be a set of query string parameters or a more complex object structure.
- Connectivity logic should encapsulate the service URL and automatic-retry considerations. The client can automatically retry GET requests a specified number of times if invocation encounters a connection timeout. It should also ensure the response is OK (either via HTTP status codes or by examining appropriate response-specific data structures).
- Don’t swallow exceptions – the service might return a resource not found or an internal server error – the code that is using the client should be given the flexibility to deal with these exceptions appropriately – the client code shouldn’t assume or mask these exceptions. When in doubt, don’t suppress runtime exceptions.
- Decouple domain logic from the service client – domain logic might dictate whether or not a service call needs to be made, the nature of the input resource data, etc. – this logic is more likely to change per the consuming application’s evolving requirements and shouldn’t live in the same class as the service invocation code.
- Provide reusable API hooks for addressing cross-cutting concerns – such as response-time capture and logging of input and output messages – if you want to report response time trends when invoking a service, you won’t want to clutter this all over the consuming application’s codebase – the client can and should centralize these.
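Several of these tips – encapsulated retries, not swallowing the final failure, decoupling connectivity from request construction – can be sketched with a small wrapper. The shape is illustrative, not a full client:

```java
import java.util.concurrent.Callable;

// Illustrative retry wrapper: connectivity concerns (retrying transient
// failures) stay separate from request construction and domain logic, which
// are passed in as the Callable.
public class RetryingInvoker {
    public static <T> T invoke(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // transient failure – try again up to maxAttempts
            }
        }
        throw last; // surface the final failure instead of swallowing it
    }
}
```

As noted above, restrict such automatic retries to idempotent calls such as GET requests.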
Remember the above is useful whether you are consuming a service or providing clients for your prospective service consumers.
November 2, 2013
Reading the book Code Simplicity by Max Kanat-Alexander. When you have to make lots of design improvement and implementation decisions, it is important to keep your solutions simple. Reducing complexity is an important aspect of good code and particularly relevant to systematic software reuse – so I am hoping to learn new concepts from this book.
You might want to also check out Max’s interview on the rewards of simple code.
November 2, 2013
You can wait for that dream initiative or project to build a whole new set of reusable components that will magically make your teams more productive. The only issue is – it is highly likely that it will be just that – a dream. Instead of planning for systematic reuse, start executing on it by taking a few simple steps. Ask yourself the following questions:
1. Are you capitalizing on identifying and sharing common components with your department / team?
2. Is every project encouraged to continuously refactor and harmonize classes for reducing redundancies? If not, why not?
3. Do you have code that caters to common infrastructural concerns – logging, exception management, alerting, monitoring, metrics? If yes, is their reuse mandated via common framework hooks that your developers are already using? If not, what is preventing adoption of these concerns into your development stack? Ask your developers and listen to their concerns – you will need to unearth and attack the root causes behind reuse barriers.
4. Do you utilize ad-hoc, informal pairing and code review sessions to identify and harmonize similar / duplicate / redundant classes? If you review code for the first time just before a project go-live, odds are you will either regret missed opportunities or bemoan the lack of time within your development cycle for making improvements. The key is to intervene early and often and front-load your investments for systematic reuse.
5. How do you ensure refactoring-to-reuse opportunities are tracked? do you create improvement tickets and action them on a best-effort basis, or are they managed as part of the product backlog of things that have to get done? If it’s the former, it will be difficult to make much progress. Creating and tracking tickets will provide visibility – however, for you to make tangible progress in acting on them you need to partner with developers and development managers to action the work on an ongoing basis.
These are just example questions to help you get your journey started, and it should be abundantly clear that discipline and continuous alignment are key. If you don’t do anything else, just force your team to converge on a common implementation of key functionality. You will be surprised what discipline can deliver.