Make Services Fault Tolerant & Supportable for Production Use

January 3, 2015

A lot of teams are building services for clients both internal and external to your organization. Typically, there is quite a bit of focus on succeeding from a functional sense – did we get the key requirements addressed? does it cover the plethora of rules across markets / jurisdictions? and so on. Some of the more experienced teams, consider the non-functional aspects as well – e.g. logging, auditing, exception handling, metrics, etc. and I talked about the value of service mediation for addressing these in an earlier post.

There is an expanded set of capabilities that are also necessary when addressing non-functional requirements – those that are very relevant specially when your service grows in popularity and usage. They fall under two categories: operational agility and fault-tolerance.  Here are a few candidate capabilities in both these categories:

Operational Agility / Supportability:

  • Ability to enable / disable both services and operations within a service
  • Ability to provision additional service instances based on demand (elastic scaling)
  • Maintenance APIs for managing resources (reset connection pool, individual connections, clear cached data, etc.)
  • Ability to view Service APIs that are breaching and ones that are the risk of breaching operational SLAs
  • Model and detect out of band behavior with respect to resource consumption, transaction volumes, usage trends during a time period etc.

Fault Tolerance:

  • Failing fast when there is no point executing operations partially
  • Ability to detect denial of service attacks from malicious clients
  • Ability to gracefully handle unexpected usage spikes via load shedding, re-balancing, deferring non-critical tasks, etc.
  • Detecting failures that impact individual operations as well as services as a whole
  • Dealing with unavailable downstream dependencies
  • Leveraging time outs when integrating with one or more dependencies
  • Automatically recovering from component failures

In future posts, I will expand on each of these topics covering both design and implementation strategies. It is also important to point out that both these aspects are heavily interconnected and influence each other.

Making The Most of Projects To Drive Systematic Reuse

January 2, 2015

Systematic reuse takes conscious, disciplined effort – question is – where are the systematic reuse opportunities? how can we maximize these opportunities? – it may or may not surprise you that there is rarely a special, designated ‘project’ to achieve reuse or build a reusable API. Teams are busy and they want to solve tangible problems – they work on projects more often than reusable libraries and services. Some reuse advocates lament this situtation – they wish the organization supports systematic reuse with a protected budget, appropriate team members, and the organizational mandate to enforce reuse APIs and standards. I used to wish this too and I was wrong – way off the mark 🙂

Now for the good news – the great thing about the lack of a reuse budget is that, you can focus on the more important thing: achieving business objectives either by saving costs, creating new revenue lines, and reducing uncessary risk. Projects have the necessary business objectives baked into them – that’s why they are funded, resourced, executed, and tracked. Finally, they have an important constraint: time. Projects have deadlines. Schedule risk is a key one with reuse efforts and having a project deadline ensures the asset is going to be useful and relevant to the project at hand. Below are a few tips to get the maximum out of projects:

  • Review the requirements – whether it is a set of tickets, a sprint plan, or a formal document – review and categorize requirements into ones that are specific to the project, common to the product line, and common across projects.
  • Ensure you are engaged with the development team throughout the project lifecycle – reviewing and identifying opportunities for identifying, leveraging, and refactoring code. Very often, reviews are done long after development is complete with an impending deadline. This leaves little room for introspection, refactoring, or improvements to the codebase.
  • Identify existing components and services that are potential reuse candidates – most importantly, identify assets that are readily reusable (even if they are part of an existing project codebase), ones that need minor refactoring, and ones that need substantial refactoring / development. Minor refactorings and enhancements which can be done in parallel while users are testing / verifying functionality is one of many possibilities.
  • Insist on development teams using agreed upon interfaces for reuse-eligible functionality. Dev teams can either create a bespoke implementation, fork off an existing implementation, or leverage a reusable API.  However, none of this is possible if the projects use bespoke classes / APIs. Interfaces give your team the freedom and pluggability to swap implementations, evolve asset maturity, and ultimately carve code off the project to seed and grow a reusable API. It doesn’t matter if the interface is implemented by a local component, a remote service, with or without persistence, etc. – the implementation will depend on performance and resiliency characteristics in addition to functional needs.

Tips for Identifying Reusable Candidates from Existing Code

July 6, 2014

Here are a few quick tips to examine your existing code to identify reuse candidates:

  • Introduce Factory or Builder instead of repetitive boiler-plate code when constructing key objects. Several benefits: makes it easy to refactor and evolve how underlying objects are stitched together, makes it easy to write more intention revealing code, and very relevant / convenient when writing automated tests
  • Clarify functional behavior and tease out non-functional logic – look at how functional logic is implemented across your codebase. Do multiple projects share a common set of domain assets – objects, rules, services, and the like? if not, look for areas where functional logic is tightly bound to non-functional aspects like fine-grained metrics capture, exception handling, retry / timeout handling, etc.
  • Public API that needs to be accessed across platforms / devices are strong reuse candidates – for instance, are there functional APIs that don’t have a remote interface (e.g. a plain Java implementation without an appropriate REST-ful web service resource)? Do you need functionality to be made available across multiple devices with varying User Interface semantics? If so, look to carve the shared logic out into a service that can be accessed from multiple devices / integration points

Don’t Implement Reusable Assets Unless Necessary

July 6, 2014

Resisting the temptation to implement a story is very hard for a dev team – it is all too easy to get carried away in introducing a new idea as a reusable asset. However, these are the moments where you must pause and ask yourself a few basic questions before rushing to write code:

  • Is this a one-off requirement or part of a recurring theme? Remember – when in doubt, don’t plan for reuse but continuously align and refactor your codebase. Use before reuse!
  • Is there a confirmed consumer of this asset beyond the immediate project? If not, keep it in the originating project and promote it for reuse when you find another team or project that needs the functionality
  • What are the likely variations that aren’t captured in the existing implementation? are they all really required for day 1, day N ? Don’t implement anything unless there is a business mandate to do so. It is very easy to add code wishing someone reuses it rather than refactoring a known good piece of functionality
  •  Do you have the bandwidth to develop unit tests, capture the right domain abstractions, and write developer facing quick start documentation? If not, resist the urge to prematurely introduce new code that ends up being technical debt

Getting Developer Buy-in for Systematic Reuse

May 31, 2014

Too many systematic reuse initiatives fail because they fail to get their most important constituency. Below are a few tips to get developers to buy into reuse efforts.

Make every design and implementation artifact available for active engagement and feedback. If you say something must be used as a mandate – make sure your developers can actually use and benefit from that capability. This isn’t about functionality alone – it is about transparency, willingness to involve and incorporate a diverse set of ideas, and most importantly, inviting ongoing dialogue and feedback

Next key aspect is – Do you know the key developer pain points? Is it extensibility? Ease of integration? Testability? Performance? If you don’t really know then don’t thrust a reusable asset on them without working out candidate solutions. Get the developer community to help you be part of the overall solution- chances are, there are existing assets, already in production use. Don’t make he mistake of evangelising a mediocre solution that makes it more difficult for developers who have to work with your software day in and out.

Interception Points for Systematic Reuse

July 21, 2013

There are certain key interception points during the development process that can greatly increase the likelihood of systematic reuse. Some agile practices can really help:

  • Estimating User Stories – constantly seek synergies and ways to connect new requirements with existing ones or those from other projects the team is working on. You know you are successful when the most junior member of the team is finding commonalities and points out areas to explore for systematic reuse. Very often however, reality is starkly different – similar, slightly varying functionality gets implemented over and over again across projects due to time constraints, lack of awareness, and implementation considerations – e.g. legacy system won’t be able to use this component” or “the existing system implementation is hard to refactor…”, etc
  • Pair programming – the real strength with pairing is the variety in perspectives. One person can go for depth (identify a very efficient algorithm) and another can focus on breadth (identify existing components to reuse/extend or contribute current work). It is also convenient when pairing to switch perspectives and identify ways to improve the code by eliminating what is not necessary. Hard to expect one developer to be your systematic reuse superstar every time across every project.
  • Code reviews – this is a rich area for reuse opportunities because of two key reasons – one, the idea is proven – it is in code and executing – not just vaporware. Equally important, it is a chance to re-look at the implementation with a detached perspective. Discover unspoken design assumptions, needless couplings, and free up code to be more extensible. Look for unnecessary 3rd party library dependencies, code quality violations that compromise layering (presentation code invoking database or data access logic having UI formatting logic, etc.), and constantly eliminate duplicate boiler-plate code. Remember – continuous alignment and not perfection is the intent here.
  • Retrospective – when discussing what went well and what needs to improve – explicitly invest in learning the team’s challenges with systematic reuse. Listen for key signals from the dev team – do they have access to source code? how easy or difficult is the existing component library? what functional goal can be achieved faster if there was further investment in a reusable component?


Client Integration Mini-Checklist for Services

May 27, 2012

Working with clients who are consuming your services? Here is a mini-checklist of questions to ask:

  1. While executing request/reply on the service interface is there a timeout value set on the call?
  2. Is there code/logic to handle SOAP Faults /system exceptions when invoking the service?
  3. Is building service header separated from the payload? This will facilitate reuse across services that share common header parameters
  4. If there are certain error codes that the calling code can handle, is there logic for each of them?
  5. Is the physical end point information (URL string for HTTP, Queue connection and name for MQ/EMS) stored in an external configuration file?
  6. Is UTF-8 encoding used while sending XML requests to the service i.e. by making use of platform-specific UTF encoding objects?
  7. If using form-encoding are unsafe characters such as ‘&’, ‘+’, ‘@’ escaped using appropriate %xx (hexadecimal) values?
  8. While processing the service response is the logic for parsing/processing SOAP and service-specific headers decoupled from processing the business data elements?
  9. Is the entire request/reply operation – invocation and response handling logic – encapsulated into its own class or method call?
  10. While performing testing, is the appropriate testing environment URL/queue manager being used?
  11. Is a valid correlation id being used in the service request? This is very essential for aynchronous request/reply over JMS (JMS Header) or HTTP (callback handler)

%d bloggers like this: