Make Services Fault Tolerant & Supportable for Production Use

January 3, 2015

A lot of teams are building services for clients both internal and external to your organization. Typically, there is quite a bit of focus on succeeding from a functional sense – did we get the key requirements addressed? does it cover the plethora of rules across markets / jurisdictions? and so on. Some of the more experienced teams, consider the non-functional aspects as well – e.g. logging, auditing, exception handling, metrics, etc. and I talked about the value of service mediation for addressing these in an earlier post.

There is an expanded set of capabilities that are also necessary when addressing non-functional requirements – those that are very relevant specially when your service grows in popularity and usage. They fall under two categories: operational agility and fault-tolerance.  Here are a few candidate capabilities in both these categories:

Operational Agility / Supportability:

  • Ability to enable / disable both services and operations within a service
  • Ability to provision additional service instances based on demand (elastic scaling)
  • Maintenance APIs for managing resources (reset connection pool, individual connections, clear cached data, etc.)
  • Ability to view Service APIs that are breaching and ones that are the risk of breaching operational SLAs
  • Model and detect out of band behavior with respect to resource consumption, transaction volumes, usage trends during a time period etc.

Fault Tolerance:

  • Failing fast when there is no point executing operations partially
  • Ability to detect denial of service attacks from malicious clients
  • Ability to gracefully handle unexpected usage spikes via load shedding, re-balancing, deferring non-critical tasks, etc.
  • Detecting failures that impact individual operations as well as services as a whole
  • Dealing with unavailable downstream dependencies
  • Leveraging time outs when integrating with one or more dependencies
  • Automatically recovering from component failures

In future posts, I will expand on each of these topics covering both design and implementation strategies. It is also important to point out that both these aspects are heavily interconnected and influence each other.


Tips When Authoring Web Service Clients

November 3, 2013

Here are some tips when authoring web service clients:

  1. Decouple connectivity from request construction. This will isolate variations in input construction and the mechanics of service invocation cleanly separated. Additionally, the request construction might depend on the particular resource – e.g. they can be set of query string parameters or a more complex object structure.
  2. Connectivity logic should encapsulate the service URL and automatic-retry considerations. The client can automatically retry GET requests specified number of times if invocation encounters a connection timeout. It should also ensure response is OK (either via HTTP status codes or by examining appropriate response-specific data structures).
  3. Don’t swallow exceptions – the service might return a resource not found or an internal server error – the code that is using the client should be given the flexibility to deal with these exceptions appropriately – the client code shouldn’t assume or mask these exceptions. When in doubt, don’t suppress runtime exceptions.
  4. Decouple domain logic from service client – domain logic might dictate whether or not a service call needs to be made, or the nature of input resource data, etc. – this logic is more likely to change per the consuming application’s evolving requirements and shouldn’t be hosting service invocation code in the same class.
  5. Provide reusable API hooks for addressing cross-cutting concerns – such as response time capture and input and output messages – if you want to report response time trends when invoking a service you will not want to clutter this all over the consuming application’s codebase – the client can and should centralize these.

Remember the above is useful whether you are consuming a service or providing clients for your prospective service consumers.


Client Integration Mini-Checklist for Services

May 27, 2012

Working with clients who are consuming your services? Here is a mini-checklist of questions to ask:

  1. While executing request/reply on the service interface is there a timeout value set on the call?
  2. Is there code/logic to handle SOAP Faults /system exceptions when invoking the service?
  3. Is building service header separated from the payload? This will facilitate reuse across services that share common header parameters
  4. If there are certain error codes that the calling code can handle, is there logic for each of them?
  5. Is the physical end point information (URL string for HTTP, Queue connection and name for MQ/EMS) stored in an external configuration file?
  6. Is UTF-8 encoding used while sending XML requests to the service i.e. by making use of platform-specific UTF encoding objects?
  7. If using form-encoding are unsafe characters such as ‘&’, ‘+’, ‘@’ escaped using appropriate %xx (hexadecimal) values?
  8. While processing the service response is the logic for parsing/processing SOAP and service-specific headers decoupled from processing the business data elements?
  9. Is the entire request/reply operation – invocation and response handling logic – encapsulated into its own class or method call?
  10. While performing testing, is the appropriate testing environment URL/queue manager being used?
  11. Is a valid correlation id being used in the service request? This is very essential for aynchronous request/reply over JMS (JMS Header) or HTTP (callback handler)

Have a Reuse Strategy for Business Process Integrations

January 29, 2012

When implementing process automation initiatives, it is important to have a reuse strategy – why? Because, the process flows are a rich minefield for reusing services and common interfaces across a variety of use cases. It can also act as a service provider for other teams to invoke/integrate a common set of processing flows.

Host business process definitions and instances

  • Provide a modeling and execution environment for designing and implementing business processes
  • Implement a generic data structure for manipulating & orchestrating workflow state
  • Provide the ability to reuse a workflow patterns across business processes. E.g. enable reuse via sub-processes, process extension points, etc.
  • Provide the ability to access and orchestrate activities requiring interaction with data services and business rules, and legacy services

Act as services consumer & provider

  • Host process orchestrations, while consuming persistence, validation, and security services
  • Abstract legacy capabilities and reduce tight coupling between internal systems
  • Publish and consume business events to reduce application to application coupling

 

Evolve a reusable asset catalog

  • Ensure technology components and APIs have domain relevance – data, events, and relationships are fundamental abstractions need to be brought together
  • Reduce learning curve for application developers to identify, evaluate, and integrate process definitions and services from a library of reusable assets

Detect Service Availability Issues Before Your Clients Do

January 17, 2012

When service capabilities get reused across applications and processes, high availability becomes imperative – key question: do you detect availability issues before your clients do? This is important for several reasons:

  • Unlike stand alone applications/processes, shared services impact several consumers. Not every consumer might be okay with your service being unavailable for an extended period of time. The same service might be in the critical path for some and not so much for others
  • For some service capabilities, running them in a partial mode might be acceptable – e.g.  operating out of a cached copy of data rather than fetching it from a live database, or servicing only read only operations during an unexpected outage, etc.
  • Some consumers might have regulatory processes that are dependent on services being available – a service being unavailable might cause SLA breaches

Finally, consumer trust is key for systematic reuse – if they perceive service availability as a limiting factor, it will be harder to convince them to use services – including current and upcoming integrations


Governance Enables Service Reuse – New Podcast Episode

December 27, 2011
Want to listen using iTunes?

Got iTunes?

podcast

New episode added to the Software Reuse Podcast Series on service governance covering design, implementation, testing, and provisioning and how they enable reuse.

Like this post? Subscribe to RSS feed or get blog updates via email.


Track Service Reuse Metrics

December 24, 2011

Service driven systematic reuse takes conscious design decisions, governance, and disciplined execution – project after project. In order to sustain long running efforts such as service orientation, it is critical to track, report, and get buy-in from senior management in the organization. So what metrics are useful? Here are a few:

  • Total number of service operations reused in a time period
  • Total effort saved due to systematic reuse in a time period
  • Number of new service consumers in a time period
  • Number of new consumer integrations in a time period (this includes integrations from both new and existing consumer
  • Service integrations across transports/interface points (for instance, the service operation could be accessed SOAP over HTTP, or as SOAP over JMS, or REST, etc.)

What metrics do your teams track?


%d bloggers like this: