Make Services Fault Tolerant & Supportable for Production Use

January 3, 2015

Many teams build services for clients both internal and external to their organization. Typically, the focus is on functional success: did we address the key requirements? Does the service cover the plethora of rules across markets and jurisdictions? Some of the more experienced teams consider the non-functional aspects as well (e.g. logging, auditing, exception handling, and metrics), and I talked about the value of service mediation for addressing these in an earlier post.

An expanded set of capabilities is also necessary when addressing non-functional requirements, and these become especially relevant as your service grows in popularity and usage. They fall under two categories: operational agility and fault tolerance. Here are a few candidate capabilities in both categories:

Operational Agility / Supportability:

  • Ability to enable / disable both services and operations within a service
  • Ability to provision additional service instances based on demand (elastic scaling)
  • Maintenance APIs for managing resources (reset connection pool, individual connections, clear cached data, etc.)
  • Ability to view service APIs that are breaching operational SLAs and ones that are at risk of breaching them
  • Model and detect out-of-band behavior with respect to resource consumption, transaction volumes, usage trends during a time period, etc.
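
As a rough illustration of the first capability above (enabling and disabling services and operations), here is a minimal sketch. The names `OperationToggles`, `OperationDisabledError`, and `invoke` are hypothetical, and the in-memory sets are an assumption made for brevity; a real deployment would likely keep this state in shared configuration.

```python
class OperationDisabledError(RuntimeError):
    """Raised when a caller invokes a service or operation that is switched off."""


class OperationToggles:
    """In-memory switchboard (assumption: real systems would use shared config)."""

    def __init__(self):
        self._disabled_services = set()
        self._disabled_operations = set()

    def disable(self, service, operation=None):
        # No operation given means the whole service is switched off.
        if operation is None:
            self._disabled_services.add(service)
        else:
            self._disabled_operations.add((service, operation))

    def enable(self, service, operation=None):
        if operation is None:
            self._disabled_services.discard(service)
        else:
            self._disabled_operations.discard((service, operation))

    def is_enabled(self, service, operation):
        return (service not in self._disabled_services
                and (service, operation) not in self._disabled_operations)


def invoke(toggles, service, operation, handler, *args, **kwargs):
    """Check the toggle first so that a disabled call never starts executing."""
    if not toggles.is_enabled(service, operation):
        raise OperationDisabledError(f"{service}.{operation} is disabled")
    return handler(*args, **kwargs)
```

Support staff could then call `toggles.disable("pricing", "quote")` to switch off a single operation, or `toggles.disable("pricing")` to switch off the whole service, without a redeployment.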

Fault Tolerance:

  • Failing fast when there is no point executing operations partially
  • Ability to detect denial of service attacks from malicious clients
  • Ability to gracefully handle unexpected usage spikes via load shedding, re-balancing, deferring non-critical tasks, etc.
  • Detecting failures that impact individual operations as well as services as a whole
  • Dealing with unavailable downstream dependencies
  • Leveraging timeouts when integrating with one or more dependencies
  • Automatically recovering from component failures
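
One common way to combine failing fast with handling an unavailable downstream dependency is a circuit breaker. The sketch below is illustrative only: the class names, the failure threshold, and the cool-down period are assumptions rather than anything prescribed in this post, and the clock is injectable purely so the behavior can be tested without waiting.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_seconds:
                # Fail fast: do not touch the dependency during the cool-down.
                raise CircuitOpenError("dependency marked unavailable")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = operation(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each call to a dependency in `breaker.call(...)` means that repeated failures stop further attempts for a while, and the first success after the cool-down restores normal operation automatically.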

In future posts, I will expand on each of these topics covering both design and implementation strategies. It is also important to point out that both these aspects are heavily interconnected and influence each other.


Tips When Authoring Web Service Clients

November 3, 2013

Here are some tips when authoring web service clients:

  1. Decouple connectivity from request construction. This keeps variations in input construction cleanly separated from the mechanics of service invocation. Additionally, the request construction might depend on the particular resource – e.g. it can be a set of query string parameters or a more complex object structure.
  2. Connectivity logic should encapsulate the service URL and automatic-retry considerations. The client can automatically retry GET requests a specified number of times if an invocation encounters a connection timeout. It should also ensure the response is OK (either via HTTP status codes or by examining appropriate response-specific data structures).
  3. Don’t swallow exceptions. The service might return a resource not found or an internal server error, and the code that is using the client should be given the flexibility to deal with these exceptions appropriately. The client code shouldn’t assume or mask them. When in doubt, don’t suppress runtime exceptions.
  4. Decouple domain logic from the service client. Domain logic might dictate whether or not a service call needs to be made, or the nature of the input resource data. This logic is more likely to change as the consuming application’s requirements evolve, so it shouldn’t live in the same class as the service invocation code.
  5. Provide reusable API hooks for addressing cross-cutting concerns such as capturing response times and logging input and output messages. If you want to report response time trends when invoking a service, you will not want to clutter this all over the consuming application’s codebase. The client can and should centralize these.
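
To make tips 1, 2, 3, and 5 a little more concrete, here is a minimal sketch. Everything in it is hypothetical: the `Request`, `ServiceClient`, and `ServiceError` names, the `/customers` resource, and the `transport` callable, which is assumed to take `(method, url)`, return `(status, body)`, and raise `TimeoutError` on a connection timeout.

```python
import time
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Request:
    method: str
    path: str
    params: dict = field(default_factory=dict)


class ServiceError(Exception):
    """Surfaced to the caller rather than swallowed (tip 3)."""

    def __init__(self, status, body):
        super().__init__(f"service returned HTTP {status}")
        self.status = status
        self.body = body


def build_customer_lookup(customer_id):
    # Request construction only; it knows nothing about connectivity (tip 1).
    return Request("GET", "/customers", {"id": customer_id})


class ServiceClient:
    def __init__(self, base_url, transport, max_get_retries=2, on_complete=None):
        self.base_url = base_url
        self.transport = transport      # hypothetical callable: (method, url) -> (status, body)
        self.max_get_retries = max_get_retries
        self.on_complete = on_complete  # hook for cross-cutting concerns (tip 5)

    def send(self, request):
        url = self.base_url + request.path
        if request.params:
            url += "?" + urlencode(request.params)
        # Only GET requests are retried automatically (tip 2).
        attempts = 1 + (self.max_get_retries if request.method == "GET" else 0)
        started = time.perf_counter()
        for attempt in range(1, attempts + 1):
            try:
                status, body = self.transport(request.method, url)
                break
            except TimeoutError:
                if attempt == attempts:
                    raise  # retries exhausted: let the caller decide what to do
        elapsed = time.perf_counter() - started
        if self.on_complete is not None:
            self.on_complete(request, status, elapsed)
        if not 200 <= status < 300:
            raise ServiceError(status, body)
        return body
```

The consuming application's domain logic would decide whether to call `build_customer_lookup` at all (tip 4), while response time reporting stays in one place through the `on_complete` hook.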

Remember, the above is useful whether you are consuming a service or providing clients for your prospective service consumers.


Client Integration Mini-Checklist for Services

May 27, 2012

Working with clients who are consuming your services? Here is a mini-checklist of questions to ask:

  1. While executing request/reply on the service interface is there a timeout value set on the call?
  2. Is there code/logic to handle SOAP faults and system exceptions when invoking the service?
  3. Is building the service header separated from building the payload? This will facilitate reuse across services that share common header parameters.
  4. If there are certain error codes that the calling code can handle, is there logic for each of them?
  5. Is the physical end point information (URL string for HTTP, Queue connection and name for MQ/EMS) stored in an external configuration file?
  6. Is UTF-8 encoding used while sending XML requests to the service, i.e. by making use of platform-specific UTF encoding objects?
  7. If using form-encoding, are unsafe characters such as ‘&’, ‘+’, ‘@’ escaped using appropriate %xx (hexadecimal) values?
  8. While processing the service response is the logic for parsing/processing SOAP and service-specific headers decoupled from processing the business data elements?
  9. Is the entire request/reply operation – invocation and response handling logic – encapsulated into its own class or method call?
  10. While performing testing, is the appropriate testing environment URL/queue manager being used?
  11. Is a valid correlation id being used in the service request? This is essential for asynchronous request/reply over JMS (JMS Header) or HTTP (callback handler).
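
Items 1, 6, and 7 are easy to get wrong, so here is a small sketch of what they can look like using only the Python standard library. The helper names, the sample field values, and the 5-second default timeout are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import urlencode


def encode_form(fields):
    # urlencode percent-escapes unsafe characters such as '&', '+', and '@' (item 7).
    return urlencode(fields)


def encode_xml(xml_text):
    # Explicitly encode the XML request as UTF-8 bytes before sending (item 6).
    return xml_text.encode("utf-8")


def post_xml(url, xml_text, timeout_seconds=5.0):
    # Always set a timeout on the request/reply call (item 1).
    request = urllib.request.Request(
        url,
        data=encode_xml(xml_text),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8")
```

For example, `encode_form({"email": "a+b@example.com", "query": "R&D"})` returns `email=a%2Bb%40example.com&query=R%26D`. The URL passed to `post_xml` would come from external configuration, per item 5.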

Have a Reuse Strategy for Business Process Integrations

January 29, 2012

When implementing process automation initiatives, it is important to have a reuse strategy. Why? Because process flows are a rich mine of opportunities for reusing services and common interfaces across a variety of use cases. The process integration layer can also act as a service provider, letting other teams invoke and integrate a common set of processing flows.

Host business process definitions and instances

  • Provide a modeling and execution environment for designing and implementing business processes
  • Implement a generic data structure for manipulating & orchestrating workflow state
  • Provide the ability to reuse workflow patterns across business processes, e.g. enable reuse via sub-processes, process extension points, etc.
  • Provide the ability to access and orchestrate activities requiring interaction with data services, business rules, and legacy services

Act as services consumer & provider

  • Host process orchestrations, while consuming persistence, validation, and security services
  • Abstract legacy capabilities and reduce tight coupling between internal systems
  • Publish and consume business events to reduce application-to-application coupling

Evolve a reusable asset catalog

  • Ensure technology components and APIs have domain relevance – data, events, and relationships are fundamental abstractions that need to be brought together
  • Reduce learning curve for application developers to identify, evaluate, and integrate process definitions and services from a library of reusable assets

Detect Service Availability Issues Before Your Clients Do

January 17, 2012

When service capabilities get reused across applications and processes, high availability becomes imperative – key question: do you detect availability issues before your clients do? This is important for several reasons:

  • Unlike standalone applications/processes, shared services impact several consumers. Not every consumer might be okay with your service being unavailable for an extended period of time. The same service might be in the critical path for some and not so much for others
  • For some service capabilities, running them in a partial mode might be acceptable – e.g. operating out of a cached copy of data rather than fetching it from a live database, or servicing only read-only operations during an unexpected outage, etc.
  • Some consumers might have regulatory processes that are dependent on services being available – a service being unavailable might cause SLA breaches

Finally, consumer trust is key for systematic reuse. If consumers perceive service availability as a limiting factor, it will be harder to convince them to use your services, for both current and upcoming integrations.


Governance Enables Service Reuse – New Podcast Episode

December 27, 2011

A new episode has been added to the Software Reuse Podcast Series. It covers service governance across design, implementation, testing, and provisioning, and how these enable reuse.


Track Service Reuse Metrics

December 24, 2011

Service driven systematic reuse takes conscious design decisions, governance, and disciplined execution – project after project. In order to sustain long running efforts such as service orientation, it is critical to track, report, and get buy-in from senior management in the organization. So what metrics are useful? Here are a few:

  • Total number of service operations reused in a time period
  • Total effort saved due to systematic reuse in a time period
  • Number of new service consumers in a time period
  • Number of new consumer integrations in a time period (this includes integrations from both new and existing consumers)
  • Service integrations across transports/interface points (for instance, the service operation could be accessed as SOAP over HTTP, as SOAP over JMS, as REST, etc.)
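
If integrations are recorded somewhere (even in a spreadsheet), several of these metrics can be computed mechanically. The sketch below assumes a hypothetical `Integration` record, and it assumes that an operation counts as reused in a period when at least one integration with it was added in that period; your definition may differ. Effort saved is left out because it depends on estimates rather than counts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Integration:
    consumer: str
    operation: str
    transport: str  # e.g. "SOAP/HTTP", "SOAP/JMS", "REST"
    integrated_on: date


def reuse_metrics(integrations, period_start, period_end):
    in_period = [i for i in integrations
                 if period_start <= i.integrated_on <= period_end]
    # A consumer is new if it had no integration before the period began.
    earlier_consumers = {i.consumer for i in integrations
                         if i.integrated_on < period_start}
    new_consumers = {i.consumer for i in in_period} - earlier_consumers
    by_transport = {}
    for i in in_period:
        by_transport[i.transport] = by_transport.get(i.transport, 0) + 1
    return {
        "operations_reused": len({i.operation for i in in_period}),
        "new_consumers": len(new_consumers),
        "new_integrations": len(in_period),
        "integrations_by_transport": by_transport,
    }
```

Running this per quarter and charting the results gives senior management a simple trend line to review.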

What metrics do your teams track?


5 Service Governance Practices for Effective Reuse

December 24, 2011

Pursuing service based systematic reuse or business process development? Then these five practices will help your teams achieve an increased level of service reuse.

  1. Manage a common set of domain objects that are leveraged across service capabilities. This could be a library of objects (e.g. Plain Old Java Objects) or XML Schema definitions or both. Depending on the number of service consumers and the complexity in the domain, there will be need for supporting multiple concurrent versions of these objects.
  2. Provide common utilities not only for service development but also for WSDL generation, integration testing, and performance testing, and ensure interoperability issues are addressed.
  3. Ensure that appropriate functional experts drive the service’s requirements and that common capabilities across business processes are identified early in the software development lifecycle.
  4. Document and communicate governance model guidelines clearly. For example, there is a class of changes that can be made to a public interface such as a WSDL that don’t impact existing service clients, and there are some that do.
  5. Performance testing needs to be done not only during development but also during service provisioning, i.e. when integrating a new service consumer. If your teams aren’t careful, one heavy volume consumer can overwhelm a service, impacting both new and existing consumers. Execute performance testing in an automated fashion every time you integrate with a new client to reduce the risk of breaching required SLAs.

What additional practices do your teams follow?


10 Signs Services Are Accumulating Technical Debt

November 6, 2011

Your teams are busy building services and service enabled processes – great! – how do you know if these services are built at the appropriate level of quality? Here are ten signs that your services might be accumulating technical debt:

  1. Service contracts are modeled for a specific consumer and/or expose technical implementation details (e.g. service interfaces that force the client to set ‘default’ values on legacy system attributes).
  2. New clients are integrated to services without doing performance testing – this increases the likelihood of a sudden spike in volume and consequently the risk of breaching SLAs.
  3. Each service is implemented using an ad-hoc set of technologies, design patterns, and idioms – if you are starting to see the same functionality over and over being implemented across modules that’s a sure sign!
  4. Service dependencies are not captured and managed – each service uses a rat’s nest of dependencies causing classpath conflicts and maintenance burden when updating versions.
  5. Deployments are manual – binaries and configurations are assembled and made available via manual steps – automated deployment scripts either don’t exist or they are out of date
  6. Exceptions are not handled consistently – depending on the nature of the exception your service might need manual support intervention, adjustment to resources, and/or targeted alerts.
  7. Services are not reusing business object definitions and introduce redundant definitions instead
  8. WSDLs don’t import schemas and instead define them in-line – this might be easier to implement to start with but will cause a maintenance burden over time.
  9. Context information is not shared when implementing service to service interactions – as more reuse happens across services, it becomes essential to share context data among them. It will make authorization, logging, and integration much simpler
  10. Service business logic is in end point classes and not encapsulated well – if your service endpoints contain any logic beyond data transformation, question it to make sure that it really belongs there. Don’t implement validation rules, defaulting logic, or complex domain rules in them

In an upcoming post, I will elaborate on each of these with concrete examples. Are there other signs you can think of?


Enabling Self-Service for Easier Service Integrations

May 31, 2011

There are a number of benefits to exposing services and enabling application to application integration through them. However, when the number of services increases, the support and integration effort involved goes up as well. Moving to a self-service model for service integrations makes them simpler and faster for clients, and cheaper and more reliable for the service provider. Here are a few tips to make this happen:

  • Ensure all service interfaces are URL accessible – e.g. WSDL, associated XSD schemas etc. are hosted in a consistent structure that tools like WSDL2Java or wsdl.exe can point to and generate client side code
  • Templatize the WSDL and XSD to capture client-specific information such as application identifier and name (e.g. if your service interface mandates these values, you can default them to use the appropriate client-specific value).
  • Capture authentication and authorization requirements and, as with the previous point, generate code to set the appropriate message headers (e.g. SOAP Header or JMS Header) – this code will be common to most clients, and only the exact credentials will vary
  • Keep documentation up to date as part of the build and deployment process – e.g. if you are deploying services to a client facing environment (e.g. integration testing), deploy documentation as part of the exercise. For instance, you can generate javadoc style XML schema documentation and leverage the Maven site-deploy target to upload the generated artifacts to a web application.
  • Capture and maintain service metadata – this metadata should cover not only functional aspects but non-functional attributes as well. For instance, if your service cannot honor a response time requirement, it is critical to know that during the integration effort and not after the client has migrated their code to a production environment. This could translate to a service enhancement or an alternate integration approach altogether.
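
As a sketch of how captured non-functional metadata could be put to work during self-service onboarding, the check below compares a prospective client's needs against what the service publishes. The `ServiceMetadata` fields (95th percentile response time and maximum sustained throughput) and the `integration_gaps` helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceMetadata:
    name: str
    p95_response_ms: int          # published 95th percentile response time
    max_requests_per_second: int  # published sustained throughput


def integration_gaps(metadata, required_response_ms, expected_requests_per_second):
    """Return a list of mismatches; an empty list means no known gap."""
    gaps = []
    if metadata.p95_response_ms > required_response_ms:
        gaps.append(
            f"{metadata.name}: client needs {required_response_ms} ms, "
            f"service publishes {metadata.p95_response_ms} ms at p95"
        )
    if expected_requests_per_second > metadata.max_requests_per_second:
        gaps.append(
            f"{metadata.name}: client expects {expected_requests_per_second} req/s, "
            f"service supports {metadata.max_requests_per_second} req/s"
        )
    return gaps
```

Surfacing a non-empty result during onboarding is what turns a production surprise into an early conversation about a service enhancement or a different integration approach.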

This is by no means an exhaustive list, but it is meant to give a window into the opportunities and benefits that self-service can provide.

