Do you know your response times?

The complexity of modern property and casualty core systems can seem daunting. They deliver a great deal of business value, but that value comes at the cost of inherent complexity in the core system itself, as well as in the integration and communication between systems.

As a result, one of the critical aspects of software evaluation has nothing to do with functional capability, but instead with how well the application support team can keep the system performing at a high level.

I think it boils down to knowing the answers to two questions:

Are my systems stable and running smoothly right now?

If not, what’s causing the problem?

If you have the tools and knowledge to answer these questions quickly and reliably, you will sleep well at night.

Let’s start with the first one. An IT architect will usually jump straight to what I would call lower-level monitoring: CPU and memory usage, the frequency and duration of garbage-collection cycles, and similar infrastructure metrics. These metrics are widely used and definitely vital, but on their own they cannot always tell you how things are working.
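Metrics like these can be read directly from the JVM’s own management beans. Here is a minimal sketch, assuming a JVM-based core system (which the garbage-collection metrics above imply); the class name and output format are illustrative, not part of any product API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class LowLevelMetrics {
    // Snapshot current heap usage and cumulative GC activity using the
    // standard java.lang.management MXBeans available in every JVM.
    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long gcCount = 0, gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += gc.getCollectionCount();   // collections since JVM start
            gcTimeMs += gc.getCollectionTime();   // total time spent in GC, ms
        }
        // Note: getMax() may report -1 when the heap max is undefined.
        return String.format("heap=%d/%dMB gcCount=%d gcTime=%dms",
                heap.getUsed() >> 20, heap.getMax() >> 20, gcCount, gcTimeMs);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

In practice you would ship numbers like these to a monitoring system on a schedule rather than printing them, but the data source is the same.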

There is another way to look at this: through the eyes of your system users themselves. A user is typically a person performing business functions by locating the data they need, navigating through screens, and completing transactions or other tasks. And we can’t forget the automated integrations, where other systems complete similar actions through a variety of interfaces and mechanisms.

One of the first steps you can take toward truly understanding and measuring performance is to know the response times these users are experiencing. From a human’s perspective, this is the total round trip from the moment they click a button until the moment they see the result of their action. Poor or degrading response times are one of the first signs that something is wrong.
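Capturing that round trip can be as simple as wrapping each user-facing operation in a timer. A minimal sketch follows; the operation name and the logging are stand-ins for whatever your monitoring pipeline actually records, not any particular product’s API:

```java
import java.util.function.Supplier;

public class ResponseTimer {
    // Run an operation and record its wall-clock duration. In a real system
    // the duration would go to a metrics store instead of standard output.
    public static <T> T timed(String operation, Supplier<T> action) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(operation + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // "policy-search" is a hypothetical screen action for illustration.
        String result = timed("policy-search", () -> "42 policies found");
        System.out.println(result);
    }
}
```

The same wrapper works for integration endpoints, where the “user” is another system rather than a person.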

Once you know there is an issue, or think there might be one soon, you can quickly switch to profiling tools, which tell you, for a given operation, where the time is being spent. Other tools such as Oracle AWR reports and Microsoft SQL Server Dynamic Management Views (DMVs) can also be used to zoom in on database-specific problem areas. This combination will help you answer the second question.
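The core idea behind “where the time is being spent” can be illustrated with a toy per-phase timer. Real profilers report this with far more detail and no code changes; the class and phase names below are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseProfiler {
    // Records time spent in each named phase of a single operation and
    // reports each phase's share of the total -- a simplified view of
    // what a real profiler produces.
    private final Map<String, Long> phases = new LinkedHashMap<>();
    private long start = System.nanoTime();

    public void mark(String phase) {
        long now = System.nanoTime();
        phases.merge(phase, now - start, Long::sum);
        start = now;
    }

    public String report() {
        long total = phases.values().stream().mapToLong(Long::longValue).sum();
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : phases.entrySet()) {
            sb.append(String.format("%s: %.1f%%%n",
                    e.getKey(), 100.0 * e.getValue() / Math.max(total, 1)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseProfiler p = new PhaseProfiler();
        Thread.sleep(5);   // stand-in for rendering a screen
        p.mark("render");
        Thread.sleep(20);  // stand-in for an expensive database query
        p.mark("database");
        System.out.print(p.report());
    }
}
```

A breakdown like this is what lets you decide whether to look next at application code or at the database.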

Some of our customers have had great success taking a response-time approach to always-on monitoring.

One customer reported degraded performance but had no idea when it occurred or for which actions or data, let alone why. Local CPU utilization and garbage-collection logs did not indicate any problems with the infrastructure. We asked the customer to install our Response Time Monitor tool so that the unusually long response times could be measured. Those measurements showed that most of the long responses involved a custom transaction search screen that had been added to the configuration. At that point the Guidewire Profiler tool was used to capture what was happening during the operation: some very expensive database queries, which were traced to the custom code powering the search. With assistance from Guidewire, that code was refactored, and acceptable performance was restored.
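The step that pointed at the custom search screen was simply grouping response times by screen and ranking the groups. A sketch of that idea follows; the screen names, sample data, and percentile choice are all hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlowScreenFinder {
    // Given logged (screen, responseTimeMs) samples, return the screen with
    // the worst 95th-percentile response time -- the kind of grouping that
    // singled out the custom transaction search screen in the story above.
    public static String worstScreen(Map<String, List<Long>> samplesByScreen) {
        String worst = null;
        long worstP95 = -1;
        for (Map.Entry<String, List<Long>> e : samplesByScreen.entrySet()) {
            List<Long> sorted = new ArrayList<>(e.getValue());
            Collections.sort(sorted);
            // Nearest-rank p95: the value at ceil(0.95 * n), 1-indexed.
            long p95 = sorted.get((int) Math.ceil(0.95 * sorted.size()) - 1);
            if (p95 > worstP95) {
                worstP95 = p95;
                worst = e.getKey();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> samples = new HashMap<>();
        samples.put("PolicySummary", List.of(120L, 140L, 130L, 150L));
        samples.put("CustomTxnSearch", List.of(200L, 4800L, 5200L, 250L));
        System.out.println(worstScreen(samples)); // prints CustomTxnSearch
    }
}
```

Percentiles matter here: an average would have hidden the two multi-second outliers behind the fast responses.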

I’ll end by asking just one more question:

Do you know your response times?