Why monitoring is hard

(and why your vendor will only sell you tools, not solutions)

Internet wiring

Intro

Monitoring infrastructure in a meaningful way is important to any IT operation, yet it is hard to get right. Many vendors address this problem and promise a silver bullet.

Infrastructure

Infrastructure supporting hundreds of thousands of users is the big vision of many organizations. Most will never have that many users, and those who do have grown them over years. So did their infrastructure, which makes most of the architecture discussions you'll find on the internet largely irrelevant. In these cases the architecture is many years old, having grown to support increasing requirements, and over time new solutions were added at short notice to solve whatever problem was most pressing.

Diversity

A typical three-tier infrastructure consists of plenty of products. Coming in from the internet, you'll find Cisco routers, F5 load balancers, Juniper firewalls, Checkpoint IPS, HP switches, Dell servers, Oracle databases and BEA application servers. Not to mention specialized session management software, business intelligence, or whatever other function-specific components are in place.

Having a product zoo of 30 brands from 15 vendors makes central management, let's say, challenging. Maintaining knowledge and experience for such a diverse platform poses a huge problem for any commercial organization. Shared resources make it possible to support multiple installations.

External Service Providers

Service providers are in a position to train product specialists who support one product in great depth, which solves the need for experience. Commercially, this is the way to go: sharing. Hardware, software, resources, experience. It allows more efficient use of scarce resources. However, having many parties involved raises the transaction cost for any decision to be made. Every vendor needs to provide feedback, share information and insight. Somebody else needs to manage the information flow and track that feedback, which adds to the bill. And the sum of available information, and not to neglect, opinion keeps growing.

Piles of information

Out of all these available sources, anything could be read. In practice, though, monitoring will be limited to the obviously meaningful metrics. Not everything that can be measured is useful for interpretation, and not every correlation is a causation. The human approach is to collect what can be understood, limiting the collected data to meaningful sources: the number of logged-in users, CPU or memory, DB transactions per second.
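
That "collect what can be understood" step can be surprisingly small. Below is a minimal sketch, assuming Python and the psutil library, that gathers only the handful of metrics mentioned above; the database transaction rate is deliberately left as a placeholder, because which counter provides it depends entirely on the product in question.

import time
import psutil  # third-party library for host-level metrics

def collect_meaningful_metrics():
    # Only the obviously meaningful values: sessions, CPU, memory.
    mem = psutil.virtual_memory()
    return {
        "timestamp": time.time(),
        "logged_in_users": len(psutil.users()),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": mem.percent,
        # DB transactions per second would come from the database's own
        # counters; that differs per vendor, so it is left out of this sketch.
    }

if __name__ == "__main__":
    print(collect_meaningful_metrics())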

and the Interpretation thereof

Any of this information may mean something different in its own context. UserToken/Second may not be very useful when Login/Second is just as meaningful. Memory usage has no meaning unless the total available is known. CPU load may be useless while memory is exhausted but goes unnoticed. Hits may be irrelevant while bandwidth is exceeded. Each individual system in the architecture may have yet other parameters that need attention.
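
To make that point about context concrete, here is a small sketch (plain Python, with made-up numbers) of what "interpretation" means in practice: a raw memory reading only becomes meaningful as a percentage of the total, and a login counter only becomes meaningful as a rate over time.

def memory_percent(used_bytes, total_bytes):
    # A raw usage figure means little without the total it is measured against.
    return 100.0 * used_bytes / total_bytes

def logins_per_second(logins_before, logins_after, seconds_elapsed):
    # A counter only turns into a rate once it is put in the context of time.
    return (logins_after - logins_before) / seconds_elapsed

if __name__ == "__main__":
    print(memory_percent(used_bytes=6.4e9, total_bytes=8e9))    # 80.0
    print(logins_per_second(1200, 1500, seconds_elapsed=60))    # 5.0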

Reality and outlook

Vendors pick up on this need and offer products to deal with this big data problem. Most prominent are Splunk, LogRhythm and LogLogic, but traditional database vendors like Oracle or SAP also offer products to address the challenge. What they do not offer is any means to deal with the diversity of products, metrics, vendors or service providers. The architecture and processes that let these products deliver their best value are still an investment the IT department has to make itself. In the end, they are only tools that can be used to build a better architecture, one that has transparency built in.