I often get asked this question, especially with regard to the best NMS or Application Management solution, but the answer is loaded because of three strategic issues in this space:
1. “Tool only” solutions are limited by the industry’s “security stance toward monitoring” (see http://bit.ly/mVuHSq). Thus, organizational structure, business processes, and knowledge management are required to complete the end-to-end solution.
2. The normal NMS industry cycle limits the actual evolution of solutions. As a result, despite more widgets, toys, and new jargon, very little progress has occurred in the last 20 years (see http://bit.ly/nnPvAt).
3. The problems with virtual cloud monitoring are akin to the problems with outsourcing NMS monitoring to Amazon, RackSpace, Savvis, etc. (I haven’t gotten around to writing this one up yet. 😎)
Beyond that, there are six flavors of vendor. Think of this problem as requiring a tool chest, not a hammer. The correct solution varies widely depending on whether you have a nail, a board, or both. The diversity is required because the strategic issues mentioned above prevent NMS from being treated as a “commodity” project, despite what Gartner, Forrester, and others are pushing. Anyway, here they are:
1. Large players: IBM Tivoli, CA, HP, BMC. Most Fortune 1000 companies buy into one of these, and they may also have another left over from a legacy environment under a previous VP. The one exception is ticketing, which is almost exclusively BMC Remedy.
2. Parasitic players: Monolith, Rivermuse, SevOne, LogMatrix, etc. These generally aim to price themselves within the maintenance fees of the bigger players. Some are very good, and they are generally more nimble. The cost is risk. Of course, this is hard to assess because the line between #1 and #2 is more of a spectrum; some, like SevOne, are getting quite big. Some of these, such as Spiceworks and Solarwinds, have open-source versions of their tools as they try to leverage the try-and-buy market or the services market.
3. Parts players: Mimic, Tavve, Bluestripe, Abilisoft, etc. These go after a piece of the big players’ frameworks. Their solutions are generally quite solid, since the scope is much smaller than that of the #2 players. The price is mostly education, and sometimes integration, depending on the niche they are going after.
4. Cloud players: CoolAlerts, Silverback, etc. This is the typical rent-vs-own play. The trade-off is less involvement in exchange for less control and awareness.
5. Open source players: OpenNMS, Nagios, etc. Many of these are good but share the limitations of any open-source project. Because of their smaller communities, they are less like Perl, Linux, or WordPress and more like Python or Drupal.
6. Root open source tools: NET-SNMP, nmap, etc. Ironically, almost all the big players use these technologies. However, they are building blocks and require a lot of work to make effective.
So, what’s the punchline? Are there any NMS / APM / ITSM / BSM / monitoring / event management products that you tolerate?
This article seems critical of, and pretty negative about, most of the options out there. What’s the message here?
OMG. Great question. I tend to forget I am in the weeds and skip over what is obscure to most but obvious in the trenches. The short answer is that “what is the right tool?” rests on the wrong assumptions. It is like asking which hammer will guarantee a good house at the end. The hammer is a minor point; in fact, it often isn’t the issue at all, since so many options are available.
The real problem is that our culture does not tolerate bad news; the latest spin is that positive thinking alone fixes everything. The cold fact is that 80% of these solutions fail within 1-3 years in the wild. This makes for an easy career path for executives who move on every 2-3 years, before the problems are recognized. However, for those of us who stay behind and fix the mess, it is a bit more challenging.
Changing the message to something more positive won’t fix the dynamics and actually prevents success. As I tell my teenager, the faster you accept and adapt to reality (which can be unfair), the better you will do, and in fact thrive. This is one of those cases. There are some key strategic approaches that help:
1. This is not a shrink-wrap, commodity business. A tool can no more build a solution than a hammer by itself can build a house. This means many tool sets are actually viable if implemented correctly. IBM Tivoli and CA are currently the products of choice among Fortune 1000 companies. OpenNMS and Nagios work as well, with tinkering. However, if you think the tool is the only hammer you need, you will have an 80% chance of failure, because you are not addressing the real problem: the end-to-end solution involving politics/stakeholders, organizational structure, business processes, and knowledge management.
2. Experience over academics. An 80% failure rate tells you that “thinking” your way out of the problem isn’t going to work. For every 10 solutions that seem good on paper, only 2 will work in the wild. So instead you cheat: find the 2 people who have already gone through that pain with the selected product, and approach the problem their way. The five biggest challenges here are:
A. Letting go of ownership along with the problem (e.g., 100% outsourcing, or moving to the cloud without proper controls).
B. Improper vetting of the vendor. You need to do your homework and actually talk to past customers. Ideally, get past the managers and talk to the engineers to see what they complain about. If they complain about the interface, that is good. If they complain that they cannot do their job, that is bad.
C. Not trusting the vendor you eventually select (e.g., grilling the vendor on requirements for 50% of the project).
D. Gravitating to what you know (e.g., burning 70% of the billable time on documentation and project management).
E. A non-iterative feedback approach. Failure should be expected, so plan multiple iterations and a flexible approach that leverages lessons learned along the way.
3. Keep the 80% figure in mind. If you know that 80% of these projects fail, you are less likely to take unnecessary risks and, at the same time, more willing to take risks where you need to. For example, you will not connect the monitoring to a “bus” when only ticketing is on the other side. However, you will honestly let the vendor know you are having trouble trusting them, and then work through those emotional issues to keep the project functioning as a team.
4. Do not boil the ocean. Everything is connected in this space, so it is easy to let the scope snowball. However, that always leads to failure. The best bet is to find out what the customers and stakeholders are most upset about and focus on a proper fix, keeping future projects in context. Usually the projects go in this order: monitoring, performance management/reporting, and then configuration/CMDB management.
This is a big topic and a big can of worms, but hopefully that gives a bit more context.
Under
it isn’t trendy
I agree with you 100%. You don’t mention EMC here; what are your thoughts on them and products like SMARTS (IONIX)?
Gillmer
As far as I know, I can count the successful EMC SMARTS installs on my two hands. “Successful” means SMARTS in production by itself for 3 years or more. Sites such as the USAF, where Netcool rides on top and NetIQ picks up some slack alongside, don’t count, because the SMARTS functionality is really being duplicated and shored up by those other products. Still, that earns SMARTS an honorable mention, because it is really, really difficult to get these things right. The industry is deceptive that way.
So what went wrong? The issue with EMC SMARTS goes back to early in its product life cycle. The life cycle of an NMS tool has two stages. First, a startup creates the product and innovates, but being young blood, its people often make the mistakes that experience would avoid. After some time, the product gets absorbed by a “borg”, in this case EMC. At that point, innovation (or at least useful innovation) comes to a grinding halt.
When I met with SMARTS executives back in 2000 to push BPM integration into the product, one thing was clear: you got nowhere in this company without a PhD. This is a problem. I grew up in an academic household as a professor’s son, and one thing I learned is that academics think they are intrinsically empirical, so they tend not to be. It is what you don’t suspect that kills you.
That is, academics are less grounded and drink heavily of their own Kool-Aid. For example, it is acceptable for a scientist to proclaim that vaccines are safe and everyone else is stupid, just as it is more acceptable to say Arabs tout guns than to disparage some other race. In the case of EMC SMARTS, this lack of introspection and humility led to even more reinvention of the wheel than normal, so along with the usual “bugs”, additional strategic mistakes were made. As a result, the platform is exceptionally complex for what it delivers, believes it walks on water (even more than CA or IBM), and follows the Microsoft model rather than the UNIX model, a death knell for the Fortune 500 in NMS. That is, you can get 80% of the way there with SMARTS, but in almost all cases an additional 10% is absolutely required before the install can be considered successful; hence successful installs follow what the USAF did.
WOW, again 100% spot on! Thanks for the inside insight into EMC; that makes absolute sense now. We have been using it for going on 3 years, and it is becoming more apparent that it is not even reaching the 80% mark, not without serious “help” from other vendors’ software…
Great blog!