So you want to write a poller…

Posted in Fault Mgmt & Monitoring on January 17th, 2013

INTRODUCTION

So you want to write a poller… my first word of advice: don’t. At least, don’t if this is anything more than an academic exercise. A quick Google search for “poller” plus the programming language of your choice will turn up a dozen fully functioning examples.

However, if you decide not to listen to me and your ADHD (80% of us in this industry have it) isn’t blocking you from reading further, then these are the major considerations. After you have reviewed them, you can move on to the more tactical article “Poller? Show me the code..”, which discusses and provides the code for a simple asynchronous SNMP poller: http://bit.ly/109lasJ

1. Modular & Scoped: A poller is a slippery slope because it naturally bleeds into other things. Discovery mechanisms, both sweeping the network and fingerprinting devices, are often coded into the poller when they really should be abstracted out. Further, separate pollers are often built for networks, systems, and applications, and again for fault, performance, and configuration data. All of these can use the same mechanism as long as the determination of “meaning” is abstracted out. Finally, each protocol, from ICMP to SNMP to SSH, often gets its own poller. Again, not necessary. These divisions not only add overhead, they also make it harder to correlate and orchestrate a holistic solution. So what does this translate to exactly? Create a poller that feeds a memory-resident database (or the equivalent) where outside clients request what to poll, and the poller does its best to determine what gets polled, by whom, and when. Any sort of state-machine logic beyond filtering out duplicate requests is left to the elements above the poller.
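A minimal sketch of the “memory-resident table that filters duplicate requests” idea, in Python. `PollRegistry` is a hypothetical class keyed on (target, protocol, item), and the shortest-interval-wins rule is an illustrative choice, not taken from any particular product. The registry only deduplicates and schedules; deciding what a result means stays with the clients above it.

```python
from collections import defaultdict


class PollRegistry:
    """Hypothetical in-memory table of what to poll, fed by outside clients.

    Identical requests from different clients collapse into one poll, and the
    shortest requested interval wins. Interpreting results is left to clients.
    """

    def __init__(self):
        # (target, protocol, item) -> {client_id: requested interval in seconds}
        self._requests = defaultdict(dict)

    def request(self, client, target, protocol, item, interval_s):
        self._requests[(target, protocol, item)][client] = interval_s

    def withdraw(self, client, target, protocol, item):
        key = (target, protocol, item)
        clients = self._requests.get(key)
        if clients is not None:
            clients.pop(client, None)
            if not clients:                 # nobody wants it any more: stop polling it
                del self._requests[key]

    def schedule(self):
        """Deduplicated work list: (key, effective interval, sorted subscribers)."""
        return [(key, min(c.values()), sorted(c))
                for key, c in sorted(self._requests.items())]
```

Withdrawing a client recomputes the effective interval from whoever is still subscribed, so a fast poll requested by one departing client does not linger.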

2. Asynchronous as low as you can go: One way to view the line between user space and the kernel is synchronicity. No, not in the context of the Police’s last album. This is in the context of programming in a predictable step 1, step 2, etc. fashion versus the kernel reacting to reality, i.e. a user striking a key, a packet hitting the network interface, etc. When polling, almost all of the time is wasted waiting around, so you need to push that line as far down toward the kernel as possible and make everything non-blocking. This means that using threads is nothing short of a cop-out. Sure, it is easier to write a synchronous program to handle poller connections and leave the asynchronous handling to the language runtime, but it isn’t efficient. What is required is opening the sockets yourself (raw sockets where the protocol demands it, as with ICMP), making them non-blocking, and multiplexing the I/O with select, poll, epoll, kqueue, or the like. Most solid languages can handle this, from Perl to C to Python. Java and .NET can likely handle it too, but there are issues there. Finally, just because a protocol like TCP, or HTTP over TCP, is synchronous, it doesn’t mean it has to be handled that way. Each stage should be handled asynchronously so that it does not tie up a process or socket. Obviously this is a bit more work, but you did decide to write a poller yourself.
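As a sketch of the non-blocking, multiplexed style described above, here is a hypothetical `poll_udp` helper built only on Python’s standard `socket` and `selectors` modules: one non-blocking UDP socket, one datagram per target, and a `select()` call that sleeps in the kernel until a reply arrives or the earliest deadline passes. It assumes targets are given as (IPv4 address, port) tuples so replies can be matched by source address, and for brevity it ignores a full send buffer.

```python
import selectors
import socket
import time


def poll_udp(targets, payload, timeout=1.0):
    """Send `payload` to every (ip, port) target over one non-blocking UDP socket.

    Returns {target: reply_bytes or None}. Each target gets its own deadline,
    measured from the moment its datagram was sent.
    """
    sel = selectors.DefaultSelector()      # epoll/kqueue/poll/select, whichever the OS offers
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ)

    results = {t: None for t in targets}
    deadlines = {}
    try:
        for target in targets:
            sock.sendto(payload, target)   # returns immediately; no waiting for a reply
            deadlines[target] = time.monotonic() + timeout

        while deadlines:
            # Sleep in the kernel only until the earliest outstanding deadline.
            wait = max(0.0, min(deadlines.values()) - time.monotonic())
            if sel.select(wait):
                while True:                # drain every datagram that has arrived
                    try:
                        data, addr = sock.recvfrom(65535)
                    except BlockingIOError:
                        break
                    except ConnectionError:
                        continue           # ICMP error surfaced by some OSes; not tied to a target
                    if addr in deadlines:  # ignore late or unsolicited replies
                        results[addr] = data
                        del deadlines[addr]
            now = time.monotonic()
            for target in [t for t, d in deadlines.items() if d <= now]:
                del deadlines[target]      # timed out: result stays None
    finally:
        sel.unregister(sock)
        sel.close()
        sock.close()
    return results
```

No thread is parked per target; the only blocking point is the single `select()` call, which is exactly the line the paragraph wants pushed down toward the kernel.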

3. Code reuse: Make sure to leverage existing code. NET-SNMP, fping, and Cacti all have good C-based asynchronous pollers. NET-SNMP and looperng have good examples in Perl. Finally, Python is strangely rich with examples, from asynchronous web clients to the Medusa poller. Another area that should definitely be explored is the message queue approach, especially since it enables distributed polling locations and collectors. ZeroMQ (also written 0MQ or zmq) and RabbitMQ are both good starting points.
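To illustrate the fan-out/fan-in shape that a message queue gives distributed polling (in ZeroMQ terms, roughly a PUSH/PULL pipeline), the sketch below uses Python’s in-process `queue.Queue` as a stand-in for the broker. `run_distributed` and the `poll_fn` callback are hypothetical, and the threads merely stand in for remote polling locations; in a real deployment each location’s queue and the shared results queue would be ZeroMQ sockets or RabbitMQ queues spanning machines.

```python
import queue
import threading


def run_distributed(jobs, locations, poll_fn):
    """Fan jobs out to per-location work queues and fan results back in.

    `jobs` is a list of (location, target) pairs and `poll_fn(location, target)`
    is a hypothetical probe. queue.Queue stands in for a real message broker.
    """
    work = {loc: queue.Queue() for loc in locations}   # one queue per polling location
    results = queue.Queue()                            # shared collector queue
    stop = object()                                    # sentinel telling a location to finish

    def location_worker(loc):
        while True:
            target = work[loc].get()
            if target is stop:
                break
            results.put((loc, target, poll_fn(loc, target)))

    workers = [threading.Thread(target=location_worker, args=(loc,)) for loc in locations]
    for w in workers:
        w.start()
    for loc, target in jobs:          # dispatcher: route each job to its polling location
        work[loc].put(target)
    for loc in locations:
        work[loc].put(stop)
    for w in workers:
        w.join()

    collected = []                    # collector: drain everything the locations reported
    while not results.empty():
        collected.append(results.get())
    return collected
```

The dispatcher never talks to a target directly; it only routes work, which is what lets polling locations live next to the networks they poll.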

4. Right API and features: The API matters most, because the poller sits at the bottom of the NMS stack and any change to its interface will require a code overhaul further up. This means planning for distributed polling and collection, and possibly collection from multiple points. Also allow for alternate protocol bindings, as the engine should handle everything from ICMP to SNMP to SSH and HTTPS. Likewise, handle alternate destination bindings such as ports as well as hosts and IP addresses; this allows the solution to morph into system and application monitoring. In addition, security credentials and encodings should be considered within the API, as well as “hooks” for unanticipated features. (This is similar to the ioctl hook at the OS level in UNIX SVR4-based STREAMS.)
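One way to keep the API stable while adding protocols is a request record plus a protocol-binding table. The sketch below is a hypothetical Python shape for such an API (`PollRequest`, `register_protocol`, and `dispatch` are illustrative names): the destination carries an optional port, credentials are passed by reference rather than by value, and an open-ended `options` mapping plays the role of the ioctl-style hook. The `echo` handler is a placeholder for real ICMP, SNMP, SSH, or HTTPS code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class PollRequest:
    protocol: str                          # protocol binding: "icmp", "snmp", "ssh", "https", ...
    address: str                           # host name or IP address
    port: Optional[int] = None             # alternate destination binding; None = protocol default
    credential_ref: Optional[str] = None   # reference into a credential store, never the secret
    encoding: str = "utf-8"
    options: Dict[str, Any] = field(default_factory=dict)  # hook for unanticipated features


_HANDLERS: Dict[str, Callable[[PollRequest], Any]] = {}


def register_protocol(name):
    """Decorator that binds a handler to a protocol name without changing the API."""
    def wrap(fn):
        _HANDLERS[name] = fn
        return fn
    return wrap


def dispatch(req: PollRequest):
    """Route a request to whatever handler is bound to its protocol."""
    handler = _HANDLERS.get(req.protocol)
    if handler is None:
        raise ValueError(f"no handler bound for protocol {req.protocol!r}")
    return handler(req)


@register_protocol("echo")
def _echo_handler(req):
    # Placeholder handler: a real one would speak ICMP, SNMP, SSH, HTTPS, etc.
    return (req.protocol, req.address, req.port, req.options.get("tag"))
```

New protocols arrive as new registrations, and new features arrive as new `options` keys, so callers written against `PollRequest` do not need to change.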

5. Asynchronous collection: The usual approach is to spew out the packets, wait a while, and then mark any still-outstanding packets as “dead”. Obviously this limits both the number of packets that can be in flight and the timeout options for individual packets. Further, it creates spikes on the network that can “fry” WAN links and other low-bandwidth connections. Instead, collect responses as they arrive, give each request its own timeout, and pace the sends.
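A minimal sketch of paced sending with per-request deadlines, assuming the caller supplies the clock (`time.monotonic()` in practice, plain numbers in a test). `PacedCollector` is a hypothetical name; it releases at most one probe per `1 / rate_per_s` seconds and keeps a min-heap of deadlines so each request expires on its own schedule instead of with the whole batch.

```python
import heapq
from collections import deque


class PacedCollector:
    """Pace outgoing probes and expire each one on its own deadline.

    Times are plain floats in seconds supplied by the caller, so the same
    logic runs against time.monotonic() or a fake clock in tests.
    """

    def __init__(self, targets, rate_per_s, timeout_s):
        self.pending = deque(targets)      # not yet sent
        self.interval = 1.0 / rate_per_s   # minimum gap between two sends
        self.timeout = timeout_s
        self.next_send = float("-inf")     # first probe may go out immediately
        self.deadlines = []                # min-heap of (deadline, target)
        self.outstanding = set()
        self.answered = []
        self.dead = []

    def next_to_send(self, now):
        """Return the next target if the rate limit allows a send at `now`, else None."""
        if not self.pending or now < self.next_send:
            return None
        target = self.pending.popleft()
        self.outstanding.add(target)
        heapq.heappush(self.deadlines, (now + self.timeout, target))
        self.next_send = now + self.interval   # no catch-up burst after a late call
        return target

    def on_reply(self, target):
        """Record a reply; late or duplicate replies are ignored."""
        if target in self.outstanding:
            self.outstanding.discard(target)
            self.answered.append(target)

    def expire(self, now):
        """Mark as dead only the requests whose own deadline has passed."""
        while self.deadlines and self.deadlines[0][0] <= now:
            _, target = heapq.heappop(self.deadlines)
            if target in self.outstanding:   # entries already answered are skipped
                self.outstanding.discard(target)
                self.dead.append(target)

    def done(self):
        return not self.pending and not self.outstanding
```

In a real poller these calls would be made each time the I/O multiplexer wakes up, so the send rate stays flat and a slow device only ever costs its own timeout.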

6. Brittle DNS: One of the most common failures of even advanced pollers is their reliance on DNS. DNS resolution is usually outside the control of the NMS/APM/SMS groups. Further, lookups are typically synchronous, and if the resolvers are not set up correctly, the poller can sit and spin for a minute on a handful of devices that lack proper DNS resolution. As such, it is critical to have at least the option to make DNS brittle (fail fast rather than wait), and to engineer the rest of the monitoring application to rely on something other than the host name.
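Because the C resolver call behind `socket.gethostbyname` blocks, this sketch pushes lookups into a small thread pool (which is also how asyncio’s `loop.getaddrinfo` works) and enforces one shared deadline for the whole batch. `resolve_brittle` and its `resolver` parameter are hypothetical; the parameter exists only so the behavior can be exercised without real DNS. Failed or slow names come back as None, which fits the advice to key everything else on the address rather than the name.

```python
import concurrent.futures
import ipaddress
import socket


def resolve_brittle(names, timeout_s=2.0, resolver=socket.gethostbyname):
    """Resolve names concurrently and give up on anything slower than timeout_s.

    IP literals pass straight through without touching DNS. Names that fail or
    time out map to None, so the poller can skip them instead of stalling.
    """
    results = {}
    lookups = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=16)
    try:
        for name in names:
            try:
                ipaddress.ip_address(name)
                results[name] = name                 # already an address: no DNS traffic
            except ValueError:
                lookups[name] = pool.submit(resolver, name)
        # One shared deadline for the whole batch, not one per name.
        done, _ = concurrent.futures.wait(list(lookups.values()), timeout=timeout_s)
        for name, fut in lookups.items():
            ok = fut in done and fut.exception() is None
            results[name] = fut.result() if ok else None
    finally:
        # Don't wait for lookups still stuck in the resolver; they finish in the background.
        pool.shutdown(wait=False)
    return results
```

Resolving once, up front, and then polling by address keeps DNS out of the polling loop entirely.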

7. IP Plumbing: Service providers and most large customers are aggregates; that is, they have multiple overlapping private IP ranges. As such, the ability to poll locally and encapsulate the results back to the central database becomes critical. Folks often use NAT, but that gets complicated when alerting the end customer, because the customer will not recognize the NATed addresses, so a reverse mapping for notifications becomes necessary. The easiest method is therefore to treat the IP address as data, not control data (i.e. not as a unique key), when piping it back to the overall database used for display.

A second area in IP Plumbing is filters and firewalls. This is always an issue with NMS, as monitoring is diametrically opposed to the efforts of security. As such, the poller has to be adaptable, at least per node and protocol, to change ports as required. Sometimes protocol encapsulation, encoding, and translation, along with a few proxy agents, are also required to get the job done.
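A minimal sketch of treating the IP address as data for the overlapping-range problem in item 7, assuming a hypothetical JSON envelope sent from each local poller to the central collector. The `site` label (a polling location or customer domain) and the `encapsulate`/`ingest` names are illustrative; the point is that the central store keys rows on (site, IP), so two customers’ 10.0.0.1 never collide and no NAT is needed.

```python
import json


def encapsulate(site, results):
    """Local poller side: wrap locally keyed results with the site they came from."""
    return json.dumps({"site": site, "results": results}).encode("utf-8")


def ingest(db, message):
    """Central collector side: key every row on (site, ip), never on ip alone."""
    envelope = json.loads(message)
    for ip, value in envelope["results"].items():
        db[(envelope["site"], ip)] = value
    return db
```

Because the customer’s own address travels untouched, notifications can quote it directly without any reverse mapping.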

SUMMARY

Obviously there is more “fleshing out” required, but these are the major gotchas, and avoiding them will keep you from following the path of nearly every predecessor. Pollers are usually written by young, highly motivated, intelligent engineers who lack experience, and as a result the same mistakes are made over and over again, since the young never listen to their elders. 😎

So if you’ve fought your ADHD and made it this far through the intangible platitudes of a senior engineer, then you deserve something tangible. Take a look at the post “Poller? Show me the code…”, which provides a leg up on writing a proper poller (http://bit.ly/109lasJ). However, the code isn’t perfect, since at your age I didn’t have the patience or foresight to read first rather than reinvent the wheel. Still, I did dodge many of the classic mistakes, and the code is structured so that it can be updated into a “proper”, world-class poller of this century, not the last. 😎

Categories: Fault Mgmt & Monitoring

SeniorCenter
