The inform retry mechanism fails when reusing the Snmp object

DanMo · April 27, 2022, 3:24pm

I am using SNMP4J 2.8.10 and create a single Snmp object that I reuse. Everything works fine in my implementation except for the retry mechanism for informs.

It works for the first inform, and also when my application sends many informs at short intervals (shorter than the retry timeout). However, once the last retry has been sent, the retry mechanism starts to fail for any inform sent later.

Decompiling the SNMP4J code, I found that when the Snmp object is created and a message is sent, a java.util.Timer is created. It uses a TimerThread that is kept alive only until all scheduled tasks have completed. After that, the Timer no longer accepts new tasks, and this causes the retry mechanism to fail.

If the Snmp object handled the Timer a little differently, this would not be a problem. Apparently, however, Snmp creates just one timer, uses it as long as it accepts new tasks, and never recreates it afterwards.

My question is: what is the recommended or intended way to reuse the Snmp object without running into this problem?

AGENTPP · April 27, 2022, 9:33pm

I cannot follow your argument, for the following reason, as long as you are using the DefaultTimerFactory (which is the default):

The cancel() method of the Timer is never called in the SNMP4J code except in the Snmp.close() method.

After a call to Snmp.close(), any subsequent call to Snmp.send(..) for a confirmed PDU will create and start a new timer.
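The lifecycle described above can be modelled with a plain java.util.Timer. The sketch below uses a hypothetical LazyTimerHolder class (not SNMP4J's Snmp or DefaultTimerFactory code) that cancels its Timer only in close() and lazily creates a new one on the next schedule(). It also illustrates that an idle java.util.Timer does not stop accepting tasks just because its queue is empty: its thread simply waits for new work.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder mirroring the described behaviour: cancel only on close, recreate on next use. */
public class LazyTimerHolder {
    private Timer timer; // null until first use, and again after close()

    /** Returns the current Timer, creating a new daemon Timer if none exists. */
    public synchronized Timer timer() {
        if (timer == null) {
            timer = new Timer("retry-timer", true);
        }
        return timer;
    }

    /** Schedules a one-shot task, analogous to scheduling a retransmission. */
    public synchronized void schedule(TimerTask task, long delayMillis) {
        timer().schedule(task, delayMillis);
    }

    /** The only place cancel() is called; the next schedule() creates a fresh Timer. */
    public synchronized void close() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    /** Schedules a task that counts down a latch and waits up to 2 seconds for it to run. */
    static boolean runOnce(LazyTimerHolder holder) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        holder.schedule(new TimerTask() {
            @Override
            public void run() {
                done.countDown();
            }
        }, 10);
        return done.await(2, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        LazyTimerHolder holder = new LazyTimerHolder();
        System.out.println("first task ran: " + runOnce(holder));
        Thread.sleep(200); // idle period: the Timer thread waits, it does not exit
        System.out.println("task after idle period ran: " + runOnce(holder));
        holder.close();
        System.out.println("task after close ran: " + runOnce(holder));
        holder.close();
    }
}
```

The same Timer instance keeps serving tasks across idle periods; only close() forces a new one.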

If the Timer's internal thread ends prematurely, an exception probably occurred while running a scheduled task. Most likely it happened in the callback method ResponseListener.onResponse(..), which users of the asynchronous PDU sending methods implement. Does this happen in your case?
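The failure mode described above can be reproduced with the JDK alone. This sketch assumes, as the reply implies, that the retry timer is backed by a java.util.Timer; the class and thread names are hypothetical, and the throwing task only stands in for a faulty onResponse(..). Once a TimerTask throws, the Timer's thread terminates and the Timer rejects further tasks with IllegalStateException, even though cancel() was never called.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TimerDeathDemo {

    /** Returns true if the Timer rejects new tasks after one of its tasks threw. */
    static boolean timerDiesAfterException() throws InterruptedException {
        CountDownLatch threadDied = new CountDownLatch(1);
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Invoked only after the Timer thread's run() has exited,
        // i.e. after the Timer has already marked itself as unusable.
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> threadDied.countDown());
        Timer timer = new Timer("inform-retry-demo", true);
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // Stands in for a buggy ResponseListener.onResponse(..)
                    throw new IllegalStateException("bug in callback");
                }
            }, 10);
            if (!threadDied.await(2, TimeUnit.SECONDS)) {
                return false; // the Timer thread did not die in time
            }
            try {
                timer.schedule(new TimerTask() {
                    @Override
                    public void run() {
                    }
                }, 10);
                return false; // the Timer still accepted a new task
            } catch (IllegalStateException expected) {
                // "Timer already cancelled." although cancel() was never called
                return true;
            }
        } finally {
            timer.cancel();
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timer unusable after exception: " + timerDiesAfterException());
    }
}
```

The default uncaught-exception handler serves only as a reliable signal that the Timer thread has finished; without it, the second schedule() call could race with the thread's shutdown.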

DanMo · April 28, 2022, 1:32pm

A big thank you for your suggestion.
Indeed, an error in my onResponse(..) implementation caused this.

Best regards!

AGENTPP · April 28, 2022, 1:38pm

Versions 2.8.11 and 3.7.0 will no longer rethrow exceptions and errors in the timer code, although they will still log them as before.
That might make the root cause harder to find, but it will make the retry handling more resilient in production.
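The change described for 2.8.11 and 3.7.0 amounts to catching and logging failures inside the timer task instead of letting them escape. A minimal sketch of that idea with plain JDK classes follows; the ResilientTimerTask name is hypothetical, and this is not SNMP4J's actual implementation.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical TimerTask wrapper that logs failures instead of rethrowing them. */
public class ResilientTimerTask extends TimerTask {
    private static final Logger LOG = Logger.getLogger(ResilientTimerTask.class.getName());
    private final Runnable body;

    public ResilientTimerTask(Runnable body) {
        this.body = body;
    }

    @Override
    public void run() {
        try {
            body.run();
        } catch (Throwable t) {
            // Log instead of rethrowing, so the Timer thread keeps running.
            LOG.log(Level.WARNING, "Timer task failed", t);
        }
    }

    /** Returns true if a task scheduled after a failing task still runs on the same Timer. */
    static boolean timerSurvivesFailure() throws InterruptedException {
        Timer timer = new Timer("resilient-demo", true);
        try {
            timer.schedule(new ResilientTimerTask(() -> {
                throw new IllegalStateException("bug in callback");
            }), 10);
            CountDownLatch laterTaskRan = new CountDownLatch(1);
            timer.schedule(new ResilientTimerTask(laterTaskRan::countDown), 50);
            return laterTaskRan.await(2, TimeUnit.SECONDS);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("later task ran: " + timerSurvivesFailure());
    }
}
```

Catching Throwable mirrors the "exceptions and errors" wording above; some codebases prefer to let Errors such as OutOfMemoryError propagate. Either way, handling exceptions inside your own onResponse(..) remains the better fix, since a logged warning is easy to overlook.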

DanMo · April 30, 2022, 10:35am

Thanks for the support.
I just noticed that, because of the error in my ResponseListener.onResponse(…), I also had a memory leak.
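SNMP4J's Javadoc for the Snmp class shows an asynchronous ResponseListener whose onResponse(..) first calls ((Snmp) event.getSource()).cancel(event.getRequest(), this), with a comment warning that otherwise a memory leak is created. One plausible explanation for the leak, which the thread does not confirm, is that the faulty callback threw before such cleanup ran. The pure-JDK sketch below models this with a hypothetical PendingRequests map: cleanup placed after the handler leaks an entry whenever the handler throws, while cleanup done first does not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical registry of outstanding requests, standing in for state kept until cancel(). */
public class PendingRequests {
    private final Map<Integer, String> pending = new ConcurrentHashMap<>();

    public void register(int requestId, String description) {
        pending.put(requestId, description);
    }

    /** Fragile: the entry is removed only if the handler returns normally. */
    public void onResponseFragile(int requestId, Consumer<String> handler) {
        handler.accept(pending.get(requestId));
        pending.remove(requestId);
    }

    /** Robust: the entry is removed before the handler runs, so a failing handler cannot leak it. */
    public void onResponseRobust(int requestId, Consumer<String> handler) {
        String description = pending.remove(requestId);
        handler.accept(description);
    }

    public int size() {
        return pending.size();
    }

    /** Registers requests whose handler always throws; returns how many entries remain. */
    static int leakedAfterFailures(boolean robust, int failures) {
        PendingRequests requests = new PendingRequests();
        Consumer<String> buggyHandler = d -> {
            throw new IllegalStateException("bug in handler");
        };
        for (int id = 0; id < failures; id++) {
            requests.register(id, "inform #" + id);
            try {
                if (robust) {
                    requests.onResponseRobust(id, buggyHandler);
                } else {
                    requests.onResponseFragile(id, buggyHandler);
                }
            } catch (IllegalStateException ignored) {
                // Stands in for the exception escaping the callback.
            }
        }
        return requests.size();
    }

    public static void main(String[] args) {
        System.out.println("fragile leaks: " + leakedAfterFailures(false, 100)); // 100
        System.out.println("robust leaks: " + leakedAfterFailures(true, 100));   // 0
    }
}
```

A try/finally around the handler would work equally well; doing the cleanup first simply matches the order used in the Javadoc example.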