SNMP4J odd behaviour in async TableUtils.getTable

Hi!

My scenario is this - I am bulk walking if/ifX tables, nothing exotic there. There are 11 OIDs to fetch, and around 1K indexes.

I discovered that in one of my production environments the code was missing polled data because maxColumnsPerPDU was set to 2 (nothing wrong with that, I believe?)

I debugged further and, sure enough, found that the TableEvents passed to my TableListener next() callback often had nulls for some of the columns. I also spotted that for some indexes, next() was called many, many times with repeated varbind information.

I tried a few ways of fixing/working around the bug, and found:

  • The synchronous version of getTable does not suffer the same issue
  • When maxColumnsPerPDU is set to >= the number of OIDs being fetched, no issue.
    This isn’t ideal, as I have to walk some quite under-powered agents that don’t like big requests.
  • Changing the sparse mode to dense/using getDenseTable API fixes the issue. This isn’t workable for me, as I do have sparse tables to walk sometimes.
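
To put numbers on the maxColumnsPerPDU setting, here is a rough sketch, assuming the requested columns are simply split into consecutive groups of at most that size. ColumnChunks and chunk() are made-up names for illustration and are not part of SNMP4J; the real splitting logic inside TableUtils may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnChunks {

    /**
     * Splits the column list into consecutive groups of at most maxPerPdu entries.
     * Hypothetical helper for illustration only, not SNMP4J code.
     */
    static List<List<String>> chunk(List<String> columns, int maxPerPdu) {
        if (maxPerPdu < 1) {
            throw new IllegalArgumentException("maxPerPdu must be >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < columns.size(); i += maxPerPdu) {
            int end = Math.min(i + maxPerPdu, columns.size());
            groups.add(new ArrayList<>(columns.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder names standing in for the 11 if/ifX column OIDs.
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= 11; i++) {
            columns.add("col" + i);
        }
        System.out.println(chunk(columns, 2).size());  // prints 6
        System.out.println(chunk(columns, 11).size()); // prints 1
    }
}
```

So with 11 columns and a limit of 2, each row's data arrives across six separate column groups instead of one, which is the only configuration where I see the problem.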

I tried upgrading to the latest code (3.7.0) in case it was an old bug - no joy there.
I tried swapping from the MultiThreadedMessageDispatcher back to the regular kind, no change.

Any advice on how to fix/what else to check would be greatly appreciated!
Many thanks,

Marcus

Without wishing to reply to my own post(!) - I did stumble across what would seem to be a very similar description of my issue here: [SNMP4J] max-bindings with big tables

Best Regards,

Marcus

If that is indeed the case, then there is not a problem with the code, because the synchronous getTable is using the asynchronous getTable internally :innocent:

Hi @AGENTPP - thanks for your very prompt reply!
I had indeed looked at the InternalTableListener implementation, as you say, and I scratched my head wondering why it wasn’t suffering as my listener was.

I’m ashamed to say, I found the culprit code :man_facepalming:
My listener implementation was mutating the VariableBinding references it received, trimming the index off each varbind OID so that it could be MIB translated. It took me longer than I would have liked to track down, but at least the fix is easy: clone the original before doing any mutations! It was just unfortunate for me that I got away with that code for so long, and it was only in a very niche scenario that it bit back!
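
For anyone who lands here with the same symptoms, below is a minimal sketch of the pitfall and the fix. It deliberately uses tiny stand-in classes (MutableOid and Varbind are hypothetical, not the SNMP4J types) so that it runs without the library. The assumption is only that the object handed to the listener is shared with whoever produced it, so an in-place trim is visible to both sides.

```java
import java.util.Arrays;

public class VarbindAliasingDemo {

    /** Hypothetical stand-in for a mutable OID (not the SNMP4J class). */
    static final class MutableOid {
        private int[] value;

        MutableOid(int... value) {
            this.value = value.clone();
        }

        /** Drops the last n sub-identifiers in place. */
        void trim(int n) {
            value = Arrays.copyOf(value, Math.max(0, value.length - n));
        }

        MutableOid copy() {
            return new MutableOid(value);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < value.length; i++) {
                if (i > 0) sb.append('.');
                sb.append(value[i]);
            }
            return sb.toString();
        }
    }

    /** Hypothetical stand-in for a variable binding. */
    static final class Varbind {
        final MutableOid oid;
        final String value;

        Varbind(MutableOid oid, String value) {
            this.oid = oid;
            this.value = value;
        }
    }

    /** Buggy approach: trims the index off the shared OID itself. */
    static String columnOidMutating(Varbind vb, int indexLength) {
        vb.oid.trim(indexLength);
        return vb.oid.toString();
    }

    /** Fixed approach: trims a private copy and leaves the shared OID alone. */
    static String columnOidCopying(Varbind vb, int indexLength) {
        MutableOid copy = vb.oid.copy();
        copy.trim(indexLength);
        return copy.toString();
    }

    public static void main(String[] args) {
        // ifInOctets (1.3.6.1.2.1.2.2.1.10) for interface index 7.
        Varbind shared = new Varbind(new MutableOid(1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 7), "12345");

        columnOidCopying(shared, 1);
        System.out.println(shared.oid); // prints 1.3.6.1.2.1.2.2.1.10.7

        columnOidMutating(shared, 1);
        System.out.println(shared.oid); // prints 1.3.6.1.2.1.2.2.1.10
    }
}
```

After the mutating call the shared object has lost its row index, so anything else still holding that reference can no longer tell which row the value belongs to. The copying version gives the same column OID for translation without touching the original.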

Thanks again for the reply and contributing to such a useful library!

Best Regards,

Marcus
