SNMP4J stuck in a loop when the agent doesn't follow lexicographic ordering

We are querying a table for which the agent returns its rows in a loop (not in lexicographic order).
As I understand the documentation, this behavior was fixed in release 2.5.10.
From the changelog:

[2018-01-05] Version 2.5.10:

  • Fixed [SFJ-161]: TableUtils does not check for lexicographic ordering in SNMP4J 2.5.9 which could
    cause endless looping with incorrectly implemented agents. The ordering checking can be now disabled
    but is enabled by default.

But this is not working. SNMP4J is stuck forever in this scenario.
Can you help us understand whether this scenario should be supported by SNMP4J?

Example response:
OID order:

1.2.0
1.2.1

1.2.6
1.2.0
1.2.1

1.2.6

Name/OID: lldpLocPortIdSubtype.0; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.1; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.2; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.3; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.4; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.5; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.6; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.0; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.1; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.2; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.3; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.4; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.5; Value (Integer): interfaceName (5)
Name/OID: lldpLocPortIdSubtype.6; Value (Integer): interfaceName (5)
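
The wrap-around in a walk like this can be detected by comparing each returned OID with its predecessor. The following is a minimal JDK-only sketch, not SNMP4J code: the class OidOrderCheck and its methods are hypothetical names, and the sample list contains only the OIDs spelled out in the order listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OidOrderCheck {

    // Parses a dotted OID such as "1.2.6" into its sub-identifiers.
    static long[] parse(String dotted) {
        String[] parts = dotted.split("\\.");
        long[] subIds = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            subIds[i] = Long.parseLong(parts[i]);
        }
        return subIds;
    }

    // Lexicographic comparison: the first differing sub-identifier decides,
    // otherwise the shorter OID sorts first.
    static int compare(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Long.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Returns every position whose OID is not strictly greater than its predecessor.
    static List<Integer> orderErrors(List<String> oids) {
        List<Integer> errors = new ArrayList<>();
        for (int i = 1; i < oids.size(); i++) {
            if (compare(parse(oids.get(i)), parse(oids.get(i - 1))) <= 0) {
                errors.add(i);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> walk = Arrays.asList(
                "1.2.0", "1.2.1", "1.2.6", "1.2.0", "1.2.1", "1.2.6");
        System.out.println(orderErrors(walk)); // prints [3]
    }
}
```

A manager that stops at the first reported position returns one pass over the table instead of following the agent around the loop.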

Hi,
Which SNMP4J version are you using?
In 2.5.11, the default was changed so that retrieval stops only after 3 consecutive lexicographic ordering errors. In your example, the agent returns only a single error.

[2018-01-05] Version 2.5.11:

  • Improved [SFJ-162]: TableUtils now waits until three (3) primary lexicographic ordering errors occurred and returns all rows until then. Rows that contain cell values based on incorrectly order data will be returned now with status TableEvent.STATUS_WRONG_ORDER. That state will be also set in the finishing TableEvent then.

You may need to set TableUtils.setIgnoreMaxLexicographicRowOrderingErrors to 0 in order to exit the loop as early as possible.

We are using SNMP4J 2.8.5. The TableUtils.getTable(Target target, OID[] columnOIDs, OID lowerBoundIndex, OID upperBoundIndex) method is stuck in wait().

On further debugging, I found that the following condition never evaluates to true in the above scenario.

(rowCache.getFirst()).getRowIndex().compareTo(lastMinIndex) < 0

The first index is always 1.2.0 (the responses are sorted before they are added to the rows) and lastMinIndex is also 1.2.0. Because the two are equal, compareTo returns 0 and the condition is always false.

The error handling code sits inside this while loop, so it never gets invoked.
TableUtils, line 658:
while ((rowCache.size() > 0) &&
       ((rowCache.getFirst()).getNumComplete() == columnOIDs.length) &&
       // make sure, row is not prematurely deemed complete
       (receivedInOrder) &&
       ((lastMinIndex == null) ||
        ((rowCache.getFirst()).getRowIndex().compareTo(lastMinIndex) < 0))) {

Yes, that is indeed a bug. It occurs when the lexicographic loop hits the first row that is still in the row cache and that row itself starts the loop.

To fix that, use the following while loop condition instead:

while (((firstCacheRow = (rowCache.isEmpty()) ? null : rowCache.getFirst()) != null) &&
       (firstCacheRow.getNumComplete() == columnOIDs.length) &&
       // make sure, row is not prematurely deemed complete
       (receivedInOrder) &&
       ((lastMinIndex == null) || firstCacheRow.orderError ||
        (firstCacheRow.getRowIndex().compareTo(lastMinIndex) < 0))) {
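
The effect of the extra orderError term can be seen in isolation with a small JDK-only sketch. It models only the last clause of the two conditions. RowReleaseCondition and its methods are hypothetical names, row indexes are simplified to int arrays, and Arrays.compare requires Java 9 or later.

```java
import java.util.Arrays;

class RowReleaseCondition {

    // Last clause of the original condition: the first cached row index
    // must be strictly smaller than lastMinIndex.
    static boolean originalCheck(int[] firstIndex, int[] lastMinIndex) {
        return lastMinIndex == null
                || Arrays.compare(firstIndex, lastMinIndex) < 0;
    }

    // Last clause of the patched condition: a row flagged with an ordering
    // error passes regardless of how its index compares.
    static boolean patchedCheck(int[] firstIndex, int[] lastMinIndex, boolean orderError) {
        return lastMinIndex == null
                || orderError
                || Arrays.compare(firstIndex, lastMinIndex) < 0;
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 0};       // first row still in the cache
        int[] lastMin = {1, 2, 0};     // the loop restarted at the same index

        System.out.println(originalCheck(first, lastMin));      // prints false
        System.out.println(patchedCheck(first, lastMin, true)); // prints true
    }
}
```

With equal indexes the strict comparison can never succeed, so the loop body that contains the error handling is skipped unless the flag lets the row through.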

The next SNMP4J versions 2.8.7 and 3.4.5 will have this fix included.

Thank you for the solution.

Also, I see that TableUtils.getTable() waits forever, so in unforeseen circumstances like this one the thread is stuck forever. I think it would be good to perform a time-bound wait and throw a timeout error if the response is not received within the timeout period.
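
Until the library offers such a limit, a time-bound wait can be approximated on the caller side by running the blocking call on a worker thread. This is a minimal JDK-only sketch: BoundedCall and callWithTimeout are hypothetical names, the lambdas stand in for a blocking TableUtils.getTable(...) call, and whether the blocked call actually reacts to the interrupt sent on cancellation is an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {

    // Runs the task on a daemon worker thread and waits at most the given time.
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "table-walk");
            t.setDaemon(true); // a stuck walk must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the task may ignore it
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a table walk that completes in time.
        System.out.println(callWithTimeout(() -> "3 rows", 1, TimeUnit.SECONDS));
        try {
            // Stand-in for a table walk that never completes.
            callWithTimeout(() -> { Thread.sleep(5_000); return "never"; },
                    100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("table walk timed out");
        }
    }
}
```

The caller gets control back after the timeout, but the worker thread is only released if the blocked call honors the interrupt.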

Let me know your thoughts on this.

Sure, the wait is potentially risky, but so is life :wink:

The problem is that the call has to wait for an arbitrary number of subsequent requests. The maximum time to wait cannot be derived from the Target.getTimeout() value, so a separate timeout is necessary.
I will provide an optional timeout parameter for the next update to be able to control that limit if needed.

To avoid the wait, you could use the asynchronous getTable call too.

Which version has this parameter, please? I now have the same problem.

In SNMP4J 3.7.0 you can use:
https://www.agentpp.com/doc/snmp4j/org/snmp4j/util/TableUtils.html#getTable-org.snmp4j.Target-org.snmp4j.smi.OID:A-org.snmp4j.smi.OID-org.snmp4j.smi.OID-long-

What's the difference between Target.getTimeout() and getTable(Target<?> target, OID[] columnOIDs, OID lowerBoundIndex, OID upperBoundIndex, long timeoutSeconds)? Should Target.getTimeout() always be bigger than the getTable timeout?

No, the opposite is true. But you can try it out of course.

Does SNMP4J have any version with the optional timeout parameter that runs on JDK 8? When I update to 3.7.0, it fails with the error: org/snmp4j/smi/Variable has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0

No, only SNMP4J 3.4.5 or later has this method with the extra timeout.

But nevertheless, you can use the asynchronous getTable call too in SNMP4J 2.8.x (as already noted above).

1. Can you tell me where I can find an example of the asynchronous getTable call? The examples I found differ: some use TableListener, but some use ResponseListener.
2. If I use the asynchronous getTable call, should I still add a timeout, for example to latch.await(), to get the final result? I do not want the thread to be stuck forever.
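
On the second point, a bounded wait around an asynchronous retrieval could be sketched as follows. The asynchronous TableUtils.getTable variant takes a TableListener, while ResponseListener belongs to the lower-level asynchronous Snmp.send calls. The sketch itself is JDK-only: RowListener, CollectingListener, fetchAsync, and awaitCompletion are hypothetical stand-ins that only imitate the callback pattern and are not SNMP4J API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class AsyncTableWait {

    // Hypothetical stand-in for a table listener callback (not SNMP4J API).
    interface RowListener {
        boolean next(String row); // return false to stop the retrieval
        void finished();
    }

    // Collects rows and releases a latch once the retrieval has finished.
    static class CollectingListener implements RowListener {
        final List<String> rows = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        volatile boolean cancelled; // set by the waiting thread after a timeout

        public boolean next(String row) {
            if (cancelled) {
                return false;
            }
            rows.add(row);
            return true;
        }

        public void finished() {
            done.countDown();
        }
    }

    // Simulated asynchronous retrieval that delivers one row per delay interval.
    static void fetchAsync(List<String> source, long delayMillis, RowListener listener) {
        Thread worker = new Thread(() -> {
            try {
                for (String row : source) {
                    Thread.sleep(delayMillis);
                    if (!listener.next(row)) {
                        break;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                listener.finished();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Waits at most the given time; on timeout asks the listener to stop the walk.
    static boolean awaitCompletion(CollectingListener listener, long timeout, TimeUnit unit)
            throws InterruptedException {
        boolean completed = listener.done.await(timeout, unit);
        if (!completed) {
            listener.cancelled = true;
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        CollectingListener listener = new CollectingListener();
        fetchAsync(Arrays.asList("row 0", "row 1", "row 2"), 10, listener);
        boolean completed = awaitCompletion(listener, 2, TimeUnit.SECONDS);
        System.out.println("completed=" + completed + ", rows=" + listener.rows);
    }
}
```

The timed await keeps the calling thread from blocking indefinitely, and returning false from the row callback after a timeout gives the retrieval a chance to stop instead of following a looping agent.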