Incorrect timeout handling (TableUtils)

Hi

Over the past months, we have been receiving reports about incomplete SNMP tables from our customers. Yesterday, we were finally able to track down the problem. We came to the conclusion that in certain situations the SNMP4J TableUtils processes table retrieval incorrectly. I’m including a tcpdump screenshot that shows the SNMP dialogue. Here is what happens:

  1. We are requesting a table with 12 columns from IF-MIB.
  2. By default, SNMP4J retrieves the first 10 columns (according to org.snmp4j.util.TableUtils.maxNumColumnsPerPDU).
  3. The SNMP agent decides to return 100 variable bindings (10 rows x 10 columns).
  4. SNMP4J then tries to retrieve the remaining 2 columns.
  5. This fails for some reason (timeout, packet loss, inconsistent MTU configuration, …).
  6. The whole table retrieval operation should now fail with a timeout, but what happens is that SNMP4J includes null values for the two columns.
  7. If SNMP4J is really supposed to operate that way, it should continue the table retrieval, with the first 10 columns done, from row 11. But this does not happen. The whole table retrieval finishes at this point.
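
The column split in steps 2 and 4 can be pictured with a small sketch. This is not SNMP4J's code: the class ColumnChunks and the method split are hypothetical names used for illustration only. The only inputs taken from the report are the limit of 10 columns per PDU and the twelve requested column numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SNMP4J: shows how a list of columns is
// cut into per-request groups when at most maxPerPdu columns fit in one PDU.
class ColumnChunks {
    static List<List<Integer>> split(List<Integer> columns, int maxPerPdu) {
        if (maxPerPdu < 1) {
            throw new IllegalArgumentException("maxPerPdu must be >= 1");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < columns.size(); start += maxPerPdu) {
            int end = Math.min(start + maxPerPdu, columns.size());
            chunks.add(new ArrayList<>(columns.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The ifTable columns requested in the report.
        List<Integer> columns = Arrays.asList(2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 16, 20);
        // Prints [[2, 3, 4, 5, 6, 7, 8, 9, 10, 14], [16, 20]]
        System.out.println(split(columns, 10));
    }
}
```

The second group (columns 16 and 20) corresponds to the request that fails in step 5, which is why exactly those two values end up as null.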

We believe that the correct result would be a timeout, because then we would know that the table is incomplete. Second best would be a result containing rows r0 to rB without columns A and B. What we actually get is an incomplete table without any indication of missing rows.

+--------------------------+
| r0   0 1 2 3 4 5 6 7 8 9 | A B
| r1   0 1 2 3 4 5 6 7 8 9 | A B
| r2   0 1 2 3 4 5 6 7 8 9 | A B
| r3   0 1 2 3 4 5 6 7 8 9 | A B
| r4   0 1 2 3 4 5 6 7 8 9 | A B
| r5   0 1 2 3 4 5 6 7 8 9 | A B
| r6   0 1 2 3 4 5 6 7 8 9 | A B
| r7   0 1 2 3 4 5 6 7 8 9 | A B
| r8   0 1 2 3 4 5 6 7 8 9 | A B
| r9   0 1 2 3 4 5 6 7 8 9 | A B
+--------------------------+
  rA   0 1 2 3 4 5 6 7 8 9   A B
  rB   0 1 2 3 4 5 6 7 8 9   A B

Here are the variable bindings from the last row:

org.snmp4j.util.TableEvent[index=10,vbs=[
     1.3.6.1.2.1.2.2.1.2.10 = eth8,
     1.3.6.1.2.1.2.2.1.3.10 = 6,
     1.3.6.1.2.1.2.2.1.4.10 = 1500,
     1.3.6.1.2.1.2.2.1.5.10 = 0,
     1.3.6.1.2.1.2.2.1.6.10 = 00:50:56:b4:a1:de,
     1.3.6.1.2.1.2.2.1.7.10 = 2,
     1.3.6.1.2.1.2.2.1.8.10 = 2,
     1.3.6.1.2.1.2.2.1.9.10 = 0:00:00.00,
     1.3.6.1.2.1.2.2.1.10.10 = 0,
     1.3.6.1.2.1.2.2.1.14.10 = 0,
     null,
     null
],status=0,exception=null,report=null]

(Note that we were trying to retrieve columns 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 16 and 20.)
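
As a stopgap on the caller's side, rows like the one above can at least be detected. The following is a minimal sketch under stated assumptions: RetrievedRow is a hypothetical stand-in for a retrieved row (index plus one value per requested column, null meaning that nothing was received) and is not an SNMP4J class. With the real library, the same test would presumably be applied to the variable bindings of each TableEvent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one retrieved row, not an SNMP4J type.
// A null entry in values means no value was received for that column.
class RetrievedRow {
    final String index;
    final String[] values;

    RetrievedRow(String index, String... values) {
        this.index = index;
        this.values = values;
    }
}

class PartialRowCheck {
    // Returns the indexes of all rows in which at least one column is null.
    static List<String> rowsWithMissingColumns(List<RetrievedRow> rows) {
        List<String> incomplete = new ArrayList<>();
        for (RetrievedRow row : rows) {
            if (Arrays.asList(row.values).contains(null)) {
                incomplete.add(row.index);
            }
        }
        return incomplete;
    }

    public static void main(String[] args) {
        List<RetrievedRow> rows = Arrays.asList(
                new RetrievedRow("9", "eth7", "6", "0", "0"),
                new RetrievedRow("10", "eth8", "6", null, null));
        // Prints [10]
        System.out.println(rowsWithMissingColumns(rows));
    }
}
```

This only flags rows that were delivered with gaps. It cannot reveal rows that were never delivered at all, which is the more serious part of the problem described here.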

Thanks and best regards,
Steffen Brüntjen

Hi Steffen,

As always, I first have to ask: which version of SNMP4J are you using?

Best regards,
Frank

Working with Java 8, we’re using SNMP4J 2.8.2.

OK, SNMP4J 3.3.3 has some more fixes than 2.8.0, but for your issue the behaviour should be nearly the same.
From the source code, I do not see any way for a timeout to occur without the getTable method returning, as the last row delivered to the caller, a TableEvent object that has no values but carries the status TableEvent.STATUS_TIMEOUT. Have you checked that?
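
The check meant here can be sketched as follows. This is an illustration under assumptions, not SNMP4J code: the class TimeoutCheck is hypothetical, the statuses are passed as plain integers in the order the rows were returned, and the numeric values 0 and -1 are assumed stand-ins for the library's status constants. In real code, the status of the last returned TableEvent would be compared with TableEvent.STATUS_TIMEOUT instead of a literal.

```java
// Hypothetical helper, not part of SNMP4J.
class TimeoutCheck {
    // Assumed stand-ins for the library's status constants.
    static final int STATUS_OK = 0;
    static final int STATUS_TIMEOUT = -1;

    // True if the last status in the returned sequence signals a timeout,
    // i.e. the table must be treated as incomplete.
    static boolean endedWithTimeout(int[] statuses) {
        return statuses.length > 0
                && statuses[statuses.length - 1] == STATUS_TIMEOUT;
    }

    public static void main(String[] args) {
        // Ten good rows followed by a timeout marker: prints true
        int[] statuses = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, STATUS_TIMEOUT};
        System.out.println(endedWithTimeout(statuses));
    }
}
```

In the row shown in the original report, the last event has status=0, so a check of this kind would not have flagged it.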

Steffen Brüntjen answered on March 4th, 2020:

Hi Frank

We updated to the newest release and for now we can’t reproduce the problem any more! I’ll get back when I get new insights.

Thank you very much
Steffen Brüntjen