SNMP over TLS communication: getBulk is slow / no response

Hi
We use SNMP over TLS for request/response communication with the SNMP client (whereas SNMP traps use SNMPv3).
The implementation uses a multi-threaded dispatcher with an SNMP4J thread pool of size 3, and our application code uses a pool of size 3 to send requests.

The issue is that the SNMP manager application code does not receive the response, although it was sent from the SNMP client. This happens sporadically for simple get operations and commonly for table get operations.
I also noticed SNMP_MP_UNKNOWN_MSGID (= -1409) in the logs:

INFO: Message from 135.249.189.239/10161 not dispatched, reason: statusInfo=noError, status=-1409

However, I do see that MPv3.prepareOutgoingMessage() had added this message to the cache.

Java version: 1.8
SNMP4j version: 2.8.7
SnmpConfigurator.P_TLS_VERSION value set to TLSv1.2

Note: The same requests over SNMPv3 do not have any issue.

Any help around this area is greatly appreciated.
Also, is there a way to decrypt these packets in tools like Wireshark? I tried using SSLKEYLOGFILE, following "Decrypt SSL with Wireshark - HTTPS Decryption: Step-by-Step Guide", but it decrypted only HTTP-over-TLS packets and not the SNMP-over-TLS application data. Is there any way to decrypt this data using the keystore/truststore or the keys stored in them?

Thanks in advance for the support.

Regards,
Anjali

An unknown message ID means either that there was a timeout before the response arrived, or that the command responder does not handle message IDs correctly.

Most likely, of course, it is a timeout issue.
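To make the timeout case concrete, here is a deliberately simplified model of a message-ID cache, for illustration only. The class MsgIdCache and its methods are hypothetical and are not SNMP4J's MPv3.Cache. The sketch only assumes that an entry is created when a request is sent and discarded once its timeout has elapsed, so a response arriving later finds no entry and is rejected as an unknown message ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of an outstanding-request cache keyed by msgID.
// It is NOT SNMP4J's MPv3.Cache; it only illustrates the timeout case.
final class MsgIdCache {
    // msgID -> absolute expiry time in milliseconds
    private final Map<Integer, Long> expiries = new HashMap<>();

    // Called when a request with this msgID is sent.
    void add(int msgId, long sentAtMillis, long timeoutMillis) {
        expiries.put(msgId, sentAtMillis + timeoutMillis);
    }

    // Called when a response arrives. Expired entries are purged first,
    // so a late response is treated exactly like an unknown msgID.
    boolean matchResponse(int msgId, long receivedAtMillis) {
        expiries.values().removeIf(expiry -> expiry < receivedAtMillis);
        // remove() returns null when no entry exists (unknown msgID)
        return expiries.remove(msgId) != null;
    }
}
```

For example, after add(2, 0, 1000), matchResponse(2, 1500) returns false: the entry expired at 1000 ms, so the response arriving at 1500 ms is unknown.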

Please note that TLS has a tremendous overhead compared to SNMPv3 USM. So it is expected that overall performance is much better with SNMPv3 USM, at least if connections cannot be kept open.
Decryption of the packets for TLSTM should be possible through standard TLS mechanisms.
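One standard TLS mechanism that uses keys from a keystore, offered here as an assumption for a lab setup rather than SNMP4J-specific advice: Wireshark can decrypt TLS 1.2 traffic with the server's RSA private key (entered in its TLS protocol preferences) only when the session uses plain RSA key exchange. Suites with (EC)DHE key exchange provide forward secrecy and cannot be decrypted that way, and TLS 1.3 has no RSA key exchange at all. Also note that SSLKEYLOGFILE is honoured by browsers and NSS/OpenSSL-based tools, not by Java's JSSE, which would explain why only the HTTP traffic was decrypted. The sketch below uses only standard JSSE calls to list which of the JVM's supported suites use plain RSA key exchange; the class name RsaKeyExchangeSuites and the name-prefix check are assumptions of this sketch.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;

// Sketch: list cipher suites that use plain RSA key exchange (no forward secrecy).
// Only TLS 1.2 sessions using such suites can be decrypted in Wireshark with the RSA private key.
final class RsaKeyExchangeSuites {
    // Name prefixes of RSA-key-exchange suites; Java still uses "SSL_" for some legacy names.
    private static final String[] PREFIXES = {"TLS_RSA_WITH_", "SSL_RSA_WITH_"};

    static List<String> filter(String[] suites) {
        List<String> result = new ArrayList<>();
        for (String suite : suites) {
            for (String prefix : PREFIXES) {
                if (suite.startsWith(prefix)) {
                    result.add(suite);
                    break;
                }
            }
        }
        return result;
    }

    // Suites supported by the default JSSE context of this JVM, filtered to RSA key exchange.
    static List<String> supportedByJvm() throws NoSuchAlgorithmException {
        String[] supported = SSLContext.getDefault()
                .getSupportedSSLParameters().getCipherSuites();
        return filter(supported);
    }
}
```

How to restrict the TLSTM endpoints to one of these suites depends on the TLS configuration in use and is not shown here. Restricting to such a suite removes forward secrecy, so it only suits a test environment.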

Hi Frank
After enabling logs and capturing packet data, I see that the response was received within 30 seconds, well within the timeout (7 min), but SNMP4J discarded it because the msg ID was not found.
Please find the captured and filtered logs from my analysis.

2021-12-02T13:08:45.982 Adding cache entry: StateReference[msgID=49106,pduHandle=PduHandle[2070222699],securityEngineID=,securityModel=org.snmp4j.security.TSM@bfb599e,securityName=,securityLevel=3,contextEngineID=00:00:1d:3b:00:00:00:a1:87:f9:bd:ef,contextName=,retryMsgIDs=null]
2021-12-02T13:09:45.893 Adding cache entry: StateReference[msgID=49108,pduHandle=PduHandle[2070222699],securityEngineID=,securityModel=org.snmp4j.security.TSM@bfb599e,securityName=,securityLevel=3,contextEngineID=00:00:1d:3b:00:00:00:a1:87:f9:bd:ef,contextName=,retryMsgIDs=null]
2021-12-02T13:09:45.898 Adding previous message IDs [49106] to new entry StateReference[msgID=49108,pduHandle=PduHandle[2070222699],securityEngineID=,securityModel=org.snmp4j.security.TSM@bfb599e,securityName=,securityLevel=3,contextEngineID=00:00:1d:3b:00:00:00:a1:87:f9:bd:ef,contextName=,retryMsgIDs=null]
2021-12-02T13:09:45.909 SNMPv3 header decoded: msgId=49106, msgMaxSize=2900, msgFlags=03, secModel=4
Log line newly added by me in the if (e.getMessageIDs() != null) block:
MPv3.Cache.deleteEntry():: Removing cache entry for [49108, 49106]
2021-12-02T13:09:45.919 SNMPv3 header decoded: msgId=49108, msgMaxSize=2900, msgFlags=03, secModel=4
2021-12-02T13:09:45.919 RFC3412 §7.2.10 - Received PDU (msgID=49108) is a response or internal class message, but cached information for the msgID could not be found

Please note that the same PDU handle is used for two different msgIDs (49106 and 49108). Hence, when the response for 49106 is received, the cache entries for both messages are discarded, and the response for msgID 49108 is then dropped.

The above analysis is just my speculation about the issue. The implementation, attached above, uses a multi-threaded dispatcher with a thread pool of size 3. Please let me know if any further information is required.
Note: Whenever I have the cache-not-found issue, it follows the same pattern as above, where the same PduHandle is present for two different message IDs.

Please make sure that the PduHandle is unique. Most likely there is some code that copies the requestID from a PDU that is being sent (or has been sent) by copying/cloning the PDU object. That is not a good idea in any case, because you could accidentally modify the PDU that is being sent (race condition!).

Because SNMP4J allows you to define and use your own request IDs, this will not be prevented by the API.
If you do not have your own request ID generator, then please make sure that the requestID (= transactionID of the PduHandle) of each sent PDU is set to 0. If the requestID of a PDU is 0, then SNMP4J will generate a unique transactionID for the PduHandle.
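The failure mode can be illustrated with a small standalone model. The classes MiniPdu and MiniDispatcher are hypothetical stand-ins, not SNMP4J types; they only assume the behaviour described above: the dispatcher assigns a fresh ID when a PDU's request ID is 0, writes it back into that PDU, and otherwise keeps the caller's value. Cloning a PDU after it has been sent therefore carries the already-assigned ID into the copy, whereas resetting the copy's ID to 0 yields a unique one.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a PDU; only the request ID matters here.
final class MiniPdu implements Cloneable {
    int requestId; // 0 means "let the dispatcher assign one"

    @Override
    public MiniPdu clone() {
        try {
            return (MiniPdu) super.clone(); // shallow copy, including requestId
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class implements Cloneable
        }
    }
}

// Hypothetical stand-in for the dispatcher's request ID handling.
final class MiniDispatcher {
    private final AtomicInteger nextId = new AtomicInteger(1);

    // Assign a new ID only if the PDU's ID is 0; the assigned ID is written
    // back into the PDU object that was passed in.
    int send(MiniPdu pdu) {
        if (pdu.requestId == 0) {
            pdu.requestId = nextId.getAndIncrement();
        }
        return pdu.requestId; // plays the role of the PduHandle transaction ID
    }
}
```

With real SNMP4J PDUs, the equivalent of the reset is calling pdu.setRequestID(new Integer32(0)) on the copy before sending it.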

Hi Frank,
We are not setting any request ID. I am asking for more information because we are new to SNMP over TLS.
Do the engineID and contextEngineID matter in SNMP over TLS?
In our environment, requests and responses go over TLS, but traps use SNMPv3. Can any IDs conflict here?
Any information in this regard will be greatly helpful.

Thanks,
Anjali

If you never set the request ID and only use a single MessageDispatcher, then you should be getting unique request IDs, because MessageDispatcherImpl.getNextRequestID():

```java
public synchronized int getNextRequestID() {
    int nextID = nextTransactionID++;
    if (nextID <= 0) {
        nextID = 1;
        nextTransactionID = 2;
    }
    return nextID;
}
```

will generate unique IDs until the counter wraps around (after roughly 2^31 requests) and reaches a previously generated one.
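A standalone copy of the quoted logic makes the wrap-around visible. The class RequestIdGenerator and its constructor argument are hypothetical (the initial value of the real field is not shown in the excerpt); starting just below Integer.MAX_VALUE shows that the counter overflows to a negative value and the method restarts at 1.

```java
// Standalone copy of the request ID logic quoted above, for experimentation only.
final class RequestIdGenerator {
    private int nextTransactionID;

    RequestIdGenerator(int start) {
        this.nextTransactionID = start; // hypothetical: lets a test start near the wrap point
    }

    public synchronized int getNextRequestID() {
        int nextID = nextTransactionID++; // int overflow turns MAX_VALUE + 1 into MIN_VALUE
        if (nextID <= 0) {
            // Skip 0 and negative values: restart at 1 and continue with 2.
            nextID = 1;
            nextTransactionID = 2;
        }
        return nextID;
    }
}
```

Starting at Integer.MAX_VALUE - 1, four calls return Integer.MAX_VALUE - 1, Integer.MAX_VALUE, 1 and 2.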

So I assume that there might be code that clones a PDU after it has been sent and then resends that copy with the internally assigned non-zero request ID.

Hi Frank,
After further analysis, I noticed that a retry of a request has a different message ID but the same transaction ID (PduHandle) as the original request. Can you please confirm that this is the expected behaviour? If so, my earlier analysis was probably misleading.

Also, we use an exponential timeout model with a 60 s timeout and 3 retries, so retries are made after 1, 2 and 4 minutes. Is there a recommended timeout model for TLS? I see that getNext requests are processed slowly even though they are received at the packet layer.
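The arithmetic behind such a schedule can be sketched as follows, assuming the wait simply doubles on each attempt; the exact behaviour of the exponential timeout model in use is not shown in this thread, and the class RetrySchedule is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical doubling ("exponential") retry schedule, for arithmetic only.
// Attempt 0 is the original request; attempts 1..retries are the retries.
final class RetrySchedule {
    // Offsets (ms after the original request) at which each attempt is sent.
    static List<Long> sendOffsets(long timeoutMillis, int retries) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        long wait = timeoutMillis;
        for (int attempt = 0; attempt <= retries; attempt++) {
            offsets.add(offset);
            offset += wait; // the next attempt goes out when this one times out
            wait *= 2;      // each wait doubles the previous one
        }
        return offsets;
    }

    // Time until the request finally fails: the sum of all attempts' waits.
    static long totalMillis(long timeoutMillis, int retries) {
        long total = 0;
        long wait = timeoutMillis;
        for (int attempt = 0; attempt <= retries; attempt++) {
            total += wait;
            wait *= 2;
        }
        return total;
    }
}
```

Under this assumption, with a 60 s timeout and 3 retries, the retries are sent 1, 3 and 7 minutes after the original request, and the request only fails after 15 minutes. With retries = 0, totalMillis equals the plain timeout.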

Thanks,
Anjali

Retries are sent with same request ID (= transaction ID) but different msgID as required by the SNMP standard.
Using a retry model that actually performs retries with TLS probably just adds overhead, because TLS is connection-oriented and the underlying transport already retransmits lost data.
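The log excerpt earlier in the thread fits this behaviour. The model below is a hypothetical simplification, not SNMP4J's MPv3.Cache. It assumes one cache entry per request (transaction) ID that collects the msgID of the original message and of every retry, and that the whole entry is removed when a response to any of those msgIDs is processed, as the "Removing cache entry for [49108, 49106]" log line suggests. A second response, here to the retry with msgID 49108 after the original 49106 was answered, then finds no entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one entry per request ID, holding the msgIDs of the
// original message and all its retries. Not SNMP4J's actual implementation.
final class RetryCache {
    private final Map<Integer, List<Integer>> msgIdsByRequest = new HashMap<>();

    // The original send and every retry reuse the request ID but add a new msgID.
    void sent(int requestId, int msgId) {
        msgIdsByRequest.computeIfAbsent(requestId, k -> new ArrayList<>()).add(msgId);
    }

    // Returns the request ID the response belongs to, or -1 if the msgID is unknown.
    // A match removes the whole entry, i.e. all msgIDs of that request.
    int responseReceived(int msgId) {
        for (Map.Entry<Integer, List<Integer>> e : msgIdsByRequest.entrySet()) {
            if (e.getValue().contains(msgId)) {
                int requestId = e.getKey();
                msgIdsByRequest.remove(requestId); // safe: we return right away
                return requestId;
            }
        }
        return -1;
    }
}
```

Under this model, the discarded response is a duplicate answer to a request that was already completed, not a lost answer.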

Hi Frank,

After analysis and trying to figure out the reason, we still cannot tell how and why the getBulk response is slow. We enabled all SNMP4J logs for analysis but were unable to find the root cause. We need to decrypt the packets to confirm whether there is an issue with the agent we are using, with the data received, or with the processing in SNMP4J. We tried https://timothybasanov.com/2016/05/26/java-pre-master-secret.html and "Decrypt SSL with Wireshark - HTTPS Decryption: Step-by-Step Guide" to decrypt SNMP over TLS, but without success.

Can you please share pointers on how to decrypt SNMP over TLS, or on how else we can debug this issue?

Thanks for your support in advance.

Regards,
Anjali

I’m trying to help with the TLS decode in Wireshark.

I was able to follow the steps in "Minimum C application for a snmp agent" to build an agent (you should pin this post, or something similar, to the top of the SNMP++ forum to make it easy to find).

Can the examples be modified to support TLS or DTLS? Is there a TLS example somewhere I missed?

If not, is there a canned Java example that supports TLS?

TLS and DTLS are not yet supported by SNMP++.
