SNMP4j over TLS stuck at "Key is writable"

Hi,
I am using snmp4j 2.8.7 with java 8, connection was successful and able to receive response, after some time (randomly), snmp4j logs running continously with “Key is writable” (16589 times in a second) and no reponse processed after that.

Any help here would be greatly appreciated :slight_smile:
Any further information needed shall be shared to provide details on the issue.

Thanks,
Anjali

Hi Anjali,

I saw this with older Java runtimes too. I think that this is somehow an error in the SSLEngine or NIO selection key processing, because there are no “holes” in the key registration, the case, that there is a “key is writable” selection but no socket (SocketEntry) to actually write the data to is “logical” impossible.

The following code change in TLSTM.ServerThread.writeData tries to detect the misbehaviour and remove the key selection to avoid the busy loop. The inserted else begins with the comment “This should never happen...”:

    @Override
    protected TcpAddress writeData(SelectionKey sk, TcpAddress incomingAddress) {
        SocketEntry entry = (SocketEntry) sk.attachment();
        try {
            SocketChannel sc = (SocketChannel) sk.channel();
            incomingAddress = createIncomingAddress(sc.socket());
            if ((entry != null) && (!entry.hasMessage())) {
                synchronized (pending) {
                    pending.remove(entry);
                    entry.removeRegistration(selector, SelectionKey.OP_WRITE);
                }
            }
            if (entry != null) {
                writeMessage(entry, sc);
            }
            else { // This should never happen, but if it happens we need to cancel the key to avoid a busy loop!
                sk.cancel();
                logger.warn("Key was writable for incoming address "+incomingAddress+" but SocketEntry is null, key is canceled");
            }
        }
        catch (IOException iox) {
            logger.warn(iox);
            // make sure channel is closed properly:
            closeChannel(sk.channel());
            removeSocketEntry(incomingAddress);
            TransportStateEvent e =
                    new TransportStateEvent(TLSTM.this, incomingAddress,
                            TransportStateEvent.STATE_DISCONNECTED_REMOTELY, iox);
            fireConnectionStateChanged(e);
         }
        return incomingAddress;
    }

Please try this fix and let me know if the issue is solved then. The debug log just before the code is entering the busy loop would be of interest too. Maybe it provides a hint about the TLS session state when this issue happens. So far, I could not reproduce this with latest Java versions.

Which Java 8 version are you using exactly?

Best regards
Frank

Hi Frank,
We are using
openjdk version “1.8.0_282”
OpenJDK Runtime Environment (build 1.8.0_282-b08)

The issue is not always seen. So, I will apply the changes indicated and continue monitoring. I will report back if issue is seen and gets captured in else block.

Thanks a ton for the quick responses.

Regards,
Anjali

Hi Frank,
While testing now I faced loop running in Key is readable

This is similar issue but with reading from buffer? Any insights here would be helpful.

Thanks,
Anjali

Hi Anjali,

OK, I anticipated that a bit and will provide a workaround for it. Maybe my past observations were related to this report:

This bug report gives a hint on what might be going wrong here: https://bugs.openjdk.java.net/browse/JDK-8220703

I will add a comment here any updates. I think the underlying problem of the TLS half-close is probably fixed in later Java 8 builds, for example the bug fixes for update 291 look promising:
https://www.oracle.com/java/technologies/javase/8u291-bugfixes.html

Nonetheless, I will provide a workaround and security measures in the code to prevent the read key is ready busy loop too.

Hi Frank,
We have been seeing issue once in a while now, it would be of great help if a work around can be given which we can test as a patch :slight_smile:

Thanks,
Anjali

I think it is enough to change the TLSTM.runDelegatedTasks method’s switch statement on the case “CLOSED” to:

            case CLOSED:
                entry.removeRegistration(selector, SelectionKey.OP_READ);
                entry.removeRegistration(selector, SelectionKey.OP_WRITE);
                entry.sslEngine.closeOutbound();
                entry.sslEngine.closeInbound();
                closeChannelAndRemoveSocket(entry, null);
                return false;

to solve the “read key is ready” busy loop.

A new SNMP4J 3.6.4-SNAPSHOT is available now with this fix included.

Thanks, Frank. I applied above changes for my test but I see Key is writable issue is still repeating. :frowning:
Changes suggested above to handle that is also included and it is not captured in the else block as suggested.

Any insights around this area?
Observation: Once this issue occurs and process is restarted to overcome this, even if no sockets open, issue is repeating everytim. Only if I reboot my machine (machine, I mean container as process is deployed in container), issue is resolved.
Anyway linux/ os config could be contributing to this issue?

Ok, that kind of busy loop case wasn’t handled by the last change.
Are you using meanwhile the latest JRE 8?

The latest SNMP4J 3.6.4-SNAPSHOT contains a probably fix for busy “key is writable” loop described above. That issue could occur when the TLS session needs to process a task but returns OK on wrap:
https://snmp.app/dist/snapshot/org/snmp4j/snmp4j/3.6.4-SNAPSHOT/snmp4j-3.6.4-20211213.015414-3-distribution.tar.gz

The new change is basically in TLSTM.ServerThread.writeMessage where runDelegatedTasks is executed even if wrap returns OK but handshake status is NEED_TASK:

                if (result.getStatus() == SSLEngineResult.Status.OK) {
                    if (result.bytesProduced() > 0) {
                        writeNetBuffer(entry, sc);
                    }
                    else {
                        if (logger.isDebugEnabled()) {
                            logger.debug("SSL Engine status of wrap is ok, but 0 NET bytes produced, handshake status is "+result.getHandshakeStatus());
                        }
                    }
                }
                if ((result.getHandshakeStatus() == SSLEngineResult.HandshakeStatus.NEED_TASK) &&
                        runDelegatedTasks(result, entry)) {
                    logger.debug("SSL session OK");
                }

Hi @AGENTPP
We started using SNMP 2.8.8 and still notice that key is readable is stuck in loop. Please find snmp4j logs below.


After these logs, last 4 lines are continously repeated.

Any suggestion on what could be causing this?

Thanks,
Anjali

What jdk version are you using now?

Tested with latest available java jdk1.8.0_321, but issue is still seen

Same as stated earlier. Do you recommend to try with version >= jdk 1.8.0_291 ?

OK, thanks for confirming that the issue is still present with the latest jdk1.8.0_321 build.
The source of the issue seems to be some weird state of the SSLEngine (status CLOSED but handshake status NEED_WRAP). I have im proved the SNMP4J 2.8.x code to cover this case too in TLSTM.ServerThread.runDelegatedTasks. The following new if-statement ensures that the socket is registered with the outQueue to write out-data:

    case CLOSED:
      if (status == SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING) {
        return false;
      }
      break;

Please try this latest 2.8.9-SNPSHOT from today https://snmp.app/dist/snapshot/org/snmp4j/snmp4j/2.8.9-SNAPSHOT/snmp4j-2.8.9-20220215.112730-2.jar if it fixes the issue. I am not sure, because the SSLEngine does not seem to produce any out-data in your case, which is unexpected (at least from my understanding).

Thank you @AGENTPP
I used the latest 2.8.9 jar mentioned and Status = CLOSED HandshakeStatus = NEED_WRAP is actually closing the session and firing the same event.

But we are seeing data recieved at socket but not processed at snmp4j many a times and hence causing SNMP transaction timeout. We are looking into the details to share but any suggestions around this area is greatly appreciated :slight_smile:

Thanks,
Anjali

Stuck in Key is writable loop now :frowning:
Our agent side idle connection timeout is 5min. Once reconnected I see below issue some times

Any other info needed shall be shared.

Is there any chance you may be encountering 8274524?

Hi norrisjeremy,
Thanks for pointing us to this bug. This could be the root cause, indeed. As expected my fix in the 2.8.9 SNAPSHOT did not really solve it, because then instead looping in the read key (without data coming in) it looped on the write key without having any data to send. As a result, the SSLEngine is a dead lock situation. In SNMP4J 3.6.x I closed the session and the TCP connection in this case. This could solve the issue for SNMP4J 2.x too.
Best regards,
Frank

Thanks @norrisjeremy and @AGENTPP
Can you please share work around code or a snapshot jar with changes in 2.8.x for handling this until fix is released in Java 1.8 load.
This would help use test further on SNMP over TLS.

It would also really help if this could be backported to 2.8.x and delivered.