The new change is basically in TLSTM.ServerThread.writeMessage where runDelegatedTasks is executed even if wrap returns OK but handshake status is NEED_TASK:
if (result.getStatus() == SSLEngineResult.Status.OK) {
if (result.bytesProduced() > 0) {
writeNetBuffer(entry, sc);
}
else {
if (logger.isDebugEnabled()) {
logger.debug("SSL Engine status of wrap is ok, but 0 NET bytes produced, handshake status is "+result.getHandshakeStatus());
}
}
}
if ((result.getHandshakeStatus() == SSLEngineResult.HandshakeStatus.NEED_TASK) &&
runDelegatedTasks(result, entry)) {
logger.debug("SSL session OK");
}
OK, thanks for confirming that the issue is still present with the latest jdk1.8.0_321 build.
The source of the issue seems to be some weird state of the SSLEngine (status CLOSED but handshake status NEED_WRAP). I have im proved the SNMP4J 2.8.x code to cover this case too in TLSTM.ServerThread.runDelegatedTasks. The following new if-statement ensures that the socket is registered with the outQueue to write out-data:
case CLOSED:
if (status == SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING) {
return false;
}
break;
Thank you @AGENTPP
I used the latest 2.8.9 jar mentioned and Status = CLOSED HandshakeStatus = NEED_WRAP is actually closing the session and firing the same event.
But we are seeing data recieved at socket but not processed at snmp4j many a times and hence causing SNMP transaction timeout. We are looking into the details to share but any suggestions around this area is greatly appreciated
Hi norrisjeremy,
Thanks for pointing us to this bug. This could be the root cause, indeed. As expected my fix in the 2.8.9 SNAPSHOT did not really solve it, because then instead looping in the read key (without data coming in) it looped on the write key without having any data to send. As a result, the SSLEngine is a dead lock situation. In SNMP4J 3.6.x I closed the session and the TCP connection in this case. This could solve the issue for SNMP4J 2.x too.
Best regards,
Frank
Thanks @norrisjeremy and @AGENTPP
Can you please share work around code or a snapshot jar with changes in 2.8.x for handling this until fix is released in Java 1.8 load.
This would help use test further on SNMP over TLS.
It would also really help if this could be backported to 2.8.x and delivered.
I have changed the handling of the CLOSED SSL engine status once again in the latest 2.8.9 SNAPSHOT. Now the underlying TCP connection will be closed in any case to fully reset the connection.
Maybe that helps.
Of course, you can try newer JDKs too. The SSL engine was improved a lot from 8 to 17 - I am not sure if everything has been back ported to 8.
Hi @AGENTPP
We are communicating with Oracle to get the fix backported but we are unsure of the rootcause.
This issue was in JKS keystore and occationally seen.
In BCFKS keystore we are frequently seeing this issue when we are establishing connection itself. Took the thread dump and noticed a dead lock.
This seems to be a bug. Currently unclear why it does not occur on SNMP4J 3.x too, but that might be related to using not Hashtable but a Collections.synchronizedMap there for Snmp.pendingRequests.
I am going to provide a fix ASAP.
We don’t see the below exception when key is in writable loop .
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:48)
at java.base/sun.nio.ch.NioSocketImpl.tryRead(NioSocketImpl.java:261)
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:312)
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:981)
at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:461)
at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1491)
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1397)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:444)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:415)
SSLSocket.close() also reads from socket after JDK-8268965:
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:48)
at java.base/sun.nio.ch.NioSocketImpl.tryRead(NioSocketImpl.java:261)
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:312)
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:981)
at java.base/java.io.InputStream.skip(InputStream.java:544)
at java.base/sun.security.ssl.SSLSocketInputRecord.deplete(SSLSocketInputRecord.java:504)
at java.base/sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1762)
at java.base/sun.security.ssl.SSLSocketImpl.shutdown(SSLSocketImpl.java:1739)
at java.base/sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:773)
at java.base/sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:638)
at java.base/sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:570)
I was just guessing that the JDK bug is related to the issue you observe. I have no proof for that, except that this behavior was not observed with recent JDKs and closing connections is part of your issue.
Does not SNMP4J 2.8.9 fix your problem? It has been released today.
Hi Frank,
We are thankful for the so far support.
We took java 11 openjdk-11.0.10.0.9-8 with latest snmp4j jar snmp4j-3.6.5-20220302.101842-5.jar
But we are seeing Key Is Writable issue post connections established as well.
Can you please share some pointers as how we can debug the issue on snmp4j side. As I had requested, how can we decrypt SNMP over TLS data? This might help us understand the cause. We have tried TLS decryption and was successful for HTTP over TLS but not SNMP. In your tests, how are you decrypting data for tests? Is there any other memory analyser tools or code used for decryption?
The agent side is confirming the response is sent. Java side issue pointed earlier also couldn’t be seen as we contacted Oracle. Looks like issue is irrespective of java version.