Hi,
I am using snmp4j 2.8.7 with java 8, connection was successful and able to receive response, after some time (randomly), snmp4j logs running continously with “Key is writable” (16589 times in a second) and no reponse processed after that.
Any help here would be greatly appreciated
Any further information needed shall be shared to provide details on the issue.
I saw this with older Java runtimes too. I think that this is somehow an error in the SSLEngine or NIO selection key processing, because there are no “holes” in the key registration, the case, that there is a “key is writable” selection but no socket (SocketEntry) to actually write the data to is “logical” impossible.
The following code change in TLSTM.ServerThread.writeData tries to detect the misbehaviour and remove the key selection to avoid the busy loop. The inserted else begins with the comment “This should never happen...”:
@Override
protected TcpAddress writeData(SelectionKey sk, TcpAddress incomingAddress) {
SocketEntry entry = (SocketEntry) sk.attachment();
try {
SocketChannel sc = (SocketChannel) sk.channel();
incomingAddress = createIncomingAddress(sc.socket());
if ((entry != null) && (!entry.hasMessage())) {
synchronized (pending) {
pending.remove(entry);
entry.removeRegistration(selector, SelectionKey.OP_WRITE);
}
}
if (entry != null) {
writeMessage(entry, sc);
}
else { // This should never happen, but if it happens we need to cancel the key to avoid a busy loop!
sk.cancel();
logger.warn("Key was writable for incoming address "+incomingAddress+" but SocketEntry is null, key is canceled");
}
}
catch (IOException iox) {
logger.warn(iox);
// make sure channel is closed properly:
closeChannel(sk.channel());
removeSocketEntry(incomingAddress);
TransportStateEvent e =
new TransportStateEvent(TLSTM.this, incomingAddress,
TransportStateEvent.STATE_DISCONNECTED_REMOTELY, iox);
fireConnectionStateChanged(e);
}
return incomingAddress;
}
Please try this fix and let me know if the issue is solved then. The debug log just before the code is entering the busy loop would be of interest too. Maybe it provides a hint about the TLS session state when this issue happens. So far, I could not reproduce this with latest Java versions.
Hi Frank,
We are using
openjdk version “1.8.0_282”
OpenJDK Runtime Environment (build 1.8.0_282-b08)
The issue is not always seen. So, I will apply the changes indicated and continue monitoring. I will report back if issue is seen and gets captured in else block.
Thanks, Frank. I applied above changes for my test but I see Key is writable issue is still repeating.
Changes suggested above to handle that is also included and it is not captured in the else block as suggested.
Any insights around this area?
Observation: Once this issue occurs and process is restarted to overcome this, even if no sockets open, issue is repeating everytim. Only if I reboot my machine (machine, I mean container as process is deployed in container), issue is resolved.
Anyway linux/ os config could be contributing to this issue?
The new change is basically in TLSTM.ServerThread.writeMessage where runDelegatedTasks is executed even if wrap returns OK but handshake status is NEED_TASK:
if (result.getStatus() == SSLEngineResult.Status.OK) {
if (result.bytesProduced() > 0) {
writeNetBuffer(entry, sc);
}
else {
if (logger.isDebugEnabled()) {
logger.debug("SSL Engine status of wrap is ok, but 0 NET bytes produced, handshake status is "+result.getHandshakeStatus());
}
}
}
if ((result.getHandshakeStatus() == SSLEngineResult.HandshakeStatus.NEED_TASK) &&
runDelegatedTasks(result, entry)) {
logger.debug("SSL session OK");
}
OK, thanks for confirming that the issue is still present with the latest jdk1.8.0_321 build.
The source of the issue seems to be some weird state of the SSLEngine (status CLOSED but handshake status NEED_WRAP). I have im proved the SNMP4J 2.8.x code to cover this case too in TLSTM.ServerThread.runDelegatedTasks. The following new if-statement ensures that the socket is registered with the outQueue to write out-data:
case CLOSED:
if (status == SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING) {
return false;
}
break;
Thank you @AGENTPP
I used the latest 2.8.9 jar mentioned and Status = CLOSED HandshakeStatus = NEED_WRAP is actually closing the session and firing the same event.
But we are seeing data recieved at socket but not processed at snmp4j many a times and hence causing SNMP transaction timeout. We are looking into the details to share but any suggestions around this area is greatly appreciated
Hi norrisjeremy,
Thanks for pointing us to this bug. This could be the root cause, indeed. As expected my fix in the 2.8.9 SNAPSHOT did not really solve it, because then instead looping in the read key (without data coming in) it looped on the write key without having any data to send. As a result, the SSLEngine is a dead lock situation. In SNMP4J 3.6.x I closed the session and the TCP connection in this case. This could solve the issue for SNMP4J 2.x too.
Best regards,
Frank
Thanks @norrisjeremy and @AGENTPP
Can you please share work around code or a snapshot jar with changes in 2.8.x for handling this until fix is released in Java 1.8 load.
This would help use test further on SNMP over TLS.
It would also really help if this could be backported to 2.8.x and delivered.