Impacted methods of org.snmp4j.util.TreeUtils class:
public List<TreeEvent> getSubtree(Target<?> target, OID rootOID)
public List<TreeEvent> walk(Target<?> target, OID[] rootOIDs)
Affected SNMP4j versions:
Observed in SNMP4j 3.5.1
Latest version 3.6.4 appears to be affected as well
Issue Description:
In some situations, org.snmp4j.util.TreeUtils$TreeRequest::onResponse invokes org.snmp4j.util.TreeUtils$InternalTreeListener::finished multiple times
The first invocation of finished(TreeEvent) will append an element to the listener’s collectedEvents list and notify the thread waiting for the walk to complete
In most cases, the resumed thread will start iterating over the java.util.LinkedList<TreeEvent> which is vended by InternalTreeListener::getCollectedEvents
A second invocation of finished(TreeEvent) will add another element to the result list while the list is potentially already being traversed
Results in ConcurrentModificationException thrown by the ListIterator class
Workarounds like iteration based on List::size / List::get, Deque::pop or List::of tend to be expensive
Solution options:
Do not invoke finished multiples times
No dot append append to result list if listener is already in finished state
Return a list list implementation whose iterator tolerates that elements are appended to the list while the list is being iterated
Minor observation:
The javadoc of org.snmp4j.util.TreeUtils.walk(Target<?> target, OID[] rootOIDs) wrongly states that the method is asyncronous
I have fixed this issue with your solution suggestion (2) in InternalTreeListener:
public synchronized void finished(TreeEvent event) {
if (!finished) {
collectedEvents.add(event);
finished = true;
}
notify();
}
The reason for this choice is simple: A duplicate call of the TreeListener.finished method is only possible for a correct (in sense of the SNMP standards) working agent, if a SNMPv1 agent returns noSuchName error. This is another bug and fixed by:
public <A extends Address> void onResponse(ResponseEvent<A> event) {
session.cancel(event.getRequest(), this);
PDU respPDU = event.getResponse();
if (respPDU == null) {
listener.finished(new TreeEvent(this, userObject,
RetrievalEvent.STATUS_TIMEOUT));
} else if (respPDU.getErrorStatus() != 0) {
if (target.getVersion() == SnmpConstants.version1 && respPDU.getErrorStatus() == PDU.noSuchName) {
listener.finished(new TreeEvent(this, userObject, new VariableBinding[0]));
}
else { // This else is fixing the duplicate finished call for noSuchName errors:
listener.finished(new TreeEvent(this, userObject, respPDU.getErrorStatus()));
}
} else if (respPDU.getType() == PDU.REPORT) {
listener.finished(new TreeEvent(this, userObject, respPDU));
} else {
....
}
}
Thus, double checking, that finished is never called twice is very simple and does not cost too much CPU cycles.
I recently removed the workaround which we put in place and upgraded to SNMP4J 3.7.7. We are still seeing this issue (very rarely). It happens when speaking to a device with Snmpv3 authNoPriv mode. Unfortunately it is hard to debug the concrete event.
When looking at the code, the only remaining cause could be that listener.next(TreeEvent) (which does not have the if (!finished) check in place) is invoked after listener.finished(TreeEvent) has been invoked.