getTransportMappings() of MessageDispatcherImpl is not thread-safe

###Background

Due to traffic flow management issues with a certain type of edge firewalls, we have the need to periodically enforce a new source port being used by the org.snmp4.Snmp manager. To be able to continue using singleton instances of org.snmp4.Snmp and org.snmp4j.security.USM, we periodically execute the following steps:

  1. Init a new instance of the TransportMapping interface implementation (which has a concept of active and to-be-deleted)
  2. Remove transport mapping which are marked as to-be-deleted
  3. Mark the currently active transport mappings as to-be-deleted (no immediate removal to not impact running queries)
  4. Add a new active transport mapping

We also sub-classed MessageDispatcherImpl and changed the implementation of <A extends Address> TransportMapping<? super A> getTransport(A, TransportType) to only return active* transport mapping. This implementation is invoking getTransportMappings() and applies the particular filtering steps.

Problem

The getTransportMappings() method of the MessageDispatcherImpl clases uses synchronized blocks to access the receiverTransportMappings / senderTransportMappings instances. These are synchronized maps whose values are of type LinkedList.

The synchronized addTransportMapping(TransportMapping) and removeTransportMapping(TransportMapping) methods of MessageDispatcherImpl can modify the values of the syncronized maps (in particular the values of LinkedList) while another thread is currently iterating over the lists in getTransportMappings().

public Collection<TransportMapping<? extends Address>> getTransportMappings() {
        HashSet<TransportMapping<? extends Address>> l = new HashSet<>();
        synchronized (receiverTransportMappings) {
            for (List<TransportMapping<? extends Address>> tm : receiverTransportMappings.values()) { // ConcurrentModificationException
                l.addAll(tm);
            }
        }
        synchronized (senderTransportMappings) {
            for (List<TransportMapping<? extends Address>> tm : senderTransportMappings.values()) { // ConcurrentModificationException
                l.addAll(tm);
            }
        }
        return l;
    }

Thank you for reporting this issue!

In fact the method getTransportMappings is fine, but the private addTransportMapping and removeTransportMapping methods need to be fixed/adapted to changes that introduced sender and receiver specific transport mappings. Synchronisation on the receiver and sender specific transport mapping lists need to be done. The latest snapshot from Index of /dist/snapshot/org/snmp4j/snmp4j/3.7.8-SNAPSHOT contains the necessary fixes to MessageDispatcherImpl.

For reference, this is how I worked around the problem (in a subclass of MessageDispatcherImpl):

  • Avoid the use of list iterators in getTransportMappings() and getTransport(Address, TransportType)
    • Use ArrayList instead of LinkedList
    • Allows to iterate the underlying array based on indexes and size
  • Never reduce the size of the lists
    • Removed elements are set to null
    • null-elements are “filled” on subsequent addTransportMapping(TransportMapping) operations
    • The getTransportMappings() and getTransport(Address, TransportType) methods skip null values when iterating
  • Adjusted removeTransportMapping(TransportMapping) to be a synchronized method

This approach avoids exceptions, guarantees a correct synchronization of modifications and avoids synchronization of read-only list access (which is common case). The downside is that getTransportMappings() or getTransport(Address, TransportType) might

  1. Fail to return a transport mapping which has been added in parallel or
  2. Return a transport mapping which has been removed in parallel

For the initially described use case, these edge cases are tolerable. Modifications occur only rarely. Read access happens very frequently while its synchronization is only needed if modifications take place. Failing to return a newly added transport mapping is not a problem since there is another previously added transport which is still functional (it just got marked as to-be-deleted). A removed transport mapping is never going to be returned since it must have been marked as to-be-deleted long before (which causes it to be omitted from the results anyway).

I think, the snapshot is still not 100% fixing the issue. In the private addTransportMapping and removeTransportMapping methods, the list modification is now syncronized. However, in getTransportMappings(), the lists are iterated (via addAll) without synchronization.

The two mentioned private obtain locks on the value of the receiverTransportMappings / senderTransportMappings maps while public getTransportMappings() locks just on the maps (which is correct) but not on the values.

Thanks. I think with this change, the synchronized keyword could be removed from the public addTransportMapping method (the public removeTransportMapping method is not synchronized)

That had been done already in the snapshot :slightly_smiling_face:

Yes, I agree. I have fixed that in the getTransportMappings() method of the latest snapshot.