Synchronous treeUtils.getSubtree() deadlock

Synchronous treeUtils.getSubtree(target, oid) method in snmp4j, seems to have bug in its internal synchronization implementation. In case of sending request resulting in exception e.g. timeout or credentials mismatch, the method hangs indefinitely on listener.wait().

Caused by logic in TreeUtils.walk(Target<?> target, OID[] rootOIDs): notify() method (invoked by TreeUtils.TreeRequest.send(), invoked by walk(target, rootOIDs, null, listener)) is called before wait().

Hi Matee,

Which version of SNMP4J are you using? At least for the latest version, your analysis does not seem to be true (notify is never called before wait because of synchronisation).

Best regards,
Frank

I am using snmp4j in version 3.4.0.

See example below, the async version of walk fails with exception, but the sync one hangs:

package com.example;

import org.snmp4j.*;
import org.snmp4j.mp.*;
import org.snmp4j.security.SecurityLevel;
import org.snmp4j.smi.*;
import org.snmp4j.transport.DefaultUdpTransportMapping;
import org.snmp4j.util.*;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.CountDownLatch;

public class Main {

    static Snmp snmp() {
        try {
            MessageDispatcherImpl messageDispatcher = new MessageDispatcherImpl();
            messageDispatcher.addMessageProcessingModel(new MPv3());
            ThreadPool threadPool = ThreadPool.create("Thread pool", 2);
            Snmp manager = new Snmp(new MultiThreadedMessageDispatcher(threadPool, messageDispatcher), new DefaultUdpTransportMapping(new UdpAddress("0.0.0.0/8765")));
            manager.listen();
            return manager;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    static UserTarget<Address> createTargetV3() {
        UserTarget<Address> target = new UserTarget<>();
        target.setTimeout(5000);
        target.setRetries(1);
        target.setAddress(GenericAddress.parse("udp:10.1.233.120/161"));
        target.setVersion(SnmpConstants.version3);
        target.setSecurityModel(org.snmp4j.security.SecurityModel.SECURITY_MODEL_USM);
        target.setSecurityLevel(SecurityLevel.NOAUTH_NOPRIV);
        target.setSecurityName(new OctetString("Test"));
        return target;
    }

    static List<TreeEvent> walk(TreeUtils treeUtils, Target<Address> targetV3, OID oid){
        List<TreeEvent> walkResult = new LinkedList<>();
        InternalTreeListener treeListener = new InternalTreeListener(walkResult);

        treeUtils.getSubtree(targetV3, oid, null, treeListener);
        try {
            treeListener.waitForResult();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return walkResult;
    }

    public static void main(String[] args) {

        TreeUtils treeUtils = new TreeUtils(snmp(), new DefaultPDUFactory());
        UserTarget<Address> targetV3 = createTargetV3();

        System.out.println("Async: ");
        System.out.println(walk(treeUtils, targetV3, new OID ("1.3.6.1.2.1.1")));
        System.out.println("finished");

        System.out.println("Sync: ");

        System.out.println(treeUtils.getSubtree(targetV3, new OID ("1.3.6.1.2.1.1")));
        System.out.println("finished");
    }

    static class InternalTreeListener implements TreeListener {

        private final List<TreeEvent> collectedEvents;
        private final CountDownLatch latch = new CountDownLatch(1);

        public InternalTreeListener(List<TreeEvent> eventList) {
            collectedEvents = eventList;
        }

        @Override
        public synchronized boolean next(TreeEvent event) {
            collectedEvents.add(event);
            return true;
        }

        @Override
        public synchronized void finished(TreeEvent event) {
            collectedEvents.add(event);
            latch.countDown();
        }

        @Override
        public boolean isFinished() {
            return latch.getCount() == 0;
        }

        void waitForResult() throws InterruptedException {
            latch.await();
        }
    }
}

Meanwhile I think I understand the problem better. The critical part is not after the first SNMP message has been sent by TreeUtils but before that. If in the preparation, an IOException is being thrown, the walk operation enters the wait even though the walk is already finished.

I have fixed that behaviour in the latest 3.4.3-SNAPSHOT available here:
https://snmp.app/dist/snapshot/org/snmp4j/snmp4j/3.4.3-SNAPSHOT/snmp4j-3.4.3-20200803.225225-2.jar

Hello Frank,

I am experiencing an issue that looks like the same: lots of my threads are stuck on org.snmp4j.util.TableUtils.getTable(TableUtils.java:127). I am using SNMP4J 3.2.1

“MyThread started at 2020-09-18 16:10:00-pool-29785-thread-13” #3382591 prio=5 os_prio=0 tid=0x00007f9b1c091800 nid=0x14f96 in Object.wait() [0x00007f98cff3b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@9.0.4/Native Method)
- waiting on
at java.lang.Object.wait(java.base@9.0.4/Object.java:516)
at org.snmp4j.util.TableUtils.getTable(TableUtils.java:127)
- waiting to re-lock in wait() <0x000000043a5f7580> (a org.snmp4j.util.TableUtils$InternalTableListener)

I am going to test the 3.4.3-SNAPSHOT version as well.

@AGENTPP ,Do you have the version when fix the bug in jdk8 ?I have meet the bug in 2.5.6
Fixed [SFJ-236]: TableUtils could enter endless loop if agent returns rows not within lexicographic order
and looping row hits first row in row cache. That could happen only if the agent returns a row again,
that was returned already before.

@kovacsz ,can I use setIgnoreMaxLexicographicRowOrderingErrors to fix this problem?

This issue has been fixed in SNMP4J 2.8.7 since 2021-05-09T22:00:00Z already.

@AGENTPP ,I have change the version to 2.8.7 from maven ,but lots of my threads are stuck on org.snmp4j.util.TableUtils.getTable(TableUtils.java:123).
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.snmp4j.util.TableUtils.getTable(TableUtils.java:123)
- locked <0x00000007ba4aaa80> (a org.snmp4j.util.TableUtils$InternalTableListener)

I change the version with 2.8.7 from maven reposity,but my threads are stuck on org.snmp4j.util.TableUtils.getTable(TableUtils.java:123).
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.snmp4j.util.TableUtils.getTable(TableUtils.java:123)
- locked <0x00000007ba4aaa80> (a org.snmp4j.util.TableUtils$InternalTableListener)

That is a stack trace from a normal waiting.

  • The question are why is there no callback to the finished(…) method of the InternalTableListener?
  • Do you have a stack from the Snmp’s TransportMapping listener thread? (Are you blocking that in some other response handler?)
  • Do you have DEBUG log output of the deadlock?

@AGENTPP ,Here is the way I call the query table method with some thread. And Some threads are wait forever.

Snmp snmp = SnmpComunication.getSession(snmpTarget.getSnmpVersion());
Target target = SnmpComunication.getSnmpTarget(snmp, snmpTarget);
BER.setCheckSequenceLength(true);
TableUtils tableUtils = new TableUtils(snmp, new DefaultPDUFactory(PDU.GET));
OID[] columns = new OID[oids.size()];
for (int i = 0; i < oids.size(); i++) {
columns[i] = new OID(oids.get(i));
}
tableUtils.setMaxNumColumnsPerPDU(1);
tableUtils.setMaxNumRowsPerPDU(1);
List events = tableUtils.getTable(target, columns, null, null);

As written before, you need to make sure that all ResponseListeners are returning immediately and do not dead lock somehow with each other.
BTW, maxColumnsPerPDU == 1 is very unusual. This setting should be used only, if the size of the PDU is too big for the agent to return (i.e. you receive a tooBig error). I recommend not setting this value of setting it to oids.size().

  1. I using the defualt listener called TableUtils.InternalTableListener, and from the stack log and do not see dead lock ,but most threads are wait like kovacsz say.Do the deault method in tableUtils.getTable(target, columns, null, null); support muti thread?
    2.I know both maxColumnsPerPDU and maxRowsPerPDU default value is 10. But I have meet the problem like this https://www.mail-archive.com/snmp4j@agentpp.org/msg03120.html, so I have set the value to make sure the result is correct. Do you have any good idea if I do not set the value and get a correct result?
  1. TableUtils.getTable(…) supports multi-threading.
  2. Set maxColumnsPerPDU to the number of columns you want to get.

Then you have to set a smaller timeout value on the target. You probably run more threads than they can time out because of a not responding, otherwise misbehaving, or simply overloaded agent.

From your answer from the other question,do the version in 2.8.7 have the parameter to set timeout?
Thank you.

The problem is, that the wait has to wait for an arbitrary number of subsequent requests. The maximum timeout to wait, cannot be derived from the Target.getTimeout() value. A separate timeout is necessary.
I will provide an optional timeout parameter for the next update to be able to control that limit if needed.

To avoid the wait, you could use the asynchronous getTable call too.

No, the timeout change has not been back-ported to 2.x. I do not think that it is really necessary, because more important is the timeout for each single request sent to the agent.

tableUtils.setMaxNumRowsPerPDU(1);//always set 1
I have try to set tableUtils.setMaxNumColumnsPerPDU(oids.size() );
but I get nothing in the query result, if I setMaxNumColumnsPerPDU(1),I can get the correct result

There is either a logic error in what you are trying to accomplish or the agent is broken.
I have too few information to help anymore. Sorry.