TreeUtils.getSubtree is not able to get all varbindings under the subtree

rpradeep · May 3, 2023, 4:30pm

I have developed a simple snmpwalk tool using the snmp4j library. I used TreeUtils to walk through the given OID.
My OID is something like “.1.3.6.1.4.1.2552.200.300” (say MYOID) and it has 4 tables in it (.MYOID.1.1, .MYOID.1.2, .MYOID.1.3 and .MYOID.2.1). When I run the application on “.1.3.6.1.4.1.2552.200.300” I am getting only the first 2 tables (.MYOID.1.1 and .MYOID.1.2), and after that I get the following request timeout error and it stops there.

Running pending async request with handle PduHandle[1072476930] and retry count left 2
Sending message to 10.131.42.146/8001 with length 52: 30:32:02:01:01:04:06:70:75:62:6c:69:63:a5:25:02:04:3f:ec:b3:02:02:01:00:02:01:0a:30:17:30:15:06:11:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e:05:00
Running pending async request with handle PduHandle[1072476930] and retry count left 1
Sending message to 10.131.42.146/8001 with length 52: 30:32:02:01:01:04:06:70:75:62:6c:69:63:a5:25:02:04:3f:ec:b3:02:02:01:00:02:01:0a:30:17:30:15:06:11:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e:05:00
Running pending async request with handle PduHandle[1072476930] and retry count left 0
Sending message to 10.131.42.146/8001 with length 52: 30:32:02:01:01:04:06:70:75:62:6c:69:63:a5:25:02:04:3f:ec:b3:02:02:01:00:02:01:0a:30:17:30:15:06:11:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e:05:00
Request timed out: 1072476930

What is the issue here? Why is it not getting all the varbindings? Please help.
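
The OID that the timed-out request asks about can be read from the hex dump in the log above. In that dump, `a5` marks a GETBULK PDU, `02:01:0a` is a max-repetitions value of 10, and `06:11` introduces a 17-byte OBJECT IDENTIFIER. The following is a minimal sketch of a decoder for those OID bytes, assuming standard BER encoding; the class and method names are made up for this illustration and are not part of SNMP4J.

```java
import java.util.StringJoiner;

/** Illustrative helper (not SNMP4J): decodes BER OID content bytes from a hex dump. */
class BerOidDecoder {

    /** Parses a colon-separated hex dump such as "2b:06:01" into bytes. */
    static byte[] parseHex(String dump) {
        String[] parts = dump.trim().split(":");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Decodes OID content bytes (without tag and length) into dotted notation. */
    static String decodeOid(byte[] content) {
        StringJoiner oid = new StringJoiner(".");
        // The first byte packs the first two arcs as 40 * arc1 + arc2.
        // Simplification: this split is only valid when arc1 is 0 or 1.
        int first = content[0] & 0xFF;
        oid.add(Integer.toString(first / 40));
        oid.add(Integer.toString(first % 40));
        long arc = 0;
        for (int i = 1; i < content.length; i++) {
            int b = content[i] & 0xFF;
            arc = (arc << 7) | (b & 0x7F);  // 7 payload bits per byte
            if ((b & 0x80) == 0) {          // high bit clear: last byte of this arc
                oid.add(Long.toString(arc));
                arc = 0;
            }
        }
        return oid.toString();
    }

    public static void main(String[] args) {
        // OID bytes copied from the timed-out request, after the 06:11 tag/length.
        String oidBytes = "2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e";
        // prints 1.3.6.1.4.1.2552.200.300.1.2.1.22.222
        System.out.println(decodeOid(parseHex(oidBytes)));
    }
}
```

The decoded OID lies inside the second table (.MYOID.1.2), so the request that never gets answered is the one asking for the successors of an instance of that table.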

rpradeep · May 3, 2023, 6:09pm

Update:
I added a few logs to my code and observed that a single request returns 5 events (events.size() in my code), one of which is an error event. For each valid event I see 10 varbindings, so in total I am getting only 40 varbindings. Maybe this is why I am not able to get the complete subtree (of .1.3.6.1.4.1.2552.200.300), but I don’t know how and where to change this.

rpradeep · May 3, 2023, 6:09pm

I have tried increasing the retry count and request timeout values, but no luck. I feel this is not really about the request timeout but something else I am missing.
My program is simple:
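import java.util.List;
import org.snmp4j.smi.OID;
import org.snmp4j.smi.VariableBinding;
import org.snmp4j.util.DefaultPDUFactory;
import org.snmp4j.util.TreeEvent;
import org.snmp4j.util.TreeUtils;

// snmp, target and tableOid are set up elsewhere in the tool.
TreeUtils treeUtils = new TreeUtils(snmp, new DefaultPDUFactory());
List<TreeEvent> events = treeUtils.getSubtree(target, new OID(tableOid));
for (TreeEvent event : events) {
    if (event == null) {
        continue;
    }
    if (event.isError()) {
        // Request timeouts show up here as error events.
        System.out.println("Error: table OID [" + tableOid + "] " + event.getErrorMessage());
        continue;
    }

    VariableBinding[] varBindings = event.getVariableBindings();
    if (varBindings == null || varBindings.length == 0) {
        continue;
    }
    // Process varBindings here.
}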

AGENTPP · May 3, 2023, 11:30pm

Please note that OIDs do not start with a dot (“.”). But that is not the cause of this issue. I guess it is a bug around lexicographic ordering within the agent.

rpradeep · May 4, 2023, 6:54am

Hi Frank,

Please ignore the leading . in my OID; I pasted just a part of the actual OID. In general, the issue is that I get only 5 events from a single getSubtree request of TreeUtils. I increased max repetitions from the default 10 to 20 (TreeUtils.setMaxRepetitions()) and am able to get more varbindings (5 events * 20 repetitions = 100). But this is not a solution, as there might be many table rows, and a getSubtree on the top OID should give me all the rows under the root.
If it is a limitation/bug in the agent, is there any workaround for this? Or am I missing any configuration? Please help.

AGENTPP · May 4, 2023, 6:58am

I think there is a bug in the agent regarding finding the next successor of some of the OIDs returned by a previous GETNEXT or GETBULK request. You can check if SNMP4J detects it by checking the error code (if any). If there is a timeout, the agent hangs somewhere internally, i.e., it probably loops endlessly through its registered OIDs until an agent timeout for this thread/task occurs or the agent is restarted.

rpradeep · May 4, 2023, 1:46pm

Frank,

There is no specific error reported by the agent. I see just a timed-out message (I increased the request timeout using the CommunityTarget class, but no luck).

What I observed is that in a single getSubtree response of TreeUtils I see only 5 events, one of which is always an error event (it says request timed out).
Also, I am not sure it is a problem of finding the next successor, because when I increased max repetitions from the default 10 to 20 (TreeUtils.setMaxRepetitions()) I can see more varbindings (5 events * 20 repetitions = 100). So the question is why the response event list is always 5 in size. If there is a configuration to change this limit, I can try it out. Please let me know.

// FYI, this is my program outline

TreeUtils treeUtils = new TreeUtils(snmp, new DefaultPDUFactory());
List<TreeEvent> events = treeUtils.getSubtree(target, new OID(tableOid));
for (TreeEvent event : events) {
    if (event == null) {
        continue;
    }
    if (event.isError()) {
        System.out.println("Error: table OID [" + tableOid + "] " + event.getErrorMessage());
        continue;
    }

    VariableBinding[] varBindings = event.getVariableBindings();
    if (varBindings == null || varBindings.length == 0) {
        continue;
    }
}

Logs from the snmp4j:

Running pending async request with handle PduHandle[569180709] and retry count left 1
Sending message to 10.131.42.146/8001 with length 51: 30:31:02:01:01:04:06:70:75:62:6c:69:63:a5:24:02:04:21:ed:02:25:02:01:00:02:01:0a:30:16:30:14:06:10:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:01:05:00
Running pending async request with handle PduHandle[569180709] and retry count left 0
Sending message to 10.131.42.146/8001 with length 51: 30:31:02:01:01:04:06:70:75:62:6c:69:63:a5:24:02:04:21:ed:02:25:02:01:00:02:01:0a:30:16:30:14:06:10:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:01:05:00
Request timed out: 569180709
Cancelling pending request with handle PduHandle[569180709]

AGENTPP · May 4, 2023, 8:43pm

There is no such limit. The agent is buggy.
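
As a rough illustration of why the event count is not a fixed limit: in a GETBULK walk each successful response contributes up to max-repetitions varbinds, so the number of responses depends on the size of the subtree. The sketch below is a simplified model, not SNMP4J code. It assumes every response carries exactly max-repetitions varbinds (a real agent may return fewer if the response would grow too large), and the count of 102 instances assumes one row in each of the four tables, which is an assumption and not something stated in the thread.

```java
/** Simplified model of a GETBULK subtree walk; names are illustrative only. */
class BulkWalkCount {

    /**
     * Simulates a walk over a subtree with the given number of instances.
     * Returns {responses, varbindsInsideSubtree}.
     */
    static int[] walk(int instancesInSubtree, int maxRepetitions) {
        int responses = 0;
        int received = 0;
        int position = 0;      // index of the next instance the agent would return
        boolean done = false;
        while (!done) {
            responses++;       // one GETBULK request/response pair
            for (int i = 0; i < maxRepetitions; i++) {
                if (position >= instancesInSubtree) {
                    done = true;   // agent returned an OID outside the subtree (or end of MIB)
                    break;
                }
                position++;
                received++;
            }
        }
        return new int[] {responses, received};
    }

    public static void main(String[] args) {
        for (int maxRepetitions : new int[] {10, 20, 30}) {
            int[] result = walk(102, maxRepetitions);
            System.out.println("maxRepetitions=" + maxRepetitions
                    + " responses=" + result[0] + " varbinds=" + result[1]);
        }
    }
}
```

Under these assumptions a complete walk needs 11 responses with max-repetitions 10, 6 with 20, and 4 with 30. The model says nothing about why a particular request times out; it only shows that a walk ending after five events is an interrupted walk, not a configured cap.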

rpradeep · May 8, 2023, 8:27am

Frank,
Can we expect a fix/workaround for this?
BTW, I am using snmp4j-2.8.18, snmp4j-agent-2.7.9

AGENTPP · May 8, 2023, 9:41am

Are you using SNMP4J-Agent for the agent you are trying to get the subtree from?
What are the DEBUG logs there?
Are there exceptions on the agent?

I am pretty sure that SNMP4J-Agent would not be the source of the problem. So please do not expect any patch for that.

Are you using SNMPv3? Are you using different engine IDs for agent and manager? I am just asking for the most common errors…

rpradeep · May 8, 2023, 9:56pm

Frank,
The agent is not built with SNMP4J. Only the snmp walk tool (client) is built with SNMP4J. Also, if I use another client, it works fine.
Debug logs: I have only the client logs (which I already shared). There were no exceptions on the agent side (another vendor).
Both the agent and the client are using SNMP v2 only.

AGENTPP · May 8, 2023, 10:12pm

Ok, understood. I am still quite sure that the agent has a bug. Because the agent does not respond, the manager log cannot help to debug the issue.

If the walk is working with other tools, then the agent simply does not fail with the request pattern used by those tools.
That is a very common pattern because many agents are only tested against very simple request patterns. If the agent framework does not ensure lexicographic ordering, many errors could occur on such an agent if the instrumentation code does not implement the ordering correctly on each OID.
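
To make the ordering requirement concrete, here is a minimal sketch of lexicographic OID comparison and successor lookup in plain Java. It does not use SNMP4J; the class name, the helper methods, and the registered instance OIDs are made up for the illustration, and sub-identifiers are treated as non-negative `int` values for simplicity (real sub-identifiers are unsigned 32-bit).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Illustrative only: lexicographic OID ordering as needed for GETNEXT/GETBULK. */
class OidOrder {

    /** Compares arc by arc as numbers; a proper prefix sorts before its extensions. */
    static final Comparator<int[]> LEXICOGRAPHIC = (a, b) -> {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static int[] parse(String dotted) {
        return Arrays.stream(dotted.split("\\.")).mapToInt(s -> Integer.parseInt(s)).toArray();
    }

    static String format(int[] oid) {
        return Arrays.stream(oid).mapToObj(i -> Integer.toString(i)).collect(Collectors.joining("."));
    }

    /** Returns the first registered instance strictly after the given OID, or null. */
    static String successor(TreeSet<int[]> registered, String oid) {
        int[] next = registered.higher(parse(oid));
        return next == null ? null : format(next);
    }

    public static void main(String[] args) {
        TreeSet<int[]> registered = new TreeSet<>(LEXICOGRAPHIC);
        // Hypothetical instances: last cell of one table, first cells of the next two.
        registered.add(parse("1.3.6.1.4.1.2552.200.300.1.2.1.22.222"));
        registered.add(parse("1.3.6.1.4.1.2552.200.300.1.3.1.1.222"));
        registered.add(parse("1.3.6.1.4.1.2552.200.300.2.1.1.1.222"));

        // The successor of the last cell of one table is the first cell of the next table.
        System.out.println(successor(registered, "1.3.6.1.4.1.2552.200.300.1.2.1.22.222"));
        // Arcs compare as numbers, so 1.2.9 sorts before 1.2.10 (a string sort would not).
        System.out.println(LEXICOGRAPHIC.compare(parse("1.2.9"), parse("1.2.10")) < 0);
    }
}
```

Instrumentation code that gets either of these two points wrong (table boundaries, or numeric versus textual comparison) can work for walks that start inside one table and still fail for a walk that starts above several tables.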

rpradeep · May 9, 2023, 9:52am

I shall try with an agent built with SNMP4J and let you know if that works fine.

rpradeep · May 9, 2023, 2:58pm

Frank,
Now I used both a manager (a kind of simple client) and an agent built with SNMP4J APIs. I still see the issue.
I have 4 tables under my root OID (1.3.6.1.4.1.2552.200.300)
1.3.6.1.4.1.2552.200.300.1.1 - 20 columns
1.3.6.1.4.1.2552.200.300.1.2 - 20 columns
1.3.6.1.4.1.2552.200.300.1.3 - 20 columns
1.3.6.1.4.1.2552.200.300.2.1 - 42 columns
With the default maxRepetitions (treeUtils.setMaxRepetitions), an snmpwalk on 1.3.6.1.4.1.2552.200.300 returns only 40 varbinds (the first 2 tables). There are no errors or exceptions reported in either the agent or manager logs.
The only significant logs I see from the manager/client are as follows:

Running pending async request with handle PduHandle[1072476930] and retry count left 1
Sending message to 10.131.42.146/8001 with length 52: 30:32:02:01:01:04:06:70:75:62:6c:69:63:a5:25:02:04:3f:ec:b3:02:02:01:00:02:01:0a:30:17:30:15:06:11:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e:05:00
Running pending async request with handle PduHandle[1072476930] and retry count left 0
Sending message to 10.131.42.146/8001 with length 52: 30:32:02:01:01:04:06:70:75:62:6c:69:63:a5:25:02:04:3f:ec:b3:02:02:01:00:02:01:0a:30:17:30:15:06:11:2b:06:01:04:01:93:78:81:48:82:2c:01:02:01:16:81:5e:05:00
Request timed out: 1072476930

So overall 5 TreeEvents are returned from TreeUtils.getSubtree() and the last one is always an error, but there are no errors/exceptions reported on the agent side.

But if I change maxRepetitions to 30 (treeUtils.setMaxRepetitions(30)), I can see all the varbinds of the above 4 tables.
I am not sure what else could be wrong on the agent side. Is anything wrong in how my program uses TreeUtils? (My code is posted in one of the posts above.)
Why is TreeUtils always returning at most 5 events? Why is it not iterating through all the nodes?

AGENTPP · May 9, 2023, 3:30pm

There is still a timeout occurring as you can see in the log. So maybe you have a network issue or you are shutting down the agent, before the subtree walk has finished. I do not know what is going wrong at your side. Without information like the agent log, I cannot help.

rpradeep · May 11, 2023, 3:25pm

Frank,
I could not attach the log file as the forum accepts only image files. How can I attach a log file (txt)?
Also, as I was going through my agent logs, I found an issue with an snmp walk command. Let's say my MIB structure is as shown below.

With the above MIB, an SNMP WALK command should actually iterate through all the OIDs above (10.1.1, 10.1.1.1, 10.1.1.2, 10.2.1, 10.2.1.1 and 10.2.1.2). Is my understanding correct?
I added logs to print the OID and LookupResult type of each event, and I see that my agent, after 10.1.1.2 (LookupResult type ATableEntry), again tries to process GETNEXT on 10.1.1.2 but with LookupResult type BTableEntry. I feel it should be GETNEXT on 10.2.1 with type BTableEntry. My agent code is simple: I just added a log in queryEvent (of MOServerLookupListener) to see the type and OID of the request.

Logs:
queryEvent OID:10.1, reqType:-91, LookupResult: ATableEntry
queryEvent OID:10.1.1.1, reqType:-91, LookupResult: ATableEntry
queryEvent OID:10.1.1.2, reqType:-91, LookupResult: ATableEntry
queryEvent OID:10.1.1.2, reqType:-91, LookupResult: BTableEntry ====> I am expecting this request to be OID:10.2.1, reqType:-91, LookupResult: BTableEntry

Frank, we are actually planning to move away from an old SNMP framework to SNMP4J, and we implemented the agent and the SNMPWalkTool as per our understanding by following the manuals. But we are still facing these kinds of issues, and it is sometimes difficult to explain the issue in the forum. What is the procedure to get support (so that we can show/explain our implementation and check whether the design is proper or not)?

AGENTPP · May 11, 2023, 4:16pm

You can order support online, see: AGENT++/SNMP4J Support

There are several false assumptions in your question above:

There is nothing like a “walk command” in SNMP. SNMP knows GETNEXT and GETBULK only.
Table OIDs are always “not-accessible”. Thus, they will never get returned by GETNEXT/GETBULK operations.
Internal queries within SNMP4J-Agent are much more complex than the SNMP GETNEXT/GETBULK operations, because they need to deal with access control and inner agent structures. If an internal ManagedObject of the agent returns a non-matching OID for a query, the SNMP4J-Agent framework will query the next (potentially accessible) ManagedObject.
“ATableVB1” is not a variable binding, it is a column. To access the instances of a column, a row index (OID suffix) must be specified. That is not included in your description and sample data.
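
The first two points can be sketched as a walk built from repeated GETNEXT steps over a simulated agent. This is plain Java, not SNMP4J: the class, the in-memory "agent", and the row indexes 1 and 2 appended to the column OIDs from the question are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Illustrative only: a "walk" is a loop of GETNEXT steps that stops outside the subtree. */
class GetNextWalk {

    static int compare(int[] a, int[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static int[] parse(String dotted) {
        return Arrays.stream(dotted.split("\\.")).mapToInt(s -> Integer.parseInt(s)).toArray();
    }

    static String format(int[] oid) {
        return Arrays.stream(oid).mapToObj(i -> Integer.toString(i)).collect(Collectors.joining("."));
    }

    static boolean inSubtree(int[] root, int[] oid) {
        return oid.length >= root.length && Arrays.equals(Arrays.copyOf(oid, root.length), root);
    }

    /** Simulated agent: only column instances (column OID + row index) are registered. */
    static TreeSet<int[]> sampleAgent() {
        TreeSet<int[]> instances = new TreeSet<>(GetNextWalk::compare);
        String[] oids = {
            "10.1.1.1.1", "10.1.1.1.2", "10.1.1.2.1", "10.1.1.2.2",  // table A, columns 1 and 2
            "10.2.1.1.1", "10.2.1.1.2",                              // table B, column 1
            "11.1.0"                                                 // first object outside the subtree
        };
        for (String oid : oids) {
            instances.add(parse(oid));
        }
        return instances;
    }

    /** Repeats GETNEXT from the root until the returned OID leaves the subtree. */
    static List<String> walk(TreeSet<int[]> agent, String root) {
        int[] rootOid = parse(root);
        int[] current = rootOid;
        List<String> visited = new ArrayList<>();
        while (true) {
            int[] next = agent.higher(current);   // GETNEXT: first instance after current
            if (next == null || !inSubtree(rootOid, next)) {
                break;
            }
            visited.add(format(next));
            current = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(walk(sampleAgent(), "10"));
    }
}
```

The result contains the six column instances under 10 and none of the table or entry OIDs such as 10.1.1 or 10.2.1, since those are never registered as accessible instances.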

rpradeep · May 14, 2023, 8:42am

Frank,
Understood, and I am in line with your replies. Somehow I am not able to explain the issue properly here. I would like to try one more time, in simple terms.
While we are moving from another SNMP API provider to SNMP4J, we are facing 2 issues: one in the agent and one in another tool which is a kind of manager (it just walks through the OID tree and displays the data items).
Agent:
Our goal is to build an agent which accepts GET/GETNEXT/GETBULK. We will have a standalone process which can feed the data to the ManagedObjects at runtime. As you suggested in one of my earlier posts, we extended our MIB class from MOServerLookupListener and implemented queryEvent().
In queryEvent, using MOServerLookupEvent, we get the OID, SnmpRequest, reqType and the ManagedObject type on which the lookup event is fired (getLookupResult). We then create a row, fill in the data from our DB and add it to the corresponding MOTable, so that by the time getRow() is called the table will have the row object.

public void lookupEvent(MOServerLookupEvent event) {
    // Row construction from the DB is elided here; addRow() is a placeholder.
    if (event.getLookupResult() == tableMO1) {
        myMOTable1.addRow();
    }
    if (event.getLookupResult() == tableMO2) {
        myMOTable2.addRow();
    }
    if (event.getLookupResult() == tableMO3) {
        myMOTable3.addRow();
    }
}

Our MIB has 3 ManagedObjects (of type table) under a rootOID. Using SNMPB, we are able to successfully walk (GETBULK) through these tables individually. But with a GETBULK on the rootOID, what I observed (by looking at the requestType, LookUpResult and the OID of the current subrequest) is this: in the sequence of subrequests, after all the instances of the first table, the lookUpEvent type for the subrequest of the second table is correct, but the OID is not that of the second table. I would expect the subrequest to be for the second table with its corresponding OID. In the above example, the type of the MO should be tableMO2Entry with OID 2000.2.2, whereas I see tableMO2Entry with OID 200.1.2.2.index2.
Because of this, even though I know the type of the table object the agent is looking for, I cannot create a row entry as the OID is not correct. In the end, the agent responds to the manager only with the rows from tableMO1.

The sequence is as follows (let's say each table has 2 row instances):

queryEvent OID:2000.1 reqType:-91
LookUpEvent:tableMO1Entry, queryEvent OID:2000.1.1.1.index1 reqType:-91
LookUpEvent:tableMO1Entry, queryEvent OID:2000.1.1.1.index2 reqType:-91
LookUpEvent:tableMO1Entry, queryEvent OID:2000.1.1.2.index1 reqType:-91
LookUpEvent:tableMO1Entry, queryEvent OID:2000.1.1.2.index2 reqType:-91
LookUpEvent:tableMO2Entry, queryEvent OID:2000.1.1.2.index2 reqType:-91 → this OID should be 2000.2.1 ?
LookUpEvent:tableMO3Entry, queryEvent OID:2000.1.1.2.index2 reqType:-91 → this OID should be 2000.2.1 ?
I can handle this issue by simply ignoring the OID and creating a row (supposedly the first row as per the index) in the subsequent tables, but this may trigger other issues.

Issue with the SNMP manager (built with SNMP4J):
I used TreeUtils and implemented a simple manager kind of application. Here the issue is that it gives me partial tree data. With tools like SNMPB or MG-SOFT I am able to walk through the root completely. Sometimes it times out (but no errors/timeouts are reported on the agent side), and sometimes I see the following errors.

Initialized Salt to 62e2920e20b82185.
UDP receive buffer size for socket 0.0.0.0/0 is set to: 131072
Running pending async request with handle PduHandle[1372127675] and retry count left 2
Sending message to 10.131.176.121/8001 with length 46: 30:2c:02:01:01:04:06:70:75:62:6c:69:63:a5:1f:02:04:51:c9:01:bb:02:01:00:02:01:32:30:11:30:0f:06:0b:2b:06:01:04:01:93:78:81:48:82:2c:05:00
Running pending async request with handle PduHandle[1372127675] and retry count left 1
Sending message to 10.131.176.121/8001 with length 46: 30:2c:02:01:01:04:06:70:75:62:6c:69:63:a5:1f:02:04:51:c9:01:bb:02:01:00:02:01:32:30:11:30:0f:06:0b:2b:06:01:04:01:93:78:81:48:82:2c:05:00
Received message from /10.131.176.121/8001 with length 1446:
Looking up pending request with handle PduHandle[1372127675]
Cancelling pending request with handle PduHandle[1372127675]
Running pending async request with handle PduHandle[1372127677] and retry count left 2
Sending message to 10.131.176.121/8001 with length 51: 30:31:02:01:01:04:06:70:75:62:6c:69:63:a5:24:02:04:51:c9:01:bd:02:01:00:02:01:32:30:16:30:14:06:10:2b:06:01:04:01:93:78:81:48:82:2c:01:01:01:13:03:05:00
Received message from /10.131.176.121/8001 with length 1446:
Looking up pending request with handle PduHandle[1372127675]
Received response that cannot be matched to any outstanding request, address=10.131.176.121/8001, requestID=1372127675

BTW, how can I upload a text file into the post?

AGENTPP · May 14, 2023, 9:24am

That error indicates that there is either

a bug in the agent (request ID is not properly processed) or
a response sent by the agent, although another PDU type should have been used or
much more likely a late response from the agent is received.
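
The third case can be sketched with a small model of how a manager matches responses to outstanding requests. This is not SNMP4J's implementation; the class is hypothetical and only tracks request IDs. In the log above, the retry of request 1372127675 carries the same request-id bytes (`51:c9:01:bb`) as the first transmission, so the agent may answer both copies.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a manager's pending-request table, keyed by request ID. */
class PendingRequests {

    private final Set<Integer> pending = new HashSet<>();

    /** Registers a request; a retry with the same ID leaves the table unchanged. */
    void send(int requestId) {
        pending.add(requestId);
    }

    /**
     * Handles a response. Returns true if it matched a pending request,
     * which is then cancelled; false if no outstanding request has that ID.
     */
    boolean receive(int requestId) {
        return pending.remove(requestId);
    }

    public static void main(String[] args) {
        PendingRequests manager = new PendingRequests();
        manager.send(1372127675);                          // first transmission
        manager.send(1372127675);                          // retry, same request ID
        System.out.println(manager.receive(1372127675));   // first response: matched
        manager.send(1372127677);                          // walk continues with the next request
        System.out.println(manager.receive(1372127675));   // late second response: not matched
    }
}
```

In this model the second response to the retried request arrives after the request was already answered and cancelled, which corresponds to the "cannot be matched to any outstanding request" line in the log.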

The following statement confused me a little bit:

The other tools might use different OID request patterns, so this is no proof of anything. But why are you posting an SNMP4J manager log when talking about other tools?

Regarding the MOServerLookupEvent, please take into account the MOScope.isLowerBoundIncluded() and MOScope.isUpperBoundIncluded() flags. Those are really important to make the lexicographic ordering work correctly.
Your assumption about what the starting OID of a GETNEXT query could/should be is therefore not correct. That needs to be fixed in the agent instrumentation code.
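
A minimal sketch of what honoring the lower bound and its inclusion flag could look like, in plain Java. The class and method names are hypothetical and do not mirror the SNMP4J-Agent API; the OIDs follow the 2000.x example from the question, with numeric row indexes 1 and 2 assumed.

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Illustrative only: answering a scoped query from a table's sorted instances. */
class ScopeLookup {

    static int compare(int[] a, int[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static int[] parse(String dotted) {
        return Arrays.stream(dotted.split("\\.")).mapToInt(s -> Integer.parseInt(s)).toArray();
    }

    static String format(int[] oid) {
        return Arrays.stream(oid).mapToObj(i -> Integer.toString(i)).collect(Collectors.joining("."));
    }

    static TreeSet<int[]> instances(String... oids) {
        TreeSet<int[]> set = new TreeSet<>(ScopeLookup::compare);
        for (String oid : oids) {
            set.add(parse(oid));
        }
        return set;
    }

    /** Returns the first instance of the table inside the scope, or null if there is none. */
    static String firstInScope(TreeSet<int[]> table, String lowerBound, boolean lowerIncluded) {
        int[] bound = parse(lowerBound);
        // Included bound: the bound itself may be the answer. Excluded: strictly greater only.
        int[] hit = lowerIncluded ? table.ceiling(bound) : table.higher(bound);
        return hit == null ? null : format(hit);
    }

    public static void main(String[] args) {
        TreeSet<int[]> table2 = instances("2000.2.1.1.1", "2000.2.1.1.2", "2000.2.1.2.1", "2000.2.1.2.2");
        // Bound is the last instance of table 1, excluded: table 2 answers with its first cell.
        System.out.println(firstInScope(table2, "2000.1.1.2.2", false));
        // Bound is an instance of table 2: the flag decides between that cell and the next one.
        System.out.println(firstInScope(table2, "2000.2.1.1.1", true));
        System.out.println(firstInScope(table2, "2000.2.1.1.1", false));
    }
}
```

Seen this way, a query for the second table whose lower bound is still an OID from the first table is a valid question: the table is being asked for its first instance after that bound, not for an instance at that OID.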