SNMP++ v3MP and USM Thread Safety

lincoln.turley · May 19, 2021, 3:10pm

I am currently maintaining an SNMP manager application that uses SNMP++ (v3.4.2), and I am having some issues with it. The manager creates a separate thread for each SNMP v3 agent that it is configured to monitor. Each monitored-agent thread has its own Snmp object, which the thread uses to send GET requests at regular intervals to verify that the agent is still up and functioning.

The original developer who wrote the SNMP v3 agent monitoring portion of the application added a static thread guard that is shared by all SNMP v3 monitored-agent threads and allows only one thread to send a GET request at a time. The thread guard is not released until the agent responds to the GET request or a timeout occurs (timeouts are either 4 or 11 seconds). This was not a significant issue for us when the number of monitored SNMP v3 agents was small, but we recently more than doubled the number of SNMP v3 agents being monitored.

I think the original developer was attempting to protect the static instances of v3MP and USM classes referenced by the various monitored-agent threads.

Can you give any guidance as to whether it is thread safe to have multiple Snmp objects performing GET requests at the same time on separate threads (specifically with regard to the v3MP and USM classes)? Can I safely remove the static thread guard that is shared between all monitored SNMP v3 agent threads? Do I need to have the static thread guard in place only during engine id and USM discovery?

If the application architecture is completely undesirable, do you have any suggestions on how I might re-write the application? Should I have only a single Snmp object for the entire manager application and use asynchronous function calls?

Thanks in advance for the help and guidance.

jkatz · May 19, 2021, 9:54pm

Hi,

you wrote that all Snmp objects share the same single USM and v3MP object. Because of this, the answer depends on the code that is protected by the global lock guard: if there is a reconfiguration of the USM/v3MP (especially code that removes non-localized users), then you will have to rewrite this application code before you can remove the lock.

In general, the best option would be to use a separate Snmp object to discover the engine id of an agent and then add the localized users to the USM. With this approach it is not a problem if the same user name is used for different agents with different passwords. You can even add and remove localized users while other requests are being processed. Of course, it will result in errors (but not crashes) if you delete a localized user while waiting for a response that needs the deleted user.

To answer your questions:

1. It is thread safe to use multiple Snmp objects that share one USM/v3MP object.
2. With the restrictions mentioned above, you should be able to remove the global lock.
3. If you change the application code to only add localized users, the lock is not needed. If it can happen that two threads will do a discovery for two different agents with the same user name, then you should synchronize only these threads.
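
The idea of synchronizing only the threads that share a user name can be sketched with one mutex per security name instead of one global lock. This is a minimal sketch using only the C++ standard library; `UserNameLocks` and `with_user_lock` are hypothetical helpers, not part of SNMP++, and the `discovery` callable stands in for the application's own discovery code.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical helper (not part of SNMP++): hands out one mutex per
// security name, so only discoveries that use the same user name are
// serialized. Threads using different names never block each other.
class UserNameLocks {
public:
    std::shared_ptr<std::mutex> get(const std::string& security_name) {
        std::lock_guard<std::mutex> guard(map_mutex_);
        auto& slot = locks_[security_name];
        if (!slot) slot = std::make_shared<std::mutex>();
        return slot;
    }

private:
    std::mutex map_mutex_;  // protects the map itself, held only briefly
    std::map<std::string, std::shared_ptr<std::mutex>> locks_;
};

// Runs `discovery` (a placeholder for the application's discovery code)
// while holding only the lock that belongs to this user name.
template <typename Fn>
void with_user_lock(UserNameLocks& locks,
                    const std::string& security_name,
                    Fn discovery) {
    std::shared_ptr<std::mutex> m = locks.get(security_name);
    std::lock_guard<std::mutex> guard(*m);
    discovery();
}
```

With this shape, regular GET requests would run without any application-level lock, and only the discovery step would be wrapped in `with_user_lock`.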

My approach would have been to start with a single Snmp object and use async requests. Optimizations would then be, in this order:

1. Reduce the processing time of the callback function: just add the received data to a queue and process it asynchronously.
2. Add more Snmp objects that share the existing single USM and v3MP object.
3. Create a v3MP/USM object for each Snmp object, so that a group of agents is monitored using one specific Snmp object.
4. Only as a last resort, if you are continuously requesting a really huge amount of data from each agent: use one Snmp object per agent, each with its own v3MP/USM.
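
The first optimization (keep the callback short and hand the data to a queue) could look roughly like the following. It is a sketch using only the C++ standard library; `AgentResult` and `ResultQueue` are hypothetical names, and the code that would copy data out of the received Pdu inside the SNMP++ callback is deliberately left out.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical record for one received response; fields are illustrative.
struct AgentResult {
    std::string agent;
    bool reachable;
};

class ResultQueue {
public:
    // Intended to be called from the async callback: constant-time work
    // under a short lock, so the callback returns quickly.
    void push(AgentResult r) {
        {
            std::lock_guard<std::mutex> guard(m_);
            q_.push(std::move(r));
        }
        cv_.notify_one();
    }

    // Intended to be called from a worker thread: blocks until a result
    // is available, then returns it for slower processing.
    AgentResult pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        AgentResult r = std::move(q_.front());
        q_.pop();
        return r;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<AgentResult> q_;
};
```

A worker thread would loop on `pop()` and do the expensive work (logging, database updates, alarms) there, which keeps the thread that receives responses free to handle the next one.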

Please note that each Snmp object opens a socket, which is a waste of resources if this is done for each agent.

In your case, removing the global lock is the first step (and it may already be sufficient).

Kind regards,
Jochen

lincoln.turley · May 20, 2021, 3:17pm

Thanks for your reply.

Allow me to try and clarify some details. I think the original developer went with static v3MP and USM objects because of the code in the v3MP constructor that sets the public static v3MP pointer each time a new v3MP object is constructed:
I = this;

This coupled with the fact that new instances of Snmp class have their mpv3 member variable initialized with the same static pointer:
mpv3(v3MP::I),

Looking at this more closely now, there is nothing preventing me from modifying the SNMP manager to use multiple non-static instances of v3MP and USM objects and simply calling Snmp::set_mpv3 on each Snmp object.

As for the code protected by the global lock guard, it contains the following:
1. A single call to Snmp::get(Pdu &pdu, SnmpTarget &target);
   a. The GET request is for a single OID that returns a very small amount of data, and its main function is to verify that the agent is up and able to process SNMP requests.
   b. As far as I understand, discovery of the engine ID and engine time occurs the first time a GET request is sent to the agent after startup, as well as after the engine ID has been removed following a timeout.
2. In the event an SNMP timeout occurs from the above GET request, the engine ID of the target agent will be removed by calling v3MP::remove_engine_id(const OctetStr &engine_id); so the engine ID and engine time can be rediscovered during the next successful GET request.
   a. The reason for removing the engine id is that if/when the agent is rebooted, the agent will respond to the manager with a usmStatsNotInTimeWindows report. The SNMP manager application is not currently architected to process these reports and take appropriate action. I’m certain this is not the “right” way to do things, but the budget and schedule didn’t allow for re-writing the application.

Based on the additional information on the code protected by the global lock guard and prior feedback:
1. Is it thread-safe to have multiple threads performing engine ID discovery at the same time on shared v3MP/USM objects? What about engine ID removal?
2. Would it be advisable to simply create non-static instances of v3MP and USM objects for each SNMP object?
3. Are there any code examples that I could reference for how to perform asynchronous GET requests?

Thanks in advance

jkatz · May 20, 2021, 9:38pm

Hello,

from what you wrote, I can assure you that you can safely remove the global lock, as long as your application code guarantees the following two conditions:

No two threads query the same agent (== the same engine id) at the same time.
The users that are added to USM with passwords (using USM::add_usm_user(security_name, auth_protocol, priv_protocol, auth_password, priv_password)) are not removed from USM by one thread while another thread is sending a request to another agent using the same user name.

1. Yes, this is safe if every thread is processing a different agent (== different engine id).
2. I would think that you will get enough speedup if you remove the global lock.
3. There are examples in the consoleExamples directory; in this case, see snmpNextAsync.cpp.

Side notes:

The static members “v3MP::I” are still there from the time when snmp++ only supported a single v3MP instance.
Your colleague must have seen several buggy agents and very bad system time behaviour on the systems where the agents and the management application are running. The automatic time synchronization of SNMPv3 will only fail if the agent resets its engine boot and time values to lower values (buggy agent), or if, between two requests, the time elapsed on the management system is 150 seconds greater than the time elapsed on the agent side.
There is a bug in USM::remove_engine_id(const OctetStr &engine_id): this function should call usm_user_table->delete_engine_id(engine_id) instead of usm_user_table->delete_entries(engine_id). → This will be fixed in the next release.
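
To make the two failure cases of the time synchronization concrete, here is a small sketch of the manager-side (non-authoritative) time window test as I read it from RFC 3414, section 3.2. It is not the SNMP++ implementation; `EngineClock` and `outside_time_window` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Manager-side notion of one agent's engine clock (illustrative only).
struct EngineClock {
    std::uint32_t boots;  // snmpEngineBoots last learned from the agent
    std::uint32_t time;   // local estimate of the agent's snmpEngineTime (s)
};

constexpr std::uint32_t kTimeWindowSeconds = 150;
constexpr std::uint32_t kMaxEngineBoots = 2147483647;

// Returns true if a received message would be treated as outside the
// time window from the manager's point of view.
bool outside_time_window(const EngineClock& local,
                         std::uint32_t msg_boots,
                         std::uint32_t msg_time) {
    if (local.boots == kMaxEngineBoots) return true;
    // Agent reports fewer boots than already seen: values went backwards.
    if (msg_boots < local.boots) return true;
    // Same boot cycle, but the agent's time is more than 150 s behind
    // the manager's estimate (64-bit sum avoids any overflow).
    if (msg_boots == local.boots &&
        static_cast<std::uint64_t>(msg_time) + kTimeWindowSeconds < local.time) {
        return true;
    }
    return false;
}
```

Under this reading, a normally rebooted agent reports a higher boots value, which passes the check and lets the manager resynchronize, so only a reset to lower values or a large clock drift ends up outside the window.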

Kind regards,
Jochen