Best practice for multithreaded client using SNMP v3

I have a multithreaded client application which needs to connect to multiple SNMP devices simultaneously. My challenge is that these devices don’t have the same security configurations: they use different usernames, and some even share a username but with different security settings (auth and priv protocols/keys).

Each thread has its own Snmp() object for its unique IP. Initially I had a single v3MP object which the Snmp() objects shared (via the global v3MP::I object). To avoid security clashes in the USM, I had to initiate engine ID discovery myself so I could add localized users. This works most of the time, but occasionally things fail with “USM: Not in time window”. My theory is that something gets confused about which USM time table entry needs to be updated.

To avoid this, I’m now experimenting with having a v3MP per Snmp(), which seems to avoid the “Not in time window” issue, but now some of the devices fail with a “General error” when making requests concurrently. If I repeat the failed devices individually they work fine, which points to some kind of concurrency issue.

I do have some concerns about the v3MP::I global in this scenario. For the first Snmp() I create, the mpv3 member will be nullptr; the next one will initially get the pointer to the first one, and so on, somewhat randomly. I don’t think this is an issue, as I call set_mpv3() as soon as I’ve created the Snmp() object.

Am I doing something fundamentally wrong here? I’m just looking for advice on how I should be doing this, so I don’t spend too much time down a rabbit hole.

If you want to use multiple USMs and multiple MPv3 instances, then you should completely ignore the static ::I instances. The latest SNMP++ versions no longer use them internally.
If you get “not in time window” errors when using a single USM, then the “single USM” approach itself is not the reason for it. Most likely, the same security names are used with different keys and the same or even different engine IDs.
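To make that clash concrete, here is a toy sketch. It is not SNMP++ code and makes no claim about how the USM stores users internally: `Credentials`, `UserTable`, `add_user` and `shared_table_entries` are hypothetical names, and the table is simply assumed to be keyed by engine ID plus security name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical credential record; a real USM user carries more fields.
struct Credentials {
    std::string auth_key;
    std::string priv_key;
};

// Toy table keyed by (engine ID, security name), standing in for ONE shared
// user table. This is an assumption for illustration, not the USM layout.
using UserTable = std::map<std::pair<std::string, std::string>, Credentials>;

// Adds a user; returns true if an entry with the same key was overwritten.
bool add_user(UserTable& table, const std::string& engine_id,
              const std::string& security_name, const Credentials& creds) {
    const auto key = std::make_pair(engine_id, security_name);
    const bool replaced = table.count(key) != 0;
    table[key] = creds;
    return replaced;
}

// Registers the same user name with different keys for two agents in one
// shared table and returns how many distinct entries survive.
std::size_t shared_table_entries(const std::string& engine_a,
                                 const std::string& engine_b) {
    UserTable shared;
    add_user(shared, engine_a, "admin", {"authA", "privA"});
    add_user(shared, engine_b, "admin", {"authB", "privB"});
    return shared.size();
}
```

In this model, `shared_table_entries("engine-1", "engine-2")` keeps both users, while `shared_table_entries("engine-1", "engine-1")` keeps only the second one, because both agents report the same engine ID and the later keys silently replace the earlier ones. A separate table per session avoids the collision in either case.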

The general error might be a problem with the agents which might not be capable of processing concurrent requests correctly.

Thanks for the confirmation that I am kind of on the right track and that using multiple v3MPs will avoid any clashes in security names/keys or engine IDs.

I think the general error might be related to how I am testing this at scale, as it’s using a simulator which is quite old and slow.

Further to this, I’ve confirmed the general error was just the simulator being rubbish. However the latest issue is that I am getting random crashes on Windows (I’ve not been able to reproduce it on Linux at this point).

Looking at the crash dumps, it’s always in v3MP::Cache::delete_entry, for example:

000000de`37c7c250 00007ff8`c91243b0     : 00000000`000048d6 00000000`000048d6 00000000`00000001 000002aa`627e18a8 : ntdll!RtlpEnterCriticalSectionContended+0xdc
000000de`37c7c280 00007ff8`afa30472     : 000002aa`62477bd0 00000001`00000000 0000c125`e957f475 000000de`37c7c890 : ntdll!RtlEnterCriticalSection+0x40
000000de`37c7c2b0 00007ff8`afa33cd9     : 000002aa`627e18a8 00000000`000048d6 000000de`37c7c410 00000000`fffffffb : snmp__!v3MP::Cache::delete_entry+0x32
000000de`37c7c2e0 00007ff8`afa2c4f9     : 00000000`67bc700b 00000000`0168031f 000002aa`62477d30 00000000`00000000 : snmp__!CSNMPMessageQueue::DoRetries+0x139
000000de`37c7c390 00007ff8`afa2ce74     : 000002aa`4fa3e4d0 00000000`00000000 000000de`37c7c4c0 000002aa`4fa3e4c0 : snmp__!CEventList::DoRetries+0x49
000000de`37c7c3c0 00007ff8`afa2c8cd     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`000048d6 : snmp__!EventListHolder::SNMPProcessEvents+0x1e4
000000de`37c7d100 00007ff8`afa48daa     : 00000000`00000000 000000de`37c7d230 000000de`37c7e880 000000de`37c7e880 : snmp__!EventListHolder::SNMPBlockForResponse+0x2d
000000de`37c7d130 00007ff8`afa46fd8     : 00000000`00000000 00000000`00000000 000002aa`58e487c0 000002aa`58e487c0 : snmp__!Snmp::snmp_engine+0x89a
000000de`37c7e7e0 00007ff8`ac4928cd     : 000000de`37c7e880 00000000`00000000 000002aa`58e487c0 000000de`37c7ead0 : snmp__!Snmp::get+0x38

I don’t always see the EnterCriticalSection part of the stack; sometimes it’s LeaveCriticalSection, or it’s not there at all. My guess is that something has corrupted the Cache memory block? I am guessing this is a new problem, not something anyone has seen before?

The v3MP::Cache class has a mutex, and if the functions to lock/unlock it are crashing, most likely the memory has been corrupted or has already been freed. Is it possible that you delete a v3MP object while there are still outstanding requests?

I don’t think so - any requests made via a given Snmp session and its related v3MP should have completed or timed out before the objects are deleted.

Then you will have to take a look at the pointers with a debugger.
The stack trace you posted indicates that at this time a request has timed out: CSNMPMessageQueue::DoRetries calls v3MP::Cache::delete_entry right after a message has timed out. The snmp++ code calls the application callback before calling delete_entry, so if the callback immediately triggers deletion of the v3MP, then this crash could happen.

I found the issue: it was a race condition with the poll thread. I’ve got my own SNMP Session classes which wrap the Snmp object. The V3 sessions are a subclass which manages the v3MP.

I figured out that on certain kinds of failure we give up on the agent and delete our Session object. For V3 sessions, this deletes the v3MP first; then the superclass destructor runs and deletes the Snmp object, and it’s Snmp::~Snmp() which stops the polling thread. If you are unlucky and the response you gave up on arrives in the window between the v3MP being deleted and the thread being stopped, it will crash.

My fix is to explicitly call Snmp::stop_poll_thread in the V3 session destructor before deleting the v3MP.
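For anyone hitting the same thing, here is a minimal sketch of that destructor ordering. It uses stand-in classes rather than SNMP++ itself: `FakeMp`, `FakeSnmp`, `Session`, `V3Session` and `demo_shutdown_order` are all hypothetical names, and the poll loop is only a rough model of a thread that keeps touching the message processing object.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for v3MP: a mutex-protected cache used by the poller.
struct FakeMp {
    explicit FakeMp(Log& log) : log_(log) {}
    ~FakeMp() { log_.push_back("mp destroyed"); }
    void delete_entry() { std::lock_guard<std::mutex> guard(lock_); ++deleted_; }
    Log& log_;
    std::mutex lock_;
    int deleted_ = 0;
};

// Hypothetical stand-in for Snmp: owns a poll thread that uses the FakeMp.
class FakeSnmp {
public:
    explicit FakeSnmp(Log& log) : log_(log) {}
    ~FakeSnmp() { stop_poll_thread(); }  // like Snmp::~Snmp(), stops the poller

    void start_poll_thread(FakeMp* mp) {
        mp_ = mp;
        running_ = true;
        poller_ = std::thread([this] {
            while (running_) {
                mp_->delete_entry();  // would be a use-after-free if mp_ were gone
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    void stop_poll_thread() {
        running_ = false;
        if (poller_.joinable()) {
            poller_.join();
            log_.push_back("poll thread stopped");
        }
    }

private:
    Log& log_;
    FakeMp* mp_ = nullptr;
    std::atomic<bool> running_{false};
    std::thread poller_;
};

// Base session wraps the Snmp stand-in; it is destroyed AFTER the subclass body runs.
class Session {
public:
    explicit Session(Log& log) : snmp_(log) {}
    virtual ~Session() = default;
protected:
    FakeSnmp snmp_;
};

// V3 session owns the message processing object.
class V3Session : public Session {
public:
    explicit V3Session(Log& log) : Session(log), mp_(new FakeMp(log)) {
        snmp_.start_poll_thread(mp_.get());
    }
    ~V3Session() override {
        snmp_.stop_poll_thread();  // the fix: stop the poller first
        mp_.reset();               // only now is it safe to free the mp
    }
private:
    std::unique_ptr<FakeMp> mp_;
};

// Creates and destroys one session, returning the order of shutdown events.
Log demo_shutdown_order() {
    Log log;
    {
        V3Session session(log);
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    return log;
}
```

Without the explicit `stop_poll_thread()` call, the subclass would free the `FakeMp` while the base class still had a running poller, which is the window described above. With it, `demo_shutdown_order()` records the thread stopping before the mp is destroyed.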

Thanks for the help on these issues. It seems to be working well now and the code is much improved over our previous approach.