I have a multithreaded client application that needs to connect to multiple SNMP devices simultaneously. My challenge is that these devices don’t share the same security configuration: they use different usernames, and some even have the same users with different security configurations (auth and priv protocols/keys).
Each thread has its own Snmp() object for its unique IP. Initially I had a single v3MP object that all the Snmp() objects shared (via the global v3MP::I object). To avoid security clashes in the USM, I had to initiate engine ID discovery myself so that I could add localized users. This works most of the time, but occasionally requests fail with “USM: Not in time window”. My theory is that something is confused about which USM time table entry needs to be updated?
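For context on the error itself: RFC 3414 (section 3.2, step 7) has the authoritative engine (the agent) treat a request as outside the time window if its own snmpEngineBoots has latched at 2147483647, if the message's boots value differs from the agent's, or if the message's time differs from the agent's by more than 150 seconds. The sketch below models that check, plus one hypothetical way a manager can end up sending stale values: assuming two devices report the same engine ID, a time table keyed only by engine ID keeps just the clock of whichever device was discovered last. The names inTimeWindow, EngineClock and TimeTable are illustrative, not SNMP++ API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

// Illustrative model of the RFC 3414 (section 3.2, step 7a) time-window test an
// authoritative engine (the agent) applies to an incoming request.
// Hypothetical names; this is not SNMP++ code.
constexpr std::int64_t kTimeWindowSeconds = 150;
constexpr std::int64_t kLatchedBoots = 2147483647;

bool inTimeWindow(std::int64_t agentBoots, std::int64_t agentTime,
                  std::int64_t msgBoots, std::int64_t msgTime) {
    if (agentBoots == kLatchedBoots) return false;  // boots counter has latched
    if (msgBoots != agentBoots) return false;       // different boot cycle
    return std::llabs(msgTime - agentTime) <= kTimeWindowSeconds;
}

// Manager-side cache of each engine's last known boots/time, keyed only by
// engine ID (hypothetical stand-in for a USM time table).
struct EngineClock {
    std::int64_t boots;
    std::int64_t time;
};
using TimeTable = std::map<std::string, EngineClock>;

// Assumes two agents wrongly report the same engine ID: discovering the
// second agent overwrites the first agent's entry.
bool requestToAgentAIsAccepted() {
    TimeTable table;
    const std::string sharedId = "engine-1";  // hypothetical engine ID
    const EngineClock agentA{3, 1000};        // agent A's real clock
    const EngineClock agentB{7, 50};          // agent B's real clock
    table[sharedId] = agentA;                 // discovery of agent A
    table[sharedId] = agentB;                 // discovery of agent B replaces it
    const EngineClock& cached = table.at(sharedId);
    // A request to agent A now carries agent B's boots/time.
    return inTimeWindow(agentA.boots, agentA.time, cached.boots, cached.time);
}
```

In this model requestToAgentAIsAccepted() returns false, because agent A's boots counter (3) no longer matches the cached value (7).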
To avoid this, I’m now experimenting with having one v3MP per Snmp(), which seems to avoid the “Not in time window” issue, but now some of the devices fail with a “General error” when requests are made concurrently. If I repeat the requests to the failed devices individually, they work fine, which suggests some kind of concurrency issue.
I do have some concerns about the v3MP::I global in this scenario. For the first Snmp() I create, the mpv3 member will be nullptr; the next one will initially get the pointer to the first v3MP, and so on, somewhat randomly. I don’t think this is an issue, as I call set_mpv3() as soon as I’ve created the Snmp() object.
Am I doing something fundamentally wrong here? I’m just looking for advice on how I should be doing this, so I don’t spend too much time down a rabbit hole.
If you want to use multiple USMs and multiple MPv3 instances, then you should completely ignore their static ::I instances. The latest SNMP++ versions no longer use them internally.
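A minimal sketch of the ownership pattern this advice implies, assuming one session object per device: each session creates and owns its own processor/USM pair and never consults a process-wide static. SecurityModel, MessageProcessor and Session are hypothetical stand-ins, not SNMP++ classes; in the setup described above, the set_mpv3() call is what hands the session's own v3MP to its Snmp() object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a USM and a message processor; they only hold
// enough state to show who owns what.
struct SecurityModel {
    std::vector<std::string> users;  // users added for this device only
};

struct MessageProcessor {
    SecurityModel usm;  // each processor has its own USM
};

// Each session creates and owns its processor instead of reading a
// process-wide static such as v3MP::I.
class Session {
public:
    explicit Session(std::string ip)
        : ip_(std::move(ip)), mp_(std::make_unique<MessageProcessor>()) {}

    void addUser(const std::string& name) { mp_->usm.users.push_back(name); }
    const MessageProcessor& processor() const { return *mp_; }
    const std::string& ip() const { return ip_; }

private:
    std::string ip_;
    std::unique_ptr<MessageProcessor> mp_;
};
```

Two sessions that both add a user called "admin" end up with two independent user lists, so differing keys for that name cannot collide.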
If you get “not in time window” errors when using a single USM, then the single-USM approach is not the cause. Most likely, the same security names are being used with different keys and with the same or even different engine IDs.
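To illustrate why identical security names only clash under some conditions: RFC 3414 localizes each user's key to one authoritative engine (the localized key is derived by hashing the user's key together with the snmpEngineID), so a USM user entry is effectively identified by the pair (engine ID, security name). The sketch below uses a hypothetical UserTable type, not the SNMP++ USM: the same name on two different engine IDs gets two entries, while the same name with different keys on a shared engine ID silently replaces the earlier entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical USM user table keyed by (engine ID, security name), since
// RFC 3414 localizes a user's keys to a single authoritative engine.
struct UserKeys {
    std::string authKey;  // placeholders, not real key material
    std::string privKey;
};
using UserId = std::pair<std::string, std::string>;  // {engineId, securityName}
using UserTable = std::map<UserId, UserKeys>;

// Adds or replaces a user; returns true if an existing entry was overwritten.
bool addUser(UserTable& table, const std::string& engineId,
             const std::string& securityName, const UserKeys& keys) {
    const bool existed = table.count(UserId{engineId, securityName}) > 0;
    table[UserId{engineId, securityName}] = keys;
    return existed;
}
```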
The general error might be caused by the agents, which may not be capable of processing concurrent requests correctly.
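If the agents really cannot cope with parallel requests, one possible workaround (an assumption, not something SNMP++ provides) is to allow only one outstanding request per agent while still polling different agents in parallel. A minimal sketch with a hypothetical AgentLocks helper:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical helper: one mutex per agent address, so at most one request is
// in flight to a given agent while different agents are still polled in parallel.
class AgentLocks {
public:
    std::mutex& forAgent(const std::string& ip) {
        std::lock_guard<std::mutex> guard(mapMutex_);
        // std::map never relocates existing elements, so the returned
        // reference stays valid when other agents are added later.
        return locks_[ip];
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mapMutex_);
        return locks_.size();
    }

private:
    mutable std::mutex mapMutex_;
    std::map<std::string, std::mutex> locks_;
};
```

A worker thread would hold std::lock_guard<std::mutex> guard(locks.forAgent(ip)); around each request/response exchange with that agent.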
Thanks for confirming that I am roughly on the right track and that using multiple v3MPs will avoid any clashes in security names/keys or engine IDs.
I think the general error might be related to how I am testing this at scale, as it’s using a simulator that is quite old and slow.
Further to this, I’ve confirmed that the general error was just the simulator being rubbish. However, the latest issue is that I am getting random crashes on Windows (I haven’t been able to reproduce them on Linux so far).
Looking at the crash dumps, it’s always in v3MP::Cache::delete_entry, for example:
I don’t always see the EnterCriticalSection part of the stack; sometimes it’s LeaveCriticalSection, or it’s not there at all. My guess is that something has corrupted the Cache memory block? I am also guessing this is a new problem, not something anyone has seen before?
The v3MP::Cache class has a mutex, and if the functions to lock/unlock it are crashing, most likely the memory has been corrupted or has already been freed. Is it possible that you delete a v3MP object while there are still outstanding requests?
Then you will have to take a look at the pointers with a debugger.
The stack trace you posted indicates that a request had just timed out at that point: CSNMPMessageQueue::DoRetries calls v3MP::Cache::delete_entry right after a message has timed out. The snmp++ code calls the application callback before calling delete_entry, so if the callback immediately triggers deletion of the v3MP, this crash could happen.
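The ordering described here can be modelled in a few lines. This is a hypothetical sketch (Cache, Processor and onTimeout are stand-ins, not the snmp++ classes): the timeout path calls the callback and then touches the processor's cache, so one safe option is for the callback only to record that teardown is needed, and for the owner to destroy the processor after the timeout handling has returned.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical model of a per-processor request cache guarded by a mutex.
struct Cache {
    std::mutex m;
    std::map<int, int> entries;  // request id -> placeholder state
    void deleteEntry(int id) {
        std::lock_guard<std::mutex> g(m);
        entries.erase(id);
    }
};

struct Processor {
    Cache cache;
};

// Mirrors the order described above: callback first, cache cleanup second.
void onTimeout(Processor& mp, int requestId,
               const std::function<void(int)>& callback) {
    callback(requestId);              // must NOT destroy `mp`
    mp.cache.deleteEntry(requestId);  // use-after-free if the callback did
}

// Unsafe variant (do not do this):
//   onTimeout(*mp, 42, [&](int) { mp.reset(); });
// Safe variant: the callback only sets a flag; teardown happens afterwards.
bool runTimeoutWithDeferredTeardown() {
    auto mp = std::make_unique<Processor>();
    mp->cache.entries[42] = 0;
    bool teardownRequested = false;
    onTimeout(*mp, 42, [&](int) { teardownRequested = true; });
    const bool entryRemoved = mp->cache.entries.empty();
    if (teardownRequested) mp.reset();  // processor destroyed only now
    return entryRemoved && mp == nullptr;
}
```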
I found the issue: it was a race condition with the poll thread. I’ve got my own SNMP Session classes which wrap the Snmp object. The V3 sessions are a subclass which manage the v3MP.
I figured out that, given certain kinds of failure, we’d give up on the agent and delete our Session object. For V3 sessions, this deletes the v3MP; then the superclass destructor runs and deletes the Snmp object, and it is Snmp::~Snmp() that stops the polling thread. If you are unlucky and the response you gave up on arrives in the window between the v3MP being deleted and the thread being stopped, it will crash.
My fix is to explicitly call Snmp::stop_poll_thread in the V3 session destructor before deleting the v3MP.
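A reduced model of the bug and the fix, assuming a base session that owns the polling thread and a derived V3 session that owns the message processor. Poller, Processor, Session and V3Session are hypothetical stand-ins for the Snmp poll thread, the v3MP and the session classes described above. In C++ the derived destructor body runs first, then the derived members are destroyed, and only then does the base destructor run; without the explicit stopPolling() call, the thread would still be running while the processor is destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the v3MP: state the poll thread touches.
struct Processor {
    std::atomic<int> handled{0};
    void handleResponse() { ++handled; }
};

// Hypothetical stand-in for the Snmp object's poll thread.
class Poller {
public:
    explicit Poller(Processor* mp) : mp_(mp), thread_([this] { run(); }) {}
    ~Poller() { stop(); }
    void stop() {
        running_ = false;
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        while (running_) {
            mp_->handleResponse();  // dereferences the processor every loop
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    Processor* mp_;
    std::atomic<bool> running_{true};
    std::thread thread_;  // declared last, so it starts after mp_ and running_
};

// Base session: owns the poll thread (like the Snmp object).
class Session {
public:
    virtual ~Session() = default;  // destroys poller_, joining the thread
protected:
    void startPolling(Processor* mp) { poller_ = std::make_unique<Poller>(mp); }
    void stopPolling() { if (poller_) poller_->stop(); }
private:
    std::unique_ptr<Poller> poller_;
};

// Derived V3 session: owns the processor (like the v3MP).
class V3Session : public Session {
public:
    V3Session() : mp_(std::make_unique<Processor>()) { startPolling(mp_.get()); }
    ~V3Session() override {
        stopPolling();  // the fix: join the poll thread while mp_ is still alive
        mp_.reset();    // only now is destroying the processor safe
    }
    int handled() const { return mp_ ? mp_->handled.load() : -1; }
private:
    std::unique_ptr<Processor> mp_;
};
```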
Thanks for the help on these issues. It seems to be working well now and the code is much improved over our previous approach.