I am a user of snmp4j and snmp4j-agent, and I recently found an issue in ThreadPool. First, some background.
We built an SNMP agent process to handle external SNMP requests, but a customer reported that our process stops responding after 10 hours, and the only way to recover the system is to restart the process. No logs helped us debug the issue. After two months of struggle we found the root cause: our product code throws an exception that is not handled. The exception propagates up to TaskManager, and since there is no try...catch in TaskManager's run method either, the thread dies. Worse, ThreadPool cannot create a new TaskManager to replace the dead thread, so each time the exception is raised the pool loses one live thread until it is empty. After that, the agent can no longer respond to external requests.
The first issue is in ThreadPool.execute, lines 93-96: if respawnThreads is set to true, the pool creates a new TaskManager to replace the old one, but it never calls the start() method on it, so the replacement thread never runs.
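
To illustrate why the missing start() call matters, here is a minimal sketch in plain Java. It is not the SNMP4J code: RespawnSketch and respawn are hypothetical names, and the sketch only assumes that the replacement worker is an ordinary java.lang.Thread. A thread object that is constructed but never started never executes its run method.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RespawnSketch {

    /**
     * Creates a replacement worker thread (hypothetical stand-in for a
     * respawned pool worker) and reports whether its body actually ran.
     */
    static boolean respawn(boolean callStart) {
        CountDownLatch ran = new CountDownLatch(1);
        Thread replacement = new Thread(ran::countDown, "replacement-worker");
        if (callStart) {
            // Without this call the thread object exists but never executes.
            replacement.start();
        }
        try {
            // Wait briefly to see whether the worker body was executed.
            return ran.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without start(): ran=" + respawn(false));
        System.out.println("with start():    ran=" + respawn(true));
    }
}
```

In this sketch the worker created without start() never counts down the latch, which matches the symptom described above: the pool holds a replacement object, but no live thread picks up tasks.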
The second issue is in TaskManager.run: task.run() should be wrapped in a try...catch that prints out the exception when something goes wrong. You cannot trust that the tasks handed to the pool are always correct.
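
A minimal sketch of the kind of guard I mean, again in plain Java and not taken from the SNMP4J source. GuardedWorkerSketch, GuardedWorker and survivesFailingTask are hypothetical names, and the queue-based loop is only an assumption about how a worker might be structured. The point is that a failing task is logged and the worker thread keeps running.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class GuardedWorkerSketch {

    /** Hypothetical worker loop; not SNMP4J's TaskManager. */
    static class GuardedWorker extends Thread {
        private final BlockingQueue<Runnable> queue;

        GuardedWorker(BlockingQueue<Runnable> queue) {
            super("guarded-worker");
            this.queue = queue;
            setDaemon(true);
        }

        @Override
        public void run() {
            while (true) {
                Runnable task;
                try {
                    task = queue.take();
                } catch (InterruptedException e) {
                    return; // interrupted: treat as a shutdown request
                }
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // Log the failure and keep the worker alive.
                    System.err.println("Task failed in " + getName() + ": " + e);
                    e.printStackTrace();
                }
            }
        }
    }

    /** Runs a failing task followed by a normal one; true if the second ran. */
    static boolean survivesFailingTask() {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        CountDownLatch secondTaskRan = new CountDownLatch(1);
        queue.add(() -> {
            throw new IllegalStateException("simulated failure in application code");
        });
        queue.add(secondTaskRan::countDown);

        GuardedWorker worker = new GuardedWorker(queue);
        worker.start();
        try {
            return secondTaskRan.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.interrupt(); // stop the worker once the check is done
        }
    }

    public static void main(String[] args) {
        System.out.println("worker survived failing task: " + survivesFailingTask());
    }
}
```

This sketch catches only RuntimeException; whether the real fix should also catch Error is a design choice for the library maintainers. Either way, the stack trace ends up in the log instead of the thread dying silently.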
My JAR package info:
Thanks a lot,