I am trying to use SNMP4J-SMI-PRO to parse a signal containing special characters, but it is returning the error “[driver11.mib] [0050]: Lexical error at line 139, column 8. Encountered: "@" (64), after : ""”, even though some characters (like “-” and “_”) are allowed. I am confused about which special characters cannot be parsed, and where in the code this check is located.
Furthermore, I found that it parses UTF-8 and ANSI files successfully, but fails on UTF-16 BE files. Parsing also fails if the file contains Chinese characters. Therefore, I would like to know which encodings cannot be parsed, and whether there are any special checks or processing steps in place. (The files have been uploaded as attachments. Please manually remove the .txt extension.)
I look forward to your reply. Thanks.
@AGENTPP
geist_v5 - utf16-be -tc18640.mib.txt (347.8 KB)
geist_v5-ANSI.mib.txt (173.9 KB)
The Structure of Management Information (see RFC 2578 - Structure of Management Information Version 2 (SMIv2)) is defined on 7-bit ASCII strings. Therefore, if a MIB file uses any UTF or ISO-xxxx character set and a character being parsed cannot be mapped to 7-bit ASCII, SNMP4J-SMI-PRO will reject it, because it violates the standard.
In lenient mode it might accept some illegal characters, as long as the parser can handle them.
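To find the offending bytes before handing a file to the parser, a small pre-check along these lines may help. This is only a sketch using the plain JDK, not part of SNMP4J-SMI-PRO; the class name `AsciiPreCheck` and the method `findNonAscii` are made up for illustration. It also flags NUL bytes, on the assumption that they usually indicate a UTF-16 encoded file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of SNMP4J-SMI-PRO.
public class AsciiPreCheck {

    /**
     * Returns one message per byte that is outside 7-bit ASCII (above 0x7F)
     * or is a NUL byte. Columns are counted in bytes, starting at 1.
     */
    public static List<String> findNonAscii(byte[] content) {
        List<String> problems = new ArrayList<>();
        int line = 1;
        int column = 0;
        for (byte b : content) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value == '\n') {
                line++;
                column = 0;
                continue;
            }
            column++;
            if (value > 0x7F || value == 0x00) {
                problems.add(String.format(Locale.ROOT,
                        "line %d, column %d: byte 0x%02X", line, column, value));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AsciiPreCheck <path-to-mib-file>
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        findNonAscii(content).forEach(System.out::println);
    }
}
```

An empty result only means the file is pure 7-bit ASCII; it says nothing about whether the content is valid SMI.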
Hope this helps.
Thank you for your previous explanation regarding SNMP4J-SMI-PRO’s strict adherence to the SMIv2 standard (7-bit ASCII). While I understand the foundational principle, I am facing practical issues with specific files and would appreciate further clarification on the tool’s implementation specifics to resolve them.
Below are my detailed questions:
- Specific Character Restrictions in Strict Mode
Beyond the “non-7-bit-ASCII” rule, does the tool’s parser maintain a more explicit list of disallowed characters—even within the 7-bit ASCII range—due to SMI syntax rules?
For example, characters like @, $, %, or ~ might be rejected if they appear in object names or identifiers. Is the “@” error I encountered a known case? Could you provide or point me to a list of such characters that are explicitly rejected during lexical analysis?
- Supported File Encodings and Detection Logic
My tests show:
- UTF-8 files (without BOM) containing Chinese characters fail to parse.
- UTF-16 BE files (even with pure ASCII content) also fail.
Could you clarify:
- How does the tool detect file encoding? Does it rely on Byte Order Marks (BOM), or use other heuristics?
- Is there a list of explicitly supported encodings (e.g., UTF-8 without BOM, UTF-8 with BOM, etc.)?
- For UTF-16 files containing only 7-bit ASCII characters, does the tool reject them solely due to encoding detection, or are they fundamentally unsupported?
Thank you for your time and reply.
As written before, every character in SMI must be 7-bit ASCII.
Object names must start with a lowercase letter and may contain uppercase and lowercase letters, digits, and hyphens. Underscores are not allowed.
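As a rough illustration, the naming rule above can be written as a regular expression. This is a sketch of the rule as stated here, not the actual lexer of SNMP4J-SMI-PRO, and it does not cover any additional constraints RFC 2578 may place on names; the class name `ObjectNameCheck` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the rule described above.
public class ObjectNameCheck {

    // A lowercase first letter, followed by letters, digits, or hyphens.
    private static final Pattern OBJECT_NAME = Pattern.compile("[a-z][A-Za-z0-9-]*");

    /** Returns true if the whole name matches the rule. */
    public static boolean isValidObjectName(String name) {
        return OBJECT_NAME.matcher(name).matches();
    }
}
```

With this pattern, a name such as `temp@Sensor` or `temp_sensor` is rejected, while `tempSensor1` is accepted.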
A BOM is never allowed, because it is not 7-bit ASCII.
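For files like the UTF-16 BE one mentioned earlier, whose content is pure ASCII apart from the encoding, one possible workaround is to convert them to plain ASCII before parsing. The following is a minimal sketch with the plain JDK; it is not a feature of SNMP4J-SMI-PRO, and `MibToAscii` and `toAscii` are hypothetical names. It assumes that a file without a BOM is UTF-8 or ASCII, and it does not handle UTF-32.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical converter, not part of SNMP4J-SMI-PRO.
public class MibToAscii {

    /**
     * Strips a UTF-8 or UTF-16 BOM, decodes the content, and returns it as
     * 7-bit ASCII bytes. Throws if any character is outside 7-bit ASCII.
     */
    public static byte[] toAscii(byte[] raw) {
        Charset charset = StandardCharsets.UTF_8; // assumption when no BOM is present
        int skip = 0;
        if (startsWith(raw, 0xEF, 0xBB, 0xBF)) {
            skip = 3; // UTF-8 BOM
        } else if (startsWith(raw, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (startsWith(raw, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }
        String text = new String(raw, skip, raw.length - skip, charset);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // NUL is rejected too: it usually means UTF-16 without a BOM.
            if (c > 0x7F || c == 0) {
                throw new IllegalArgumentException(
                        "character outside 7-bit ASCII at index " + i);
            }
        }
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A file containing Chinese characters makes `toAscii` throw, so those characters have to be removed or replaced by hand before the MIB can be parsed.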