Product Ideas

Add a Function for Multi-Byte Character Encoding, similar to GetOneByteEncoding()

As a customer, I would like a function similar to ImGearPDEFont.GetOneByteEncoding(), but one that works with multi-byte character encodings.

 

A Basic Example:

 

The ImGearPDEText class may return a character with code 000 (a non-printable control character), but ImGearPDEFont.GetOneByteEncoding() lets me look up how the font stores its characters (which can vary from document to document) and map that code to the correct character, code 065 or 'A'.

A very similar thing happens with multi-byte fonts. Inside one of my sample PDFs I can see a lookup table for a multi-byte font. The table looks like this:

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<043E> <2212>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end

This defines a few things, but the important part is that it maps how each character is stored to its real character code (e.g. the table above maps code 0003 to 0020, a space, and code 043E to 2212, the Unicode minus sign, which renders much like a hyphen).
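
To make the request more concrete, here is a rough Python sketch of the lookup I am currently doing by hand. It is not ImageGear code, and the helper names parse_bfchar and decode_two_byte are made up for illustration. It assumes the CMap text has already been extracted from the PDF as a string, that every stored code is exactly two bytes (as the <0000> <FFFF> codespace range suggests), and that only beginbfchar sections matter; beginbfrange sections are ignored.

```python
import re

# The ToUnicode CMap quoted above, as a plain string.
CMAP = """
/CIDInit /ProcSet findresource begin 12 dict begin begincmap
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def /CMapType 2 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
2 beginbfchar <0003> <0020> <043E> <2212> endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end
"""


def parse_bfchar(cmap_text):
    """Hypothetical helper: collect {stored_code: unicode_text} from bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # Destination values are hex-encoded UTF-16BE text.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_two_byte(raw, mapping):
    """Hypothetical helper: read raw bytes as 2-byte big-endian codes and map each one."""
    chars = []
    for i in range(0, len(raw) - 1, 2):
        code = int.from_bytes(raw[i:i + 2], "big")
        # Codes missing from the table become U+FFFD (replacement character).
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


mapping = parse_bfchar(CMAP)
# Stored bytes 043E 0003 043E decode to: minus sign, space, minus sign.
text = decode_two_byte(bytes.fromhex("043E0003043E"), mapping)
```

A multi-byte counterpart to GetOneByteEncoding() would ideally hand back something like the mapping dictionary above, so that callers do not have to parse the CMap themselves.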

  • Guest
  • Mar 28 2018
  • Julian Melville commented
    March 28, 2018 21:57

    This would be very good to have, as without it you can't consistently decode multi-byte text from documents.

  • +6