Product Ideas

Removal of metadata, hyperlinks, macros and ActiveX objects from MS Office files

As a user, I would like to share my MS Office files only as PDFs, with the conversion removing any sensitive metadata as well as any risky objects, including hyperlinks, macros, and ActiveX objects, whether embedded via OLE or otherwise.
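
Most of the items listed above have well-known locations inside the OOXML packages (.docx, .xlsm, .pptx and so on) that Office uses. As a rough illustration only, and not how PrizmDoc or any other product actually works, the following Python sketch opens a package with the standard library and lists the parts a cleaning pass would typically target: document properties under docProps/, VBA projects (vbaProject.bin), ActiveX controls, OLE embeddings, and relationships with TargetMode="External" (which include hyperlinks). The function name inventory_risky_parts and the report keys are hypothetical, and legacy binary formats (.doc, .xls, .ppt) are not covered.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the .rels parts defined by the Open Packaging Conventions (ECMA-376).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def inventory_risky_parts(data: bytes) -> dict[str, list[str]]:
    """List parts of an OOXML package that a metadata-cleaning pass would target.

    Hypothetical helper for illustration; it only reports, it does not modify.
    """
    report: dict[str, list[str]] = {
        "metadata": [], "macros": [], "activex": [], "embedded": [], "external_links": [],
    }
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            lower = name.lower()
            if lower.startswith("docprops/"):  # core.xml, app.xml, custom.xml
                report["metadata"].append(name)
            elif lower.endswith("vbaproject.bin"):  # VBA macro storage
                report["macros"].append(name)
            elif "/activex/" in lower:  # ActiveX control parts
                report["activex"].append(name)
            elif "/embeddings/" in lower:  # OLE objects and embedded packages
                report["embedded"].append(name)
            if lower.endswith(".rels"):
                # External relationships: hyperlinks, linked images, attached templates, etc.
                root = ET.fromstring(package.read(name))
                for rel in root.iter(f"{REL_NS}Relationship"):
                    if rel.get("TargetMode") == "External":
                        report["external_links"].append(rel.get("Target"))
    return report
```

A real cleaner would also have to update [Content_Types].xml and the relationship parts after deleting anything, otherwise Office may refuse to open the result; for a convert-to-PDF workflow, simply never rendering or executing these parts is the simpler route.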

Runexy has stated that this is a high priority for Hitachi. Robin called it 'detoxification', but I think that is just a matter of translation. In the document management world, I see it referred to as metadata cleaning.

Examples: https://www.workshare.com/use-cases/file-and-metadata-security and http://www.docscorp.com/products/cleandocs/metadata-removal-software/

AFAIK GroupDocs does not advertise this capability.

Relates to https://accusoft-pm.ideas.aha.io/ideas/PCC-I-59

  • Brandon Mount
  • May 31 2016
  • Shipped
  • Admin
    Mark Fears commented
    22 Jun, 2016 08:03pm

    Our engineering team has investigated this request and provided the following summary:

    1. We can extend the PrizmDoc Office conversion service so that an administrator can configure it to always disable the action of external hyperlinks. The resulting PDF will keep all of the hyperlink's text formatting (it will still look like a hyperlink), but the user won't be able to click it, because the PDF will contain no action information for the link (and neither will the PrizmDoc HTML5 viewer).
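
    The reason the link keeps its appearance: in a PDF, a hyperlink's visible styling (color, underline) is part of the page content, while clickability comes from a separate link annotation whose /A entry holds the action, typically a /URI action for external links. The sketch below is a simplified model of that step and not PrizmDoc code: it represents annotation dictionaries as plain Python dicts keyed by PDF names instead of using a real PDF library, and it assumes /URI, /Launch and /GoToR count as "external" actions while internal /GoTo links are left alone. The function name disable_external_link_actions is hypothetical.

```python
from typing import Any

# PDF action types (the /S entry of an action dictionary) that leave the document.
EXTERNAL_ACTION_TYPES = {"/URI", "/Launch", "/GoToR"}


def disable_external_link_actions(annotations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of a page's annotations with external link actions removed.

    Simplified model: each annotation is a dict keyed by PDF names such as
    "/Subtype" and "/A"; a real implementation would use a PDF library.
    """
    cleaned = []
    for annotation in annotations:
        annotation = dict(annotation)  # shallow copy; the caller's dicts are untouched
        action = annotation.get("/A")
        if (
            annotation.get("/Subtype") == "/Link"
            and isinstance(action, dict)
            and action.get("/S") in EXTERNAL_ACTION_TYPES
        ):
            # Keep the annotation and its /Rect, but drop what clicking would do.
            del annotation["/A"]
        cleaned.append(annotation)
    return cleaned
```

    Dropping only the action, rather than the whole annotation, mirrors the behavior described in point 1; deleting the link annotation outright would be an equally valid design.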

    2. Please also be aware that the following "detoxification" steps are already supported in the currently available version of PrizmDoc:

    • VBA code and macros are already disabled
    • OLE/ActiveX controls are already disabled
    • Interactive content such as forms is already either rasterized or converted to paths when converted to PDF
    • MS Excel formulas "SYSTEM", "OSVERSION", "RELEASE" and "FILENAME" are disabled
    • MS Excel Scenarios are disabled
    • MS Word & MS Excel Comments are not shown
    • "FILENAME" field is disabled for formats: DOC, DOCX, ODT, ODP, ODG
    • Hidden objects in Word 2010+ documents are not shown
    • MS PowerPoint hidden slides are not shown
    • External entities and DTD validation are disabled for XML-based Office formats
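
    For the last item, one common way to disable external entities and DTD processing is to refuse any DOCTYPE declaration outright, since entities can only be declared inside a DTD. The following Python sketch applies that general technique with the standard library's expat bindings; it is an illustration, not PrizmDoc's implementation, and the names parse_office_xml_part and ForbiddenDTD are hypothetical.

```python
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when an XML part contains a DOCTYPE declaration (hypothetical name)."""


def parse_office_xml_part(data: bytes) -> list[str]:
    """Parse one XML part of an Office package, refusing any DTD.

    Returns the element names in document order, as "namespace-URI localname".
    """
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    elements: list[str] = []

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared in a DTD, so refusing every DOCTYPE
        # blocks external entities and entity-expansion attacks alike.
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed in Office XML parts")

    def record_element(name, attributes):
        elements.append(name)

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = record_element
    parser.Parse(data, True)  # exceptions raised in handlers propagate from here
    return elements
```

    Legitimate OOXML parts never need a DOCTYPE, so rejecting it costs nothing for well-formed Office files.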

    3. Please also note the following two special cases:

    • MS Excel hidden sheets and cells are shown by default, but we provide a central configuration parameter to keep them hidden
    • Tracked changes are neither accepted nor disabled in the current version
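
    Both special cases can be detected in the package XML before conversion. As an illustrative sketch only (hypothetical function names, standard-library Python, not a PrizmDoc API), the following reads the state attribute of each sheet in xl/workbook.xml and counts the w:ins and w:del elements that mark tracked insertions and deletions in word/document.xml. Hidden rows and columns inside individual worksheets, and other revision types such as moves and formatting changes, are deliberately left out.

```python
import xml.etree.ElementTree as ET

SML_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
WML_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def hidden_sheets(workbook_xml: bytes) -> list[str]:
    """Names of sheets whose state is "hidden" or "veryHidden" in xl/workbook.xml."""
    root = ET.fromstring(workbook_xml)
    return [
        sheet.get("name")
        for sheet in root.iter(f"{SML_NS}sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]


def count_tracked_changes(document_xml: bytes) -> dict[str, int]:
    """Count unresolved tracked insertions and deletions in word/document.xml."""
    root = ET.fromstring(document_xml)
    return {
        "insertions": sum(1 for _ in root.iter(f"{WML_NS}ins")),
        "deletions": sum(1 for _ in root.iter(f"{WML_NS}del")),
    }
```
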
  • Admin
    Mark Fears commented
    14 Jun, 2016 02:21pm

    PrizmDoc supports most of this today, with the exception of removing hyperlinks. Nick is replying to the Infor ticket in detail.