What are Polyglot Files?

Fixxx · Jun 14, 2024

What are polyglots and how do they work?

A polyglot file is a file whose type changes depending on the context in which it is used. Take, for example, a JPEG image file that also contains an exploit for collecting system information on Linux: depending on how this polyglot is used, its type changes as well. This works because many programs identify a file's format by its signature bytes (also called magic bytes), and a polyglot carries valid signatures and structure for more than one format at once.

Some examples of signature bytes:

JPEG files - start with FF D8 and end with FF D9
PDF files - start with "%PDF" (in hex format - 25 50 44 46)
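The signatures above are enough to sketch a simple format sniffer. The Python example below reports every format whose signature matches, so a polyglot shows up as more than one match. The PNG signature comes from the PNG specification. The 1 KiB window for the PDF header is an assumption based on how lenient many PDF readers are. The sample bytes are a toy, not a real image or document.

```python
# Minimal signature ("magic bytes") sniffer: returns every format that matches.
JPEG_START = b"\xff\xd8"
JPEG_END = b"\xff\xd9"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PDF_MAGIC = b"%PDF"  # 25 50 44 46


def sniff(data: bytes) -> list[str]:
    """List the formats whose signature bytes appear where expected."""
    matches = []
    if data.startswith(JPEG_START) and data.endswith(JPEG_END):
        matches.append("jpeg")
    if data.startswith(PNG_MAGIC):
        matches.append("png")
    if data.startswith(PDF_MAGIC):
        matches.append("pdf")
    elif PDF_MAGIC in data[:1024]:
        # Assumption: lenient readers accept a header within the first 1 KiB.
        matches.append("pdf (offset header)")
    return matches


if __name__ == "__main__":
    # Toy bytes: JPEG start marker, a PDF header, JPEG end marker.
    sample = JPEG_START + b"%PDF-1.7 ..." + JPEG_END
    print(sniff(sample))  # ['jpeg', 'pdf (offset header)']
```

A check based only on the file extension would accept this sample as a harmless image, while the sniffer shows it also looks like a PDF.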

Most polyglots contain code written in a single programming language. They are most often used to bypass extension-based security: a user is more likely to download a file with a JPEG, PNG or PDF extension than one with a potentially dangerous JS, SH or HTML extension.

Principles of creating polyglots of different formats.

Since each file format has its own structure, the way an exploit is injected differs from one type to another.

JPEG format

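To hide an exploit in a JPEG, the usual method is to increase a segment's length value by the required number of bytes. A JPEG is a sequence of segments, each introduced by a marker (FF followed by a type byte), and most segments begin with a two-byte length field. Decoders use that field to jump to the next marker, so any bytes inside an enlarged segment are skipped when the image is displayed.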

The original structure of the file:

*example of original content

Once the value is changed, the segment contains extra space where the exploit can be placed:

*example of modified content
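To see why an enlarged length works, it helps to walk the segments the way a decoder does. The Python sketch below follows each segment's length field up to the start-of-scan (SOS) marker. The helper `app0` and the toy bytes are hypothetical, made up for illustration, and the walker assumes no standalone markers (TEM, RSTn) appear before SOS.

```python
import struct


def walk_jpeg_segments(data: bytes):
    """Yield (marker, offset, length) for each header segment up to SOS."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("missing SOI marker (FF D8)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte: markers may be padded with extra FFs
            pos += 1
            continue
        # Big-endian length that counts its own two bytes plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        yield marker, pos, length
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            break
        pos += 2 + length  # what a decoder does: jump over the whole payload


def app0(extra: bytes = b"") -> bytes:
    """Hypothetical helper: JFIF APP0 segment (no thumbnail) plus extra bytes."""
    payload = b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00" + extra
    return b"\xff\xe0" + struct.pack(">H", 2 + len(payload)) + payload


if __name__ == "__main__":
    # Toy header only (SOI, APP0, a stub SOS), not a decodable image.
    for extra in (b"", b"HIDDEN"):
        toy = b"\xff\xd8" + app0(extra) + b"\xff\xda\x00\x02"
        print([(hex(m), off, ln) for m, off, ln in walk_jpeg_segments(toy)])
```

A plain JFIF APP0 segment without a thumbnail has a length of 16, so the padded version's length of 22 reveals 6 extra bytes that an image viewer never looks at.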

PNG format

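Because a PNG file is a sequence of chunks, the main method of hiding data is to add a new chunk after IHDR. Each chunk consists of a 4-byte length, a 4-byte type, the data and a CRC-32 computed over the type and data, and decoders skip ancillary chunks they do not need to display the image.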

The original structure of the file:

*structure before modification

A tEXt chunk is added to hold the information to be hidden, in our case a JS + HTML exploit:

*structure after modification
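The chunk layout makes this easy to inspect. The Python sketch below serializes chunks and lists the chunks of a file with their CRC status, so an unexpected tEXt chunk right after IHDR stands out. The keyword `Comment` and its text are harmless placeholders chosen for illustration, not the exploit described in the article.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def list_chunks(png: bytes) -> list[tuple[str, int, bool]]:
    """Return (type, data length, CRC valid?) for every chunk up to IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos, chunks = len(PNG_SIGNATURE), []
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos : pos + 4])
        if pos + 12 + length > len(png):
            raise ValueError(f"truncated chunk at offset {pos}")
        ctype = png[pos + 4 : pos + 8]
        data = png[pos + 8 : pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length : pos + 12 + length])
        crc_ok = crc == zlib.crc32(ctype + data) & 0xFFFFFFFF
        chunks.append((ctype.decode("latin-1"), length, crc_ok))
        pos += 12 + length
        if ctype == b"IEND":
            break
    return chunks


if __name__ == "__main__":
    # A valid 1x1 grayscale PNG with a tEXt chunk inserted right after IHDR.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 + one black pixel
    png = (PNG_SIGNATURE
           + make_chunk(b"IHDR", ihdr)
           + make_chunk(b"tEXt", b"Comment\x00harmless note")
           + make_chunk(b"IDAT", idat)
           + make_chunk(b"IEND", b""))
    for chunk in list_chunks(png):
        print(chunk)
```

The resulting file still opens as a normal image, because viewers ignore the tEXt chunk when rendering pixels.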

PDF format

For PDF, various ways of rewriting strings are common. A string is usually enclosed in parentheses, but an attacker can write it as a "column" split across several lines, replace each character with its octal escape (for example \141 for a), or use a hexadecimal string in angle brackets, where the digits can be separated by any amount of whitespace because readers ignore it. JS exploits can also be "hidden" in a PDF through /JavaScript actions: the /JS entry may contain the executable code itself or reference another object that holds it.
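To make the string tricks concrete, here is a simplified Python decoder for the two PDF string syntaxes. It handles the standard escapes, octal escapes, backslash line continuations and hex strings with arbitrary whitespace, but not every corner of the specification (for example, it does not normalize bare line endings inside literal strings). The function names are our own.

```python
import re

# Simple escapes allowed in PDF literal strings.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}


def decode_literal(s: str) -> str:
    """Decode a literal string such as '(\\141\\154)' (outer parentheses included)."""
    body, out, i = s[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ch)
            continue
        if i >= len(body):  # stray trailing backslash: ignore it
            break
        nxt = body[i]
        if nxt in ESCAPES:
            out.append(ESCAPES[nxt])
            i += 1
        elif nxt in "01234567":
            digits = re.match(r"[0-7]{1,3}", body[i:]).group()
            out.append(chr(int(digits, 8) & 0xFF))  # high-order overflow ignored
            i += len(digits)
        elif nxt in "\r\n":
            # Backslash at end of line: the string continues on the next line.
            i += 2 if body[i : i + 2] == "\r\n" else 1
        # Any other character: the backslash is dropped and nxt is kept
        # as an ordinary character on the next loop iteration.
    return "".join(out)


def decode_hex(s: str) -> bytes:
    """Decode a hex string such as '<61 6C>'; whitespace between digits is ignored."""
    digits = re.sub(r"\s+", "", s[1:-1])
    if len(digits) % 2:
        digits += "0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits)


if __name__ == "__main__":
    # Four spellings of the same five characters; all four print: alert
    print(decode_literal("(alert)"))
    print(decode_literal("(\\141\\154\\145\\162\\164)"))
    print(decode_literal("(al\\\nert)"))  # split across two lines
    print(decode_hex("<6 1 6C   65 7 2 74>").decode("latin-1"))
```

Since all four inputs decode to the same text, keyword matching on raw PDF bytes is easy to evade unless strings are decoded first.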

In addition to the methods above, names can be obfuscated with hex escape sequences. For example, /JavaScript can turn into /J#61#76#61Script: a reader decodes each #xx as one character, and hex 61, 76 and 61 are a, v and a.
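A scanner that looks for /JavaScript in the raw bytes misses the escaped form, so names need to be decoded first. The Python sketch below does that with a regular expression and then checks for JavaScript actions. The sample dictionary, the object number 12 and the function names are made up for illustration, and a real tool would parse the PDF rather than scan bytes. The sample's /JS entry points to another object (12 0 R), matching the reference case mentioned earlier.

```python
import re

# A "#" followed by two hex digits inside a PDF name encodes one byte.
NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
JS_NAMES = re.compile(rb"/(JavaScript|JS)\b")


def normalize_names(pdf: bytes) -> bytes:
    """Decode #xx escapes so /J#61#76#61Script reads as /JavaScript.

    Rough heuristic: it rewrites every #xx in the buffer, not only those
    inside names, which is fine for scanning but not for editing a file.
    """
    return NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), pdf)


def mentions_javascript(pdf: bytes) -> bool:
    """True if /JavaScript or /JS appears once names are normalized."""
    return JS_NAMES.search(normalize_names(pdf)) is not None


if __name__ == "__main__":
    obfuscated = b"<< /S /J#61#76#61Script /J#53 12 0 R >>"
    print(JS_NAMES.search(obfuscated) is not None)  # False: raw bytes evade the check
    print(normalize_names(obfuscated))  # b'<< /S /JavaScript /JS 12 0 R >>'
    print(mentions_javascript(obfuscated))  # True
```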

Conclusion.

Polyglots are an interesting information security threat. They can certainly take advantage of careless users, but modern defenses will most likely keep them from running rampant.
