Wednesday, November 24, 2021

Efficient removal of hidden Unicode characters that backdoor your JavaScript code

The great posts The Invisible JavaScript Backdoor (Certitude Blog) and Invisible characters could be hiding backdoors in your JavaScript code explain how invisible Unicode characters can hide a backdoor in source code. Backdoors can also be introduced with Unicode characters that merely look very similar to ordinary ones. The two techniques are known as “Invisible Character Attacks” and “Homoglyph Attacks”. Tricks of this kind have been around for a while, for example those that abuse the Unicode bidirectional (Bidi) mechanism. As the article states, messing with Unicode to hide vulnerable or malicious code is not a new idea (nor is using invisible characters), and Unicode inherently opens up additional possibilities to obfuscate code. The authors find these tricks quite neat, which is why they wanted to share them. In their experience, non-ASCII characters are pretty rare in code, so it might be a good idea to disallow them altogether. As the article also notes, non-ASCII characters are usually either substituted with normalized ASCII characters (e.g. ä → ae, ß → ss) or removed completely. But how?

My Clipboard PlainText Powertool provides easy text transformations that perform these substitutions. They de-obfuscate JavaScript (or code in any other language) and reveal “Invisible Character Attacks” and “Homoglyph Attacks”.
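To make "revealing" such characters concrete, here is a minimal JavaScript sketch that lists every character outside printable ASCII together with its position. It is not how Clipboard PlainText Powertool works internally; the function name `findSuspiciousChars` and the sample string are hypothetical, and the sketch assumes that tab, LF and CR are the only control characters you want to allow.

```javascript
// Hypothetical helper (not part of any tool mentioned in this post):
// report every character that is neither printable ASCII nor tab/LF/CR.
function findSuspiciousChars(source) {
  const findings = [];
  let line = 1;
  let column = 1;
  for (const ch of source) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    const isAllowedWhitespace = cp === 0x09 || cp === 0x0a || cp === 0x0d;
    const isPrintableAscii = cp >= 0x20 && cp <= 0x7e;
    if (!isPrintableAscii && !isAllowedWhitespace) {
      const hex = cp.toString(16).toUpperCase().padStart(4, '0');
      findings.push({ line, column, codePoint: 'U+' + hex });
    }
    if (cp === 0x0a) {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return findings;
}

// Assumed sample: U+3164 (HANGUL FILLER) renders as blank in many fonts,
// so the destructuring pattern below looks like it has a trailing comma only.
const sample = 'const { timeout,\u3164 } = req.query;';
console.log(findSuspiciousChars(sample)); // reports U+3164 at line 1, column 17
```

A check like this is deliberately strict: it also flags legitimate accented letters in comments and strings, which is the trade-off of disallowing non-ASCII characters outright.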

Here are some transformations you can perform in one click:

  1. Paste ANSI text (ISO-8859-1, Western languages), minus control and non-printable characters
  2. Paste Unicode universal text (all languages), replacing all non-printable characters with ♦
  3. Paste Unicode universal text (all languages), stripping all non-printable characters (most general)
  4. Paste plain ASCII text with normalized substitutions, e.g. Æ ⇒ AE, ß ⇒ ss, è ⇒ e
  5. Paste plain ASCII text, stripping bad control characters and formatting (most restrictive)
  6. Paste plain ASCII text, extended range (Latin-1 Supplement) translated, e.g. Ã ⇒ A(tilde)
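
For readers who want to script something similar, transformations 2, 3 and 4 above can be approximated in plain JavaScript. This is only a sketch under stated assumptions, not the tool's actual implementation: the function names and the tiny substitution table are hypothetical, "non-printable" is assumed to mean Unicode general category Other (`\p{C}`) plus default-ignorable code points, and tab, LF and CR are always kept.

```javascript
// Assumption: "non-printable" = category Other (control, format, unassigned,
// private use, surrogates) plus default-ignorable code points such as U+3164.
const NON_PRINTABLE = /[\p{C}\p{Default_Ignorable_Code_Point}]/gu;
const KEEP = new Set(['\t', '\n', '\r']);

// Transformation 2: make hidden characters visible with a marker.
function replaceNonPrintable(text, marker = '\u2666') { // U+2666 is the diamond
  return text.replace(NON_PRINTABLE, (ch) => (KEEP.has(ch) ? ch : marker));
}

// Transformation 3: remove hidden characters entirely.
function stripNonPrintable(text) {
  return text.replace(NON_PRINTABLE, (ch) => (KEEP.has(ch) ? ch : ''));
}

// Hypothetical, deliberately tiny table for letters that Unicode
// normalization does not decompose; a real table would be much longer.
const ASCII_SUBSTITUTIONS = new Map([
  ['Æ', 'AE'], ['æ', 'ae'], ['ß', 'ss'], ['Ø', 'O'], ['ø', 'o'],
]);

// Transformation 4: reduce text to plain ASCII with normalized substitutions.
function toAscii(text) {
  let substituted = '';
  for (const ch of text) {
    substituted += ASCII_SUBSTITUTIONS.has(ch) ? ASCII_SUBSTITUTIONS.get(ch) : ch;
  }
  return substituted
    .normalize('NFD')                    // split "è" into "e" + combining accent
    .replace(/\p{M}/gu, '')              // drop the combining marks
    .replace(/[^\t\n\r\x20-\x7e]/g, ''); // drop anything still outside ASCII
}

console.log(replaceNonPrintable('a\u200Bb\u202Ec')); // a♦b♦c
console.log(stripNonPrintable('timeout,\u3164 }'));  // timeout, }
console.log(toAscii('Æther, Straße, très'));         // AEther, Strasse, tres
```

Replacing with a marker is the better choice when auditing code, because it shows where something was hidden. Stripping is the better choice when you only want clean text. Note that stripping format characters also removes the zero-width joiners that some scripts and emoji sequences legitimately rely on.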