Trojan Source

Abdun Nihaal
https://nihaal.me
11 December 2021 at ILUGC
Creative Commons License

Storytime!

Imagine

You are the maintainer of Reacto

Reacto powers a nearby nuclear reactor

Development through good old email

reacto.c

Someone sends you a patch

From c18868a5183830c814b5ef9e02570800427a10fc Mon Sep 17 00:00:00 2001
From: Dinesh <dinesh@xyz.com>
Date: Fri, 10 Dec 2021 22:54:34 +0530
Subject: [PATCH] Add quote in comments

---
 reacto.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/reacto.c b/reacto.c
index ac98b14..dd38309 100644
--- a/reacto.c
+++ b/reacto.c
@@ -37,6 +37,7 @@ void react() {
 void cool_down() {
    if (reactor_temperature > 200) {
        sleep(2);
+       /* Time is flying never to⁧/*/ return ;
        reactor_temperature -= (55 + random()%5);
    } else {
        sleep(1);

We definitely need more quotes in the source code

Patch Accepted

Next day

You get a call

“After the update, the reactor started overheating. 😨

Thankfully, we stopped it in time. 😐”

What happened?

Unicode

  • Character encoding
  • Has multiple scripts, symbols and emojis
  • Supports Right-to-Left languages

Homoglyphs

  • Different characters that look alike
  • aka Confusables
  • Homoglyphs in ASCII

    L​I​NUX != L​l​NUX

  • Unicode gives us more options

    E.g.: Latin and Cyrillic script

    He​ll​o != Не​ll​о

  • Used for Domain name spoofing
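
The Latin/Cyrillic pair above looks identical on screen but differs byte by byte. A minimal sketch, assuming the text is UTF-8 encoded; the names `LATIN_HELLO`, `CYRILLIC_HELLO` and `has_non_ascii` are hypothetical and only serve the illustration:

```c
#include <stdbool.h>
#include <string.h>

/* Plain Latin "Hello" */
const char LATIN_HELLO[] = "Hello";

/* Looks alike: Cyrillic U+041D and U+0435, Latin "ll", Cyrillic U+043E */
const char CYRILLIC_HELLO[] = "\xD0\x9D\xD0\xB5ll\xD0\xBE";

/* Hypothetical helper: true if any byte lies outside 7-bit ASCII. */
bool has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            return true;
    }
    return false;
}
```

Here `strcmp(LATIN_HELLO, CYRILLIC_HELLO)` is non-zero, and a crude check such as `has_non_ascii()` is enough to flag the second string for a closer look.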

Invisible Characters

  • Zero Width Space character

    “Hi” != “H​i”

  • Bidirectional Control Characters

    Needed for Right-to-Left languages
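
The Zero Width Space comparison above can be reproduced in C. A minimal sketch, assuming UTF-8 strings; the macro `ZWS_UTF8` and the helper `contains_zws` are hypothetical names:

```c
#include <stdbool.h>
#include <string.h>

/* UTF-8 encoding of U+200B ZERO WIDTH SPACE */
#define ZWS_UTF8 "\xE2\x80\x8B"

/* Hypothetical helper: true if the UTF-8 string contains a Zero Width Space. */
bool contains_zws(const char *s)
{
    return strstr(s, ZWS_UTF8) != NULL;
}
```

Both `"Hi"` and `"H" ZWS_UTF8 "i"` look the same when displayed, yet the second one is three bytes longer and `strcmp()` reports them as different.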

Bidirectional Control Characters

Abbreviations

  • LRO: Left to Right Override
  • RLO: Right to Left Override
  • PDF: Pop Directional Formatting
  • LRI: Left to Right Isolate
  • RLI: Right to Left Isolate
  • PDI: Pop Directional Isolate

Bidirectional Control Characters

  • Unicode Bidirectional Algorithm
  • Directional Overrides, Embeddings, Isolates
  • <LRO>, <RLO>
    • Override direction of text
    • Terminated by newline or <PDF>
  • <LRI>, <RLI>
    • Text within overrides with a different direction
    • Terminated by newline or <PDI>
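
The control characters listed above occupy two small code point ranges, U+202A..U+202E and U+2066..U+2069, so they are easy to spot in UTF-8 text. A minimal sketch, assuming UTF-8 input; the function name `contains_bidi_control` is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the UTF-8 string contains a Bidi embedding, override,
 * isolate or pop character:
 *   U+202A..U+202E (LRE, RLE, PDF, LRO, RLO) -> E2 80 AA..AE
 *   U+2066..U+2069 (LRI, RLI, FSI, PDI)      -> E2 81 A6..A9
 */
bool contains_bidi_control(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (size_t i = 0; p[i] != '\0'; i++) {
        if (p[i] != 0xE2)
            continue;
        /* p[i + 2] is only read when p[i + 1] is non-zero */
        if (p[i + 1] == 0x80 && p[i + 2] >= 0xAA && p[i + 2] <= 0xAE)
            return true;
        if (p[i + 1] == 0x81 && p[i + 2] >= 0xA6 && p[i + 2] <= 0xA9)
            return true;
    }
    return false;
}
```

The sketch deliberately ignores other direction-related characters such as the Left-to-Right Mark; it only covers the overrides, embeddings and isolates discussed here.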

Examples

Text, followed by its appearance:
Hi <RLO>!! World<PDF> Hi ‮ !! World‬
Hi <RLO>!! <LRI>World<PDI><PDF> Hi ‮ !! ⁦World⁩‬
Hi <RLO>!! <LRI>World Hi ‮ !! ⁦World
<RLO>}<PDF> ‮}‬

Malicious Use Case

  • Expectation

    ann‮⁦doc⁩.exe
    
  • Reality

    ann<RLO><LRI>doc<PDI>.exe
    

Hides the real file extension
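
To the operating system the file name is just the stored character sequence, so the real extension is whatever follows the last dot, regardless of how a Bidi-aware viewer draws it. A minimal sketch, assuming a UTF-8 file name; `SPOOFED_NAME` and `real_extension` are hypothetical names:

```c
#include <string.h>

/* Stored byte sequence of the spoofed name: ann<RLO><LRI>doc<PDI>.exe */
const char SPOOFED_NAME[] = "ann\xE2\x80\xAE\xE2\x81\xA6" "doc\xE2\x81\xA9.exe";

/* Returns the extension as stored (including the dot), or "" if none. */
const char *real_extension(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    return dot != NULL ? dot : "";
}
```

`real_extension(SPOOFED_NAME)` yields `".exe"`, even though the name may be displayed as something ending in `.doc`.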

Trojan Source

  • Discovered by researchers at the University of Cambridge
  • Public disclosure: 01 November 2021
  • Tricks with Unicode

Idea

  • Sneak Bidi control characters into
    • Comments
    • Strings
  • Code looks different to humans and compilers
  • Supply chain attack (Bypass code review)
  • Affects almost every programming language

1. Homoglyph functions

int function() {
  return 10;
}

int functiоn() {
  return 20;
}

Homoglyph ’o’ in function name

2. Invisible functions

int function() {
  return 10;
}

int f​unction() {
  return 20;
}

Zero Width Space (<ZWS>) in function name

3. Early Returns

  • Expectation

    void cool_down() {
        sleep(2);
        /* Time is flying never to⁧/*/ return ;
        reactor_temperature -= (55 + random()%5);
    }
    
  • Reality

    void cool_down() {
        sleep(2);
        /* Time is flying never to<RLI>/*/ return ;
        reactor_temperature -= (55 + random()%5);
    }
    

4. Commenting out

  • Expectation

    /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
        printf("You are an admin.\n");
    /* end admins only ‮ { ⁦  */
    
  • Reality

    /*<RLO> } <LRI>if (isAdmin)<PDI> <LRI> begin admins only */
        printf("You are an admin.\n");
    /* end admins only <RLO> { <LRI>*/
    
Example taken from the paper

5. Stretched strings

  • Expectation

    char* access_level = "user";
    if (strcmp(access_level, "user‮ ⁦// Check if admin⁩ ⁦")) {
        printf("You are an admin.\n");
    }
    
  • Reality

    char* access_level = "user";
    if (strcmp(access_level, "user<RLO> <LRI>// Check if admin<PDI> <LRI>")) {
        printf("You are an admin.\n");
    }
    
Example taken from the paper
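
The stretched string can be spelled out with explicit escapes to show what the compiler compares against. A minimal sketch, assuming UTF-8 source encoding; the macros and the names `STRETCHED` and `grants_admin` are hypothetical:

```c
#include <string.h>

#define RLO "\xE2\x80\xAE" /* U+202E Right to Left Override */
#define LRI "\xE2\x81\xA6" /* U+2066 Left to Right Isolate */
#define PDI "\xE2\x81\xA9" /* U+2069 Pop Directional Isolate */

/* The literal as the compiler sees it: the "comment" is part of the string. */
const char STRETCHED[] = "user" RLO " " LRI "// Check if admin" PDI " " LRI;

/* Mirrors the condition in the example: non-zero strcmp() means "not equal". */
int grants_admin(const char *access_level)
{
    return strcmp(access_level, STRETCHED) != 0;
}
```

Because `STRETCHED` is never equal to `"user"`, `grants_admin("user")` is true and the admin branch runs for an ordinary user.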

Problems

  • Homoglyphs
  • Invisible Zero Width Space
  • Unterminated bidirectional control characters
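
Unterminated control characters can be found by counting openers and closers on each line. A minimal sketch, assuming one line of UTF-8 text per call; the function name `has_unterminated_bidi` is hypothetical, and the pairing is simplified compared with the full Unicode Bidirectional Algorithm:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if a single line of UTF-8 text opens more Bidi
 * embeddings/overrides or isolates than it closes.
 */
bool has_unterminated_bidi(const char *line)
{
    const unsigned char *p = (const unsigned char *)line;
    int open_formatting = 0; /* LRE, RLE, LRO, RLO awaiting PDF */
    int open_isolates = 0;   /* LRI, RLI, FSI awaiting PDI */

    for (size_t i = 0; p[i] != '\0'; i++) {
        if (p[i] != 0xE2)
            continue;
        if (p[i + 1] == 0x80) {
            unsigned char c = p[i + 2];
            if (c == 0xAA || c == 0xAB || c == 0xAD || c == 0xAE)
                open_formatting++;
            else if (c == 0xAC && open_formatting > 0)
                open_formatting--;
        } else if (p[i + 1] == 0x81) {
            unsigned char c = p[i + 2];
            if (c >= 0xA6 && c <= 0xA8)
                open_isolates++;
            else if (c == 0xA9 && open_isolates > 0)
                open_isolates--;
        }
    }
    return open_formatting > 0 || open_isolates > 0;
}
```

A properly closed sequence such as `Hi <RLO>!! World<PDF>` passes, while a line that leaves an override or isolate open is reported.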

Defences

Lines of defence

  1. Awareness
  2. Compilers
  3. Editors
  4. Build Pipelines

Awareness

If you don’t know it’s possible, you’re easy to trick

Compilers

  • Warn if homoglyphs or Bidi control characters are present

    warning: identifier pair considered confusable between
    `say_hello` and `say_һello`
     --> homoglyph-function.rs:5:4
      |
    1 | fn say_hello() {
      |    --------- this is where the previous identifier occurred
    ...
    5 | fn say_һello() {
      |    ^^^^^^^^^
      |
    
  • Make programmers aware

    PEP 672 Unicode-related Security Considerations

Build pipelines

Either warn or abort build

  • GitHub (screenshot: trojan_source_github.png)
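
A build step can run a small scanner over each source file and warn or abort when it finds Bidi control characters. A minimal sketch, assuming UTF-8 source files; the function name `count_bidi_lines` is hypothetical and is not taken from any existing tool:

```c
#include <stdio.h>

/*
 * Counts the lines in `in` that contain a Bidi control character
 * (U+202A..U+202E or U+2066..U+2069, matched on their UTF-8 bytes).
 */
long count_bidi_lines(FILE *in)
{
    long flagged = 0;
    int line_hit = 0;
    int b1 = 0, b2 = 0; /* the two previously read bytes */
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (b1 == 0xE2 &&
            ((b2 == 0x80 && c >= 0xAA && c <= 0xAE) ||
             (b2 == 0x81 && c >= 0xA6 && c <= 0xA9)))
            line_hit = 1;
        if (c == '\n') {
            flagged += line_hit;
            line_hit = 0;
        }
        b1 = b2;
        b2 = c;
    }
    return flagged + line_hit; /* the last line may lack a trailing newline */
}
```

A pipeline wrapper could open each changed file with `fopen()`, call `count_bidi_lines()`, and exit with a non-zero status when the result is greater than zero, which either prints a warning or fails the build depending on how the step is configured.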

Editors

  1. Don’t render Unicode (Vim)
  2. Warn users about tricky characters (Emacs)

Story (contd.)

Reactor working fine

You delete the comment and Push

Things are back to normal now

Another day, Another patch

From 31b5edad4faa8be727d5b4cb71fbba8c7f00d77e Mon Sep 17 00:00:00 2001
From: Dave <dave@abc.com>
Date: Fri, 17 Dec 2021 21:33:23 +0530
Subject: [PATCH] Add Christmas greetings

---
 reacto.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/reacto.c b/reacto.c
index ac98b14..0f3a1e0 100644
--- a/reacto.c
+++ b/reacto.c
@@ -59,7 +59,9 @@ int main() {
            /* If execution reaches this point, We're doomed anyway */
            self_destruct();
        }
+       /* xmas     ‮ ⁦printf("     Merry Christmas !!   ");⁩ ⁦*​/⁩
        cool_down();
+       /* new year ‮ ⁦printf(" And a Happy new year !!! ");⁩ ⁦*/
    }
    return 0;
 }

But it’s Christmas time anyway.

Patch Accepted

Next day


What happened to the reactor?

  • Expectation

    /* xmas ‮ ⁦printf("Merry Christmas!!");⁩ ⁦*​/⁩
    cool_down();
    /* new year ‮ ⁦printf("And a Happy new year!!!");⁩ ⁦*/
    
  • Reality (cool_down() is commented out)

    /* xmas <RLO> <LRI>printf("Merry Christmas!!");<PDI> <LRI>*<ZWS>/<PDI>
    cool_down();
    /* new year <RLO> <LRI>printf("And a Happy new year!!!");<PDI> <LRI>*/
    

Thanks for listening, Questions?
