Notes: Text processing and Regular expressions in Python

In this post, we will learn text processing and regular expression basics in Python by example. We will also learn how to use built-in and external packages, take command-line arguments, and read data from files.
Notes
Execute Python scripts like an executable
Normally, we can execute Python scripts by running python script.py. If we want to run a Python script like an executable from the terminal, we can add the following line as the first line of the script.

Notes on Sed

Sed
  Stream EDitor, UNIX utility
  Based on ed (line-oriented text editor)
  Commonly used for find and replace based on regular expressions
  Useful for processing and transforming logs
  Can also be used inside vim for find and replace
Basic usage
  cat file | sed 's/hello/world/'
  sed 's/hello/world/' file
  sed file -e 's/hello/world/'
  sed 's/hello/world/' -i file    # In place (will replace in the file)
  cat file | sed '/REGEX/d'       # Delete lines matching a regular expression
Regular expression syntax
  Language to represent string patterns
  Useful beyond sed (e.g. grepping through source code)
  Basic regex:
    Specify characters to match
      ‘a’ matches the character a
      [a-z] matches lowercase letters
      [a-zA-Z0-9] matches letters and digits
      [abc] matches the characters a, b, c
      [^abc] matches anything except the characters a, b, c
    Specify count of characters to match
      * -> Zero or more instances
      \+ -> One or more instances
      ?
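To connect these sed notes with the Python post above, here is a minimal sketch of how the substitute and delete commands might be mirrored with Python's re module. The helper names substitute_first and delete_matching are made up for illustration and are not part of sed or Python. Note that the syntax differs slightly: sed's basic regex writes one-or-more as \+, while Python writes it as +.

```python
import re


def substitute_first(lines, pattern, replacement):
    """Roughly mimic sed 's/pattern/replacement/':
    replace only the first match on each line."""
    return [re.sub(pattern, replacement, line, count=1) for line in lines]


def delete_matching(lines, pattern):
    """Roughly mimic sed '/pattern/d':
    drop every line that contains a match."""
    return [line for line in lines if not re.search(pattern, line)]


# Hypothetical sample input, standing in for the lines of a file.
sample = ["hello hello", "say hello", "goodbye"]

print(substitute_first(sample, r"hello", "world"))
print(delete_matching(sample, r"hello"))
```

Like sed without the g flag, substitute_first leaves a second "hello" on the same line untouched; passing count=0 to re.sub would replace every match instead.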

Kernel defences

Usually, when an attacker exploits a vulnerability, the attack starts out as an illegal memory access or a control flow hijack, which the attacker then uses to write to sensitive memory locations or execute arbitrary code in supervisor mode, in an attempt to gain higher privileges on the system. Illegal memory accesses are memory accesses that the programmer did not intend to happen and that allow attackers to read or write certain memory locations. Illegal memory accesses can be classified on three aspects:

Fixing syzbot bugs

Syzbot is an automated fuzzing infrastructure that uses Syzkaller to perform continuous fuzzing, primarily on the Linux kernel. Whenever it finds bugs, Syzbot reports them to the relevant mailing list. It also has a public dashboard that lists all the open bugs that need to be fixed. Syzbot is quite effective at finding bugs in the kernel, but because so many bugs are found, many of them don’t get fixed in time.

Coccinelle

Coccinelle is a static analysis tool used for semantic pattern matching and automated transformation of C programs. It is written in OCaml. Unlike pattern matching tools such as grep, which use regular expressions, Coccinelle understands C syntax. It can find semantic code patterns in the source code and automatically transform them, irrespective of identifier names, comments, or formatting. Coccinelle is intraprocedural, i.e. all its matching and transformation happens within functions.

Finding bugs with Syzkaller

Syzkaller is an unsupervised, grammar-based, coverage-guided fuzzer used for fuzzing operating system kernels. It primarily performs system call fuzzing, but it can also be used for fuzzing USB and network packets. It is currently used for continuous fuzzing of Linux, Android, and BSD kernels.
Automated: Syzkaller can automatically restart crashed virtual machines and also create a reproducer for the crashes.
Coverage guided: Syzkaller gets coverage information using the KCOV infrastructure, which is built into the kernel.
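The idea of coverage guidance can be illustrated with a toy loop. This is only a sketch of the general technique, not Syzkaller's implementation: the target function, its branch labels, and the byte-flipping mutator are all hypothetical, and the coverage set merely stands in for the kind of information KCOV provides. Syzkaller itself mutates system call programs described by a grammar rather than raw bytes.

```python
import random


def target(data: bytes, coverage: set) -> None:
    """Hypothetical program under test. It records each branch it reaches
    in `coverage` and simulates a crash on inputs starting with b"FUZ"."""
    coverage.add("start")
    if len(data) > 0 and data[0] == ord("F"):
        coverage.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                coverage.add("FUZ")
                raise RuntimeError("simulated crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(iterations: int, seed: int = 0):
    """Return a crashing input if one is found, otherwise None."""
    rng = random.Random(seed)
    corpus = [b"AAAA"]   # initial seed input (arbitrary)
    seen = set()         # every branch reached so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = set()
        try:
            target(candidate, coverage)
        except RuntimeError:
            return candidate
        # Keep the input only if it reached a branch not seen before.
        if not coverage <= seen:
            seen |= coverage
            corpus.append(candidate)
    return None


print(fuzz(200_000))
```

Because inputs that reach new branches are kept and mutated further, the loop can build up the crashing prefix one byte at a time, which blind random generation would be very unlikely to do.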

How to do research?

Disclaimer: I’m no expert in this. This post just collects my thoughts and the lessons about research that I have learnt from random talks and blogs.
What is research?
Research is producing new knowledge. The aim of research is to do something novel (new) and useful. The purpose of a literature survey is to ensure that our idea is new and has not been proposed before. The purpose of evaluations is to show that our idea or technique is useful.

Kernel Sanitizers

When fuzzing a program by feeding random inputs to it, we need a mechanism to tell when the program is doing unexpected things. Sanitizers help detect bugs in the program at runtime. They are usually used along with fuzzing to detect bugs in programs.
The two roles of sanitizers:
Detect incorrect program behaviour: like accessing memory that the program is not supposed to access
Report incorrect behaviour: To be useful, the sanitizer needs to report useful information (like the stack trace and ) that makes it easier to understand and fix the bug.

Linux kernel fuzzing

In this post, we’ll see how fuzzing is used to find different types of bugs in the Linux kernel. This post consists of my notes from the talk by Andrey Konovalov about Linux fuzzing. Operating system kernels are complex. Testing kernels is of prime importance, since any vulnerability in the kernel can compromise the whole system. Fuzzing is a dynamic program analysis technique used to find bugs in software.

TLB;DR Reversing TLBs with TLB desynchronization

Yesterday, I read an interesting research paper about reverse engineering TLBs using TLB desynchronization. In this post, I’ll write briefly about the key ideas and what I found very interesting in the paper. You can find the paper here:
TLB;DR: Enhancing TLB-based Attacks with TLB Desynchronized Reverse Engineering
TLB;DR Source code
Reverse engineering CPU internals
In the subfield of hardware security that focuses on communicating (covert channels) or leaking (side channels) critical information using timing or storage channels, accurate information about the CPU internals helps create more efficient and reliable channels.