Notes, musings and what not
Notes: Text processing and Regular expressions in Python
In this post, we will learn the basics of text processing and regular expressions in Python by example. We will also learn how to use built-in and external packages, take command-line arguments and read data from files.
Notes
Execute Python scripts like an executable
Normally, we execute Python scripts by running python script.py. If we want to run a script like an executable on the terminal, we can add the following line as the first line of the script.
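The line itself is not shown in this excerpt. A common choice, assumed here rather than taken from the post, is a shebang line such as #!/usr/bin/env python3, combined with giving the file execute permission (chmod +x script.py). A minimal sketch, in which the greet helper is hypothetical:

```python
#!/usr/bin/env python3
# Assumed shebang: asks env to find python3 on PATH and run this file with it.
import sys


def greet(argv):
    """Build a greeting from the command-line arguments (hypothetical helper)."""
    name = argv[1] if len(argv) > 1 else "world"
    return f"hello {name}"


if __name__ == "__main__":
    print(greet(sys.argv))
```

With the file marked executable, it could then be started as ./script.py instead of python script.py.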
Notes on Sed
Sed
- Stream EDitor, UNIX utility
- Based on ed (line oriented text editor)
- Used commonly for find and replace based on Regular expressions
- Useful for processing and transforming logs
- Can also be used inside vim for find and replace
Basic usage
cat file | sed 's/hello/world/'
sed 's/hello/world/' file
sed -e 's/hello/world/' file
sed -i 's/hello/world/' file # In place (will replace in the file)
cat file | sed '/REGEX/d' # Delete lines matching a regular expression
Regular expression syntax
- Language to represent string patterns
- Useful beyond sed (eg: grepping through source code)
- Basic regex:
  - Specify characters to match
    - ‘a’ matches the character a
    - [a-z] matches lowercase letters
    - [a-zA-Z0-9] matches letters and digits
    - [abc] matches any one of the characters a, b or c
    - [^abc] matches any character except a, b or c
  - Specify count of characters to match
    - * -> Zero or more instances
    - \+ -> One or more instances
    - \? -> Zero or one instance
    - \{8\} -> Matches exactly 8 instances
    - \{1,3\} -> Matches 1 to 3 instances
    - \{3,\} -> Matches 3 or more instances
    - The backslashes are needed in sed's default (basic) syntax; with sed -E, write +, ?, {8}, | and ( ) without them
  - Special characters
    - . -> Matches any character
    - ^ -> Matches the beginning of the line
    - $ -> Matches the end of the line
- Combining regex
  - ‘regex1regex2’ matches regex1 first, then regex2 (Concatenation)
  - ‘regex1\|regex2’ matches either regex1 or regex2 (Choice)
- Backreferences
  - Used for selecting a part of the matched string to be reused when transforming text
  - Enclose parts of the regex in \( .. \) parentheses
  - Refer to the corresponding matches in the replacement using \1, \2, etc. for the first, second group respectively.
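The same building blocks can be tried out in Python's re module. This is only a sketch for experimenting with the syntax, not something taken from the sed notes; note that Python writes +, ?, {n}, | and ( ) without the backslashes used in sed's basic syntax, and the sample strings are made up.

```python
import re

# Character classes
words = re.findall(r"[a-z]+", "abc 123 def")       # runs of lowercase letters
others = re.findall(r"[^abc]", "aXbYc")            # anything except a, b, c

# Counts: 1 to 3 digits, and nothing else, in the whole string
short_number = bool(re.fullmatch(r"[0-9]{1,3}", "42"))
long_number = bool(re.fullmatch(r"[0-9]{1,3}", "12345"))

# Choice combined with the line anchors ^ and $
ends_with_dog = bool(re.search(r"^cat|dog$", "hot dog"))

# Backreferences: capture two words, then swap them in the replacement
swapped = re.sub(r"([a-z]+) ([a-z]+)", r"\2 \1", "hello world")

print(words, others, short_number, long_number, ends_with_dog, swapped)
```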
Sed command syntax
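ADDRESS { commands; ...}
Address
- Selects (filters) the lines that we want to process (the sed manual calls this the address; the "pattern space" is the buffer holding the line currently being processed)
- Can be specified as line numbers, line number ranges or regular expressions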
sed '7d' file # Delete line number 7
sed '1,10d' file # Delete lines 1 to 10
sed '/REGEX/d' file # Delete lines matching REGEX
sed '3,/REGEX/d' file # Delete from the 3rd line to the next line matching REGEX
sed '/REGEX1/,/REGEX2/d' file # Delete lines starting from a line matching REGEX1 up to the next line matching REGEX2
sed '/REGEX1/,$d' file # Delete lines starting from a line matching REGEX1 till the end of the file
sed '/REGEX1/ {/REGEX2/d}' file # Delete lines matching both REGEX1 and REGEX2
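A range such as /REGEX1/,/REGEX2/ can be easier to follow when written out as a loop. The sketch below is a rough Python model of the delete command over a regex-to-regex range; delete_range is a hypothetical helper, it uses Python's regex syntax rather than sed's, and it does not cover line-number ranges.

```python
import re


def delete_range(lines, start_re, end_re):
    """Drop every line from one matching start_re through the next one matching end_re."""
    start, end = re.compile(start_re), re.compile(end_re)
    kept, in_range = [], False
    for line in lines:
        if in_range:
            # Inside a range: the line is deleted, and the range closes on an end match.
            if end.search(line):
                in_range = False
            continue
        if start.search(line):
            # The range opens here; the end pattern is only tested on later lines.
            in_range = True
            continue
        kept.append(line)
    return kept


log = ["a", "BEGIN", "b", "END", "c"]
print(delete_range(log, r"BEGIN", r"END"))
```

If no later line matches the end pattern, everything from the start line to the end of the input is dropped, which mirrors how sed treats an unterminated range.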
Substitute command for find and replace
# Finds text matching REGEX and replaces it with REPLACEMENT
cat file | sed 's/REGEX/REPLACEMENT/[flags]'
# Optional flags
# i to ignore case
# g to replace every instance of the pattern in a line, not just the first
# c to confirm every replacement (only available in vim's substitute command, not in sed)
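To see what the i and g flags change, here is a small Python model of the substitute command applied to a single line. The substitute function is a hypothetical helper built on re.sub, so it follows Python's regex and replacement syntax rather than sed's.

```python
import re


def substitute(line, pattern, replacement, flags=""):
    """Mimic sed's substitute command on one line (i and g flags only)."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    # sed replaces only the first match unless g is given;
    # in re.sub, count=0 means "replace every match".
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, line, count=count, flags=re_flags)


print(substitute("hello hello", "hello", "world"))       # first match only
print(substitute("hello hello", "hello", "world", "g"))  # every match
print(substitute("Hello", "hello", "world", "i"))        # case-insensitive
```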
Other commands
# Delete command
cat file | sed '/REGEX/d' # Delete lines matching a regular expression
# Print command
cat file | sed -n '10p' # Print 10th line
cat file | sed -n '/REGEX/p' # Grep
# Transliterate command
cat file | sed 'y/abc/ABC/' # Transliterate (a->A, b->B, c->C)
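The print and transliterate commands map onto short Python equivalents. This is a sketch with hypothetical helper names, using Python's regex syntax; as with sed's y command, the two character lists passed to transliterate must have the same length.

```python
import re


def print_matching(lines, pattern):
    """Like sed -n '/REGEX/p': keep only the lines that match."""
    return [line for line in lines if re.search(pattern, line)]


def transliterate(line, source, dest):
    """Like sed 'y/source/dest/': map each character in source to the one at the same position in dest."""
    return line.translate(str.maketrans(source, dest))


print(print_matching(["foo", "bar", "food"], r"foo"))
print(transliterate("aabbcc cab", "abc", "ABC"))
```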
For more information, run info sed in the shell to read the sed manual.
Kernel defences
Usually when an attacker exploits a vulnerability, the attack starts out as an illegal memory access or a control flow hijack, which the attacker then uses to write to sensitive memory locations or execute arbitrary code in supervisor mode, in order to escalate privileges on the system.
- Illegal memory accesses are memory accesses which the programmer didn’t intend to happen, and which allow attackers to read or write some memory locations.
  Illegal memory accesses can be classified on three aspects:
  - Read or write access
  - Access to an arbitrary address or a restricted address
  - If it is a write, an arbitrary value or a restricted value write
  Write accesses and arbitrary address/value writes are more serious bugs, as they give attackers more control over where or what they can write, making it easier to subvert execution.
- In a control flow hijack, the vulnerability provides a way to divert execution into an attacker-controlled path. For instance, when an attacker controls the value of a function pointer, she can hijack control flow when that function pointer is dereferenced. Control flow hijack can happen either on the forward edge (when a function is called) or on the backward edge (when a function returns).
Despite having vulnerabilities that allow illegal writes or control flow hijack, the kernel has a few defence mechanisms in place to make it difficult to convert a vulnerability into a useful attack.
…
Fixing syzbot bugs
Syzbot is an automated fuzzing infrastructure that uses Syzkaller to perform continuous fuzzing, primarily on the Linux kernel. Whenever it finds bugs, Syzbot reports them to the relevant mailing list. It also has a public dashboard where it lists all the open bugs that need to be fixed.
Syzbot is quite effective at finding bugs in the kernel, but because so many are found, many of them don’t get fixed in time, so we can help fix them. In this post, I’ll share the general approaches and steps in fixing Syzbot bugs.
…
Coccinelle
Coccinelle is a static analysis tool used for semantic pattern matching and automated transformation of C programs. It is written in OCaml. Unlike pattern matching tools such as grep, which use regular expressions, Coccinelle understands C syntax and can find semantic code patterns in the source code and automatically transform them, irrespective of identifier names, comments or formatting.
Coccinelle is intraprocedural, i.e. all its matching and transformation happens within functions. Coccinelle also does not expand C macros.
…