Hacking & Computer Science stuff

The carriage return (CR) case

I ran into some strange behaviour while testing text input processing functions, and I thought I had found a vulnerability. Here is some feedback and history about character management.

The characters

The history with introduction to security

Representation

Try the experiment: open a file named textbook.txt and write this content:

This is my wonderful textbook!

Save it then display it: cat textbook.txt. You should see your text, nothing exceptional.

Some explanations: as you may know, in most cases each character is stored as a single 8-bit byte, which can be written in hexadecimal.
You can display the hexadecimal representation of your file with xxd:

❯ xxd textbook.txt 
00000000: 5468 6973 2069 7320 6d79 2077 6f6e 6465  This is my wonde
00000010: 7266 756c 2074 6578 7462 6f6f 6b21 0a    rful textbook!.

The T is represented by 54. The end-of-line (Or LF, Line Feed) is represented by 0A.
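
To make the mapping between bytes and hexadecimal concrete, here is a minimal sketch in C of what a tool like xxd does for each byte. The helper name hex_encode is hypothetical (it is not part of xxd), and the caller is assumed to provide a large enough output buffer.

```c
#include <stddef.h>

/* Write the hexadecimal form of `len` bytes from `data` into `out`.
   `out` must have room for 2 * len + 1 bytes.
   hex_encode is a hypothetical helper name used for illustration. */
void hex_encode(const unsigned char *data, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[data[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Calling hex_encode on the five bytes of "This\n" fills the buffer with 546869730a: 54 for the T and 0a for the final LF, as in the xxd output.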

Yes, in the jungle of encoding schemes, you could encounter some outliers, like characters encoded on 7 bits (original ASCII) or on two or more bytes (special characters, emoji, ...). Let's keep it simple today.

Here are the 128 characters of the ASCII table:

Hex  Character
00   NUL
01   SOH
02   STX
03   ETX
04   EOT
05   ENQ
06   ACK
07   BEL
08   BS
09   HT
0A   LF
0B   VT
0C   FF
0D   CR
0E   SO
0F   SI
10   DLE
11   DC1
12   DC2
13   DC3
14   DC4
15   NAK
16   SYN
17   ETB
18   CAN
19   EM
1A   SUB
1B   ESC
1C   FS
1D   GS
1E   RS
1F   US
20   SP
21   !
22   "
23   #
24   $
25   %
26   &
27   '
28   (
29   )
2A   *
2B   +
2C   ,
2D   -
2E   .
2F   /
30   0
31   1
32   2
33   3
34   4
35   5
36   6
37   7
38   8
39   9
3A   :
3B   ;
3C   <
3D   =
3E   >
3F   ?
40   @
41   A
42   B
43   C
44   D
45   E
46   F
47   G
48   H
49   I
4A   J
4B   K
4C   L
4D   M
4E   N
4F   O
50   P
51   Q
52   R
53   S
54   T
55   U
56   V
57   W
58   X
59   Y
5A   Z
5B   [
5C   \
5D   ]
5E   ^
5F   _
60   `
61   a
62   b
63   c
64   d
65   e
66   f
67   g
68   h
69   i
6A   j
6B   k
6C   l
6D   m
6E   n
6F   o
70   p
71   q
72   r
73   s
74   t
75   u
76   v
77   w
78   x
79   y
7A   z
7B   {
7C   |
7D   }
7E   ~
7F   DEL

The unprintable

Notice that there are two categories:

  • Printable characters: a, B, 1, 2, !, etc.
  • Unprintable characters: NUL, BEL, ACK, BS, HT (TAB), ...

Where do the unprintable characters come from? Most of them come from older times, when old computers and teletypes (or teleprinters) were the kings and queens.

[Image: teleprinter]

These characters instruct the interpreter to perform a special behavior, like "Ring a bell" or "Return to the beginning of the text" on a good old typewriter. They are also called control characters. All of them can also be represented in caret notation, and some have a C escape sequence:

Hex  Character  Caret notation  C escape sequence
00   NUL        ^@              \0
01   SOH        ^A
02   STX        ^B
03   ETX        ^C
04   EOT        ^D
05   ENQ        ^E
06   ACK        ^F
07   BEL        ^G              \a
08   BS         ^H              \b
09   HT         ^I              \t
0A   LF         ^J              \n
0B   VT         ^K              \v
0C   FF         ^L              \f
0D   CR         ^M              \r
0E   SO         ^N
0F   SI         ^O
10   DLE        ^P
11   DC1        ^Q
12   DC2        ^R
13   DC3        ^S
14   DC4        ^T
15   NAK        ^U
16   SYN        ^V
17   ETB        ^W
18   CAN        ^X
19   EM         ^Y
1A   SUB        ^Z
1B   ESC        ^[              \e
1C   FS         ^\
1D   GS         ^]
1E   RS         ^^
1F   US         ^_
7F   DEL        ^?
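
The caret notation follows a simple rule: the letter after the ^ is the control character's code with bit 6 flipped (XOR with 0x40). Here is a small sketch in C of that rule; the function name caret_notation is hypothetical and only serves the illustration.

```c
/* Fill `out` (3 bytes) with the caret notation of control character `c`,
   e.g. 0x0D -> "^M". Returns 0 on success, or -1 if `c` is not an
   ASCII control character (00 to 1F, or 7F).
   caret_notation is a hypothetical name used for illustration. */
int caret_notation(unsigned char c, char out[3])
{
    if (c > 0x1f && c != 0x7f)
        return -1;

    out[0] = '^';
    out[1] = (char)(c ^ 0x40); /* flip bit 6: 0x0D becomes 0x4D ('M') */
    out[2] = '\0';
    return 0;
}
```

With this rule, 0D (CR) gives ^M, 00 (NUL) gives ^@ and 7F (DEL) gives ^?, which matches the table.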

The ASCII page on Wikipedia is a gold mine about this.

These characters have been used for more than 50 years and are still widely used today in basic low-level I/O communication.

We may one day talk about (pseudo-)TTY behaviors. In the meantime, you can refer to two excellent articles: The oldest privesc: injecting careless administrators' terminals using TTY pushback, by Guillaume Quéré, and The TTY demystified, by Linus Akesson.

Managing the unprintable characters

When text processing comes in, everything breaks. We need exceptions, awful if/else chains, whitelisting, regexes (they do not hurt, stay here), ... to manage control characters (block, replace, etc.). Beyond this, some characters even have a double meaning (e.g. control character 09, which is also used for text tabulation).
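
As an illustration of the "block" strategy, here is a minimal sketch in C that rejects any field containing an ASCII control character. The name is_clean_field is hypothetical, and the policy is deliberately strict: it also rejects HT (09), which a real application may want to allow. The length is passed explicitly so that an embedded NUL is detected too.

```c
#include <stddef.h>

/* Return 1 if the `len` bytes of `s` contain no ASCII control character
   (00 to 1F, or 7F), and 0 otherwise. Bytes above 7F are accepted.
   is_clean_field is a hypothetical name used for illustration. */
int is_clean_field(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c == 0x7f)
            return 0;
    }
    return 1;
}
```

For example, is_clean_field("Simple user", 11) returns 1, while is_clean_field("Simple\ruser", 11) returns 0.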

Without countermeasures, here are some winners in cybersecurity:

[Image: bomb]

What a nightmare, huh? Because these characters are not natural, developers sometimes ignore them, assuming that control characters are already handled properly at a lower level, which is not always the case (on purpose, by mistake or through ignorance).

A lot of security holes come from missing character management. We are only talking about 1 or 2 bytes. That is beautiful. That is also why I like cybersecurity.

About character testing

I will not especially cover specific injection types like the CRLF one. I think you got it after reading that first part: in fact, all control characters, and by extension unprintable characters, are candidates for injection during text processing. Some of them are more special than others, because they are more prevalent (NUL for strings, LF for lines, ...).

It is always interesting to inject these control characters during critical text processing (Authentication, user management, session management, text editor, etc.).

Some ideas and methods:

  • For HTTP(S), I like to use the Intruder from Burp Pro with numbers or hex list.
  • For serial or specific TCP, I like to run my own boofuzz template or specific library if necessary (scapy, ...).
  • For RPC / IPC, I like implementing my own callers (C, shell, python, ...) but you can also use standard fuzzers (AFLplusplus, syzkaller, ...). msfvenom mixed with badchars generator can also be used for payloads generation, but it is very context-dependent.
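
Whatever the transport, the core of such a test is the same: take a valid value and insert one control byte into it. Here is a minimal sketch in C of that step. The helper inject_byte is hypothetical; it returns a length rather than relying on a terminating NUL, because the injected byte may itself be NUL.

```c
#include <stddef.h>
#include <string.h>

/* Build a test payload by inserting `byte` at offset `pos` of the
   NUL-terminated string `base`. The payload is written to `out` without
   a terminator. Returns the payload length, or 0 if `pos` is out of
   range or `out_size` is too small.
   inject_byte is a hypothetical name used for illustration. */
size_t inject_byte(const char *base, size_t pos, unsigned char byte,
                   unsigned char *out, size_t out_size)
{
    size_t len = strlen(base);

    if (pos > len || out_size < len + 1)
        return 0;

    memcpy(out, base, pos);                       /* bytes before the injection */
    out[pos] = byte;                              /* the injected control byte */
    memcpy(out + pos + 1, base + pos, len - pos); /* bytes after the injection */
    return len + 1;
}
```

Looping over the values 00 to 1F and 7F with this helper gives one candidate payload per control character, ready to be sent through whichever channel is under test.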

The case

The security test

I was testing the security of a user management feature.
An HTTP endpoint permits adding a user to the system, which involves editing the /etc/passwd file (found after some reverse engineering).
Here are the general steps of the current implementation:

  1. API endpoint callable with HTTP POST with parameters like the username, the password, the groups and the user-friendly name (GECOS).
  2. Its implementation calls an internal function which ends up invoking putpwent from glibc (man), in order to edit /etc/passwd.

A vulnerability may be discovered in two ways:

  • Try to find a vulnerability in the workflow between (1) and (2).
  • Try to find an exploit that targets putpwent directly in order to attack the system. This is where I identified some parameters "not well controlled" by the caller.

I prefer the latter, and I also wanted to understand how the putpwent function works.

The Results

In a well-prepared environment, create a simple caller:

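#define _DEFAULT_SOURCE /* feature test macro for putpwent() (glibc 2.19+) */
#include <stdio.h>
#include <pwd.h>

int main(void) {
    /* Append to a local copy, never to the real /etc/passwd. */
    FILE *passwdFile = fopen("./etc_passwd", "a");
    if (passwdFile == NULL) {
        perror("Error opening file");
        return 1;
    }

    struct passwd userEntry = {0};

    userEntry.pw_name = "standard";
    userEntry.pw_passwd = "userpw";
    userEntry.pw_uid = 1001;
    userEntry.pw_gid = 1001;
    userEntry.pw_gecos = "Simple user";
    userEntry.pw_dir = "/home/user1";
    userEntry.pw_shell = "/bin/sh";

    /* putpwent returns 0 on success and -1 on error. */
    if (putpwent(&userEntry, passwdFile) != 0) {
        perror("putpwent");
        fclose(passwdFile);
        return 1;
    }

    fclose(passwdFile);

    return 0;
}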

After executing the previous code, the file etc_passwd will contain:

standard:userpw:1001:1001:Simple user:/home/user1:/bin/sh

Fine. Nominal case.

Now, let's try to inject some control characters and : as it acts as a separator.

  • putpwent refuses to write the entry for \n:
    userEntry.pw_name = "stand\nard";
  • : is replaced by a space in the final file:
    userEntry.pw_gecos = "Simple:user";
  • For \r (CR):
    userEntry.pw_gecos = "Simple\ruser";

I got:

cat etc_passwd 
user:/home/user1:/bin/sh1:Simple

Hm. What? A possible vulnerability in the glibc function that writes password entries? I could not believe it (and I felt like I shouldn’t; this was too easy).

Let's dive into the glibc source code. Indeed, there is just a "simple" filter on the colon and LF characters, in nss/valid_field.c:

#include <nss.h>
#include <string.h>

const char __nss_invalid_field_characters[] = NSS_INVALID_FIELD_CHARACTERS;

/* Check that VALUE is either NULL or a NUL-terminated string which
   does not contain characters not permitted in NSS database
   fields.  */
_Bool
__nss_valid_field (const char *value)
{
  return value == NULL
    || strpbrk (value, __nss_invalid_field_characters) == NULL;
}

Where NSS_INVALID_FIELD_CHARACTERS is # define NSS_INVALID_FIELD_CHARACTERS ":\n".
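
To see what this filter lets through, here is a stand-alone sketch in C that applies the same strpbrk rule with the ":\n" set inlined. The name field_is_valid is hypothetical, since the real function is internal to glibc.

```c
#include <string.h>

/* Same rule as the glibc excerpt: a field is rejected only if it
   contains ':' or '\n'. Every other byte, CR included, is accepted.
   field_is_valid is a hypothetical stand-alone name. */
int field_is_valid(const char *value)
{
    return value == NULL || strpbrk(value, ":\n") == NULL;
}
```

With this rule, field_is_valid("stand\nard") and field_is_valid("Simple:user") both return 0, while field_is_valid("Simple\ruser") returns 1: the CR is not in the rejected set.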

Of course, the vulnerability could only be fully exploited if the functions that read password entries (like getpwent) accept that format.

I managed to create a special passwd file, through a possibly incorrect use of the putpwent function, that resulted in abcd:mypwd:88:0:random.
It could potentially be used as a local privilege escalation vector.

If I create an /etc/passwd file by hand from this result, it works (sudo -u abcd id gives me the IDs of the user). But when I attempt the full exploit chain (really editing the /etc/passwd file through putpwent, then running sudo -u abcd id), it does not: the user does not exist.

WHY?

The failure

I simply forgot not to mix up text processing programs with pure I/O programs.

Let’s go back to our last generated passwd file:

cat etc_passwd 
user:/home/user1:/bin/sh1:Simple

Display with -A:

cat -A etc_passwd
standard:userpw:1001:1001:Simple^Muser:/home/user1:/bin/sh$

I was stunned. Just tricked by (my own) ^M character.

Keep in mind that the historical job of that control character is to do a carriage return.

So, with default options, cat behaves correctly: it writes the bytes as they are, and your tty displays the "printable" characters and acts on the control characters. In this case:

  • Write standard:userpw:1001:1001:Simple
  • Process ^M: go back to the beginning of the buffer (or return to the beginning of the line like a typewriter).
  • Write user:/home/user1:/bin/sh
  • We got the final string: user:/home/user1:/bin/sh1:Simple.
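
The steps above can be reproduced with a few lines of C. This is only a sketch of the overwrite effect, not a terminal emulator: the hypothetical function render_line handles CR alone and copies every other byte as-is.

```c
#include <stddef.h>

/* Simulate how a terminal displays one line that contains CR bytes:
   each byte overwrites the cell under the cursor, and CR moves the
   cursor back to column 0. `out` receives the visible text.
   Returns its length, or (size_t)-1 if `out_size` is too small.
   render_line is a hypothetical name used for illustration. */
size_t render_line(const char *in, char *out, size_t out_size)
{
    size_t cursor = 0; /* current column */
    size_t width = 0;  /* rightmost column written so far */

    if (out_size == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            cursor = 0; /* carriage return: back to the first column */
            continue;
        }
        if (cursor + 1 >= out_size)
            return (size_t)-1;
        out[cursor++] = *in; /* overwrite whatever was displayed here */
        if (cursor > width)
            width = cursor;
    }
    out[width] = '\0';
    return width;
}
```

Given "standard:userpw:1001:1001:Simple\ruser:/home/user1:/bin/sh", the function produces user:/home/user1:/bin/sh1:Simple, the same string that cat displayed: the second part overwrites the first 24 columns and the last 8 columns of the first part remain visible.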

It is also a good reminder that printf should be preferred to echo for I/O. echo is dedicated to text (man), and an LF control character is automatically appended by default (this behavior can be disabled if you pass the option -n). It could mess up some of your scripts. Example:

# Incorrect
echo "wonderful" | base64
d29uZGVyZnVsCg==
# Correct
printf "wonderful" | base64
d29uZGVyZnVs
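
To check where the extra Cg== comes from, here is a minimal Base64 encoder sketched in C (standard alphabet with = padding). The function name base64_encode is hypothetical, and the caller is assumed to provide a large enough output buffer.

```c
#include <stddef.h>

/* Encode `len` bytes of `in` as Base64 into `out`, which must hold
   4 * ((len + 2) / 3) + 1 bytes.
   base64_encode is a hypothetical name used for illustration. */
void base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    while (i + 2 < len) { /* full groups of 3 bytes give 4 characters */
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8) | in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
        i += 3;
    }
    if (len - i == 1) { /* 1 byte left: 2 characters and "==" */
        unsigned v = (unsigned)in[i] << 16;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) { /* 2 bytes left: 3 characters and "=" */
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

Encoding the 9 bytes of "wonderful" gives d29uZGVyZnVs. Encoding the 10 bytes of "wonderful\n" gives d29uZGVyZnVsCg==: the trailing Cg== is nothing more than the single LF byte (0A) that echo appended.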

Conclusion and learnings

Do not mix pure I/O and text processing programs or functions. For the security of your programs and your peace of mind.

Ultimately, prefer checking and comparing function output in a format-agnostic representation (hexadecimal, with xxd), combined with a checksum (sha256sum).

This little experience is a kind of human proof of concept: even with technical knowledge about character management, errors in text processing ((bad) usage, (mis)understanding) can lead to strange outputs, followed by hasty conclusions and the associated bugs. What a waste of time. Now multiply that situation by N, where N is the number of people, languages, programs, fixes, errors and standards that have been around for 50 years.

Character management: an infinite source of work for developers and cybersecurity.

© Sébastien Copin (cosades) 2024