Formatting hexdump’s output

hexdump is a tool that dumps a file (or whatever is piped to it) as hexadecimal. I personally like using it with the “canonical” (hex + ASCII) option, -C, like this:

https://gist.github.com/SenpaiSilver/84f1f755ee3cc426ad503f856ee81351
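
As a quick illustration (a minimal sketch, not the contents of the gist above), the canonical view can be tried on a short string piped through standard input. The sample text is an arbitrary choice, picked to be exactly 16 bytes so that it fills one output line.

```shell
# Pipe 16 bytes of arbitrary sample text into hexdump and ask for the
# canonical (hex + ASCII) view.
printf 'Hello, hexdump!\n' | hexdump -C
# 00000000  48 65 6c 6c 6f 2c 20 68  65 78 64 75 6d 70 21 0a  |Hello, hexdump!.|
# 00000010
```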

While reading the man page I saw that I could specify a format string (-e) or a format file (-f) to change the output. Great!
I can actually control what I want to see.

Understanding the output

A format file is just a plain text file. We will now try to recreate the “canonical” output. First of all, we need to analyse what that output contains and how it is laid out:

hexdump -C Made_in_paint_with_love

We can see that there are 3 sections:

  • Offset;
  • Hex data;
  • ASCII data.

The offset is 8 characters wide, the hex data is 48 characters wide, and the ASCII data is 16 characters wide, or 18 with the | separators. Sections are separated by two spaces, and the same two-space gap splits the hex data in half.

The offset is zero-padded to 8 digits, and each byte in the hex data is zero-padded to 2 digits.
This is particularly important so we can get the padding right.
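
The widths above can be checked directly. The following is a small sketch that slices one canonical line by character position with cut. The sample input is an arbitrary 16-byte string, and the column ranges are simply derived from the widths listed above.

```shell
# Grab the first line of canonical output for 16 bytes of arbitrary sample text.
line=$(printf 'Hello, hexdump!\n' | hexdump -C | head -n 1)

offset=$(printf '%s\n' "$line" | cut -c1-8)    # 8 characters
hex=$(printf '%s\n' "$line" | cut -c11-58)     # 48 characters, after a 2-space gap
ascii=$(printf '%s\n' "$line" | cut -c61-78)   # 18 characters, including both | separators

printf 'offset: [%s]\nhex:    [%s]\nascii:  [%s]\n' "$offset" "$hex" "$ascii"
printf 'total width: %s\n' "${#line}"          # 8 + 2 + 48 + 2 + 18 = 78
```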

Creating the format file

Use your favorite text editor and/or operating system. I won’t judge you.
The best way to reproduce the same output is to use the same input, so start writing your favorite quotes from your favorite shows.
I’ll be using some data I got from sniffing Republic Commando’s sockets.

It helps to be familiar with printf format specifiers.

The first element we need to show is the offset, which is available through the special conversion _ax (written %_ax). Format strings must be surrounded by double quotes. Since we are printing hex numbers in the hex section we will use %02x, so that each byte is always printed as 2 hex digits, even when it is zero.
For the ASCII section we will use _p (written %_p), which prints printable characters as they are and everything else as a dot.
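
A first attempt along these lines could look like the sketch below. It is not the gist's exact content: the sample input and the choice of passing the format with -e instead of a file are assumptions made for illustration. Note that the hex part and the ASCII part sit in two separate -e format strings, because each format string is applied to the same block of input, whereas format units inside one string consume the bytes one after another.

```shell
# First attempt: unpadded offset, 16 hex bytes, then the same 16 bytes as text.
# Each -e is its own format string, and every format string sees the same block.
first_try=$(printf 'Hello, hexdump!\n' |
  hexdump -e '"%_ax  " 16/1 "%02x "' -e '"  |" 16/1 "%_p" "|\n"')
printf '%s\n' "$first_try"
```

The offset comes out as a bare hex number with no leading zeros, and the hex data is not yet split in half.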

https://gist.github.com/SenpaiSilver/1b22768ccaca7bb1dbd6dd8ed7b93415

Close enough. The offset needs leading zeros. Let’s zero-pad it to 8 digits, the same way we zero-padded the hex bytes to 2. Let’s also adjust the padding on the ASCII section.

https://gist.github.com/SenpaiSilver/feec6274663338d82a8cea0b2174579a

That’s very close. But we don’t have the last line of the “canonical” output, which is the final offset and is printed with _Ax. We’re also missing the gap that splits the hex data in half. To split the data in half we can print each half separately.

https://gist.github.com/SenpaiSilver/f0af132b6f85cf910654ffd074ea7ef6

That’s it!
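
To check the result, the custom format can be compared against -C on the same input. This is a sketch under a few assumptions: the format is passed as three -e strings instead of a format file, and the sample data and temporary file are arbitrary.

```shell
# Write some arbitrary sample data (the last line is deliberately partial).
sample=$(mktemp)
printf 'Hello, hexdump!\nSecond line of sample data.\n' > "$sample"

# Three format strings: final offset, offset + two hex halves, ASCII column.
custom=$(hexdump \
  -e '"%08_Ax\n"' \
  -e '"%08_ax  " 8/1 "%02x " "  " 8/1 "%02x "' \
  -e '"  |" 16/1 "%_p" "|\n"' \
  "$sample")
reference=$(hexdump -C "$sample")

# Compare the hand-written format with the built-in canonical output.
if [ "$custom" = "$reference" ]; then
  echo "outputs match"
else
  echo "outputs differ"
fi
rm -f "$sample"
```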

Going further

There is not much to say. This is what the “canonical” output prints. The hardest part was probably understanding how to use the buffer.

https://gist.github.com/SenpaiSilver/7738556f3552505a48f89eb466bcf6c3

This is the final format file. Instead of hard-coding the padding as a literal space, it is possible to put a field width in front of the precision: "%3.02x" is equivalent to " %02x".
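
The sketch below shows what a format file using the width-based padding might look like. It is an illustration rather than the gist's file: the temporary file name comes from mktemp, the sample input is arbitrary, and the literal spaces between units were adjusted by hand so that the columns line up with -C.

```shell
# Write a hypothetical format file; each line is one format string.
fmt_file=$(mktemp)
cat > "$fmt_file" <<'EOF'
"%08_Ax\n"
"%08_ax " 8/1 "%3.02x" " " 8/1 "%3.02x"
"  |" 16/1 "%_p" "|\n"
EOF

# Run the same 16 bytes through the format file and through -C.
from_file=$(printf 'Hello, hexdump!\n' | hexdump -f "$fmt_file")
canonical=$(printf 'Hello, hexdump!\n' | hexdump -C)

if [ "$from_file" = "$canonical" ]; then
  echo "format file matches -C"
else
  echo "format file differs from -C"
fi
rm -f "$fmt_file"
```

Because "%3.02x" carries its own leading space, the separators shrink by one space each: one space after the offset and one between the two halves.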

We are also printing each byte by specifying 8/1, which means: apply the format 8 times, consuming 1 byte each time. Writing 8/2 instead applies the format 8 times to 2 bytes each time, so it consumes the next 16 bytes as 8 groups of 2.
This lets you treat the input as blocks of arbitrary length.
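
The difference between the two is easy to see side by side. In this sketch the inputs are arbitrary, and the second one uses doubled letters on purpose: with 2-byte groups the order of the two bytes inside each group depends on the machine's byte order, and doubled bytes keep the output the same either way.

```shell
# 8/1: apply "%02x " eight times, one byte per iteration (8 bytes per line).
bytes=$(printf 'ABCDEFGH' | hexdump -e '8/1 "%02x " "\n"')
printf '%s\n' "$bytes"
# 41 42 43 44 45 46 47 48

# 8/2: apply "%04x " eight times, two bytes per iteration (16 bytes per line).
words=$(printf 'AABBCCDDEEFFGGHH' | hexdump -e '8/2 "%04x " "\n"')
printf '%s\n' "$words"
# 4141 4242 4343 4444 4545 4646 4747 4848
```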