The C programming language allows developers to directly access the memory where variables are stored. Ruby does not allow that. There are times while working in Ruby when you need to access the underlying bits and bytes. Ruby provides two methods, `pack` and `unpack`, for that.
Here is an example.
~~~ruby
> 'A'.unpack('b*')
=> ["10000010"]
~~~
In the above case 'A' is a string, and I am using `unpack` to read its bits. The ASCII table says that the ASCII value of 'A' is 65, and the binary representation of 65 is `01000001`. The output `10000010` contains those same eight bits in the opposite order.
Here is another example.
~~~ruby
> 'A'.unpack('B*')
=> ["01000001"]
~~~

Notice the difference in the result from the first case. What's the difference between `b*` and `B*`? In order to understand the difference, let's first discuss MSB and LSB.
## Most significant bit vs Least significant bit

Not all bits are created equal. `C` has an ASCII value of 67. The binary value of 67 is `1000011`.

First let's discuss MSB (most significant bit) style. In MSB style, reading from left to right (and you always read from left to right), the most significant bit comes first. Because the most significant bit comes first, we can pad an additional zero on the left to bring the number of bits to eight. After adding the zero on the left, the binary value looks like `01000011`.

To convert this value to LSB (least significant bit) style, we need to store the least significant bit first, again going from left to right. Given below is how the bits move when converting from MSB to LSB. Note that position 1 refers to the leftmost bit.
{% highlight text %}
move value 1 from position 8 of MSB to position 1 of LSB
move value 1 from position 7 of MSB to position 2 of LSB
move value 0 from position 6 of MSB to position 3 of LSB
and so on and so forth
{% endhighlight %}

After the exercise is over the value will look like `11000010`.

We did this exercise manually to understand the difference between `most significant bit` and `least significant bit`. However unpack method can directly give the result in both MSB and LSB. The `unpack` method can take both `b*` and `B*` as the input. As per the ruby documentation here is the difference.

{% highlight text %}
B | bit string (MSB first)
b | bit string (LSB first)
{% endhighlight %}
Now let's take a look at two examples.
~~~ruby
> 'C'.unpack('b*')
=> ["11000010"]

> 'C'.unpack('B*')
=> ["01000011"]
~~~
Both b* and B* are looking at the same underlying data. It's just that they represent the data differently.
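As a quick check of that claim, here is a minimal sketch that takes the MSB-first string for a multi-byte input, reverses the bits inside each byte, and compares the result with the LSB-first string. The variable names are only illustrative.

~~~ruby
# Read the same string in both bit orders.
msb = "hello".unpack('B*').first   # MSB first
lsb = "hello".unpack('b*').first   # LSB first

# Split the MSB string into 8-bit groups and reverse each group.
reversed_per_byte = msb.scan(/.{8}/).map(&:reverse).join

puts reversed_per_byte == lsb   # prints true
~~~

The reversal happens within each byte only; the order of the bytes themselves does not change.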
## Different ways of getting the same data

Let's say that I want the binary value of the string `hello`. Based on the discussion in the last section, that should be easy now.

~~~ruby
> "hello".unpack('B*')
=> ["0110100001100101011011000110110001101111"]
~~~
The same information can also be derived byte by byte (note that `to_s 2` drops each byte's leading zero):

~~~ruby
> "hello".unpack('C*').map {|e| e.to_s 2}
=> ["1101000", "1100101", "1101100", "1101100", "1101111"]
~~~

Let's break down the previous statement into small steps.

~~~ruby
> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]
~~~

Directive `C*` gives the 8-bit unsigned integer value of each byte. Note that the ASCII value of `h` is 104, the ASCII value of `e` is 101, and so on.
Using the technique discussed above, I can find the hex value of the string.

~~~ruby
> "hello".unpack('C*').map {|e| e.to_s 16}
=> ["68", "65", "6c", "6c", "6f"]
~~~

The hex value can also be obtained directly.

~~~ruby
> "hello".unpack('H*')
=> ["68656c6c6f"]
~~~
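To confirm that the byte-by-byte route and the direct directives agree, here is a small sketch. It assumes each value is padded to a fixed width with `rjust`, because `Integer#to_s` omits leading zeros.

~~~ruby
bytes = "hello".unpack('C*')

# Pad each byte to 8 binary digits before joining.
binary = bytes.map { |b| b.to_s(2).rjust(8, '0') }.join

# Pad each byte to 2 hex digits for the same reason.
hex = bytes.map { |b| b.to_s(16).rjust(2, '0') }.join

puts binary == "hello".unpack('B*').first   # prints true
puts hex == "hello".unpack('H*').first      # prints true
~~~

For "hello" the hex padding never kicks in, but it would matter for any byte below 16, such as a newline.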
## High nibble first vs Low nibble first

Notice the difference in the below two cases.

~~~ruby
> "hello".unpack('H*')
=> ["68656c6c6f"]

> "hello".unpack('h*')
=> ["8656c6c6f6"]
~~~
As per ruby documentation for unpack
{% highlight text %}
H | hex string (high nibble first)
h | hex string (low nibble first)
{% endhighlight %}

A byte consists of 8 bits. A nibble consists of 4 bits. So a byte has two nibbles. The ASCII value of 'h' is `104`. The hex value of 104 is `68`. This `68` is stored in two nibbles. The first nibble, meaning 4 bits, contains the value `6` and the second nibble contains the value `8`. In general we deal with high nibble first, and going from left to right we pick the value `6` and then `8`.

However if you are dealing with low nibble first then the low nibble value `8` will take the first slot and then `6` will come. Hence the result in "low nibble first" mode will be `86`.

This pattern is repeated for each byte. And because of that a hex value of `68 65 6c 6c 6f` looks like `86 56 c6 c6 f6` in low nibble first format.

## Mix and match directives

In all the previous examples I used `*`. A `*` means to keep going until the input runs out. Let's see a few examples.

A single `C` will get a single byte.

~~~ruby
> "hello".unpack('C')
=> [104]
~~~
You can add more Cs if you like.
~~~ruby
> "hello".unpack('CC')
=> [104, 101]

> "hello".unpack('CCC')
=> [104, 101, 108]

> "hello".unpack('CCCCC')
=> [104, 101, 108, 108, 111]
~~~

Rather than repeating the directive, I can add a number to denote how many times the previous directive should be repeated.

~~~ruby
> "hello".unpack('C5')
=> [104, 101, 108, 108, 111]
~~~

I can use `*` to capture all the remaining bytes.

~~~ruby
> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]
~~~
Below is an example where MSB and LSB are being mixed.
~~~ruby
> "aa".unpack('b8B8')
=> ["10000110", "01100001"]
~~~
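Counts and different directives can be combined in one format string. The sketch below is illustrative only; it uses the `a` directive (an arbitrary binary string), which has not been covered above, to collect whatever is left after the counted bytes.

~~~ruby
# Two bytes as integers, then the rest as a raw string.
ints_then_rest = "hello".unpack('C2a*')
p ints_then_rest   # => [104, 101, "llo"]

# First byte as two hex digits, remaining bytes as integers.
hex_then_ints = "hello".unpack('H2C*')
p hex_then_ints    # => ["68", 101, 108, 108, 111]
~~~

Each directive consumes its share of the input and the next directive continues from where the previous one stopped.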
## pack is reverse of unpack

The `pack` method does the opposite of `unpack`: it takes an array of values and writes them into a binary string. Let's discuss a few examples.
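~~~ruby
> [0b1000001].pack('C')
=> "A"
~~~

In the above case the binary literal `0b1000001` (decimal 65) is packed as an 8-bit unsigned integer, and the result is 'A'. The `0b` prefix matters: a bare `1000001` is the decimal number one million and one, and `pack('C')` would keep only its low 8 bits.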
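Since `pack` is the reverse of `unpack`, feeding the output of one into the other with the same directive should give back the original string. A minimal round-trip sketch:

~~~ruby
original = "hello"

# Bytes out, bytes back in.
bytes = original.unpack('C*')    # [104, 101, 108, 108, 111]
rebuilt_from_bytes = bytes.pack('C*')

# Bit string out, bit string back in.
bits = original.unpack('B*')
rebuilt_from_bits = bits.pack('B*')

puts rebuilt_from_bytes == original   # prints true
puts rebuilt_from_bits == original    # prints true
~~~

The packed strings carry a binary encoding rather than UTF-8, but for plain ASCII content the comparison still succeeds.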
~~~ruby
> ['A'].pack('H')
=> "\xA0"
~~~

In the above case the input 'A' is not ASCII 'A' but hex 'A'. Why is it hex 'A'? Because the directive `H` tells `pack` to treat the input as a hex string. Since `H` is high nibble first and the input has only one nibble, the second nibble is zero. So the input effectively changes from `['A']` to `['A0']`.

Since hex value `A0` is outside the ASCII range (which ends at hex `7F`), the final output is left as is, and hence the result is `\xA0`. The leading `\x` indicates that the value is written in hex.

Notice that in hex notation `A` is the same as `a`. So in the above example I can replace `A` with `a` and the result should not change. Let's try that.

~~~ruby
> ['a'].pack('H')
=> "\xA0"
~~~
Let's discuss another example.
~~~ruby
> ['a'].pack('h')
=> "\n"
~~~

In the above example notice the change. I changed the directive from `H` to `h`. Since `h` means low nibble first and the input has only one nibble, the input value is treated as the low nibble and the high nibble becomes zero. Written in the usual high-nibble-first notation, the value changes from `['a']` to `['0a']`, and the output is `\x0A`. If you look at the ASCII table, hex `0A` is decimal 10, which is NL (line feed, new line). Hence we see `\n` as the output.
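One way to see which nibble a lone hex digit ends up in is to pack it with each directive and then read the resulting byte back with `H*`. This is a small sketch with illustrative variable names:

~~~ruby
high_first = ['a'].pack('H')
low_first  = ['a'].pack('h')

p high_first.unpack('H*')   # => ["a0"]
p low_first.unpack('H*')    # => ["0a"]
p low_first == "\n"         # => true

# Full strings round-trip as long as the directive matches the digit order.
from_high = ['68656c6c6f'].pack('H*')
from_low  = ['8656c6c6f6'].pack('h*')
p from_high   # => "hello"
p from_low    # => "hello"
~~~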
## Usage of unpack in Rails source code

I did a quick grep in the Rails source code and found the following uses of `unpack`.
{% highlight text %}
email_address_obfuscated.unpack('C*')
'mailto:'.unpack('C*')
email_address.unpack('C*')
char.unpack('H2')
column.class.string_to_binary(value).unpack("H*")
data.unpack("m")
s.unpack("U*")
{% endhighlight %}

We have already seen the directives `C*` and `H` used with unpack. The directive `m` decodes a Base64-encoded string, and the directive `U*` returns the Unicode code point of each UTF-8 character. Here is an example.

~~~ruby
> "Hello".unpack('U*')
=> [72, 101, 108, 108, 111]
~~~
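For plain ASCII text `U*` and `C*` return the same numbers, so the difference only shows up with a non-ASCII character. The sketch below uses "é" (written as an escape so the file encoding does not matter) and also shows an `m` round trip. The sample strings are my own, not taken from Rails.

~~~ruby
# Base64: pack('m') encodes, unpack('m') decodes.
encoded = ["hello"].pack('m')
p encoded               # => "aGVsbG8=\n"
decoded = encoded.unpack('m')
p decoded               # => ["hello"]

# One character, two bytes in UTF-8.
e_acute = "\u00e9"
code_points = e_acute.unpack('U*')
raw_bytes   = e_acute.unpack('C*')
p code_points           # => [233]
p raw_bytes             # => [195, 169]
~~~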
## Testing environment

The above code was tested with Ruby 1.9.2.
A French version of this article is available here.