Bytes in Solidity

TL;DR: this is a rehash of Jean Cvllr's Medium post about bytes in Solidity. I decided to write my own version because a recent project involved buffers, bijective encoding/decoding optimizations, and ERC-1155 component swapping.

What are bytes?

Bytes, like meters or seconds, are a unit of measurement; in this case, a unit of computer memory. Memory is made up of tiny cells, each of which holds either a 1 or a 0. A single one of these values is called a bit. We don’t usually deal in bits at the application level, simply because the amounts of data we work with are much larger than that. The next level up is the byte, which is simply 8 bits.

So, why 8 bits? Engineers actually tried 5-, 6-, and 7-bit bytes, but because storage is binary (0 or 1), it made more sense to settle on a size that is a power of 2. We may not be optimizing every last bit this way, but the 8-bit byte led to the standardization of a lot of the lower-level encodings we still use, like ASCII, which stores each character in 1 byte.

Layers:

Metal

Bit (0 or 1)

Byte (8 bits, each a 0 or a 1)

Kilobyte (1,024 bytes, or 2^10 bytes)

Megabyte (1,024 kilobytes, or 2^20 bytes)

...


Note: the 0x prefix means the number that follows is written in hexadecimal (base 16). Ethereum addresses, for example, are written in hexadecimal.
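If you want to check hex and binary conversions yourself, Python works well as a calculator. This is only a small sketch; `to_binary` is a hypothetical helper for illustration, not anything from Solidity. The key fact it shows is that each hex digit is exactly 4 bits, so two hex digits make one byte.

```python
# Each hex digit encodes exactly 4 bits, so two hex digits make one byte.
def to_binary(byte_value: int) -> str:
    """Return the 8-bit binary string for a value in 0..255."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("a byte holds values 0..255")
    return format(byte_value, "08b")

print(hex(0b11010010))  # 0xd2
print(to_binary(0xd2))  # 11010010
print(int("d2", 16))    # 210
print(2**10, 2**20)     # 1024 1048576 (bytes in a kilobyte and a megabyte)
```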


Bytes in Solidity

How do you store a value in memory if it is bigger than one byte? It overflows a single byte, so part of it has to go into the next memory slot, and the machine needs a rule for which part goes first. Engineers call this rule endianness: the order in which the bytes of a multi-byte value are stored.

Let’s say you are trying to store the value 0xc0ffee in memory (the 0x prefix just marks it as hex; the value itself is the three bytes c0, ff, and ee). You could store it forward facing, with c0, ff, and ee in consecutive memory slots (big endian), or you could go the reverse way and store ee, ff, c0 (little endian).
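Here is the same idea as a minimal Python sketch, using the built-in `int.to_bytes` and `int.from_bytes`, which take the byte order as an argument. It shows both layouts of the three-byte value and what happens when a reader assumes the wrong order.

```python
value = 0xC0FFEE  # three bytes: c0, ff, ee

big = value.to_bytes(3, "big")        # most significant byte first
little = value.to_bytes(3, "little")  # least significant byte first

print(big.hex())     # c0ffee
print(little.hex())  # eeffc0

# Both layouts decode back to the same number, as long as the reader
# uses the same byte order as the writer.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value

# Reading with the wrong order gives a different number:
print(hex(int.from_bytes(little, "big")))  # 0xeeffc0
```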

Now, when you build a machine, as long as the software/hardware engineers are on the same page about which end goes first, you shouldn’t have a problem. The problem comes when you have two machines that talk to each other with different ‘endianness.’

The internet has agreed on big endian, often called “network byte order.” A machine whose native order is different uses conversion functions at the network boundary, such as htonl and ntohl in C, to translate data between network order and its local endianness.
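Python's standard `struct` module exposes network byte order as the `!` format prefix, which makes the translation step easy to see. A small sketch, assuming a 32-bit unsigned field (`I`), so the three-byte value gets one byte of zero padding:

```python
import struct

value = 0xC0FFEE

# "!" is network byte order (big endian); "<" forces little endian.
# "I" is an unsigned 32-bit integer, so the value is padded to 4 bytes.
network = struct.pack("!I", value)
little = struct.pack("<I", value)

print(network.hex())  # 00c0ffee
print(little.hex())   # eeffc000

# The receiver unpacks with the agreed-upon order to recover the value.
(received,) = struct.unpack("!I", network)
assert received == value
```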

What does this have to do with Solidity? 

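In Solidity, strings and bytes are left-aligned in a 32-byte word (written left to right, with zero padding on the right), while other types such as numbers and addresses are right-aligned (zero padding on the left). This is often described as big endian versus little endian, but strictly speaking the EVM stores all of these big endian; what differs is which side the padding goes on.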

If I wanted to store ‘ethereum’, its ASCII bytes in hex would be 657468657265756d: one byte, or two hex digits, per character (hexadecimal uses the 16 digits 0-9 and a-f).
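A quick Python sketch confirms the encoding, one ASCII byte (two hex digits) per character:

```python
word = "ethereum"
encoded = word.encode("ascii")  # one byte per ASCII character

print(encoded.hex())  # 657468657265756d
print(len(encoded))   # 8 (bytes), i.e. 16 hex digits

# Character by character: e=65, t=74, h=68, e=65, r=72, e=65, u=75, m=6d
for ch in word:
    print(ch, format(ord(ch), "02x"))
```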

So, when stored as a string it would be:

0x657468657265756d000000000000000000000000000000000000000000000000 

The value is padded out to 64 hex digits because the EVM works in 32-byte (256-bit) words. Each hex digit represents 4 bits, so 4 bits * 64 digits = 256 bits: 16 digits for ‘ethereum’ and 48 digits of zero padding.
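To see the padding, here is a small Python model of a left-aligned 32-byte word. `left_aligned_word` is a hypothetical helper that mimics the layout described above; it is not a Solidity or EVM API.

```python
WORD_SIZE = 32  # the EVM operates on 32-byte (256-bit) words

def left_aligned_word(data: bytes) -> bytes:
    """Place data at the start of a 32-byte word, zero-padding on the right
    (the layout used for strings and bytes)."""
    if len(data) > WORD_SIZE:
        raise ValueError("data does not fit in one word")
    return data.ljust(WORD_SIZE, b"\x00")

word = left_aligned_word(b"ethereum")
print("0x" + word.hex())  # 0x657468657265756d followed by 48 zeros
print(len(word.hex()))    # 64 hex digits = 32 bytes = 256 bits
```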

If you want to store 657468657265756d as an address instead (it isn’t technically the right length: an address is 20 bytes, or 40 hex digits, as you can see by comparing it with your wallet address), it would be right-aligned, with the zero padding on the left:

0x000000000000000000000000000000000000000000000000657468657265756d
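The right-aligned layout can be modeled the same way. `right_aligned_word` is again a hypothetical helper. The sketch also shows why many references call this layout right-aligned rather than little endian: read as a big-endian number, the padded word has exactly the same value as the raw 8 bytes, while a truly little-endian layout would reverse the byte order.

```python
WORD_SIZE = 32

def right_aligned_word(data: bytes) -> bytes:
    """Place data at the end of a 32-byte word, zero-padding on the left
    (the layout used for numbers and addresses)."""
    if len(data) > WORD_SIZE:
        raise ValueError("data does not fit in one word")
    return data.rjust(WORD_SIZE, b"\x00")

raw = bytes.fromhex("657468657265756d")
word = right_aligned_word(raw)
print("0x" + word.hex())  # 48 zeros, then 657468657265756d

# Leading zeros do not change a big-endian number.
assert int.from_bytes(word, "big") == int.from_bytes(raw, "big")

# A genuinely little-endian layout would reverse the bytes instead:
print(word[::-1].hex()[:16])  # 6d75657265687465
```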

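You might also have heard of a data type called byte in Solidity. That was an alias for bytes1 and was removed in Solidity 0.8.0; the type you will usually reach for is bytes. Use bytes for arbitrary-length raw byte data and string for arbitrary-length string (UTF-8) data. So when would you use bytes versus string, and when neither? If you can cap the length at 32 bytes, one of the fixed-size types bytes1 through bytes32 is cheaper in gas than either, because the whole value fits in one 32-byte word. Once the data can be longer than 32 bytes (a sentence rather than a single word like “ethereum”), you need a dynamic type: bytes for raw data, string for text. Bytes are laid out like strings: left-aligned, with padding on the right.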


What about arrays and structs?

Fixed-size byte arrays

You can create a fixed-size byte array with the type bytesN, where N is any number from 1 to 32 (bytes1, bytes2, ..., bytes32). Why is 32 the max? Because the EVM’s word size is 32 bytes (256 bits), so a bytes32 value fits exactly into one word, or one storage slot. It made sense to optimize for the size the machine handles natively.

When would you use a fixed-size byte array? For short pieces of data of up to 32 bytes, which covers a lot of words: a contract name, a trading ticker symbol, and so on. These could all be bytes32 values instead of strings. Before the Byzantium upgrade, a contract also could not return a string to another contract and have it use that value; Byzantium added the RETURNDATASIZE and RETURNDATACOPY opcodes that make this possible, but fixed-size values are still simpler and cheaper to pass around.
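Here is a Python sketch of the round trip for a ticker symbol stored as a bytes32-style value. `to_bytes32` and `from_bytes32` are hypothetical helpers that model the layout; in Solidity you would typically write the literal directly (for example `bytes32 ticker = "ETH";`). One caveat of this approach: stripping trailing zero bytes assumes the original text never ends in a NUL byte.

```python
def to_bytes32(text: str) -> bytes:
    """Hypothetical helper: pack a short ASCII string into 32 bytes,
    left-aligned and zero-padded like a Solidity bytes32 value."""
    data = text.encode("ascii")
    if len(data) > 32:
        raise ValueError("too long for bytes32; use bytes or string instead")
    return data.ljust(32, b"\x00")

def from_bytes32(word: bytes) -> str:
    """Hypothetical helper: strip the right-hand zero padding again.
    Assumes the original text had no trailing NUL bytes."""
    return word.rstrip(b"\x00").decode("ascii")

ticker = to_bytes32("ETH")
print(ticker.hex()[:6])      # 455448
print(from_bytes32(ticker))  # ETH

try:
    to_bytes32("x" * 33)
except ValueError as err:
    print(err)  # too long for bytes32; use bytes or string instead
```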

Dynamically sized byte arrays

Bytes: for when you don’t know how long the data will be. With the fixed-size types above, you pick a length of at most 32 bytes up front; with bytes, the length is not fixed. You would use this when, for example, you receive a string (which is technically a sequence of bytes) from another contract and need to work with it in your code.

Bytes values are treated like arrays, so you can read .length and index into them, and on storage variables you can also call .push() and .pop().

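String: same idea as above, where you don’t know the expected length, but the contents are UTF-8 encoded text. Unlike bytes, string has no .length or index access; convert it with bytes(s) first, and keep in mind that the result counts bytes, not characters.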
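To make “incoming data of unknown length” concrete, here is a Python sketch of how a single dynamic bytes or string value is laid out under the Solidity ABI encoding rules: an offset word, a length word, then the data right-padded to a multiple of 32 bytes. `abi_encode_single_bytes` is a hypothetical name, and this sketch only covers the one-argument case (calldata for a function call would also start with a 4-byte selector).

```python
WORD = 32

def uint256(n: int) -> bytes:
    """Encode a non-negative integer as a 32-byte big-endian word."""
    return n.to_bytes(WORD, "big")

def abi_encode_single_bytes(data: bytes) -> bytes:
    """Sketch of the ABI encoding of one dynamic `bytes` (or UTF-8 `string`)
    argument: offset word, length word, then the padded data."""
    padded_len = -(-len(data) // WORD) * WORD  # round up to a multiple of 32
    head = uint256(WORD)  # offset of the data: right after this one-word head
    tail = uint256(len(data)) + data.ljust(padded_len, b"\x00")
    return head + tail

encoded = abi_encode_single_bytes("ethereum".encode("utf-8"))
for i in range(0, len(encoded), WORD):
    print(encoded[i:i + WORD].hex())  # offset word, length word, data word

# The length word counts bytes, not characters: "é" is 2 bytes in UTF-8.
print(len("\u00e9".encode("utf-8")))  # 2
```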


Bitwise operations

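Bitwise operations act on the individual bits of a value. In Solidity they work on integers and on the fixed-size byte types (bytes1 to bytes32); they are not available on strings or dynamic arrays, and booleans use the logical operators (&&, ||, !) instead. Despite being so low level, the symbols will be pretty familiar if you’ve used JavaScript before.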

Comparison:

<, <=, ==, >=, >, !=

Bit:

AND: &

OR: |

XOR: ^

NEGATION: ~

Now, you’re probably looking at these symbols like WTF, why do I need these? In all honesty, you can get away without using them, but they let you pack and manipulate data much more compactly, for example 256 on/off flags in a single 32-byte word. That matters when your single smart contract is managing $1B+ worth of crypto in LPs and every operation costs gas, with storage writes among the most expensive.
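As an illustration of that packing, here is a Python sketch of a bitmap of flags, the kind of value a contract might keep in a single uint256 slot. The helper names are hypothetical. A bitmap holds up to 256 flags per 32-byte word, whereas each bool takes at least one byte.

```python
# Hypothetical example: on/off flags packed into the bits of one integer,
# as a contract might pack them into a single uint256 storage slot.

def set_flag(flags: int, i: int) -> int:
    return flags | (1 << i)             # OR turns bit i on

def clear_flag(flags: int, i: int) -> int:
    return flags & ~(1 << i)            # AND with the negated mask turns bit i off

def has_flag(flags: int, i: int) -> bool:
    return ((flags >> i) & 1) == 1      # shift bit i down, then mask it

flags = 0
flags = set_flag(flags, 0)
flags = set_flag(flags, 3)
print(bin(flags))          # 0b1001
print(has_flag(flags, 3))  # True
flags = clear_flag(flags, 0)
print(bin(flags))          # 0b1000
```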

We will have two variables to start with: a and b. For their hexadecimal values, we’ll choose two random sequences: let’s say e6 and d2.

bytes1 a = 0xe6; // [11100110]

bytes1 b = 0xd2; // [11010010]

----------

`a & b`, AND: we need both 1’s in a column to get a 1 in the output row

11100110

11010010

11000010  // ⇒ 0xc2

----------

`a | b`, OR: at least one of the bits has to be 1 to get a 1 in the output row

11100110

11010010

11110110 // ⇒ 0xf6

----------

`a ^ b`, XOR: the values must be opposite in a column to get a 1 in the output row

11100110

11010010

00110100 // ⇒ 0x34 (BTW, if you want to recover `a`, just XOR the output with `b`)

----------

`~output`, NEGATION: you’re just flipping every bit, so each 1 becomes 0 and each 0 becomes 1

00110100

11001011 // ⇒ 0xcb
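You can double-check every table in this section with a few lines of Python, using the binary values shown (11100110 is 0xe6, 11010010 is 0xd2). Python integers are unbounded, so `~` has to be masked with 0xFF to behave like an 8-bit bytes1.

```python
a = 0xE6     # 11100110
b = 0xD2     # 11010010
BYTE = 0xFF  # keep results to 8 bits, like bytes1

and_ = a & b
or_ = a | b
xor_ = a ^ b
not_ = ~xor_ & BYTE  # mask, because ~ on a Python int gives a negative number

print(f"{and_:08b} 0x{and_:02x}")  # 11000010 0xc2
print(f"{or_:08b} 0x{or_:02x}")    # 11110110 0xf6
print(f"{xor_:08b} 0x{xor_:02x}")  # 00110100 0x34
print(f"{not_:08b} 0x{not_:02x}")  # 11001011 0xcb

assert xor_ ^ b == a  # XOR with b again recovers a
```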

----------

Two questions arise:

  1. When would you use these?
  2. What is the relevance of their outputs?

You would use these to write more performant code, minimize your gas costs, and use memory and storage more efficiently. The outputs matter because they are the values you actually store, compare, or send to other contracts and to your own functions. That being said, cheaper execution on Layer 2 networks makes this kind of micro-optimization less critical for many contracts.


Shifting 

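Shifting moves every bit left or right by a given number of positions. Bits that fall off the end are discarded and the empty positions are filled with 0s, so if we shift our 11001011 left by 4 bits in a bytes1, it becomes 10110000.

Left shift (uses <<): 11001011 << 4 ⇒ 10110000 (0xcb ⇒ 0xb0)

Right shift (uses >>): 11001011 >> 4 ⇒ 00001100 (0xcb ⇒ 0x0c)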
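A Python sketch of 8-bit shifts, modeling how a bytes1 value behaves. `shl8` and `shr8` are hypothetical helpers; the mask on the left shift mimics the fixed width, since Python integers would otherwise just grow. The last lines show a common use: splitting a byte into its two hex digits (nibbles).

```python
BYTE = 0xFF

def shl8(x: int, n: int) -> int:
    """Left shift within 8 bits: bits pushed past the top are dropped."""
    return (x << n) & BYTE

def shr8(x: int, n: int) -> int:
    """Logical right shift: bits pushed past the bottom are dropped."""
    return x >> n

x = 0b11001011  # 0xcb
print(f"{shl8(x, 4):08b}")  # 10110000
print(f"{shr8(x, 4):08b}")  # 00001100

# Split a byte into its high and low nibbles (its two hex digits).
high, low = shr8(x, 4), x & 0x0F
print(hex(high), hex(low))  # 0xc 0xb
```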

Bite of Bytes

This is just a brief overview of bytes in Solidity, but there are a ton more tricks that you can use. When using bytes, be conscious of gas optimizations and incoming data from external contracts. You need to match the incoming data type, but might be able to save on your side. There is always room for improvement!
