A Crash Course in C++: Representing Data

Memory leaks, here we come!

James Collerton
6 min read · Jul 9, 2024
Refer back to this calming picture of the ocean when the frustration kicks in.

Audience

This article is aimed at those looking to understand how data works in C++. This includes things like variables, pointers and containers.

To get the most from the article you will need a solid understanding of at least one other language. It will help if it’s compiled, but it’s not completely necessary as long as you understand the concepts.

Additionally, it would be good to read my previous article introducing C++.

Argument

Personally, representing data is one of the things I find most interesting (and confusing) about C++.

Values, references and pointers

If you’re familiar with other programming languages then you’ll be used to the int a = 42; syntax. Now whenever we use the variable a we can expect it to resolve to the value 42.

To explain how this kind of thing works in C++, let’s introduce a diagram.

Demonstrating data in C++

We now define four concepts:

  1. Type: The type of data within a variable (int, string etc.)
  2. Name: The name of the variable. In int a = 42; the name would be a.
  3. Content/ value: What we’d like to store in our variable. In int a = 42; the content would be 42.
  4. Address: Our content needs to live somewhere in memory. The address (like a postal address) tells us where that is.

Let’s use this to break down the diagram, starting with the right. In plain English this says: ‘there is an integer variable called a, whose content can be found at address 000000001. When you go to that address, the content is 42.’ For all intents and purposes: int a = 42;.

Where it gets slightly more complex is on the left hand side. What we’re declaring is a pointer. Rather than storing a value, like 42, we’re storing the address of another variable in memory.

The left hand side is saying ‘there is a variable called p_a, whose content can be found at address 999999999. When you go to that address, the content is 000000001. If you go to the address stored in the content (000000001) you will find a value of type int’.

It’s not the most intuitive thing in the world, I’ll admit. I’ve added some examples below to help get our heads round it.
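As a minimal sketch of the diagram in code: the names a and p_a come from the description above, while read_through and pointer_demo are made-up helper names for illustration. The real addresses printed will differ from the 000000001 and 999999999 used in the text, as the compiler and operating system choose them.

```cpp
#include <iostream>

// Follows a pointer and returns the int it finds there.
int read_through(const int* p) {
    return *p;  // dereference: go to the stored address and read the value
}

// Hypothetical demo mirroring the diagram.
int pointer_demo() {
    int a = 42;     // type int, name a, content 42
    int* p_a = &a;  // the content of p_a is the address of a

    std::cout << "address of a:   " << &a << '\n';
    std::cout << "content of p_a: " << p_a << '\n';  // the same address
    std::cout << "value via p_a:  " << read_through(p_a) << '\n';

    *p_a = 43;      // writing through the pointer changes a itself
    return a;
}
```

Calling pointer_demo() returns 43, because the write through p_a landed in a.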

We can also introduce references, which are alternative names for existing variables.

References are quite similar to pointers. However, we don’t need to dereference them to use their contents (we can use r_a directly, rather than writing *p_a).
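Here is a small sketch of a reference in action. r_a is the name used in the text; increment and reference_demo are hypothetical names chosen for this example.

```cpp
// Adds one to the caller's variable through a reference.
void increment(int& r) {
    r += 1;
}

// Hypothetical demo: r_a and a are two names for the same int.
int reference_demo() {
    int a = 42;
    int& r_a = a;  // r_a is an alternative name for a
    r_a = 50;      // no dereference needed; a is now 50
    increment(a);  // a (and therefore r_a) is now 51
    return r_a;
}
```

Unlike a pointer, a reference must be bound when it is declared and cannot later be made to refer to a different variable.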

The above demonstrates lvalue references. There also exist rvalue references, which we will cover later, as they require further concepts.

Aside from the need for dereferencing, another downside of standard pointers is that they require you to be aware of the memory you’re using. In certain circumstances, you will need to do things like manually delete things your pointers are referencing. This is both confusing and error-prone.
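To make the manual bookkeeping concrete, here is a rough sketch using new and delete. The function name manual_memory_demo is invented for the example.

```cpp
// Hypothetical demo of managing heap memory by hand.
int manual_memory_demo() {
    int* p = new int(42);  // allocate an int on the heap
    int value = *p;        // read it through the pointer

    delete p;     // we must remember to do this, exactly once
    p = nullptr;  // avoid leaving a pointer to freed memory lying around

    return value;
}
```

Forgetting the delete leaks the memory, while deleting twice or reading *p after the delete is undefined behaviour.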

Modern C++ (from C++11 onwards) introduced smart pointers. These manage the lifetime of the memory they own, helping to avoid memory that is allocated but never freed (a leak) and pointers that end up referencing memory that has already been freed (a dangling pointer).

There are three smart pointers we’re going to talk about:

  1. unique_ptr: The sole owner of the data it references; it can be moved but not copied. Most common type.
  2. shared_ptr: The opposite, for when you need multiple owners. The data is freed when the last owner goes away. Used only in specific instances.
  3. weak_ptr: For when we want to reference the object of a shared_ptr without incrementing the reference count.

Let’s clarify with a few examples. There may be some syntax in here you don’t quite understand. Let it slide for the time being, as long as you get the gist.
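The following is a sketch of the three smart pointers, assuming C++14 or later for std::make_unique. The variable u_p1 matches the name used in the text; Counts, unique_demo and shared_demo are hypothetical names for illustration.

```cpp
#include <memory>
#include <utility>

// Small record of what the shared_ptr demo observed.
struct Counts {
    long while_shared;  // owners while two shared_ptrs exist
    long after_reset;   // owners after one gives up ownership
    bool weak_expired;  // whether the weak_ptr saw the object disappear
};

int unique_demo() {
    auto u_p1 = std::make_unique<int>(42);  // a std::unique_ptr<int>
    // auto u_p2 = u_p1;                    // would not compile: no copying
    auto u_p2 = std::move(u_p1);            // ownership moves; u_p1 is now null
    return *u_p2;
}  // the int is freed automatically when u_p2 goes out of scope

Counts shared_demo() {
    auto s_p1 = std::make_shared<int>(42);
    auto s_p2 = s_p1;               // a second owner of the same int
    std::weak_ptr<int> w_p = s_p1;  // observes without owning

    long while_shared = s_p1.use_count();  // two owners
    s_p2.reset();
    long after_reset = s_p1.use_count();   // one owner
    s_p1.reset();                          // last owner gone, int is freed

    return {while_shared, after_reset, w_p.expired()};
}
```

Note that there is no delete anywhere: each pointer cleans up after itself when its last owner disappears.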

We’ve also sneakily introduced the auto keyword. This tells the compiler to deduce the type of the variable from its initialisation expression.

As std::make_unique returns a unique_ptr, we know that u_p1 will also be a unique_ptr.
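As a quick sketch of that deduction, we can ask the compiler to confirm the types it chose. The function name auto_demo is made up, and std::make_unique assumes C++14 or later.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical demo: static_assert checks the deduced types at compile time.
int auto_demo() {
    auto i = 42;                           // deduced as int
    auto d = 4.2;                          // deduced as double
    auto u_p1 = std::make_unique<int>(7);  // deduced as std::unique_ptr<int>

    static_assert(std::is_same<decltype(i), int>::value, "i is an int");
    static_assert(std::is_same<decltype(d), double>::value, "d is a double");
    static_assert(std::is_same<decltype(u_p1), std::unique_ptr<int>>::value,
                  "u_p1 is a unique_ptr<int>");

    return i + *u_p1 + static_cast<int>(d);  // 42 + 7 + 4
}
```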

Containers, structs and unions

Containers are the data structures C++ defines for managing collections of data. They are split into the following groups:

  1. Sequence containers: Data is kept in the order you insert it (although not necessarily contiguously in memory). Some, like std::vector, allow access via index.
  2. Associative containers: Store data as values accessed with keys. Come in two categories: maps (key/ value are different) and sets (key/value are the same).

As long as you have a reasonably firm grasp of at least one programming language they should seem familiar.

Common containers include std::vector (arrays of variable size), std::map, std::deque (double-ended queue), std::array (arrays of fixed size), std::list (constant-time insertion and removal) etc. There are also container adaptors such as std::stack, which wrap another container.
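Below is a brief sketch of one sequence container and two associative containers. The function names and the sample data are invented for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Sequence container: ordered, growable, indexable.
int vector_demo() {
    std::vector<int> v{1, 2, 3};
    v.push_back(4);                            // grows to four elements
    return v[3] + static_cast<int>(v.size());  // 4 + 4
}

// Associative container (map): values looked up by a different key.
int map_demo() {
    std::map<std::string, int> ages{{"ada", 36}};
    ages["alan"] = 41;                        // insert a new key/value pair
    return ages.at("ada") + ages.at("alan");  // 36 + 41
}

// Associative container (set): the key is the value, duplicates are dropped.
std::size_t set_demo() {
    std::set<int> s{3, 1, 3, 2};
    return s.size();  // only 1, 2 and 3 are stored
}
```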

Complementary to containers are structs. These are convenient ways of grouping related values together.
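For instance, a struct might look something like the sketch below. Person and describe are hypothetical names for this example.

```cpp
#include <string>

// Groups two related values under one type.
struct Person {
    std::string name;
    int age;
};

// Members are accessed with the dot operator.
std::string describe(const Person& p) {
    return p.name + " is " + std::to_string(p.age);
}
```

With this in place, describe(Person{"Ada", 36}) produces the string "Ada is 36".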

Another more C++ specific concept is the union. This can be used to save memory by storing different objects at different times, but in the same part of memory.

It’s worth noting that except in very specific circumstances (e.g. heavily optimising for something like memory or perhaps binary size) unions are rarely used in modern C++. Instead we use something called std::variant, which can hold different types of object, but is beyond the scope of this article.

Both unions and structs employ the arrow operator ->, which allows you to access their members through a pointer.
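Here is a rough sketch of a union and of the arrow operator. IntOrFloat, Point, arrow_demo and union_demo are made-up names for illustration.

```cpp
// Both members share the same storage; only one is valid at a time.
union IntOrFloat {
    int i;
    float f;
};

struct Point {
    int x;
    int y;
};

int arrow_demo() {
    Point pt{3, 4};
    Point* p_pt = &pt;
    p_pt->x = 10;           // shorthand for (*p_pt).x = 10
    return pt.x + p_pt->y;  // 10 + 4
}

int union_demo() {
    IntOrFloat u;
    u.i = 7;         // the int member is now the active one
    int seen = u.i;
    u.f = 1.5f;      // the float takes over the same memory;
                     // reading u.i from here on would be undefined behaviour
    IntOrFloat* p_u = &u;
    return seen + static_cast<int>(p_u->f * 2);  // 7 + 3
}
```

The compiler does not track which member of a union is active, which is a large part of why std::variant is usually the safer choice.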

Functions

It’s hard to talk about data without also discussing functions. We won’t dwell too much on simple syntax (assuming you’re familiar with it from other places), and instead will focus on the more interesting behaviours.

We’ll use an example to explore passing by value, reference and pointers.
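A minimal sketch of the three ways of passing an argument, plus a const reference, could look like this. All function names here are hypothetical.

```cpp
// By value: the function works on its own copy.
void by_value(int x) { x += 1; }

// By reference: the function works on the caller's variable.
void by_reference(int& x) { x += 1; }

// By pointer: the function receives an address, which may be null.
void by_pointer(int* x) {
    if (x != nullptr) {
        *x += 1;
    }
}

// By const reference: no copy is made, and the function cannot modify x.
int by_const_reference(const int& x) { return x + 1; }

int passing_demo() {
    int a = 42;
    by_value(a);      // a is still 42, only the copy changed
    by_reference(a);  // a is now 43
    by_pointer(&a);   // a is now 44
    return by_const_reference(a);  // returns 45, a stays 44
}
```

Passing by const reference is a common choice for larger objects, as it avoids the copy while promising not to change the caller's data.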

Calculating at compile time

Above we introduced the const keyword. This indicates a value won’t change, allowing safer coding and better optimisations. Let’s introduce a related keyword, constexpr.

One of the benefits of C++ being compiled is that we can do some calculations at compile time, allowing our program to run faster. constexpr tells the compiler that the value of the variable must be computable at compile time (unlike const, whose value may only be known at runtime).

// We can have a variable calculated at
// compile time
constexpr int j = 1;

// Or an object (as long as its constructor is constexpr and the
// arguments are constant expressions)
constexpr MyObject my_object{const_expr_arguments};

// Or declare a function that can be evaluated at compile time
// (and is also implicitly inline)
constexpr int twice(int x) {
    return 2 * x;
}

The advantage of this is less work at runtime, without sacrificing readability. It also catches some errors early: undefined behaviour inside a constant expression is rejected by the compiler, rather than surfacing while the program runs.
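One way to convince ourselves that the work really happens during compilation is static_assert, which only accepts values known at compile time. This is a sketch with made-up names (twice, answer, runtime_twice).

```cpp
// Can be evaluated at compile time when given constant arguments.
constexpr int twice(int x) {
    return 2 * x;
}

constexpr int answer = twice(21);
static_assert(answer == 42, "checked by the compiler, not at runtime");

// The same function still works with values only known at runtime.
int runtime_twice(int n) {
    return twice(n);
}
```

If the static_assert condition were false, the program would fail to compile rather than fail while running.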

lvalues and rvalues

We define an expression as a sequence of operators and operands designed to:

  • Compute a value
  • Designate objects or functions
  • Carry out a side-effect (like modify an object).

As you can see, the majority of the code we have written so far is formulated from expressions, and we haven’t had to understand much about them in order to use them.

However, it’s useful to understand two particular expression value categories: lvalues and rvalues. It’s not uncommon for the compiler to complain you’re using them incorrectly.

Splitting expressions into their various value categories.

The reason we include the above chart is to show that anything that is not an lvalue is either an xvalue or a prvalue (and therefore an rvalue).

Different people think of lvalues in different ways including:

  • Something that can live on the left hand side of an assignment.
  • Something that points to a location in memory.
  • Something whose result will survive past the end of the expression.

From the diagram we can see anything that is not an lvalue is an rvalue.

Let’s go back to trusty int a = 42;. In this statement, a:

  • Lives on the left hand side of an assignment.
  • Points to a location in memory.
  • Will survive past the end of the expression.

Therefore a is an lvalue. However, 42:

  • Couldn’t live on the left hand side of the assignment.
  • Exists in memory but doesn’t point to a location in memory.
  • On its own wouldn’t exist once the expression had finished.

Therefore 42 is an rvalue.

It’s worth noting none of these definitions are 100% correct, but understanding their intricacies isn’t vital to being able to employ them successfully.

Having these approximate definitions in mind is useful for debugging those pesky compiler errors. Additionally, understanding there are other value categories will be good if they ever crop up (although defining them is the subject of an article all on its own!).
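As a rough sketch, we can let overload resolution tell us which category an expression falls into. It leans on rvalue references (int&&), so treat the syntax as a preview. The function names are invented for this example.

```cpp
#include <string>
#include <utility>

// Chosen when the argument is an lvalue.
std::string category(int&) { return "lvalue"; }

// Chosen when the argument is an rvalue.
std::string category(int&&) { return "rvalue"; }

std::string demo_named() {
    int a = 42;
    return category(a);   // a has a name and an address
}

std::string demo_literal() {
    return category(42);  // 42 is a temporary value
}

std::string demo_moved() {
    int a = 42;
    return category(std::move(a));  // std::move turns a into an xvalue
}

// The classic compiler complaint looks like this:
// 42 = a;  // error: the left hand side must be a modifiable lvalue
```

Here demo_named() returns "lvalue", while demo_literal() and demo_moved() both return "rvalue".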

Conclusion

In conclusion, we’ve covered a lot of the most interesting points around how C++ represents data including pointers, references, structs, containers, expressions and their accompanying syntax!

Written by James Collerton

Senior Software Engineer at Spotify, Ex-Principal Engineer at the BBC
