How to Read Types in C

Type signatures in C are a common point of confusion for even seasoned programmers. I found myself repeating the same explanations over and over again, so I decided to write this post to have a reference to link to.

Recap: Type annotations

In many programming languages (and in mathematics), you annotate an expression e with type A by writing e : A. These types can either be primitive types like int or float, or they can be compound types like function types A → B or product types A × B (aka tuples (A, B)).

Here are some examples of valid type annotations:

2 : int
3.14 : float
square : int → int
square(2) : int

Many languages also feature generic types, which are types that can be parameterized by other types. For example, the type List<A> is a list of elements of type A. You can think of these as being "functions" from types to types (e.g. List takes a type A and returns a type List<A>).

In the following, we will also see pointer types ptr<A> (often denoted with a *), and arrays array<A, N> (often denoted with []), where N is the size of the array.

How C does it

In C, the syntax used to declare variables mirrors the syntax of their usage. While it probably felt intuitive at the time, this "declaration follows usage" principle effectively means you spell the type inside out.

More precisely, a declaration in C consists of some declaration specifiers (these include type specifiers such as int or float, but also things like const, static, inline, etc.) followed by a list of declarators. A declarator is essentially an identifier with some simple operations (e.g. dereferencing *, indexing [], function application (..), etc.) applied to it. The declaration specifiers tell you the type of the result of those operations on the identifier.

int x declares x to be type int.
float *x declares *x to be type float, i.e. x : ptr<float>
int *x[10] declares *x[10] to be type int, i.e. x : array<ptr<int>, 10>
int (*x)[10] declares (*x)[10] to be type int, i.e. x : ptr<array<int, 10>>
float f(int x) declares f(x) to be type float, i.e. f : int → float
void *f(int x) declares f : int → ptr<void>
void (*(*f)(int x))(float y) declares f : ptr<int → ptr<float → void>>
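
One way to double-check readings like these is to let the compiler confirm them. The following is a minimal sketch, assuming a C11 compiler. The IS_TYPE macro and all variable and function names are made up for illustration. Arrays and functions decay to pointers inside _Generic, so the array is checked through its address and the functions through their pointer types.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical helper: 1 if expression e has type T, 0 otherwise. */
#define IS_TYPE(e, T) _Generic((e), T: 1, default: 0)

float *pf;                      /* pf  : ptr<float>                     */
int *ap[10];                    /* ap  : array<ptr<int>, 10>            */
int (*pa)[10];                  /* pa  : ptr<array<int, 10>>            */
float fn(int x);                /* fn  : int -> float                   */
void *gn(int x);                /* gn  : int -> ptr<void>               */
void (*(*pfn)(int x))(float y); /* pfn : ptr<int -> ptr<float -> void>> */

static_assert(IS_TYPE(pf, float *), "pf is a pointer to float");
static_assert(IS_TYPE(&ap, int *(*)[10]), "ap is an array of 10 pointers to int");
static_assert(IS_TYPE(pa, int (*)[10]), "pa is a pointer to an array of 10 ints");
/* Function designators decay to function pointers inside _Generic. */
static_assert(IS_TYPE(fn, float (*)(int)), "fn maps int to float");
static_assert(IS_TYPE(gn, void *(*)(int)), "gn maps int to pointer to void");
static_assert(IS_TYPE(pfn, void (*(*)(int))(float)), "pfn matches the reading above");
```

If any of the comments were wrong, the corresponding static_assert would stop compilation.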

I find that even in university-level courses, this is often glossed over, leaving students to figure it out on their own, often by looking at some examples. Many end up with the mental model that declarations are just a type followed by a variable name, which works for annoyingly many cases, but breaks for just slightly more complicated ones.

A by-now famous example is int* x, y;. At first sight it might look like we are declaring two pointers, but in reality, this declares both *x and y to be type int, meaning x : ptr<int> and y : int. Of course, you can make this more explicit by moving the * to the right (int *x, y;), or by using parentheses (like int (*x), y;), but hardly anyone does.
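
Here is a small sketch of that pitfall, again assuming a C11 compiler. The compile-time checks confirm that only x is a pointer.

```c
#include <assert.h> /* static_assert (C11) */

int* x, y; /* reads as: *x and y are both int */

static_assert(_Generic(x, int *: 1, default: 0), "x : ptr<int>");
static_assert(_Generic(y, int: 1, default: 0), "y : int, not ptr<int>");
```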

An evaluation algorithm for C declarations

The translation from a C declaration to its type can be defined recursively.

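For a declaration of the form T D, where T is the type specifier and D is the declarator, the declared identifier has type τ(D, T), where τ peels off the outermost operator of D at each step. The postfix operators [] and () bind tighter than the prefix *, so *D[N] is read as *(D[N]) and the * is peeled off first.

τ(id, T)           = T
τ((D), T)          = τ(D, T)
τ(*D, T)           = τ(D, ptr<T>)
τ(D[N], T)         = τ(D, array<T, N>)
τ(D(T' D', ..), T) = τ(D, (τ(D', T'), ..) -> T)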

Here are some examples of how this works.

// int *x[4];
  τ(*x[4], int)
= τ(x[4], ptr<int>)
= τ(x, array<ptr<int>, 4>)
= array<ptr<int>, 4>

// void *f(int *x)
  τ(*f(int *x), void)
= τ(f(int *x), ptr<void>)
= τ(f, τ(*x, int) -> ptr<void>)
= τ(f, τ(x, ptr<int>) -> ptr<void>)
= τ(f, ptr<int> -> ptr<void>)
= ptr<int> -> ptr<void>
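
The rules translate fairly directly into code. The following is a rough sketch in C, not a real parser: it assumes well-formed input, a single-word type specifier, no qualifiers, and strings shorter than MAXLEN. All function names (match_open, param_list, tau, declaration_type, type_of) are my own. It also always parenthesizes the parameter list, so it prints (ptr<int>) -> ptr<void> where the derivation above writes ptr<int> -> ptr<void>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { MAXLEN = 256 }; /* assumed upper bound on every string in this sketch */

void declaration_type(const char *decl, int len, char *out);

/* Index of the bracket opening the group that closes at d[close]. */
static int match_open(const char *d, int close) {
    int depth = 0;
    for (int i = close; i >= 0; i--) {
        if (d[i] == ')' || d[i] == ']') depth++;
        if (d[i] == '(' || d[i] == '[') depth--;
        if (depth == 0) return i;
    }
    return 0; /* unbalanced input is not handled */
}

/* Turns "T1 D1, T2 D2, .." into "tau(D1, T1), tau(D2, T2), ..". */
static void param_list(const char *p, int len, char *out) {
    int start = 0, depth = 0;
    out[0] = '\0';
    for (int i = 0; i <= len; i++) {
        if (i < len && (p[i] == '(' || p[i] == '[')) depth++;
        if (i < len && (p[i] == ')' || p[i] == ']')) depth--;
        if (i == len || (p[i] == ',' && depth == 0)) {
            char type[MAXLEN];
            size_t used = strlen(out);
            declaration_type(p + start, i - start, type);
            /* An empty list or a lone "void" contributes no parameter. */
            if (type[0] != '\0' && strcmp(type, "void") != 0)
                snprintf(out + used, MAXLEN - used, "%s%s", used ? ", " : "", type);
            start = i + 1;
        }
    }
}

/* tau(D, T) for the declarator D = d[0..len) and the type T = t. */
static void tau(const char *d, int len, const char *t, char *out) {
    char next[MAXLEN], params[MAXLEN];
    int open;
    while (len > 0 && isspace((unsigned char)d[0])) { d++; len--; }
    while (len > 0 && isspace((unsigned char)d[len - 1])) len--;

    if (len == 0 || (d[0] != '*' && d[len - 1] != ']' && d[len - 1] != ')')) {
        snprintf(out, MAXLEN, "%s", t);                /* tau(id, T) = T */
    } else if (d[0] == '*') {                          /* tau(*D, T) = tau(D, ptr<T>) */
        snprintf(next, sizeof next, "ptr<%s>", t);
        tau(d + 1, len - 1, next, out);
    } else if (d[len - 1] == ']') {                    /* tau(D[N], T) = tau(D, array<T, N>) */
        open = match_open(d, len - 1);
        snprintf(next, sizeof next, "array<%s, %.*s>", t, len - 2 - open, d + open + 1);
        tau(d, open, next, out);
    } else if ((open = match_open(d, len - 1)) == 0) { /* tau((D), T) = tau(D, T) */
        tau(d + 1, len - 2, t, out);
    } else {                                           /* tau(D(..), T) = tau(D, (..) -> T) */
        param_list(d + open + 1, len - 2 - open, params);
        snprintf(next, sizeof next, "(%s) -> %s", params, t);
        tau(d, open, next, out);
    }
}

/* Type of the identifier declared by "T D", with T a single-word specifier. */
void declaration_type(const char *decl, int len, char *out) {
    char spec[MAXLEN];
    int i = 0, n = 0;
    assert(len >= 0 && len < MAXLEN);
    while (i < len && isspace((unsigned char)decl[i])) i++;
    while (i < len && (isalnum((unsigned char)decl[i]) || decl[i] == '_'))
        spec[n++] = decl[i++];
    spec[n] = '\0';
    tau(decl + i, len - i, spec, out);
}

/* Convenience wrapper; out must have room for MAXLEN characters. */
void type_of(const char *decl, char *out) {
    declaration_type(decl, (int)strlen(decl), out);
}
```

Calling type_of("int *x[4]", out) with a char out[MAXLEN] buffer fills it with array<ptr<int>, 4>, following the same steps as the first derivation above. The declaration is passed without its trailing semicolon.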

Type names

In some cases, you just want to name a type without also declaring a variable, for example in a cast or as the operand of sizeof. To do this, you can use a type name. A type name looks almost exactly like the declarations we have seen already, with the identifier removed. So instead of int *x, you would write int *, or int (*) if you want to be pedantic.

Here are some examples of type names:

char * represents ptr<char>
int (*)[3] represents ptr<array<int, 3>>
int *() represents () → ptr<int> ¹
int (*)(float) represents ptr<float → int>
void *(*)(int) represents ptr<int → ptr<void>>
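
To show where type names turn up in practice, here is a small sketch (C11 assumed) that uses them in a cast, in sizeof, and in a _Generic association. The functions truncate_plus_one and call_erased and the array row are hypothetical names chosen for this example.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical example function of type float -> int. */
int truncate_plus_one(float v) { return (int)v + 1; }

/* The type name int (*)(float) used in a cast: recover the real type of a
   function pointer that was passed around as a generic void (*)(void). */
int call_erased(void (*erased)(void), float arg) {
    return ((int (*)(float))erased)(arg);
}

int row[3];

/* Type names as the operand of sizeof and as a _Generic association. */
static_assert(sizeof(int [3]) == sizeof row, "int [3] names array<int, 3>");
static_assert(_Generic(&row, int (*)[3]: 1, default: 0), "&row : ptr<array<int, 3>>");
```

A call such as call_erased((void (*)(void))truncate_plus_one, 2.5f) uses a second type name, void (*)(void), for the cast in the other direction.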

I'm not sure this still counts as "declaration follows usage" since there is nothing to be used here, but the syntax is still consistent with the declaration syntax, so I think it's a reasonable choice.

What about `typedef`?

Some of you might know that the int* x, y; problem can be solved by introducing a type alias, as follows.

typedef int* int_ptr_t;
int_ptr_t x, y;

This will indeed declare both x and y with type int_ptr_t = ptr<int>. So what's going on here? Well, the first line is really just another declaration. typedef is a storage-class specifier, so it is one of the declaration specifiers we mentioned earlier. Fully parenthesized, the first line becomes typedef int (*(int_ptr_t));. So, a declaration like int_ptr_t x; can be read by substituting our declarator x for int_ptr_t in the typedef declaration, which gives int (*(x));.
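
Since a typedef is read like any other declaration, the same trick works for array and function types. The sketch below (C11 assumed) checks this at compile time. The aliases row_t and fn_t and the variables are names I made up for illustration.

```c
#include <assert.h> /* static_assert (C11) */

typedef int *int_ptr_t;  /* int_ptr_t = ptr<int>      */
typedef int row_t[3];    /* row_t     = array<int, 3> */
typedef float fn_t(int); /* fn_t      = int -> float  */

int_ptr_t p, q; /* both p and q : ptr<int>             */
row_t *r;       /* r     : ptr<array<int, 3>>          */
fn_t *table[2]; /* table : array<ptr<int -> float>, 2> */

static_assert(_Generic(p, int *: 1, default: 0), "p : ptr<int>");
static_assert(_Generic(q, int *: 1, default: 0), "q : ptr<int> as well");
static_assert(_Generic(r, int (*)[3]: 1, default: 0), "r : ptr<array<int, 3>>");
static_assert(_Generic(table[0], float (*)(int): 1, default: 0),
              "each element of table : ptr<int -> float>");
static_assert(sizeof table == 2 * sizeof(fn_t *), "table has 2 elements");
```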

The curious case of C++

Since C++ is (mostly) backward compatible with C, declarations in C++ also spell the type inside out, but C++ also has templated types, which are spelled outside in.

For example, the declaration std::vector<int (*)[6]> *v; declares a variable v of type ptr<vector<ptr<array<int, 6>>>>. The numbers below mark the order in which you read the parts of the declaration.

std::vector<int (*)[6]> *v;
     ^       ^   ^  ^   ^
     2       5   3  4   1

In fact, C++ templates can be used to "reverse" the order of declarations arbitrarily, using an identity template. In the following, each line after the alias is an equivalent way to declare f : array<ptr<int → float>, 2>.

template<typename T> using Id = T;

float (*f[2])(int);
Id<float> (*f[2])(int);
Id<Id<float>(int)> *f[2];
Id<Id<Id<float>(int)>*> f[2];
Id<Id<Id<Id<float>(int)>*>[2]> f;

We can make this look a little nicer by adding more type aliases.

template<typename T> using Ptr = T*;
template<typename T, int N> using Array = T[N];
template<typename T, typename... Args> using Fn = T(Args...);

Array<Ptr<Fn<float, int>>, 2> g;

¹ In C before C23, if a function is declared with an empty parameter list, it means that the number and types of the parameters are unspecified. This is different from a function with no parameters, which is declared with void as the parameter list. In C++, and in C since C23, an empty parameter list means no parameters.
