Bound and Constraint in Generics and Type Variables#
Some Motivation, The Problem With Unconstrained Type Variables#
We start off with some motivation. To understand why unconstrained type variables can be problematic, let's examine a simple add function that combines two arguments:
from typing import TypeVar
T = TypeVar("T")
def add(x: T, y: T) -> T:
    return x + y
Running this piece of code through mypy [1] yields two particular errors:
4: error: Returning Any from function declared to return "T" [no-any-return]
return x + y
^~~~~~~~~~~~
4: error: Unsupported left operand type for + ("T") [operator]
return x + y
^~~~~
These errors occur because TypeVar("T") is unconstrained: T can represent literally any type, and not every type supports addition. Why so? Because for the + operator to work, the underlying object must implement the __add__ method. For instance, two dictionaries cannot be added together because dict does not define __add__ at all. The errors that mypy raises force you to rethink your code design, namely which types the type variable T is allowed to take on. Assume, for the sake of simplicity, that your program's add only needs to operate on a few types:
int
float
NDArray[np.float64]
str
If we can somehow tell our type variable T to take on only one of the 4 types above, then our job is done. This motivates the need for constraining type variables to a specific set of types that support our desired operation (here, the + operator).
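As a quick runtime illustration of the point above, the sketch below checks which built-in types define __add__. The helper name supports_add is hypothetical, and this is not how mypy reasons about types; it only shows why dict fails where int, float and str succeed.

```python
def supports_add(tp: type) -> bool:
    """Return True if the type defines __add__ (hypothetical helper)."""
    return hasattr(tp, "__add__")


candidates = [int, float, str, list, dict]
support = {tp.__name__: supports_add(tp) for tp in candidates}

# dict is the odd one out: it defines no __add__, so adding two dicts fails.
try:
    {"a": 1} + {"b": 2}  # type: ignore[operator]
    dict_add_failed = False
except TypeError:
    dict_add_failed = True
```

Here support maps every candidate except dict to True, and dict_add_failed ends up True.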
Constraining Type Variable#
Constraints allow you to specify a list of explicit types that a type variable (TypeVar) can take. This means the type variable can represent any one of these specified types, and no others.
Syntax: T = TypeVar("T", Type1, Type2, ...)
Meaning: The type variable T can be any one of Type1, Type2, etc.
Concrete: T = TypeVar("T", int, float) means T can only be int or float.
To be more formal, a constrained type variable is a type variable \(T\) restricted to a finite set of types \(\mathcal{S} = \{T_1, T_2, \ldots, T_n\}\) (with \(n \geq 2\)) such that \(T\) can only be instantiated with types from \(\mathcal{S}\). So if one declares T = TypeVar("T", T_1, T_2, ...), then \(T\) can only be bound to a type \(T_i \in \mathcal{S} = \{T_1, T_2, \ldots, T_n\}\).
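The set \(\mathcal{S}\) is also visible at runtime. The short sketch below inspects the __constraints__ attribute of a TypeVar and shows that a single constraint is rejected; the names constraint_set and single_constraint_rejected are only illustrative.

```python
from typing import TypeVar

T = TypeVar("T", int, float)

# The constraint set S is stored on the type variable as a tuple.
constraint_set = T.__constraints__

# A lone constraint is refused at construction time: S needs at least two members.
try:
    TypeVar("Bad", int)
    single_constraint_rejected = False
except TypeError:
    single_constraint_rejected = True
```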
In our add example, since we only want the type variable T to take on 4 specific types, we can rewrite the type variable as follows:
from typing import TypeVar

import numpy as np
from numpy.typing import NDArray

T = TypeVar("T", int, float, str, NDArray[np.float64])

def add(a: T, b: T) -> T:
    return a + b
Running mypy again yields no errors, because the static type checker knows that T can only take on int, float, str, or NDArray[np.float64], all of which have the + operator well defined.
Type Binding#
When using a constrained TypeVar, the type checker ensures that within a single usage of the function, all occurrences of T are the same type. This is usually referred to as type binding.
What does this even mean? Type binding is the association of a type variable (like T) with a concrete type (like int). We can intuitively think of this action as a substitution (or mapping) of the type variable with a concrete type: we replace every occurrence of T with the concrete (actual) type.
Type Variables And Substitution#
Let's use the earlier add function to see how type binding works. In line 6, we declared T as TypeVar("T", int, float, str, NDArray[np.float64]). This means:
There exists a type variable T.
T can only be substituted with a type from the set {int, float, str, NDArray[np.float64]}.
All occurrences of T in a single function invocation must be substituted with the same type.
Type binding is worked out by the static type checker at each call site (nothing is bound at runtime), and the process is as follows (T and \(T\) are interchangeable):
Say we invoke the function add(a=x, b=y) where x and y are of some type.
Let type(x) be denoted as \(T_x\) and type(y) be denoted as \(T_y\).
For the checker to pass, we must abide by:
\(T_x\) must be one of the types defined in the set {int, float, str, NDArray[np.float64]}.
\(T_y\) must be one of the types defined in the set {int, float, str, NDArray[np.float64]}.
\(T_x\) and \(T_y\) must be the same type, because both parameters are annotated with the same type variable \(T\).
Then \(T\) is mapped (or bound) to \(T_x\) (\(T \mapsto T_x\)).
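The steps above can be mimicked with a small runtime toy. This is only a sketch under loud assumptions: bind_type_var and CONSTRAINTS are hypothetical names, NDArray is left out to avoid a NumPy dependency, and exact runtime types stand in for the static types a real checker works with (a real checker also handles subtyping and promotions).

```python
from typing import Tuple

# Stand-in for the constraint set of T (assumption: NDArray omitted).
CONSTRAINTS: Tuple[type, ...] = (int, float, str)


def bind_type_var(x: object, y: object,
                  constraints: Tuple[type, ...] = CONSTRAINTS) -> type:
    """Toy model of binding T for a call add(a=x, b=y)."""
    t_x, t_y = type(x), type(y)
    if t_x not in constraints:
        raise TypeError(f"{t_x.__name__} is not in the constraint set")
    if t_y not in constraints:
        raise TypeError(f"{t_y.__name__} is not in the constraint set")
    if t_x is not t_y:
        raise TypeError(f"cannot bind T to both {t_x.__name__} and {t_y.__name__}")
    return t_x  # T is mapped to T_x


binding_1 = bind_type_var(2, 5)      # T maps to int
binding_2 = bind_type_var(1.0, 2.0)  # T maps to float

try:
    bind_type_var(1, "3")
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```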
Binding Time And Scope#
Consider the following two calls:
def add(a: T, b: T) -> T:
    return a + b
z1 = add(2, 5) # T ↦ int
z2 = add(1.0, 2.0) # T ↦ float
Each function call creates a new binding scope:
For the call z1 = add(2, 5):
Let \(x_1 = 2\) and \(x_2 = 5\) be the input arguments
Given \(\text{type}(x_1) = \text{int} \in \mathcal{S}\) where \(\mathcal{S} = \{\text{int}, \text{float}, \text{NDArray}[\text{np.float64}], \text{str}\}\)
Create binding \(T \mapsto \text{int}\)
Verify \(\text{type}(x_2) = \text{int} = T\)
Therefore return type \(\text{type}(z_1) = T = \text{int}\) is valid
An invalid binding would be:
z3 = add(1, "3")
This is because:
type(1) = int
T is bound to int
type("3") = str != int = T
A type error ensues: mypy rejects the call, and at runtime 1 + "3" raises a TypeError as well.
Union Type versus Constrained Type Variable#
From the definition of a constrained type variable alone, it may not be immediately clear why we cannot just use a Union type. After all, a Union type can take on any one of the types defined in the set {int, float, str, NDArray[np.float64]}. Why bother to specify a constrained type variable? As we shall see shortly, a Union type is a looser constraint: it does not force the annotated arguments to share one type, so the requirement that a and b be of the same type within a single call can be violated.
Union Type#
Consider the following example using a Union type:
from typing import Union
from numpy.typing import NDArray
import numpy as np

U = Union[int, float, NDArray[np.float64]]

def add(a: U, b: U) -> U:
    return a + b
This type signature allows any combination of types within the Union.
For any arguments \(a\) and \(b\): \(\text{type}(a), \text{type}(b) \in \{\text{int}, \text{float}, \text{NDArray}[\text{np.float64}]\}\)
No constraint exists requiring \(\text{type}(a) = \text{type}(b)\)
This leads to potentially unsafe operations:
result1 = add(3, 4) # Works as expected: 7
result2 = add(3, 9.99) # Type mixing: 12.99
result3 = add(3, np.array([1, 2, 3])) # Implicit broadcasting: array([4, 5, 6])
result4 = add(3, "4") # Rejected: str is not a member of U (and a TypeError at runtime)
The Union type permits the first three calls because it only enforces that each argument independently belongs to the set of allowed types.
More verbosely, with Union, the operation we use between the arguments (i.e. a and b) must be supported for every ordered pair of member types [2]. As we can see, we added two int, an int and a float, and an int and an NDArray without complaint. The last call, an int and a str, is caught only because str happens not to be a member of U, not because the two arguments differ in type. Is this really what we want? Do we really want to allow an int and an NDArray to be added together freely? (They can be, but some may regard it as unsafe because broadcasting might lead to undesirable consequences.) Consequently, no error is raised for certain operations that are unsafe. We fear not the operations that raise an error, but those that do not (silent errors).
Furthermore, let's consider a scenario where you want to restrict the add function to only work with int or str types. Using a Union type would be problematic because it allows mixing these types in a single operation (like adding an int to a str), which would raise a runtime error.
from typing import Union
U = Union[int, str]
def add(a: U, b: U) -> U:
    return a + b
Our static type checker is quick to spot this potential error and aptly raises the following:
4: error: Unsupported operand types for + ("int" and "str") [operator]
return a + b
^
4: error: Unsupported operand types for + ("str" and "int") [operator]
return a + b
^
4: note: Both left and right operands are unions
This suggests there is a better way to write the type hint than using a Union type.
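To see at runtime what the checker is warning about, here is a minimal sketch of the Union[int, str] version. Each argument is individually a member of the union, yet the mixed call still fails. The names members, same_type_results and mixed_call_failed are illustrative only.

```python
from typing import Union, get_args

U = Union[int, str]


def add(a: U, b: U) -> U:
    # A static checker flags this line, since int + str is unsupported.
    return a + b  # type: ignore[operator]


members = get_args(U)  # the member types of the union

# Same-type calls behave as expected.
same_type_results = [add(1, 2), add("1", "2")]

# Mixed call: both arguments belong to U, but the addition itself fails.
try:
    add(1, "2")
    mixed_call_failed = False
except TypeError:
    mixed_call_failed = True
```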
Constrained TypeVar Approach#
Now consider using a constrained TypeVar:
from typing import TypeVar
import numpy as np
from numpy.typing import NDArray

T = TypeVar("T", int, float, str, NDArray[np.float64])

def add(a: T, b: T) -> T:
    return a + b
This enforces a stronger type constraint. Formally:
\(\exists T \in \{\text{int}, \text{float}, \text{str}, \text{NDArray}[\text{np.float64}]\}: \text{type}(a) = T \land \text{type}(b) = T\)
The return type must be the same \(T\) that was bound to the arguments
This prevents type mixing:
result1 = add(3, 4) # Valid: T ↦ int
result3 = add(3, np.array([1, 2, 3])) # Type Error: Cannot bind T to both int and NDArray
result4 = add(3, "4") # Type Error: Cannot bind T to both int and str
A constrained TypeVar provides stronger static guarantees and ensures type consistency within a single call. Naively, my guess is that the type checker applies rules along the following lines for each approach:
Union types:
For each argument x, the type checker checks whether x is of any of the types in the union Union[T_1, T_2, ...].
If x is of one of the types in the union, the type checker proceeds to check the next argument.
If x is not of any of the types in the union, the type checker raises an error.
Constrained TypeVar:
The first occurrence of T is checked against the set of types defined in the TypeVar declaration.
If the type of the first occurrence of T is not in the set, the type checker raises an error.
If the type of the first occurrence of T is in the set, T is bound to that type, and all subsequent occurrences of T must be of the same type.
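The two guessed rule sets can be put side by side as a toy. As with the rules themselves, this is a guess and not mypy's actual algorithm: passes_union_check, passes_constrained_check and ALLOWED are hypothetical names, and exact runtime types stand in for static types.

```python
from typing import Sequence, Tuple

# Assumed set of allowed types (NDArray omitted to stay dependency-free).
ALLOWED: Tuple[type, ...] = (int, float, str)


def passes_union_check(args: Sequence[object],
                       members: Tuple[type, ...] = ALLOWED) -> bool:
    """Union rule: every argument is checked independently."""
    return all(type(arg) in members for arg in args)


def passes_constrained_check(args: Sequence[object],
                             constraints: Tuple[type, ...] = ALLOWED) -> bool:
    """Constrained rule: the first argument fixes T, the rest must match it."""
    if not args:
        return True
    bound_type = type(args[0])
    if bound_type not in constraints:
        return False
    return all(type(arg) is bound_type for arg in args[1:])


union_accepts_mixed = passes_union_check((3, "4"))
constrained_accepts_mixed = passes_constrained_check((3, "4"))
constrained_accepts_same = passes_constrained_check((3, 4))
```

The mixed pair (3, "4") passes the Union-style check but not the constrained one, which is the difference the two lists describe.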
Upper Bounding Type Variables#
Bounds specify an upper bound for the type variable. This means that the type variable can represent any type that is a subtype of the specified bound.
Syntax: T = TypeVar("T", bound=SuperType)
Meaning: The type variable T can be any type that is a subtype of SuperType (including SuperType itself).
Concrete: T = TypeVar("T", bound=BaseModel) means T can be any type that is a subclass of BaseModel (from pydantic).
Defining Type Variables with Upper Bounds#
In Python's type hinting system, you can define a type variable that restricts which types can be used in place of it by specifying an upper bound. This is done using the bound=<type> argument of TypeVar. The key point is that any type that replaces this type variable must be a subtype of the specified boundary type.
It's important to note that the boundary type itself cannot be another type variable, nor can it be parameterized by type variables. For example:
# NOT ALLOWED!
T1 = TypeVar("T1")
T2 = TypeVar("T2", bound=T1) # Cannot use T1 as a bound
A generic type parameterized by a type variable is rejected for the same reason, whereas a fully concrete parameterization such as List[int] contains no type variables and is a valid bound. For example:
from typing import List, TypeVar
# This is allowed: List[int] contains no type variables
T = TypeVar("T", bound=List[int])
# This is NOT allowed
S = TypeVar("S")
U = TypeVar("U", bound=List[S]) # Error! The bound is parameterized by the type variable S
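The runtime keeps track of the bound too. The sketch below reads the __bound__ attribute and shows one rule that is enforced when the TypeVar is constructed: constraints and a bound cannot be combined. Rules about what may appear inside a bound are left to static checkers, so they are not exercised here. The variable names are illustrative.

```python
from typing import TypeVar

# The upper bound is recorded on the type variable.
B = TypeVar("B", bound=int)
bound_of_b = B.__bound__

# An unbounded type variable reports None.
Free = TypeVar("Free")
bound_of_free = Free.__bound__

# Constraints and a bound are mutually exclusive.
try:
    TypeVar("V", int, str, bound=object)
    combination_rejected = False
except TypeError:
    combination_rejected = True
```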
Some Motivation, Simple Length Comparison#
Let's say we want to write a function that compares the lengths of two sequences and returns the longer one.
from typing import TypeVar
T = TypeVar("T")
def longer(x: T, y: T) -> T:
    return x if len(x) > len(y) else y
This fails the type checker because T could be any type, so there's no guarantee it supports len(). For example, the calls below work at runtime because list and str both support len():
longer([1, 2], [3])
longer("hello", "hi")
However, the call below fails at runtime because int does not support len():
longer(42, 100)
A Case Study On Sized
#
We can fix this using bound to ensure our type variable only accepts types that support len(). Consider the Sized protocol from Python's typing module, which represents any type that supports the len() function. We define a type variable ST with Sized as its upper bound:
from typing import TypeVar, Sized
ST = TypeVar("ST", bound=Sized)
This definition means that ST can be replaced by any type that implements the __len__ method, ensuring that objects of type ST can be measured for their size.
The function longer takes two parameters, x and y, both of type ST. It returns the object with the greater length:
def longer(x: ST, y: ST) -> ST:
    return x if len(x) > len(y) else y
Because ST is bound to Sized, we can safely use len() on x and y. This allows the function to work with any sized collection, such as lists or sets.
longer([1], [1, 2]) correctly returns the longer list, with the return type being List[int].
longer({1}, {1, 2}) operates on sets, returning the larger set as Set[int].
What's interesting is that longer([1], {1, 2}) is also accepted, with return type Collection[int]. Unlike with constraints, x and y do not need to be of exactly the same type: the checker can bind ST to a common supertype of both arguments, as long as that supertype is itself a subtype of the bound.
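Putting the pieces together, here is a runnable sketch of longer with a Sized bound. It imports Sized from collections.abc (typing.Sized is an alias of it), and Playlist is a hypothetical user-defined class added only to show that any type with __len__ qualifies.

```python
from collections.abc import Sized
from typing import TypeVar

ST = TypeVar("ST", bound=Sized)


def longer(x: ST, y: ST) -> ST:
    """Return whichever argument has the greater length (y on ties)."""
    return x if len(x) > len(y) else y


class Playlist:
    """Hypothetical class: defining __len__ is enough to satisfy Sized."""

    def __init__(self, n_tracks: int) -> None:
        self.n_tracks = n_tracks

    def __len__(self) -> int:
        return self.n_tracks


short_list, long_list = Playlist(3), Playlist(10)
longest_playlist = longer(short_list, long_list)

# An int has no __len__, so the unbounded failure mode shows up at runtime.
try:
    longer(42, 100)  # type: ignore[type-var]
    int_call_failed = False
except TypeError:
    int_call_failed = True
```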
Why Not Use Constraints?#
You might wonder why we don't use constraints instead. We could try: say we want T to be a type that is either a list, str, or tuple.
T = TypeVar('T', list, str, tuple)
This approach has limitations because we need to list every possible type that we want to allow, that is, every type that has a __len__ method. This is not scalable, so a bound is more applicable.
More formally, let \(S\) be the set of all types that implement the Sized protocol. For a type variable \(T\) with bound Sized:
\(\forall t:\ (T \mapsto t) \implies t \in S\)
That is, any type \(t\) that can be bound to \(T\) must be in the set of Sized types.
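Membership in \(S\) can be probed at runtime with issubclass against collections.abc.Sized, which checks for a __len__ method. This is a small sketch only; the candidate list is arbitrary.

```python
from collections.abc import Sized

candidates = [list, str, tuple, set, dict, int, float]

# True means the type belongs to S, i.e. it implements __len__.
in_s = {tp.__name__: issubclass(tp, Sized) for tp in candidates}
```

The container types all land in \(S\), while int and float do not, so neither could be bound to a type variable whose bound is Sized.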
Bounding and Semantic Clarity#
Bounding also offers more clarity and semantic meaning than, say, a Union type.
from typing import TypeVar, Union
class Animal:
    ...
class Dog(Animal):
    ...
class Cat(Animal):
    ...
class Car:
    ...
AnimalType = TypeVar("AnimalType", bound=Animal)
def function_with_bound(arg: AnimalType) -> AnimalType:
    return arg
def function_with_union(arg: Union[Dog, Cat, Car]) -> Union[Dog, Cat, Car]:
    return arg
In function_with_bound, the argument arg must be an instance of Animal or a subclass of Animal. This means you could pass in an instance of Dog or Cat, but not Car, because Car is not a subclass of Animal.
In function_with_union, the argument arg can be an instance of Dog, Cat, or Car. There's no requirement that these types are related in any way.
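One practical consequence of the bound is worth a sketch: the return type tracks the concrete subclass that was passed in. The speak and fetch methods and function_with_base are hypothetical additions for illustration and are not part of the example above.

```python
from typing import TypeVar


class Animal:
    def speak(self) -> str:
        return "..."


class Dog(Animal):
    def speak(self) -> str:
        return "Woof"

    def fetch(self) -> str:  # exists only on Dog
        return "fetching"


AnimalType = TypeVar("AnimalType", bound=Animal)


def function_with_bound(arg: AnimalType) -> AnimalType:
    return arg


def function_with_base(arg: Animal) -> Animal:
    # Hypothetical contrast: the result is only known to be an Animal.
    return arg


dog = function_with_bound(Dog())
fetched = dog.fetch()  # accepted statically: the result is still a Dog

# function_with_base(Dog()).fetch() also runs, but a checker would reject it,
# because the declared return type Animal has no fetch method.
```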
A Case Study On Addable
#
Let's say we want to write a function that doubles any number. Without bound:
from typing import TypeVar
T = TypeVar('T')
def double(x: T) -> T:
    return x + x
If we run this through mypy, we get the errors below:
tmp.py:237: error: Returning Any from function declared to return "T" [no-any-return]
return x + x
^~~~~~~~~~~~
tmp.py:237: error: Unsupported left operand type for + ("T") [operator]
return x + x
^~~~~
Found 2 errors in 1 file (checked 1 source file)
Rightfully so: by the same logic as before, not all types support the + operation, and a call like double(x={"key": "value"}) will cause an error. To resolve this, we would look for a class like Sized that provides the __add__ method. Suppose such a class is hard to find; we can define our own protocol instead.
from __future__ import annotations
from typing import TypeVar, Protocol
class Addable(Protocol):
    def __add__(self, other: T) -> T: ...
T = TypeVar("T", bound=Addable)
def double(x: T) -> T:
    return x + x
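Below is a runnable variant of the same idea, written as a sketch. It declares T first with a string forward reference to Addable instead of relying on the __future__ import; that reordering is my assumption, not part of the original snippet. It only demonstrates runtime behaviour, and whether a particular checker accepts every built-in type as Addable may vary.

```python
from typing import Protocol, TypeVar

# Forward reference as a string, so T can be declared before Addable exists.
T = TypeVar("T", bound="Addable")


class Addable(Protocol):
    def __add__(self, other: T) -> T: ...


def double(x: T) -> T:
    return x + x


doubled_int = double(2)
doubled_str = double("ab")
doubled_list = double([1, 2])

# dict has no __add__, so this is the failure the bound is meant to rule out.
try:
    double({"key": "value"})  # type: ignore[type-var]
    dict_rejected = False
except TypeError:
    dict_rejected = True
```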
Bound versus Constraints#
Bounds are used to specify that a type variable must be a subtype of a particular type. This is akin to setting an upper limit on what the type variable can be. The purpose of bounds is to ensure that the type variable adheres to a hierarchical type constraint, typically ensuring that it inherits certain methods or properties.
Constraints, on the other hand, specify a list of explicit types that a type variable can represent, without implying any hierarchical relationship between them. The purpose of constraints is to allow a type variable to be more flexible by being one of several types, rather than restricting it to a subtype of a specific class or interface.
The comparison is pretty superficial, but something to remember is that you can mix types across arguments if you use a bound, which behaves a little like Union, whereas with constraints, all arguments must resolve to the same constraint type.
Let's see an example:
from typing import Tuple, TypeVar
AnimalType = TypeVar("AnimalType", bound=Animal)
def function_with_bound(arg1: AnimalType, arg2: AnimalType) -> Tuple[AnimalType, AnimalType]:
    return arg1, arg2
cat = Cat()
tabby = Cat()
dog = Dog()
_, _ = function_with_bound(cat, dog)  # accepted: both are subtypes of Animal
The code above raises no issue, in contrast to the code below:
AnimalType = TypeVar("AnimalType", Cat, Dog)
def function_with_constraint(arg1: AnimalType, arg2: AnimalType) -> Tuple[AnimalType, AnimalType]:
    return arg1, arg2
cat = Cat()
tabby = Cat()
dog = Dog()
_, _ = function_with_constraint(cat, dog)  # rejected: Cat and Dog are different constraint types
This is because with constraints, our contract is that within a single call, all arguments annotated with AnimalType must resolve to the same constraint type (either Cat or Dog), whereas with a bound, arg1 and arg2 can be of different types, as long as both are upper bounded by Animal.
References and Further Readings#
What’s the difference between a constrained TypeVar and a Union?
Difference between TypeVar("T", A, B) and TypeVar("T", bound=Union[A, B])