Global Interpreter Lock (GIL)#
Reference Counting in Python#
In Python, reference counting is a memory management technique used to keep track of how many references (or “pointers”) exist to an object in memory. When an object’s reference count drops to zero, Python automatically frees the memory allocated to that object.
import sys
a = []
b = a
a_ref_count = sys.getrefcount(a)
print(a_ref_count) # Output: 3
3
`a = []` creates an empty list `[]` and assigns it to the variable `a`. `b = a` makes `b` reference the same list as `a`. `sys.getrefcount(a)` returns the number of references to the list `a` points to. The count is `3` because:
- `a` references the list.
- `b` references the list.
- The `sys.getrefcount(a)` call itself temporarily creates another reference while it runs.
As beginner Python programmers, we thank our lucky stars that Python manages memory for us. But, as good students, we ask: why is keeping track of the reference count important?
First, reference counting is how Python manages memory automatically: it keeps track of every object and frees the memory as soon as an object is no longer needed. For example, if we delete the reference `b`, the count reported for `a` drops to `2`.
del b
a_ref_count = sys.getrefcount(a)
print(a_ref_count) # Output: 2
2
Second, freeing objects as soon as they become unreachable avoids memory leaks: memory is reused efficiently instead of being wasted on objects that are no longer in use.
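To see the freeing happen in real time, here is a minimal sketch using `weakref.finalize` to observe the exact moment the last reference disappears. The `Resource` class and `demo` function are illustrative names; a class is used because weak references cannot be attached to a plain list:
import sys
import weakref

class Resource:
    """Placeholder object: weak references need a class instance, not a list."""

def demo() -> None:
    obj = Resource()
    # The callback fires as soon as the reference count hits zero.
    weakref.finalize(obj, lambda: print("object freed"))
    alias = obj
    print(sys.getrefcount(obj))  # 3: obj, alias, and getrefcount's argument
    del alias
    print(sys.getrefcount(obj))  # 2: obj and getrefcount's argument
    del obj  # count drops to zero -> CPython frees it and "object freed" prints

demo()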
Threads and Race Conditions#
A race condition occurs when two or more threads access shared data simultaneously, and the final outcome depends on the sequence of execution.
For example, if two threads try to increment a counter from 5 to 6:
Thread A reads counter (value = 5)
Thread B reads counter (value = 5)
Thread A adds 1 (5 + 1 = 6)
Thread B adds 1 (5 + 1 = 6)
Thread A writes 6
Thread B writes 6
Even though two increment operations occurred, the counter only increased by 1 instead of 2. This happens because both threads read the original value before either could update it. The final result depends on which thread writes last, and the program “races” to an incorrect outcome.
Show me the code to illustrate the race condition!
"""With reference to effective python book chapter 54.
Ref: https://github.com/bslatkin/effectivepython/blob/master/example_code/item_54.py
"""
import logging
import threading
from threading import Barrier
from typing import List
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")
NUM_THREADS = 5
BARRIER = Barrier(NUM_THREADS)
class Counter:
def __init__(self) -> None:
self.count = 0
def increment(self, offset: int) -> None:
self.count += offset
def worker(thread_index: int, total_iterations: int, counter: Counter) -> None:
"""The barrier is used to synchronize the threads so that they all start counting
at the same time. This makes it easier to get a race condition since we wait for
the other threads to start else in the loop we always have an order that the
first thread likely starts first and then the second and so on.
"""
BARRIER.wait()
logging.debug("Thread %s, starting", thread_index)
for _ in range(total_iterations):
counter.increment(1)
def thread_unsafe(total_iterations: int) -> None:
counter = Counter()
threads: List[threading.Thread] = []
for index in range(NUM_THREADS):
thread = threading.Thread(target=worker, args=(index, total_iterations, counter))
threads.append(thread)
for thread in threads:
thread.start()
for thread in threads:
thread.join()
expected = total_iterations * NUM_THREADS
found = counter.count
logging.info("Counter should be %s, got %s", expected, found)
if __name__ == "__main__":
total_iterations = 10**6
thread_unsafe(total_iterations)
2024-10-29 10:22:29,521 - DEBUG - Thread 4, starting
2024-10-29 10:22:29,522 - DEBUG - Thread 0, starting
2024-10-29 10:22:29,522 - DEBUG - Thread 3, starting
2024-10-29 10:22:29,522 - DEBUG - Thread 2, starting
2024-10-29 10:22:29,523 - DEBUG - Thread 1, starting
2024-10-29 10:22:29,951 - INFO - Counter should be 5000000, got 4564365
Protecting Reference Counts with Locks#
Let us circle back to the reference counting from earlier. We just saw how race conditions arise, and the same thing can happen to an object's reference count, since it is shared state that multiple threads can modify. If two threads increment or decrement the reference count simultaneously without protection, it can become inconsistent: the count might reach zero incorrectly, freeing memory while it is still in use and causing crashes or bugs.
How do we prevent this? We first need a rough idea of what a lock is.
A lock is a synchronization mechanism used to control access to shared resources. When a thread acquires a lock, other threads must wait until the lock is released before accessing the resource.
- In the first example, without a lock, both threads might read the same `counter` value before incrementing it, causing some increments to be lost.
- In the second example, the lock ensures that only one thread can modify the counter at a time.
- The `with lock:` statement automatically acquires the lock before entering the block and releases it when exiting.
- This guarantees that the final counter value will always be correct.
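As a quick aside, here is a minimal sketch of what `with lock:` is shorthand for, written out with explicit `acquire` and `release` calls (the `shared_count` variable is illustrative):
import threading

lock = threading.Lock()
shared_count = 0  # illustrative shared state

# `with lock:` expands to this acquire/try/finally pattern, which guarantees
# the lock is released even if the critical section raises an exception.
lock.acquire()
try:
    shared_count += 1  # critical section
finally:
    lock.release()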
"""With reference to effective python book chapter 54.
Ref: https://github.com/bslatkin/effectivepython/blob/master/example_code/item_54.py
"""
import logging
import threading
from threading import Barrier
from typing import List
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")
NUM_THREADS = 5
BARRIER = Barrier(NUM_THREADS)
class CounterLock:
def __init__(self) -> None:
self.count = 0
self.lock = threading.Lock()
def increment(self, offset: int) -> None:
with self.lock:
self.count += offset
def worker(thread_index: int, total_iterations: int, counter: CounterLock) -> None:
"""The barrier is used to synchronize the threads so that they all start counting
at the same time. This makes it easier to get a race condition since we wait for
the other threads to start else in the loop we always have an order that the
first thread likely starts first and then the second and so on.
"""
BARRIER.wait()
logging.debug("Thread %s, starting", thread_index)
for _ in range(total_iterations):
counter.increment(1)
def thread_safe(total_iterations: int) -> None:
counter = CounterLock()
threads: List[threading.Thread] = []
for index in range(NUM_THREADS):
thread = threading.Thread(target=worker, args=(index, total_iterations, counter))
threads.append(thread)
for thread in threads:
thread.start()
for thread in threads:
thread.join()
expected = total_iterations * NUM_THREADS
found = counter.count
logging.info("Counter should be %s, got %s", expected, found)
if __name__ == "__main__":
total_iterations = 10**6
thread_safe(total_iterations)
2024-10-29 10:22:29,959 - DEBUG - Thread 4, starting
2024-10-29 10:22:29,959 - DEBUG - Thread 0, starting
2024-10-29 10:22:29,959 - DEBUG - Thread 3, starting
2024-10-29 10:22:29,960 - DEBUG - Thread 2, starting
2024-10-29 10:22:29,960 - DEBUG - Thread 1, starting
2024-10-29 10:22:30,821 - INFO - Counter should be 5000000, got 5000000
Deadlocks#
A deadlock occurs when two or more threads are waiting indefinitely for locks held by each other, preventing them from making progress.
import threading
import time
class BankAccount:
def __init__(self, id: int, balance: float) -> None:
self.id: int = id
self.balance: float = balance
self.lock: threading.Lock = threading.Lock()
def __str__(self) -> str:
return f"Account {self.id}: ${self.balance}"
def unsafe_transfer(from_account: BankAccount, to_account: BankAccount, amount: float) -> None:
with from_account.lock:
print(f"Locked account {from_account.id}")
time.sleep(0.5) # Simulate some work and make deadlock more likely
with to_account.lock:
print(f"Locked account {to_account.id}")
if from_account.balance >= amount:
from_account.balance -= amount
to_account.balance += amount
print(f"Transferred ${amount} from Account {from_account.id} to Account {to_account.id}")
def safe_transfer(from_account: BankAccount, to_account: BankAccount, amount: float) -> None:
first: BankAccount = from_account if from_account.id < to_account.id else to_account
second: BankAccount = to_account if from_account.id < to_account.id else from_account
with first.lock:
print(f"Locked account {first.id}")
time.sleep(0.5) # Simulate some work
with second.lock:
print(f"Locked account {second.id}")
if from_account.balance >= amount:
from_account.balance -= amount
to_account.balance += amount
print(f"Transferred ${amount} from Account {from_account.id} to Account {to_account.id}")
account1: BankAccount = BankAccount(1, 1000)
account2: BankAccount = BankAccount(2, 1000)
print("Initial balances:")
print(account1)
print(account2)
print("\nTrying unsafe transfers (will likely deadlock):")
# Create threads with unsafe transfers (will deadlock)
thread1: threading.Thread = threading.Thread(target=unsafe_transfer, args=(account1, account2, 500))
thread2: threading.Thread = threading.Thread(target=unsafe_transfer, args=(account2, account1, 300))
thread1.start()
thread2.start()
thread1.join()
thread2.join()
- `thread1` acquires the lock on `account1` and waits to acquire the lock on `account2`.
- `thread2` acquires the lock on `account2` and waits to acquire the lock on `account1`.
- Both threads are waiting for each other to release a lock, causing a deadlock.
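The `safe_transfer` version above avoids this by always locking the account with the smaller `id` first, so both threads agree on a single global lock order and a circular wait cannot form. A minimal sketch of running it, reusing the `BankAccount` and `safe_transfer` definitions above (the fresh accounts and amounts are illustrative):
account3: BankAccount = BankAccount(3, 1000)
account4: BankAccount = BankAccount(4, 1000)

# Opposite transfer directions, just like the deadlocking example, but both
# threads acquire account3's lock before account4's lock.
thread3 = threading.Thread(target=safe_transfer, args=(account3, account4, 500))
thread4 = threading.Thread(target=safe_transfer, args=(account4, account3, 300))
thread3.start()
thread4.start()
thread3.join()
thread4.join()

print(account3)  # Account 3: $800
print(account4)  # Account 4: $1200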
Why do we need to know this? Because, as we saw earlier, locks are useful for preventing race conditions, but using them carelessly can lead to deadlocks. The GIL aims to sidestep this problem as well.
The Global Interpreter Lock (GIL) in Python#
The Global Interpreter Lock (GIL) is a mutex (lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecodes simultaneously in the same process.
To avoid deadlocks of its own, the GIL is a single lock for the entire interpreter process, so the scenario where two threads wait on each other's locks cannot arise at the interpreter level. More intuitively, the GIL allows only one thread to execute Python bytecode at a time.
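The "one thread at a time" rule is enforced cooperatively: CPython forces the running thread to hand over the GIL at a regular interval so that other threads get a turn. A minimal sketch to inspect this knob (the default switch interval in CPython is 5 milliseconds):
import sys

# The running thread is asked to release the GIL roughly every 5 ms of
# bytecode execution by default.
print(sys.getswitchinterval())  # 0.005

# The interval can be tuned, but this never makes CPU-bound threads run in
# parallel; it only changes how often they take turns.
sys.setswitchinterval(0.01)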
With this single lock in place, race conditions on the interpreter's internal state, such as reference counts, are resolved.
But the GIL comes with its own trade-offs:
Limits Multi-threaded Performance: Because the GIL allows only one thread to execute Python code at a time, CPU-bound multi-threaded programs don’t benefit from multiple cores. They run almost as if they were single-threaded.
Inefficiency in Multi-core Systems: In CPU-bound tasks, the GIL can become a bottleneck, preventing Python programs from fully utilizing multi-core processors.
Let's see an example of how the GIL can limit multi-threaded performance.
CPU-bound tasks#
We first compare the performance of single-threaded vs. multi-threaded execution and note that, for CPU-bound tasks, the time taken is almost the same.
import threading
import time
from typing import List

import requests
def cpu_bound_task():
"""CPU intensive task - calculating sum of numbers"""
count = 0
for _ in range(20_000_000):
count += 1
return count
def run_tasks_single_thread(task, num_iterations):
start_time = time.time()
for _ in range(num_iterations):
task()
end_time = time.time()
return end_time - start_time
def run_tasks_multi_thread(task, num_threads):
start_time = time.time()
threads: List[threading.Thread] = []
for _ in range(num_threads):
t = threading.Thread(target=task)
threads.append(t)
t.start()
for t in threads:
t.join()
end_time = time.time()
return end_time - start_time
# NOTE: CPU-bound tasks
num_tasks = 4
print("CPU-bound task comparison:")
single_thread_cpu = run_tasks_single_thread(cpu_bound_task, num_tasks)
print(f"Single-threaded time: {single_thread_cpu:.2f} seconds")
multi_thread_cpu = run_tasks_multi_thread(cpu_bound_task, num_tasks)
print(f"Multi-threaded time: {multi_thread_cpu:.2f} seconds")
print(f"Speed difference: {single_thread_cpu/multi_thread_cpu:.2f}x\n")
CPU-bound task comparison:
Single-threaded time: 1.71 seconds
Multi-threaded time: 1.62 seconds
Speed difference: 1.06x
Next, we compare the performance of single-threaded vs. multi-threaded vs. multi-process execution. We expect multiprocessing to be the fastest because each process runs its own interpreter with its own GIL, so processes can execute in parallel on multiple cores.
import multiprocessing
def run_tasks_multi_process(task, num_processes):
"""Run tasks using multiple processes"""
start_time = time.time()
processes: List[multiprocessing.Process] = []
for _ in range(num_processes):
p = multiprocessing.Process(target=task)
processes.append(p)
p.start()
for p in processes:
p.join()
end_time = time.time()
return end_time - start_time
# Comparison of all three approaches
num_tasks = 4
print("CPU-bound task comparison:")
single_thread_cpu = run_tasks_single_thread(cpu_bound_task, num_tasks)
print(f"Single-threaded time: {single_thread_cpu:.2f} seconds")
multi_thread_cpu = run_tasks_multi_thread(cpu_bound_task, num_tasks)
print(f"Multi-threaded time: {multi_thread_cpu:.2f} seconds")
multi_process_cpu = run_tasks_multi_process(cpu_bound_task, num_tasks)
print(f"Multi-process time: {multi_process_cpu:.2f} seconds")
print(f"\nSpeed comparison (relative to single-thread):")
print(f"Threading speedup: {single_thread_cpu/multi_thread_cpu:.2f}x")
print(f"Multiprocessing speedup: {single_thread_cpu/multi_process_cpu:.2f}x\n")
CPU-bound task comparison:
Single-threaded time: 1.73 seconds
Multi-threaded time: 1.58 seconds
Multi-process time: 0.09 seconds
Speed comparison (relative to single-thread):
Threading speedup: 1.10x
Multiprocessing speedup: 18.59x
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/envs/omniverse/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/omniverse/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'cpu_bound_task' on <module '__main__' (built-in)>
(The same traceback is printed once per child process; the four copies are interleaved in the raw output.)
A caveat on these tracebacks: this was run interactively on macOS, where multiprocessing defaults to the spawn start method. Each child process re-imports `__main__`, so a function defined in an interactive session cannot be found, and every child crashes immediately. That also means the multi-process timing above is not trustworthy for this particular run; to measure it properly, save the code as a script with an `if __name__ == "__main__":` guard and run it from the command line.
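A minimal sketch of that layout, assuming the functions defined earlier are saved in a script (the filename `cpu_bound_demo.py` is illustrative):
# cpu_bound_demo.py (illustrative filename)
# cpu_bound_task, run_tasks_single_thread, run_tasks_multi_thread, and
# run_tasks_multi_process are defined at module level, exactly as above,
# so spawned child processes can re-import them from this file.

if __name__ == "__main__":
    # Only the parent process runs this block; spawned children re-import
    # the module without re-executing it, avoiding the AttributeError.
    num_tasks = 4
    print(f"Single-threaded: {run_tasks_single_thread(cpu_bound_task, num_tasks):.2f}s")
    print(f"Multi-threaded: {run_tasks_multi_thread(cpu_bound_task, num_tasks):.2f}s")
    print(f"Multi-process: {run_tasks_multi_process(cpu_bound_task, num_tasks):.2f}s")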
GIL Is Released During IO-bound Tasks#
As mentioned, the GIL is released during IO-bound tasks: while a thread is blocked on a system call such as a network read, it gives up the GIL so that other threads can run. As to why, please see this post.
Let’s see an example of this.
def io_bound_task():
"""IO intensive task - making HTTP requests"""
url = "https://api.github.com"
response = requests.get(url)
return response.status_code
# NOTE: IO-bound tasks
print("IO-bound task comparison:")
single_thread_io = run_tasks_single_thread(io_bound_task, num_tasks)
print(f"Single-threaded time: {single_thread_io:.2f} seconds")
multi_thread_io = run_tasks_multi_thread(io_bound_task, num_tasks)
print(f"Multi-threaded time: {multi_thread_io:.2f} seconds")
print(f"Speed difference: {single_thread_io/multi_thread_io:.2f}x")
2024-10-29 20:14:18,709 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
IO-bound task comparison:
2024-10-29 20:14:19,014 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,021 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,072 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,075 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,122 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,126 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,167 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,174 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,174 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,174 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,174 - DEBUG - Starting new HTTPS connection (1): api.github.com:443
2024-10-29 20:14:19,256 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,259 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,261 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
2024-10-29 20:14:19,261 - DEBUG - https://api.github.com:443 "GET / HTTP/1.1" 200 510
Single-threaded time: 0.50 seconds
Multi-threaded time: 0.09 seconds
Speed difference: 5.24x
Indeed, the multi-threaded version is roughly 5x faster than the single-threaded version: while one thread blocks waiting for a response, the GIL is released and another thread can issue its own request.
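As a closing note, here is a minimal sketch of the more idiomatic way to write this kind of IO-bound fan-out, using `concurrent.futures.ThreadPoolExecutor` (the URL list and `fetch` helper are illustrative):
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url: str) -> int:
    # The GIL is released while this thread blocks waiting on the network.
    return requests.get(url).status_code

urls = ["https://api.github.com"] * 4  # illustrative workload
with ThreadPoolExecutor(max_workers=4) as executor:
    print(list(executor.map(fetch, urls)))  # e.g. [200, 200, 200, 200]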