A Guide to Garbage Collection in Programming
When you design software, your code generates and uses objects, variables, and other data structures that need memory. But if you don’t manage that memory correctly, you could be in for a world of trouble. Memory leaks can cause your system to crash, deplete precious resources, and bring your performance to a screeching halt – leaving you feeling like you’re stuck in quicksand.
In this guide, we’ll explore how garbage collection works and some of the most common ways it’s implemented in popular programming languages.
What is Garbage Collection in Programming?
Garbage collection is a form of automatic memory management: the runtime finds objects that the program can no longer reach and reclaims the memory they occupy. In Java, this task is handled by a Java Virtual Machine (JVM) component called the garbage collector (GC), allowing developers to focus on other parts of their code.
The Mark & Sweep algorithm is the backbone of many garbage collectors. It has two core stages, mark and sweep, which are often followed by a third, compact:
- Mark – Starting from a set of roots (such as local variables and static fields), the garbage collector traverses the program’s memory and marks every object that is still reachable and therefore still in use.
- Sweep – After the marking process is finished, the garbage collector removes every unmarked object, since the missing mark shows that it is no longer being used.
- Compact – The objects that survived the sweep are moved into one contiguous memory block. This reduces fragmentation and makes later allocations cheaper.
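Using the Mark & Sweep algorithm, a garbage collector helps ensure that memory is used efficiently, improving your software’s overall stability and reliability. Not every language relies on it alone: CPython, for instance, primarily uses reference counting and adds a separate collector only to clean up reference cycles.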
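The three stages can be sketched with a toy heap. This is a minimal illustration, not the JVM's implementation: the `MarkSweepDemo` class, its `Node` type, and the four-object heap are all hypothetical, and an ordinary list stands in for real memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of mark, sweep, and compact over a simulated heap.
// It is an illustration only and not how the JVM is implemented.
class MarkSweepDemo {

    // A simulated heap object that can reference other objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Mark: walk the object graph from the roots and record everything reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Sweep: remove every object that was NOT marked. Because the heap is a list,
    // the survivors also end up packed together, which stands in for compaction.
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(node -> !marked.contains(node));
        return before - heap.size();
    }

    // Builds a four-object heap, collects it, and returns the names of the survivors.
    static List<String> demo() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b); // a -> b, reachable from the root
        c.refs.add(d); // c and d only reference each other,
        d.refs.add(c); // so nothing reachable points at them

        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        sweep(heap, mark(List.of(a)));

        List<String> survivors = new ArrayList<>();
        for (Node node : heap) {
            survivors.add(node.name);
        }
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println("Survivors: " + demo()); // prints "Survivors: [a, b]"
    }
}
```

Note that `c` and `d` are reclaimed even though they reference each other: what matters to the collector is reachability from the roots, not whether something still points at an object.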
This guide covers five GC implementations available in the JVM:
- Serial Garbage Collector
- Parallel Garbage Collector
- CMS Garbage Collector
- G1 Garbage Collector
- Z Garbage Collector
Let’s take a look at each one.
The Serial GC employs a single thread to handle all garbage collection tasks, which makes it relatively efficient due to the absence of inter-thread communication overhead.
During a collection, the Serial GC uses that single thread to identify all live objects in the heap and then moves them to one end of the heap. All application threads are halted while this happens, which is known as a stop-the-world pause. Once the collection is complete, the application threads resume their execution.
This GC execution type is best suited for machines with a single processor since it cannot take advantage of multiprocessor hardware. However, it can still be useful for applications with small data sets, even on multiprocessor machines.
The Serial GC is the default GC on certain hardware and operating system configurations, and it can be explicitly enabled using the -XX:+UseSerialGC option.
The Parallel Garbage Collector (GC) is similar to the Serial GC but uses multiple threads to speed up the garbage collection process. Like the Serial GC, it is a generational collector: it divides the heap into young and old generations and collects them separately. To activate the Parallel GC in the JVM, you can use the -XX:+UseParallelGC command-line option.
By default, this option will run both minor and major collections in parallel, which helps reduce garbage collection overhead. The Parallel GC is best used on multi-processor systems, so you can use the extra processing power to achieve faster and more effective garbage collection.
The Concurrent Mark Sweep (CMS) garbage collector does most of its work concurrently with your application and uses multiple threads for both minor and major GC tasks.
Because most of the collection happens while the application keeps running, and because CMS does not compact live objects after deleting unused ones, your application experiences only short pauses, making it a good choice for applications that require low pause times. However, the collector competes with the application for CPU time, so running it concurrently may slow the application down.
It’s worth noting that this GC implementation was deprecated in Java 9 and completely removed in Java 14. Nevertheless, if you’re still using an older JDK version that supports it, you can enable the CMS GC using the -XX:+UseConcMarkSweepGC option.
With the CMS GC, the application is paused twice. The first pause occurs during the initial mark phase, when the GC marks the live objects that are directly reachable from the roots. The second pause, known as the remark phase, occurs near the end of the cycle to account for objects that concurrent marking missed because application threads updated references while it was running.
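The reason for the second pause can be shown with a small simulation. This is a simplified sketch rather than the CMS implementation: the `CmsPhasesSketch` class is hypothetical, the phases run one after another instead of truly concurrently, and the `dirty` set is an assumed stand-in for whatever bookkeeping a real collector uses to notice objects that were modified during marking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, single-threaded model of initial mark, concurrent mark, and remark.
class CmsPhasesSketch {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    final Set<Obj> marked = new HashSet<>();
    final Set<Obj> dirty = new HashSet<>(); // objects modified while marking was running

    // Pause 1 (initial mark): mark only the directly reachable objects.
    void initialMark(List<Obj> directlyReachable) {
        marked.addAll(directlyReachable);
    }

    // Concurrent mark: trace everything reachable from what is already marked.
    void concurrentMark() {
        trace(new ArrayList<>(marked));
    }

    // The application stores a reference and the change is remembered (assumed bookkeeping).
    void applicationWrite(Obj holder, Obj target) {
        holder.refs.add(target);
        dirty.add(holder);
    }

    // Pause 2 (remark): re-trace from live objects that changed during marking.
    void remark() {
        List<Obj> rescan = new ArrayList<>();
        for (Obj candidate : dirty) {
            if (marked.contains(candidate)) {
                rescan.add(candidate);
            }
        }
        dirty.clear();
        trace(rescan);
    }

    private void trace(Collection<Obj> start) {
        Deque<Obj> pending = new ArrayDeque<>(start);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            for (Obj ref : current.refs) {
                if (marked.add(ref)) {
                    pending.push(ref);
                }
            }
        }
    }

    // Returns {marked before remark, marked after remark} for a late-linked object.
    static boolean[] demo() {
        Obj root = new Obj("root");
        Obj a = new Obj("a");
        Obj late = new Obj("late");
        root.refs.add(a);

        CmsPhasesSketch gc = new CmsPhasesSketch();
        gc.initialMark(List.of(root));
        gc.concurrentMark();           // marks root and a
        gc.applicationWrite(a, late);  // a was already scanned, so "late" is missed
        boolean before = gc.marked.contains(late);
        gc.remark();                   // the rescan of a finds "late"
        boolean after = gc.marked.contains(late);
        return new boolean[] {before, after};
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("before remark: " + result[0] + ", after remark: " + result[1]);
    }
}
```

Without the remark step, the object linked late would stay unmarked and be swept even though the application still uses it.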
The Garbage First (G1) GC was designed to replace CMS GC and provide a concurrent, parallel, and incrementally compacting garbage collector with low pauses. It uses a different memory layout, dividing the heap into equal-sized regions. A global mark phase, run concurrently by multiple threads, determines how much live data each region holds. G1 then collects the regions that contain the most garbage first, which is where the name Garbage First comes from.
If an object is larger than half a region’s size, it is classified as a “humongous object” and is placed in the Old generation in a dedicated region called the humongous region. To enable G1 GC, use the option below in the command line:
$ java -XX:+UseG1GC
Overall, G1 GC is ideal for large heap applications with strict latency requirements, as it offers low pause times while still delivering high throughput.
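The humongous-object rule described above comes down to simple arithmetic. The sketch below is only an illustration of that rule: the `HumongousCheck` class and the 4 MB region size are assumptions chosen for the example, not values read from a running JVM.

```java
// Arithmetic behind the "larger than half a region" rule for humongous objects.
class HumongousCheck {

    static final long MB = 1024L * 1024L;

    // An object counts as humongous when it is larger than half of one region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of whole regions needed to hold an object of the given size (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 4 * MB; // assumed region size for this example
        System.out.println(isHumongous(1 * MB, region));   // prints "false"
        System.out.println(isHumongous(3 * MB, region));   // prints "true"
        System.out.println(regionsNeeded(9 * MB, region)); // prints "3"
    }
}
```

A practical consequence is that applications allocating many large arrays can behave differently depending on the region size, since the same array may or may not cross the threshold.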
ZGC (Z Garbage Collector) is a low-latency, scalable garbage collector that was introduced as an experimental option in Java 11 for Linux. It was later made available for Windows and macOS operating systems in JDK 14 and has been promoted to production status from Java 15 onwards.
One of the primary benefits of ZGC is that it performs all expensive operations concurrently and was designed to keep pause times under 10 ms. This makes it well-suited for applications that require low latency. To do this work while application threads are running, ZGC combines colored pointers with load barriers, which are small checks that run whenever the application loads a reference from the heap.
The core concept of ZGC is reference coloring: ZGC uses some of the bits in each reference as metadata that records the state of the object it points to. It can handle heaps ranging from 8MB to 16TB in size, and pause times do not increase with the heap, live-set, or root-set size. Like G1, ZGC partitions the heap, but with the added flexibility of using regions of different sizes.
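The idea of storing metadata bits inside a reference can be sketched with plain bit operations on a 64-bit value. This is a simplified illustration, not ZGC's actual pointer layout: the `ColoredPointerSketch` class, the 42-bit address width, and the bit positions of the three colors are assumptions made for the example.

```java
// Illustrative colored pointer: low bits hold the address, a few higher bits hold state.
class ColoredPointerSketch {

    static final int ADDRESS_BITS = 42; // assumed width for this sketch
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Metadata ("color") bits stored just above the address bits.
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);

    // Returns the pointer with its old color removed and the given color applied.
    static long color(long pointer, long colorBit) {
        return (pointer & ADDRESS_MASK) | colorBit;
    }

    // Strips the metadata bits to recover the plain address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    static boolean hasColor(long pointer, long colorBit) {
        return (pointer & colorBit) != 0;
    }

    // Load-barrier analogue: a reference with the expected color passes straight through;
    // otherwise it is fixed up first. A real collector would mark or relocate the object here.
    static long loadBarrier(long pointer, long goodColor) {
        if (hasColor(pointer, goodColor)) {
            return pointer; // fast path
        }
        return color(pointer, goodColor); // slow path
    }

    public static void main(String[] args) {
        long p = color(0x1000L, MARKED0);
        long q = loadBarrier(p, REMAPPED);
        System.out.println(address(q) == 0x1000L);   // prints "true"
        System.out.println(hasColor(q, REMAPPED));   // prints "true"
    }
}
```

Because the state travels with the reference itself, the check in `loadBarrier` needs no lookup in a separate table, which is what keeps the common path cheap.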
To enable ZGC in JDK versions earlier than 15, use the following arguments:
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC Application.java
From version 15 on, the experimental mode is not required:
java -XX:+UseZGC Application.java
It’s important to note that ZGC is not the default garbage collector. Since JDK 9, the default on most systems has been G1.
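If you are unsure which collector a given JVM ended up with, you can ask it at runtime. The sketch below uses the standard `java.lang.management` API; the `ActiveCollectors` class name is hypothetical, and the collector names it prints vary by JVM version and by the GC option in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors active in the current JVM with basic statistics.
class ActiveCollectors {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "not available".
            lines.add(bean.getName() + ": "
                    + bean.getCollectionCount() + " collections, "
                    + bean.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running the same class with different options, for example with and without -XX:+UseZGC, is a quick way to confirm that a flag actually took effect.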
If your program doesn’t have strict pause-time requirements, start with the default garbage collector that the JVM selects. In the majority of cases, this should work well for you. If you want to boost performance, try adjusting the heap size first. If you still can’t reach the desired performance, choose and tune a garbage collector based on your application’s requirements.
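Before changing the heap size, it helps to see what the JVM is currently working with. The sketch below reads the heap figures from the standard `Runtime` API; the `HeapReport` class name and the output format are hypothetical. The heap limits themselves are set with the standard -Xms (initial size) and -Xmx (maximum size) options.

```java
// Reports how much heap the JVM is using, has committed, and may grow to.
class HeapReport {

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    static String describe() {
        Runtime runtime = Runtime.getRuntime();
        long max = runtime.maxMemory();          // upper limit the heap may grow to
        long committed = runtime.totalMemory();  // heap currently reserved by the JVM
        long used = committed - runtime.freeMemory();
        return String.format("used=%d MiB, committed=%d MiB, max=%d MiB",
                toMiB(used), toMiB(committed), toMiB(max));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For example, running it as `java -Xms256m -Xmx1g HeapReport.java` should report a maximum close to 1024 MiB, though the exact figure depends on the collector and JVM version.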
C and C++, by contrast, are lower-level languages that don’t come with a garbage collector the way Java or Python do. Memory has to be managed manually by the programmer: C uses malloc and free, while C++ uses the new and delete operators to allocate and deallocate memory on the heap. Although this gives the programmer greater control, it also demands more care to avoid memory leaks and dangling pointers. Some third-party libraries and tools offer garbage collection for C++, though they are not part of the standard language.
Overall, while many simple applications may not require you to think about garbage collection too much, it is essential for programmers who want to advance their Java skills to understand how garbage collection works. This knowledge can help you optimize your code and improve your software performance.