Great question, TechWizard!
You're right to look beyond just the CL number. DDR5 introduces several architectural changes that affect latency. DDR5 CL numbers look higher on paper than DDR4's, but CL is counted in clock cycles, and each cycle is shorter at DDR5's higher clock speeds, so the absolute latency in nanoseconds is often closer than the raw numbers suggest. DDR5 also doubles the burst length (16 vs. 8), splits each DIMM into two independent 32-bit subchannels, and adds more bank groups, so you can't judge how long a full data transfer takes from CL alone.
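To put numbers on the CL vs. cycle time point, here's a rough Python sketch. It uses the usual rule of thumb that the memory clock runs at half the data rate, so one cycle lasts 2000 / (data rate in MT/s) nanoseconds. The function name and the three example kits are just my picks for illustration, not measurements of any particular product, and this only covers CAS latency, not the secondary timings or controller overhead.

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: int) -> float:
    """Convert a CAS latency in cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is
    data_rate_mts / 2 and one cycle lasts 2000 / data_rate_mts ns.
    """
    return cas_latency * 2000 / data_rate_mts


# Hypothetical example kits, chosen only to illustrate the arithmetic.
kits = {
    "DDR4-3200 CL16": (16, 3200),
    "DDR5-4800 CL40": (40, 4800),
    "DDR5-6000 CL30": (30, 6000),
}

for name, (cl, rate) in kits.items():
    print(f"{name}: {first_word_latency_ns(cl, rate):.2f} ns")
```

With these inputs, the DDR5-6000 CL30 kit comes out at the same 10 ns as DDR4-3200 CL16 even though its CL is nearly double, while the DDR5-4800 CL40 kit lands around 16.7 ns.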
Two more factors are on-die ECC, which adds a small overhead, and the larger number of internal banks. However, the increased bandwidth often compensates for this. Benchmarks can be tricky because results depend heavily on the application, how well it uses the extra bandwidth, and the specific timings of the kits being compared. As a rough rule, games tend to react more to latency, while productivity applications that stream large datasets gain more from the added bandwidth.
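For a feel of how much bandwidth is on the table, here's a back-of-the-envelope sketch of theoretical peak bandwidth per DIMM. It assumes a 64-bit data path per module (on DDR5 that's the two 32-bit subchannels added together) and ignores ECC bits, refresh, and real-world efficiency, so treat the results as upper bounds. The function name and the example data rates are my own choices.

```python
def peak_bandwidth_gbs(data_rate_mts: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one DIMM in GB/s (1 GB = 10**9 bytes).

    Each transfer moves bus_width_bits / 8 bytes; data_rate_mts is in
    millions of transfers per second, so dividing by 1000 gives GB/s.
    """
    return data_rate_mts * (bus_width_bits / 8) / 1000


# Illustrative data rates only, not tied to specific products.
for label, rate in [("DDR4-3200", 3200), ("DDR5-4800", 4800), ("DDR5-6000", 6000)]:
    print(f"{label}: {peak_bandwidth_gbs(rate):.1f} GB/s per DIMM")
```

On paper that's 25.6 GB/s for DDR4-3200 against 38.4 and 48.0 GB/s for the two DDR5 speeds, which is why bandwidth-bound workloads can come out ahead even when latency is no better.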
It's also worth noting that early DDR5 kits sometimes had looser secondary and tertiary timings compared to mature DDR4 kits. As the technology matures, we're seeing tighter timings and better overall performance.