Have you ever wondered what's happening inside your computer when you load a program or video game? Well, millions of operations are happening, but perhaps the most common is simply copying data from a solid-state drive, or SSD, into dynamic random-access memory, or DRAM. An SSD stores all the programs and data for long-term storage, but when your computer wants to use that data, it first has to move the appropriate files into DRAM, which takes time, hence the loading bar. Because your CPU works only with data after it's been moved to DRAM, DRAM is also called working memory or main memory.
Your desktop uses both SSDs and DRAM because solid-state drives permanently store data in massive 3D arrays composed of a trillion or so memory cells, yielding terabytes of storage, whereas DRAM temporarily stores data in 2D arrays composed of billions of tiny capacitor memory cells, yielding gigabytes of working memory. Accessing any section of cells in the massive SSD array and reading or writing data takes about 50 microseconds, whereas reading or writing from any DRAM capacitor memory cell takes about 17 nanoseconds, which is roughly 3,000 times faster. For comparison, a supersonic jet going at Mach 3 is around 3,000 times faster than a moving tortoise. So, the speed of 17-nanosecond DRAM versus 50-microsecond SSD is like comparing a supersonic jet to a tortoise.
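To put numbers to that comparison, here's the back-of-the-envelope arithmetic, using the rough access latencies quoted above:

```python
dram_latency_ns = 17                       # one DRAM access, ~17 nanoseconds
ssd_latency_ns = 50 * 1000                 # one SSD access, ~50 microseconds
print(ssd_latency_ns / dram_latency_ns)    # ~2941, i.e. roughly 3000x faster
```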
However, speed is just one factor. DRAM is limited to a 2D array and temporarily stores one bit per memory cell. For example, this stick of DRAM with 8 chips holds 16 gigabytes of data, whereas a solid-state drive of a smaller size can hold 2 terabytes of data, more than 100 times as much. Additionally, DRAM requires power to continuously store and refresh the data held in its capacitors. Therefore, computers use both SSDs and DRAM: by spending a few seconds of loading time to copy data from the SSD to the DRAM, and then prefetching, which is the process of moving data before it's needed, your computer can store terabytes of data on the SSD and then access the data from programs that were preemptively copied into the DRAM in a few nanoseconds. For example, many video games have a loading time to start up the game itself, and then a separate loading time to load a save file. During the process of loading a save file, all the 3D models, textures, and the environment of your game state are moved from the SSD into DRAM so any of it can be accessed in a few nanoseconds, which is why video games have DRAM capacity requirements. Just imagine: without DRAM, playing a game would be 3,000 times slower.
We covered solid-state drives in other videos, so in this video, we're going to take a deep dive into this 16-gigabyte stick of DRAM. First, we'll see exactly how the CPU communicates and moves data from an SSD to DRAM. Then we'll open up a DRAM microchip and see how billions of memory cells are organized into banks and how data is written to and read from groups of memory cells. In the process, we'll dive into the nanoscopic structures inside individual memory cells and see how each capacitor physically stores 1 bit of data. Finally, we'll explore some breakthroughs and optimizations, such as the burst buffer and folded DRAM layouts, that enable DRAM to move data around at incredible speeds. A few quick notes.
First, you can find similar DRAM chips inside GPUs, smartphones, and many other devices, but with different optimizations. As examples, GPU DRAM, or VRAM, located all around the GPU chip, has a larger bandwidth and can read and write simultaneously, but operates at a lower frequency; and DRAM in your smartphone is stacked on top of the CPU and is optimized for smaller packaging and lower power consumption. Second, this video is sponsored by Crucial. Although they gave me this stick of DRAM to model and use in the video, the content was independently researched and not influenced by them.
Third, there are faster memory structures in your CPU called cache memory, and even faster registers. All these types of memory create a memory hierarchy, with the main trade-off being speed versus capacity, while keeping prices affordable to consumers and optimizing the size of each microchip for manufacturing. Fourth, you can see how much of your DRAM is being utilized by each program by opening your computer's resource monitor and clicking on memory.
Fifth, there are different generations of DRAM, and we'll explore DDR5. Many of the key concepts that we explain apply to prior generations, although the numbers may be different. Sixth, 17 nanoseconds is incredibly fast! Electricity travels at around 1 foot per nanosecond, and 17 nanoseconds is about the time it takes for light to travel across a room. Finally, this video is rather long, as it covers a lot of what there is to know about DRAM. We recommend watching it first at 1.25x speed, and then a second time at 1.5x speed to fully comprehend this complex technology. Stick around, because this is going to be an incredibly detailed video.
To start, a stick of DRAM is also called a Dual Inline Memory Module, or DIMM, and there are 8 DRAM chips on this particular DIMM. On the motherboard, there are 4 DRAM slots, and when plugged in, the DRAM is directly connected to the CPU via 2 memory channels that run through the motherboard. Note that the left two DRAM slots share one of these memory channels, and the right two share the other. Let's move to look inside the CPU at the processor. Along with numerous cores and many other elements, we find the memory controller, which manages and communicates with the DRAM. There's also a separate section for communicating with SSDs plugged into the M.2 slots and with SSDs and hard drives plugged into SATA connectors. Using these sections, along with data mapping tables, the CPU manages the flow of data from the SSD to DRAM, as well as from DRAM to cache memory for processing by the cores.
Let's move back to see the memory channels. For DDR5, each memory channel is divided into two parts, channel A and channel B. These two memory channels A and B independently transfer 32 bits at a time using 32 data wires. Using 21 additional wires, each memory channel carries an address specifying where to read or write data, and, using 7 control signal wires, commands are relayed. The addresses and commands are sent to and shared by all 4 chips on the memory channel, which work in parallel. However, the 32 data lines are divided among the chips, and thus each chip only reads or writes 8 bits at a time. Additionally, power for DRAM is supplied by the motherboard and managed by these chips on the stick itself.
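As a small sketch of how one 32-bit transfer is divided across the 4 chips of a subchannel (the byte-to-chip assignment here is purely illustrative; the actual wiring differs):

```python
# Split a 32-bit transfer into 8-bit slices, one per chip on the subchannel.
# Which chip carries which 8 bits is an assumption for illustration.
def split_transfer(word32: int) -> list[int]:
    return [(word32 >> (8 * chip)) & 0xFF for chip in range(4)]

print([f"{b:08b}" for b in split_transfer(0xDEADBEEF)])  # 4 chips x 8 bits
```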
Next, let's open and look inside one of these DRAM microchips. Inside the exterior packaging, we find an interconnection matrix that connects the ball grid array at the bottom with the die, which is the main part of this microchip. This 2-gigabyte DRAM die is organized into 8 bank groups composed of 4 banks each, totaling 32 banks. Within each bank is a massive array, 65,536 memory cells tall by 8,192 cells across, essentially rows and columns in a grid, with tens of thousands of wires and supporting circuitry running outside each bank. Instead of looking at this die, we're going to transition to a functional diagram, and then reorganize the banks and bank groups.
In order to access 17 billion memory cells, we need a 31-bit address. 3 bits are used to select the appropriate bank group, then 2 bits to select the bank. The next 16 bits of the address are used to determine the exact row out of 65 thousand. Because this chip reads or writes 8 bits at a time, the 8,192 columns are grouped into sets of 8 memory cells, all read or written at once, or 'by 8', and thus only 10 bits are needed for the column address. One optimization is that this 31-bit address is separated into two parts and sent using only 21 wires. First, the bank group, bank, and row address are sent, and then, after that, the column address.
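Here's a minimal sketch of that 31-bit address layout in code. The exact bit ordering below is an assumption for illustration; real chips and memory controllers interleave these fields differently:

```python
# Decode a 31-bit address into bank group, bank, row, and column fields,
# using the bit widths described above (3 + 2 + 16 + 10 = 31 bits).
def decode_address(addr: int) -> dict:
    return {
        "bank_group": (addr >> 28) & 0b111,    # 3 bits -> 1 of 8 bank groups
        "bank":       (addr >> 26) & 0b11,     # 2 bits -> 1 of 4 banks per group
        "row":        (addr >> 10) & 0xFFFF,   # 16 bits -> 1 of 65,536 rows
        "column":     addr & 0x3FF,            # 10 bits -> 1 of 1,024 groups of 8 cells
    }

print(decode_address(0b101_01_0110101110000100_1001010110))
# {'bank_group': 5, 'bank': 1, 'row': 27524, 'column': 598}
```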
Next, we'll look inside these physical memory cells, but first, let's briefly talk about how these structures are manufactured, as well as this video's sponsor.
This incredibly complicated die, also called an integrated circuit, is manufactured on 300-millimeter silicon wafers, around 2,500 dies at a time. On each die are billions of nanoscopic memory cells that are fabricated using dozens of tools and hundreds of steps in a semiconductor fabrication plant, or fab. This one was made by Micron, which manufactures around a quarter of the world's DRAM, including both Nvidia's and AMD's VRAM in their GPUs. Micron also has its own product line of DRAM and SSDs under the brand Crucial which, as mentioned earlier, is the sponsor of this video. In addition to DRAM, Micron is one of the world's leading suppliers of solid-state drives, such as this Crucial P5 Plus M.2 NVMe SSD. By installing your operating system and video games on a Crucial NVMe solid-state drive, you'll be sure to have incredibly fast loading times and smooth gameplay, and if you do video editing, make sure all those files are on a fast SSD like this one as well. This is because loading speed is predominantly limited by the SSD or hard drive where the files are stored. For example, this hard drive can only transfer data at around 150 megabytes a second, whereas this Crucial NVMe SSD can transfer data at a rate of up to 6,600 megabytes a second, which, for comparison, is the speed of a moving tortoise versus a galloping horse. By using a Crucial NVMe SSD, loading a video game that requires gigabytes of DRAM is reduced from a minute or more down to a couple of seconds. Check out the Crucial NVMe SSDs using the link in the description below.
Let's get back to the details of how DRAM works and zoom in to explore a single memory cell situated in a massive array. This memory cell is called a 1T1C cell and is a few dozen nanometers in size. It has two parts: a capacitor to store one bit of data in the form of electrical charges, or electrons, and a transistor to access and read or write data. The capacitor is shaped like a deep trench dug into silicon and is composed of two conductive surfaces separated by a dielectric insulator, or barrier, just a few atoms thick, which stops the flow of electrons but allows electric fields to pass through. If this capacitor is charged up with electrons to 1 volt, it's a binary 1, and if no charges are present and it's at 0 volts, it's a binary 0, and thus this cell only holds one bit of data. Capacitor designs are constantly evolving, but in this trench capacitor, the depth of the silicon is utilized to allow for larger capacitive storage while taking up as little area as possible.
Next, let's look at the access transistor and add in two wires. The wordline wire connects to the gate of the transistor, while the bitline wire connects to the other side of the transistor's channel. Applying a voltage to the wordline turns on the transistor, and, while it's on, electrons can flow through the channel, thus connecting the capacitor to the bitline. This allows us to access and charge up the capacitor to write a 1, or discharge the capacitor to write a 0. Additionally, we can read the stored value in the capacitor by measuring the amount of charge. However, when the wordline is off, the transistor is turned off, and the capacitor is isolated from the bitline, thus saving the data or charge that was previously written. Note that because this transistor is incredibly small, only a few dozen nanometers wide, electrons slowly leak across the channel, and thus over time the capacitor needs to be refreshed to recharge the leaked electrons. We'll cover exactly how refreshing memory cells works a little later.
As mentioned earlier, this 1T1C memory cell is one of 17 billion inside this single die, and these cells are organized into massive arrays called banks. So, let's build a small array for illustrative purposes. In our array, each of the wordlines is connected in rows, and then the bitlines are connected in columns. Wordlines and bitlines are on different vertical layers so one can cross over the other, and they never touch. Let's simplify the visual and use symbols for the capacitors and the transistors. Just as before, the wordlines connect to each transistor's control gate in rows, and then all the bitlines in columns connect to the side of the channel opposite each capacitor. As a result, when a wordline is active, all the capacitors in only that row are connected to their corresponding bitlines, thereby activating all the memory cells in that row. At any given time, only one wordline is active because, if more than one wordline were active, then multiple capacitors in a column would be connected to the bitline, and the data storage functionalities of these capacitors would interfere with one another, making them useless.
As mentioned earlier, within a single bank there are 65,536 rows and 8,192 columns, and the 31-bit address is used to activate a group of just 8 memory cells. The first 5 bits select the bank, and the next 16 bits are sent to a row decoder to activate a single row. For example, this binary number turns on wordline row 27,524, thus turning on all transistors in that row and connecting its 8,192 capacitors to their bitlines, while at the same time the other 65 thousandish wordlines are all off. Here's the logic diagram for a simple decoder.
The remaining 10 bits of the address are sent to the column multiplexer. This multiplexer takes in the 8,192 bitlines at the top and, depending on the 10-bit address, connects a specific group of 8 bitlines to the 8 input and output, or IO, wires at the bottom. For example, if the 10-bit address were this, then only bitlines 4,784 through 4,791 would be connected to the IO wires, and the rest of the 8000ish bitlines would be connected to nothing. Here's the logic diagram for a simple multiplexer.
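In code, the behavior of those two circuits looks something like this (a behavioral sketch, not the actual gate-level logic):

```python
# Behavioral sketch of a row decoder and a column multiplexer.
def row_decoder(row_address: int, num_rows: int = 65536) -> list[int]:
    # One-hot output: exactly one wordline is driven high.
    return [1 if row == row_address else 0 for row in range(num_rows)]

def column_mux(bitlines: list[int], col_address: int) -> list[int]:
    # Connect one group of 8 bitlines to the 8 IO wires.
    start = col_address * 8
    return bitlines[start:start + 8]

wordlines = row_decoder(27524)
print(sum(wordlines), wordlines[27524])   # 1 1 -> only row 27,524 is on

bitlines = [0] * 8192
bitlines[4784:4792] = [1, 0, 1, 1, 0, 0, 1, 0]
print(column_mux(bitlines, 598))          # columns 4,784-4,791 reach the IO wires
```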
We now have the means of accessing any memory cell in this massive array; however, to understand the three basic operations, reading, writing, and refreshing, let's add two elements to our layout: a sense amplifier at the bottom of each bitline, and a read and write driver outside of the column multiplexer.
Let's look at reading from a group of memory cells. First, the read command and 31-bit address are sent from the CPU to the DRAM. The first 5 bits select a specific bank. The next step is to turn off all the wordlines in that bank, thereby isolating all the capacitors, and then precharge all 8000ish bitlines to 0.5 volts. Next, the 16-bit row address turns on a row, and all the capacitors in that row are connected to their bitlines. If an individual capacitor holds a 1 and is charged to 1 volt, then some charge flows from the capacitor onto the 0.5-volt bitline, and the voltage on the bitline increases. The sense amplifier then detects this slight change, or perturbation, of voltage on the bitline, amplifies the change, and pushes the voltage on the bitline all the way up to 1 volt. However, if a 0 is stored in the capacitor, charge flows from the bitline into the capacitor, and the 0.5-volt bitline decreases in voltage. The sense amplifier then sees this change, amplifies it, and drives the bitline voltage down to 0 volts, or ground. The sense amplifier is necessary because the capacitor is so small and the bitline is rather long, so an additional component is needed to sense and amplify whatever value is stored. Now, all 8000ish bitlines are driven to 1 volt or 0 volts, corresponding to the stored charge in the capacitors of the activated row, and this row is now considered open. Next, the column select multiplexer uses the 10-bit column address to connect the corresponding 8 bitlines to the read driver, which then sends these 8 values and voltages over the 8 data wires to the CPU.
Writing data to these memory cells is similar to reading, however with a few key differences. First, the write command, address, and the 8 bits to be written are sent to the DRAM chip. Next, just like before, the bank is selected, the capacitors are isolated, and the bitlines are precharged to 0.5 volts. Then, using the 16-bit address, a single row is activated, the capacitors perturb the bitlines, and the sense amplifiers sense this and drive the bitlines to a 1 or 0, thus opening the row. Next, the column address goes to the multiplexer, but, this time, because a write command was sent, the multiplexer connects the specific 8 bitlines to the write drivers, which hold the 8 bits that the CPU sent along the data wires and requested to write. These write drivers are much stronger than the sense amplifiers, and thus they override whatever voltage was previously on the bitline, driving each of the 8 bitlines to 1 volt for a 1 to be written, or 0 volts for a 0. This new bitline voltage overrides the previously stored charges or values in each of the 8 capacitors in the open row, thereby writing 8 bits of data to the memory cells corresponding to the 31-bit address.
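To tie the read and write sequences together, here's a toy behavioral model of a single bank. It ignores voltages, sense-amp physics, and timing entirely; it only captures the open-a-row-then-access-columns behavior described above:

```python
# Toy model of one bank: activate (open) a row, then read or write
# 8-cell groups within it. Rows are allocated lazily to keep this small.
class Bank:
    def __init__(self, num_cols: int = 8192):
        self.num_cols = num_cols
        self.rows: dict[int, list[int]] = {}
        self.open_row = None

    def activate(self, row: int) -> None:
        # Close the previous row, precharge the bitlines, then open the
        # new row: the sense amps latch (and thereby refresh) its cells.
        self.open_row = self.rows.setdefault(row, [0] * self.num_cols)

    def read(self, col: int) -> list[int]:
        # The column mux connects 8 bitlines to the read driver.
        return self.open_row[col * 8 : col * 8 + 8]

    def write(self, col: int, bits: list[int]) -> None:
        # Write drivers overpower the sense amps and overwrite 8 cells.
        self.open_row[col * 8 : col * 8 + 8] = bits

bank = Bank()
bank.activate(27524)
bank.write(598, [1, 0, 1, 1, 0, 0, 1, 0])
print(bank.read(598))   # [1, 0, 1, 1, 0, 0, 1, 0]
```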
Three quick notes. First, as a reminder, writing and reading happen concurrently across all 4 chips in the shared memory channel, using the same 31-bit address and command wires, but with different data wires for each chip. Second, with DDR5, the voltage for a binary 1 is actually 1.1 volts, for DDR4 it's 1.2 volts, and prior generations had even higher voltages, with the bitline precharge voltages being half of these values. However, for DDR5, when writing or refreshing, a higher voltage, around 1.4 volts, is applied and stored in each capacitor for a binary 1, because charge leaks out over time. However, for simplicity, we're going to stick with 1 and 0. Third, the number of bank groups, banks, bitlines, and wordlines varies widely between different generations and capacities, but is always a power of 2.
Let's move on and discuss the third operation, which is refreshing the memory cells in a bank. As mentioned earlier, the transistors used to isolate the capacitors are incredibly small, and thus charges leak across the channel. The refresh operation is rather simple and is a sequence of closing all the rows, precharging the bitlines to 0.5 volts, and opening a row. To refresh, just as before, the capacitors perturb the bitlines, and then the sense amplifiers drive the bitlines and capacitors of the open row fully up to 1 volt or down to 0 volts, depending on the stored value of each capacitor, thereby refilling the leaked charge. This process of row closing, precharging, opening, and sense amplifying happens row after row, taking 50 nanoseconds for each row, until all 65 thousandish rows are refreshed, taking a total of 3 milliseconds or so to complete. The refresh operation occurs once every 64 milliseconds for each bank, because that's statistically below the worst-case time it takes for a memory cell to leak so much charge that a stored 1 turns into a 0, resulting in a loss of data.
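Those refresh numbers fit together like this (a quick check using the figures just quoted):

```python
rows = 65536
row_refresh_ns = 50                  # close, precharge, open, and amplify one row
total_ms = rows * row_refresh_ns / 1e6
print(total_ms)                      # ~3.3 ms to refresh every row in a bank
print(1000 / 64)                     # the 64 ms cycle repeats ~15.6 times a second
```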
Let's take a step back and consider the incredible amount of data that is moved through DRAM memory cells. These banks of memory cells handle up to 4,800 million requests to read and write data every second, while refreshing every memory cell in each bank, row by row, around 16 times a second. That's a staggering amount of data movement and illustrates the true strength of computers. Yes, they do simple things like comparisons, arithmetic, and moving data around, but at a rate of billions of times a second. Now, you might wonder why computers need to do so much data movement. Well, take this video game for example. You have obvious calculations like the movement of your character and the horse. But then there are individual grasses, trees, rocks, and animals whose positions and geometries are stored in DRAM. And then the environment, such as the lighting and shadows, changes the colors and textures of the scene in order to create a realistic world.
Next, we're going to explore breakthroughs and optimizations that allow DRAM to be incredibly fast. But, before we get into all those details, we would greatly appreciate it if you could take a second to hit that like button, subscribe if you haven't already, and type up a quick comment below, as it helps get this video out to others. Also, we have a Patreon and would appreciate any support. This is our longest and most detailed video by far, and we're planning more videos that get into the inner details of how computers work. We can't do it without your help, so thank you for watching and doing these three quick things. It helps a ton.
The first complex topic we'll explore is why there are 32 banks, as well as what the parameters on the packaging of DRAM are. After that, we'll explore burst buffers, sub-arrays, and folded DRAM architecture, and what's inside the sense amplifier. Let's take a look at the banks. As mentioned earlier, opening a single row within a bank requires all these steps, and this process takes time. However, if a row were already open, we could read or write to any section of 8 memory cells using only the 10-bit column address and the column select multiplexer. When the CPU sends a read or write command to a row that's already open, it's called a row hit or page hit, and this can happen over and over. With a row hit, we skip all the steps required to open a row, and just use the 10-bit column address to multiplex a different set of 8 columns or bitlines, connecting them to the read or write driver, thereby saving a considerable amount of time. A row miss is when the next address is for a different row, which requires the DRAM to close and isolate the currently open row, and then open the new row.
On a package of DRAM there are typically 4 numbers specifying timing parameters regarding row hits, precharging, and row misses. The first number refers to the time it takes between sending an address with a row open, thus a row hit, and receiving the data stored in those columns. The next number is the time it takes to open a row if all the rows are isolated and the bitlines are precharged. Then the next number is the time it takes to precharge the bitlines before opening a row, and the last number is the time it takes between a row activation and the following precharge. These four numbers are commonly written as CL, tRCD, tRP, and tRAS. Note that these numbers are measured in clock cycles.
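To make those four numbers concrete, take a hypothetical DDR5-4800 kit rated 40-39-39-77 (illustrative values, not this specific stick). Since the clock runs at half the transfer rate, each cycle lasts about 0.42 nanoseconds:

```python
# Convert DRAM timing parameters from clock cycles to nanoseconds.
transfers_per_sec = 4800e6                  # DDR5-4800 (hypothetical example kit)
cycle_ns = 1e9 / (transfers_per_sec / 2)    # two transfers per clock -> ~0.417 ns

cl, trcd, trp, tras = 40, 39, 39, 77        # the four numbers on the packaging
print(cl * cycle_ns)     # ~16.7 ns from column address to data on a row hit
print(trcd * cycle_ns)   # ~16.3 ns to open a row once the bitlines are precharged
```

Note that the first figure lands right around the 17 nanoseconds quoted at the start of the video.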
Row hits are also the reason why the address is sent in two sections: first the bank selection and row address, called RAS, and then the column address, called CAS. If the first part, the bank selection and row address, matches a currently open row, then it's a row hit, and all the DRAM needs is the column address and the new command, and then the multiplexer simply moves around the open row. Because of the time saved in accessing an open row, the CPU memory controller, programs, and compilers are optimized to increase the number of subsequent row hits. The opposite, called thrashing, is when a program jumps around from one row to a different row over and over, and is obviously incredibly inefficient both in terms of energy and time.
Additionally, DDR5 DRAM has 32 banks for this reason. Each bank's rows, columns, sense amplifiers, and row decoders operate independently of one another, and thus multiple rows from different banks can be open all at the same time, increasing the likelihood of a row hit and reducing the average time it takes for the CPU to access data. Furthermore, by having multiple bank groups, the CPU can refresh one bank in each bank group at a time while using the other three, thus reducing the impact of refreshing. A question you may have had earlier is why banks are significantly taller than they are wide. Well, by combining all the banks together, one next to the other, you can think of this chip as actually being 65 thousand rows tall by 262 thousand columns wide. And, by adding 31 equally spaced divisions between the columns, thus creating banks, we allow for much more flexibility and efficiency in reading, writing, and refreshing.
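The cell count checks out (a quick sketch of the numbers above):

```python
rows, cols, banks = 65536, 8192, 32
cells = rows * cols * banks
print(f"{cells:,}")            # 17,179,869,184 one-bit cells
print(cells // 8 // 2**30)     # = 2, i.e. a 2-gigabyte die
```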
Also, note that on the DRAM packaging are its capacity in gigabytes, the number of millions of data transfers per second, which is two times the clock frequency, and the peak data transfer rate in megabytes per second.
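Those numbers are tied together. For example, assuming a DDR5-4800 stick (the rate implied by the 4,800 million transfers per second mentioned earlier):

```python
transfers_per_sec = 4800e6    # "4800 MT/s" on the label, 2x the 2400 MHz clock
bytes_per_transfer = 8        # 64 data bits across the stick's two subchannels
print(transfers_per_sec * bytes_per_transfer / 1e6)   # 38,400 MB/s peak rate
```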
The next design optimization we'll explore is the burst buffer and burst length. Let's add a 128-bit read and write temporary storage location, called a burst buffer, to our functional diagram. Instead of 8 wires coming out of the multiplexer, we're going to have 128 wires that connect to these 128-bit buffer locations. Next, the 10-bit column address is broken into two parts: 6 bits are used for the multiplexer, and 4 bits are for the burst buffer. Let's explore a read command. With our burst buffer in place, 128 memory cells and bitlines are connected to the burst buffer using the 6 column bits, thereby temporarily loading, or caching, 128 values into the burst buffer. Using the 4 bits for the buffer, 8 quickly accessed data locations in the burst buffer are connected to the read drivers, and the data is sent to the CPU. By cycling through these 4 bits, all 16 sets of 8 bits are read out, and thus the burst length is 16. After that, a new set of 128 bitlines and values is connected and loaded into the burst buffer. There's also a write burst buffer, which operates in a similar way.
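Here's a behavioral sketch of that read path, splitting the 10-bit column address into 6 multiplexer bits and 4 burst-buffer bits as described (a toy model; the bit assignments are illustrative):

```python
# A burst read: 6 column bits select which 128 bitlines fill the buffer,
# then cycling the 4 buffer bits streams out 16 groups of 8 bits.
def fill_burst_buffer(bitlines: list[int], mux_bits: int) -> list[int]:
    return bitlines[mux_bits * 128 : mux_bits * 128 + 128]   # cache 128 values

def read_group(buffer: list[int], buffer_bits: int) -> list[int]:
    return buffer[buffer_bits * 8 : buffer_bits * 8 + 8]     # 8 bits to the CPU

bitlines = list(range(8192))    # stand-in values on the 8,192 sensed bitlines
buffer = fill_burst_buffer(bitlines, mux_bits=37)
burst = [read_group(buffer, i) for i in range(16)]           # burst length 16
print(burst[0], burst[15])
```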
The benefit of this design is that 16 sets of 8 bits per microchip, totaling 1,024 bits across the stick, can be accessed and read or written extremely quickly, as long as the data is all next to one another, but at the same time we still have the granularity and ability to access any set of 8 bits if our data requests jump around.
The next design optimization addresses the fact that this bank of 65,536 rows by 8,192 columns is rather massive, which results in extremely long wordlines and bitlines, especially when compared to the size of each trench capacitor memory cell. Therefore, the massive array is broken up into smaller blocks, 1,024 by 1,024, with intermediate sense amplifiers below each subarray, subdivided wordlines, and a hierarchical row decoding scheme. By subdividing the bitlines, the distance and amount of wire between each tiny capacitor and its sense amplifier is reduced, and thus the capacitor doesn't have to be as big. By subdividing the wordlines, the capacitive load from eight thousandish transistor gates and channels is decreased, and thus the time it takes to turn on all the access transistors in a row is decreased.
The final topic we're going to talk about is the most complicated. Remember how we had a sense amplifier connected to the bottom of each bitline? Well, this optimization has two bitlines per column going to each sense amplifier, with alternating rows of memory cells connected to the left and right bitlines, thus doubling the number of bitlines. When one row is active, half of the bitlines are active while the other half are passive, and vice versa when the next row is active. Moving down to see inside the sense amplifier, we find a cross-coupled inverter. How does this work? Well, when the active bitline is a 1, the passive bitline will be driven by this cross-coupled inverter to the opposite value of 0, and when the active is a 0, the passive becomes a 1. Note that the inverted passive bitline isn't connected to any memory cells, and thus it doesn't mess up any stored data. The cross-coupled inverter makes it such that these two bitlines are always going to be opposite one another, and they're called a differential pair. There are three benefits to this design. First, during the precharge step, we want to bring all the bitlines to 0.5 volts, and, by having a differential pair of active and passive bitlines, the easiest solution is to disconnect the cross-coupled inverters and open a channel between the two bitlines using a transistor. The charge easily flows from the 1 bitline to the 0 bitline, and they both average out and settle at 0.5 volts.
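That precharge step is simple charge sharing between two equal capacitances (a simplified calculation):

```python
# Shorting the two bitlines together: charge is conserved, so the final
# voltage is the capacitance-weighted average of the two starting voltages.
c = 1.0                              # equal bitline capacitance, arbitrary units
v_active, v_passive = 1.0, 0.0
v_final = (c * v_active + c * v_passive) / (c + c)
print(v_final)                       # 0.5 volts on both bitlines
```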
The other two benefits are noise immunity and a reduction in parasitic capacitance of the bitline. These benefits are related to the fact that, by creating two oppositely charged electric wires with electric fields going from one to the other, we reduce the amount of electric field emitted in stray directions and relatedly increase the ability of the sense amplifier to amplify one bitline to 1 volt and the other to 0 volts.
One final note is that when discussing DRAM, one major topic is the timing of addresses, command signals, and data, and the related acronyms DDR, or double data rate, and SDRAM, or synchronous DRAM. These topics were omitted from this video because they would have taken an additional 15 minutes to properly explore. That's pretty much it for the DRAM, and we are grateful you made it this far into the video. We believe the future will require a strong emphasis on engineering education, and we're thankful to all our Patreon and YouTube Membership sponsors for supporting this dream. If you want to support us on YouTube Memberships or Patreon, you can find the links in the description. A huge thanks goes to Nathan, Peter, and Jacob, who are doctoral students at the Florida Institute for Cybersecurity Research, for helping to research and review this video's content! They do foundational research on finding the weak points in device security and whether hardware is compromised. If you want to learn more about the FICS graduate program or their work, check out the website using the link in the description. This is Branch Education, and we create 3D animations that dive deep into the technology that drives our modern world. Watch another Branch video by clicking one of these cards, or click here to subscribe. Thanks for watching to the end!