Have you ever wondered what's happening inside your computer when you load a program or video game? Well, millions of operations are happening, but perhaps the most common is simply copying data from a solid-state drive, or SSD, into dynamic random-access memory, or DRAM. An SSD stores all the programs and data for long-term storage, but when your computer wants to use that data, it first has to move the appropriate files into DRAM, which takes time, hence the loading bar. Because your CPU works only with data after it's been moved to DRAM, DRAM is also called working memory or main memory.
Your desktop uses both SSDs and DRAM because solid-state drives permanently store data in massive 3D arrays composed of a trillion or so memory cells, yielding terabytes of storage, whereas DRAM temporarily stores data in 2D arrays composed of billions of tiny capacitor memory cells, yielding gigabytes of working memory. Accessing any section of cells in the massive SSD array and reading or writing data takes about 50 microseconds, whereas reading or writing from any DRAM capacitor memory cell takes about 17 nanoseconds, which is roughly 3,000 times faster. For comparison, a supersonic jet going at Mach 3 is around 3,000 times faster than a moving tortoise. So, the speed of 17-nanosecond DRAM versus 50-microsecond SSD is like comparing a supersonic jet to a tortoise.
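To put numbers to that comparison, here's the back-of-the-envelope arithmetic, using the rough access latencies quoted above:

```python
dram_latency_ns = 17                       # one DRAM access, ~17 nanoseconds
ssd_latency_ns = 50 * 1000                 # one SSD access, ~50 microseconds
print(ssd_latency_ns / dram_latency_ns)    # ~2941, i.e. roughly 3000x faster
```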
However, speed is just one factor. DRAM is limited to a 2D array and temporarily stores one bit per memory cell. For example, this stick of DRAM with 8 chips holds 16 gigabytes of data, whereas a solid-state drive of a smaller size can hold 2 terabytes of data, more than 100 times as much. Additionally, DRAM requires power to continuously store and refresh the data held in its capacitors. Therefore, computers use both SSDs and DRAM: by spending a few seconds of loading time to copy data from the SSD to the DRAM, and then prefetching, which is the process of moving data before it's needed, your computer can store terabytes of data on the SSD and then access the data from programs that were preemptively copied into the DRAM in a few nanoseconds. For example, many video games have a loading time to start up the game itself, and then a separate loading time to load a save file. During the process of loading a save file, all the 3D models, textures, and the environment of your game state are moved from the SSD into DRAM so any of it can be accessed in a few nanoseconds, which is why video games have DRAM capacity requirements. Just imagine: without DRAM, playing a game would be 3,000 times slower.
We covered solid-state drives in other videos, so in this video, we're going to take a deep dive into this 16-gigabyte stick of DRAM. First, we'll see exactly how the CPU communicates and moves data from an SSD to DRAM. Then we'll open up a DRAM microchip and see how billions of memory cells are organized into banks and how data is written to and read from groups of memory cells. In the process, we'll dive into the nanoscopic structures inside individual memory cells and see how each capacitor physically stores 1 bit of data. Finally, we'll explore some breakthroughs and optimizations, such as the burst buffer and folded DRAM layouts, that enable DRAM to move data around at incredible speeds. A few quick notes.
First, you can find similar DRAM chips inside GPUs, smartphones, and many other devices, but with different optimizations. As examples, GPU DRAM, or VRAM, located all around the GPU chip, has a larger bandwidth and can read and write simultaneously, but operates at a lower frequency; and DRAM in your smartphone is stacked on top of the CPU and is optimized for smaller packaging and lower power consumption. Second, this video is sponsored by Crucial. Although they gave me this stick of DRAM to model and use in the video, the content was independently researched and not influenced by them.
Third, there are faster memory structures in your CPU called cache memory, and even faster registers. All these types of memory create a memory hierarchy, with the main trade-off being speed versus capacity, while keeping prices affordable to consumers and optimizing the size of each microchip for manufacturing. Fourth, you can see how much of your DRAM is being utilized by each program by opening your computer's resource monitor and clicking on memory.
Fifth, there are different generations of DRAM, and we'll explore DDR5. Many of the key concepts that we explain apply to prior generations, although the numbers may be different. Sixth, 17 nanoseconds is incredibly fast! Electricity travels at around 1 foot per nanosecond, and 17 nanoseconds is about the time it takes for light to travel across a room. Finally, this video is rather long, as it covers a lot of what there is to know about DRAM. We recommend watching it first at 1.25x speed, and then a second time at 1.5x speed to fully comprehend this complex technology. Stick around, because this is going to be an incredibly detailed video.
To start, a stick of DRAM is also called a Dual Inline Memory Module, or DIMM, and there are 8 DRAM chips on this particular DIMM. On the motherboard, there are 4 DRAM slots, and when plugged in, the DRAM is directly connected to the CPU via 2 memory channels that run through the motherboard. Note that the left two DRAM slots share one of these memory channels, and the right two share the other. Let's move to look inside the CPU at the processor. Along with numerous cores and many other elements, we find the memory controller, which manages and communicates with the DRAM. There's also a separate section for communicating with SSDs plugged into the M.2 slots and with SSDs and hard drives plugged into SATA connectors. Using these sections, along with data mapping tables, the CPU manages the flow of data from the SSD to DRAM, as well as from DRAM to cache memory for processing by the cores.
Let's move back to see the memory channels. For DDR5, each memory channel is divided into two parts, channel A and channel B. These two memory channels A and B independently transfer 32 bits at a time using 32 data wires. Using 21 additional wires, each memory channel carries an address specifying where to read or write data, and, using 7 control signal wires, commands are relayed. The addresses and commands are sent to and shared by all 4 chips on the memory channel, which work in parallel. However, the 32 data lines are divided among the chips, and thus each chip only reads or writes 8 bits at a time. Additionally, power for DRAM is supplied by the motherboard and managed by these chips on the stick itself.
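As a small sketch of how one 32-bit transfer is divided across the 4 chips of a subchannel (the byte-to-chip assignment here is purely illustrative; the actual wiring differs):

```python
# Split a 32-bit transfer into 8-bit slices, one per chip on the subchannel.
# Which chip carries which 8 bits is an assumption for illustration.
def split_transfer(word32: int) -> list[int]:
    return [(word32 >> (8 * chip)) & 0xFF for chip in range(4)]

print([f"{b:08b}" for b in split_transfer(0xDEADBEEF)])  # 4 chips x 8 bits
```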
Next, let's open and look inside one of these DRAM microchips. Inside the exterior packaging, we find an interconnection matrix that connects the ball grid array at the bottom with the die, which is the main part of this microchip. This 2-gigabyte DRAM die is organized into 8 bank groups composed of 4 banks each, totaling 32 banks. Within each bank is a massive array, 65,536 memory cells tall by 8,192 cells across, essentially rows and columns in a grid, with tens of thousands of wires and supporting circuitry running outside each bank. Instead of looking at this die, we're going to transition to a functional diagram, and then reorganize the banks and bank groups.
In order to access 17 billion memory cells, we need a 31-bit address. 3 bits are used to select the appropriate bank group, then 2 bits to select the bank. The next 16 bits of the address are used to determine the exact row out of 65 thousand. Because this chip reads or writes 8 bits at a time, the 8,192 columns are grouped into sets of 8 memory cells, all read or written at once, or 'by 8', and thus only 10 bits are needed for the column address. One optimization is that this 31-bit address is separated into two parts and sent using only 21 wires. First, the bank group, bank, and row address are sent, and then, after that, the column address.
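Here's a minimal sketch of that 31-bit address layout in code. The exact bit ordering below is an assumption for illustration; real chips and memory controllers interleave these fields differently:

```python
# Decode a 31-bit address into bank group, bank, row, and column fields,
# using the bit widths described above (3 + 2 + 16 + 10 = 31 bits).
def decode_address(addr: int) -> dict:
    return {
        "bank_group": (addr >> 28) & 0b111,    # 3 bits -> 1 of 8 bank groups
        "bank":       (addr >> 26) & 0b11,     # 2 bits -> 1 of 4 banks per group
        "row":        (addr >> 10) & 0xFFFF,   # 16 bits -> 1 of 65,536 rows
        "column":     addr & 0x3FF,            # 10 bits -> 1 of 1,024 groups of 8 cells
    }

print(decode_address(0b101_01_0110101110000100_1001010110))
# {'bank_group': 5, 'bank': 1, 'row': 27524, 'column': 598}
```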
Next, we'll look inside these physical memory cells, but first, let's briefly talk about how these structures are manufactured, as well as this video's sponsor.
This incredibly complicated die, also called an integrated circuit, is manufactured on 300-millimeter silicon wafers, around 2,500 dies at a time. On each die are billions of nanoscopic memory cells that are fabricated using dozens of tools and hundreds of steps in a semiconductor fabrication plant, or fab. This one was made by Micron, which manufactures around a quarter of the world's DRAM, including both Nvidia's and AMD's VRAM in their GPUs. Micron also has its own product line of DRAM and SSDs under the brand Crucial which, as mentioned earlier, is the sponsor of this video. In addition to DRAM, Micron is one of the world's leading suppliers of solid-state drives, such as this Crucial P5 Plus M.2 NVMe SSD. By installing your operating system and video games on a Crucial NVMe solid-state drive, you'll be sure to have incredibly fast loading times and smooth gameplay, and if you do video editing, make sure all those files are on a fast SSD like this one as well. This is because loading speed is predominantly limited by the SSD or hard drive where the files are stored. For example, this hard drive can only transfer data at around 150 megabytes a second, whereas this Crucial NVMe SSD can transfer data at a rate of up to 6,600 megabytes a second, which, for comparison, is the speed of a moving tortoise versus a galloping horse. By using a Crucial NVMe SSD, loading a video game that requires gigabytes of DRAM is reduced from a minute or more down to a couple of seconds. Check out the Crucial NVMe SSDs using the link in the description below.
Let's get back to the details of how DRAM works and zoom in to explore a single memory cell situated in a massive array. This memory cell is called a 1T1C cell and is a few dozen nanometers in size. It has two parts: a capacitor to store one bit of data in the form of electrical charges, or electrons, and a transistor to access and read or write data. The capacitor is shaped like a deep trench dug into silicon and is composed of two conductive surfaces separated by a dielectric insulator, or barrier, just a few atoms thick, which stops the flow of electrons but allows electric fields to pass through. If this capacitor is charged up with electrons to 1 volt, it's a binary 1, and if no charges are present and it's at 0 volts, it's a binary 0, and thus this cell only holds one bit of data. Capacitor designs are constantly evolving, but in this trench capacitor, the depth of the silicon is utilized to allow for larger capacitive storage while taking up as little area as possible.
Next, let's look at the access transistor and add in two wires. The wordline wire connects to the gate of the transistor, while the bitline wire connects to the other side of the transistor's channel. Applying a voltage to the wordline turns on the transistor, and, while it's on, electrons can flow through the channel, thus connecting the capacitor to the bitline. This allows us to access and charge up the capacitor to write a 1, or discharge the capacitor to write a 0. Additionally, we can read the stored value in the capacitor by measuring the amount of charge. However, when the wordline is off, the transistor is turned off, and the capacitor is isolated from the bitline, thus saving the data or charge that was previously written. Note that because this transistor is incredibly small, only a few dozen nanometers wide, electrons slowly leak across the channel, and thus over time the capacitor needs to be refreshed to recharge the leaked electrons. We'll cover exactly how refreshing memory cells works a little later.
As mentioned earlier, this 1T1C memory cell is one of 17 billion inside this single die, and these cells are organized into massive arrays called banks. So, let's build a small array for illustrative purposes. In our array, each of the wordlines is connected in rows, and then the bitlines are connected in columns. Wordlines and bitlines are on different vertical layers so one can cross over the other, and they never touch. Let's simplify the visual and use symbols for the capacitors and the transistors. Just as before, the wordlines connect to each transistor's control gate in rows, and then all the bitlines in columns connect to the side of the channel opposite each capacitor. As a result, when a wordline is active, all the capacitors in only that row are connected to their corresponding bitlines, thereby activating all the memory cells in that row. At any given time, only one wordline is active because, if more than one wordline were active, then multiple capacitors in a column would be connected to the bitline, and the data storage functionalities of these capacitors would interfere with one another, making them useless.
As mentioned earlier, within a single bank there are 65,536 rows and 8,192 columns, and the 31-bit address is used to activate a group of just 8 memory cells. The first 5 bits select the bank, and the next 16 bits are sent to a row decoder to activate a single row. For example, this binary number turns on wordline row 27,524, thus turning on all transistors in that row and connecting its 8,192 capacitors to their bitlines, while at the same time the other 65 thousandish wordlines are all off. Here's the logic diagram for a simple decoder.
The remaining 10 bits of the address are sent to the column multiplexer. This multiplexer takes in the 8,192 bitlines at the top and, depending on the 10-bit address, connects a specific group of 8 bitlines to the 8 input and output, or IO, wires at the bottom. For example, if the 10-bit address were this, then only bitlines 4,784 through 4,791 would be connected to the IO wires, and the rest of the 8000ish bitlines would be connected to nothing. Here's the logic diagram for a simple multiplexer.
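In code, the behavior of those two circuits looks something like this (a behavioral sketch, not the actual gate-level logic):

```python
# Behavioral sketch of a row decoder and a column multiplexer.
def row_decoder(row_address: int, num_rows: int = 65536) -> list[int]:
    # One-hot output: exactly one wordline is driven high.
    return [1 if row == row_address else 0 for row in range(num_rows)]

def column_mux(bitlines: list[int], col_address: int) -> list[int]:
    # Connect one group of 8 bitlines to the 8 IO wires.
    start = col_address * 8
    return bitlines[start:start + 8]

wordlines = row_decoder(27524)
print(sum(wordlines), wordlines[27524])   # 1 1 -> only row 27,524 is on

bitlines = [0] * 8192
bitlines[4784:4792] = [1, 0, 1, 1, 0, 0, 1, 0]
print(column_mux(bitlines, 598))          # columns 4,784-4,791 reach the IO wires
```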
We now have the means of accessing any memory cell in this massive array; however, to understand the three basic operations, reading, writing, and refreshing, let's add two elements to our layout: a sense amplifier at the bottom of each bitline, and a read and write driver outside of the column multiplexer.
Let's look at reading from a group of memory cells. First, the read command and 31-bit address are sent from the CPU to the DRAM. The first 5 bits select a specific bank. The next step is to turn off all the wordlines in that bank, thereby isolating all the capacitors, and then precharge all 8000ish bitlines to 0.5 volts. Next, the 16-bit row address turns on a row, and all the capacitors in that row are connected to their bitlines. If an individual capacitor holds a 1 and is charged to 1 volt, then some charge flows from the capacitor onto the 0.5-volt bitline, and the voltage on the bitline increases. The sense amplifier then detects this slight change, or perturbation, of voltage on the bitline, amplifies the change, and pushes the voltage on the bitline all the way up to 1 volt. However, if a 0 is stored in the capacitor, charge flows from the bitline into the capacitor, and the 0.5-volt bitline decreases in voltage. The sense amplifier then sees this change, amplifies it, and drives the bitline voltage down to 0 volts, or ground. The sense amplifier is necessary because the capacitor is so small and the bitline is rather long, so an additional component is needed to sense and amplify whatever value is stored. Now, all 8000ish bitlines are driven to 1 volt or 0 volts, corresponding to the stored charge in the capacitors of the activated row, and this row is now considered open. Next, the column select multiplexer uses the 10-bit column address to connect the corresponding 8 bitlines to the read driver, which then sends these 8 values and voltages over the 8 data wires to the CPU.
Writing data to these memory cells is similar to reading, however with a few key differences. First, the write command, address, and the 8 bits to be written are sent to the DRAM chip. Next, just like before, the bank is selected, the capacitors are isolated, and the bitlines are precharged to 0.5 volts. Then, using the 16-bit address, a single row is activated, the capacitors perturb the bitlines, and the sense amplifiers sense this and drive the bitlines to a 1 or 0, thus opening the row. Next, the column address goes to the multiplexer, but, this time, because a write command was sent, the multiplexer connects the specific 8 bitlines to the write drivers, which hold the 8 bits that the CPU sent along the data wires and requested to write. These write drivers are much stronger than the sense amplifiers, and thus they override whatever voltage was previously on the bitline, driving each of the 8 bitlines to 1 volt for a 1 to be written, or 0 volts for a 0. This new bitline voltage overrides the previously stored charges or values in each of the 8 capacitors in the open row, thereby writing 8 bits of data to the memory cells corresponding to the 31-bit address.
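To tie the read and write sequences together, here's a toy behavioral model of a single bank. It ignores voltages, sense-amp physics, and timing entirely; it only captures the open-a-row-then-access-columns behavior described above:

```python
# Toy model of one bank: activate (open) a row, then read or write
# 8-cell groups within it. Rows are allocated lazily to keep this small.
class Bank:
    def __init__(self, num_cols: int = 8192):
        self.num_cols = num_cols
        self.rows: dict[int, list[int]] = {}
        self.open_row = None

    def activate(self, row: int) -> None:
        # Close the previous row, precharge the bitlines, then open the
        # new row: the sense amps latch (and thereby refresh) its cells.
        self.open_row = self.rows.setdefault(row, [0] * self.num_cols)

    def read(self, col: int) -> list[int]:
        # The column mux connects 8 bitlines to the read driver.
        return self.open_row[col * 8 : col * 8 + 8]

    def write(self, col: int, bits: list[int]) -> None:
        # Write drivers overpower the sense amps and overwrite 8 cells.
        self.open_row[col * 8 : col * 8 + 8] = bits

bank = Bank()
bank.activate(27524)
bank.write(598, [1, 0, 1, 1, 0, 0, 1, 0])
print(bank.read(598))   # [1, 0, 1, 1, 0, 0, 1, 0]
```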
Three quick notes. First, as a reminder, writing and reading happen concurrently across all 4 chips in the shared memory channel, using the same 31-bit address and command wires, but with different data wires for each chip. Second, with DDR5, the voltage for a binary 1 is actually 1.1 volts, for DDR4 it's 1.2 volts, and prior generations had even higher voltages, with the bitline precharge voltages being half of these values. However, for DDR5, when writing or refreshing, a higher voltage, around 1.4 volts, is applied and stored in each capacitor for a binary 1, because charge leaks out over time. However, for simplicity, we're going to stick with 1 and 0. Third, the number of bank groups, banks, bitlines, and wordlines varies widely between different generations and capacities, but is always a power of 2.
Let's move on and discuss the third operation, which is refreshing the memory cells in a bank. As mentioned earlier, the transistors used to isolate the capacitors are incredibly small, and thus charges leak across the channel. The refresh operation is rather simple and is a sequence of closing all the rows, precharging the bitlines to 0.5 volts, and opening a row. To refresh, just as before, the capacitors perturb the bitlines, and then the sense amplifiers drive the bitlines and capacitors of the open row fully up to 1 volt or down to 0 volts, depending on the stored value of each capacitor, thereby refilling the leaked charge. This process of row closing, precharging, opening, and sense amplifying happens row after row, taking 50 nanoseconds for each row, until all 65 thousandish rows are refreshed, taking a total of 3 milliseconds or so to complete. The refresh operation occurs once every 64 milliseconds for each bank, because that's statistically below the worst-case time it takes for a memory cell to leak so much charge that a stored 1 turns into a 0, resulting in a loss of data.
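Those refresh numbers fit together like this (a quick check using the figures just quoted):

```python
rows = 65536
row_refresh_ns = 50                  # close, precharge, open, and amplify one row
total_ms = rows * row_refresh_ns / 1e6
print(total_ms)                      # ~3.3 ms to refresh every row in a bank
print(1000 / 64)                     # the 64 ms cycle repeats ~15.6 times a second
```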
Let's take a step back and consider the incredible amount of data that is moved through DRAM memory cells. These banks of memory cells handle up to 4,800 million requests to read and write data every second, while refreshing every memory cell in each bank, row by row, around 16 times a second. That's a staggering amount of data movement and illustrates the true strength of computers. Yes, they do simple things like comparisons, arithmetic, and moving data around, but at a rate of billions of times a second. Now, you might wonder why computers need to do so much data movement. Well, take this video game for example. You have obvious calculations like the movement of your character and the horse. But then there are individual grasses, trees, rocks, and animals whose positions and geometries are stored in DRAM. And then the environment, such as the lighting and shadows, changes the colors and textures of the scene in order to create a realistic world.
Next, we're going to explore breakthroughs and optimizations that allow DRAM to be incredibly fast. But, before we get into all those details, we would greatly appreciate it if you could take a second to hit that like button, subscribe if you haven't already, and type up a quick comment below, as it helps get this video out to others. Also, we have a Patreon and would appreciate any support. This is our longest and most detailed video by far, and we're planning more videos that get into the inner details of how computers work. We can't do it without your help, so thank you for watching and doing these three quick things. It helps a ton.
The first complex topic we'll explore is why there are 32 banks, as well as what the parameters on the packaging of DRAM are. After that, we'll explore burst buffers, sub-arrays, and folded DRAM architecture, and what's inside the sense amplifier. Let's take a look at the banks. As mentioned earlier, opening a single row within a bank requires all these steps, and this process takes time. However, if a row were already open, we could read or write to any section of 8 memory cells using only the 10-bit column address and the column select multiplexer. When the CPU sends a read or write command to a row that's already open, it's called a row hit or page hit, and this can happen over and over. With a row hit, we skip all the steps required to open a row, and just use the 10-bit column address to multiplex a different set of 8 columns or bitlines, connecting them to the read or write driver, thereby saving a considerable amount of time. A row miss is when the next address is for a different row, which requires the DRAM to close and isolate the currently open row, and then open the new row.
On a package of DRAM there are typically 4 numbers specifying timing parameters regarding row hits, precharging, and row misses. The first number refers to the time it takes between sending an address with a row open, thus a row hit, and receiving the data stored in those columns. The next number is the time it takes to open a row if all the rows are isolated and the bitlines are precharged. Then the next number is the time it takes to precharge the bitlines before opening a row, and the last number is the time it takes between a row activation and the following precharge. These four numbers are commonly written as CL, tRCD, tRP, and tRAS. Note that these numbers are measured in clock cycles.
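To make those four numbers concrete, take a hypothetical DDR5-4800 kit rated 40-39-39-77 (illustrative values, not this specific stick). Since the clock runs at half the transfer rate, each cycle lasts about 0.42 nanoseconds:

```python
# Convert DRAM timing parameters from clock cycles to nanoseconds.
transfers_per_sec = 4800e6                  # DDR5-4800 (hypothetical example kit)
cycle_ns = 1e9 / (transfers_per_sec / 2)    # two transfers per clock -> ~0.417 ns

cl, trcd, trp, tras = 40, 39, 39, 77        # the four numbers on the packaging
print(cl * cycle_ns)     # ~16.7 ns from column address to data on a row hit
print(trcd * cycle_ns)   # ~16.3 ns to open a row once the bitlines are precharged
```

Note that the first figure lands right around the 17 nanoseconds quoted at the start of the video.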
Row hits are also the reason why the address is sent in two sections: first the bank selection and row address, called RAS, and then the column address, called CAS. If the first part, the bank selection and row address, matches a currently open row, then it's a row hit, and all the DRAM needs is the column address and the new command, and then the multiplexer simply moves around the open row. Because of the time saved in accessing an open row, the CPU memory controller, programs, and compilers are optimized to increase the number of subsequent row hits. The opposite, called thrashing, is when a program jumps around from one row to a different row over and over, and is obviously incredibly inefficient both in terms of energy and time.
Additionally, DDR5 DRAM has 32 banks for this reason. Each bank's rows, columns, sense amplifiers, and row decoders operate independently of one another, and thus multiple rows from different banks can be open all at the same time, increasing the likelihood of a row hit and reducing the average time it takes for the CPU to access data. Furthermore, by having multiple bank groups, the CPU can refresh one bank in each bank group at a time while using the other three, thus reducing the impact of refreshing. A question you may have had earlier is why banks are significantly taller than they are wide. Well, by combining all the banks together, one next to the other, you can think of this chip as actually being 65 thousand rows tall by 262 thousand columns wide. And, by adding 31 equally spaced divisions between the columns, thus creating banks, we allow for much more flexibility and efficiency in reading, writing, and refreshing.
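The cell count checks out (a quick sketch of the numbers above):

```python
rows, cols, banks = 65536, 8192, 32
cells = rows * cols * banks
print(f"{cells:,}")            # 17,179,869,184 one-bit cells
print(cells // 8 // 2**30)     # = 2, i.e. a 2-gigabyte die
```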
Also, note that on the DRAM packaging are its capacity in gigabytes, the number of millions of data transfers per second, which is two times the clock frequency, and the peak data transfer rate in megabytes per second.
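Those numbers are tied together. For example, assuming a DDR5-4800 stick (the rate implied by the 4,800 million transfers per second mentioned earlier):

```python
transfers_per_sec = 4800e6    # "4800 MT/s" on the label, 2x the 2400 MHz clock
bytes_per_transfer = 8        # 64 data bits across the stick's two subchannels
print(transfers_per_sec * bytes_per_transfer / 1e6)   # 38,400 MB/s peak rate
```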
The next design optimization we'll explore is the burst buffer and burst length. Let's add a 128-bit read and write temporary storage location, called a burst buffer, to our functional diagram. Instead of 8 wires coming out of the multiplexer, we're going to have 128 wires that connect to these 128-bit buffer locations. Next, the 10-bit column address is broken into two parts: 6 bits are used for the multiplexer, and 4 bits are for the burst buffer. Let's explore a read command. With our burst buffer in place, 128 memory cells and bitlines are connected to the burst buffer using the 6 column bits, thereby temporarily loading, or caching, 128 values into the burst buffer. Using the 4 bits for the buffer, 8 quickly accessed data locations in the burst buffer are connected to the read drivers, and the data is sent to the CPU. By cycling through these 4 bits, all 16 sets of 8 bits are read out, and thus the burst length is 16. After that, a new set of 128 bitlines and values is connected and loaded into the burst buffer. There's also a write burst buffer, which operates in a similar way.
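Here's a behavioral sketch of that read path, splitting the 10-bit column address into 6 multiplexer bits and 4 burst-buffer bits as described (a toy model; the bit assignments are illustrative):

```python
# A burst read: 6 column bits select which 128 bitlines fill the buffer,
# then cycling the 4 buffer bits streams out 16 groups of 8 bits.
def fill_burst_buffer(bitlines: list[int], mux_bits: int) -> list[int]:
    return bitlines[mux_bits * 128 : mux_bits * 128 + 128]   # cache 128 values

def read_group(buffer: list[int], buffer_bits: int) -> list[int]:
    return buffer[buffer_bits * 8 : buffer_bits * 8 + 8]     # 8 bits to the CPU

bitlines = list(range(8192))    # stand-in values on the 8,192 sensed bitlines
buffer = fill_burst_buffer(bitlines, mux_bits=37)
burst = [read_group(buffer, i) for i in range(16)]           # burst length 16
print(burst[0], burst[15])
```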
The benefit of this design is that 16 sets of 8 bits per microchip, totaling 1,024 bits across the stick, can be accessed and read or written extremely quickly, as long as the data is all next to one another, but at the same time we still have the granularity and ability to access any set of 8 bits if our data requests jump around.
The next design optimization addresses the fact that this bank of 65,536 rows by 8,192 columns is rather massive, which results in extremely long wordlines and bitlines, especially when compared to the size of each trench capacitor memory cell. Therefore, the massive array is broken up into smaller blocks, 1,024 by 1,024, with intermediate sense amplifiers below each subarray, subdivided wordlines, and a hierarchical row decoding scheme. By subdividing the bitlines, the distance and amount of wire between each tiny capacitor and its sense amplifier is reduced, and thus the capacitor doesn't have to be as big. By subdividing the wordlines, the capacitive load from eight thousandish transistor gates and channels is decreased, and thus the time it takes to turn on all the access transistors in a row is decreased.
The final topic we're going to talk about is the most complicated. Remember how we had a sense amplifier connected to the bottom of each bitline? Well, this optimization has two bitlines per column going to each sense amplifier, with alternating rows of memory cells connected to the left and right bitlines, thus doubling the number of bitlines. When one row is active, half of the bitlines are active while the other half are passive, and vice versa when the next row is active. Moving down to see inside the sense amplifier, we find a cross-coupled inverter. How does this work? Well, when the active bitline is a 1, the passive bitline will be driven by this cross-coupled inverter to the opposite value of 0, and when the active is a 0, the passive becomes a 1. Note that the inverted passive bitline isn't connected to any memory cells, and thus it doesn't mess up any stored data. The cross-coupled inverter makes it such that these two bitlines are always going to be opposite one another, and they're called a differential pair. There are three benefits to this design. First, during the precharge step, we want to bring all the bitlines to 0.5 volts, and, by having a differential pair of active and passive bitlines, the easiest solution is to disconnect the cross-coupled inverters and open a channel between the two bitlines using a transistor. The charge easily flows from the 1 bitline to the 0 bitline, and they both average out and settle at 0.5 volts.
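That precharge step is simple charge sharing between two equal capacitances (a simplified calculation):

```python
# Shorting the two bitlines together: charge is conserved, so the final
# voltage is the capacitance-weighted average of the two starting voltages.
c = 1.0                              # equal bitline capacitance, arbitrary units
v_active, v_passive = 1.0, 0.0
v_final = (c * v_active + c * v_passive) / (c + c)
print(v_final)                       # 0.5 volts on both bitlines
```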
The other two benefits are noise immunity and a reduction in parasitic capacitance of the bitline. These benefits are related to the fact that, by creating two oppositely charged electric wires with electric fields going from one to the other, we reduce the amount of electric field emitted in stray directions and relatedly increase the ability of the sense amplifier to amplify one bitline to 1 volt and the other to 0 volts.
One final note is that when discussing DRAM, one major topic is the timing of addresses, command signals, and data, and the related acronyms DDR, or double data rate, and SDRAM, or synchronous DRAM. These topics were omitted from this video because they would have taken an additional 15 minutes to properly explore. That's pretty much it for the DRAM, and we are grateful you made it this far into the video. We believe the future will require a strong emphasis on engineering education, and we're thankful to all our Patreon and YouTube Membership sponsors for supporting this dream. If you want to support us on YouTube Memberships or Patreon, you can find the links in the description. A huge thanks goes to Nathan, Peter, and Jacob, who are doctoral students at the Florida Institute for Cybersecurity Research, for helping to research and review this video's content! They do foundational research on finding the weak points in device security and whether hardware is compromised. If you want to learn more about the FICS graduate program or their work, check out the website using the link in the description. This is Branch Education, and we create 3D animations that dive deep into the technology that drives our modern world. Watch another Branch video by clicking one of these cards, or click here to subscribe. Thanks for watching to the end!