Posts in the 'All Categories' category (Page 2)

All posts (168)

Kim Seon Deok

[General Purpose GPU] ch3.4 Research directions on branch divergence(1)

*These are my notes on chapter 3.4, RESEARCH DIRECTIONS ON BRANCH DIVERGENCE, of General-Purpose Graphics Processor Architecture. Threads in the same warp execute along the same control flow path, so the GPU can run them in lockstep on SIMD hardware. When threads hit a data-dependent branch, however, they can diverge to different targets; this is called branch divergence. Modern GPUs have special hardware that lets branch divergence be handled within a single warp, ..

General Purpose GPU 2024. 1. 15. 23:38
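To make the branch divergence in the excerpt above concrete, here is a minimal sketch of one warp hitting a data-dependent branch. It is a toy model in Python, not the hardware mechanism the post goes on to describe: the function name `run_warp`, the even/odd branch condition, and the two per-path operations are all assumptions chosen for illustration. The warp runs each taken path once, with a per-lane mask deciding which threads are active.

```python
def run_warp(data):
    """Toy model: every lane runs `x * 2 if x is even else x + 1` in lockstep.

    Returns the per-lane results and the number of serialized passes
    the warp needed (1 if all lanes agree, 2 if they diverge).
    """
    lanes = range(len(data))
    result = [None] * len(data)

    # The branch condition depends on each lane's data (assumed: even vs. odd).
    even_mask = [x % 2 == 0 for x in data]
    odd_mask = [not m for m in even_mask]

    passes = 0
    for mask, op in ((even_mask, lambda x: x * 2), (odd_mask, lambda x: x + 1)):
        if not any(mask):
            continue  # no lane took this path, so the warp skips it
        passes += 1
        for lane in lanes:
            if mask[lane]:  # inactive lanes sit idle during this pass
                result[lane] = op(data[lane])
    return result, passes


diverged = run_warp([0, 1, 2, 3])   # lanes disagree on the branch
uniform = run_warp([2, 4, 6, 8])    # all lanes take the same path
```

In this model the diverged warp needs two passes while the uniform warp needs one, which is the cost that motivates the research directions the chapter surveys.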

Static pipeline with out-of-order execution completion(2)

Precise exceptions: the main drawback of out-of-order completion is that precise exceptions are hard to implement. Consider ADD.S F1, F2, F1 followed by LW R2, 0(R1). The two instructions are independent (a floating-point instruction and an integer instruction, respectively). ADD.S performs a floating-point add and takes 9 cycles from IF to WB (EXE: 5 cycles). LW performs an integer add and takes 5 cycles from IF to WB (EXE: 1 cycle). Case 1) precise exceptions: LW performs WB in C6 ..

Advanced computer architecture 2024. 1. 11. 23:19
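The cycle counts in the excerpt above can be checked with a small sketch. It assumes that one instruction is fetched per cycle with no stalls, and that both instructions pass through IF, ID, the EXE cycles, MEM, and WB (the MEM stage for ADD.S is my assumption, chosen so the total matches the 9 cycles stated). The helper names `stage_schedule` and `writeback_cycle` are hypothetical.

```python
def stage_schedule(fetch_cycle, exe_cycles):
    """Map each clock cycle to the stage an instruction occupies.

    Assumed stage sequence: IF, ID, EXE (exe_cycles times), MEM, WB.
    """
    stages = ["IF", "ID"] + ["EXE"] * exe_cycles + ["MEM", "WB"]
    return {cycle: stage for cycle, stage in enumerate(stages, start=fetch_cycle)}


def writeback_cycle(fetch_cycle, exe_cycles):
    """Return the cycle in which the instruction reaches WB."""
    schedule = stage_schedule(fetch_cycle, exe_cycles)
    return max(schedule)  # WB is always the last stage in the schedule


# ADD.S is fetched in C1 with a 5-cycle EXE; LW is fetched in C2 with a 1-cycle EXE.
add_s_wb = writeback_cycle(fetch_cycle=1, exe_cycles=5)
lw_wb = writeback_cycle(fetch_cycle=2, exe_cycles=1)

# LW was fetched later but writes back earlier: out-of-order completion.
completes_out_of_order = lw_wb < add_s_wb
```

Under these assumptions LW writes back in C6 while ADD.S is still in flight, so an exception raised by ADD.S after C6 would find R2 already updated by a later instruction.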

Scoreboarding

Sources: https://wwang.github.io/teaching/slides/Comp_Arch/OoO_Scoreboard.pdf https://people.eecs.berkeley.edu/~kubitron/courses/cs252-S12/lectures/lec07-dynasched2.pdf Scoreboarding: the CPU performs OoO execution. The ID stage is split into two stages. 1. Issue (IS) stage: the instruction is decoded and issued for execution on a function unit. 2. Read Operands (RD) stage: the instruction reads its source operands from the registers or from another function unit ..

Advanced computer architecture 2024. 1. 8. 06:29

[General Purpose GPU] ch3.2 TWO - LOOP APPROXIMATION ~ ch3.3 THREE-LOOP APPROXIM

*These are my notes on chapters 3.2 TWO - LOOP APPROXIMATION through 3.3 THREE-LOOP APPROXIMATION of General-Purpose Graphics Processor Architecture. The one-loop approximation covered earlier dealt with a single scheduler. To hide latency, a GPU must be able to issue the next instruction before the currently executing instruction has finished. In the one-loop approximation, however, the scheduling logic can access only the thread identifier and the address of the next instruction, so latency hi..

General Purpose GPU 2024. 1. 7. 03:28

Static pipeline with out-of-order execution completion(1)

5-stage pipeline: it executes independent load, store, and ALU instructions. Here, instructions being independent means that they do not share resources such as registers or memory locations with one another. Each time an instruction moves from one stage to the next it is recoded, and the recoded instruction then operates on the clock signal every clock cycle. Main resources of the data path: instruction memory (cache); register file (2 read ports, 1 write port); ALU; data..

Advanced computer architecture 2024. 1. 5. 08:41
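As a rough illustration of how independent instructions overlap in the 5-stage pipeline described in the excerpt above, the sketch below lists which instruction occupies which stage in each cycle. The stage names IF, ID, EX, MEM, WB are the classic textbook ones and are an assumption here, as are the function names and the example instruction labels.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed classic 5-stage names


def occupancy(instructions):
    """Return {cycle: {stage: instruction}} for back-to-back independent instructions.

    Instruction i (0-based) enters IF in cycle i + 1 and advances one stage
    per cycle, so no two instructions ever occupy the same stage at once.
    """
    table = {}
    for i, name in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, {})[stage] = name
    return table


def total_cycles(n_instructions):
    """Cycles needed to drain n independent instructions with no stalls."""
    return n_instructions + len(STAGES) - 1


table = occupancy(["LW", "SW", "ADD"])
```

With three independent instructions, cycle 3 has all of IF, ID, and EX busy at once, and the last instruction writes back in cycle `total_cycles(3)`.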
