Fact and Fiction Mix: Advanced computer architecture Assessment 1 (October 2013)

Part A

1) A protein string machine code takes 4 days to execute on the current machine, spending 20% of the time on integer instructions, 35% on I/O instructions, and the remaining time on other instructions. Which is the better trade-off among the following two proposals?

(i) A compiler optimization that reduces the number of integer instructions by 25% (assume each integer instruction takes the same amount of time).
(ii) A hardware optimization that reduces the latency of each I/O operation from 6 µs to 5 µs.
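
One way to compare the two proposals is to rescale only the affected share of the run time, in the spirit of Amdahl's law. The sketch below assumes that integer time scales with the integer instruction count and that I/O time scales linearly with per-operation latency; the names `FRACTIONS` and `scaled_runtime` are illustrative and not part of the question.

```python
# Shares of the original 4-day run, taken from the question.
TOTAL_DAYS = 4.0
FRACTIONS = {"integer": 0.20, "io": 0.35, "other": 0.45}


def scaled_runtime(component, factor):
    """Return the run time in days when one component's time is multiplied by factor."""
    new_fraction = sum(
        share * (factor if name == component else 1.0)
        for name, share in FRACTIONS.items()
    )
    return TOTAL_DAYS * new_fraction


# Proposal (i): 25% fewer integer instructions (assumed: time scales with count).
time_compiler = scaled_runtime("integer", 1 - 0.25)
# Proposal (ii): I/O latency drops from 6 us to 5 us (assumed: time scales with latency).
time_hardware = scaled_runtime("io", 5 / 6)

speedup_compiler = TOTAL_DAYS / time_compiler
speedup_hardware = TOTAL_DAYS / time_hardware

print(f"(i)  compiler: {time_compiler:.3f} days, speedup {speedup_compiler:.4f}")
print(f"(ii) hardware: {time_hardware:.3f} days, speedup {speedup_hardware:.4f}")
```

Under these assumptions proposal (i) leaves 95% of the original run time (3.8 days) and proposal (ii) leaves about 94.2% (roughly 3.77 days), so the hardware change comes out slightly ahead.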

2) A computer architect needs to design the pipeline of a new microprocessor. She has an example workload program with one million (10⁶) instructions. Each instruction takes 100 ps (1 ps = 10⁻¹² s) to complete. How long does it take to execute this program on a non-pipelined processor? The current state-of-the-art microprocessor has about 20 pipeline stages. Assume it is perfectly pipelined. How much speedup will it achieve compared to the non-pipelined processor?
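
The arithmetic for this question can be sketched as follows. The sketch assumes that "perfectly pipelined" means the 100 ps of work splits evenly into 20 stages of 5 ps each, with no latch overhead and no stalls; the variable names are illustrative.

```python
N_INSTRUCTIONS = 10**6
INSTR_LATENCY_PS = 100
STAGES = 20

# Non-pipelined: each instruction runs to completion before the next starts.
non_pipelined_ps = N_INSTRUCTIONS * INSTR_LATENCY_PS

# Perfect pipeline (assumed): equal stages, one instruction completes per stage time
# once the pipeline has filled.
stage_time_ps = INSTR_LATENCY_PS / STAGES
pipelined_ps = (STAGES + N_INSTRUCTIONS - 1) * stage_time_ps

speedup = non_pipelined_ps / pipelined_ps

print(f"non-pipelined: {non_pipelined_ps / 1e6:.1f} us")
print(f"pipelined:     {pipelined_ps / 1e6:.6f} us")
print(f"speedup:       {speedup:.4f}")
```

The non-pipelined run takes 10⁸ ps, that is 100 µs. With the fill time included, the pipelined run takes 5,000,095 ps, so the speedup is just under the ideal factor of 20.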

3) How can a CPI < 1 be achieved? List the approaches used for it.

Part B

a) Explain Tomasulo's algorithm for overcoming data hazards using dynamic scheduling, with a neat diagram and example code.

(9)

(OR)

b) Discuss static and dynamic branch prediction techniques with suitable examples and diagrams.

(9)
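
As a small illustration for the branch prediction question, the sketch below compares a static predictor with a single 2-bit saturating counter on two made-up outcome streams. The streams, the starting counter state, and the function names are assumptions chosen for the example and are not part of the question.

```python
def static_mispredictions(outcomes, predict_taken=True):
    """Count mispredictions for a fixed (static) prediction."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def two_bit_mispredictions(outcomes, state=2):
    """Count mispredictions for one 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    The starting state (2, weakly taken) is an assumption.
    """
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move toward 3 on a taken branch and toward 0 otherwise, saturating.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Hypothetical loop-closing branch: taken 9 times, then falls through; loop run 3 times.
LOOP_BRANCH = ([True] * 9 + [False]) * 3
# Hypothetical rarely taken branch: not taken 9 times, then taken once; repeated 3 times.
RARE_BRANCH = ([False] * 9 + [True]) * 3

for label, stream in (("loop branch", LOOP_BRANCH), ("rare branch", RARE_BRANCH)):
    print(label,
          "static-taken:", static_mispredictions(stream),
          "2-bit:", two_bit_mispredictions(stream))
```

On these streams the static predict-taken scheme does well only on the loop branch, while the 2-bit counter adapts to both, which is the usual argument for dynamic prediction.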

Analyse the data dependencies among the following statements:

S1: Load R1, 1024 /R1 ← 1024/

S2: Load R2, M(10) /R2 ← Memory(10)/

S3: Add R1, R2 /R1 ← (R1) + (R2)/

S4: Store M(1024), R1 /Memory(1024) ← (R1)/

S5: Store M((R2)), 1024 /Memory(64) ← 1024/

Note that (Ri) means the content of register Ri, and that Memory(10) initially contains 64. Draw a dependence graph to show all the dependencies. Are there any resource dependencies if only one copy of each functional unit is available in the CPU?
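
The data dependences among S1 to S5 can be enumerated mechanically from each statement's read and write sets. The sketch below is one way to do that; the location labels (`M10`, `M64`, `M1024`) and the helper `find_dependences` are illustrative names, and the target of S5 is taken as Memory(64) because Memory(10) initially holds 64. It covers flow, anti and output dependences only, so resource dependences still have to be argued separately.

```python
from collections import defaultdict

# (name, locations read, locations written), in program order.
# Immediates such as 1024 are constants and are not tracked as locations.
STATEMENTS = [
    ("S1", [], ["R1"]),
    ("S2", ["M10"], ["R2"]),
    ("S3", ["R1", "R2"], ["R1"]),
    ("S4", ["R1"], ["M1024"]),
    ("S5", ["R2"], ["M64"]),  # assumed: (R2) = 64, so the store targets Memory(64)
]


def find_dependences(statements):
    """Return flow (RAW), anti (WAR) and output (WAW) edges as (from, to, location)."""
    last_writer = {}
    readers = defaultdict(list)  # readers of a location since its last write
    deps = {"flow": [], "anti": [], "output": []}
    for name, reads, writes in statements:
        for loc in reads:
            if loc in last_writer:
                deps["flow"].append((last_writer[loc], name, loc))
            readers[loc].append(name)
        for loc in writes:
            for reader in readers[loc]:
                if reader != name:
                    deps["anti"].append((reader, name, loc))
            if loc in last_writer:
                deps["output"].append((last_writer[loc], name, loc))
            last_writer[loc] = name
            readers[loc] = []
    return deps


DEPS = find_dependences(STATEMENTS)
for kind, edges in DEPS.items():
    print(kind, edges)
```

With these read and write sets, the flow edges are S1→S3 and S3→S4 on R1, and S2→S3 and S2→S5 on R2; the only output edge is S1→S3 on R1, and no anti dependence appears.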

(OR)

Consider the following code. If all instructions have a 2-cycle latency, unroll this loop (i) twice and (ii) four times, and show how a VLIW capable of two loads and two adds per cycle can use the minimum number of registers, in the absence of pipeline interruptions and stalls.

Loop: LW R1,0(R2)

ADDI R5,R1,#1

SW R1,0(R2)

ADDI R2,R2,#8

SUB R4,R3,R2

BNZ R4,Loop
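
As a starting point for the unrolling step only, the sketch below generates the loop body replicated a given number of times, with adjusted offsets and a single merged pointer update. It assumes the trip count is a multiple of the unroll factor, and the extra register names (R6 onward) are a hypothetical choice. It does not produce the VLIW schedule and makes no attempt to minimise register use, which is the part the question asks you to work out.

```python
def unroll(factor, stride=8):
    """Return the loop body unrolled `factor` times as a list of instruction strings."""
    # Hypothetical register pool; R2, R3 and R4 are already used by the original loop.
    pool = ["R1", "R5"] + [f"R{n}" for n in range(6, 6 + 2 * (factor - 1))]
    lines = []
    for copy in range(factor):
        loaded, incremented = pool[2 * copy], pool[2 * copy + 1]
        offset = stride * copy
        lines.append(f"LW {loaded},{offset}(R2)")
        lines.append(f"ADDI {incremented},{loaded},#1")
        lines.append(f"SW {loaded},{offset}(R2)")
    # One pointer update and one exit test per unrolled iteration.
    lines.append(f"ADDI R2,R2,#{stride * factor}")
    lines.append("SUB R4,R3,R2")
    lines.append("BNZ R4,Loop")
    return lines


for factor in (2, 4):
    print(f"Unrolled x{factor}:")
    for line in unroll(factor):
        print("   ", line)
```

Each copy gets its own pair of registers so that the copies are independent, which is what lets a wide machine issue their loads and adds together.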