SUN MICROELECTRONICS STP1030A handbook

Contents

1. A5 vsso VDDO EDATA 58 EDATA 1 44 EDATA 74 81 SYSADDAT14 AG MISC_BIDIR 9 VSSC VDDO EDATA 1 08 NODEX_RQ SYSADDR 15 A7 MISC_BIDIR 5 TDO EDATA 51 SC_RQ ADR_VLD VSSC A8 MISC_BIDIR 1 MISC_BIDIR 14 EDATA 48 XIR_L VSSO VSSO A9 EDATA 125 MISC_BIDIR 12 VSSO UDB_CEL VSSC SYSADDR 17 A10 EDATA 121 VSSO EDATA 22 UDB_UEL VDDC VSSC A11 EDATA 120 MISC_BIDIR 2 spare UDB_UEH VSSC EDATA 12 A12 EDATA 116 EDATA 127 VDDC EDATA 1 10 VDDO EDATA 13 A13 EDATA 113 VDDO VSSC EDATA 1 09 EDATA 73 VSSO A14 EDATA EDATA 1 22 VDDO VSSO EDATA 47 EDPAR 1 A15 EDATAI DATA 1 18 VSSO EDPAR 13 EDATAI72 SYSADDR 16 A16 VDDC L5CLK EDATA 1 07 SYSADDR 1 SYSADDR 19 At7__ EDATA EPD S_REPLY O SYSADDR 0 SYSADDR 18 A18 EDATA VSS_QUIET DATA_STALL SYSADDR 2 RAM_TEST A19 EDATA VDD_QUIET NODE_RQ 2 VDDO LOOPCAP A20 EDATA MISC_BIDIR 13 NODE_RQ 1 VDDC EDATA 11 A21 EDATA MISC_BIDIR 8 NODE_RQ 0 VDDC VDBO A22 EDPAR7 MISC_BIDIR 4 VSSC VDDC EDATA 9 A23 EDATA MISC_BIDIR O EDATA 1 06 VSSO EDATA 10 A24 EDATA VDDC VDDC EDATA 45 EDATA 8 A25 EDATA EDPAR 15 VDDO EDATA 46 UBBCLKA A26 EDATA EDATA 119 EDATA 1 05 SYSADDRI3 UBBCLKB A27 EDATA EDPAR 14 S_REPLY 2 SYSADDRI4 RESET_L A28 EDATA EDATA 112 S_REPLY 1 SYSADDRI5 CLKA A29 VSSO VDDC VDDC SYSADDRI6 VSS_PLL B4 VDDO EDPAR 11 VSSC VSSC VDDC B5 MISC_BIDIR 11 ED
2. UltraSPARC I DC Characteristics High level output voltage First Generation SPARC v9 64 Bit Microprocessor With VIS Vop Min lon Max Low level output voltage Vbo Min lo Max High level input voltage except CLKA CLKB UDBCLKA UDBCLKB LOOPCAP Vop Max High level input voltage CLKA CLKB UDBCLKA UDBCLKB Vbo Max Low level input voltage except CLKA CLKB UDBCLKA UDBCLKB LOOPCAP Vbo Min Low level input voltage CLKA CLKB UDBCLKA UDBCLKB LOOPCAP Vbo Min High level low level input voltage for LOOPCAP Pin should be grounded Supply current Vop Max freq 167 MHz Vbp Max freq 200MHz High impedance output current Vop Max Vo Vbo Vbo Max Vo Vss Input current Vop Max Vo Vss tO Von Input capacitance Output capacitance 1 STOP_CLK has no Vg specification 2 Only bidirectional lines can be three state Output only cannot be three state All bidirectional lines will be three stated when RESET_L is held LOW and SRAM_TEST is held high 3 This specification is provided as an aid to board design This specification is not assured during manufacturing testing 26 Powered by ICminer com Sun MICROELECTRONICS Electronic Library Service CopyRight 2003 July 1997 UltraSPARC I First Generation SPARC v9 64 Bit Microprocessor With VIS STP1030A AC Characteristics
3. Clock input slew rate teve UDBCLK UDBCLK clock cycle time Divide by 2 love Mode CLK x2 love CLK x2 tovo UDBCLK UDBCLK clock cycle time Divide by 3 teye Mode CLK x3 teye CLK x3 ty UDBCLK UDBCLK duty cycle 40 40 ty RESET_L RESET pulse width LOCK MODE 10 10 ty RESET_L 1 This is for the PLL enabled AC Characteristics JTAG Timing Symbol ee a mi RESET pulse width BYPASS MODE tso CLK x3 tsu TRST_L Input setup time to TCK teye CLK x3 200MHz Max Min Typ Max teu TDI Input setup time to TCK tsu TMS Input setup time to TCK ta TRST_L Input hold time to TCK t TDI Input hold time to TCK tu TMS Input hold time to TCK teo TDO Output delay from TCK lo 8 mA lon 4 MA ton TDO Output hold time from TCK TDO C 35 pF n V LOAD 1 5V teve TCK TCK clock cycle time ty TCK TCK clock duty cycle 1 TDO is referenced from falling edge of TCK 28 Powered by ICminer com Sun MICROELECTRONICS July 1997 Electronic Library Service CopyRight 2003 UltraSPARC I First Generation SPARC v9 64 Bit Microprocessor With VIS STP1030A TABLE 6 AC Characteristics TPD Output Capacitive Derating Factor 167 MHz 200 MHz Parameter Min Typ Max Min Typ Max Units 1 Derating factors are shown to aid in board design This specification is not verifi
4. [Figure 7: Coherent Writes with E-to-M Updates. Timing diagram; signals shown include DSYN_WR, DOE_L, ECAD, and EDATA.]

Overlap of Tag and Data Access

Figure 8 shows the overlap of tag and data accesses. The data for three previous writes (W0, W1, and W2) is written while three tag accesses (reads) are made for three younger stores (R3, R4, and R5). If the line is in the Shared or Owned state, a read for ownership is performed before the data is written. If the access is a miss, a line is victimized and the data is written after the new line is brought in (discussed in a later section).

[Figure 8: Overlap Between Tag Access and Data Write for Coherent Writes. Timing diagram; signals shown include TOE_L, ECAT, TDATA, DOE_L, and EDATA.]

Coherent Read Followed by a Coherent Write

When a read is made to the E-Cache, the three-cycle latency causes the data bus to be busy two cycles after the address appears at the pins. For a processor without delayed writes, writes have to be h
5. I needs to drive SYSADR. Connected to all other UltraSPARC-I bus ports which share this address bus and the system. Synchronous to system clock. Type: 3.3V UPA.

NODEX_RQ

1. 3.3V UPA pins: SYSADR[35:0], ADR_VLD, NODE_RQ[2:0], SC_RQ, S_REPLY[3:0], DATA_STALL, P_REPLY[4:0], NODEX_RQ, STOP_CLOCK, TDI, TCK, TMS, TRST, EXT_EVENT, RESET, XIR

Quick Pin Reference: External Cache Interface

Symbol: Name and Function

EDATA[127:0]: E-Cache data bus. Connects UltraSPARC-I to the E-Cache data SRAMs and the UDB. Synchronous to processor clock.

EDPAR[15:0]: Data bus parity. Odd parity is driven for all EDATA transfers and checked if UltraSPARC or the UDB is the receiver. The most significant bit serves as the parity for the most significant byte of EDATA. Synchronous to processor clock.

TDATA[24:0]: Bidirectional data bus for E-Cache tag SRAMs. Bits [24:22] carry the MOESI state (Dirty, Exclusive, Valid). Bits [21:0] carry physical address bits [40:19]. This allows a minimum cache size of 512 Kilobytes. All of the TDATA bits are used even when the E-Cache is greater than 512 Kilobytes, because there is no sizing in the tag compare for E-Cache hit generation. Synchronous to processor clock.

TPAR[3:0]: Bidirectional data
6. Signal Timing Except Clock and JTAG 2 4 2 2 ty Input setup SYSADDR 35 0 ADR_VLD Figure 12 ns time to CLK SCLK_MODE NODE_RQ 2 0 SC_RQ S_REPLY 3 0 ty a om ta mpi nol DATA_STALL EDATA 127 0 EDPART15 0 TDATA 24 0 TPAR 3 0 RESET_L XIR_L ns top Output delay SYSADDRJ 85 0 ADR_VLD ly 8 MA ns from CLK EDATA 1 27 0 EDPAR 15 0 low 4 mA TDATA 24 0 TPAR 3 0 tog Output hold C 35 pF f oH Ror BYTEWE_L ECAT 15 0 L549 P PR cik TSYN_WR_L TOE L Vilom 1 5V P_REPLY 4 0 NODEX_RQ te Output delay ECAD 17 0 DOE_L DSYN_WR ns from CLK ty Output hold ECAD 17 0 DOE_L DSYN_WR ns time from CLK teg Output delay STOP_CLK EPD ns from CLK tay Output hold EPD ns time from CLK tice PLL ups acquisition cycle time tang UDBCLK Figure 14 ns skew to CLK 1 All timing requirements are specified with PLL enabled 2 RESET_L is asserted asynchronously but deasserted synchronously 3 UDBCLK is before CLK as shown in Figure 14 July 1997 Sun MICROELECTRONICS 27 Powered by ICminer com Electronic Library Service CopyRight 2003 STP1030A UltraSPARC I AC Characteristics Clock Timing Symbol teve CLK First Generation SPARC v9 64 Bit Microprocessor With VIS 200Mhz Parameter Min Processor clock Tcycle time ty CLK Processor clock duty cycle tsrew CLK
7. bus for E-Cache tag SRAMs. Odd parity for TDATA[24:0]. TPAR[3] covers TDATA[24:22], TPAR[2] covers TDATA[21:16], TPAR[1] covers TDATA[15:8], and TPAR[0] covers TDATA[7:0]. Synchronous to processor clock.

BYTEWE_L[15:0]: Byte write enables for synchronous pipelined E-Cache SRAMs. Bit 0 controls EDATA[127:120]; bit 15 controls EDATA[7:0]. Byte write control is necessary because the first-level data cache is write-through. Synchronous to processor clock.

ECAD[17:0]: Address for E-Cache data SRAMs. Corresponds to physical address [21:4]. Allows a maximum 4 Megabyte E-Cache. Synchronous to processor clock.

ECAT[15:0]: Address for E-Cache tag SRAMs. Corresponds to physical address [21:6]. Allows a maximum 4 Megabyte E-Cache. Synchronous to processor clock.

DSYN_WR_L: Write enable for E-Cache data SRAMs. Active low. Synchronous to processor clock.

DOE_L: Active low for all SRAM data reads and writes. Synchronous to processor clock.

TSYN_WR_L: Write enable for E-Cache tag SRAMs. Active low. Synchronous to processor clock.

TOE_L: Active low for all tag data SRAM reads and writes. Synchronous to processor clock.

Quick Pin Reference: Clock Interface

Symbol: Name and Function

CLKA: This pin provides STP1030A with i
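The parity coverage and address mappings in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not Sun's implementation: the function names (`odd_parity`, `field`, `tpar`, `ecad`, `ecat`) are hypothetical, and the only facts taken from the text are the TPAR group boundaries, the use of odd parity, and the physical-address ranges [21:4] and [21:6].

```python
# Hypothetical helper names; only the bit ranges come from the pin descriptions.

# TPAR bit -> (high bit, low bit) of the TDATA field it covers.
TPAR_GROUPS = {3: (24, 22), 2: (21, 16), 1: (15, 8), 0: (7, 0)}


def field(value: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def odd_parity(value: int) -> int:
    """Parity bit that makes the total number of 1s (value plus parity) odd."""
    ones = bin(value).count("1")
    return 0 if ones % 2 == 1 else 1


def tpar(tdata: int) -> int:
    """Compute the four TPAR bits for a 25-bit TDATA word."""
    if not 0 <= tdata < (1 << 25):
        raise ValueError("TDATA is 25 bits wide")
    bits = 0
    for parity_bit, (hi, lo) in TPAR_GROUPS.items():
        bits |= odd_parity(field(tdata, hi, lo)) << parity_bit
    return bits


def ecad(physical_address: int) -> int:
    """E-Cache data SRAM address: physical address bits [21:4]."""
    return field(physical_address, 21, 4)


def ecat(physical_address: int) -> int:
    """E-Cache tag SRAM address: physical address bits [21:6]."""
    return field(physical_address, 21, 6)
```

ECAD drops the low four address bits (one 16-byte EDATA transfer) and ECAT drops the low six (one 64-byte line), which is why both reach the same 4 Megabyte maximum with different widths.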
8. [Figure 3: Main UltraSPARC Interfaces. Blocks: UltraSPARC processor, E-Cache tag SRAM, E-Cache data SRAM, UltraSPARC Data Buffer (UDB). Labeled connections: clocks, observability, reset; JTAG; system address; P_reply; S_reply; E-Cache tag address; E-Cache data address; byte write enable; E-Cache data bus (128 + 16 parity); UDB control; system data bus (128 + 16 ECC).]

Cache Coherence Protocol

This section describes the protocol used to maintain coherency between the internal caches of UltraSPARC-I on the one hand and the external cache and the system on the other. Inclusion in the E-Cache is maintained for both the I-Cache and the D-Cache: all lines containing data currently held in the internal caches are in the external cache, even when the caches are turned off. The state of these lines forms part of the tag kept in the external tag RAM.

The cache coherence protocol is point-to-point write-invalidate. It is based on the five MOESI states maintained in the E-Cache tags of each master port, that is, each UltraSPARC-I. The E-Cache tags have one of the following five states (MOESI):
- Exclusively Modified (M)
- Shared Modified (O)
- Exclusive Clean (E)
- Shared Clean (S)
- Invalid (I)
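As a rough illustration of the five states listed above, the sketch below models them as a Python enum with "exclusive" and "dirty" attributes read off the state names. The class name `MOESI`, the attribute names, and the `store_hit_action` helper with its return strings are assumptions made for this example. The store behavior follows statements elsewhere in the handbook (an Exclusive line is updated to Modified as the data is written; Shared or Owned lines need a read for ownership first).

```python
from enum import Enum


class MOESI(Enum):
    # (descriptive label, exclusive?, dirty?) derived from the state names.
    M = ("Exclusively Modified", True, True)
    O = ("Shared Modified", False, True)
    E = ("Exclusive Clean", True, False)
    S = ("Shared Clean", False, False)
    I = ("Invalid", False, False)

    def __init__(self, label: str, exclusive: bool, dirty: bool):
        self.label = label
        self.exclusive = exclusive
        self.dirty = dirty


def store_hit_action(state: MOESI) -> str:
    """Illustrative summary of what a store to a line in this state requires."""
    if state is MOESI.I:
        return "miss"
    if state is MOESI.M:
        return "write_data"
    if state is MOESI.E:
        return "write_data_and_update_tag_to_M"
    # Shared (S) and Owned (O) lines need ownership before the write.
    return "read_for_ownership"
```

For example, `store_hit_action(MOESI.O)` returns `"read_for_ownership"`, while `MOESI.O.dirty` is true because an Owned line holds modified data that is shared with other caches.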
9. more rapidly, since system arbitration and system throughput are hidden by the internal buffering of the UDB. Overlapping of transactions is also possible, which increases overall bandwidth. Interrupt packets are handled by the UDB, which also generates and checks error correction code (ECC).

The external cache consists of two parts:
- E-Cache TAG RAM, which contains the physical tags of the cached lines and 3 bits of state information
- E-Cache DATA RAM, which contains the data for each cache line

Both these parts can be built out of commodity SRAMs. The parts operate synchronously with UltraSPARC-I and so are known as Synchronous Static RAMs. The external cache sizes supported by UltraSPARC-I are 512 Kilobytes and 1, 2, or 4 Megabytes. The size of the cache is established at boot time by software. Each byte in the RAMs is accompanied by a parity bit: three bits for the tags and 16 bits for data.

The clients for the external cache are UltraSPARC-I and the UDB. More specifically, for UltraSPARC-I they are the load buffer, the store buffer, the prefetch unit, and the data buffer. If the working set is too large for the D-Cache, it may still fit the E-Cache, so loads that miss the D-Cache are sent to the E-Cache. All cacheable stores go to the E-Cache (the D-Cache is write-through), but not necessarily in order with respect to load accesses. All cache misses generate a request for the E-Cache. The UltraSPARC Data Buffer returns data f
10. provides overlap processing during load and store misses. For instance, stores that hit the E-Cache can proceed while a load miss is being processed. The ECU is also capable of processing reads and writes indiscriminately, without a costly turnaround penalty (only 2 cycles). The ECU also handles snoops.

Block loads and block stores load or store a 64-byte line of data from memory to the floating-point register file. These are also processed efficiently by the ECU, providing high transfer bandwidth without polluting the internal or external caches.

Memory Interface Unit (MIU)

The MIU handles all transactions with the system, such as external cache misses, interrupts, snoops, write-backs, and so forth. The MIU communicates with the system at a frequency lower than that of UltraSPARC-I (either 1/2 or 1/3).

ULTRASPARC-I SUBSYSTEMS

Subsystem Description

In the discussion which follows, "system" refers to any other location within the same coherency domain as UltraSPARC-I. For instance, the term includes caches of other processors connected to the interconnect. A complete UltraSPARC-I subsystem consists of one UltraSPARC processor, synchronous SRAM components for the external cache (tags and data), and the UltraSPARC Data Buffer (UDB), which has two ident
11. wide line.

Memory Management Unit (MMU)

The MMU provides mapping between a 44-bit virtual address and a 41-bit physical address. That is accomplished through a 64-entry iTLB for instructions and a 64-entry dTLB for data, both fully associative. UltraSPARC-I provides hardware support for a software-based TLB miss strategy. A separate set of global registers is available to process an MMU trap. Page sizes of 8, 64, and 512 Kilobytes and 4 Megabytes are supported.

Integer Execution Unit (IEU)

Two arithmetic logic units (ALUs) form the main computational part of the IEU. An early-out multi-cycle integer multiplier and a multi-cycle integer divider are also part of the IEU. Eight register windows and four sets of global registers are provided (normal, alternate, MMU, and interrupt globals). The trap registers (UltraSPARC-I supports five levels of traps) are part of the IEU.

Load/Store Unit (LSU)

The LSU is responsible for generating the virtual address of all loads and stores (including atomics and ASI loads), for accessing the data cache, for decoupling load misses from the pipe through the load buffer, and for decoupling the stores through a store buffer. One load or one store can be issued per cycle.

Data Cache (D-Cache)

The data cache is a write th
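To make the MMU page sizes in this excerpt concrete, the sketch below splits a virtual address into a virtual page number and a page offset for each supported size. It is an illustration under simplifying assumptions: the names `PAGE_SIZES`, `VA_BITS`, and `split_virtual_address` are invented here, and the 44-bit virtual address is treated as a plain unsigned value, which ignores how the hardware actually extends it to 64 bits.

```python
# Supported page sizes from the MMU description, in bytes.
PAGE_SIZES = {
    "8K": 8 * 1024,
    "64K": 64 * 1024,
    "512K": 512 * 1024,
    "4M": 4 * 1024 * 1024,
}

VA_BITS = 44  # virtual address width stated in the text


def split_virtual_address(va: int, page_size: int) -> tuple[int, int]:
    """Return (virtual page number, offset within page) for one page size."""
    if page_size not in PAGE_SIZES.values():
        raise ValueError("unsupported page size")
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("virtual address does not fit in 44 bits")
    offset_bits = page_size.bit_length() - 1  # page sizes are powers of two
    return va >> offset_bits, va & (page_size - 1)
```

With an 8 Kilobyte page the offset is 13 bits, leaving 31 bits of page number out of the 44; with a 4 Megabyte page the offset is 22 bits.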
12. 2 decompression at full broadcast quality with no additional hardware support.

Features
- SPARC V9 Architecture Compliant
- Binary Compatible with all SPARC Application code
- Multimedia Capable Visual Instruction Set (VIS)
- Block Load/Store Instructions
- Multi-Processing Support
- Glueless 4-Processor Connection, Minimum Latency
- Snooping or Directory Based Protocol Support
- 4-way SuperScalar Design with 9 Execution Units
- 4 Integer Execution Units
- 3 Floating-Point Execution Units
- 2 Graphics Execution Units
- Selectable Little- or Big-Endian Byte Ordering
- 64-Bit Address Pointers
- 16 Kilobyte Non-Blocking Data Cache
- 16 Kilobyte Instruction Cache
- In-Cache 2-bit Branch Prediction
- Single Cycle Branch Following
- 2.6 Gigabyte/sec Processor-Cache Bandwidth (167 MHz)
- 3.2 Gigabyte/sec Processor-Cache Bandwidth (200 MHz)
- 1.3 Gigabyte/sec Processor-Memory Bandwidth (167 MHz)
- 1.6 Gigabyte/sec Processor-Memory Bandwidth (200 MHz)
- Ease of Use: JTAG Boundary Scan, Performance Instrumentation
- Technology/Packaging: 0.4 um 4-Layer Metal CMOS Process, Operates at 3.3V, 521-Pin Plastic Ball Grid Array (BGA)
- Power Management
- Integrated Second Level Cache Controller

Prefetch and Dispatch Unit (PDU) Memory Management Unit (MMU) Instruct
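The bandwidth figures in the feature list can be sanity-checked with simple arithmetic. The sketch below assumes 16 bytes transferred per clock (the ECU description elsewhere in the handbook says the E-Cache returns 16 bytes per cycle) and, for the memory side, assumes the same 16-byte width at half the processor clock. That second assumption is not stated in the feature list; it is inferred from the MIU note that the system runs at 1/2 or 1/3 of the processor frequency. The function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(clock_hz: float, bytes_per_cycle: int = 16) -> float:
    """Peak transfer rate in gigabytes (10^9 bytes) per second."""
    return clock_hz * bytes_per_cycle / 1e9


# Processor-to-cache: one 16-byte transfer per processor clock.
cache_167 = peak_bandwidth_gb_per_s(167e6)   # about 2.67
cache_200 = peak_bandwidth_gb_per_s(200e6)   # 3.2

# Processor-to-memory: assumed 16 bytes per system clock at half the CPU clock.
memory_167 = peak_bandwidth_gb_per_s(167e6 / 2)  # about 1.34
memory_200 = peak_bandwidth_gb_per_s(200e6 / 2)  # 1.6
```

Under these assumptions the results line up with the listed 2.6, 3.2, 1.3, and 1.6 Gigabyte/sec figures to within rounding of the 167 MHz entries.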
13. [Package drawing: ball grid with column coordinates 1 to 33. Dimension values not recoverable.]

Notes:
1. Dimensions in mm.
3. Primary datum C and seating plane are defined by the spherical crowns of the solder balls.

ORDERING INFORMATION

Part Number        Speeds     Description
STP1030ABGA-167    167 MHz    First Generation SPARC v9 64-Bit Microprocessor With VIS
STP1030ABGA-200    200 MHz    First Generation SPARC v9 64-Bit Microprocessor With VIS

Document Part Number: 802-7432-02
14. ARC v9 64 Bit Microprocessor With VIS STP1030A ELECTRICAL SPECIFICATIONS Absolute Maximum Ratings Symbol Units Vop Supply voltage range 0 to 4 0 V Vi Input voltage range 0 5 to Vbo 0 5 V Vo Output voltage range 0 5 to Vpop 0 5 V lk Input clamp current V lt 0 or V gt Veo mA low Output clamp current Vo lt 0 or Vo gt Veo mA lon Current into any output in the low state mA Tere Storage temperature 40 to 150 2C 1 Operation of the device at values in excess of those listed above will result in degradation or destruction of the device All voltages are defined with respect to ground Functional operation of the device at these or any other conditions beyond those indicated under recommended operating conditions is not implied Exposure to absolute maximum rated conditions for extended periods may affect device reliability Recommended Operating Conditions 3 3 Vbo Supply voltage 3 2 Ves Ground Va High level input voltage All except CLK 2 0 CLK 2 4 Vi Low level input voltage All except CLK Vi CLK High level output current Low level output current T Operating junction temperature Ta Operating ambient temperature 1 Maximum ambient temperature is limited by air flow such that the maximum junction temperature does not exceed T July 1997 Sun MICROELECTRONICS 25 Powered by ICminer com Electronic Library Service CopyRight 2003 STP1030A
15. ATA S_REPLY VSSC VSSC B6 MISC_BIDIR 10 EDATA VSSC EDPAR 5 VSSO B7 MISC_BIDIRI6 VDDC VSSO EDATA 44 EDATA 103 B8 VDDO EDPAR EDATA 104 VDDO EDATA 102 B9 EDATA 126 EDATA EDATA 7 9 EDATA 41 PLLBYPASS B10 EDATA 124 VDDC EDATA 78 SYSADDRI9 TRST_L B11 VSSO EDATA P_REPLY VSSO CLKB Bi2 EDATA 117 EDATA P_REPLY SYSADDR 7 SCLK_MODE B13 EDATA 114 P_REPLY SYSADDR 10 VDD_PLL B14 VDDO MISC_BIDIR 7 EDATA VSSC SYSADDRI8 EDATA 101 B15 EDATA 92 MISC_BIDIR 3 EDATA VSSC EDATA 15 EDPAR 12 B16 VSSC VSSO EDATA VDDC VSSO EDATA 100 B17 VSSO VSSC Spare VSSC EDATA 42 VDDO B18 EDATA 88 EDATA 123 EDATA 18 VDDO EDATA 43 EDATA 99 B19 EDATAS7 85 VDDO EDATA 16 EDATA 76 EDATA 40 SYSADDR 20 B20 VDDO EDATA 115 EDATA 17 EDATA 77 SYSADDR 13 VDDO B21 EDATA 62 EDATA 95 VSSO P_REPLY 3 SYSADDRI11 SYSADDR 21 B22 EDATA 60 VSSO UDBB_CEH VDDO VDDO SYSADDR 23 B23 VSSO VDDC TMS P_REPLY 4 SYSADDR 12 SYSADDR 22 B24 EDATA 53 VSSC TCK VSSC VDDC EDATA 98 B25 EDATA 50 VDDO TDI VDDC VDDC VSSO B26 VDDO VSSC EXT_EVENT VDDC VDDC EDATA 97 B27 EDPAR 3 EDATA 83 VSSC EDATA 75 VDDO EDATA 96 B28 EDATA 24 VSSO VDDO EDPART S VSSC EDATA 71 B29 EDATA 21 VSSC VDDC VSSO EDATA 14 SYSADDR 24 x2 _ SYSADDR 25 Vsso TDATAL12 VDDC ECADP ECATI6 X3 VSSO EDATA 37 VDDC VSSC VDDO VDDO x4 SYSADDRI26 EDATA 36 TDATAL18 VSSO VDDC ECAT 13 x5 _ SYSADDR 27 TDATA 3 TDATA 21 VDDG VDDG BYTEWE_L 0 x29 VDDC TDATA 4 TDATA 24 ECAD 3 Vsso V
16. EQ, P_SACK/P_SACKD followed by S_CRAB (memory is not updated, as opposed to M -> S).

M, O -> I:
  i. A Modified line is victimized by the processor. P_WRB_REQ; S_WAB, or S_WBCAN if the system takes ownership before completing the writeback.
  ii. Request from system to copyback and invalidate this line (store miss from another processor). S_CPI_REQ; P_SACK/P_SACKD followed by S_CRAB.
  iii. Request from system to invalidate this line (block store from another processor). S_INV_REQ; P_SACK/P_SACKD.

M, O -> S: Request from another processor to read this line (memory is updated, so the line becomes clean; cf. M -> O). S_CPB_MSI_REQ; P_SACK/P_SACKD followed by S_CRAB.

O -> M: Store hit, atomic hit to Modified line, PREFETCH. P_RDO_REQ; S_OAK.

UltraSPARC as a Bus Port

The UltraSPARC-I Port Architecture defines protocols for a family of tightly coupled, cache-consistent, shared-memory multiprocessor systems. The UltraSPARC-I bus provides low latency to memory as well as high bandwidth and fast microprocessor data sharing. UltraSPARC-I bus transactions are carried over a packet-switched bus with independent scheduling of separate (and possibly multiple) address and data buses.

The UltraSPARC-I processor and its cache subsystem, including the UDB, form an UltraSPARC-I bus module. T
17. P_NCRD_REQ: Noncached Read. The only slave read transaction that should be sent to UltraSPARC-I. UltraSPARC-I responds to this request by sending the value of its bus port ID to the data bus. The transaction starts as a P_NCRD_REQ from an UltraSPARC-I bus master and is forwarded by the system controller to UltraSPARC-I. UltraSPARC-I replies through a P_RAS. The system controller issues an S_SRS to drive the data on SYSDATA. Finally, the requesting master gets the data when the system controller issues an S_RAS.

P_INT_REQ: Interrupt. Interrupt transaction request packet sent by another UltraSPARC-I. In parallel, the 64-byte interrupt data packet is placed on SYSDATA and S_SWIB is sent to instruct the UltraSPARC Data Buffer to accept the data.

Responses to Transactions Initiated by the System

P_Reply

P_reply is an acknowledgment from UltraSPARC to the system in response to a request that the system sent to UltraSPARC-I previously. There are five unidirectional (output-only) pins on UltraSPARC-I, connected directly to the system.

P_IDLE: Idle. This is the default state of the wires. It indicates no reply.

P_SNACK: Non-Existent Block. Reply by UltraSPARC indicating that the requested block from a snoop does not exist in the external cache (only set when DTAGs are no
18. [Package image labeled STP1030ABGA 167]

Sun Microelectronics, July 1997

UltraSPARC DATA SHEET: First Generation SPARC v9 64-Bit Microprocessor With VIS

DESCRIPTION

The STP1030A (UltraSPARC-I) is a high-performance, highly integrated superscalar processor implementing the SPARC V9 64-bit RISC architecture. The STP1030A is capable of sustaining the execution of up to four instructions per cycle, even in the presence of conditional branches and cache misses. This sustained performance is supported by a decoupled Prefetch and Dispatch Unit (PDU) with an Instruction Buffer to feed the Execution Unit. Load buffers on the input side of the Execution Unit, together with store buffers on the output side, completely decouple pipeline execution from data cache misses.

Instructions predicted to be executed are issued in program order to multiple functional units and execute in parallel. Such predictively issued instructions can complete out of order. To further increase the number of instructions executed per cycle, instructions from different blocks (for instance, those before and after a conditional branch) can be issued in the same group.

The STP1030A supports 2D as well as 3D graphics, image processing, video compression and decompression, and video effects through the sophisticated Visual Instruction Set (VIS). VIS provides high levels of multimedia performance, including real-time H.261 video compression/decompression and a single stream of MPEG
19. [Port signal list from figure: REPLY (5), UPA_S_REPLY (5), UPA_SnoopBus (36), UPA_SnoopCntl (13), UPA_Port_ID (5), UPA_Reset (RESET), UPA_Sys_Clk (2: CLKA, CLKB), UPA_ClkCntl (4), UPA_JTAG (4), UPA_Slave_INT, UPA_Mode, UPA_Wakeup_Reset]

UltraSPARC-I is both a bus master and a bus slave. As a bus master, UltraSPARC-I issues read/write transactions to the interconnect using part of the UltraSPARC-I bus transaction set. UltraSPARC-I splits transactions into two independent classes:
- Class 0 contains read transactions due to cache misses and block loads.
- Class 1 contains writeback requests, write-invalidate requests, block stores, interrupt requests, and noncached read/write requests.

Transactions in each class are strongly ordered by the interconnect.

As a bus master, UltraSPARC-I has a physically addressed coherent cache (the E-Cache), which participates in the MOESI cache coherence protocol and responds to the interconnect for copyback and invalidation requests. As a bus slave, UltraSPARC-I responds to a noncached read of its bus port ID.

UltraSPARC is both an interrupter and an interrupt receiver. It has the capability to generate interrupt packets to other UltraSPARC-I bus interrupt receivers, and it can receive interrupts coming from other interrupters.
20. SO VSSO VSSO VSSC VSSC BYTEWE_L 2 AA3 STOP_CLOCK VDDC VDDO VDDO BYTEWE_L 6 AA4 TDATA 1 VSSC VSSC ECAD 16 BYTEWE_L 8 AA5 TDATA 2 VSSC VSSC UDB_CNTL 0 VDDO AA29 VDDC VSSC VSSO VSSO AA30 VSSC VDDO ECAD 6 ECAT 4

VDD/VSS Core: VDD/VSS for the Input, Core and Memory are tied together on the die to the same power plane. The VDD/VSS core pins are attached to these planes.

VDD/VSS Out: Outputs on the die have their own separate planes, which are accessed through the VDD/VSS OUT pins.

VDD/VSS Quiet, PLL and PECL are bonded directly to the package pins.

PACKAGE DIMENSIONS

256 Pin PBGA Package

[Package drawing. Dimension values not recoverable.]
21. SSO July 1997 Sun MICROELECTRONICS 31 Powered by ICminer com Electronic Library Service CopyRight 2003 STP1030A UltraSPARC I PIN ASSIGNMENTS CONTINUED 1 PI First Generation SPARC v9 64 Bit Microprocessor With VIS Pin Pin Name X30 ECAD 7 BYTEWE_L 5 X31 UDB_CNTL 4 BYTEWE _L 10 x32 EDATA 70 ECAD 1 ECAD 1 VDDO BYTEWE _L 14 X33 EDATA 69 ECAD 2 VSSO VDDC VSSO Yi SYSADDR 28 VSSC VSSC ECAT 9 VDDO Y2 SYSADDR 30 VDDC VDDC VSSO TDATA 1 9 Y3 SYSADDR 31 VDDC VDDO ECAT 15 TDATA 22 Y4 VDDO VSSC UDB_CNTL 3 VSSC TOE_L Y5 SYSADDR 32 VDDC ECAT 1 VDDO TSYN_WR_L y29 EDATA 68 ECAD 15 VSSO BYTEWE_L 11 ECAD O Y30 EDPAR 8 VSSC ECAT 7 BYTEWE_L 12 VDDC Y31 EDATA 67 UDB_CNTL 2 ECAT 12 VDDC ECAD 5 Y32__ VSSO ECAT 2 VDDO VDDO ECADJ 8 Y33 EDATA 66 ECATI5 BYTEWE_L 3 VSSO ECAD 10 Zi SYSADDR 29 VSSC BYTEWE_LI7 TDATA 17 ECAD 13 z2 SYSADDR 33 ECAT 11 VSSO TPARP ECAD 12 z3 SYSADDR 34 VDDC BYTEWE_L 15 TDATAL23 VSSC Z4 VDD_PECL BYTEWE U4 VSSC VSSO ECAD 14 z5 TDATA BYTEWE_L 9 TEMP_SEN 0 TPAR 3 UDB_CNTL 1 229 EDATA BYTEWE_L 13 VDDO DSYN_WR_L ECAT O Z30 VDDO TEMP_SEN 1 VDDC VDDO ECAT 3 Z31 EDATA VSSC PM_OUT ECAD 4 ECATT8 z3e EDATAT EDATA O SPARE ECAD 7 ECAT 10 Z33 EDATA EDATA 1 VDDC VSSO ECAT 14 AA1 SYSADDR 35 VSSO TDATA 20 VSSC BYTEWE_L 1 AA2 VS
22. _CRAB: Copyback Read Block Acknowledge. The system commands the UDB to drive 64 bytes of copyback data onto SYSDATA. This follows a P_SACK or P_SACKD from UltraSPARC indicating that the copyback data is ready in the UDB.
- S_SWIB: Interrupt Write Block Acknowledge. The system commands the UDB to accept 64 bytes of interrupt data from SYSDATA. In parallel, the P_INT_REQ packet that was initiated by the interrupting UltraSPARC is sent to the target UltraSPARC-I on SYSADDR.
- S_WBCAN: Writeback Cancel Acknowledge. The system generates this to cancel a previous writeback by UltraSPARC-I (P_WRB_REQ).
- S_INAK: Interrupt NACK. The system generates this if the receiver of an UltraSPARC interrupt request (P_INT_REQ) cannot accept another interrupt packet at the moment. This reply effectively removes the interrupt packet from the UDB queue; software on the originator should retry later. This is the only transaction that is NACK'ed by the system. S_INAK sets a bit in an ASI register on UltraSPARC (see the UltraSPARC User's Manual).

TABLE 5: S_REPLY Encoding

S_REPLY    Name                               Reply to Which Transaction     Type
S_IDLE     Idle                               Default State                  0000
S_ERR      Error                              Report Error to Master         0001
S_CRAB     Coherent Read Acknowledge Block    To slave for P_CRAB reply      0010
S_WBCAN    Writeback Cancel                   To Master for P_WRB_REQ        0011
S_WAS      Write Acknowledge Single           To Master for P_NCWR_REQ       0100
S_WAB      Write Acknowledge Block            To Master for any block
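A small lookup table is one way to work with the S_REPLY encodings shown in Table 5. The sketch below covers only the five rows whose 4-bit codes survive in this excerpt; the table is cut off after that, so every other code is reported as unlisted rather than guessed. The names `S_REPLY_NAMES` and `decode_s_reply` are hypothetical.

```python
# Only the encodings visible in this excerpt of Table 5.
S_REPLY_NAMES = {
    0b0000: "S_IDLE",
    0b0001: "S_ERR",
    0b0010: "S_CRAB",
    0b0011: "S_WBCAN",
    0b0100: "S_WAS",
}


def decode_s_reply(pins: int) -> str:
    """Map the 4-bit value on S_REPLY[3:0] to a reply name, if listed here."""
    if not 0 <= pins <= 0b1111:
        raise ValueError("S_REPLY is a 4-bit field")
    return S_REPLY_NAMES.get(pins, "unlisted")
```

For instance, `decode_s_reply(0b0011)` returns `"S_WBCAN"`, the reply that cancels an earlier P_WRB_REQ.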
23. are support for two-dimensional and three-dimensional image and video processing, image compression, audio processing, and similar functions. Sixteen-bit and 32-bit partitioned add, boolean, and compare are provided. Eight-bit and 16-bit partitioned multiplies are supported. Single-cycle pixel distance, data alignment, packing, and merge operations are all supported in the GRU.

External Cache Unit (ECU)

The main role of the ECU is to handle I-Cache and D-Cache misses efficiently. The ECU can handle one access per cycle to the external cache (E-Cache). Accesses to the external cache are pipelined, take three cycles (pin to pin), and return 16 bytes of instructions or data per cycle. This can effectively make the external cache a part of the pipeline. Programs with large data sets can keep data in the external cache and schedule instructions with load latencies based on the E-Cache latency. Floating-point applications can use this feature to effectively hide D-Cache misses.

The size of the external cache can be 512 Kilobytes, or 1, 2, or 4 Megabytes, but the line size is always 64 bytes. A MOESI protocol (modified, owned, exclusive, shared, invalid; see "Cache Coherence Protocol") helps to maintain coherency across the system.

The ECU
24. d to logic one when not driven.

TRST_L: IEEE 1149 test reset input (active low). This pin is internally pulled to logic one when not driven.

Symbols in this table: TDO, TDI, TCK, TMS, TRST_L.

RAM_TEST: When asserted, this pin forces the processor into SRAM test mode, allowing direct access to the cache SRAMs for memory testing.

MISC_BIDIR[14:0]: These are miscellaneous bidirectional signals used for test, debug, and instrumentation. Some of them are used to improve internal operation observability, such as pipeline monitoring signals. Their exact functions are TBD.

EXT_EVENT: This is an open-drain bidirectional signal used to indicate that the clock should be stopped. This signal is wired-OR with the EXT_EVENT signals from other devices, so that once one is activated, all are activated. It is a debug signal which is set inactive on production systems.

PM_OUT: Used for on-chip process monitors; reserved for IC manufacturing use.

TEMP_SEN[1:0]: Defines the end points of the temperature sense element on the module, used to measure the processor temperature; reserved for IC manufacturing use.

Quick Pin Reference: Initialization Interface

Symbol: Name and Function

RESET_L: Driven for POR (power-on) resets. Asserted asynchronously. Deasserted synch
25. e turn-around penalty when reads are immediately followed by writes (discussed in "Coherent Read Followed by a Coherent Write").

[Figure 6: Coherent Write Hit to M State Line. Timing diagram; signals shown include CLK, TSYN_WR_L, EDATA, and DSYN_WR_L.]

If the line is in the Exclusive state, the tag is updated to Modified at the same time as the data is written, as shown in Figure 7. Otherwise, the tag port is available for a tag check of a younger store during the data write.

In the timing diagram, the store buffer is empty when the first write request is made. That is why there is no overlap between the tag accesses and the write accesses. In normal operation, the tag access for one write can be done in parallel with the data write of the previous write. This independence of the tag and data buses makes the peak store bandwidth as high as the load bandwidth (one per cycle).

[Timing diagram residue; signals shown include TSYN_WR_L, TOE_L, ECAT, and TDATA.]
26. ed during manufacturing testing.
TABLE 7: Thermal Resistance vs. Air Flow (columns: Symbol, Air Flow in ft/min, Units; the numeric entries are garbled in this copy).
Note 1: Tj can be calculated by Tj = Ta + Pd × θja, where Pd is the power dissipation. Thermal resistance is measured using the UltraSPARC heatsink.
PARAMETER MEASUREMENT
[Figure 10: Load Circuit.]
[Figure 11: Voltage Waveforms, Propagation Delay Times. Traces: Clock, In-Phase Output, Out-of-Phase Output.]
[Waveform figures for clock, data input, and pulse width (CLK, TCK); captions not recoverable.]
[Figure 14: Voltage Waveforms, Clock Skew. Traces: CLOCK, UDBCLK.]
[Figure 15: Reset Timing. Traces: RESET_L, TRST_L.] Note: in LOCK mode, RESET_L must be held low for Tpw + Tlock; in BYPASS mode, RESET_L must be held low for Tpw.
PIN ASSIGNMENTS (table columns: Pin, Pin Name)
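The note under Table 7 can be turned into a small calculation. The sketch below assumes the note reads Tj = Ta + Pd × θja (junction temperature equals ambient temperature plus power dissipation times junction-to-ambient thermal resistance). The function name and the numbers in the example call are hypothetical, since the table values did not survive in this copy.

```python
def junction_temperature_c(ambient_c: float, power_w: float,
                           theta_ja_c_per_w: float) -> float:
    """Return Tj = Ta + Pd * theta_ja, with temperatures in degrees Celsius.

    ambient_c        -- ambient air temperature Ta (deg C)
    power_w          -- power dissipation Pd (W)
    theta_ja_c_per_w -- junction-to-ambient thermal resistance (deg C / W),
                        which depends on air flow per Table 7
    """
    return ambient_c + power_w * theta_ja_c_per_w


if __name__ == "__main__":
    # Hypothetical inputs, NOT datasheet values: 40 C ambient, 20 W, 2.0 C/W.
    print(junction_temperature_c(40.0, 20.0, 2.0))
```

Because θja falls as air flow rises, the same function can be evaluated once per air-flow column to compare cooling options.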
27. eld for two cycles in order to avoid collisions between the data of the write and the data coming back from the read. Additionally, an extra cycle is necessary to switch the driver of the E-Cache data bus from the SRAMs to UltraSPARC-I, due to electrical considerations. UltraSPARC-I uses a one-deep write buffer in the data SRAMs to reduce the turn-around penalty going from reads to writes to two cycles. The data of a write is sent one cycle after the address (Figure 9). Note that there is no penalty for going from writes to reads. Figure 9 shows the two-cycle penalty between reads and writes. The figure represents three reads followed by two writes and two tag updates. The two-cycle penalty applies to both tag accesses and data accesses: there are two dead cycles between A2_tag and A3_tag, as well as between A2_data and A3_data.
[Figure 9: Reads Followed by Writes, Turn-Around Penalty. Timing diagram; signals: CLK, TSYN_WR_L, TOE_L, ECAT, TDATA, DSYN_WR_L, DOE_L, ECAD, EDATA.]
28. es of data from the UDB to be put on SYSDATA.
• S_WAB: Write Acknowledge Block. Generated by the system following a noncacheable block store (P_NCBWR_REQ), a writeback request (P_WRB_REQ), or a write invalidate request during a block store with invalidate (P_WRI_REQ). It causes 64 bytes of data to be put on SYSDATA.
• S_OAK: Ownership Acknowledged Block. Generated by the system when UltraSPARC-I wants permission to write to a block that is already in the E-Cache. No data transfer occurs.
• S_RBU: Read Block Unshared Acknowledge. The system commands the input data queue of the UDB to accept 64 bytes of unshared or noncached data from SYSDATA. This is in response to a P_RDS_REQ, a P_RDO_REQ, or a P_NCBRD_REQ.
• S_RBS: Read Block Shared Acknowledge. The system commands the input data queue of the UDB to accept 64 bytes of shared data from SYSDATA. This is a response to a P_RDS_REQ or a P_RDSA_REQ.
• S_RAS: Read Acknowledge Single. The system commands the UDB to accept 16 bytes of data from SYSDATA. This is a response to a noncacheable read request (P_NCRD_REQ).
• S_SRS: Read Single Acknowledge. The system commands the UDB to drive 16 bytes of data onto SYSDATA. This follows a P_RAS from UltraSPARC-I indicating that the data is ready in the UDB.
• S
29. his module interfaces to the interconnect using the UltraSPARC bus interface definition called the UltraSPARC Port Architecture (UPA). For more information, refer to the manual entitled UPA Interconnect Architecture. The I/O subsystem and the graphics subsystem may also reside on an UltraSPARC-I bus module. The physical connections between UltraSPARC-I and the UltraSPARC-I bus mainly consist of the following:
• a bidirectional address bus for transaction requests between UltraSPARC-I and the UltraSPARC-I bus interface
• two unidirectional reply buses (one incoming, one outgoing) for flow control
• a bidirectional request for a distributed address bus arbitration scheme
The snoop bus, although not part of the UltraSPARC-I module, is present and is used to manage duplicate tags and for efficient data sharing.
Table 3 shows the UltraSPARC-I port interface as specified in the UPA Interconnect Architecture and the corresponding pins for UltraSPARC-I.
TABLE 3: UPA Port Interface. Column headings: UPA Port Interface; UltraSPARC-I Interface. Flattened entries, in extraction order: UPA_DataBus[144]; EDATA[127:0], EDPAR[15:0]; UPA_ECC_Valid[2]; UPA_AddressBus[37]; UPA_Addr_Valid; UPA_Addr_Arb[5]; SYSADDRI38; ADR_VLD; NODE_RQ[2:0]; REQUEST_OUT; SC_RQ; P_REPLY 5; REPLY 3 0; UPA_P_
30. Three bits in the tag RAM define the state of each line as follows.
TABLE 1: External Cache Coherency State Definition. Column headings recovered: State Bit, STATE, Modified, Exclusive (the state-bit values are garbled in this copy). States: Invalid (I), Shared Clean (S), Exclusive Clean (E), Shared Modified (O), Exclusively Modified (M).
The cache coherence protocol operates only on physically indexed, physically tagged (PIPT) writeback caches. The unit of cache coherence is a block of 64 bytes, which corresponds to one E-Cache line. Coherent read/write transactions transfer data in 64-byte blocks only, using 4 quadwords. The state diagram representing the allowed transactions is shown in Figure 4.
[Figure 4: Cache Coherency Protocol State Diagram.]
Table 2 describes all the transitions between the states shown in Figure 4. It also shows the transactions that are initiated by either UltraSPARC-I or the system, and the acknowledgment that is expected after completion of each transaction.
TABLE 2: Transitions Allowed for Cache Coherency Protocol. Columns: Transition; Description; Request to/from Port; Acknowledgment.
I → E: Load m
31. ical UltraSPARC data buffer microchips. The UDB isolates the E-Cache from the system and provides data buffers for incoming and outgoing system transactions. The UDB also provides ECC generation and checking.
[Figure 2: UltraSPARC-I System Interface. Block diagram; labels: UltraSPARC Processor, E-Cache, UDB, Tag Address, Data Address, System Data, System Address.]
Overview of UltraSPARC-I Interface
The main interfaces to and from UltraSPARC-I are shown in Figure 3. A typical module includes an external cache composed of the tag unit and the data unit. Both can be implemented using commodity SRAMs. Separate address and data buses are provided to and from the tag and data SRAMs for increased performance. The main role of the UDB is to isolate UltraSPARC-I and its external cache from the main system data bus, so that the interface can operate at processor speed (reduced capacitance loading). The data buffer also provides overlap between system transactions and local E-Cache transactions, even when the latter need to use part of the data buffer. The logic to control the UDB is included on UltraSPARC-I to provide fast data transfers between UltraSPARC-I and the external cache, and between UltraSPARC-I and the system. A separate address bus and separate control signals are provided for supporting system transactions. Clock signals, reset pins, observability pins, and JTAG support are also part of the UltraSPARC-I interfaces discussed here.
32. [Figure 1: Functional Block Diagram. Block labels, in extraction order: ion Cache and Buffer; Load/Store Unit (LSU); Grouping Logic; Integer Reg and Annex; Integer Execution Unit (IEU); Data Cache; Load Queue; Store Queue; Floating-Point Unit (FPU) with FP multiply, FP add, FP divide, FP Reg; External Cache Unit (ECU); External Cache RAM; Memory Interface Unit (MIU); Graphics Unit (GRU); UltraSPARC Bus.]
TECHNICAL OVERVIEW
In a single-chip implementation, UltraSPARC-I integrates the following components (see Figure 1):
• Prefetch, branch prediction, and dispatch unit (PDU)
• 16-Kilobyte instruction cache (I-Cache)
• Memory management unit (MMU) containing two 64-entry buffers: a 64-entry instruction translation lookaside buffer (iTLB) and a 64-entry data translation lookaside buffer (dTLB)
• Integer execution unit (IEU) with two arithmetic logic units (ALUs)
• Load and store unit with a separate address generation adder
• Load buffer and store buffer decoupling data accesses from the pipeline
• 16-Kilobyte data cache (D-Cache)
• Floating-point unit (FPU) with independent add, multiply, and divide/square root sub-units
• Graphics unit (GRU) composed of two independent execution pipelines
• External cache (E-Cache) cont
33. iss, data coming from memory to an invalid line (no other cache has the data) | P_RDS_REQ | S_RBU
I → S | Load miss, data provided by another cache or memory to an invalid line (another cache has the data) | P_RDS_REQ | S_RBS
I → S | or cache miss | P_RDSA_REQ | S_RBS
I → M | Store miss or atomic miss on invalid line | P_RDO_REQ | S_RBU
E → M | Store hit or atomic hit to Exclusive Clean line | No Transaction | No Transaction
E → S | Request from the system to share this line (load miss from another processor) | S_CPB_REQ, S_CPB_MSI_REQ | P_SACK or P_SACKD, followed by S_CRAB
E → I | (i) A clean line is victimized by the processor | P_RDS_REQ | S_RBU or S_RBS
E → I | (i, cont.) or I-Cache miss | P_RDSA_REQ | S_RBS
E → I | (i, cont.) or write miss | P_RDO_REQ | S_RBU
E → I | (ii) Request from system to copyback and invalidate this line (store miss from another processor) | S_CPI_REQ | P_SACK or P_SACKD, followed by S_CRAB
E → I | (iii) Request from SC to invalidate this line (block store from another processor) | S_INV_REQ | P_SACK or P_SACKD
S → M | Store hit or atomic hit to Shared Clean line | P_RDO_REQ | S_OAK
S → I | (i) A Shared Clean line is victimized by UltraSPARC | P_RDS_REQ | S_RBU or S_RBS
S → I | (i, cont.) or cache miss | P_RDSA_REQ | S_RBS
S → I | (i, cont.) or write hit on shared line | P_RDO_REQ | S_RBU
S → I | (ii) Another processor wants to write this shared line | S_INV_REQ or S_CPI_REQ | P_SACK or P_SACKD (for S_CPI_REQ, followed by S_CRAB)
S → I | (iii) Request from SC to invalidate this line (block store from another processor) | S_INV_REQ | P_SACK or P_SACKD
M → O | Request from another processor to read a modified line | S_CPB_R
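The processor-initiated rows of Table 2 lend themselves to a lookup table. The sketch below is illustrative only: it encodes just the rows that are legible in this excerpt (transitions out of Invalid, plus S → M and E → M), and the names `PROCESSOR_TRANSITIONS` and `next_state` are hypothetical rather than taken from the datasheet. System-initiated transitions (S_CPB_REQ, S_CPI_REQ, S_INV_REQ) and victimization are deliberately left out.

```python
# (current state, request issued by the processor, S_REPLY received) -> new state.
# A request/reply of None models the "No Transaction" row of Table 2.
PROCESSOR_TRANSITIONS = {
    ("I", "P_RDS_REQ", "S_RBU"): "E",   # load miss, no other cache has the data
    ("I", "P_RDS_REQ", "S_RBS"): "S",   # load miss, another cache has the data
    ("I", "P_RDSA_REQ", "S_RBS"): "S",  # read to share always
    ("I", "P_RDO_REQ", "S_RBU"): "M",   # store miss or atomic miss
    ("S", "P_RDO_REQ", "S_OAK"): "M",   # store hit to a Shared Clean line
    ("E", None, None): "M",             # store hit to Exclusive Clean, no bus traffic
}


def next_state(state, request=None, reply=None):
    """Return the E-Cache line state after a processor-initiated transition."""
    try:
        return PROCESSOR_TRANSITIONS[(state, request, reply)]
    except KeyError:
        raise ValueError(
            f"transition not listed in this sketch: {state}, {request}, {reply}"
        ) from None


if __name__ == "__main__":
    print(next_state("I", "P_RDS_REQ", "S_RBU"))
    print(next_state("E"))
```

Keying on the reply as well as the request captures the point that the same P_RDS_REQ lands in E or S depending on whether the system answers with S_RBU or S_RBS.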
34. UPA Transactions Supported by UltraSPARC-I
Transactions Initiated by UltraSPARC-I
The UltraSPARC-I bus transactions initiated by UltraSPARC are sent through the system address bus. Four bits in the packet, identifying the transaction type, are encoded according to the UltraSPARC-I bus definition of the corresponding transaction.
• P_RDS_REQ: Read To Share. Coherent read with intent to share. UltraSPARC-I issues this in response to a load miss.
• P_RDSA_REQ: Read To Share Always. Coherent read with intent to share always. UltraSPARC-I issues this in response to an E-Cache miss generated by an instruction fetch.
• P_RDO_REQ: Read To Own. Coherent read with invalidate. UltraSPARC-I issues this in response to a store miss, a store hit on a shared line, or a read with intent to write for merging partial writes (such as read-modify-writes).
• P_RDD_REQ: Read To Discard. Coherent read with no intent to cache the data. UltraSPARC-I issues this during block loads.
• P_WRB_REQ: Writeback. Generated when a dirty victimized block from the E-Cache must be written back to its home location. The writeback is associated with a prior coherent read transaction to the same E-Cache location.
• P_WRI_REQ: Write Invalidate. Coherent write and invalidate request. UltraSPARC issues this during a block store.
• P_INT_REQ: Interrupt. Interrupt transaction request packet. UltraSPARC-I issues this to deliver a 64 byte interrup
35. lock with a 50% duty cycle. The transitions represented in the diagrams show what occurs at the pins of UltraSPARC-I. The position of the transitions relative to the clock transitions is correct but not drawn to scale; for instance, a set-up time of 1 ns is represented by showing the incoming signal changing slightly before the rising edge of the clock.
Coherent Read Hit
Coherent reads that hit the E-Cache are represented in Figure 5. With UltraSPARC-I, there is no difference between burst reads and two consecutive reads. The signals used for a single read are simply duplicated for each subsequent read. The timing diagram shows three consecutive reads that hit the E-Cache. The control signals (TSYN_WR_L, TOE_L) and the address for the tag read (ECAT), as well as the control signals (DSYN_WR_L, DOE_L) and the address for the data (ECAD), are shown to transition shortly after the rising edge of the clock. Two cycles later, the data for both the tag read and the data read is back at the pins of the CPU, shortly before the next rising edge. This meets set-up time and clock skew. Notice that the reads are fully pipelined, and thus full throughput is achieved. Three requests are made before the data of the first request comes back, and the latency of each request is three cycles.
[Timing diagram; recoverable signal names: TSYN_WR_L, TOE_L, ECAT.]
36. TABLE 4: P_REPLY Encoding (columns as extracted: name; definition; reply to; encoding)
P_IDLE (note 1) | Idle | Default State | 00000
P_FERR | Fatal Error | All transactions, any time | 0100
P_RERR | Read Data Error | P_NCBRD_REQ | 0101
P_SNACK | Coherent S_REQ Non-Existent ACK | S_REQ | 0111
P_RAS | Read Acknowledge Single | P_NCRD_REQ | 1000
P_SACK | Coherent S_REQ ACK | AE (garbled) | 1010
P_IAK | Interrupt Acknowledge | INT_ (garbled) | 1100
P_SACKD | Coherent S_REQ Dirty Victim ACK | fe (garbled) | 1101
The class values are indicated as follows:
• 0: hardwired to 0
• X: don't care
• C: copied from the associated P_REQ packet
System Responses (S_REPLY) to Transaction Request (P_REQ) or Acknowledgment (P_REPLY) from UltraSPARC-I
This is also a unidirectional point-to-point connection between the system and UltraSPARC-I.
• S_IDLE: Idle. The default state of the wires. It indicates no reply.
• S_RTO: Read Time Out. Forwards the Read Time Out (P_RTO) reply from the slave that UltraSPARC-I tried to access. Note that time-outs on writes are reported asynchronously via interrupt by the detecting slave UltraSPARC-I.
• S_ERR: Error. Asserted by the system if the situations described in the UltraSPARC-I bus specification occur. See the manual on UPA Interconnect Architecture.
• S_WAS: Write Acknowledge Single. Generated by the system following a noncacheable write request from UltraSPARC-I (P_NCWR_REQ). It causes 16 byt
37. rol unit
• Memory interface unit, responsible for main memory and I/O accesses
Prefetch and Dispatch Unit (PDU)
The prefetch and dispatch unit fetches instructions before they are actually needed in the pipeline, so the execution units do not starve. Instructions can be prefetched from all levels of the memory hierarchy, including the instruction cache, the external cache, and the main memory. In order to prefetch across conditional branches, a dynamic branch prediction scheme is implemented in hardware. The outcome of a branch is predicted based on a two-bit history of the branch. A next field associated with groups of four instructions in the instruction cache (I-Cache) points to the next I-Cache line to be fetched. The use of the next field makes it possible to follow taken branches, and provides essentially the same instruction bandwidth as is achieved while running sequential code. Prefetched instructions are stored in the instruction buffer until they are sent to the rest of the pipeline. Up to 12 instructions can be buffered.
Instruction Cache (I-Cache)
The instruction cache is a 16-Kilobyte pseudo-two-way set-associative cache with 32-byte blocks. The cache is physically indexed and contains physical tags. The set is predicted as part of the next field, so that only the index bits of an address are necessary to address the cache (13 bits, which matches the minimum page size). The instruction cache returns up to 4 instructions from an 8 instruction
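The 13-bit figure quoted for the I-Cache follows from the cache geometry given in the text. A minimal sketch of the arithmetic, assuming the usual relationship that one way of a set-associative cache spans cache size divided by associativity; the function names are hypothetical:

```python
def address_bits_per_way(cache_bytes: int, ways: int) -> int:
    """Bits needed to address every byte within one way of the cache."""
    bytes_per_way = cache_bytes // ways
    # Only powers of two give a whole number of address bits.
    if bytes_per_way <= 0 or bytes_per_way & (bytes_per_way - 1):
        raise ValueError("bytes per way must be a positive power of two")
    return bytes_per_way.bit_length() - 1


def lines_per_way(cache_bytes: int, ways: int, block_bytes: int) -> int:
    """Number of cache lines (blocks) held in one way."""
    return cache_bytes // ways // block_bytes


if __name__ == "__main__":
    # I-Cache figures from the text: 16 Kilobytes, two-way, 32-byte blocks.
    print(address_bits_per_way(16 * 1024, 2))
    print(lines_per_way(16 * 1024, 2, 32))
```

With 16 Kilobytes split over two ways, each way covers 8192 bytes, so 13 address bits select a byte within a way; because the way (set) is predicted through the next field, those 13 bits are all the cache needs from the address.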
38. rom main memory during an E-Cache miss or a load to noncacheable locations. Writebacks (the process of writing a dirty line back to memory before a fill) generate data transfers from the E-Cache to the UDB, controlled entirely by the CPU. Copybacks (responses to snoop hits) also generate transfers from the E-Cache to the UDB.
Each UltraSPARC data buffer microchip has a 4-entry by 16-byte read buffer that can hold a 64-byte line coming from main memory due to an E-Cache read miss or a noncacheable read. The outgoing buffer (that is, the buffer receiving data from UltraSPARC-I and sending it to the system) is divided into three parts: an 8 x 16-byte writeback buffer, an 8 x 16-byte noncacheable store buffer, and a 4 x 16-byte snoop buffer. Note that the writeback buffer can be snooped; consequently, an internal bypass is provided to send the writeback data to the port requesting the snoop on the interconnect. Three 64-bit registers are provided to hold an incoming interrupt data packet, while three more are provided to hold an interrupt data packet waiting to be sent.
SIGNAL DESCRIPTIONS
Quick Pin Reference: UPA Interface (Symbol: Name and Function)
SYSADR[35:0]: Bidirectional UltraSPARC transaction request bus. Maximum of 3 other masters and 1
39. ronous to system clock. Active low.
XIR_L: Driven to signal XIR resets. Acts like a non-maskable interrupt. Synchronous to system clock. Active low.
EPD: Asserted when UltraSPARC is in power-down mode.
Quick Pin Reference: UDB Chip Interface (Symbol: Name and Function)
UDB_UEH: Asserted when the High UDB drives EDATA[127:64], if there is an uncorrectable ECC error associated with that data. Synchronous to system clock.
UDB_UEL: Asserted when the Low UDB drives EDATA[63:0], if there is an uncorrectable ECC error associated with that data. Synchronous to system clock.
UDB_CEH: Asserted when the High UDB drives EDATA[127:64], if the data has a corrected single-bit error. Synchronous to system clock.
UDB_CEL: Asserted when the Low UDB drives EDATA[63:0], if the data has a corrected single-bit error. Synchronous to system clock.
UDB_CNTL[4:0]: UltraSPARC-I controls the UDB's drive and receive of EDATA. Asserted with valid EDATA when driving data to the UDB. Asserted the cycle before the UDB should drive data. Synchronous to system clock.
TIMING CONSIDERATIONS
This section describes the logical timing for the transactions occurring between UltraSPARC-I, the external cache, and the data buffer. The diagrams are based on a c
40. rough, non-allocating, 16-Kilobyte direct-mapped cache with two 16-byte sub-blocks per line. It is virtually indexed and physically tagged. The tag array is dual-ported, so that tag updates due to line fills do not collide with tag reads for incoming loads. Snoops to the D-Cache use the second tag port, so that an incoming load can proceed without being held up by a snoop.
Floating-Point Unit (FPU)
Separating the execution units in the FPU allows UltraSPARC-I to issue and execute two floating-point instructions per cycle. Source data and result data are stored in the 32-entry register file, where each entry can contain a 32-bit value or a 64-bit value. Most instructions are fully pipelined (throughput of one per cycle), have a latency of three cycles, and are not affected by the precision of the operands; that is, latency is the same for single or double precision. The divide and square root instructions are not pipelined. These take 12 cycles (single precision) or 22 cycles (double precision) to execute, but they do not stall the processor. Other instructions following the divide or square root can be issued, executed, and retired to the register file before the divide or square root finishes. A precise exception model is maintained by synchronizing the floating-point pipe with the integer pipe and by predicting traps for long-latency operations.
Graphics Unit (GRU)
UltraSPARC-I introduces a comprehensive set of graphics instructions that provide fast hardw
41. system controller can be connected to this bus. 3.3V UPA.
ADR_VLD: Bidirectional radial UltraSPARC-I bus signal between UltraSPARC and the system. Driven by UltraSPARC-I to initiate SYSADR transactions to the system. Driven by the system to initiate Coherency, Interrupt, or Slave transactions to UltraSPARC-I. Synchronous to the system clock. 3.3V UPA.
NODE_RQ[2:0]: UltraSPARC system address bus arbitration request from up to 3 other UltraSPARC bus ports that might be sharing the SYSADR. Used by UltraSPARC-I for the distributed SYSADR arbitration protocol. Connection to other UltraSPARC-I bus ports is strictly dependent on the Master ID allocation. Synchronous to system clock. 3.3V UPA.
SC_RQ: UltraSPARC system address bus arbitration request from the system. Used by UltraSPARC-I for the distributed SYSADR arbitration protocol. Synchronous to system clock. 3.3V UPA.
S_REPLY[3:0]: UltraSPARC system reply packet driven to UltraSPARC-I. Bit 4 of the UltraSPARC-I bus S_REPLY is not used by UltraSPARC-I. Synchronous to system clock. 3.3V UPA.
DATA_STALL: Asserted with or after an S_REPLY to hold output system data, or to signal the delay in arrival of input data from the system. 3.3V UPA.
P_REPLY[4:0]: UltraSPARC-I processor reply packet driven by UltraSPARC-I to the system. Synchronous to system clock. 3.3V UPA.
UltraSPARC system address bus arbitration request. Asserted when UltraSPARC
42. t packet to the destination (see the ASI Registers definition in the UltraSPARC User's Manual).
• P_NCRD_REQ: Noncached Read. Used when a load or a block load is issued to a noncacheable location. 1, 2, 4, 8, or 16 bytes can be read with this transaction.
• P_NCBRD_REQ: Noncached Block Read. Used when a block read (64 bytes) is made to a noncacheable location.
• P_NCBWR_REQ: Noncached Block Write. Generated by UltraSPARC-I when a block write (64 bytes) is made to a noncacheable location.
System Transactions Accepted by UltraSPARC-I
• S_INV_REQ: Invalidate. Invalidate request from the system controller to UltraSPARC-I following a Read To Own (P_RDO_REQ) or Write Invalidate (P_WRI_REQ) request for a block from another UltraSPARC-I.
• S_CPB_REQ: Copyback. Copyback request from the system controller to UltraSPARC-I following a Read To Share (P_RDS_REQ) or Read To Share Always (P_RDSA_REQ) request for a block from another UltraSPARC-I.
• S_CPI_REQ: Copyback Invalidate. Copyback and invalidate request from the system controller to UltraSPARC-I in response to a Read To Own (P_RDO_REQ) request for a block from another UltraSPARC-I.
• S_CPD_REQ: Copyback To Discard. Sent at the UltraSPARC-I bus interface to UltraSPARC-I in order to service a Read To Discard (P_RDD_REQ) issued by another UltraSPARC-I. This transaction does not generate a state change for the E-Cache and does not require a flush of the store buffer (tag check).
43. t present.
• P_RAS: Read Acknowledge Single. 16 bytes of read data is ready in the output data queue on the UDB. Sent following a single noncacheable read request from an UltraSPARC-I (reply to P_NCRD_REQ).
• P_SACK: Coherent Read Acknowledge Block. Asserted for a coherent snoop (S_REQ) when the data is in the cache and not pending a writeback due to victimization. Indicates data is available in the UDB. Reply to S_CPB_REQ, S_CPD_REQ, S_CPI_REQ, or S_INV_REQ.
• P_SACKD: Coherent Read Acknowledge Block for Dirty Victim. Asserted for a coherent snoop (S_REQ) when the data has been victimized and is pending a writeback. Indicates data is available in the UDB. Reply to S_CPB_REQ, S_CPD_REQ, S_CPI_REQ, or S_INV_REQ.
• P_IAK: Interrupt Acknowledge. UltraSPARC-I sends a P_IAK to acknowledge that the interrupt transaction delivered by the system has been serviced. This implies that there is room on the UDB for another interrupt request and its 64 bytes of data.
• P_RERR: Read Error. Returned by UltraSPARC-I in response to a noncached block read request (P_NCBRD) sent to it. No data is transferred. Cacheable read requests produce undefined results.
• P_FERR: Fatal Error. Sent when UltraSPARC-I detects a parity error on the SYSADDR bus or E-Cache tags. Indicates that the system should generate a system-wide power-on reset.
44. [Figure 5: Coherent Read Hit Timing. Timing diagram; recoverable signal names: TDATA, DSYN_WR_L, DOE_L, EDATA.]
Coherent Write Hits
Writes to the external cache are processed through independent tag and data transactions. First, the tag and the state bits of the E-Cache line corresponding to the write are read. If the access is a hit and the state is Exclusive or Modified, the data is written to the data RAM.
In the timing diagram shown in Figure 6, three consecutive write hits to M state lines are shown. Access to the first tag (D0_tag) is started by asserting TSYN_WR_L and TOE_L and by sending the tag address (A0_tag). In the cycle after the tag data (D0_tag) comes back, UltraSPARC-I determines that the access is a hit and that the line is in M state (Modified). In the next clock cycle, a request is made to write the data. The data address is presented on the ECAD pins in the cycle after the request (cycle 7 for W0), and the data is sent in the following cycle (cycle 8), as shown in Figure 6. Separating the address and the data by one cycle reduces th
45. ts primary clock source and is the positive differential clock input.
CLKB: This pin provides STP1030A with its primary clock source and is the negative differential clock input.
LOOPCAP: The external PLL loop filter connects to this pin to filter the analog voltage that controls the PLL VCO.
SCLK_MODE: Indicates the clock divider mode (system frequency is 1/2 or 1/3 of the processor frequency).
UDBCLKA, UDBCLKB: Differential inputs of the system clock. They are used to generate the phase signal that allows UltraSPARC to synchronize communication to the system with respect to the system clock.
PLLBYPASS: When asserted, this pin causes the phase-locked loop to be bypassed. The clock from the differential receiver is passed directly to the clock trunk.
STOP_CLOCK: Indicates that the clock has stopped.
L5CLK: A buffered version of UltraSPARC's internal level 5 clock. Used to determine PLL lock, or clock tree delay when UltraSPARC is in PLL bypass mode.
Quick Pin Reference: JTAG/Debug Interface (Symbol: Name and Function)
TDO: IEEE 1149 test data output. A three-state signal driven only when the TAP controller is in the shift-DR state.
TDI: IEEE 1149 test data input. This pin is internally pulled to logic one when not driven.
TCK: IEEE 1149 test clock input. This pin, if not hooked to a clock source, must always be driven to a logic 1 or a logic 0.
TMS: IEEE 1149 test mode select input. This pin is internally pulle
46. write | 0101
S_OAK | Ownership Acknowledge | To Master for P_RDO_REQ | 0110
S_INAK | Interrupt Nack | To Master for P_INT_REQ | 0111
S_RBU | Read Block Acknowledge Unshared | To Master for any block read | 1000
S_RBS | Read Block Acknowledge Shared | To Master for coherent shared read | 1001
S_RAS | Read Acknowledge Single | To Master for P_NCRD_REQ | 1010
S_RTO | Read Time Out | To Master, forwarding of P_RTO | 1011
S_SRS | Slave Read Single | Read 16 bytes of data from slave | 1110
S_SWIB | Slave Write Interrupt Block | Write 64 bytes of interrupt data to slave | 1101
Reserved | 1111
Interaction between the E-Cache and UltraSPARC Data Buffers (UDB)
External cache accesses, although synchronous to the internal clock, are not closely coupled to the pipeline. Full throughput to the external cache is supported and can make the E-Cache look like a very large D-Cache. The micro-architecture used to support this consists of the load buffer, dual-ported tags, separate address buses for tag and data, and so forth.
The UDB isolates the system data bus from UltraSPARC-I. The UDB allows data transfers between UltraSPARC-I and the memory system or I/O (for example, noncacheable stores). The data buffer also enables transfers between the E-Cache and the memory system (for instance, writebacks) to occur much
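The S_REPLY encodings listed at the start of this excerpt can be captured in a small decode table, for example when annotating a logic-analyzer trace of S_REPLY[3:0]. This is only a sketch: it covers the rows legible here (the rows before S_OAK are cut off and are therefore omitted), and `S_REPLY_CODES` and `decode_s_reply` are hypothetical names.

```python
# 4-bit S_REPLY code -> mnemonic, using only the rows visible in this excerpt.
S_REPLY_CODES = {
    0b0110: "S_OAK",    # Ownership Acknowledge
    0b0111: "S_INAK",   # Interrupt Nack
    0b1000: "S_RBU",    # Read Block Acknowledge Unshared
    0b1001: "S_RBS",    # Read Block Acknowledge Shared
    0b1010: "S_RAS",    # Read Acknowledge Single
    0b1011: "S_RTO",    # Read Time Out
    0b1101: "S_SWIB",   # Slave Write Interrupt Block
    0b1110: "S_SRS",    # Slave Read Single
    0b1111: "Reserved",
}


def decode_s_reply(code: int) -> str:
    """Map a 4-bit S_REPLY value to its mnemonic, or flag it as not listed."""
    if not 0 <= code <= 0xF:
        raise ValueError("S_REPLY is a 4-bit field")
    return S_REPLY_CODES.get(code, "not listed in this excerpt")


if __name__ == "__main__":
    for value in (0b0110, 0b1001, 0b0001):
        print(f"{value:04b} -> {decode_s_reply(value)}")
```

Returning a placeholder string for codes below 0110 keeps the sketch honest about the truncated part of the table instead of guessing at the missing rows.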
