Is the SPARC architecture dead?


RISC is a hot topic: when it comes to networked workstations, a user who wants comfortable user interfaces, menu-driven window systems and high-resolution graphics needs the highest possible computing power. That makes a case for Reduced Instruction Set Computing, and reason enough for the CW to invite representatives of the most important RISC companies to a round-table discussion. With CW editor Jan-Bernd Meyer they discussed the reduced-instruction-set architecture, its special features and its opportunities compared with CISC. Professor Arndt Bode of the Technical University of Munich moderated the conversation as a neutral chairman.

Professor Arndt Bode: Today there is quite a large number of providers of RISC architectures, and the outsider wonders: what are the differences between the individual RISC systems, what are the special features of the individual architectures, and with which approaches do they set themselves apart from the competition?

Uwe Kranich (AMD): Our strategy is clearly aimed at the embedded-control area, and we are concentrating on the RISC processor 29000. The purpose of the development was to create an architecture that works smoothly even with slow memory modules, without generating high system costs. We have tried to achieve this with special cache structures such as our branch target cache.

Willi Haas (Mips): According to Professor Bode's definition, the current Mips products belong to the second generation of processors: the R3000 is a CMOS processor, the R6000 an ECL processor. With our optimizing compiler family, we are one of the representatives of the Stanford variant.

Rudolf Schmidberger (Motorola): It is not so important how the RISC processors differ from one another. There are, however, significant differences in what happens around the processor core. Workstations today are very powerful on the one hand, but on the other hand require very large memories or cache support. In the embedded-controller area the requirements are not that high; real-time requirements are in the foreground there. We are supporters of the Harvard architecture, which we implement externally.

Martin Lippert (Sun): The Sparc processor from Sun Microsystems was developed in 1983. Sun was interested in a processor base for products for the next 10 to 15 years. The essential thing about the Sparc architecture is that it has been licensed to semiconductor manufacturers to produce the processors, but has also been offered to a large number of system suppliers who build their own computers and workstations. Sparc is clearly descended from the Berkeley line. We work in particular with register windows and also exploit the hardware support through compilers and in the operating system.

Robert B. Richards (Cypress): The Cypress implementations of the Sparc chip cover both workstation and server applications. They contain integer and floating-point units, a cache controller and a memory management unit, with the latter offering the option of dedicated cache chips. Alongside this original implementation there is a derivative developed specially for multiprocessing applications, and another derivative of the core architecture intended to support embedded-controller solutions. Since this chip requires neither a cache controller nor an MMU, it is well suited for low-cost computing power in embedded applications. Compared with other Sparc chips, which have eight register windows, it has twelve. It can be assumed that in the very near future we will present a superscalar implementation in which floating-point, cache and MMU functions are combined.

Sharad Gandhi (Intel): Whether a processor is an embedded or a reprogrammable one is decided solely by the available range of tools, debuggers and other software. The 860 processor is the best example of how one can define embedded and reprogrammable: actually a classic RISC architecture, but with a built-in array processor, i.e. a vector unit. It has a million transistors, but half of them are used for the on-chip cache. Because of the Harvard architecture, the CPU can access the code cache and the data cache at the same time. Basically, as the 860 chip shows, Intel tends more in the direction of superscalar architectures as opposed to superpipelining designs.

Bode: The total costs for a RISC-based computer system result essentially from the structure of the memory and also the bus structure. What requirements do powerful memory and bus structures have to meet?

Gandhi: Single-processor systems do not need an additional second-level cache in most situations. In multiprocessor systems, however, it is important not only for computing power but also for bus bandwidth. We believe that an on-chip cache is a big plus for single-CPU systems in terms of system cost.

Schmidberger: For the embedded-control area, I assume that a cache is not always desirable because of its unpredictable timing behavior. In line with the Harvard architecture, our processor setup also consists of two completely separate bus systems, from which we expect advantages in access speed. The organization of the cache is also important. Our 16-KB cache device can be duplicated accordingly, so that we get a real cache of 128 KB per bus side and thus a total of 256 KB of cache. We support users insofar as they can put their effort into the actual design rather than into the circuitry close to the processor. However, users have to consider whether they want to leave the memory system unchanged and exploit only the raw speed of the processor core with its caches. Another important aspect is whether it is a physical or a logical cache; this determines the access speeds and the time lost in memory management.

We have integrated the cache and the memory management unit on one chip and thus eliminate the lookups through the memory management unit despite implementing a physical cache. Another advantage of the physical cache is that what is known as bus snooping can be carried out quite easily, which guarantees cache consistency across several processor nodes.

The separation of the buses can of course be continued into the memory system. When I look at today's workstation market, however, the system buses are brought together again behind the physical caches, simply to save costs. Dynamic RAMs are generally used for both program and data areas; the cache implementation ultimately guarantees that, despite the shared access to the common memory, high hit rates and thus high performance values are still achieved.
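The physical-cache behavior described above can be sketched in a few lines. This is an illustrative model, not Motorola's actual CMMU design; the line size, line count and direct-mapped organization are assumptions chosen only to show how a physical address splits into tag and index, why repeated access to a working set yields high hit rates even over slow DRAM, and how bus snooping invalidates a stale copy.

```python
# Illustrative model only: line size, line count and the direct-mapped
# organization are assumptions, not an actual hardware design.
LINE_SIZE = 16        # bytes per cache line (assumed)
NUM_LINES = 1024      # 1024 lines x 16 bytes = a 16-KB cache

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES   # one tag per line; None = invalid

    def access(self, phys_addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        index = (phys_addr // LINE_SIZE) % NUM_LINES
        tag = phys_addr // (LINE_SIZE * NUM_LINES)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag           # miss: line is fetched from DRAM
        return False

    def snoop_invalidate(self, phys_addr):
        """Bus snooping on a physical cache: another processor wrote this
        address, so drop our copy to keep the caches consistent."""
        index = (phys_addr // LINE_SIZE) % NUM_LINES
        tag = phys_addr // (LINE_SIZE * NUM_LINES)
        if self.tags[index] == tag:
            self.tags[index] = None

cache = DirectMappedCache()
# Walk a 4-KB working set twice: cold misses on the first pass,
# pure hits on the second, despite the slow DRAM behind the cache.
addresses = list(range(0, 4096, 4)) * 2
hits = sum(cache.access(a) for a in addresses)
hit_rate = hits / len(addresses)         # 1792 of 2048 accesses hit
```

The snooping method is what makes the physical cache attractive for multiprocessor nodes: the bus address can be compared against the tags directly, without any address translation.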

Haas: CMOS implementations are currently available from our semiconductor partners in versions between 12 and 40 megahertz. The cache controller is integrated in the CMOS version. We can configure instruction and data caches separately, from 4 to 256 KB. They are always off-chip, which has the advantage for the user that he can choose the cheapest solution on the market. A number of our manufacturers have already designed additional or special derivatives in which, for example, the entire external cache can be accommodated in one memory device. We support multiprocessor options on the chip by means of two signals; the cache-coherence protocols must be implemented externally.

Kranich: The 29000 has, for us, a classic three-bus architecture, with instructions and data completely separate. This enables us to access both buses within one cycle, which leads to bus bandwidths that are easy to calculate. We do not need an on-chip or off-chip cache. The aim was to work with slow DRAMs. Slow is relative here: DRAMs have high initial access times but also deliver high bandwidth, and that bandwidth can be used to compensate for the initial access times. We achieve this with our branch target cache, which stores only jump targets and therefore does not need a large number of transistors, because it requires only a few entries.

What matters in our overall concept is that it works in hardware. This means that we no longer have to rely on the compiler to adapt the code to the memory model; the hardware takes care of that. In addition, we do not need zero-wait-state caches to achieve high performance. Of course, performance drops with slower RAMs, but there are no drastic drops if an access takes a little longer here and there.
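The branch-target-cache idea Kranich describes can be sketched as a toy model. This is not AMD's actual hardware: the entry count, the FIFO replacement and the dictionary lookup are illustrative assumptions. The point is that only the first instructions at recently taken jump targets are stored, so a loop pays the DRAM's initial access time once and thereafter branches at full speed while the DRAM streams sequentially.

```python
# Toy model: entry count, FIFO replacement and the dict lookup are
# illustrative assumptions, not the 29000's actual hardware.
BTC_ENTRIES = 32   # few entries, hence few transistors

class BranchTargetCache:
    def __init__(self):
        self.entries = {}   # branch address -> first instructions at target

    def lookup(self, branch_addr):
        """On a hit the target's first instructions come from the BTC at
        full speed; a miss pays the DRAM's initial access time."""
        return self.entries.get(branch_addr)

    def fill(self, branch_addr, target_instructions):
        if len(self.entries) >= BTC_ENTRIES:
            self.entries.pop(next(iter(self.entries)))  # evict oldest (FIFO)
        self.entries[branch_addr] = target_instructions

btc = BranchTargetCache()
assert btc.lookup(0x100) is None                 # first taken branch: miss
btc.fill(0x100, ["insn@0x400", "insn@0x404"])    # fill while DRAM streams
assert btc.lookup(0x100) is not None             # next loop iteration: hit
```

Because only jump targets are cached, the structure stays tiny compared with a general instruction cache, which is exactly the cost argument made above.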

Gandhi: With the 860 processor it is possible to manage the cache in such a way that it can practically be used as a data buffer. For iterative processes such as vector operations it is particularly useful to load the entire program into the cache.

Bode: I see the biggest differences between most systems in the register organization. What register organization do your processors have? What is the maximum clock frequency of the classic CMOS versions, and are there plans to develop a different, faster variant, i.e. an ECL or gallium-arsenide one?

Kranich: The 29000 from AMD has 192 registers, 128 of which can be used as a so-called register stack cache, that is, as register windows of variable size. We are not planning to move to ECL in the future.

Haas: We have 32 32-bit registers in our architecture that have multipurpose properties. There are also 16 64-bit registers for floating point operations.

Schmidberger: We are working with Data General on an ECL variant.

Lippert: The SPARC architecture generally works with register windows; the architecture defines variants that allow between two and 32 register windows. How many of these are actually present on the chip depends on the implementation. We see the main advantage in the fact that these register windows greatly improve the runtime behavior of real programs in which subroutines are called. Our licensees are working on ECL and gallium-arsenide variants of the Sparc architecture.
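The runtime benefit attributed to register windows here can be illustrated with a small simulation; this is a sketch under assumed parameters, not Sun's implementation, and it ignores the window overlap real SPARC chips use for parameter passing. A call slides the window pointer instead of saving registers to memory, and only when the nesting depth exceeds the windows on the chip does an overflow trap spill to the memory stack.

```python
# Sketch with assumed parameters: the window count and trap handling are
# simplified; real implementations overlap windows for parameter passing.
NUM_WINDOWS = 8        # implementation-dependent; 2 to 32 in the architecture

class WindowedRegisterFile:
    def __init__(self):
        self.depth = 0     # windows currently live on the chip
        self.spills = 0    # window-overflow traps (slow: memory traffic)
        self.fills = 0     # window-underflow traps

    def call(self):
        """A procedure call ('save') just slides the window pointer."""
        if self.depth == NUM_WINDOWS:
            self.spills += 1       # overflow: oldest window goes to memory
            self.depth -= 1
        self.depth += 1

    def ret(self):
        """A return ('restore') slides the pointer back."""
        if self.depth == 0:
            self.fills += 1        # underflow: reload a window from memory
            self.depth += 1
        self.depth -= 1

shallow = WindowedRegisterFile()
for _ in range(6):                 # typical call depths fit on the chip
    shallow.call()
for _ in range(6):
    shallow.ret()

deep = WindowedRegisterFile()
for _ in range(12):                # nest deeper than the chip has windows
    deep.call()
```

As long as call nesting stays within the on-chip windows, procedure entry and exit cause no memory traffic at all, which is exactly the subroutine-heavy workload the speakers have in mind.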

Bode: Can you give us more precise megahertz rates?

Lippert: Around 200 MHz. The whole thing should be coupled as a multiprocessor system in which between four and eight processors can be combined. I doubt whether this is of practical importance for the traditional computer user, because it will be very expensive.

Gandhi: Like most of the processors presented here, the 860 has 32 32-bit integer registers; the on-chip floating-point unit has 32 32-bit registers, usable as 16 64-bit registers. We currently have no plans to develop ECL- or gallium-arsenide-based systems.

Bode: For the user, it is very important that development aids are available. I am assuming that all systems have compilers for the common high-level languages?

Gandhi: We have a development system that is a PC with an 860 board. Today practically all compilers, for C, Cobol and so on, run on the 860 board in native mode, as do debuggers. From there we can load code into another embedded system.

Richards: In terms of debugging, there are several options for our implementation: real-time debugging applications can be implemented with the "Sun Tool". There are also monitoring functions that can be used for embedded applications. With debugging software from Flame Computers, the user can set breakpoints in special registers, with which traps can be generated. All of this is done interactively with a compiler also developed by Flame. Compilers are available for all common languages including C, and for AI languages such as Lisp and Smalltalk.

Bode: What about prototype hardware?

Richards: There are boards that are VME-compatible or that also work in a PC environment. Development for them can be hosted on any Sun machine. It is also possible to develop on a non-native host such as a Sun3 or an IBM PC AT with an emulator.

Schmidberger: We do not have any special hardware; we refer to standard products on the market. On the one hand there are development machines; the range of boards covers PC, VME, Multibus and Q-bus cards. Well-known houses offer compilers. Incidentally, these companies have gathered in the 88open user group, where software standards for both compilers and general applications have been developed. In addition to the typical compilers we also have high-level languages such as Ada, and the user can also use AI languages such as Lisp and Prolog.

Haas: Mips developed the architecture in conjunction with the compilers; these features were taken into account from the initial phase, and front ends are currently available for C, Fortran, Pascal, PL/1, Ada and Cobol. The "Systems Programmers Package" tool set contains all the routines and simulators that we originally used when we first built workstations in-house. With an architecture simulator, the user can, for example, simulate memory structures on site. It contains a debug monitor, a PROM monitor and all the download utilities that are ultimately required when the developed software is tested on the target system. Our semiconductor partners and third-party companies offer prototype boards.

Kranich: With us, users can choose among three emulators and among development boards designed for stand-alone PCs or as VME-bus boards, depending on the environment in which they want to work. The tools run on hosts such as ATs and Suns, and natively, i.e. on the 29000 processor itself. Our architecture simulator also allows the user to determine in advance the performance that results from a given memory configuration and CPU configuration. In this way it can already be determined during the development phase which memory model is actually required for a certain required performance.

Bode: It is certain that an essential difference between RISC and CISC architectures is the degree of optimization when translating high-level programming languages into machine code. For the programmer, however, this means that at first he cannot translate his program with optimization: he must first debug it, and only then recompile it with optimization. The optimized programs can then no longer be tested at the source level. Are there approaches from any of the manufacturers to solve this problem?

Haas: We have a special compiler switch that can be used to debug optimized programs. Of course, it will not be possible in every case to reconstruct features that have been optimized away. On the other hand, we can guarantee that the optimized programs behave like the non-optimized ones.

Bode: I still sense a certain skepticism. What matters for users is the performance your systems offer. Where can users obtain reliable information about your products, and what information can they expect?

Kranich: With us, the customer can run his application with our tools and hardware. He then sees for himself how efficient our system is for his application.

Haas: We are happy that the SPEC cooperative was founded, in which all the important computer manufacturers have united. As a founding member, we of course publish Specmark results. Other results are also available on request, although we would like to point out that these are usually not very meaningful.

Schmidberger: Motorola publishes figures, and we try to describe as precisely as possible the criteria by which they were produced and the machines on which they were run. But the user should test real, executable programs and rely only to a limited extent on application-specific benchmarks.

Lippert: Like every other manufacturer, Sun provides figures. However, one must be aware that these do not refer to the RISC processor alone, but of course characterize the performance of an overall system. Since all of today's tests are purely CPU integer and floating-point related, we recommend that customers try out a real application on the system in question.

Gandhi: We regularly publish performance reports. For the 860 we have benchmark results from SPEC, Dhrystone, Whetstone and Linpack. We provide descriptions of the systems, compilers and optimization levels, and of the meaning of the various benchmarks.

Bode: In five years you want to put 10 to 15 million transistor functions on a chip, in the year 2000 about 50 million. What will then become of the RISC philosophy? What remains the main difference between RISC and CISC architectures in the long term? Can superscalar properties not be accommodated better in RISC architectures?

Gandhi: The 486 is an example of using the same technology regardless of the architecture. Whether or not RISC is dead: I think the question is becoming irrelevant. In two years' time people will no longer classify processors as RISC or CISC, but will ask about other features such as superscalar or superpipelining, or whether there is a vector unit or support for certain multiprocessing caching solutions.

I think the trend is towards multiprocessing and parallel processing. Intel's focus in all three architectures is on superscalarity and on supporting parallelism, both of processors on one board and of multiple processors within one chip.

Richards: We will integrate the cache on upcoming processors, which belong to the 0.5-micron generation. We will try to lay out functional units together with their associated registers so as to enable superpipelined and superscalar architectures. The internal bus will be expanded further, and performance is to be increased by new compilers that better support pipeline structures.

Lippert: The most important thing will be software development. Hardware development happens relatively quickly; there are always new, powerful processors. Application-software and compiler development, on the other hand, proceeds relatively slowly. I believe that compiler development takes more time than pure processor development, and that this will be the key to the question of whether RISC and CISC will converge.

Haas: The acronym RISC was a somewhat unfortunate choice; the reduced instruction sets are actually more of a side effect of the real design philosophy. The lack of a load/store architecture and the existence of instructions of different lengths will always have a detrimental effect on the performance of CISC processors. RISC processors, on the other hand, will take on additional instructions if it turns out that they can be used to increase performance or achieve other desired properties. The basic characteristics of RISC and CISC will remain, even if we believe that the two are approaching each other. An R3000 processor today requires around a tenth of the chip area of a 486, and it is roughly twice as powerful.

Kranich: It seems problematic to me to schedule several cycles, which can be of different lengths, or instruction sequences of different lengths, optimally enough that one really achieves an increase in performance afterwards. It is not enough simply to start two or more instructions at the same time, and with complex instruction sets this is not trivial. On the other hand, superscalar designs must guarantee binary compatibility in addition to performance increases. Unfortunately, it is a fallacy to believe that you can simply put a few processors together and then get x times the performance. Particularly with regard to sensible scheduler software for multiprocessor systems, there is still a lot to be done before what the degree of integration makes available can really be used.

Bode: The main difference between the CISC and the RISC architectures was from the user's point of view that they provided upwardly compatible models within a family. Today, a certain move towards more complex RISC architectures can be seen from various RISC manufacturers. Does that mean for the user that there will be new machine command interfaces at some point? Will he always be able to use his object code with the new technology or not?

Lippert: The machine command interface is defined and laid down in the Sparc architecture definition. Today Sun no longer has this technology under its own control, since the Sparc International group is in charge. I naturally assume that this machine command interface will be retained, so that upward compatibility is guaranteed in any case. Of course, there may come a time when you say you now need different functions. However, that can readily be implemented as an extension without causing any break in upward compatibility.

Kranich: I can only agree with Mr. Lippert: You can't expect users and software houses to completely re-port their software every decade.

Gandhi: I think we are at the very beginning of the development. What definitely cannot be foreseen, or only very poorly, is how the various developers of a certain architecture will proceed, with which strategy and which policy. In my opinion, it is very difficult to predict how things will look in three years' time with the second or third generation of the same architectures from different manufacturers. Nobody can make a clear statement about whether the architecture will stay the same.

Lippert: Well, I would like to say something about that: if I understand you correctly, you are saying that something can definitely change in the architecture, that users can be expected to move to a different machine command interface again. From my point of view, that would be a mistake. I believe that systems that do not guarantee compatibility will not stand a chance on the market in ten or 15 years.

Gandhi: My question was only aimed at whether the various manufacturers of Sparc or Mips architectures attach the same importance to coordinating upward compatibility with other manufacturers. For example, whether the next Cypress generation will be compatible with the upcoming generation, and with an even later one from LSI, and will support similar object code.

Lippert: It is Sparc's objective to deliver the basis for systems that are truly binary-code compatible. I have not yet seen a system from Silicon Graphics that is binary-compatible with one from Digital or any other licensee based on the Mips architecture. That is up to the manufacturers.

Haas: It's the operating system, not the processor architecture.

Lippert: Sure, but what is interesting de facto is what people get out of it afterwards. Today it must be ensured that a multi-vendor community arises like the one in the PC world. That is the basis for long-lived software. If you cannot guarantee that, it does not make much sense.

Richards: People in the industry have learned a lesson: the individual silicon manufacturer can no longer develop complete instruction-set architectures on its own. That is only possible in a group, which must have expertise in the technology as well as in compiler development and system integration.

Only with this combined, accumulated knowledge was it possible to develop things like the option of switching hardware pipelines in order to start several instructions at the same time, or the integration of user-defined instructions, mapped into the instruction set, for the future support of coprocessors such as graphics chips. That was never possible on CISC machines.