Media: Chris Malecek 612-683-7133 (Sept. 27-28)
Steve Conway 612-683-7133 (after Sept. 28)
Financial: Laura Merriam 612-683-7395

CRAY RESEARCH MPP SYSTEM DEBUTS WITH INDUSTRY-LEADING PERFORMANCE, STRONG INITIAL ORDERS

CRAY T3D System Expected To "Set The Standard," Accelerate Industrial-Commercial Use of Massively Parallel Processing

WASHINGTON, DC, Sept. 27, 1993 -- Cray Research, Inc. (NYSE:CYR) today introduced the CRAY T3D, the company's first massively parallel processing (MPP) system and the first such system built with "true supercomputer technology."

In Washington meetings with Congress members, federal officials, customers and reporters, Cray executives said the CRAY T3D is a "next-generation MPP system, a quantum advance over current MPP products." The system -- combining Cray Research supercomputer hardware and software, powerful Alpha RISC microprocessors from Digital Equipment and sophisticated Motorola logic chips -- already demonstrates industry-leading performance and unexcelled price-performance, and has attracted nine initial orders and additional strong interest, they said. The company expects CRAY T3D revenue to begin in the first quarter of 1994.

"Cray Research is now the MPP technology leader," declared Cray chairman and CEO John F. Carlson. "A year from today, we expect to be the leading MPP vendor." He said he expects the CRAY T3D to be substantially more useful than current MPP products for industrial and commercial customers, as well as research facilities, and to set the standard for MPP systems.

"The CRAY T3D is an important milestone for tackling 'grand challenge' problems under the government's High Performance Computing and Communications (HPCC) program," he added.

Product Summary

"The CRAY T3D is the world's first scalable heterogeneous supercomputing system," said Steve Nelson, Cray Research vice president of technology and head of the CRAY T3D development program. "It closely couples proven Cray Research parallel vector capabilities with MPP capabilities (heterogeneity) to tackle a wider range of problems than current MPP products, and efficiently increases applications performance with increased system size -- the true test of scalability."

The CRAY T3D is offered in a wide variety of sizes, from a 32-processor version (4.8 peak gigaflops -- 4.8 billion floating point operations per second) priced from $2.2 million in the U.S., on up in powers of two (64, 128, 256, 512 processors) to a 1024-processor version (153.6 peak gigaflops) with U.S. pricing from $31.0 million. A top-of-the-line 2048-processor version (307.2 peak gigaflops) is also being offered.

The system comes in a wide range of configurations. For customers who want to add MPP capabilities onto existing Cray Research parallel vector systems (CRAY Y-MP Model E series, C90 series or M90 series) in a multiple-cabinet configuration, a full range of system sizes is available (32 to 2048 processors). Sizes up to 128 processors are available in air- or liquid-cooled versions; larger sizes are liquid-cooled. Memory options range from 0.5 gigabytes to 128 gigabytes, depending on system size.

For customers wanting parallel vector and MPP capabilities in one cabinet, liquid-cooled models are available with 128 or 256 processors, two to 16 gigabytes of memory, and one to four parallel vector central processing units (CPUs). U.S. pricing for the 128-processor single-cabinet model starts at $7 million.

"Customers do not need to order two separate systems," Nelson stressed. "They can acquire a single system with as little as one parallel vector processor and many microprocessors."

"With this many configurations, customers can mix-and-match the proportions of parallel vector and MPP capabilities to meet their own processing needs," he said.

Orders, Market Demand

Cray Research has received nine orders for CRAY T3D systems from U.S.
and international customers in the government, university and commercial sectors. The company is in discussions with more than a dozen additional prospects, Carlson said. "This is impressive in an MPP market with established vendors. The world has been waiting for the kind of next-generation MPP system we are delivering."

Speaking at the Washington events were senior officials from: the Pittsburgh Supercomputing Center (PSC), which has installed and accepted a 32-processor CRAY T3D prototype scheduled to grow in early 1994 to a 512-processor production system; NASA's Jet Propulsion Laboratory/Caltech (JPL/Caltech), Pasadena, Calif., scheduled to receive a 256-processor system in fourth-quarter 1993; and École Polytechnique Fédérale de Lausanne (The Swiss Federal Institute of Technology), which has signed a preliminary agreement to acquire a 256-processor system in early 1994. This agreement is expected to be finalized soon, according to Carlson.

As part of the company's Parallel Applications Technology Program (PATP), agreements with these three customers call for the systems to be available not only for the organizations' users, but for collaborations with Cray Research to develop targeted software applications for the CRAY T3D system.

The Arctic Region Supercomputing Center, a national facility located at the University of Alaska Fairbanks, has ordered a 128-processor system for installation in first-quarter 1994.

Some of the remaining customers among the nine will be announced in coming weeks. The initial customers include a major petroleum company and a large, well-known Japanese industrial firm.

"The CRAY T3D was jointly supported by the Advanced Research Projects Agency (ARPA) and Cray Research," Carlson said. "This relationship helped make it possible to develop the CRAY T3D in only 26 months and deliver the system on schedule. This relationship will remain important for Cray Research in the future."
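
For a sense of scale, the system sizes named in the orders above can be converted to peak ratings. The release quotes peak figures only for the 32-, 1024- and 2048-processor versions; the sketch below assumes, as those three figures suggest, that peak performance scales linearly with processor count. The per-processor value and the function name `peak_gigaflops` are derived for illustration and do not appear in the release.

```python
# Per-processor peak implied by the release: 4.8 peak gigaflops across 32 processors.
# This value is derived here for illustration; the release does not state it.
PEAK_GFLOPS_PER_PROCESSOR = 4.8 / 32


def peak_gigaflops(processors: int) -> float:
    """Estimate the peak rating, assuming it scales linearly with processor count."""
    return round(processors * PEAK_GFLOPS_PER_PROCESSOR, 1)


# Planned system sizes for the customers named in the release.
announced_orders = {
    "Pittsburgh Supercomputing Center (production system)": 512,
    "JPL/Caltech": 256,
    "Ecole Polytechnique Federale de Lausanne": 256,
    "Arctic Region Supercomputing Center": 128,
}

for site, processors in announced_orders.items():
    estimate = peak_gigaflops(processors)
    print(f"{site}: {processors} processors, about {estimate} peak gigaflops")
```

Under the same assumption the function reproduces the quoted 153.6 and 307.2 peak gigaflops for the 1024- and 2048-processor versions. These are theoretical peak ratings, not sustained performance.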

Carlson said Cray Research and the CRAY T3D system have been selected to participate in the Evaluation of Early Computing Systems Prototypes portion of the ARPA High Performance Computing (HPC) Program. This portion of the program was established to provide rapid feedback to system architects and software designers, as well as to speed the maturation of software and experimental grand challenge applications on advanced system prototypes.

Performance and Price-Performance

"It's not enough for an MPP system to use fast microprocessors; what's needed is a balanced system that matches fast processor speed with fast I/O (input/output), fast memory access and capable software," said Steve Nelson, head of the company's CRAY T3D development program. "The CRAY T3D system has already demonstrated that it is the fastest, most balanced MPP system in the world."

Nelson said the CRAY T3D system achieved higher sustained speeds than any other MPP system on all eight tests of the NAS Parallel Benchmarks, a widely accepted benchmark suite developed by the NASA Ames Research Center. The tests represent a broad range of MPP computing challenges, with varying interprocessor communications needs. On some tests, the 128-processor CRAY T3D system was as much as four times faster than all other MPP products with up to 128 processors. He expects the CRAY T3D to run the tests even faster in the future. "Other vendors have had time to fine-tune their systems for these benchmarks." Comparisons used the latest results published by the NASA Ames Research Center this month.

In these tests and in operation at PSC, he said, the CRAY T3D system demonstrated latency of under one microsecond. Nelson said latency -- the time it takes for a processor to begin using data it has requested -- is a key factor in overall MPP system performance and ease-of-programming. "Current leading MPP vendors are in the 100-microsecond latency range, and are targeting the tens-of-microseconds range in the next two years," he said. "For this key performance indicator, the CRAY T3D system is typically two orders-of-magnitude ahead of current leading MPP products."

Because of the system's low latency and unrivaled bisection bandwidth (the amount of data that can be transferred among processors in a given time), the CRAY T3D's speed advantage over competing systems is expected to increase substantially in larger system sizes. "With the T3D, you won't see the performance-degrading 'traffic jams' that have plagued other MPP systems. The larger the system size compared, the better we will look," he said.

"We also did price-performance comparisons based on the NAS Parallel Benchmarks results, which measure actual sustained performance, as opposed to theoretical peak performance," Nelson said. "We used three separate methods. For each method, the CRAY T3D showed price-performance better than or equal to any of the other MPP products," Nelson said.

Product Details

The CRAY T3D system was developed in consultation with an MPP Advisory Group, an international group of government, university and commercial customers with first-hand MPP experience using other products, said Nelson. "As a result, the CRAY T3D alleviates performance deficiencies found in other MPP systems and fits customers' needs for production-oriented computing."

He noted key features of the system:

- A scalable heterogeneous architecture allows users to efficiently distribute programs, or portions of programs, between the system's closely coupled parallel vector and MPP environments for fastest solution times.
- A 3-D torus interconnect topology minimizes network distances and provides the highest-known bisection bandwidth -- up to 76.8 gigabytes per second for a 1024-processor system. The 3-D torus avoids "far neighbor" communication delays found in other MPP systems. High-performance switch nodes, operating bidirectionally in each dimension, handle interprocessor communications without interrupting the processors.
- Sophisticated mechanisms for latency hiding and fast synchronization.
- A high-bandwidth I/O subsystem (gigabytes per second) to access Cray Research disk, tape and network peripherals.
- Globally shared, physically distributed memory allows any microprocessor to access any memory location, supporting ease-of-programming and high performance on applications with fine, medium and coarse-grained parallelism.
- The flexible CRAFT (Cray Research Adaptive Fortran) programming model supports traditional message-passing and data parallel programming, and provides a new work sharing capability. Customers can choose the programming style that best fits their applications, or any portion of them -- a choice not available before. Existing MPP codes can be ported easily, typically with improved performance. CRAFT, an extension of Fortran 77, includes Fortran 90 features such as array syntax and intrinsics.
- Applications development can start on CRAY Y-MP, CRAY C90, CRAY M90 or CRAY EL90 systems, using the previously introduced CRAY T3D Emulator.
- The C programming model provides portability to other platforms, using a highly optimized Parallel Virtual Machine (PVM) implementation of message passing.
- The CrayTools development suite includes the MPP Apprentice performance analyzer, the CRAY TotalView debugger, and a range of programming utilities.
- UNICOS MAX, a distributed, multiuser MPP operating system, functions as a fully compatible complement to the company's mature, feature-rich UNICOS parallel vector operating system. UNICOS MAX allows the CRAY T3D system to be shared among many users, with applications partitions ranging from two processors to the whole system. Network compatibility with other vendors' systems is assured through compliance with industry standards: UNIX System V, BSD UNIX, POSIX 1003.1 (operating systems); Fortran 77, C (languages); HIPPI, FDDI, Ethernet (networks); RPC, OSI, TCP/IP (protocols); and PVM, RQS/NQS (distributed tools).
- A consistent generation-to-generation macroarchitecture means applications written for the CRAY T3D will run easily on future Cray Research MPP systems. A variable microarchitecture means that for future-generation MPP systems, Cray Research can use the fastest microprocessors available at those times.

Applications Initiatives

"Few important commercial applications are available today on existing MPP systems," said Bob Ewald, Cray Research executive vice president and general manager, Supercomputer Operations. "We expect the CRAY T3D system to rapidly improve that situation."

In partnership with application vendors and customers, including those involved in the PATP, Cray Research is developing a wide range of application software for the CRAY T3D system, Ewald said. The company's first initiatives will focus on these economically important areas:

- 3-D prestack seismic processing for petroleum exploration
- Atmospheric modeling for weather prediction and climate research
- Computational fluid dynamics and structural analysis for the aerospace, automotive, chemical and semiconductor industries
- Computational chemistry for drug design and materials science applications
- Computational electromagnetics for electronics and defense applications
- Combustion modeling for engine design

Dr. Michael Levine, scientific co-director of PSC, reported that "our prototype CRAY T3D system is now available full time for applications development, and is proving to be very stable." Representatives of Los Alamos National Laboratory said they already have their Parallel Ocean Program (POP) global climate model running on a CRAY T3D system located at Cray Research.

Immediately available for the CRAY T3D system, Ewald said, are third-party applications in computational fluid dynamics (FLO67); chemistry (AMBER, CHARMM, GAMESS USA, SUPERMOLECULE); combustion (FIRE); environment (POP); and structural analysis (LS-DYNA3D).
Additional codes in these areas and the petroleum area are currently being pursued. He said the system will use the IMSL mathematical software libraries. The PARMACS programming model is scheduled to be ported to the system later this year, making a range of additional applications immediately available.

Ewald outlined the company's previously announced three-phase MPP program. Plans call for delivering the second-generation system in mid-decade, with peak performance of a teraflops (trillion floating point operations per second); and the third-phase system later in the decade, with sustained teraflops performance.

He reaffirmed the company's plans to deliver a next-generation parallel vector system, code-named Triton, around the middle of the decade. "For the foreseeable future, some applications will continue to perform better on our parallel vector supercomputers like the CRAY C90, CRAY M90 and their successors, while other programs run more efficiently on our MPP systems. We'll be able to offer customers both," he said.

Cray Research creates the most powerful, highest-quality computational tools to help solve our customers' most challenging problems.

###