-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

MDJ 2003.06.25 (June 25, 2003)
==============================
   Copyright 2003, GCSF Incorporated.  All rights reserved.

Top of the Day
--------------

**The G5 Revolution**

  We've spent the first few days of WWDC gathering lots of
  information, and today you'll see the first fruits of that.
  MacCyclopedia offers your first complete look at the PowerPC 970
  processor, now better known as the PowerPC G5. The demos are
  exciting, but how does it work? It's not just clock rate: the G5
  can do more work on every clock cycle, talks to other parts of
  the system faster than any other PC on the market, and provides
  64-bit support for applications that can really use it. We
  explain what all that means. And since you've undoubtedly been
  barraged by reports that Apple's G5 benchmarks can't possibly be
  true, we explain why they are - and what's wrong with all the
  attacks on them. It's a techno tour de force that's just warming
  up our WWDC coverage.

**WWDC 2003**

  The first new microprocessor in the Macintosh family in four
  years is obviously the biggest story of the month, if not the
  year, but there's a lot more going on here in San Francisco: the
  actual Power Macintosh G5 system and Mac OS X 10.3, code-named
  "Panther," come to mind. We've been collecting information all
  week long to understand where the platform is going this year,
  and we have lots more to say, so keep some room in your mailbox
  free for MDJ_!

MacCyclopedia[tm]: The PowerPC G5
---------------------------------

**New processor incites drooling and ranting**

  Late last week, a small GIF image containing specifications for
  a "Power Macintosh G5" computer appeared briefly at Apple's
  online store. The image was gone within an hour, but not before
  several news sites, including MacMinute [1] and MacNN [2],
  noticed it and posted the specs.
  Both sites later complied with
  Apple Computer's demand and removed the material. The mysterious
  nature of the leak, combined with IBM's announcement last
  October of the PowerPC 970 (MDJ_ 2002.10.16), powered waves of
  Internet speculation over the weekend. Was the leak accidental or
  intentional? Had Apple's Web site been hacked? (Some pointed a
  finger at the MacHack conference, although MacHack has nothing
  to do with that kind of hacking.) Most importantly, were the
  specs real?

  [1] <http://www.macminute.com/2003/06/19/g5>
  [2] <http://www.macnn.com/news/19826>

  At the end of his WWDC keynote on Monday, Steve Jobs made it
  official: the image had been posted in error, but the
  information it contained was true. Apple had accidentally leaked
  its own product announcement. The new Power Macintosh G5 models,
  due to ship in August, represent Apple's first major desktop
  processor upgrade since moving to the Motorola PowerPC
  7400-family chips (the "PowerPC G4") in August 1999. The most
  exciting aspect of the G4 upgrades was the addition of the
  Velocity Engine for chewing through massive computations at
  speeds that, we're sure, gave executives at Intel more than a
  few sleepless nights. The PowerPC G5 chip (Apple's marketing
  name for IBM's PowerPC 970 family), in turn, earns its stripes
  for being the first 64-bit implementation of the PowerPC
  architecture - something the designers planned for almost a
  decade ago.

  Naturally, the announcements have provoked reactions ranging
  from undisguised lust for the raw power these machines promise
  to bring to the Mac platform, to naked rage at the suggestion
  that the new chip could possibly match or beat the performance
  of competing Intel-compatible processors.
  The keys to remaining
  sane in this maelstrom of opinion are remembering how we got
  here, looking carefully at what the G5 machines actually do, and
  turning a critical eye to the emotional rants penned by
  partisans of one technology or the other. In this issue, we
  start with the processor and its evolution.

**Generations past**

  Apple seems to have decided that it gets to declare PowerPC
  processor generations. The first-generation PowerPC chips
  appeared in the original Power Macintosh machines nine years
  ago, starting with the PowerPC 601 running at 60MHz to 80MHz.
  The second generation produced the lower-power PowerPC 603 and
  higher-performance PowerPC 604, plus their later enhanced 603e
  and 604e versions. These chips powered all of Apple's Power
  Macintosh models through November 1997. Those two generations
  may have gone by in only three and a half years, but remember,
  Apple burned through three CEOs in the same period.

  In November 1997, Steve Jobs introduced the first Power
  Macintosh G3 computers, based on the PowerPC 750 family from IBM
  and Motorola. It was a signal point for Jobs's new reign: Mac OS
  clone makers such as Power Computing had been showing machines
  based on the PowerPC 750 processor months earlier, and said they
  were ready to go into production in August of that year - but
  Apple wouldn't approve the Mac OS license for such machines.
  Just weeks later, Apple bought out Power Computing and
  effectively ended the cloning program. Had Macintosh customers
  held Apple responsible for delaying these powerful machines for
  months, sales could have been very slow, and the troubled
  platform might have suffered a fatal or near-fatal blow.
  Fortunately for Apple, customers were loyal to the platform and
  purchased scads of Power Macintosh G3 machines.
In May 1998,  Apple announced the first PowerBook G3 and the first iMac  computers, both based on the PowerPC 750 line, cementing the  processor as the core of Apple's product offerings. The Power  Macintosh G3 (blue-and-white) followed in January 1999, as did  the G3-based iBook in July 1999.  By that time, though, Motorola was ready with its next round of  PowerPC chips, and it was going it alone. The PowerPC alliance  founded by Apple, Motorola, and IBM in 1991 essentially  collapsed under the strain of the end of Mac OS cloning,  something IBM and Motorola expected to be a significant revenue  source for the next decade. Motorola's next-generation PowerPC  7400 included a vector processing unit called AltiVec. The  PowerPC already had thirty-two separate 32-bit integer registers  and thirty-two separate 64-bit floating point registers, but  AltiVec added thirty-two separate 128-bit vector registers, as  well as 162 powerful new instructions for ripping through  calculations on multiple pieces of data simultaneously. A single  AltiVec instruction could perform four separate 32-bit  multiply-and-add operations at once, compared to probably 15-20  cycles to do the same operation in the other registers.  AltiVec was incredibly powerful and a boon to Apple and  Motorola's other customers, but IBM was not interested, deciding  instead to focus its PowerPC development on lower power  consumption and size reduction. IBM didn't want to expend the  microprocessor real estate or the power on a huge, complicated  vector-processing unit, and so it passed on the PowerPC 7400. By  this point, Motorola had already bought out IBM's share in the  Somerset chip design facility in Austin, Texas, where the  PowerPC family had been designed and taken to market. The split  over AltiVec seemed like the last straw: Motorola and IBM would  be going their separate PowerPC ways.  
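
  As a rough illustration of the four-at-a-time multiply-and-add
  described above, the following Python sketch models a 128-bit
  vector register as four 32-bit lanes. It is only a model of the
  idea, not AltiVec code: the function names are made up for this
  example, and pure Python says nothing about real cycle counts.

```python
from typing import List

LANES = 4  # a 128-bit vector register viewed as four 32-bit values


def scalar_multiply_add(a: float, b: float, c: float) -> float:
    """One multiply-and-add on a single value (stands in for scalar code)."""
    return a * b + c


def vector_multiply_add(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Model of one vector instruction: the same multiply-and-add on every lane."""
    assert len(a) == len(b) == len(c) == LANES
    return [x * y + z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
c = [0.5, 0.5, 0.5, 0.5]

# Scalar style: four separate operations, one per value.
scalar_result = [scalar_multiply_add(x, y, z) for x, y, z in zip(a, b, c)]

# Vector style: conceptually a single instruction covering all four lanes.
vector_result = vector_multiply_add(a, b, c)

print(vector_result)  # [5.5, 12.5, 21.5, 32.5]
```

  Both approaches produce the same numbers; the point of the
  vector unit is that the hardware does the four lanes in one
  instruction instead of four.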

  Apple chose Motorola's path, because the AltiVec unit was
  perfect for high-end computations such as signal processing,
  video and audio work, and even scientific computing like DNA
  analysis. At the end of August 1999, Steve Jobs introduced the
  Power Macintosh G4 on-stage at Seybold Seminars in San
  Francisco. Apple rebranded the AltiVec unit as the "Velocity
  Engine" and declared the Power Macintosh G4 to be the world's
  first desktop "supercomputer," based on calculations that showed
  a full-speed 500MHz AltiVec unit powering through two gigaflops -
  two billion calculations per second.

  **The decline and rise of the G4** -- Apple introduced the Power
  Macintosh G4 in three speeds - 400MHz, 450MHz, and 500MHz - with
  the low-end system using a G4 grafted onto a revised Power
  Macintosh G3 (blue-and-white) motherboard with PCI graphics
  instead of the AGP graphics system available on the higher-end
  models. All three models were supposed to ship by October, but
  in a blow that would damage Apple for the next four years, it
  didn't work out quite that way.

  Motorola was unable to supply Apple with enough certified
  PowerPC 7400 chips running at 450MHz and 500MHz to ramp up the
  production line. At the same time, rapidly rising DRAM costs put
  a big squeeze on Apple's profits at an already tough time for
  the computer industry as a whole. Apple had no high-end machines
  to sell, and the price of the low-end machine was too low given
  the new component costs. The company responded by scaling back:
  on 1999.10.13, Apple cancelled existing orders for the Power
  Macintosh G4 computers and revamped the product line to offer
  350MHz, 400MHz, and 450MHz machines - at the same prices as the
  initially-promised faster versions (MWJ_ 1999.10.16). Apple also
  had to enlist the third member of the old alliance, IBM, and
  wedged through an agreement to let IBM manufacture PowerPC G4
  chips since Motorola obviously couldn't meet demand.
That extra  supply, however, did not start until well into 2000.  Keeping the price the same while lowering the speed and the cost  of the processors helped Apple survive exploding RAM costs. But  customers were less understanding, and sales suffered. Apple had  announced the G4 machines two months into the September 1999  quarter, effectively ending Power Macintosh G3 sales at that  point. Apple had been counting on Power Macintosh G4 sales to  take up the slack, of course, but it couldn't ship the new  machines until after the October reconfiguration. And even after  finally shipping the slower machines in quantity during the  December 1999 quarter, customers were so wary that  year-over-year sales in that quarter grew only 10% over the then  year-old Power Macintosh G3 (blue-and-white) model.  Worse was Motorola's inability to get over this hump. By  February 2000, Apple had reintroduced 500MHz machines and was  shipping them in quantity, but the March of Progress had been  delayed by six full months. Personal computer processor speeds  tend to increase by about 50% over the course of a year, and by  about 100% over two years: a top-of-the-line 2GHz processor  today should yield to a top-of-the-line 3GHz processor in one  year, and that 3GHz chip should yield to a top-of-the-line 4GHz  processor by the end of a second year. So theoretically, by  September 2000, Motorola should have been close to shipping  750MHz PowerPC G4 chips.  The six-month delay blew that out of the water. At Macworld Expo  in July 2000, Apple introduced the first major dual-processor  Power Macintosh G4 systems. These were indeed potent systems  that took good advantage of high-end software capabilities, and  the still-unreleased Mac OS X promised to do them even better.  
  Even so, they were something of a cheat: since Motorola didn't
  have faster chips, and since IBM was not developing the PowerPC
  G4 (just building them under license from Motorola on Apple's
  behalf), Apple had no way to speed up the Power Macintosh G4
  except by using more processors. Thanks to lower part costs for
  the PowerPC family than for state-of-the-art Pentium chips,
  Apple held the line on price - but again, customers noticed.

  It wasn't until January 2001 that Apple finally moved up to
  733MHz PowerPC G4 chips, followed in July 2001 by 867MHz chips
  in the Power Macintosh G4 (QuickSilver) machines (although the
  "high-end" model featured dual-800MHz chips, hinting that the
  high-end 867MHz chip wasn't available in enough quantity to make
  dual-processor systems). The high-end reached 1GHz in January
  2002, 1.25GHz in July 2002, and 1.42GHz in January 2003. Note
  the progression: speeds have increased slightly less than 100%
  over the past two years, and slightly under 50% over the past
  year.

  In other words, Motorola is still behind the curve, but the
  company has come close to fulfilling our rule of thumb since
  belatedly breaking the 500MHz barrier in 2001. Unfortunately,
  Intel and AMD didn't have a six-month gap in their performance
  curves, and they've also managed to keep a little ahead of the
  industry norm, so the clock rates of PC systems based on the
  dominant x86 processor architecture have continued to race ahead
  of PowerPC clock rates. Had Motorola kept up, Apple might have
  had 2GHz PowerPC G4 chips by now, or possibly even as far back
  as late 2002.

**The G5 arrives**

  Meanwhile, back in New York, IBM had finally noticed the
  advantages of high-end vector-processing. IBM had passed on
  AltiVec, but now Apple was using the speed of AltiVec to sell
  Power Macintosh G4 systems into the mathematical and scientific
  markets that IBM wanted for itself.
  What's more, servers were
  being called upon to do more and more computation for tasks like
  encryption and on-demand video compression. A vector processing
  unit would really help, and IBM didn't have one. Fortunately,
  Motorola's performance missteps provided an opportunity for IBM.
  Although details remain sketchy, at some point Apple and IBM
  began working together on a new processor to replace the PowerPC
  7400 family in Apple's professional systems.

  The result, announced late last year by IBM, is the PowerPC 970
  family [3]. Branded by Apple as the PowerPC G5 [4], this
  "fifth-generation" PowerPC chip is the heart of the Power
  Macintosh G5 systems that will ship in August.

  [3] <http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_970_Microprocessor>
  [4] <http://www.apple.com/g5/>

  The PowerPC 970's primary new feature is a 64-bit architecture.
  We'll try to keep the math short, but some of it's necessary: a
  single bit can hold two values, 0 or 1. If that bit is the
  address of a byte in memory, your system has total storage of -
  well, of two bytes. If you add a second bit, you have four
  choices: 0, 1, 2, or 3, and can address four bytes of RAM. Each
  bit you add doubles the amount of RAM you can address. The Apple
  II family used 16-bit addressing for a total of 2^16 addresses,
  or 64 kibibytes of RAM. All of today's PCs use 32-bit
  addressing, so the processor can talk to a maximum of 2^32 bytes,
  or 4 gibibytes. The PowerPC 970 uses 64-bit addressing, and can
  therefore address 2^64 bytes, or 16 exbibytes. That's over 16
  billion gigabytes.

   What's with the "gibi" and "kibi" prefixes? We've mentioned
   before that a new international standard approved in 1998
   defined different prefixes [5] for the computer terms,
   referring to 2^30 bytes as a gibibyte instead of a gigabyte.
   Apple's PowerPC G5 white paper says the processor can address
   "18 exabytes," but that's using the now-standard power-of-ten
   definition [6] of an exabyte as 10^18 bytes, instead of the
   traditional computer definition of 2^60 bytes. These differences
   aren't much for a kilobyte (1000 bytes) or a kibibyte (1024
   bytes), but at exabyte levels, they really start to matter.
   While these new terms haven't caught on in standard usage yet,
   they do explain the difference: the PowerPC G5 can address
   exactly 16 exbibytes, or just over 18 exabytes.

  [5] <http://physics.nist.gov/cuu/Units/binary.html>
  [6] <http://mathworld.wolfram.com/Exbibyte.html>

  Now, being plainspoken Midwestern folk and somewhat set in our
  ways, we don't care much for the newfangled terms, so we keep
  using the old terms. When MDJ_ says "exabyte," we almost always
  mean the computer definition. In cases like this, though, we'll
  begrudgingly use "exbibytes" to make the values clear. Our
  inconsistency in the service of not polluting our prose with
  inelegant neologisms is deplorable, but we're not the only ones
  who can't bear to use the new terms all the time - Apple's white
  paper uses "exabytes" correctly, but then goes on to say "four
  terabytes" when it means "four tebibytes." We suspect that the
  new abbreviations will catch on more when these large sizes
  become more common, and the differences between powers-of-ten
  units and powers-of-two units therefore more important. The
  binary abbreviations have an extra "i" in them, so one kibibyte
  is "1KiB," compared to "1KB" for one kilobyte.

  At today's costs, even the most serious professional can't
  afford to put 16EiB of RAM in a desktop computer. There's no
  reason to put 64 address lines on the processor when nobody will
  use them anyway, and in fact the PowerPC 970's address bus is a
  mere 42 bits wide.
Our use of "mere" is facetious; 42 bits  still gives the processor access to a mind-bending four  tebibytes of RAM, or 4096GiB. That's 1024 times as much memory  as a 32-bit PC can address even with all its address lines  intact. Address spaces, however, are still 64-bit. If a program  fills up all 4TiB of available RAM on a PowerPC 970-based system  and asks for more, the OS returns an "out of memory" error. This  could be a serious problem if anyone's still using the PowerPC  970 in eight to ten years.  Moving a program to a 64-bit address space is a _big_ deal. It's  nowhere near as simple as recompiling code. Every program uses  pointers to chunks of RAM, and every one of those pointers on  the Macintosh is and always has been four bytes long. If  pointers suddenly become 64 bits long, they don't fit in any of  the old data structures. Recompiling a very simple application  could change the data structures and make it "64-bit ready," but  in every program of consequence, there are anywhere from a dozen  to ten thousand places that imply a pointer is four bytes long.  It works its way into data files, plug-ins, and lots more.  That's not even counting the places a program calls the  operating system or an external library or framework; these  pieces of code also expect 32-bit pointers, and all of them must  be revised to accept 64-bit pointers as well. Until that's done,  the program can't fully achieve 64-bit nature. Nor is it even  that simple: old programs need 32-bit pointers and new ones need  64-bit pointers. If Apple simply changes the APIs to use wider  addresses, old programs break.  Intel has publicly stated [7] that it doesn't anticipate that  average desktop machines will have 64-bit processors until  perhaps the end of the decade, citing a lack of 64-bit  applications and the challenges of moving developers and users  to a 64-bit chip. 
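
  The address arithmetic in this section is easy to check. The
  short Python sketch below is only an illustration (the constant
  and function names are ours, not from any Apple or IBM
  document); it computes how many bytes a given number of address
  bits can reach and compares the binary and power-of-ten units.

```python
def addressable_bytes(address_bits: int) -> int:
    """Each added address bit doubles the number of bytes that can be addressed."""
    return 2 ** address_bits


KIB = 2 ** 10   # kibibyte
GIB = 2 ** 30   # gibibyte
TIB = 2 ** 40   # tebibyte
EIB = 2 ** 60   # exbibyte (binary definition)
EB = 10 ** 18   # exabyte (power-of-ten definition)

apple_ii = addressable_bytes(16)
pc_32bit = addressable_bytes(32)
g5_address_bus = addressable_bytes(42)
g5_address_space = addressable_bytes(64)

print(apple_ii // KIB, "KiB")                  # 64 KiB
print(pc_32bit // GIB, "GiB")                  # 4 GiB
print(g5_address_bus // TIB, "TiB")            # 4 TiB
print(g5_address_space // EIB, "EiB")          # 16 EiB
print(round(g5_address_space / EB, 2), "EB")   # 18.45 EB
print(g5_address_bus // pc_32bit, "times a 32-bit address space")  # 1024
```

  The last two lines show where "16 exbibytes" and "18 exabytes"
  both come from, and why 42 address lines reach 1024 times as
  much RAM as 32.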
Intel's biggest reason, though, is that  there's not much sign yet that the average computer user is  anywhere close to needing more than 4GiB of RAM. If 32 bits are  enough to address all the RAM you have or need, why go through  the pain of migrating to a 64-bit system just for those few  applications that can use it today? In reality, then, the  applications you use will very likely continue to be 32-bit  applications for the foreseeable future.  [7] <http://zdnet.com.com/2100-1103-985432.html>  That's why the PowerPC design specifications have long called  for 64-bit PowerPC chips to fully support the 32-bit design the  chips have used since inception. It's not a "compatibility mode"  or some kind of "emulation" - it's the same full, native, 32-bit  support that's always been there. Behind the scenes, though, the  operating system can take advantage of 64-bit features, even if  applications think the G5 is just another PowerPC chip.  Moving data around, for example, is one of the most common tasks  in any program. The Mac OS has always provided routines to move  data for programs, variants on an original Mac OS routine named  BlockMove. In the past twenty years, Apple has optimized  BlockMove any time new hardware offers a chance for improvement.  In 68040 days, Apple added a BlockMoveData call for moving data  so that it wouldn't flush the instruction cache on the  microprocessor. In the first PowerPC systems, Apple implemented  BlockMove using the floating-point registers instead of the  general-purpose registers because the FP registers are 64 bits  wide, allowing moving twice as much data per cycle. We suspect  current implementations use vector registers if possible.  On a PowerPC 970 chip, BlockMove would have access to thirty-two  64-bit floating point registers and thirty-two 64-bit  general-purpose registers, doubling the number of wide registers  available to move data around. 
  Calculations that might overflow
  a 32-bit value can happen in a single 64-bit operation, rather
  than in two or three 32-bit operations. The operating system
  could even use a single 64-bit address space to easily map up to
  four billion 32-bit address spaces, perhaps simplifying memory
  management and speeding inter-process operations. That way, each
  application would have access to up to four gigabytes of memory
  allocated by the operating system from a potentially much larger
  pool of physical RAM.

  Of course we're speculating wildly here, but if we can come up
  with ways Apple could get improved performance out of a 64-bit
  chip without requiring application changes, you can be sure
  Apple's engineers, who have been living the G5 for the past six
  months, can think of many more ideas that will work a lot
  better, and we wouldn't be surprised to see some of them in Mac
  OS X 10.3, code-named "Panther."

  **Raw performance** -- If the PowerPC G5 were nothing more than a
  64-bit extension of the PowerPC family, that would be cool. If
  it also ran at 2GHz to catch up to where the G4 should have been
  by now, that'd be even better. But in fact, there's much more
  under the hood. Every microprocessor is divided internally into
  separate functional units. The PowerPC G4, for example, has a
  load and store unit to get data out of the caches and memory, a
  floating-point arithmetic unit, a vector unit, and four integer
  arithmetic units - three simple ones and one that can also
  multiply and divide.

  The PowerPC G5 has two load and store units and two
  floating-point units, so right off the bat the chip can process
  twice as many floating-point instructions or memory accesses at
  once as its predecessor. The PowerPC G5 also offers a square
  root computation in hardware, saving dozens of cycles for that
  common computation.
  It has only two integer units instead of the
  G4's four, but they're both capable of performing multiplication
  operations, and one of them can divide as well. In the G4, all
  integer multiply and divide instructions had to be routed to the
  single "complex" integer unit that could perform the tasks, so
  if you had a loop with a multiply instruction in it, there
  wasn't much parallelism going on anyway.

  So far, the PowerPC G5 has 64-bit power, a faster clock rate,
  and more powerful functional units. It can execute more
  instructions at once, faster, and on larger quantities of data
  than any of its predecessors. But even that's not all.
  Unfortunately, to enter the land of maximum G5 performance,
  Apple must abandon a PowerPC G4 marketing point.

  **The pipeline** -- By July 2001, the "megahertz gap" was really
  looking bad for Apple, with Pentium chips approaching 2GHz while
  the PowerPC line was approaching 900MHz. As part of the Macworld
  Expo presentation that month, Steve Jobs and hardware VP Jon
  Rubinstein took time to explain that one reason the PowerPC G4
  was a better chip was its shorter pipeline (MDJ_ 2001.07.19).

  All modern microprocessors execute instructions in various
  stages. Rather than execute every stage for one instruction and
  then start with the next, microprocessors keep instructions
  moving forward like an assembly line. As soon as the first stage
  of one instruction is complete, the processor executes the first
  stage of the next instruction while a different part of the chip
  executes the second stage of the first instruction. This
  assembly line is called the pipeline.

  A pipeline is a great way to keep a processor running at peak
  efficiency, but there's a small problem: the chip has to know
  what instruction comes next.
Computer code is all about making  decisions based on computations: a subroutine might perform the  same operation on 4000 pixels in a row, but when it gets to the  end of the row, it has to branch to the next row and start over.  After processing each pixel, the code asks, "Is this the end of  the row?" For the first 3999 times, the answer is "no." On the  4000th time, though, the answer is "yes."  To keep the pipeline full, a microprocessor tries to predict the  answer to the question before it's asked using a technique  called branch prediction. The pipeline units see the question  coming and, based on past answers, guess the probable outcome.  In our example, the processor will almost certainly predict that  the answer is "no, this is not the end of the row," because 3999  times out of 4000, that's the right answer. The pipeline then  fetches instructions assuming that the code will go back and  process another pixel. It starts decoding those instructions,  and thereby keeps the pipeline full 3999 out of 4000 iterations.  This partial execution of instructions that might be executed is  called speculative execution.  When code reaches the end of the row, of course, the prediction  is wrong. When the branch instruction actually gets to its final  stage, the processor sees that it guessed wrong. Unfortunately,  the pipeline is full of partially-executed and decoded  instructions because the processor assumed it was not the end of  the row. The processor has to flush all of those partial results  and start over with the real next instruction. So, for example,  if it takes twenty stages to finish executing an instruction,  the processor loses twenty cycles while the pipeline refills and  the next real instruction goes through all twenty stages.  Therefore, the longer the pipeline, the greater the penalty for  missing a branch prediction. 
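
  To put rough numbers on that penalty, here is a minimal Python
  sketch of the pixel-row example above. It rests on simplifying
  assumptions of ours: the predictor always guesses "not the end
  of the row," every wrong guess costs one cycle per pipeline
  stage, and the 7-stage figure is included only as a shorter
  pipeline for comparison. Real branch predictors and real flush
  costs are more subtle than this.

```python
def flush_cycles(pixels_per_row: int, rows: int, pipeline_stages: int) -> int:
    """Cycles lost to pipeline flushes under an always-predict-'no' policy."""
    mispredictions = 0
    for _ in range(rows):
        for pixel in range(pixels_per_row):
            is_end_of_row = pixel == pixels_per_row - 1
            predicted_end_of_row = False  # the predictor always guesses "no"
            if predicted_end_of_row != is_end_of_row:
                mispredictions += 1
    # Simplification: each misprediction costs one cycle per pipeline stage.
    return mispredictions * pipeline_stages


ROWS = 100
PIXELS_PER_ROW = 4000

short_pipeline = flush_cycles(PIXELS_PER_ROW, ROWS, pipeline_stages=7)
long_pipeline = flush_cycles(PIXELS_PER_ROW, ROWS, pipeline_stages=20)

print(short_pipeline, long_pipeline)  # 700 2000
```

  The guess is right 3999 times out of 4000 in both cases; only
  the cost of each wrong guess changes with the length of the
  pipeline.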
There are other ways to foul up a  pipeline, too: if one instruction depends on the results of the  previous one, the pipeline stalls - the instruction can't  advance until the previous instruction finishes so its result is  known. The same thing can happen when the processor has to wait  for data to be retrieved from main memory. That doesn't flush  the pipeline, but it leaves empty stages in it called bubbles  that are missed opportunities to execute code. Between missed  branches and pipeline bubbles, a long pipeline may spend a lot  of its time with empty stages, reducing the processor's  performance.  In July 2001, Apple listed these and other facts as reasons long  pipelines are bad. The PowerPC G4 chip had a seven-stage  pipeline (up from just four stages in the 400MHz-500MHz models),  so even a complete pipeline flush stalled execution for just  seven cycles. The Pentium 4's pipeline has over twenty stages,  and Apple showed animations [8] to demonstrate what happens when  a long pipeline has bubbles or flushes, and even cleverly called  this performance penalty a "pipeline tax."  [8] <http://www.apple.com/g4/myth/>  That phrase was not in Apple's vocabulary Monday at WWDC,  because the PowerPC G5 pipeline takes anywhere from twelve to  twenty-five stages to execute an instruction. The simple truth  of the matter is that to drive clock rates higher, processor  designers have to make longer pipelines by splitting large tasks  into smaller parts. When Motorola wouldn't do it, Apple said  longer pipelines were bad. Now that IBM's implementation is more  in line with common industry practices, the "pipeline tax" has  vanished.  Instead, Apple now emphasizes a benefit of a longer pipeline:  the PowerPC G5 may have over two hundred separate instructions  in process at the same time, if every stage of the pipeline for  every functional unit is full. The PowerPC G4 chip had, at most,  16 instructions in process at once. 
  The G5 chip can fetch up to
  eight new instructions on every cycle and finish up to five
  instructions, all significant increases over the Motorola chip.

  While there are definite trade-offs between these two design
  philosophies, the Pentium has proven that the long-pipeline
  approach works reasonably well. Even with bubbles, you can still
  come out ahead, and if you miss a branch prediction and have to
  blow twenty or so cycles, well, that's why you made the clock
  rate faster.

  Wouldn't it be nice if the rest of the system managed to feed
  the PowerPC 970 enough data to keep it running at full speed?

  **The special bus** -- Steve Jobs called the original Power
  Macintosh G4 the world's first supercomputer on a desktop
  because he said it could crank out two gigaflops. How? Several
  AltiVec instructions treat 128-bit vector registers as if they
  were four 32-bit values, performing the same operation on all
  four of them in one cycle. At least one such instruction
  performs floating point arithmetic. Four floating point
  operations repeated five hundred million times per second is two
  billion floating point operations per second - two gigaflops.

  It doesn't quite work that way, though. The processor has to get
  the data before it can perform any kind of operation on it,
  floating point or otherwise, and the PowerPC G4 doesn't have the
  2GB of onboard high-speed memory it would take to actually
  perform five hundred million uninterrupted vector instructions
  without stalling. Furthermore, retrieving 2GB of data from the
  32-bit-wide 100MHz system bus in those computers would take five
  seconds. It really doesn't matter how fast the processor can
  chew through numbers if the system can only supply the numbers
  at a fifth of that speed.

  Apple's critics have been especially derisive of the company's
  choice of a "slow" bus between the processor and the rest of the
  system, typically called a front-side bus.
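
  The two-gigaflop claim and the five-second bus transfer can
  both be checked with a few lines of arithmetic. The Python
  sketch below simply restates the figures quoted above; the
  constant names are ours, and it assumes power-of-ten gigabytes
  and one bus transfer per clock cycle.

```python
# Peak vector rate: four 32-bit floating point operations per cycle at 500MHz.
LANES = 4
CLOCK_HZ = 500_000_000
peak_flops = LANES * CLOCK_HZ

# System bus: 32 bits (4 bytes) per transfer, 100 million transfers per second.
BUS_WIDTH_BITS = 32
BUS_CLOCK_HZ = 100_000_000
bus_bytes_per_second = (BUS_WIDTH_BITS // 8) * BUS_CLOCK_HZ

# Time to pull 2GB of operands across that bus.
DATA_BYTES = 2_000_000_000
seconds_to_fetch = DATA_BYTES / bus_bytes_per_second

print(peak_flops)            # 2000000000 (two gigaflops)
print(bus_bytes_per_second)  # 400000000 (400MB per second)
print(seconds_to_fetch)      # 5.0
```

  In this model the bus delivers in five seconds what the vector
  unit could consume in one, which is the "fifth of that speed"
  mentioned above.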
On the latest Power  Macintosh G4 (FireWire 800), the central bus is a PCI bus that  tops out at 167MHz. Multiply 32 bits by 167 million transactions  per second, and then double it because the system uses double  data-rate (DDR) RAM that supplies data on both the rising and  falling sides of each clock cycle, and you've got exactly 1.3GB  per second, the throughput Apple claims for that model. That's  still not enough to keep up with AltiVec even on a lowly 500MHz  G4, and it's even worse when you consider that both processors  in the dual-processor machines share the same bus.  Until 2001, the PowerPC G4 couldn't even accept a front-side bus  faster than 100MHz, so Apple used large, fast level 3 caches  talking directly to the processor to try to make up for it.  (It's called "level 3" cache because the latest members of the  G4 family actually have not one but two on-board caches, making  the external cache the third tier.) Cache RAM is fast and  somewhat expensive, but for typical applications, it is an  effective way of keeping throughput up by storing the most  frequently used parts of main memory where the processor can get  to them quickly. Still, for our example, it wouldn't be enough:  the 2GB of data the processor would need in one second would  completely run through the cache, and the system can only fill  the cache as fast as main memory can retrieve the data: 1.3GBps.  It would take two seconds to fill the cache with data that one  processor would rip through in less than half a second. It's not  bad in the general case because most programs don't rip through  anywhere near that much data at that speed, but in extreme cases  (such as image and video processing) it can become a bottleneck.  Programs that need high throughput have better luck on Intel  systems. Intel and AMD have been driving RAM and bus speeds  higher and higher to feed more data to their processors. 
With  the main system delivering data to the processor at up to 533MHz -  as fast as Apple's level 3 cache - Pentium 4 chips enjoy a  steady stream of data without an off-chip cache. In fact,  neither Intel's nor AMD's current chips can even use an external  cache. What would they need it for?  Motorola's main PowerPC development efforts focus on embedded  systems, where accelerated front-side buses (and the RAM that  goes with them) are expensive and unnecessary. IBM, however, is  in the workstation and server business, so the PowerPC 970  abandons Motorola's three-tiered cache philosophy. The  front-side bus on the PowerPC G5 runs at half the processor's  clock rate: 800MHz on the 1.6GHz model, 900MHz on the 1.8GHz  model - and a full 1GHz for the 2GHz chip Apple's using in the  high-end Power Macintosh G5, the fastest bus ever in a personal  computer.  Apple's marketing sound bite, "DDR 64-bit bus," is close, but  not quite exact, as Apple's Power Macintosh G5 architecture page  [9] shows. Each processor has a separate data bus with two  independent, unidirectional, 32-bit data paths. It takes two  cycles to send 64 bits to the chip, but the chip can send 64  bits back to the rest of the system simultaneously, so it's more  efficient than a single bi-directional bus - there's no fighting  for the right to send data one way or the other.  [9] <http://www.apple.com/powermac/architecture.html>  Sixty-four bits per cycle times a billion cycles per second (the  definition of 1GHz) gives exactly 8GB per second, a sixfold  increase from the Power Macintosh G4 (FireWire 800). In the  dual-processor 2GHz model, the two processors have independent  buses, providing throughput a whopping twelve times greater than  on the Power Macintosh G4 (FireWire 800).  Not surprisingly, Apple no longer mentions the advantages of a  level 3 cache in any of its marketing materials. 
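The bus throughput figures quoted above can be reproduced with a short Python sketch. It follows the article's own simplification of each bus (32 bits at 167MHz, doubled for DDR, on the G4; 64 bits per cycle at 1GHz on the G5). The function name is illustrative, 167MHz is taken as exactly 167 million cycles per second, and units are decimal, so treat the output as a rough check and not a measurement.

```python
# Rough check of the front-side bus throughput figures in the text.
# The function name and the decimal-unit convention are assumptions.

def bus_bytes_per_second(bits_per_transfer, clock_hz, transfers_per_cycle=1):
    """Peak bus throughput in bytes per second."""
    return bits_per_transfer * clock_hz * transfers_per_cycle // 8

# Power Macintosh G4 (FireWire 800): 32 bits, 167MHz, two transfers per cycle
g4_fw800 = bus_bytes_per_second(32, 167_000_000, transfers_per_cycle=2)

# 2GHz Power Macintosh G5: 64 bits per cycle at 1GHz, per processor
g5_2ghz = bus_bytes_per_second(64, 1_000_000_000)

print(g4_fw800 / 1e9)                 # 1.336, which Apple rounds to 1.3GB per second
print(g5_2ghz / 1e9)                  # 8.0
print(round(g5_2ghz / g4_fw800))      # 6, the "sixfold" increase
print(round(2 * g5_2ghz / g4_fw800))  # 12, with two independent buses
```

The ratios come out slightly under six and twelve (about 5.99 and 11.98) because the G4 figure is 1.336GB per second before rounding.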
IBM is following Intel's lead here: the PowerPC 970 can't even take an external cache. The PowerPC G5's on-chip level 1 instruction cache is twice as large as in previous PowerPC chips, and all caches have significant improvements, but they're all on the chip. The level 3 cache will continue to be a performance bottleneck for iMac and PowerBook models, as well as for the iBook should it ever get its long-denied G4 chip, but on the desktop, the level 3 era is over.

**The downside** -- So we've got a fast new processor with more and better functional units, a faster clock speed, a monster front-side bus, bigger registers that can address more memory, and the ability to execute many more simultaneous instructions than ever before. Is there a catch? Not much of one, to be frank. The PowerPC 970 is derived from IBM's POWER4 family, and is not an update of the PowerPC 7400 or 750 families. It's a new core designed by a different company, and as such, it has some architectural differences from the G4, but these differences are not drawbacks per se. For example, we've already explained how the deeper pipeline means that missed branch predictions are much more expensive. The chip compensates with beefier branch prediction logic that, according to IBM, is right about 95% of the time. The chip can predict up to two branches per cycle, using up to three branch history tables and two algorithms to refine the selection, and it allows applications to provide hints to the processor about which way a branch will go for even better performance. Even with the occasional pipeline flush, it's still much faster than the G4 and (almost by definition) competitive with rival chips that use pipelines of similar size. The PowerPC G5 chips will be faster across virtually every application you can imagine, thanks to the clock rate bump and extra functional units.
But their different architecture means that some highly-tuned G4 code will need to be re-tuned to get the _very_ most out of the G5 series. For example, the PowerPC G5 uses 128-byte cache lines instead of the 32-byte cache lines employed by the G4. Code using a specific instruction to zero out 32 bytes of RAM now causes an inefficient load of 128 bytes of RAM from main memory, zeroing 32 bytes of that chunk and perhaps writing it back. It works, and it's probably faster than on a 1GHz G4, but it's a horribly inefficient use of the processor's resources. Technical Note #2087 [10], "PowerPC G5 Performance Primer", is full of very technical knowledge nuggets like that, which developers of highly-optimized applications will have to take into account when re-tuning their applications for the G5. Wasting a few bytes here and there to align code on 32-byte or 128-byte boundaries now makes some operations much more efficient than in previous PowerPC systems. Rearranging a few instructions can make the difference between using both floating-point units at once and leaving one idle while it waits for results from another. To be sure, it'll take some tweaking from third-party developers and Apple engineers alike to unlock the massive potential of the PowerPC 970 family, but that's a better problem to have than trying to squeeze more performance from a chip that has nothing left to give.

[10] <http://developer.apple.com/technotes/tn/tn2087.html>

**A SPEC of truth**

There seems to be only one hard and fast rule among the punditry for any performance tests Apple publishes for its machines: they _must_ be wrong. If Intel's x86 architecture comes up slower on a given benchmark, it means the benchmark has been "cooked" to make Apple look good and Intel look bad. It's the only explanation the punditocracy can imagine for any test that shows a PowerPC system keeping up with or surpassing an Intel system.
If you've been paying attention for the past five years, you  already know the drill. In 1998, Apple released BYTEmark  benchmarks that showed the 233MHz PowerPC G3 chip in the new  iMac kept up with Pentium II chips at speeds up to 400MHz, even  surpassing the Intel chips in integer performance. The cries  from the peanut gallery were deafening. They attacked the  BYTEmark tests as wrong or biased, even though they were  developed by non-partisan BYTE Magazine and had been a trusted  reference for many years. No pundit or analyst had ever had any  problems with BYTEmark until the tests showed the PowerPC G3  keeping up with the Pentium II. The brouhaha became so pitched  that BYTE had to post a FAQ [11] to address the propaganda. In  July of 1998, just a few months after the iMac's introduction,  BYTE ceased publication. We don't think there's a connection -  but who knows?  [11] <http://www.byte.com/bmark/faqbmark.htm>  More recently, Apple has taken to demonstrating PowerPC G4  performance by showing "real-world" applications, such as Adobe  Photoshop and Discreet Cleaner, showing the machines performing  a series of tasks far faster than top-of-the-line Pentium  systems can. Technology columnists attacked those tests as well,  asserting flaws ranging from "Photoshop is rigged to prefer the  PowerPC" to "there had to be some kind of hardware problem on  the PC." There's never been a scrap of evidence for any such  claims, and Adobe vehemently denies that the PC version of  Photoshop is somehow unoptimized.  So the naysayers then claimed that the reason Apple doesn't use  "real" benchmarks anymore is because they know that the PowerPC  can't measure up. (We imagine eyes rolled in Cupertino - the  press didn't like the synthetic benchmarks, but now they want  them back?) A little over a year ago, the German magazine c't  asserted that a 1GHz Power Macintosh G4 was only about as fast  as a 1GHz Pentium III system. 
To support that outrageous  distortion, the magazine used the well-respected SPEC tests  [12], but managed to break most of the standard rules for  reporting, including failing to document exact technical details  of the systems (MDJ_ 2002.03.13). To add insult to injury, c't  went even further awry: the PowerPC version of the GCC compiler  used to build the benchmarks was actually 18 months older than  the Intel GCC version used, and the reported results completely  obscured the role of operating system and other running programs  in the test. In fact, the magazine's tests showed a 20%  difference when running the same test, with the same compiler,  on the same system - depending only on whether it was run under  Linux, or under the higher overhead of Windows.  [12] <http://www.spec.org/>  Compilers - the programs that turn the programming languages  that programmers use into the binary code that processors run -  are extremely important to the SPEC tests. One binary program  can't possibly run on more than a single processor architecture,  so SPEC provides the source code to its suite of tests. SPEC's  synthetic benchmarks are designed to measure the quantity of  certain operations a system can perform in a given amount of  time. The results are normalized to numbers that are supposed to  provide a rough comparison of performance across systems.  That's _systems_, not _processors_. As you've seen with the  front-side bus, there's more to performance than clock rate. Any  system's performance varies depending on how fast the  motherboard can feed data to the processor. SPEC rules require  disclosing not only the processor speed but also the nature of  the motherboard used for the test, the organization that  performed the test (preferably a SPEC licensee), and the date of  the tests. The c't SPEC tests observed almost none of these  rules, even though the magazine's parent company is a paid  member of SPEC. 
In fact, c't implied that Apple had something to  hide by not releasing SPEC tests for the Macintosh until late  2001, even though it was in fact SPEC that had refused to  provide the tests because the classic Mac OS doesn't have the  command-line environment SPEC's CPU tests require.  The PowerPC G5 is the first new chip Apple has adopted since  SPEC released Macintosh versions of the benchmarks, and so quite  naturally, Apple performed SPEC tests to demonstrate how fast  the new Power Macintosh G5 systems are. Steve Jobs demonstrated  the results in his WWDC speech, and the details are available in  Apple's Power Macintosh G5 Performance White Paper [13]. The  tests compared a prototype Power Macintosh G5 system running a  prototype version of Mac OS X 10.2.7 (the G5-aware version)  against a Dell Dimension 8300 system with a single 3GHz Pentium  4 processor, and a Dell Precision 650 server with dual-3.06GHz  Intel Xeon chips.  [13] <http://www.apple.com/powermac/pdf/PowerMacG5_Perf_WP_062303.pdf>  You can check out the results for yourself, but roughly  speaking, the G5 outperformed the Xeon system by 30% in  floating-point operations and the Pentium 4 systems by 21%,  though the G5 system was about 5% slower than the Xeon system  and 10% slower than the Pentium 4 system in integer  calculations. On similar SPECrate tests that take advantage of  multiple processors, the dual-2GHz Power Macintosh G5 was 3%  faster than the dual-Xeon system and 67% faster than the single  Pentium 4 system (you can't put more than one Pentium 4 chip in  a system). SPECrate floating-point tests showed the Power  Macintosh G5 as 42% faster than the dual-Xeon system and 95%  faster than the single Pentium 4 system. The complete results,  including full SPEC-required system and configuration  disclosures, are available [14] from VeriTest, a company of  benchmark experts [15] Apple hired to conduct the tests.  
[14] <http://www.veritest.com/clients/reports/apple/apple_performance.pdf>
[15] <http://www.veritest.com/services/benchmark.asp?visitor=X>

And thus began the latest round of attacks.

**The howling** -- The first complaints about the testing came from "spl," apparently one of the developers at Haxial, makers of Mac OS and Windows software. In a "soapbox" entry [16] on Monday afternoon, the otherwise-anonymous "spl" said he was "very disappointed that Apple was attempting to deliberately _mislead_ me about the speed" of the Power Macintosh G5 (emphasis in original). The author's complaints, however, are largely without substance.

[16] <http://www.haxial.com/spls-soapbox/apple-powermac-G5/>

The author complains that VeriTest's configurations disabled hyperthreading for multiprocessor testing, but enabled it for single-processor testing where it would have "little or no effect." In reality, it's the reverse: hyperthreading makes one processor pretend to be multiple processors, and would have a lot more effect on a single-processor system than in one that already had real, multiple CPUs. Slashdot already had a phone interview scheduled Tuesday afternoon with Apple's VP of hardware marketing, Greg Joswiak, so the site's writer asked [17] Joswiak about the story. Joswiak said that the tests produced better results on the Dell systems with hyperthreading off than on, and that's why Apple posted them - to show the best Intel performance possible and show that the PowerPC G5 still beat it.

[17] <http://apple.slashdot.org/apple/03/06/24/2154256.shtml>

"spl" is absolutely convinced this can't be true, and is untroubled by his massive ignorance of both benchmarks and the PowerPC G5 chip itself.
For example, "spl" puts huge weight on  Apple's use of a "high performance, single threaded malloc  library" on the G5 system that "is geared for speed rather than  memory efficiency and is single threaded which makes it  unsuitable for many uses." (malloc is a standard C library  routine that allocates memory, and is how almost all portable C  programs, including SPEC's benchmarks, obtain RAM to use in  calculations; its performance is key to programs that request  memory often.) He sees a huge conspiracy in that Apple used this  "optimized" malloc library on the G5 systems but not on the Dell  systems, and says so several times using words in all capital  letters.  The truth, however, is that the malloc routines in the standard  GCC distribution for Intel systems are _already_  single-threaded. The default malloc routines in Apple's GCC  release are safe for multiple threads executing at once because  all Mac OS X programs are supposed to be thread-safe. This  introduces extra overhead, but makes sure that if more than one  program thread calls malloc at the same time, the library won't  step on itself and crash. That makes the Macintosh malloc  implementation slower than the version for Intel GCC, which is  not thread safe and therefore doesn't have the overhead.  Apple used a new single-threaded malloc implementation to match  the capabilities of the Intel version. To do otherwise would be  to saddle Power Macintosh G5 SPEC benchmarks with the rules of  desktop applications when the Intel versions had no such  restrictions. "spl" insists, without any evidence other than a  testing note, that this must give Apple's system some huge  advantage, when in fact it merely puts malloc on even footing on  both platforms. He also sees conspiracy in wondering whether  this new malloc will be the default on the Power Macintosh G5,  so let's clear that up: it won't, as it's not thread-safe.  
Normal Mac OS X programs must run under stricter rules than the command-line SPEC programs, and that involves additional overhead for safety. "spl" says that Apple used the "-fast" compiler option on the G5 test system to make computations faster by relaxing IEEE math rules, but did not use the "equivalent" option of "-ffast-math" on the Intel systems. That's also completely wrong. First, "-ffast-math" is a GCC option, not a platform-specific option like "-fast". Second, the normal GCC optimizations do not include "-ffast-math" because, at times, it gives wrong answers compared to strict IEEE computations. The SPEC tests are validating tests and require that systems get the expected results exactly correct, which "-ffast-math" can't guarantee. But using "-fast" is not cheating: "spl" is unaware that the PowerPC G5 has no non-IEEE math mode. The "-fast" flag tells GCC to use a G5-specific mode that doesn't generate extra code for IEEE compliance because the chip automatically follows IEEE rules, and in fact can't be made to perform any other way. (By the way, if you'd like more information, try searching Google for "ffast-math" and look at all the people, including Linus Torvalds himself, complaining about the poor, slow math on the x86 processor architecture.) Tellingly, this is the author's second attempt to find some justification for why the PowerPC G5's numbers must be fixed. His first was to complain that the "-mfpmath=sse" compiler directive used in the Intel tests didn't enable Intel's latest vector math units, but of course, even a simple visit to the GCC documentation would have shown that to be completely incorrect. Confronted with this, however, "spl" chose to come up with a different reason why he must have been right in the first place - a reason that is equally wrong.
He missed the one point he might validly have criticized: the compiler optimization flags tell GCC to let the PowerPC G5 use 64-bit math, so it doesn't generate 32-bit code that would take many more cycles to add larger numbers. The biggest complaint is "spl's" assertion that the Power Macintosh G5 can't be the "world's fastest PC" because the SPEC site contains test results for Intel-based systems with higher numbers than Apple's tests. For example, Apple's test rated the Dell Precision 650 (dual-Xeon) server with a SPECfp_base2000 score of 646, but Dell reported the same system with a score of 1053. The author says, "This is because Apple used the performance-enhancing 'relaxed IEEE math' option when they benchmarked the G5, but _not_ when they benchmarked the Dell computers." Uh, no. The Dell results [18] are higher because they use Intel's super-optimizing C++ and Fortran compilers. These compilers aggressively optimize for speed at the expense of RAM, to the point of being unusable in some real software situations. What's more, the Intel compilers are widely known to optimize specifically for the SPEC tests, generating better code for SPEC sources than for non-SPEC code that's very similar. Who says so? Try Peter Glaskowsky, editor of the Microprocessor Report, who has forgotten more about microprocessors than "spl" ever knew. According to CNet News [19], Glaskowsky "noted that Intel's chips perform disproportionately well on SPEC's tests because Intel has optimized its compiler for such tests."

[18] <http://www.spec.org/osg/cpu2000/results/res2003q2/cpu2000-20030407-02062.html>
[19] <http://news.com.com/2100-1042-1020631.html>

SPEC tests under such conditions are still valid for comparing other Intel-based systems using the same compiler, but they'd only be valid against other platforms that are also using a super-optimizing compiler.
That brings us to another laugher: "spl" says that it's unfair for Apple to use GCC and the NAGWare Fortran compiler because GCC "apparently is poorly optimized for the Intel [Pentium 4]." Anyone who would argue or repeat this is so clueless as to be dismissed out of hand. GCC is _the_ Linux compiler. Every single part of Red Hat Linux, and of most other Linux distributions, FreeBSD, OpenBSD, and other open-source operating systems, was built with GCC. The Linux development teams have been heavily involved in optimizing GCC's Intel code generation, and it is very good. To argue otherwise is basically to argue that the Linux community is incredibly stupid, and that's just not the case. If Apple had truly wanted to stack the compiler test, it would have used xlC, IBM's super-optimizing POWER and PowerPC compiler, the rough equivalent of Intel's C++ compiler. Instead, as Apple has said several times, it used GCC because it was available for both platforms with the same front end and good code generation used by hundreds of thousands of _real_ programs. Had Apple gone the Intel route and super-optimized the benchmarks, there's a decent chance they would have eclipsed even Dell's posted results. "spl" goes on to speculate that since SPECfp tests depend more on Fortran than C, it may be the NAGWare Fortran compiler [20] that's "bad for Intel" rather than GCC. This is also incredibly stupid. The Fortran compiler from NAG (short for "Numerical Algorithms Group") doesn't output binary code; it outputs C source code that you then feed to a C compiler - like GCC. The output from the NAGWare Fortran compiler should be exactly the same on both platforms, waiting only for GCC to compile it into binary code. This is another misunderstanding of an element so basic that you're now seeing why "spl" is completely in the weeds.

[20] <http://www.nag.co.uk/nagware_fortran_compilers.asp>

It goes on from here, but these are the basics.
"spl" dismisses  the SPEC rate tests because he incorrectly says, "most programs  are not written to take advantage of a second processor," either  ignoring or not knowing that Mac OS X automatically schedules  threads on any available processor. He complains that Apple  didn't test the "fastest" Pentium 4 chip, using a 3.0GHz  processor that had been "superceded by the 3.2GHz Pentium,"  while failing to note that the 3.2GHz model was introduced  _the_morning_of_the_keynote_, far too late for Apple to have it  independently tested and included in the presentation.  Similarly, "spl" blasts Apple for quoting too high a price for  the Pentium 4 system, but again, prices only fell that morning  when the new 3.2GHz systems appeared.  He says that since the AMD Opteron [21] is a 64-bit processor,  that somehow invalidates Apple's assertion that the PowerPC G5  is the "world's first 64-bit processor in a PC," but that  ignores that AMD specifically says the Opteron is for "servers"  and "workstations," not desktop PCs. Sure, the line has blurred  in recent years, but if you self-classify your chips as for  workstations, you're saying they're not for PCs, and AMD  probably knows more about it than "spl."  [21] <http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8825,00.html>**Sound and fury** -- Several news reports in the past few days  have noted the "controversy" over Apple's benchmarks, but nearly  every report has linked to "spl's" completely misinformed and  dangerously ignorant screed. The author clearly does not have  the expertise necessary to understand the role of compilers,  memory allocation libraries, IEEE math, or many other aspects of  benchmarking to correctly assess the issues he raises, nor do  thousands of stories that link to it.  It's even more distressing to see professional writers who  should know better echoing the same incorrect themes. 
Tom Yager  of _InfoWorld_ has become a true Macintosh fan over the past  several months, praising the platform in print at least five  times since December 2002 (MDJ_ 2002.12.04, 2002.12.17,  2003.01.24, 2003.02.12, 2003.04.23). Having shown an  understanding of Mac OS X, it's quite disappointing to see Yager  write [22] that Apple's SPEC results are "invalidated by  severely lopsided testing conditions," an assertion that is  simply without merit. Yager's evidence was that "Apple used a  prototype G5 running its special GNU compiler and an unreleased  version of Mac OS X," but used "shipping hardware, vanilla GNU  compilers, and Red Hat 9" for the Intel test.  [22] <http://weblog.infoworld.com/yager/2003/06/24.html>  Of course, there couldn't be tests if it weren't this way: the  Power Macintosh G5 is not shipping, so Apple has to test  "prototype" systems and a "special" compiler that, unlike  released compilers, has PowerPC G5 support. But more disturbing  is Yager's assertion that the tests aren't "objective" unless  all systems tested are tuned in hardware and software for  "best-case performance." Yager later withdrew his objections to  GCC, but maintains that vendors should be free to "pull out all  the technical stops."  Martin Reynolds of the Gartner Group makes a similar argument,  saying [23], "Apple said that using GCC on both systems allowed  a pure hardware comparison by eliminating common SPEC  optimizations. However, this approach also eliminated legitimate  optimizations, possibly slanting the test to favor the G5. Only  G5 performance results using a SPEC-optimized compiler will  resolve this question. Apple also used a set of multimedia  benchmarks that are difficult to verify and include heavily  optimized PowerPC code. These results - which we are inclined to  find credible - would be more convincing if not for Apple's  approach to the SPEC results."  
[23] <http://www3.gartner.com/DisplayDocument?doc_cd=115876>  (In case you didn't follow that, Reynolds is simultaneously  arguing that Apple's SPEC results are invalid because they're  not heavily optimized, but that the multimedia tests are  unconvincing because they're heavily optimized. Being an analyst  is great work if you can disengage your sense of logic.)  In essence, both Reynolds and Yager are again making the  compiler argument: if Apple wants to claim the "world's fastest  computer," they're saying Apple should be able to beat the best  SPEC numbers any PC maker can generate with Intel's best  compiler. Even further, they're saying that tests using anything  less than a super-optimizing compiler are unfair, no matter how  unrealistic it is to use that compiler in other circumstances.  But, again, experts admit that Intel's compiler has  _special_code_ in it to make SPEC tests faster than other  programs with the same types of code would be. Rather than  condemning this policy of constructing the compiler to game the  test, Yager and Reynolds are saying that every manufacturer has  to do the same thing and produce compilers with the same kinds  of "optimizations." In this view, it's not about actually  comparing performance, it's about finding a way to get the SPEC  number just a little bit higher.  Is that really how the experts  want to treat the closest thing available to a cross-platform  benchmark?  One article [24] at the AMD Zone said AMD held all the titles  Apple did through a clever mechanism: author Chris Tom just  decided that the Power Macintosh G5 is really a workstation, and  therefore AMD-powered workstations had long eclipsed its  features. Another widely cited article [25] from _ExtremeTech_  is easily dismissed. Author Mark Hachman attacks Apple for using  BYTEmark tests five years ago (once again) without suggesting  what benchmark Apple should have used, since SPEC didn't create  Mac OS benchmarks. 
He questions the use of two developer tools to turn on PowerPC G5 modes that the prototype Mac OS X 10.2.7 does not yet set on its own, and again questions the simple idea of putting malloc on the same footing on all platforms.

[24] <http://www.amdzone.com/articleview.cfm?articleid=1296>
[25] <http://www.extremetech.com/article2/0,3973,1136018,00.asp>

He doesn't understand why compilers make a difference, and he calls "spl's" article "a very damning analysis." There's nothing new here, including the lack of technical understanding from a writer who is questioning technical concepts he admits he does not understand. You, however, have to understand that there was no way for Apple to avoid this kind of baseless condemnation. If Apple had used a super-optimizing compiler in SPEC tests, pundits would have pointed to it and claimed the tests were rigged. Had the company not released benchmarks at all, detractors would have said it was because the numbers proved Apple's speed claims were really false. Apple's reasonable choice - to make the SPEC subsystems as equal as possible - is now condemned for not being insanely optimized. These critics really need to consider what information they hope to glean from synthetic benchmarks before pushing the industry into using whatever unrealistic assumptions it can to get the numbers higher.

**The G5 future**

You know Apple is on to something when people feel compelled to criticize it months before it arrives. It happened with the iMac, and it's happening again with the Power Macintosh G5. A PowerPC chip that kicks the pants off both Pentium 4 and Xeon processors challenges a lot of assumptions, and it's far easier for some people to question the performance than change their beliefs. Even sources you should be able to trust are piling on compilers this week, so it's not easy to know what's based in fact and what's cognitive dissonance.
The truth of the matter is that the PowerPC 970 is the processor  Apple has been waiting for, and it's even better than anyone  anticipated from last fall's previews. It's not just 64-bit  registers, it's not just extra functional units, it's not just a  dramatic increase in clock speed, and it's not just the fastest  front-side bus ever seen on a personal computer. It's all of  those things, plus full native penalty-free 32-bit operation in  an architecture that has room to grow. At the keynote, IBM and  Apple promised to meet the performance curve in the next year,  moving to 3GHz within twelve months - a 50% increase in one  year, just as it should be.  Perfectly sane benchmarks show that, in essence, the 2GHz  PowerPC G5 has regained parity with Intel's current fastest  models. However, since the Velocity Engine in the G5 ("AltiVec"  is a Motorola trademark that is not properly applied to IBM's  PowerPC 970) still far outshines the vector processing unit in  any of Intel's chips, Apple was able to show media applications  like Photoshop and Luxology running two or more times faster  than 3GHz Pentium 4 or Xeon systems. As long as the G5 chips  gain speed approximately as fast as Intel's processors, the  Power Macintosh family should be able to maintain that  significant edge - at least until Intel gets better vector  processing units.  The PowerPC G5 is a long way away from use in iMacs or  PowerBooks. It's unsuitable for portables because it's large and  generates a lot of heat. IBM's adoption of popular  high-performance processor techniques also puts an end to  another PowerPC hallmark - using the same chip in desktop and  portable devices. The only way to get a G5 into a PowerBook  would be to step down the clock rate by about 30%, but a  PowerBook could not take heat-generating performance features  like the fast front-side bus or massive amounts of very fast  RAM. 
Even if the processor could fit, performance would be far  below what you'd expect from a similarly-clocked desktop unit.  Even so, it's been the professional market that's lagged. Apple  and its customers have waited a long time for this. The  year-plus decline in Power Macintosh sales must continue through  the July quarter, of course, as the Power Macintosh G5 won't be  available until August, and that's if all the schedules work  correctly. After that, though, the turnaround begins.  People who haven't been excited by new Power Macintosh G4 models  have been drooling over the Power Macintosh G5 since the moment  it was announced. It's Apple's most exciting product in years -  not just because it's a quantum leap forward in the  architecture, but because it's the first in what should be a  long line of Power Macintosh G5 computers that will each improve  upon the last.-----------------------------------------------------------------  MDJ_, The Daily Journal for Serious Macintosh[tm] Users, is  published by GCSF, Incorporated.  Publisher:           Matt Deatherage    <mattd@macjournals.com>  Staff:               Justin Seal       <justin@macjournals.com>                       Nathaniel Irons    <irons@macjournals.com>                       John C. Welch     <jwelch@macjournals.com>                       Jerry Kindall    <kindall@macjournals.com>                       John Gruber       <gruber@macjournals.com>  Copyright (c) 2003 GCSF, Incorporated.  All rights reserved.  All trademarks are the property of their respective holders  and owners.  The symbol **[D]** indicates potential conflicts of interest,  but for readability, the necessary disclaimer has been omitted  from the text. Such disclaimers may be found online at  <http://www.macjournals.com/disclaimers.html>.  MDJ_ contains news, information, strong opinion, parody, biting  sarcasm, and things you need to know.  
Those easily offended  should seek information elsewhere.  Humans often answer the telephone between 10 AM and 6 PM Central  (US) Time, Monday through Friday. Voicemail is available at any  hour.  This file is formatted as setext.  For more information, send  email to <setext@tidbits.com>.  A file will be returned shortly.  It is also digitally signed using PGP technology to verify the  integrity of the transmission.    Our DH/DSS corporate PGP key maybe obtained at  <http://www.macjournals.com/pages/gcsf/gcsf_keys.html#Anchor-GCSF_DSSKey>.    GCSF, Incorporated.  P.O. Box 1021  El Reno, OK  73036-1021  (405) 262-1399  <mdj@macjournals.com>-----BEGIN PGP SIGNATURE-----Version: PGP 8.0.2iQA/AwUBPvriX7H4QMSEyVHCEQJWYQCgt4nrd7UWmUwV8A1jWNoS47HKiBMAoIrIOV4VKArD9mWioYunZfksM5EP=P6vi-----END PGP SIGNATURE-----