MWJ Analysis: Yo Ho Hokum

Annual Piracy Figures Still Based on Imagination

Reprinted from MWJ 2000.06.03

Reading news on the Internet last weekend almost certainly led you to one story or another about how software piracy is still a huge, undeniable, intractable problem for the world's software makers. A raft of stories reflected on piracy losses in countries around the world, and the message is uniformly bad: there's too much piracy in the world and it stifles software innovation. The headlines tell the story: "Over half of software used in Hong Kong pirated", "Software piracy losses in Singapore rise", "Canadian economy loses over US$660 million to software piracy", "Software makers' piracy losses rise in Middle East", and "W. Europe piracy costs software makers US$3.5 billion". The Prime Minister of Malaysia has already reacted to the idea, saying that software makers will continue to suffer from piracy until they make products at prices Southeast Asian customers can afford.

What prompted all this soul-searching over stolen software? The annual report on the subject from the Software & Information Industry Association and the Business Software Alliance. If "SIIA" doesn't quite sound familiar, we'll remind you that it's the result of a late 1998 merger between the Information Industry Association (IIA) and the Software Publishers' Association (SPA). The BSA, more so than SIIA, is an industry trade group representing software developers, and is actively involved in anti-piracy education programs and in trying to get those caught pirating software punished more severely.

The press release for this year's report makes the case very gravely. "Five Years: US$59.2 Billion Lost. Software Industry Suffers From Cumulative Impact of Global Software Piracy; Publisher Losses Total US$12.2 Billion in 1999." The details inside are just as grim. "The 1999 software piracy estimates indicate that more than one in every three business software applications in use during 1999 was pirated. Piracy losses for the U.S. and Canada lead every other region of the world at US$3.6 billion, or 26% of the total. The continuing problem means lost jobs, wages, tax revenues, and a potential barrier to success for software start-ups around the globe." US$975 million in lost revenue in Japan, US$165 million in both Poland and Russia, and on and on.

Small wonder the world's press gives this such attention. There's only one catch: the figures are pure imagination. The methodologies for the annual piracy survey are either masked with the labels "proprietary" and "confidential," or they're based on methods of estimating software sales that are so discredited that they've disappeared from the public discourse. The more you look at the anti-piracy survey, the less you see. Let us show you why.

The SPA Data Program

To understand why the anti-piracy numbers are invalid, we must first go back a few years to the SPA Data Program, an initiative of the Software Publishers' Association, now part of SIIA. The SPA Data Program used proprietary and confidential information from its member companies, representing a huge chunk of the US software market, to estimate overall software sales in the US and other regions of the world. Each quarter, SPA would disgorge press releases that typically showed how much the software market had grown over the previous year, helping the trade association convince lawmakers and investors that the software industry was of growing importance.

Flawed Methods

During Apple Computer's troubles in 1996 and 1997, these reports were devastating to the perception of the Macintosh business. Each quarter, SPA told the world--and an attentive trade press--that Macintosh sales were tumbling over the prior year. For Q2 1996, Macintosh software sales were said to be down 21.1% (MDJ 1996.09.26). For Q3 1996, down 36% from the same period in the prior year (MDJ 1996.12.23); a 28% decline in Q4 1996, and a 24% drop for the year 1996 over 1995 (MDJ 1997.04.02). The figures, repeated in the nation's business and computer press, told the world that the Macintosh market was falling apart and none too slowly, increasing public perception that Apple Computer was nearing the end and further slowing needed sales.

But when you looked more closely, the numbers refused to add up. In early 1996, SPA told the world that Macintosh software sales for 1995 totaled just over US$1 billion, a total the organization listed as "down 13.4%" from 1994 levels. In early 1997, however, SPA said that worldwide Mac OS software sales for 1996 came to US$1.170 billion--a "24% decline" from 1995 sales. Wondering how moving from US$1.0574 billion to US$1.170 billion is a 24% decline? That's easy--in early 1997, SPA quietly revised the numbers for 1995. When SPA released a new round of figures, the group also included figures from the same period in the prior year so everyone could calculate the comparisons for themselves. The problem is that these "year-ago" figures usually differed from the figures for the same market segments originally released a year earlier.

In this case, in 1997, SPA released new figures that said the Mac OS software market for 1995 totaled US$1.5427 billion in sales, not the US$1.0574 billion that had been announced in early 1996. When the 1997 figure of US$1.170 billion for 1996 was compared to the revised figure of US$1.5427 billion for 1995, it indeed looked like a 24% drop. If you compared what SPA originally said for each year's sales figures, the Mac OS market showed real gains in almost every period where the SPA had happily announced it was declining.
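The arithmetic is easy to check for yourself. The short Python sketch below uses only the three figures quoted above, in billions of US dollars; the function and variable names are ours, chosen for illustration.

```python
def pct_change(old, new):
    """Percent change going from old to new."""
    return (new - old) / old * 100.0

ORIGINAL_1995 = 1.0574  # 1995 Mac OS sales as announced in early 1996
REVISED_1995 = 1.5427   # 1995 Mac OS sales as restated in 1997
REPORTED_1996 = 1.170   # 1996 Mac OS sales as announced in 1997

# The same 1996 figure measured against each version of 1995.
vs_original = pct_change(ORIGINAL_1995, REPORTED_1996)
vs_revised = pct_change(REVISED_1995, REPORTED_1996)

# How much the 1995 figure itself grew between the two reports.
restatement = pct_change(ORIGINAL_1995, REVISED_1995)

print(f"1996 vs. original 1995 figure: {vs_original:+.1f}%")
print(f"1996 vs. revised 1995 figure:  {vs_revised:+.1f}%")
print(f"Size of the 1995 restatement:  {restatement:+.1f}%")
```

Measured against the figure SPA originally announced, the 1996 number is a gain of roughly 10.6 percent. The 24 percent decline appears only after the 1995 figure is restated upward by about 46 percent.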

This turned out to be the most egregious of several fatal flaws in the SPA Data Program. Although SPA crowed that its members accounted for 85% of North American software sales (bizarrely restricting "North America" to north of the Rio Grande), the group refused to reveal how many companies participated in the quarterly software surveys--and before MDJ raised that as an issue, the group's non-US data program press releases routinely listed participation as small as thirty companies that it would not identify. If several of those companies had no significant Mac OS sales--like Borland or Oracle--the overall results would be skewed. Data from member companies was used as submitted with no outside verification--if a company claimed that 100,000 hybrid CD-ROM sales were all Windows sales, SPA had no data foundation to question such a report. Also, since SPA membership at the time started at US$750 per year and rose as high as US$50,000, depending on annual software sales, many of the smaller Mac OS developers doing the most innovative work (like GoLive Systems, later purchased by Adobe) were probably not members. SPA refused to release enough information for anyone else to decide if the reporting sample was useful or not, calling the numbers "proprietary."

Weak Explanations

What about those changing numbers? In an October 1996 interview with MDJ, SPA spokesperson David Phelps said that as more accurate information came in to SPA after the original reporting deadline, the organization naturally incorporated that data into its models for future use. In other words, if a company had significant Mac OS utility sales but failed to report them to SPA before the initial deadline, SPA would release figures excluding those sales because SPA knew nothing about them. If the company later reported them, SPA would add those sales to the totals--but never issue any kind of advisory that the numbers had been revised.

SPA tracked software sales across more than a dozen categories. Several of these categories--word processors, spreadsheets, databases, integrated programs--showed almost no change from the initial reports to the revised versions a year later. Other categories were not so stable--in our example of 1995 Mac OS sales as reported in both 1996 and 1997, the 1997 figures were twice what the 1996 report said. Similar large gains between reports were recorded in personal information management, home education, and drawing and painting software categories. We also found that other software categories had similar gains a year later, including most Windows software categories and even some DOS and UNIX sales. The difference was that the revision changed those figures into larger gains. Only for Macintosh software did the revisions change the figures from gains to losses, a difference in sign that made all the difference in the world to public perception.

When the issue became more public, including stories in MDJ and MacWEEK, SPA changed its tune and vehemently denied that any figures were "revised." Comparing what were essentially preliminary figures to later, more mature figures is inherently invalid, and SPA did its best to portray its actions as not "revising" the figures, though that is plainly what the organization was doing. In April 1997, SPA research director Jim Sanders told MacWEEK that any differences were due to unit shipments: "The whole point of a shipment methodology is that you have adjustments, it's the difference between sell-in and sell-through." As we said at the time, "Sell-in is the number of units software companies sell to distributors and dealers; sell-through is how many of those actually make it into customer hands. If inventory is high at dealers and distributors--warehouses full of unsold products--then sell-through is lower than sell-in. Sanders was saying that the earlier numbers measured sell-in while the year-later numbers more accurately measured sell-through, but he weakened his own case by admitting this. Comparing sell-in to sell-through is invalid by definition." (MDJ 1997.07.21) (The MacWEEK article was originally available online, but the URL has changed over the years, and MacWEEK.com's archive was not responding for us at press time.) Sanders didn't believe there was a problem, anyway, because he told MacWEEK that other analysis showed "the Mac platform is in decay." He had no studies or evidence to support this--just the empirical notion that the Macintosh was in trouble, fueled in part by his own group's flawed software studies.

Most people saw through this insufficient explanation, and SPA remained on the defensive in April 1997. At one point, Sanders posted an open letter responding to similar criticisms on Joe Ragosta's now-gone Web site, this time returning to the first explanation: the later data is more accurate because it reflects late-reported sales and returns that happened after the first reporting deadline. But that went back to the first problem: comparing mature data to immature data is never valid. Sanders, faced with this fact, chose to blame SPA's critics: "The advantage of this methodology is that we are always dealing with the most recent and most complete information about sales for the period. However, it is confusing to readers who are not familiar with the methodology."

Sanders also told Ragosta that the SPA always included "an analysis of sales by non-participating companies to create a total industry analysis." Pray tell, how did they accomplish that? There are only two choices: SPA either extrapolated from its existing data, magnifying any errors in the sample, or it used a set of assumptions about how large the overall market was and what percentage their sample represented. In that latter case, the assumptions and calculations should have been made public so people could decide for themselves if the methods were useful. SPA refused.

Two weeks later, MacWEEK ran another story on SPA numbers with damning quotes from member companies, who told the magazine that SPA would ask for future projections and use them even though they later turned out to be incorrect. SPA answered with a public FAQ about the data program, which included this gem: "Each year's press releases have been constructed to stand alone and compare only to the previous year's data contained in that release. Differences in data from an earlier press release do not indicate that revisions have been made to the sales data. The program has been designed to create a snapshot of software sales performance, using current participants and comparing their performance against their own last year's performance. The audit approach provides particularly meaningful percent changes between periods as well as overall market condition. The SPA Data Program is not designed to be a time series."

In other words, SPA massaged the data for each press release to fit the companies that happened to report for that time period, using confidential methods (and assumptions inherent to them) to adjust for companies that reported for one period but not another. Despite Jim Sanders explicitly telling critics that later numbers reflected later and more accurate data, the FAQ again denied this, saying changes were due to "construction." That construction, by the way, made comparing any press release to any other "invalid." What SPA was admitting, but trying not to directly say, was that the numbers they were releasing had no basis in reality. You couldn't tell if the real numbers for 1995 were US$1 billion or US$1.5 billion or any other number because they were altered, every quarter, to fit SPA's idea of the current quarter's figures. Sanders told MacWEEK, "We are being asked to do what we don't do--which is provide a trackable data set that can be used to describe the software market at any particular time." But SPA's own press releases made exactly those claims: absolute figures for software sales that the group trumpeted to show how big and important the software market was.

The house of cards finally tumbled in July 1997, when SPA "suspended" the data program. An announcement on the group's Web site at that time said, "The SPA Data Programs are currently suspended. SPA is using this time to determine new ways to ease reporter burden while maintaining the shipment- and country-level data participants have come to expect. Details will be posted about a new software sales tracking program as they become available." "Ease reporter burden" is code: the organization wanted reporters to take its word for software sales numbers, as they had until late 1996 when it became obvious that SPA was playing games with the figures to suit its own purposes. In the process, its data program proved devastating to the same Macintosh developers whom the group purported to represent.

That was nearly three years ago. No new data program has appeared to replace the old one.

The Anti-Piracy Survey

So why resurrect this ghost of bad numbers past? Because, three years later, the spirit of the SPA Data Program is not yet dead. The massive news generated last weekend by the annual anti-piracy survey has, at its heart, the same flawed model that doomed SPA's efforts to boast about the software industry--but this time, the errors are magnified again and again to the point that we seriously wonder whether the numbers have any relationship to reality at all.

Measuring illegal activity is always a difficult task, since by definition those responsible for it really don't want to stand up and be counted. The 1999 anti-piracy survey, conducted for SIIA and BSA by the International Planning and Research Corporation (IPR) as in years past, has as good an idea as any. IPR proposes each year to measure the demand for software worldwide, broken down into several demographic groups. It then proposes to measure how much software was actually sold in 1999. The difference, if any, is piracy--the number of programs in use above what can be shown to have been purchased. After that premise, however, IPR's methodology goes downhill fast.

Measuring Demand

IPR's way of measuring software demand is flawed from the start. As it says in the report, "PC [unit] shipments for the major countries were estimated from proprietary and confidential data supplied by BSA and SIIA member companies. The data was compared and combined to form a consensus estimate, which benefited from the detailed market research available to these member companies." Right away, we're in trouble. BSA and SIIA members, aside from Apple Computer, are not typically hardware makers. They're software vendors, but they're being asked to supply hardware figures for the countries covered by the anti-piracy survey. IPR doesn't go to leading market research firms like Dataquest or IDC, whose methodologies are generally known.

That's probably because those figures are public, and that might give away a potentially embarrassing assumption. After estimating worldwide PC shipments, IPR then attempts to calculate the number of software applications "installed per PC shipment." This sounds like it means the number of applications bundled on each computer, but it's really just the number of applications installed on each new PC, whether they were bundled or user-installed. The key number here is the total number of applications installed. Disguising it as a ratio to PC sales doesn't change that the total number of applications on each machine is what will determine the overall piracy rate, once the number of applications sold is subtracted from it. And where does IPR get this all-important number? "From market research provided by member companies," the same way the old SPA Data Program used to work.

That's bad enough, but it leaves other questions open. Specifically, do these estimates of applications shipped worldwide include software from non-US vendors? Imagine a new PC sold in Japan. It probably comes with some US-made software and some Japan-made software, but Japanese companies aren't members of BSA or SIIA. If the estimates don't include non-US applications, the demand estimates aren't realistic for non-US regions. If they do, there are issues on the supply side, as we'll explore later.

Once IPR has these numbers ready to form ratios, they break them down into several demographic categories: home vs. non-home usage, new PCs vs. "replacement" PCs (those replacing older computers as upgrades), one of five levels of "technological development" classes to measure how sophisticated the computer users may be, and one of three software categories. That, too, is problematic. The study says, "Piracy rates can vary among applications. Grouping the software applications into three tiers and using specific ratios for each tier further refined the ratios. The tiers used were General Productivity Applications, Professional Applications, and Utilities. These were chosen because they represent different target markets, different price levels, and it is believed, different piracy rates." Already it looks like assumptions are influencing the study: IPR's wording makes it look like it is dividing estimated demand into groups because it believes one group has higher piracy rates than another. That's what a study like this is supposed to determine, not what it should start out believing and try to prove.

The study goes on, "From this work, a total software applications installed estimate was calculated for a large set of major countries and 'rest of region' areas totaling worldwide shipments." Which countries were those? "These countries and areas are those that were covered in the SIIA data program." In other words, the SIIA and BSA member companies--all corporations that have a vested interest in tougher anti-piracy laws--come up with the estimates of how many applications are actually shipped in a given year, whether sold or not, using "proprietary and confidential" information that is beyond anyone's review or sanity checking. Some of the ratios calculated are also apparently determined because of belief that those particular market segments suffer from higher piracy rates, poisoning the survey from the start.
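To see how much rides on that one ratio, consider a minimal sketch of the calculation as the report describes it: estimated installations (PC shipments times applications installed per PC) minus legitimate sales equals pirated copies. Every number below is invented for illustration, since the report publishes none of its inputs, and the function name is ours.

```python
def piracy_rate(pc_shipments, apps_per_pc, apps_sold):
    """Share of installed applications not accounted for by sales.

    Follows the premise described in the report: demand minus supply.
    """
    installed = pc_shipments * apps_per_pc     # estimated demand
    pirated = max(installed - apps_sold, 0.0)  # the difference is called piracy
    return pirated / installed

# Hypothetical inputs, NOT taken from the survey.
PC_SHIPMENTS = 1_000_000
APPS_SOLD = 2_400_000

# Vary only the assumed applications-per-PC ratio.
for ratio in (3.0, 3.5, 4.0):
    rate = piracy_rate(PC_SHIPMENTS, ratio, APPS_SOLD)
    print(f"{ratio:.1f} apps per PC -> {rate:.0%} piracy")
```

With sales held constant, moving the assumed ratio from 3 to 4 applications per PC doubles the computed piracy rate in this made-up case, from 20 percent to 40 percent. Nothing about actual piracy changed; only the assumption did.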

Measuring Supply

So if the measurement of demand is already so tainted, can the measurement of supply be much worse? Oh yes--much worse. Here it's best to let the methodology in the study speak for itself:

"For the 1995 and 1996 piracy studies, the main source of data for software shipments was the SIIA Data Program [known then as the SPA Data Program--Eds]. However the SIIA Data Program ceased operation in 1997. The challenge for the subsequent studies is to replace that data source.

"Our approach was to utilize the member companies of BSA and SIIA to develop piracy study sponsors who would volunteer their proprietary shipment data to the study under non-disclosure agreements for the purpose of constructing an accurate estimate of the software industry's 1999 shipments. This became the primary source of software shipment data.

"Because the SIIA Data Program was active until early 1997, we could continue to use the SIIA Data Program as relatively fresh estimates of the software industry, and could use it to gage the results of the data collection from our sponsors in determining software shipments for the total industry. This became our crosscheck and led us to believe that our data collection provided reliable and consistent estimates."

Make sure you understand this: the invalid, discredited, forced-out-of-business SPA Data Program is the anti-piracy survey's model. The study used it for its first two years, and once it was dismantled for being unrelated to reality, the study basically adopted the same methodology. IPR gets confidential information from the same companies that contributed to the SPA Data Program. Much like that program, the deadline for one year's data is shortly after the beginning of the next year. SPA used to release data for one year in April of the following year; the anti-piracy report has come out in May. Will the reporting companies fill in gaps later and double their sales numbers in some categories, as they did for some segments of the Mac OS software market? No one knows--and since IPR doesn't revise each year's report later, as SPA used to do, no one is likely ever to know.

But it gets worse. The BSA and SIIA members are, for the most part, US software companies. They don't reflect worldwide software sales except to the extent that consumers in other countries buy a lot of US-made software (like Microsoft Office, AppleWorks, and so on). If a consumer in Japan purchases a computer and several pieces of Japanese-made software, the BSA and SIIA numbers can't track those applications because their member companies didn't make or sell that software.

How does IPR adjust the SIIA Data Program-inspired figures to reflect worldwide sales? IPR fudges them, that's how. The survey firm applied two "uplift" factors to the number. One is intended to adjust the numbers from just the responding companies to represent all US software publishers, and a second adjusts figures from US software publishers to those shipped by all software publishers. These factors are vital to the survey, for they reflect IPR's assumptions on how much of the world market their responding companies represent--and they're secret. Just as you're not allowed to know how few companies are responding, you're not allowed to know how far that data is extrapolated.

The international problem may rear its head again at this point. If the demand figures omit non-US software, then including an estimate of non-US software production in the supply estimates would make the piracy numbers far too low for countries other than the US. If the demand figures do include non-US applications, as we suspect, then the piracy estimates for those countries are almost entirely dependent on IPR's secret and unexaminable estimate of how important the responding companies are to worldwide software sales. Either way, these widely-reported figures of rampant piracy, particularly in Southeast Asia, are entirely dependent on estimates that no third party can examine or validate. That's not a story that should get major news coverage.
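A minimal sketch shows how sensitive the result is to the two uplift factors. The values are hypothetical, since the real factors are secret, and the function and parameter names are ours rather than IPR's.

```python
def estimated_supply(reported_units, us_uplift, world_uplift):
    """Scale sponsor-reported shipments up to a worldwide estimate.

    us_uplift:    responding companies -> all US publishers
    world_uplift: all US publishers    -> all publishers worldwide
    """
    return reported_units * us_uplift * world_uplift

def piracy_rate_from_supply(installed_units, supply_units):
    """Fraction of installed applications not matched by shipments."""
    return max(installed_units - supply_units, 0.0) / installed_units

# Hypothetical inputs, NOT taken from the survey.
INSTALLED = 10_000_000  # demand estimate
REPORTED = 4_000_000    # units reported by responding companies

# Same reported data, three different pairs of assumed uplift factors.
for us_uplift, world_uplift in ((1.2, 1.1), (1.3, 1.2), (1.4, 1.3)):
    supply = estimated_supply(REPORTED, us_uplift, world_uplift)
    rate = piracy_rate_from_supply(INSTALLED, supply)
    print(f"uplifts {us_uplift} x {world_uplift}: piracy {rate:.0%}")
```

In this made-up case, modest changes to the two multipliers swing the computed piracy rate from 47 percent down to 27 percent with no change in the data the companies reported. That is why keeping the factors secret makes the headline figures impossible to evaluate.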

The Fatal Flaws

There is no evidence that IPR, BSA, or SIIA intends to deceive anyone with this survey. Even so, the numbers have no integrity. Every key metric in the survey is either based on a proprietary estimate, or on private and "confidential" information provided in the style of the SPA Data Program. History proved that method to be inaccurate and unsustainable when examined more closely. BSA and SIIA member companies pull an estimate of the number of software applications used worldwide out of a hat, and then they pull a similar estimate of how many were sold out of another hat. The difference is "piracy," and for five years in a row, the figures support BSA and SIIA legislative agendas for stricter piracy controls, especially in Southeast Asia.

For anyone to take these figures seriously, be they press or policy makers, the groups involved need to be much more forthcoming about how these estimates are constructed, about how many companies are participating in the survey (so independent observers can judge if the sample size is large enough to be significant or not), and about how stable the data turns out to be over time. Until then, you can safely look at the news stories about BSA and SIIA piracy information and laugh them off. The methodology involved is so flawed that no level of accuracy can be presumed for the data. Piracy may be a serious global problem, but this study doesn't prove it.


GCSF, Incorporated
P.O. Box 1021
El Reno, OK 73036-1021
(405) 262-1399
mwj@macjournals.com

Copyright © 2000 GCSF, Incorporated. All rights reserved. All trademarks are the property of their respective holders and owners.


© 2008, GCSF, Incorporated. All Rights Reserved.
This page last modified on Sunday, March 9, 2008 1:28 AM