Analyst Merv Adrian takes on the question, “What, Exactly, Is ‘Proprietary Hadoop’?” in a Gartner blog entry, writing about the popular, open-source, parallel-processing software framework. The question is broader, however: What, Exactly, Is “Proprietary Software”? The answer is definitely NOT “Anything Not Open Source.” That’s because open-source software, as formally defined, is subject to owner-imposed license restrictions and is therefore proprietary.
I’ll develop a supporting argument. Let’s start with the software in question, Apache Hadoop. You can acquire Hadoop not only via download from the Apache Web site — Hadoop is an Apache project — but also packaged with non-core, added-value elements in distributions provided by entities that include Cloudera, Hortonworks, and MapR. So Merv cites the Merriam-Webster definition of “proprietary” and proposes an answer, that proprietary Hadoop be defined as “‘distribution-specific’.”
The Distribution Wars have long raged in the Linux world (or maybe that should be the “GNU Operating System world,” and No, I don’t wish to take on the word “free”). Hadoop is merely a newish battlefront. If you are interested in an overview of Hadoop distros, check out, for instance, Jeff Kelly’s Wikibon write-up, Hadoop: From Innovative Up-Start to Enterprise-Grade Big Data Platform.
Merv Adrian exploits a teachable moment. He examines the varying senses, and misapplications, of the word “proprietary,” and states that “the use of ‘proprietary’ [in referring to Hadoop distributions] is both inaccurate and not to the point.” I agree, and will pile on: The use of “proprietary” here is valueless. As I’ve stated, Hadoop is itself proprietary, regardless of distribution.
Consider the Merriam-Webster definition of “proprietary” that Merv cites: “‘Something that is used, produced, or marketed under exclusive legal right of the inventor or maker’.” As an Apache project, “The source code of the Apache™ Hadoop® software is released under the Apache License,” according to the Apache Wiki. That license describes terms, imposed by the “inventor or maker” (namely the Apache project), on any use, production, or marketing of Hadoop. It also governs derivations and packagings, namely Hadoop distributions.
I stated that all open-source software is licensed and therefore proprietary. My touchstone is the Open Source Initiative’s formal definition of “open source.”
The Apache project sees it that way. Note the trademark and registered-trademark symbols in the text I quoted above, referring to “the Apache™ Hadoop® software.” These symbols are marks of ownership: assertions of exclusive legal right to control the use of the marked terms, and so restrictions on your uses of the terms. Hadoop is, itself, proprietary, as is Apache, as is open source.
In the software world, “proprietary” does not mean “specially packaged,” nor does it mean “non-open source.”
There is an opposite of “proprietary.” The opposite of “proprietary” is “public domain.” Intellectual property that is in the public domain is simultaneously public property and no one’s property: You can do anything you want with it, short of claiming ownership and the right to impose restrictions on others’ use of it.
Now let’s get back to what’s important to business. As Merv puts it, “Consumers need to buy what will work, and want to buy what will be supported.” Regarding Hadoop distributions and anything else that’s for sale (or available for free download), our focus should be on usefulness and value.