Beyond Thesis: Does the GPL go too far and what makes a derivative work?

After analysis by several people (myself included) of the inclusion of GPL’ed WordPress code, I think the debate over Thesis has mostly subsided, with Chris on the losing end.  However, the reason the debate was so huge has a lot more to do with what people thought Thesis was doing and why they felt it should (or should not) be subject to the GPL.  Although it turns out to be a poor test case, the fact remains that there are several very grey areas in the GPL, especially when dealing with dynamic, object-oriented code.  Linux module developers have been dealing with these issues for a long time, and a lot of questions still remain.  As a disclaimer once again:

  • This post now has nothing to do with Thesis. I’ll talk entirely in the abstract, with simple examples, to try to explore the GPL.
  • I am a developer, not a lawyer, and I intend to look at these issues from a technical perspective.
  • The GPL is a license.  It deals with copyright law.  It defines who is allowed to copy & distribute the software (everyone).
  • The GPL doesn’t prevent you from charging for the software, but anyone you sell it to receives the same license rights (not the copyright itself) and can then copy & distribute it as they see fit.
  • The GPL has been tested in court a few times. The biggest limitation is that these cases mostly dealt with embedded systems whose vendors should have been providing the source, but weren’t. See Harald Welte vs. Sitecom, gpl-violations.org vs. D-Link, and BusyBox vs. Monsoon Multimedia.
  • The GPL stood up well in these tests. However, they did little to answer the fundamental question of what constitutes a derivative work of source code when dealing with dynamic & scripted code.

What is a derivative work?

Basically, anytime you copy and modify something that is copyrighted, you are producing a derivative work. Let’s start with what US copyright law defines as a derivative work. 17 U.S.C. § 101 says:

A “derivative work” is a work based upon one or more pre-existing works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.

It seems clear that source code can produce a derivative work; the key question is whether it has been “recast, transformed, or adapted”. Please note that these are legal terms, not computer coding terms (where the same words don’t mean the same thing). US Copyright Office Circular 14: Derivative Works goes on to say:

A typical example of a derivative work received for registration in the Copyright Office is one that is primarily a new work but incorporates some previously published material. This previously published material makes the work a derivative work under the copyright law. To be copyrightable, a derivative work must be different enough from the original to be regarded as a “new work” or must contain a substantial amount of new material. Making minor changes or additions of little substance to a preexisting work will not qualify the work as a new version for copyright purposes. The new material must be original and copyrightable in itself. Titles, short phrases, and format, for example, are not copyrightable.

So what’s clear here is that 1) you must have incorporated the original material in some way and 2) the original material must be significant (short lines of code, common tidbits, etc. don’t count). This is where the Abstraction/Filtration/Comparison test comes in.

Now let’s look at what the GPL says is a derivative work. My examples concern writing plug-ins, themes, and extensions. This is where the Thesis debate originated and where a lot of grey area still exists. From the GPL FAQ:

If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins. This means the plug-ins must be released under the GPL or a GPL-compatible free software license, and that the terms of the GPL must be followed when those plug-ins are distributed.

If the program dynamically links plug-ins, but the communication between them is limited to invoking the ‘main’ function of the plug-in with some options and waiting for it to return, that is a borderline case.

Even the GPL writers acknowledge how grey this area is. As we’ll see, I think their interpretation already over-reaches the legal limits of copyright. It’s also important to read the SFLC’s opinion with regard to WordPress themes:

The PHP elements, taken together, are clearly derivative of WordPress code. The template is loaded via the include() function. Its contents are combined with the WordPress code in memory to be processed by PHP along with (and completely indistinguishable from) the rest of WordPress. The PHP code consists largely of calls to WordPress functions and sparse, minimal logic to control which WordPress functions are accessed and how many times they will be called. They are derivative of WordPress because every part of them is determined by the content of the WordPress functions they call. As works of authorship, they are designed only to be combined with WordPress into a larger work.

Some Simple Test Cases

So let’s look at some simple code examples and see where things break down. I’ll provide three pieces of code:

Able has written:

#Able has released this code under the GPL
class Foo
  def perform_work
    puts "Able's Foo has performed work"
  end
end

Baker has written:

#Baker has licensed this privately and very strictly
class Foo
  def perform_work
    puts "Baker's Foo has performed work"
  end
end

Charlie has written:

class BigFoo < Foo
  def do_it_all
    perform_initial_work_by_charlie
    perform_work # Whose work?
    perform_cleanup_work_by_charlie
  end
end

First a sticking point: WordPress calls include() to include a theme, whereas my example has the supposedly derivative work calling the dependency. I don’t believe the reversal is relevant because the GPL FAQ gives the same answer for both scenarios. I’ve done it this way for simplicity and ease of understanding.

So is Charlie’s work derivative of Able’s? It’s dependent on either Able’s or Baker’s, to be sure, but derivative?

The key argument for themes is “They are derivative of WordPress because every part of them is determined by the content of the WordPress functions they call”. This reads almost exactly like a definition of inspiration. What seems to matter is “Did Charlie refer to Able’s code or Baker’s code when developing his own code?”. However, I believe this argument breaks down as well.
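To make the question concrete, here is a minimal runnable sketch of the three pieces above. It assumes each author’s code lives in its own source string and is only combined in a throwaway namespace at run time. The `combine` helper, the constant names, and the stand-in bodies for Charlie’s setup and cleanup steps are my own illustrative additions, and the methods return strings instead of calling puts so the result is easy to inspect.

```ruby
# Hypothetical sketch: the same Charlie source is combined with either
# Able's or Baker's Foo at run time, as it would be on an end user's machine.

ABLE_SOURCE = <<~'RUBY'
  # Able has released this code under the GPL
  class Foo
    def perform_work
      "Able's Foo has performed work"
    end
  end
RUBY

BAKER_SOURCE = <<~'RUBY'
  # Baker has licensed this privately and very strictly
  class Foo
    def perform_work
      "Baker's Foo has performed work"
    end
  end
RUBY

# Charlie's code names Foo and calls perform_work, but contains none of
# the body of either implementation.
CHARLIE_SOURCE = <<~'RUBY'
  class BigFoo < Foo
    def do_it_all
      [perform_initial_work_by_charlie, perform_work, perform_cleanup_work_by_charlie]
    end

    def perform_initial_work_by_charlie
      "Charlie's setup"
    end

    def perform_cleanup_work_by_charlie
      "Charlie's cleanup"
    end
  end
RUBY

# Evaluate one Foo implementation plus Charlie's unchanged source inside a
# fresh anonymous module, so the two Foo classes never collide.
def combine(dependency_source)
  sandbox = Module.new
  sandbox.module_eval(dependency_source) # defines sandbox::Foo
  sandbox.module_eval(CHARLIE_SOURCE)    # defines sandbox::BigFoo < sandbox::Foo
  sandbox.const_get(:BigFoo).new.do_it_all
end

puts combine(ABLE_SOURCE).inspect
puts combine(BAKER_SOURCE).inspect
```

Charlie’s source string is identical in both runs; only the middle element of the result changes, depending on which Foo the end user happened to load.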

Inspiration

When you’re talking about a creative work, inspiration is a key factor. Fan fiction runs up against this wall. Fan fiction is derivative because someone else’s intellectual property (their characters, settings, plot lines, etc.) has been used to create your work. Fan fiction has been consistently found to be derivative and subject to copyright claims. But there is a very big difference between Charlie’s work and a piece of fan fiction. When you sell a fan-fiction book, the original author’s characters are actually incorporated into what you’re selling. Charlie’s work, however, does not significantly incorporate any bits of Able’s code, at least at the time of distribution. The key is that the in-memory combination occurs with the end user, who is not distributing ANY code and therefore is not required by the GPL to do anything. What you incorporate into your distribution is really the issue, not inspiration.

Distribution

This becomes a major distinction. Charlie is not selling or distributing Able’s source, only his own. The incorporation must happen at the time of distribution. With non-compiled source code, the incorporation happens with the end user when they run the software. So here is a more apt analogy: Monopoly the board game. Let’s say I create my own board for a board game, with its own set of rules and no shared ideas (beyond those common to all board games, such as having a big cardboard piece on which you play). Now I wish to sell this board and only the board. In order to use my board game, I instruct you, the end user, to use the pieces from a Monopoly game because they work very well with my game. My game was even designed with these exact pieces in mind (I was inspired by Monopoly pieces, although other pieces COULD work…that’s totally up to the people playing). However, you must obtain the Monopoly game on your own. My board game will not work without Monopoly pieces, so it is dependent but not derivative. The two pieces are combined at someone’s kitchen table to produce the final product. So it is clear that, having distributed only my own original work, regardless of where my inspiration came from, no copyright is inherited from the original product.

Interoperability

So Charlie’s code contains no code from Able. Charlie wrote it all himself. Copyright law seems to stop right there. With no substantial similarity to Able’s code and nothing under the GPL being distributed, the GPL can exert no copyright claim over Charlie’s code. In fact, this is a copyright issue that has been settled many times. In Lewis Galoob Toys, Inc. v. Nintendo of America, Inc., or the Game Genie case, the court decided that interoperability does not create a derivative work. Much like our coding examples and WordPress themes, the Game Genie wouldn’t work without a Nintendo. It was once again clearly dependent, but not derivative. This argument becomes even clearer when we refer only to binaries.

Summary

I think in the end it’s clear that the GPL over-reaches in its determination of what constitutes a derivative work. This is where the “infectious” moniker comes from. It tries to ‘claim’ copyright over completely original works that it has no authority over. My examples deal with the very precise case of dynamic, scripted (non-compiled) code, distributed as source only, where no shared code exists between the GPL’ed and non-GPL’ed product, so please be cautious about reading too much into this with regard to things like Linux kernel modules. Finally, I’m really enjoying the overall debate (I do love the GPL) and the great attitude everyone I’ve talked to (on both sides) has taken. Let’s continue that, for the good of all OSS.


An analysis of GPL’ed code in Thesis

Introduction

There has come to be a huge debate regarding whether the Thesis WordPress theme can be premium licensed when the WordPress code itself is released under the GPL.  The GPL requires that any ‘derivative work’ must also be licensed under the GPL, so the raw question is whether or not Thesis is a derivative work.

First some disclaimers:

  • My blog is hosted on wordpress.com.  I even pay them a little for some premium features.  But I otherwise have no affiliation with either WordPress or Thesis
  • I’m not a lawyer, I’m a developer.  My views here are my own and are based on TECHNICAL knowledge and experience with the GPL, not on the law (which astute observers will note often does not reflect real life)

So is Thesis a derivative work?  WordPress and the Software Freedom Law Center think so.  But their claim is based on:

“The template is loaded via the include() function. Its contents are combined with the WordPress code in memory to be processed by PHP along with (and completely indistinguishable from) the rest of WordPress. The PHP code consists largely of calls to WordPress functions and sparse, minimal logic to control which WordPress functions are accessed and how many times they will be called. They are derivative of WordPress because every part of them is determined by the content of the WordPress functions they call.”

This seems extremely far-reaching.  My viewpoint is based on Why the GPL does not apply to premium wordpress themes.  The long and short of it is that the SFLC’s opinion could be applied to any software that runs on Linux, meaning you could never have a closed-source software product running on the Linux kernel (“Oh, your code calls fork()? GPL!”).  It is commonly accepted that simply integrating with an existing product does not produce a derivative work.  If your code is totally your own, the GPL has no say over how you license it.  This is actually an argument about fair use that extends far beyond the GPL and has been settled on many different topics, including OEM car parts, Nintendo, iPod connectors, and other questions of being allowed to build something that interoperates with someone else’s product.  Per the GPL itself, a derivative work is “a work containing the Program or a portion of it, either verbatim or with modifications”, so you must copy actual lines of code from their source to produce a derivative work. Simply calling WordPress functions doesn’t cut it.

The Question

The problem is then simply one of analysis: Does Thesis contain code from WordPress?  I wrote a quick script to find out.  Here are the basics:

  • The script takes every line of WordPress source and puts it into an in-memory hash.
  • Every line is lower cased and has all whitespace removed to prevent missing matches from simple indentation or capitalization changes
  • This hash is then checked against every line in the Thesis source
  • It checks only PHP files (for simplicity…avoiding images and such)
  • It excludes lines less than 20 characters long. This could cause it to miss matches, but it also helps filter out a lot of noise like ‘<?php’ lines
  • It will fail to find code lines that have been modified
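The steps above can be sketched roughly as follows. This is a hypothetical Ruby re-sketch of the approach, not the Perl script linked at the bottom of the post. The function names, the output format string, and the choice to apply the 20-character cutoff after normalization are my assumptions, and the two tiny in-memory “trees” at the end are invented sample data.

```ruby
# Hypothetical Ruby re-sketch of the line-matching approach; not the original script.

MIN_LENGTH = 20 # assumption: the cutoff is applied to the normalized line

# Lower-case the line and remove all whitespace so indentation or
# capitalization changes do not hide a match.
def normalize(line)
  line.scrub.downcase.gsub(/\s+/, "")
end

# Read every .php file under root into { path => [lines] }.
def read_php_tree(root)
  Dir.glob(File.join(root, "**", "*.php")).map do |path|
    [path, File.readlines(path, chomp: true)]
  end.to_h
end

# Build the in-memory hash: normalized line => [path, line number] of its
# first occurrence. Lines shorter than MIN_LENGTH are skipped.
def index_lines(lines_by_file)
  index = {}
  lines_by_file.each do |path, lines|
    lines.each_with_index do |line, i|
      key = normalize(line)
      next if key.length < MIN_LENGTH
      index[key] ||= [path, i + 1]
    end
  end
  index
end

# Check every line of the second tree against the hash built from the first.
def find_matches(index, lines_by_file)
  matches = []
  lines_by_file.each do |path, lines|
    lines.each_with_index do |line, i|
      source = index[normalize(line)]
      matches << { from: source, to: [path, i + 1], text: line.strip } if source
    end
  end
  matches
end

# Invented sample data standing in for read_php_tree("wordpress") and
# read_php_tree("thesis_17").
original = { "wordpress/example.php" => ["<?php", "  if (isset($_GET['download'])) {", "}"] }
theme    = { "theme/example.php"     => ["<?php", "IF (isset( $_GET['download'] )) {", "}"] }

find_matches(index_lines(original), theme).each do |m|
  puts "-- Match from <#{m[:from][0]}>:#{m[:from][1]} to <#{m[:to][0]}>:#{m[:to][1]} --"
  puts m[:text]
end
```

Because only exact normalized lines are compared, a renamed variable or a reordered argument is enough to hide a match, which is the limitation noted in the last bullet.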

So the short of it is that the script can easily detect wholesale copying, but can’t prove that code wasn’t copied and then modified.  However, I think it serves its purpose.  See the bottom of the post for the code (GPL’ed) and instructions on running it yourself.  The results are extensive because many small lines are similar, although many are insignificant.  For example:

-- Match from <wordpress/wp-admin/export.php>:27 to <thesis_17/lib/admin/options_manager.php>:14 --
if (isset($_GET['download'])) {

This is a line you would see in any PHP code where you need to check if the download parameter was in the GET request.  It shows up as a match, but is irrelevant.  In the end, some common sense and technical knowledge must be applied to know if the results are significant.

The Results

My conclusion is that Thesis does contain GPL-licensed code from WordPress.  There were several examples that fit, so I’ve chosen the strongest one here, which is sufficient to show that the code has been reused.  One of the functions in question is:

wp_list_comments from wordpress/wp-includes/comment-template.php:1387
thesis_list_comments from thesis_17/lib/classes/comments.php:169

And you can see a comparison of the exact matching lines: http://gist.github.com/477051

Not every line is sequential, but these two functions match pretty closely.  I feel comfortable that this section of code is very clearly in the GPL, and so I am posting a portion of it here:

[Image: WordPress/Thesis diff]

A section from the start of the function in WP was removed.  But then the rest of it is nearly identical.  Where the lines do not match exactly, their differences are insignificant and clearly show that the original method was copied as a template.
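One rough way to put a number on “nearly identical” is the share of a function’s lines that also appear, after the same normalization, in the original. The sketch below is my own illustration and not something the comparison script reports. `shared_line_ratio` is a hypothetical helper, and the two PHP-like snippets are invented stand-ins, not real WordPress or Thesis code.

```ruby
# Hypothetical measure of how closely one function body tracks another.

# Normalize the same way as the matching script: lower-case, no whitespace.
def normalized_lines(source)
  source.lines.map { |line| line.downcase.gsub(/\s+/, "") }.reject(&:empty?)
end

# Fraction of the candidate's non-blank lines that also occur in the original.
def shared_line_ratio(original, candidate)
  original_lines = normalized_lines(original).uniq
  candidate_lines = normalized_lines(candidate)
  return 0.0 if candidate_lines.empty?
  candidate_lines.count { |line| original_lines.include?(line) }.fdiv(candidate_lines.size)
end

# Invented example functions (not taken from either codebase).
original_function = <<~'PHP'
  function list_items($args) {
      $defaults = array('style' => 'ul');
      $r = merge_args($args, $defaults);
      echo render_items($r);
  }
PHP

renamed_copy = <<~'PHP'
  function theme_list_items($args) {
    $defaults = array('style' => 'ul');
    $r = merge_args($args, $defaults);
    echo render_items($r);
  }
PHP

puts shared_line_ratio(original_function, renamed_copy)
```

A high ratio across a long function is much harder to explain by coincidence than a single shared line, though where to draw the threshold remains a judgment call.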

So what does this mean?  I’d say it’s clear that Thesis uses GPL’ed code from WordPress and is therefore subject to the GPL as well. This makes the whole issue of whether “calling functions” or “running in memory with” requires the code to be subject to the GPL completely irrelevant.  Whether or not this means the ENTIRE Thesis codebase must be GPL’ed, I can’t say.

The Code

The Perl script for doing this comparison is available at:  http://gist.github.com/477060.  You must have a subdirectory with the WordPress source called ‘wordpress’ and a subdirectory with the Thesis source (which is not freely available, so you must have paid for it) in ‘thesis_17’.  I used the Thesis 1.7 source code and WordPress 3.0 (latest) for my analysis.

Closing

I don’t have any cruel feelings towards Chris Pearson and this isn’t about who is a jerk or any of the other flaming going on.  But as developers, it’s critical that you are very careful when using and re-selling code to follow the license agreements that we all adhere to.  Thesis clearly has a ways to go in this regard.

I encourage constructive comments, forking of my code, and additional analysis.  If you’re interested in how the GPL over-reaches in cases where there is no copied code, please read about What is a derivative work?