Open Data Licenses – Managing Risks to Maximize the Opportunities


While not new, the use of “open data” is on the rise as teams everywhere look for existing low-cost datasets to advance their projects without incurring the time, expense and internal approvals required to source and structure such information themselves. Despite the flexibility the name “open data” suggests, the multitude of license types and the novel uses of open datasets magnify the impact a single license term can have on the commercialization of data-intensive technologies, especially the creation of proprietary artificial intelligence algorithms and applications. The diversity of licensors underscores how pervasive open data has become for the modern organization: government agencies, nonprofits and for-profit entities increasingly offer data publicly in innumerable categories. Some licensors currently shoehorn their open datasets into existing open-source software licenses or other licenses designed for copyrightable works. However, such form terms are often ill-suited to open data and omit considerations unique to data as licensed subject matter. In addition, licenses that promote “open data” principles on their face may in fact impose significantly different requirements, and they lack the clarity and predictability that come from widespread use of a smaller set of form licenses for consistent purposes.

This update discusses open data basics, highlights common license types and terms, and raises initial discussion points for your team to manage associated risks.

What Is Open Data?

There is no universally accepted definition of “open data,” though licensors and organizations offering open data licenses often borrow principles from the open-source software realm to promote flexibility and transparency. For instance, the Open Knowledge Foundation defines open data as “data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.” Meanwhile, the Government of Canada defines open data more narrowly and along slightly different lines, as “structured data that is machine-readable, freely shared, used and built on without restrictions.” Commercial entities sometimes interpret the term differently, considering data “open” whenever they make it publicly available without a fee, even if the associated license contains restrictive terms and obligations. Because no uniform definition exists, reviewing the applicable license terms is a critical first step before using any new dataset, to confirm that those terms match your intended use and will not impose unanticipated burdens.

Why Use Open Data?

Open data has immediate and broad appeal. Depending on the license type and the nature of the dataset, it can be a source of free, easily accessible, structured data. Dozens of sources, such as the World Bank's open data portal, Kaggle and the Registry of Open Data on AWS, make finding a relevant dataset easy. Consider a common use case: a product team is evaluating a new product feature that requires a new data stream. Rather than spending the weeks or months it may take to obtain internal approvals and necessary user permissions, collect new data, refine it into a usable form and evaluate its viability, the team can use an open dataset to bypass most of these steps and begin testing feasibility right away. However, this ease of access also widens the surface area through which open data licensing issues can enter an organization, making an understanding of license terms and a coordinated approach to evaluating licenses and their impact particularly important.

What License Terms Apply to Open Data?

Licensors' goals and motivations for contributing open data vary significantly, producing a wide variety of license terms that reflect those objectives and a sliding scale of “openness.” In some instances, organizations have drafted new types of open data licenses to tailor the legal terms to the nature of the licensed material (data) and/or in the hope of setting a new standard.[1] In other instances, licensors have chosen to license their data under preexisting licensing regimes and simply added “data” to the scope of licensed materials. For instance, the European Commission adopted certain Creative Commons licenses to license its data to the public after finding that they offered optimal flexibility and ease of use; creators have long used this family of licenses to publicly license copyrightable works such as text, images and music. Governments more typically permit use of their open datasets under license terms with few restrictions, while private entities often impose restrictions on users of open data or distinguish in their terms between commercial and noncommercial uses.

Key Terms to Look Out For and Usage Risks to Consider

While many forms of open data licenses exist, certain core concepts appear across many license types and must be weighed against your intended use. A prospective licensee that intends to use open data in a commercial context, such as developing or offering its own products and services to third parties, should pay special attention to the following common license restrictions, which may impede or entirely foreclose the contemplated business model:

  • Use Restrictions – Some licenses prohibit the licensee from using a licensed dataset for commercial purposes (including internal business purposes). Other licenses permit use of the licensed data only for specific purposes, such as “analysis by a computer.”
  • Derivative Works Restrictions – Certain licenses prohibit sharing derivative works based on a licensed dataset. Because datasets are rarely useful in their original form, this restriction significantly limits how a licensee can leverage a given dataset within its organization or products.
  • ShareAlike Terms – Many open data licenses require a licensee that shares with third parties its own datasets containing the licensed (including modified) open data to do so under the same license terms or a compatible license. This obligation can limit a licensee's ability to commercialize a proprietary database that incorporates information from an open dataset in a way that triggers share-alike terms.
  • Attribution Requirements – Open data licenses commonly require some form of attribution to the source of the licensed data when a licensee further distributes it. In the open-source software realm, attribution requirements are often satisfied by a separate page or app menu tab listing the required notices. Data attribution requirements can be more demanding in practice; they may require attribution even when a user shares a white paper or presentation slides containing statistics calculated from the dataset. Some open data licenses go further and require a certain level of effort by the licensee to ensure that recipients receive actual notice.

Steps for Controlling Risk and Protecting Your Organization

As with internal open-source software policies, thoughtful open data usage strategies and policies can help your organization navigate open data risks:

  • Inbound Open Data Usage Approval Process – While open-source software policies typically target only software developers, open data policies must recognize that data can enter your organization through a broader pool of individuals, such as business analytics teams, product managers and data scientists. This dynamic likely calls for broader messaging throughout your organization and a more formal approval structure that accounts for the various teams where such needs could arise. Our team is happy to apply our experience to help you develop practical policies and personnel communications that manage risk while enabling your business goals.
  • Policy Documentation and Requirement Tracking – While a central approval mechanism provides some accountability within an organization, documenting open data license approval criteria and obtaining input from outside advisors (such as technical and legal experts) will help ensure that your organization consistently evaluates such use against relevant criteria and within the bounds of an accepted risk tolerance. Teams should put tracking mechanisms in place to tag data sourced under an open data license, so that the applicable license terms are recognized throughout the data life cycle and surfaced for new internal users. We recommend tracking open data requirements on an ongoing basis, in a manner that accounts for the diverse array of teams that may come into contact with such data. Our team can help you develop new policies to address your use of third-party open-source materials (including open data licenses) from scratch or augment your existing open-source software policies and practices.
  • Periodic Auditing – As with open-source software and personal data, organizations should periodically audit the sources of open data used within their material datasets, especially datasets used to train proprietary algorithms or shared externally. Reviewing audit results also offers an ideal opportunity to revisit your internal policies and consider whether updates are appropriate to reflect your latest business objectives, new licensing trends or recent case law. Our team can help you set audit parameters and goals, connect you with technical experts to assist, and provide legal analysis to identify and remediate legal risk.
  • Outbound Open Data License Review Process – If an organization is considering releasing its own dataset under an open data license, a multidisciplinary discussion should take place to identify goals, approve the data for release (e.g., confirming legal requirements and business sensitivity) and select a license type consistent with those goals. Prospective open data licensors should also consider whether existing confidentiality obligations to customers or other third parties, or security concerns, outweigh the benefits sought by offering the data publicly. Our team can recommend off-the-shelf licenses that may meet your needs or draft bespoke license terms that achieve your business goals.
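For technical teams implementing the tagging and tracking recommendation above, the following is a minimal, hypothetical Python sketch of one way to attach license metadata to each dataset and carry it forward when datasets are combined. It assumes each license can be summarized by a handful of flags mirroring the restrictions discussed earlier (commercial use, sharing of derivative works, share-alike, attribution). The names `LicenseTag`, `combine` and `outbound_flags`, and the example license labels, are illustrative inventions rather than any standard schema or existing tool, and such flags are no substitute for legal review of the actual license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTag:
    """Hypothetical summary of the license terms attached to one dataset."""
    license_name: str
    attributions: frozenset[str]   # notices that must accompany redistribution
    commercial_use: bool           # False if the license bars commercial use
    derivatives_shareable: bool    # False if derivative works may not be shared
    share_alike: bool              # True if outbound sharing must use same/compatible terms


def combine(*tags: LicenseTag) -> LicenseTag:
    """Merge the tags of every source that feeds a derived dataset.

    The merge is deliberately conservative: the derived dataset inherits every
    attribution notice and the most restrictive value of each flag. It records
    obligations only; it does not decide whether the licenses are compatible.
    """
    if not tags:
        raise ValueError("combine() needs at least one LicenseTag")
    return LicenseTag(
        license_name=" + ".join(t.license_name for t in tags),
        attributions=frozenset().union(*(t.attributions for t in tags)),
        commercial_use=all(t.commercial_use for t in tags),
        derivatives_shareable=all(t.derivatives_shareable for t in tags),
        share_alike=any(t.share_alike for t in tags),
    )


def outbound_flags(tag: LicenseTag, commercial: bool) -> list[str]:
    """List issues to route to legal review before a dataset is shared externally."""
    issues = []
    if commercial and not tag.commercial_use:
        issues.append("commercial use restricted")
    if not tag.derivatives_shareable:
        issues.append("sharing of derivative works restricted")
    if tag.share_alike:
        issues.append("share-alike terms may govern the outbound license")
    if tag.attributions:
        issues.append("attribution required: " + "; ".join(sorted(tag.attributions)))
    return issues


# Illustrative usage with made-up license names and sources.
agency = LicenseTag("agency-open-license", frozenset({"Source: Agency X"}),
                    commercial_use=True, derivatives_shareable=True, share_alike=False)
community = LicenseTag("community-sharing-license", frozenset({"Source: Project Y"}),
                       commercial_use=True, derivatives_shareable=True, share_alike=True)

merged = combine(agency, community)
for issue in outbound_flags(merged, commercial=True):
    print(issue)
```

Taking the most restrictive value of each flag reflects the point that license terms follow data through its life cycle: a derived dataset carries forward every obligation of its sources. Whether two licenses can lawfully be combined at all is a legal question this sketch does not attempt to answer.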

Next Steps

Our Technology Transactions team, including the authors of this update, and our Cyber, Privacy and Data Innovation team can help you evaluate specific licenses of interest (either to license in or publicly offer your dataset), implement internal policies around the use of open data, and plan remediation options to meet the needs of your team and business.

[1] Examples include the Computational Use of Data Agreement (“C-UDA”), Open Use of Data Agreement (“O-UDA”), Community Data License Agreement – Permissive, Community Data License Agreement – Sharing, Open Data Commons Public Domain Dedication and License (“PDDL”), Open Data Commons Open Database License (“ODbL”) and the Open Data Commons Attribution License (“ODC-By”).