While not new, the use of “open data” is on the rise as teams everywhere look for existing low-cost datasets to support their projects without incurring the time, expense and internal approval process of sourcing and structuring such information themselves. Despite the flexibility that the name “open data” suggests, the multitude of license types and novel use cases for open datasets magnifies the impact that a single license term can have on the commercialization of data-intensive technologies, especially in the creation of proprietary artificial intelligence algorithms and applications. The diverse licensor pool further underscores how pervasive open data has become for the modern organization, with government agencies, nonprofits and for-profit entities increasingly offering data publicly in innumerable categories. Some licensors currently shoehorn their open datasets into existing open-source software licenses or other licenses for copyrightable works. However, such form terms are often not well suited to open data licensing and omit considerations unique to the licensed subject matter. In addition, the many different forms of licenses that appear on their face to promote “open data” principles may in fact contain significantly different requirements, and they lack the clarity and predictability that come with mass use of a smaller set of form licenses for consistent purposes.
This update discusses open data basics, highlights common license types and terms, and raises initial discussion points for your team to manage associated risks.
There is no single universally accepted definition of “open data,” though licensors and organizations that offer open data licenses carry over many of the same principles from the open-source software realm to promote flexibility and transparency. For instance, the Open Knowledge Foundation defines open data as “data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.” Meanwhile, the Government of Canada defines open data more narrowly and along slightly different lines, identifying such information as “structured data that is machine-readable, freely shared, used and built on without restrictions.” Commercial entities sometimes interpret the term differently and consider data “open” where they make it publicly available without a fee, even if the associated license contains restrictive terms and obligations. This lack of a uniform definition makes evaluating the applicable license terms a critical first step before using any new dataset, to ensure that the terms and the intended use are aligned and do not lead to unanticipated burdens.
Open data has immediate and broad appeal. Depending on the license type and nature of the dataset, it can serve as a source of free, easily accessible, structured data. Dozens of websites, such as the World Bank, Kaggle and the Registry of Open Data on AWS, make finding a relevant dataset easy. Consider a common use case: a product team is evaluating a new product feature that requires a new data stream. Building that stream internally can take weeks or months, since the team must obtain internal approvals and necessary user permissions, collect new data, refine the data into a usable form and evaluate its viability going forward. An open dataset allows developers to bypass most of these steps and start testing feasibility and viability. However, this ease of access also creates a larger surface area through which open data licensing issues can enter an organization, which makes understanding license terms, and taking a coordinated approach to evaluating licenses and their impact, particularly important.
Licensor goals and motivations in contributing open data vary significantly, which has led to a wide variety of license terms that reflect those objectives and has created a sliding scale of “openness.” In some instances, organizations have drafted new types of open data licenses to tailor the legal terms to the nature of the licensed material (data), to set a new standard, or both. In other instances, licensors have chosen to license their data under preexisting licensing regimes and simply added “data” to the scope of licensed materials. For instance, the European Commission adopted certain Creative Commons licenses, a set of licenses long used by creators to publicly license copyrightable works, to license its data to the public after finding that they offered optimal flexibility and ease of use. While governments more typically allow the use of their open datasets under license terms with few restrictions, private entities often impose restrictions on users of open data or differentiate in their terms between commercial and noncommercial uses.
While many forms of open data licenses exist, certain core concepts appear in many license types and must be considered in conjunction with your use case. Particularly if a prospective licensee intends to use open data in a commercial context, such as for the purpose of developing or offering its own products and services to third parties, such user should pay special attention to certain common license restrictions that may impede or entirely restrict the contemplated business model:
As with the use of internal open-source software policies, thoughtful implementation of open data usage strategies and policies can help your organization navigate open data risks:
Our Technology Transactions team, including the authors of this update, and our Cyber, Privacy and Data Innovation team can help you evaluate specific licenses of interest (either to license in or publicly offer your dataset), implement internal policies around the use of open data, and plan remediation options to meet the needs of your team and business.
Examples of licenses drafted specifically for data include the Computational Use of Data Agreement (“C-UDA”), Open Use of Data Agreement (“O-UDA”), Community Data License Agreement – Permissive, Community Data License Agreement – Sharing, Open Data Commons Public Domain Dedication and License (“PDDL”), Open Data Commons Open Database License (“ODbL”) and the Open Data Commons Attribution License (“ODC-By”).