Why Critical Solutions Require Open Data

Published by SocialLab Team on June 29, 2026

The Open Systems We Already Rely On

When a journalist needs a map, they use OpenStreetMap. When a researcher needs global health statistics, they query the WHO’s open data portal. When a developer needs historical weather patterns, they pull from NOAA’s public archives. These are not products. They are shared foundations, and most people who use them do not think about what it took to build them or how they are sustained.

Shared open systems have specific properties that distinguish them from products. They are built to be used by others, not consumed by their builder. They are maintained for reliability, not optimized for revenue. They are governed for the long-term benefit of those who depend on them, not the short-term benefit of those who hold them.

The datasets we most need to understand the world, such as climate models, disease surveillance feeds, electoral integrity data, and orbital environment catalogs, have the properties of shared public goods. They describe collective systems. Their value multiplies when more people can access and build on them. No single organization is equipped to extract their full value alone.

And yet, the default model for managing them remains institutional control, commercial licensing, or bureaucratic access restriction. The result is a persistent gap between the data that exists and the intelligence that could be built from it, a gap that appears most visibly in crises, when the decisions that need to be made fastest are the ones least supported by available information.

What a Data Commons Actually Means

The concept of a commons has a long history in political economy: shared resources governed by communities for collective benefit, with rules that prevent both exclusion and overexploitation. Elinor Ostrom won the Nobel Prize in Economics in 2009 for demonstrating that commons can be governed sustainably, that the choice is not simply between privatization and depletion, but between bad governance and good governance.

A data commons applies this logic to information. It is not simply data that is free to download. It requires four properties that together make it a genuine shared resource:

Property 01

Openly Licensed

With terms that permit use, modification, and redistribution without friction. Not just accessible, but genuinely open in the legal sense. The distinction matters because data that is technically downloadable but legally restricted is not part of a commons.

Property 02

Structurally Maintained

With versioning, documentation, provenance tracking, and quality standards that make it reliable enough to build on. Raw data dumps are not a commons. Maintained, documented, versioned datasets are. The difference is the investment in making data usable rather than merely available.

Property 03

Community Governed

With contribution pathways, governance structures, and accountability mechanisms ensuring no single entity can unilaterally close, degrade, or weaponize the resource. This is what distinguishes a commons from a benevolent monopoly dressed in open-source language.

Property 04

Reproducibility-Compatible

Meaning that research, analysis, and decisions made using the data can be verified, challenged, and built upon by others. This is what distinguishes a commons from a black box dressed in open language. If the data cannot support independent replication, it does not yet qualify as a commons.

These properties are not technically difficult to implement. They are institutionally difficult: they require organizations to make deliberate choices about openness over control, and to sustain those choices over time even when control is commercially or politically advantageous.
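The four properties above can be read as a checklist against a dataset's metadata. The following is a minimal sketch in Python, not a standard or a SocialLab tool: the `DatasetRecord` fields, the `OPEN_LICENSES` allow-list, and the "at least two independent governing bodies" threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical allow-list of open licenses, written as SPDX identifiers.
# A real project would maintain and justify its own list.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "ODbL-1.0", "PDDL-1.0"}


@dataclass
class DatasetRecord:
    """Illustrative metadata for one dataset; every field name is an assumption."""
    name: str
    license_id: str = ""
    version: str = ""
    has_documentation: bool = False
    has_provenance: bool = False
    governing_bodies: int = 0          # independent organizations with a governance role
    accepts_contributions: bool = False
    raw_inputs_published: bool = False
    methods_published: bool = False


def commons_checklist(d: DatasetRecord) -> dict:
    """Map each of the four properties to True or False for this record."""
    return {
        "openly_licensed": d.license_id in OPEN_LICENSES,
        "structurally_maintained": bool(d.version) and d.has_documentation and d.has_provenance,
        # Assumed threshold: no single entity governs alone.
        "community_governed": d.governing_bodies >= 2 and d.accepts_contributions,
        "reproducibility_compatible": d.raw_inputs_published and d.methods_published,
    }


def missing_properties(d: DatasetRecord) -> list:
    """Return the names of the properties the record does not yet satisfy."""
    return [name for name, ok in commons_checklist(d).items() if not ok]


# A raw data dump under an open license: downloadable, but not yet a commons.
dump = DatasetRecord(name="raw-export", license_id="CC-BY-4.0")
print(missing_properties(dump))
```

A check like this cannot measure governance quality. It only makes visible which of the four properties a dataset has not yet documented.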

Three Domains Where Closed Data Is Failing Us

Domain 01 — Climate and Environmental Intelligence

The Policy Gap Is a Data Gap

The climate crisis is, among other things, a data problem. The physical systems involved, such as ocean temperatures, ice mass, atmospheric carbon concentration, and biodiversity indices, are vast, complex, and interdependent. Understanding them requires integrating data from thousands of sources across dozens of jurisdictions.

Much of this data is open, and the scientific community has built remarkable open systems around it. The Copernicus Climate Change Service, operated by ECMWF on behalf of the European Commission, is one of the most ambitious open data projects in history. But significant gaps remain: proprietary satellite data, national datasets shared only under restrictive agreements, and commercial sensors whose output is licensed rather than published. The consequence is not just incomplete science. It is incomplete policy. Decisions about capital investment, agricultural planning, and climate finance are being made with partial information, not because the underlying data does not exist, but because it has not been made part of a commons.

Domain 02 — Crisis and Humanitarian Intelligence

Fragmentation Costs Lives

SocialLab’s Data for Crisis initiative, developed in partnership with Deutsche Welle Akademie and supported by the German Federal Ministry for Economic Cooperation and Development, was built around a specific observation: the data needed to understand and respond to crises exists, but it does not cohere.

Population displacement figures are held by agencies with inconsistent sharing policies. Conflict event data is scattered across monitoring organizations with incompatible formats. Economic shock indicators that would help predict humanitarian crises are locked in proprietary financial databases. The cost of this fragmentation is measured in response time, misallocated aid, and stories that go untold because the data to tell them accurately does not exist in a usable form.

Open data commons change this structure. When a crisis data platform is built on open, versioned, documented datasets rather than proprietary feeds, it can be forked by local journalists, adapted by community organizations, and built upon by researchers, multiplying its value rather than concentrating it.

Domain 03 — Orbital and Space Domain Intelligence

The Newest Frontier of the Same Problem

Near-Earth space is now genuinely crowded. There are more than 9,000 active satellites in orbit, with commercial operators planning constellation expansions that will add tens of thousands more in this decade. There are an estimated 580,000 objects larger than one centimeter traveling at orbital velocities. The risk of cascading collision events — Kessler Syndrome — is no longer theoretical.

The United Nations Office for Outer Space Affairs has called for improved data sharing to manage orbital traffic. Researchers, developers, and analysts who could contribute to improving collective situational awareness find themselves blocked by government access restrictions, commercial licensing costs, or tools designed either for specialized aerospace engineers or for mass-market consumers who need a simple alert. This is the newest version of the same institutional failure: critical shared systems, inadequately shared data.

The Open Data Precedents That Worked

The argument for open data commons is not theoretical. It is demonstrated, repeatedly, across domains.

Maps

OpenStreetMap

Began in 2004 as a response to restrictive commercial map licensing. Today it is the foundational geographic dataset for humanitarian response organizations, urban planners, autonomous vehicle development, and crisis response platforms worldwide, including SocialLab’s own work and the Humanitarian OpenStreetMap Team, whose community mapping has enabled targeted vaccination campaigns, reaching 95% vaccination rates in DRC through precise route mapping.

Genomics

The Human Genome Project

Made a deliberate, contested decision in the late 1990s to place sequencing data in the public domain immediately upon generation, formalized through the Bermuda Principles of 1996. The downstream value was enormous: research using public Human Genome Project data produced nearly twice as many academic papers as comparable research using proprietary data, a documented multiplier effect from openness that would not have occurred under institutional control.

Knowledge

Wikipedia and Wikimedia

296 billion page views across Wikimedia projects in 2024, an average of nearly 10,000 every second. The knowledge it makes accessible was, before its existence, locked in encyclopedias that cost money to purchase and were updated on cycles measured in years.

In each case, the commons did not emerge automatically. It required deliberate design, sustained governance, and communities willing to contribute to something they did not own individually.

Why Openness Is Not Naivety

The objection to open data commons is usually framed in terms of quality, sustainability, or security. Open data, the argument goes, is unverified data. Free data is unmaintained data. Public data is exploitable data.

These are real concerns, but they are engineering problems, not fundamental objections.

Quality in open data commons is addressed through provenance tracking, versioning, and community review, the same mechanisms that make open-source software reliable enough to power the world’s most demanding systems. Linux powers the majority of the world’s servers not despite being open-source, but in part because it is: more eyes on the code means more eyes on the errors.
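As a concrete illustration of provenance tracking and versioning, the sketch below records a content hash for each dataset release and links it to the previous release. It is one possible approach under stated assumptions, not a description of any particular platform: the manifest fields, the dataset name, and the source label are hypothetical, and only Python's standard `hashlib` module is used.

```python
import hashlib


def make_manifest(name, version, source, payload, parent_sha256=None):
    """Build a provenance record for one release of a dataset.

    `payload` is the raw bytes of the released file. The field names
    are illustrative assumptions, not an established schema.
    """
    return {
        "name": name,
        "version": version,
        "source": source,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "parent_sha256": parent_sha256,  # links this release to the one before it
    }


def verify(manifest, payload):
    """Return True if the bytes match the hash recorded in the manifest."""
    return hashlib.sha256(payload).hexdigest() == manifest["sha256"]


# Two releases of a small, made-up CSV file.
release_1 = b"station,temp_c\nA,14.2\n"
release_2 = b"station,temp_c\nA,14.2\nB,15.1\n"

m1 = make_manifest("example-temps", "1.0.0", "hypothetical sensor network", release_1)
m2 = make_manifest("example-temps", "1.1.0", "hypothetical sensor network", release_2,
                   parent_sha256=m1["sha256"])

print(verify(m1, release_1))   # the published bytes match the manifest
print(verify(m1, release_2))   # different bytes do not match the 1.0.0 manifest
```

Because anyone can recompute the hash, a reviewer does not need to trust the publisher to confirm that the file they hold is the file that was released.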

Sustainability is addressed through governance structures that do not depend on a single organization’s continued goodwill or financial health. OpenStreetMap’s data does not disappear if a company changes its business model. Wikipedia’s knowledge base does not degrade if a platform loses funding. The commons, when well-governed, outlasts any of its contributors.

Security in open data is, counterintuitively, often better than in closed systems. Genuinely sensitive data (data about individuals, classified operational information, proprietary commercial intelligence) should not be in a public data commons. The question is whether the excuse of sensitivity is being used legitimately to protect such information, or as institutional cover for data that would generate no security risk if made public but significant inconvenience if it had to be maintained to public standards.

The orbital catalog is not sensitive in a meaningful security sense. The climate sensor data withheld under national sovereignty claims is not sensitive. The humanitarian data held behind agency access agreements is not sensitive. It is restricted because restriction is the default, not because openness would cause harm.

What Building the Commons Actually Requires

The barriers to open data commons are not primarily technical. The technology to build, maintain, and distribute open datasets at scale is mature, affordable, and well-documented. The barriers are institutional, incentive-based, and cultural, and addressing them requires deliberate choices that most organizations currently have no structural reason to make.

Institutions that hold data need different incentives. Government agencies that maintain critical datasets often have no structural incentive to make them interoperable or openly licensed. The incentives that would change this, such as funding tied to openness requirements and recognition for data stewardship as public service, are available but underused. The Open Data Charter and Open Government Partnership provide frameworks, but frameworks require adoption.

Contributors need recognition and sustainability. The communities that build and maintain open data commons do so through intrinsic motivation and institutional support. Academic researchers need to be able to publish data contributions in venues that count toward tenure and promotion, which is why JOSS, the Journal of Open Source Software, represents an important institutional innovation. Professional recognition for data stewardship needs to reach parity with recognition for data analysis.

Standards matter more than platforms. The most durable open data commons are built around interoperable standards rather than specific platforms. When data is documented in formats that any tool can consume, it outlives the platform that first published it. The FAIR data principles (Findable, Accessible, Interoperable, Reusable) represent the minimum viable standard for data that aspires to commons status.

The Open Source Initiative and Open Data Handbook provide the legal and operational frameworks that lower the institutional cost of openness. The main thing that keeps these frameworks from wider adoption is not their complexity. It is the absence of organizational leadership willing to make openness a default rather than an exception.

The Argument for Now

There is a particular urgency to this argument in 2026 that did not exist a decade ago. The systems we most need to understand — the climate, the information environment, the orbital domain — are changing faster than our institutional capacity to respond. The window for building the open data systems to understand and govern these changes is narrowing, not expanding.

The precedents are established. The technology is available. The communities that would contribute to and govern these commons exist and are growing. What is required is the decision, by institutions that hold critical data, to treat their stewardship role as a public responsibility rather than a competitive asset.

At SocialLab, this argument is not new. It is the foundation of how we have built intelligence systems since 2015 — from crisis data platforms designed for data journalists in the Global South, to disinformation detection tools built on open methodologies. Our work on why AI transparency is the new competitive advantage makes the same point in a different register: the organizations that will generate the most durable value from AI are the ones that make their systems explainable, auditable, and accountable, not because they are required to, but because opacity is, ultimately, fragile.

The same principle applies to the data commons. Closed data is fragile data. It depends on the continued goodwill, financial health, and institutional priorities of whoever holds it. Open data, when governed well, is resilient: it belongs to the commons and cannot be revoked by any single decision.

Intelligence, when open, becomes a shared foundation. That foundation, when governed well, becomes a commons. And a commons, when built with care, becomes something that outlasts any of the organizations that contributed to it.

The domains change. The principle does not. SocialLab exists to advance AI and data science for charitable purpose, and that purpose requires the open data foundations on which sustainable intelligence is built. The UN’s 2030 Agenda for Sustainable Development cannot be met without them.

The question is not whether we can afford to build an open data commons. It is whether we can afford the crises that will arrive without one.

Frequently Asked Questions

Common questions about open data, data commons, and SocialLab’s approach.

What is the difference between open data and a data commons?

Open data refers to data that is freely accessible and licensed for reuse. A data commons is a more specific concept — it adds community governance, sustained maintenance, and interoperability standards. All data commons involve open data, but not all open data constitutes a commons. The FAIR principles (Findable, Accessible, Interoperable, Reusable) describe the minimum viable standard for data that aspires to commons status.

How does SocialLab approach open data in its own projects?

SocialLab's data projects are designed with openness as a core requirement where the nature of the data permits it. The Data for Crisis platform was built on open standards and published with documentation enabling replication by other organizations. Disinformation detection tools use open datasets and publish methodologies openly, consistent with JOSS and FAIR standards.

What domains does SocialLab see as most urgent for open data commons development?

Three domains are particularly urgent: crisis and humanitarian data (where fragmentation costs lives and response time), climate and environmental intelligence (where incomplete data leads to incomplete policy), and orbital domain awareness (where the absence of open data creates risks for an environment everyone depends on).

How can organizations contribute to open data commons?

Organizations can contribute by open-licensing datasets they hold that serve public purposes, by funding the maintenance of existing commons rather than building proprietary alternatives, by aligning their data standards with FAIR principles, and by advocating within their sectors for openness as a governance default rather than an exception.

Open intelligence systems

Building on open data? Working on crisis intelligence, climate data, or space domain awareness? Let’s talk.

SocialLab has built open intelligence systems since 2015 — from crisis data platforms to disinformation research. sociallab.ai
