
National AI Education Strategy: Moving Beyond Vendor Lock-In
June 18, 2026
- The Open Systems We Rely On
- What a Data Commons Actually Means
- Three Domains Where Closed Data Fails
- Precedents That Worked
- Why Openness Is Not Naivety
- What Building the Commons Requires
- The Argument for Now
- Frequently Asked Questions
There is a dataset that tracks every significant object orbiting Earth. It covers more than 27,000 tracked objects, describes trajectories accurate enough to predict collision risks, and is updated continuously. That dataset is not fully open.
Researchers who need it must navigate access agreements. Developers who want to build on it must interface with systems designed in a different era of computing. Journalists who want to explain orbital congestion to a public that depends on satellite systems for GPS, weather forecasting, and emergency communications must work around restrictions that serve no clear public interest.
This pattern — consequential data, fragmented access, inadequate governance — repeats across every domain where the stakes are highest. Climate models. Disease surveillance feeds. Electoral integrity data. Crisis intelligence. The datasets we most need to understand the world are exactly the ones most likely to be locked behind institutional walls or scattered across incompatible systems. And the cost of that fragmentation is not abstract.
The Open Systems We Already Rely On
When a journalist needs a map, they use OpenStreetMap. When a researcher needs global health statistics, they query the WHO’s open data portal. When a developer needs historical weather patterns, they pull from NOAA’s public archives. These are not products. They are shared foundations, and most people who use them do not think about what it took to build them or how they are sustained.
Shared open systems have specific properties that distinguish them from products. They are built to be used by others, not consumed by their builder. They are maintained for reliability, not optimized for revenue. They are governed for the long-term benefit of those who depend on them, not the short-term benefit of those who hold them.
The datasets we most need to understand the world, such as climate models, disease surveillance feeds, electoral integrity data, and orbital environment catalogs, have the properties of shared public goods. They describe collective systems. Their value multiplies when more people can access and build on them. No single organization is equipped to extract their full value alone.
And yet, the default model for managing them remains institutional control, commercial licensing, or bureaucratic access restriction. The result is a persistent gap between the data that exists and the intelligence that could be built from it, a gap that appears most visibly in crises, when the decisions that need to be made fastest are the ones least supported by available information.
What a Data Commons Actually Means
The concept of a commons has a long history in political economy: shared resources governed by communities for collective benefit, with rules that prevent both exclusion and overexploitation. Elinor Ostrom won the Nobel Prize in Economics in 2009 for demonstrating that commons can be governed sustainably, and that the choice is not simply between privatization and depletion but between bad governance and good governance.
A data commons applies this logic to information. It is not simply data that is free to download. It requires four properties that together make it a genuine shared resource:
Openly Licensed
With terms that permit use, modification, and redistribution without friction. Not just accessible, but genuinely open in the legal sense. The distinction matters because data that is technically downloadable but legally restricted is not part of a commons.
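One way to make the "legally open" test concrete is to check a dataset's declared license against an allow-list before accepting it into a catalog. The sketch below is a minimal illustration, assuming datasets declare their licenses as SPDX identifiers; the allow-list, the `is_openly_licensed` helper, and the catalog entries are hypothetical choices for this example, not an authoritative list of open licenses. A non-commercial clause, for instance, restricts use in a way the Open Definition does not accept, so it fails the check.

```python
# Hypothetical allow-list of SPDX identifiers for licenses that permit use,
# modification, and redistribution. Illustrative only, not exhaustive.
OPEN_DATA_LICENSES = {
    "CC0-1.0",
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "ODbL-1.0",
    "ODC-By-1.0",
    "PDDL-1.0",
}


def is_openly_licensed(dataset: dict) -> bool:
    """Return True only if the dataset declares a license on the allow-list.

    A dataset with no declared license is treated as restricted:
    being downloadable is not the same as being legally open.
    """
    return dataset.get("license") in OPEN_DATA_LICENSES


# Hypothetical catalog entries
catalog = [
    {"name": "orbital-catalog-mirror", "license": "CC-BY-4.0"},
    {"name": "sensor-feed", "license": "CC-BY-NC-4.0"},  # non-commercial clause
    {"name": "raw-dump"},                                # no license declared
]

for entry in catalog:
    print(entry["name"], is_openly_licensed(entry))
# orbital-catalog-mirror True
# sensor-feed False
# raw-dump False
```

Treating a missing license as "not open" is the conservative default: it mirrors the point above that technically downloadable data is not automatically part of a commons.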
Structurally Maintained
With versioning, documentation, provenance tracking, and quality standards that make it reliable enough to build on. Raw data dumps are not a commons. Maintained, documented, versioned datasets are. The difference is the investment in making data usable rather than merely available.
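A minimal sketch of what versioning and provenance tracking can mean in practice: each published release records a SHA-256 digest of its exact bytes, where the data came from, and the digest of the release it supersedes, so anyone can check that the published history has not been silently rewritten. The `Release` and `DatasetHistory` names, their fields, and the sample data are hypothetical; real projects would typically rely on established versioning tools or metadata standards rather than a hand-rolled log like this one.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    version: str
    sha256: str            # digest of the exact bytes published in this release
    source: str            # provenance: where the data came from
    parent: Optional[str]  # digest of the release this one supersedes
    notes: str = ""


@dataclass
class DatasetHistory:
    releases: List[Release] = field(default_factory=list)

    def publish(self, version: str, data: bytes, source: str, notes: str = "") -> Release:
        """Record a new release, linking it to the previous one by digest."""
        parent = self.releases[-1].sha256 if self.releases else None
        release = Release(version, hashlib.sha256(data).hexdigest(), source, parent, notes)
        self.releases.append(release)
        return release

    def lineage_is_intact(self) -> bool:
        """True if every release points at its predecessor's digest."""
        return all(
            current.parent == previous.sha256
            for previous, current in zip(self.releases, self.releases[1:])
        )


history = DatasetHistory()
history.publish("1.0.0", b"id,lat,lon\n1,52.5,13.4\n", source="field survey, batch A")
history.publish(
    "1.1.0",
    b"id,lat,lon\n1,52.52,13.41\n",
    source="field survey, batch A",
    notes="corrected coordinate precision",
)
print(history.lineage_is_intact())  # True
```

Chaining releases by digest means a correction shows up as a new, documented version rather than an unexplained change to data others have already built on.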
Community Governed
With contribution pathways, governance structures, and accountability mechanisms ensuring no single entity can unilaterally close, degrade, or weaponize the resource. This is what distinguishes a commons from a benevolent monopoly dressed in open-source language.
Reproducibility-Compatible
Meaning that research, analysis, and decisions made using the data can be verified, challenged, and built upon by others. This is what separates a commons from a black box that happens to publish its outputs. If the data cannot support independent replication, it does not yet qualify as a commons.
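Replication depends on being able to say exactly which data an analysis used. A minimal sketch, assuming a published analysis cites the SHA-256 digest of its input: the verifier refuses to compare results unless the data matches that digest, then re-runs the analysis. The `reproduce` helper, the toy analysis, and the three-line dataset are hypothetical illustrations, not a description of any particular replication workflow.

```python
import hashlib
import statistics


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reproduce(data: bytes, pinned_digest: str, analysis, reported):
    """Re-run an analysis only on the exact data version it cites.

    Returns the recomputed result and whether it matches the reported one.
    """
    if sha256_of(data) != pinned_digest:
        raise ValueError("data differs from the pinned version; results are not comparable")
    result = analysis(data)
    return result, result == reported


def median_weekly_cases(raw: bytes) -> int:
    # Toy analysis: median of one integer per line
    return statistics.median(int(x) for x in raw.decode().split())


published_data = b"12\n15\n11\n"
pinned = sha256_of(published_data)  # the digest a paper would cite alongside its result
print(reproduce(published_data, pinned, median_weekly_cases, reported=12))  # (12, True)
```

Failing loudly on a digest mismatch keeps a disagreement about results from being confused with a disagreement about which data was used.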
These properties are not technically difficult to implement. They are institutionally difficult: they require organizations to make deliberate choices about openness over control, and to sustain those choices over time even when control is commercially or politically advantageous.
Three Domains Where Closed Data Is Failing Us
The Policy Gap Is a Data Gap
The climate crisis is, among other things, a data problem. The physical systems involved, tracked through measures such as ocean temperatures, ice mass, atmospheric carbon concentration, and biodiversity indices, are vast, complex, and interdependent. Understanding them requires integrating data from thousands of sources across dozens of jurisdictions.
Much of this data is open, and the scientific community has built remarkable open systems around it. The Copernicus Climate Change Service, operated by ECMWF on behalf of the European Commission, is one of the most ambitious open data projects in history. But significant gaps remain: proprietary satellite data, national datasets shared only under restrictive agreements, and commercial sensors whose output is licensed rather than published. The consequence is not just incomplete science. It is incomplete policy. Decisions about capital investment, agricultural planning, and climate finance are being made with partial information, not because the underlying data does not exist, but because it has not been made part of a commons.
Fragmentation Costs Lives
SocialLab’s Data for Crisis initiative, developed in partnership with Deutsche Welle Akademie and supported by the German Federal Ministry for Economic Cooperation and Development, was built around a specific observation: the data needed to understand and respond to crises exists, but it does not cohere.
Population displacement figures are held by agencies with inconsistent sharing policies. Conflict event data is scattered across monitoring organizations with incompatible formats. Economic shock indicators that would help predict humanitarian crises are locked in proprietary financial databases. The cost of this fragmentation is measured in response time, misallocated aid, and stories that go untold because the data to tell them accurately does not exist in a usable form.
Open data commons change this structure. When a crisis data platform is built on open, versioned, documented datasets rather than proprietary feeds, it can be forked by local journalists, adapted by community organizations, and built upon by researchers, multiplying its value rather than concentrating it.
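Much of the work of making fragmented crisis data cohere is mapping each source into a shared schema while keeping track of where every record came from. The sketch below assumes two hypothetical sources ("agency_a" and "monitor_b") that report displacement with different field names, date formats, and number formatting; the schema, field names, region, and figures are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DisplacementRecord:
    """Hypothetical common schema that both sources are mapped into."""
    region: str
    reported_on: date
    displaced_persons: int
    source: str  # provenance survives the merge


def from_agency_a(row: dict) -> DisplacementRecord:
    # Hypothetical source A: ISO dates, plain integer counts
    return DisplacementRecord(
        region=row["admin1"].strip().title(),
        reported_on=date.fromisoformat(row["date"]),
        displaced_persons=int(row["idps"]),
        source="agency_a",
    )


def from_monitor_b(row: dict) -> DisplacementRecord:
    # Hypothetical source B: day/month/year dates, counts with thousands separators
    day, month, year = (int(part) for part in row["report_date"].split("/"))
    return DisplacementRecord(
        region=row["Region"].strip().title(),
        reported_on=date(year, month, day),
        displaced_persons=int(row["Displaced"].replace(",", "")),
        source="monitor_b",
    )


records = [
    from_agency_a({"admin1": "eastern district", "date": "2026-03-01", "idps": "1200"}),
    from_monitor_b({"Region": "Eastern District ", "report_date": "01/03/2026", "Displaced": "1,350"}),
]
for r in sorted(records, key=lambda r: (r.region, r.reported_on)):
    print(r.region, r.reported_on.isoformat(), r.displaced_persons, r.source)
# Eastern District 2026-03-01 1200 agency_a
# Eastern District 2026-03-01 1350 monitor_b
```

Once both sources share a schema, their disagreement about the same region and date becomes visible and attributable, which is exactly what incompatible formats otherwise hide.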
The Newest Frontier of the Same Problem
Near-Earth space is now genuinely crowded. There are more than 9,000 active satellites in orbit, with commercial operators planning constellation expansions that will add tens of thousands more in this decade. There are an estimated 580,000 objects larger than one centimeter traveling at orbital velocities. The risk of cascading collision events — Kessler Syndrome — is no longer theoretical.
The United Nations Office for Outer Space Affairs has called for improved data sharing to manage orbital traffic. Researchers, developers, and analysts who could contribute to improving collective situational awareness find themselves blocked by government access restrictions, commercial licensing costs, or tools designed either for specialized aerospace engineers or for mass-market consumers who need a simple alert. This is the newest version of the same institutional failure: critical shared systems, inadequately shared data.
The Open Data Precedents That Worked
The argument for open data commons is not theoretical. It is demonstrated, repeatedly, across domains.
OpenStreetMap
Began in 2004 as a response to restrictive commercial map licensing. Today it is the foundational geographic dataset for humanitarian response organizations, urban planners, autonomous vehicle development, and crisis response platforms worldwide, including SocialLab's own work. The Humanitarian OpenStreetMap Team's community mapping has enabled targeted vaccination campaigns; in the DRC, precise route mapping helped campaigns reach 95% vaccination rates.
The Human Genome Project
Made a deliberate, contested decision in the mid-1990s to place sequencing data in the public domain within 24 hours of generation, formalized through the Bermuda Principles of 1996. The downstream value was enormous: research using public Human Genome Project data produced nearly twice as many academic papers as comparable research using proprietary data, a documented multiplier effect from openness that would not have occurred under institutional control.
Wikipedia and Wikimedia
296 billion page views across Wikimedia projects in 2024, an average of nearly 10,000 every second. Before Wikipedia, much of the knowledge it makes accessible was locked in encyclopedias that cost money to purchase and were updated on cycles measured in years.
In each case, the commons did not emerge automatically. It required deliberate design, sustained governance, and communities willing to contribute to something they did not own individually.
Why Openness Is Not Naivety
The objection to open data commons is usually framed in terms of quality, sustainability, or security. Open data, the argument goes, is unverified data. Free data is unmaintained data. Public data is exploitable data.
These are real concerns, but they are engineering problems, not fundamental objections.
Quality in open data commons is addressed through provenance tracking, versioning, and community review, the same mechanisms that make open-source software reliable enough to power the world’s most demanding systems. Linux powers the majority of the world’s servers not despite being open-source, but in part because it is: more eyes on the code means more eyes on the errors.
Sustainability is addressed through governance structures that do not depend on a single organization’s continued goodwill or financial health. OpenStreetMap’s data does not disappear if a company changes its business model. Wikipedia’s knowledge base does not degrade if a platform loses funding. The commons, when well-governed, outlasts any of its contributors.
Security in open data is, counterintuitively, often better than in closed systems, because building a commons forces an explicit decision about what belongs in it and what does not. Sensitive data (data about individuals, classified operational information, proprietary commercial intelligence) should not be in a public data commons. The question is whether the claim of sensitivity is being used legitimately to protect genuinely sensitive information, or as institutional cover for data that would generate no security risk if made public but significant inconvenience if it had to be maintained to public standards.
The orbital catalog is not sensitive in a meaningful security sense. The climate sensor data withheld under national sovereignty claims is not sensitive. The humanitarian data held behind agency access agreements is not sensitive. It is restricted because restriction is the default, not because openness would cause harm.
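The distinction between genuinely sensitive information and data restricted by default can be made an explicit step in a release pipeline. A minimal sketch, assuming every field has been classified in advance: sensitive fields are dropped, cleared fields are published, and any unclassified field blocks the release until someone reviews it. The field names, the two classification sets, and the `prepare_for_commons` helper are hypothetical; releases involving personal data would also need anonymization review that simple field filtering does not provide.

```python
from typing import Iterable, List

# Hypothetical field classification, decided explicitly rather than by default
SENSITIVE_FIELDS = {"name", "phone", "exact_location"}
PUBLISHABLE_FIELDS = {"region", "week", "case_count"}


def prepare_for_commons(rows: Iterable[dict]) -> List[dict]:
    """Keep only fields explicitly cleared for publication.

    Any field that is neither cleared nor flagged as sensitive stops the
    release, so new columns get reviewed instead of leaking by accident.
    """
    published = []
    for row in rows:
        unclassified = set(row) - SENSITIVE_FIELDS - PUBLISHABLE_FIELDS
        if unclassified:
            raise ValueError(f"unclassified fields: {sorted(unclassified)}")
        published.append({k: v for k, v in row.items() if k in PUBLISHABLE_FIELDS})
    return published


rows = [{"name": "A. Person", "phone": "000", "region": "North", "week": "2026-W10", "case_count": 4}]
print(prepare_for_commons(rows))
# [{'region': 'North', 'week': '2026-W10', 'case_count': 4}]
```

Failing on unclassified fields, rather than publishing or silently dropping them, keeps the sensitivity decision explicit instead of leaving restriction as the unexamined default.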
What Building the Commons Actually Requires
The barriers to open data commons are not primarily technical. The technology to build, maintain, and distribute open datasets at scale is mature, affordable, and well-documented. The barriers are institutional, incentive-based, and cultural, and addressing them requires deliberate choices that most organizations currently have no structural reason to make.
- Institutions that hold data need different incentives. Government agencies that maintain critical datasets often have no structural incentive to make them interoperable or openly licensed. The incentives that would change this, such as funding tied to openness requirements and recognition of data stewardship as public service, are available but underused. The Open Data Charter and Open Government Partnership provide frameworks, but frameworks require adoption.
- Contributors need recognition and sustainability. The communities that build and maintain open data commons do so through intrinsic motivation and institutional support. Academic researchers need to be able to publish data contributions in venues that count toward tenure and promotion, which is why JOSS, the Journal of Open Source Software, represents an important institutional innovation: it turns research software, a comparable kind of contribution, into citable publications. Professional recognition for data stewardship needs to reach parity with recognition for data analysis.
- Standards matter more than platforms. The most durable open data commons are built around interoperable standards rather than specific platforms. When data is documented in formats that any tool can consume, it outlives the platform that first published it. The FAIR data principles — Findable, Accessible, Interoperable, Reusable — represent the minimum viable standard for data that aspires to commons status.
- Frameworks already exist. The Open Source Initiative and the Open Data Handbook provide the legal and operational frameworks that lower the institutional cost of openness. What keeps these frameworks from wider adoption is not their complexity but the absence of organizational leadership willing to make openness a default rather than an exception.
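The FAIR principles are often put into practice as metadata checks. A minimal sketch, assuming a dataset's metadata is a flat dictionary: each principle is mapped to a few fields that could serve as minimal evidence for it, and the function reports which are missing. The field names, the mapping, and the sample record (including its placeholder URLs) are hypothetical simplifications; the FAIR principles themselves are considerably more detailed than this checklist.

```python
from typing import Dict, List

# Hypothetical mapping from each FAIR principle to metadata fields that
# could serve as minimal evidence for it. Not an official FAIR checklist.
FAIR_EVIDENCE: Dict[str, List[str]] = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["format", "schema"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(metadata: dict) -> Dict[str, List[str]]:
    """Return, per principle, the evidence fields that are missing or empty."""
    gaps = {}
    for principle, fields in FAIR_EVIDENCE.items():
        missing = [f for f in fields if not metadata.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps


record = {
    "identifier": "https://example.org/id/weekly-displacement",  # placeholder identifier
    "title": "Weekly displacement counts",
    "keywords": ["displacement", "humanitarian"],
    "access_url": "https://example.org/data.csv",  # placeholder URL
    "format": "text/csv",
    "license": "CC-BY-4.0",
}
print(fair_gaps(record))
# {'Interoperable': ['schema'], 'Reusable': ['provenance']}
```

Reporting gaps per principle, rather than a single pass/fail, shows what a dataset still needs before it can reasonably claim commons status.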
The Argument for Now
There is a particular urgency to this argument in 2026 that did not exist a decade ago. The systems we most need to understand — the climate, the information environment, the orbital domain — are changing faster than our institutional capacity to respond. The window for building the open data systems to understand and govern these changes is narrowing, not expanding.
The precedents are established. The technology is available. The communities that would contribute to and govern these commons exist and are growing. What is required is the decision, by institutions that hold critical data, to treat their stewardship role as a public responsibility rather than a competitive asset.
At SocialLab, this argument is not new. It is the foundation of how we have built intelligence systems since 2015 — from crisis data platforms designed for data journalists in the Global South, to disinformation detection tools built on open methodologies. Our work on why AI transparency is the new competitive advantage makes the same point in a different register: the organizations that will generate the most durable value from AI are the ones that make their systems explainable, auditable, and accountable, not because they are required to, but because opacity is, ultimately, fragile.
The same principle applies to the data commons. Closed data is fragile data. It depends on the continued goodwill, financial health, and institutional priorities of whoever holds it. Open data, when governed well, is resilient: it belongs to the commons and cannot be revoked by any single decision.
Intelligence, when open, becomes a shared foundation. That foundation, when governed well, becomes a commons. And a commons, when built with care, becomes something that outlasts any of the organizations that contributed to it.
The domains change. The principle does not. SocialLab exists to advance AI and data science for charitable purpose, and that purpose requires the open data foundations on which sustainable intelligence is built. The UN’s 2030 Agenda for Sustainable Development cannot be met without it.
The question is not whether we can afford to build an open data commons. It is whether we can afford the crises that will arrive without one.
Frequently Asked Questions
Common questions about open data, data commons, and SocialLab’s approach.





