Data is the bedrock of progress across nearly every field. It serves as the raw material from which profound insights are forged, enabling us to precisely measure current realities, identify critical trends, and possibly predict future outcomes.
At Google, our mission with Data Commons is to organize the world's publicly available statistical data, making it more accessible and useful for everyone. It's an open-source knowledge graph that unifies a vast array of public data from diverse sources, simplifying access and comprehension for developers, researchers, and data analysts alike. Along with the datacommons.org website, Google Search uses Data Commons to answer queries like "What is the population of San Francisco?", where the top graph in the results is generated by Data Commons.
Today, we're announcing the general availability of the new Python client library for Data Commons, based on the V2 REST API. This new Python library dramatically enhances how data developers can leverage Data Commons.
This milestone was significantly shaped by the vision and substantial contributions of our partner The ONE Campaign, a global organization working to create the investments needed for economic opportunities and healthier lives in Africa. We built Data Commons as an open-source platform precisely to encourage community contributions and enable innovative uses, and this partnership with The ONE Campaign exemplifies that goal. ONE advocated for the client library, proposed its design, and wrote the code, making Data Commons' insights available to data scientists and analysts who want to use the rich ecosystem of Python analytical tools and libraries.
The Data Commons platform also allows organizations, like the United Nations or ONE, to host their own Data Commons instances. These custom instances enable the seamless integration of proprietary datasets with the foundational Data Commons knowledge graph. Organizations leverage the Data Commons data framework and tools while maintaining full control over their data and resources.
One of the most impactful additions in the V2 library is robust support for custom instances. This means you can now use the Python library to programmatically query any public or private instance, whether it is hosted locally, within your organization, or on Google Cloud Platform.
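As a minimal sketch of what targeting different instances might look like, the helper below picks the constructor arguments for a client. It assumes the datacommons-client package and its DataCommonsClient constructor with the api_key, dc_instance, and url keyword arguments. The helper names client_kwargs and make_client, and the hostname datacommons.example.org, are hypothetical and used only for illustration.

```python
from typing import Optional


def client_kwargs(api_key: Optional[str] = None,
                  dc_instance: Optional[str] = None,
                  url: Optional[str] = None) -> dict:
    """Choose keyword arguments for DataCommonsClient (illustrative helper)."""
    if url:
        # Fully qualified API endpoint, e.g. an instance running locally.
        return {"url": url}
    if dc_instance:
        # Hostname of a custom instance hosted by an organization.
        return {"dc_instance": dc_instance}
    # Otherwise fall back to the main datacommons.org graph with an API key.
    return {"api_key": api_key}


def make_client(**options):
    """Build a client; assumes `pip install datacommons-client` has been run."""
    # Imported lazily so the sketch loads even when the package is absent.
    from datacommons_client.client import DataCommonsClient
    return DataCommonsClient(**client_kwargs(**options))
```

With the package installed, make_client(dc_instance="datacommons.example.org") would return a client bound to that (hypothetical) custom instance, while make_client(api_key="YOUR_API_KEY") would target datacommons.org. Passing only one of the three options at a time keeps the choice of instance unambiguous.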
The Python library makes it very easy to perform common queries against Data Commons data, such as:
V2 of the client library offers many technical improvements over the V1 library, including:
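For example, the following snippet plots the proportion of the population living below the international poverty line for each continent. It assumes the datacommons-client, pandas, and matplotlib packages are installed, and that YOUR_API_KEY is replaced with a valid Data Commons API key:

import matplotlib.pyplot as plt
from datacommons_client.client import DataCommonsClient

# Assumption: a client for datacommons.org; replace the placeholder with your own key
client = DataCommonsClient(api_key="YOUR_API_KEY")

variable = "sdg/SI_POV_DAY1"
variable_name = "Proportion of population below international poverty line"
# Fetch every available observation of the variable for all continents
df = client.observations_dataframe(variable_dcids=variable, date="all", parent_entity="Earth", entity_type="Continent")
# Reshape so that each continent becomes a column, indexed by date
df = df.pivot(index="date", columns="entity_name", values="value")

ax = df.plot(kind="line")
ax.set_xlabel("Year")
ax.set_ylabel("%")
ax.set_title(variable_name)
ax.legend()
plt.show()  # render the figure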
To get started with the Data Commons Python library, you can install the package directly from PyPI. We've also provided comprehensive resources to help you dive in, including reference documentation and online tutorials available as Google Colab notebooks.
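The package is published on PyPI under the name datacommons-client, so pip install datacommons-client is the usual route. After installing, a quick check like the sketch below can confirm that the package is visible to your Python environment. The helper name installed_version is hypothetical; it relies only on the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "datacommons-client") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("datacommons-client is not installed in this environment")
    else:
        print(f"datacommons-client {found} is installed")
```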
For those currently using the V1 Python API, we strongly recommend upgrading to the new V2 Python library. The V1 API is scheduled for deprecation, and adopting the new library ensures you'll have access to the latest features and continued support.
This library is a testament to the power of open-source collaboration. The open-source code is available on GitHub, and we welcome contributions from the community under the Google Contributor License Agreement.