Earth Day Special with Ricardo Miron and Katie Wetstone

GitHub
22 Apr 2024 | 65:00

TLDR: In celebration of Earth Day, GitHub hosted a special event featuring Ricardo Miron, CTO at the Digital Public Goods Alliance, and Katie Wetstone, a Senior Data Scientist at DrivenData. They discussed the importance of open-source technology in promoting environmental protection and sustainable development. The event spotlighted Project Zamba, an open-source Python package developed by DrivenData, which uses machine learning to automate the analysis of camera trap videos for wildlife conservation. Miron highlighted the role of digital public goods in advancing the sustainable development goals, while Wetstone detailed the technical aspects of Zamba, its application for conservationists, and the opportunities for community contribution. The discussion also covered the 'For Good First Issue' tool on GitHub, which helps developers contribute to open-source projects, and the 'DPG Standard', which ensures digital public goods are designed responsibly. The event concluded with an invitation for the audience to engage with the projects and contribute to the open-source community.

Takeaways

  • 📚 GitHub is more than a code repository; it offers tools like GitHub Projects, GitHub Codespaces, GitHub Copilot, and GitHub Actions to streamline the development lifecycle.
  • 🌍 Earth Day is celebrated annually on April 22nd to support environmental protection, promote sustainable practices, and raise awareness about environmental issues.
  • 🤖 GitHub Copilot assists developers by making their code more readable and efficient through AI-powered coding assistance.
  • 📈 Ricardo Miron, CTO at the Digital Public Goods Alliance, discussed the importance of open-source technologies in achieving the sustainable development goals and their impact on creating a more equitable world.
  • 📉 The value of open-source software is immense, with a Harvard study estimating the supply side at around $4.15 billion and the demand side at around $8.8 trillion.
  • 🌿 Digital Public Goods (DPGs) are a special kind of open-source that includes software, AI models, open datasets, and open content collections, all aimed at advancing the sustainable development goals.
  • 🔍 DPGs adhere to the 'Do no harm' principle and have a set of nine criteria known as the DPG standard to ensure they meet the required benchmarks for recognition.
  • 🐘 Project Zamba, introduced by Katie Wetstone from DrivenData, is a tool that uses Python, machine learning, and computer vision to automate the analysis of camera trap videos for wildlife conservation.
  • 📱 Zamba has a user-friendly web application that lets non-programmers process videos without writing code, so subject matter experts can use the tool directly.
  • 🚀 DrivenData organizes machine learning competitions and provides consulting services to mission-driven organizations, in addition to maintaining open-source tools like Zamba.
  • 🌐 The 'For Good First Issue' tool on GitHub helps developers, especially beginners, to find and contribute to open-source projects by addressing issues that match their skills.

Q & A

  • What is GitHub and how does it benefit developers like Mona?

    -GitHub is a platform for software development that allows developers to store, track, and manage their code. It benefits developers like Mona by providing project management tools, customizable views, filters, and layouts that facilitate team collaboration. It also offers GitHub Copilot for coding assistance, GitHub Actions for automated testing, and advanced security features to protect code integrity.

  • What is Earth Day and why is it significant?

    -Earth Day is an annual event celebrated on April 22nd globally, dedicated to supporting environmental protection. It involves worldwide engagement in activities such as tree planting, community beach cleanups, and educational events to raise awareness about environmental issues and promote sustainable practices.

  • Can you explain the role of Ricardo Miron in the context of the Digital Public Goods Alliance?

    -Ricardo Miron is the CTO at the Digital Public Goods Alliance (DPGA). His role involves helping open-source technologies gain more impact and contribute to a more equitable world through the sustainable development goals. He is responsible for ensuring that open-source projects meet the DPG standard, which includes having an approved open license, advancing the sustainable development goals, and being designed to do no harm.

  • What is the significance of Project Zamba in the context of conservation efforts?

    -Project Zamba is a digital public good that uses Python, machine learning, and computer vision to automate the analysis of video data captured by camera traps in the wild. It helps conservationists process hours of footage by identifying relevant animal species, estimating distances for population size calculations, and segmenting animal parts for behavior study, thus streamlining the conservation monitoring process.

  • How does the 'For Good First Issue' tool on GitHub assist developers in contributing to open source projects?

    -The 'For Good First Issue' tool on GitHub helps developers find open source projects with open issues that need contributions. It allows users to explore projects, view open issues, and directly contribute by submitting pull requests. The tool is designed to lower the barrier to entry for contributing to open source, making it beginner-friendly and facilitating engagement for a wider range of developers.

  • What are the key principles of digital public goods as defined by the United Nations?

    -The key principles of digital public goods as defined by the United Nations include having an approved open license, advancing the sustainable development goals, and being designed to do no harm. These principles ensure that the technology is accessible, beneficial, and safe to use in pursuit of societal and environmental goals.

  • How does DrivenData contribute to social impact through data science?

    -DrivenData sits at the intersection of advanced machine learning tools and social impact. It contributes by hosting machine learning competitions that are open to the public, providing direct consulting services to mission-driven organizations, and maintaining open-source tools that help people practice data science better. It also works on projects like Zamba, which directly aid conservation efforts.

  • What are the technical challenges involved in working with video data for conservation projects?

    -Video data is extremely large and cannot be treated as an image problem due to the high number of frames. Predicting on every frame of a video is computationally expensive and slow. Zamba addresses this by using a frame selection process to identify and work with only the most relevant frames, making the analysis more efficient.

  • How does Zamba handle the processing of videos for species detection and behavior analysis?

    -Zamba uses pre-trained models to detect different species in videos and allows users to train their models with new species. It can estimate the distance between a camera and an animal for certain species, which is crucial for population size estimation. Additionally, Zamba can segment parts of an animal, particularly useful for studying animal behavior.

  • What are the limitations of using pre-trained models in Zamba?

    -The limitations of pre-trained models in Zamba are that they can only identify certain species based on the training data they were given. If a user is working with species or ecosystems not covered by the pre-trained models, there may not be a significant overlap, and the models may not be as effective without additional training.

  • How can non-developers contribute to projects like Zamba?

    -Non-developers can contribute by participating in data labeling, correcting predicted labels on videos processed by Zamba, and sharing custom models they have created. They can also engage in community discussions, provide feedback, and help improve documentation.

Outlines

00:00

📚 Introduction to GitHub's Features and Earth Day Celebration

The first paragraph introduces GitHub as a platform beyond code storage, highlighting its project management tools, customizable views, and filters. It tells the story of Mona, a developer who utilizes GitHub for tasks, iterations, and coding efficiency with GitHub Copilot. The paragraph also touches on GitHub's security features and concludes with an introduction to an Earth Day celebration event, emphasizing environmental protection and sustainable practices.

05:01

🌿 Earth Day Special: Project Zamba Presentation

The second paragraph focuses on an Earth Day event at GitHub, where a special project called Zamba is presented. Ricardo Miron, CTO of the Digital Public Goods Alliance, and Katie Wetstone from DrivenData join the stage to discuss the project. They delve into the importance of open-source technology and its impact on the sustainable development goals. The discussion also covers the concept of digital public goods (DPGs) and their role in creating a more equitable world through technology.

10:02

📈 The DPG Standard and Its Role in Technology

This paragraph delves into the DPG standard, a set of criteria that digital public goods must meet to be officially recognized. It discusses the importance of these standards in ensuring technology supports sustainable development goals without causing harm. The paragraph also provides an overview of the Digital Public Goods Alliance, its members, and its goal to increase the sustainability and desirability of digital public goods. It highlights the use of sustainable development goals as a framework for the alliance's work.

15:03

🐘 Project Zamba: Conservation Through Machine Learning

The fourth paragraph introduces Project Zamba, a tool developed by DrivenData to support conservation efforts. Katie Wetstone explains that Zamba is a Python package designed to process camera trap videos, making it easier for conservationists to monitor wildlife. The tool can detect species, estimate distances for population size calculations, and segment animal parts for behavior analysis. The paragraph also discusses how Zamba is made accessible through both a command-line tool and a user-friendly web application.

20:06

🤖 Technical Insights into Project Zamba

In this paragraph, Katie Wetstone provides a deeper look into the technical aspects of Project Zamba. She explains the challenges of working with video data due to its large size and how Zamba addresses this through frame selection. The paragraph also discusses the use of machine learning models, such as MegadetectorLite and YOLOX, to identify and analyze animal movements in the videos. It highlights the importance of these models in making Zamba efficient and effective for conservation work.
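The frame selection idea described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Zamba's actual implementation: the function name `select_frames` and the score values are hypothetical, and the sketch assumes a lightweight detector has already assigned one score per frame. Only the highest-scoring frames would then be passed to a heavier model.

```python
from typing import List, Sequence


def select_frames(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, in temporal order."""
    if k <= 0:
        return []
    # Rank frame indices by score, highest first; ties go to the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top k, then restore temporal order for the downstream model.
    return sorted(ranked[:k])


# Hypothetical per-frame detector scores (illustrative values only).
frame_scores = [0.02, 0.10, 0.85, 0.91, 0.40, 0.05, 0.77, 0.03]
selected = select_frames(frame_scores, k=3)
print(selected)
```

With this approach the expensive model runs on `k` frames rather than on every frame of the video, which is where the efficiency gain comes from.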

25:07

🌱 Open Sourcing Zamba and Engaging the Community

The sixth paragraph discusses the process of making Zamba open source and the importance of engaging with the community. Katie emphasizes the iterative process of development and delivery, ensuring that technical advancements are accessible to end-users. The paragraph also explores the concept of the 'missing middle' of model implementation, the importance of clear documentation, and the role of citizen scientists in the initial labeling for training the Zamba models.

30:08

🔍 Optimizing Zamba for Efficiency and Accessibility

This paragraph explores the optimization techniques used in Zamba, particularly the frame selection process that allows the tool to handle video data efficiently. It also touches on the availability of an archive of models and the ability for users to share custom models through a model zoo on GitHub. The paragraph emphasizes the importance of making the tool as accessible as possible, including for users in lower-bandwidth settings.

35:09

🌟 Showcasing Zamba and Inviting Contributions

The eighth paragraph serves as a call to action, inviting contributions to the Zamba project. It discusses the importance of good open-source practices, such as clear ownership, a code of conduct, and community governance. The paragraph also mentions the development of new capabilities in Zamba, funded by a grant, and the anticipation of new opportunities for contributors with the upcoming release of Zamba version 3.

40:10

🌍 DrivenData's Initiatives and Upcoming Events

The ninth paragraph highlights other projects by DrivenData, such as CyFi, which detects harmful algal blooms, and mentions their participation in competitions and events. Katie Wetstone shares upcoming talks and events, emphasizing the importance of ethics in AI and the role of open-source projects in addressing global challenges. The paragraph concludes with an invitation to connect with DrivenData and engage in their initiatives.

45:12

📝 Final Thoughts and Earth Day Acknowledgments

The final paragraph wraps up the discussion on Project Zamba and the work of the Digital Public Goods Alliance. It emphasizes the importance of Earth Day and the role of open-source technology in promoting environmental awareness and sustainability. The paragraph also provides information on how to get involved with DrivenData's projects and competitions, encouraging the audience to continue the conversation and contribute to their initiatives.

Keywords

💡GitHub

GitHub is a web-based platform for version control and source code management, which is central to the video's discussion. It allows developers to collaborate and track changes in their code. In the script, GitHub is mentioned as a place where developers can store code, manage projects, and use various features like GitHub Copilot for coding assistance and GitHub Actions for automated testing.

💡Digital Public Goods (DPGs)

Digital Public Goods are a special category of open-source software that adhere to a set of principles ensuring they are designed to support sustainable development goals without causing harm. The video discusses how DPGs are not only software but also include AI models, open datasets, and other digital resources that are freely accessible and modifiable. The Digital Public Goods Alliance, mentioned in the script, is an organization that recognizes and supports these types of projects.

💡Sustainable Development Goals (SDGs)

The Sustainable Development Goals are a collection of 17 global goals set by the United Nations to address various social, economic, and environmental challenges. The script emphasizes the importance of these goals in guiding the development and use of digital public goods, ensuring that technology contributes positively to societal and environmental well-being.

💡Machine Learning

Machine learning is a subset of artificial intelligence in which algorithms learn patterns from data rather than being explicitly programmed. In the context of the video, machine learning is used in Project Zamba to automate the analysis of video data for conservation efforts, highlighting its role in processing large volumes of data efficiently.

💡Project Zamba

Project Zamba is a recognized digital public good and an open-source tool featured in the video. It uses machine learning and computer vision to automate the analysis of camera trap videos, aiding wildlife conservation by filtering out irrelevant footage and identifying animal species. This tool exemplifies the application of technology in achieving the SDGs, particularly those related to life on land and life below water.

💡Open Source

Open source refers to software or a project where the source code is made available to the public, allowing anyone to view, use, modify, and distribute it. The video emphasizes the importance of open-source projects in fostering collaboration, innovation, and the development of community-driven technologies that can have a significant impact on various sectors, including environmental protection.

💡Earth Day

Earth Day is an annual event celebrated on April 22nd to demonstrate support for environmental protection. The video uses the occasion of Earth Day to highlight the importance of sustainable practices and the role of technology in advocating for the sustainability of our planet. It serves as a backdrop for discussing projects like Zamba that contribute to environmental conservation.

💡GitHub Codespaces

GitHub Codespaces is a cloud-based development environment that allows developers to write, build, and test code from anywhere. In the script, it is mentioned as a tool that Mona, a developer in the video, uses to set up an on-demand development environment, eliminating the need to manage local dependencies and allowing her to focus on coding.

💡Automated Testing

Automated testing involves the use of software to execute tests on applications to determine their correctness, performance, and other attributes. The video discusses how GitHub Actions can be used to set up automated testing, which is crucial for ensuring the quality and reliability of the code being developed, particularly in the context of complex projects.

💡Secret Scanning

Secret scanning is a security feature that helps prevent the accidental exposure of sensitive information, such as passwords or API keys, in a codebase. The video mentions GitHub's secret scanning as part of DevOps governance practices that keep code secure and protect against potential vulnerabilities.
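To make the idea concrete, here is a toy sketch of pattern-based secret detection. It is not how GitHub's secret scanning is implemented: the `PATTERNS` table and the `scan_text` function are hypothetical names, the two rules are illustrative only, and the sample key is the placeholder value AWS uses in its own documentation.

```python
import re
from typing import List, Tuple

# Illustrative rules only; production scanners use many provider-specific patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(?:password|api_key|secret)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Sample input with two planted findings (lines 2 and 3).
sample = (
    'user = "mona"\n'
    'password = "hunter2hunter2"\n'
    'key_id = "AKIAIOSFODNN7EXAMPLE"\n'
)
findings = scan_text(sample)
print(findings)
```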

💡For Good First Issue

For Good First Issue is a tool that helps developers, especially beginners, find and contribute to open-source projects by addressing open issues. The video showcases this tool as a way to engage with the community and contribute to digital public goods, like Project Zamba, thereby promoting a culture of collaboration and sustainable development.
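As a rough illustration of how beginner-friendly issues can be located programmatically, the sketch below builds a query URL for GitHub's issue search endpoint. It does not describe how For Good First Issue itself works. The helper name `good_first_issue_search_url` is hypothetical, and the repository slug `drivendata/zamba` is an assumption used for the example. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode


def good_first_issue_search_url(repo: str, label: str = "good first issue") -> str:
    """Build a GitHub issue-search URL for open issues carrying a given label."""
    # Search qualifiers: restrict to one repository, open issues, and the label.
    query = f'repo:{repo} is:issue state:open label:"{label}"'
    return "https://api.github.com/search/issues?" + urlencode({"q": query})


# The repository slug below is an assumption for illustration.
url = good_first_issue_search_url("drivendata/zamba")
print(url)
```

Fetching this URL with any HTTP client would return matching issues as JSON, subject to GitHub's rate limits.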

Highlights

GitHub is more than just a code repository; it can streamline the entire development lifecycle with tools like GitHub Projects, GitHub Codespaces, GitHub Copilot, and GitHub Actions.

Mona, a developer in the story, uses GitHub's customizable features to manage tasks and track work iterations efficiently.

GitHub's Earth Day Celebration focuses on environmental protection and promoting sustainable practices globally.

Project Zamba, introduced during the celebration, is an innovative tool that uses Python, machine learning, and computer vision to automate video data analysis.

Ricardo Miron, CTO at the Digital Public Goods Alliance, discusses the importance of open-source technology in achieving sustainable development goals.

Digital Public Goods (DPGs) are a special category of open-source that includes software, AI models, open datasets, and content collections, designed to adhere to the UN's principles.

The DPG standard consists of nine indicators or criteria that any solution must meet to be recognized as a digital public good.

The Digital Public Goods Alliance has over 40 members and 160 DPGs, working to increase the sustainability and desirability of these goods.

Katie Wetstone from DrivenData explains how Project Zamba automates the analysis of camera trap videos to support conservation efforts.

Zamba uses pre-trained models to detect different animal species and can be customized for specific ecosystems.

The tool can estimate the distance between a camera and an animal, crucial for population size estimation.

Zamba's web application allows non-programmers to use the tool through a user-friendly interface without writing code.

Contributions to Zamba and other digital public goods can be made through campaigns and projects run by the Digital Public Goods Alliance in collaboration with GitHub.

The 'For Good First Issue' tool by GitHub helps developers, especially beginners, to contribute to open-source projects by highlighting suitable issues.

Zamba's development was inspired by a need expressed by conservationists for an automated tool to process extensive camera trap footage.

The tool optimizes video analysis by selecting key frames from hours of footage, making the process more efficient and less computationally expensive.

Zamba's technical approach uses a combination of models, including MegadetectorLite and YOLOX, to identify and analyze animal movements in videos.

The project encourages community contributions and has a 'good first issue' tag on GitHub to help newcomers start contributing.