Earth Day Special with Ricardo Miron and Katie Wetstone
TLDR
In celebration of Earth Day, GitHub hosted a special event featuring Ricardo Miron, CTO at the Digital Public Goods Alliance, and Katie Wetstone, a Senior Data Scientist at Driven Data. They discussed the importance of open-source technology in promoting environmental protection and sustainable development. The event spotlighted Project Zamba, an open-source Python package developed by Driven Data that uses machine learning to automate the analysis of camera trap videos for wildlife conservation. Miron highlighted the role of digital public goods in advancing the sustainable development goals, while Wetstone detailed the technical aspects of Zamba, its application for conservationists, and the opportunities for community contribution. The discussion also covered the 'For Good First Issue' tool on GitHub, which helps developers contribute to open-source projects, and the 'DPG Standard', which ensures digital public goods are designed responsibly. The event concluded with an invitation for the audience to engage with the projects and contribute to the open-source community.
Takeaways
- GitHub is more than a code repository; it offers tools like GitHub Projects, GitHub Codespaces, GitHub Copilot, and GitHub Actions to streamline the development lifecycle.
- Earth Day is celebrated annually on April 22nd to support environmental protection, promote sustainable practices, and raise awareness about environmental issues.
- GitHub Copilot provides AI-powered coding assistance that helps developers write more readable and efficient code.
- Ricardo Miron, CTO at the Digital Public Goods Alliance, discussed the importance of open-source technologies in achieving the sustainable development goals and their impact on creating a more equitable world.
- The value of open-source software is immense: a Harvard study estimates its supply-side value at around $4.15 billion and its demand-side value at around $8.8 trillion.
- Digital Public Goods (DPGs) are a special kind of open source that includes software, AI models, open datasets, and open content collections, all aimed at advancing the sustainable development goals.
- DPGs adhere to the 'Do no harm' principle and must meet the nine criteria of the DPG Standard to be recognized as digital public goods.
- Project Zamba, introduced by Katie Wetstone from Driven Data, is a tool that uses Python, machine learning, and computer vision to automate the analysis of camera trap videos for wildlife conservation.
- Zamba has a user-friendly web application that lets non-programmers process videos without writing code, so subject matter experts can use the tool effectively.
- Driven Data organizes machine learning competitions and provides consulting services to mission-driven organizations, in addition to maintaining open-source tools like Zamba.
- The 'For Good First Issue' tool on GitHub helps developers, especially beginners, find and contribute to open-source projects by addressing issues that match their skills.
Q & A
What is GitHub and how does it benefit developers like Mona?
-GitHub is a platform for software development that allows developers to store, track, and manage their code. It benefits developers like Mona by providing project management tools, customizable views, filters, and layouts that facilitate team collaboration. It also offers GitHub Copilot for coding assistance, GitHub Actions for automated testing, and advanced security features to protect code integrity.
What is Earth Day and why is it significant?
-Earth Day is an annual event celebrated on April 22nd globally, dedicated to supporting environmental protection. It involves worldwide engagement in activities such as tree planting, community beach cleanups, and educational events to raise awareness about environmental issues and promote sustainable practices.
Can you explain the role of Ricardo Miron in the context of the Digital Public Goods Alliance?
-Ricardo Miron is the CTO at the Digital Public Goods Alliance (DPGA). His role involves helping open source technologies gain more impact and contribute to a more equitable world through the sustainable development goals. He is responsible for ensuring that open source projects meet the DPG Standard, which includes having the correct license, advancing the sustainable development goals, and being designed without causing harm.
What is the significance of Project Zamba in the context of conservation efforts?
-Project Zamba is a digital public good that uses Python, machine learning, and computer vision to automate the analysis of video data captured by camera traps in the wild. It helps conservationists process hours of footage by identifying relevant animal species, estimating distances for population size calculations, and segmenting animal parts for behavior study, thus streamlining the conservation monitoring process.
How does the 'For Good First Issue' tool on GitHub assist developers in contributing to open source projects?
-The 'For Good First Issue' tool on GitHub helps developers find open source projects with open issues that need contributions. It allows users to explore projects, view open issues, and directly contribute by submitting pull requests. The tool is designed to lower the barrier to entry for contributing to open source, making it beginner-friendly and facilitating engagement for a wider range of developers.
What are the key principles of digital public goods as defined by the United Nations?
-The key principles of digital public goods as defined by the United Nations include having a correct open source license, advancing the sustainable development goals, and being designed to do no harm. These principles ensure that the technology is accessible, beneficial, and safe to use in pursuit of societal and environmental goals.
How does Driven Data contribute to social impact through data science?
-Driven Data sits at the intersection of advanced machine learning tools and social impact. They contribute by hosting machine learning competitions that are open to the public, providing direct consulting services to mission-driven organizations, and maintaining open-source tools that help people practice data science better. They also work on projects like Zamba, which directly aid in conservation efforts.
What are the technical challenges involved in working with video data for conservation projects?
-Video data is extremely large and cannot be treated as an image problem due to the high number of frames. Predicting on every frame of a video is computationally expensive and slow. Zamba addresses this by using a frame selection process to identify and work with only the most relevant frames, making the analysis more efficient.
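The frame selection idea described above can be illustrated with a minimal sketch. It is not Zamba's actual implementation: the function name `select_top_frames` and the example scores are hypothetical, and it assumes that a lightweight detector has already assigned each frame a score for how likely it is to contain an animal.

```python
from typing import List, Sequence


def select_top_frames(scores: Sequence[float], n_frames: int) -> List[int]:
    """Return indices of the n_frames highest-scoring frames, in temporal order."""
    if n_frames <= 0:
        return []
    # Rank frame indices by score, highest first; ties keep the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the best n_frames, then restore chronological order.
    return sorted(ranked[:n_frames])


# Hypothetical per-frame "animal present" scores from a lightweight detector.
detector_scores = [0.02, 0.10, 0.91, 0.88, 0.05, 0.01, 0.76, 0.03]
selected = select_top_frames(detector_scores, n_frames=3)
print(selected)
```

Only the selected frames would then be passed to the heavier species classification model, which is what keeps the analysis from having to predict on every frame.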
How does Zamba handle the processing of videos for species detection and behavior analysis?
-Zamba uses pre-trained models to detect different species in videos and allows users to train their own models on new species. For certain species, it can estimate the distance between the camera and an animal, which is crucial for population size estimation. Additionally, Zamba can segment parts of an animal, which is particularly useful for studying animal behavior.
What are the limitations of using pre-trained models in Zamba?
-The limitations of pre-trained models in Zamba are that they can only identify certain species based on the training data they were given. If a user is working with species or ecosystems not covered by the pre-trained models, there may not be a significant overlap, and the models may not be as effective without additional training.
How can non-developers contribute to projects like Zamba?
-Non-developers can contribute by participating in data labeling, correcting predicted labels on videos processed by Zamba, and sharing custom models they have created. They can also engage in community discussions, provide feedback, and help improve documentation.
Outlines
Introduction to GitHub's Features and Earth Day Celebration
The first paragraph introduces GitHub as a platform beyond code storage, highlighting its project management tools, customizable views, and filters. It tells the story of Mona, a developer who utilizes GitHub for tasks, iterations, and coding efficiency with GitHub Copilot. The paragraph also touches on GitHub's security features and concludes with an introduction to an Earth Day celebration event, emphasizing environmental protection and sustainable practices.
Earth Day Special: Project Zamba Presentation
The second paragraph focuses on an Earth Day event at GitHub, where a special project called Zamba is presented. Ricardo Miron, CTO of the Digital Public Goods Alliance, and Katie Wetstone from Driven Data join the stage to discuss the project. They delve into the importance of open-source technology and its impact on the sustainable development goals. The discussion also covers the concept of digital public goods (DPGs) and their role in creating a more equitable world through technology.
The DPG Standard and Its Role in Technology
This paragraph delves into the DPG standard, a set of criteria that digital public goods must meet to be officially recognized. It discusses the importance of these standards in ensuring technology supports sustainable development goals without causing harm. The paragraph also provides an overview of the Digital Public Goods Alliance, its members, and its goal to increase the sustainability and desirability of digital public goods. It highlights the use of sustainable development goals as a framework for the alliance's work.
Project Zamba: Conservation Through Machine Learning
The fourth paragraph introduces Project Zamba, a tool developed by Driven Data to support conservation efforts. Katie Wetstone explains that Zamba is a Python package designed to process camera trap videos, making it easier for conservationists to monitor wildlife. The tool can detect species, estimate distances for population size calculations, and segment animal parts for behavior analysis. The paragraph also discusses how Zamba is made accessible through both a command-line tool and a user-friendly web application.
Technical Insights into Project Zamba
In this paragraph, Katie Wetstone provides a deeper look into the technical aspects of Project Zamba. She explains the challenges of working with video data due to its large size and how Zamba addresses this through frame selection. The paragraph also discusses the use of machine learning models, such as MegaDetector Lite and YOLOX, to identify and analyze animal movements in the videos. It highlights the importance of these models in making Zamba efficient and effective for conservation work.
Open Sourcing Zamba and Engaging the Community
The sixth paragraph discusses the process of making Zamba open source and the importance of engaging with the community. Katie emphasizes the iterative process of development and delivery, ensuring that technical advancements are accessible to end-users. The paragraph also explores the concept of the 'missing middle' of model implementation, the importance of clear documentation, and the role of citizen scientists in the initial labeling for training the Zamba models.
Optimizing Zamba for Efficiency and Accessibility
This paragraph explores the optimization techniques used in Zamba, particularly the frame selection process that allows the tool to handle video data efficiently. It also touches on the availability of an archive of models and the ability for users to share custom models through a model zoo on GitHub. The paragraph emphasizes the importance of making the tool as accessible as possible, including for users in low-bandwidth settings.
Showcasing Zamba and Inviting Contributions
The eighth paragraph serves as a call to action, inviting contributions to the Zamba project. It discusses the importance of good open-source practices, such as clear ownership, a code of conduct, and community governance. The paragraph also mentions the development of new capabilities in Zamba, funded by a grant, and the anticipation of new opportunities for contributors with the upcoming release of Zamba version 3.
Driven Data's Initiatives and Upcoming Events
The ninth paragraph highlights other projects by Driven Data, such as CyFi, which detects harmful algal blooms, and mentions their participation in competitions and events. Katie Wetstone shares upcoming talks and events, emphasizing the importance of ethics in AI and the role of open-source projects in addressing global challenges. The paragraph concludes with an invitation to connect with Driven Data and engage in their initiatives.
Final Thoughts and Earth Day Acknowledgments
The final paragraph wraps up the discussion on Project Zamba and the work of the Digital Public Goods Alliance. It emphasizes the importance of Earth Day and the role of open-source technology in promoting environmental awareness and sustainability. The paragraph also provides information on how to get involved with Driven Data's projects and competitions, encouraging the audience to continue the conversation and contribute to their initiatives.
Keywords
GitHub
Digital Public Goods (DPGs)
Sustainable Development Goals (SDGs)
Machine Learning
Project Zamba
Open Source
Earth Day
GitHub Codespaces
Automated Testing
Secret Scanning
For Good First Issue
Highlights
GitHub is more than just a code repository; it can streamline the entire development lifecycle with tools like GitHub Projects, GitHub Codespaces, GitHub Copilot, and GitHub Actions.
Mona, a developer in the story, uses GitHub's customizable features to manage tasks and track work iterations efficiently.
GitHub's Earth Day Celebration focuses on environmental protection and promoting sustainable practices globally.
Project Zamba, introduced during the celebration, is an innovative tool that uses Python, machine learning, and computer vision to automate video data analysis.
Ricardo Miron, CTO at the Digital Public Goods Alliance, discusses the importance of open-source technology in achieving sustainable development goals.
Digital Public Goods (DPGs) are a special category of open-source that includes software, AI models, open datasets, and content collections, designed to adhere to the UN's principles.
The DPG standard consists of nine indicators or criteria that any solution must meet to be recognized as a digital public good.
The Digital Public Goods Alliance has over 40 members and 160 DPGs, working to increase the sustainability and desirability of these goods.
Katie Wetstone from Driven Data explains how Project Zamba automates the analysis of camera trap videos to support conservation efforts.
Zamba uses pre-trained models to detect different animal species and can be customized for specific ecosystems.
The tool can estimate the distance between a camera and an animal, crucial for population size estimation.
Zamba's web application allows non-programmers to use the tool through a user-friendly interface without writing code.
Contributions to Zamba and other digital public goods can be made through campaigns and projects run by the Digital Public Goods Alliance in collaboration with GitHub.
The 'For Good First Issue' tool by GitHub helps developers, especially beginners, to contribute to open-source projects by highlighting suitable issues.
Zamba's development was inspired by a need expressed by conservationists for an automated tool to process extensive camera trap footage.
The tool optimizes video analysis by selecting key frames from hours of footage, making the process more efficient and less computationally expensive.
Zamba's technical approach uses a combination of models, including MegaDetector and YOLOX, to identify and analyze animal movements in videos.
The project encourages community contributions and has a 'good first issue' tag on GitHub to help newcomers start contributing.