Building Tool Chains for RISC-V AI Accelerators - Jeremy Bennett, Embecosm

RISC-V International
31 Oct 2024 · 18:38

Summary

TL;DR: In his presentation, Jeremy Bennett discusses optimizing AI systems by integrating accelerators with neural network frameworks like PyTorch. He emphasizes the importance of a graph-based approach for executing operations efficiently on both CPUs and accelerators. Bennett highlights tools like the OpenXLA compiler and oneAPI for enhancing performance and interoperability. He also addresses the challenges of adapting existing CUDA code for broader use and the need for collaboration in improving AI technologies. The talk underscores the critical role of community support and innovation in advancing AI capabilities.

Takeaways

  • 😀 Jeremy Bennett discusses the integration of AI systems on various platforms, emphasizing the need for efficient dispatching and execution of operations.
  • 🤖 Neural networks are represented as graphs in frameworks like PyTorch, TensorFlow, and ONNX, with operations being evaluated through a graph evaluator.
  • ⚙️ Optimization in AI systems primarily focuses on parallelism, with the dispatcher being a crucial component for executing operations efficiently.
  • 📊 Apache TVM and OpenXLA are tools that can enhance the efficiency of neural network graphs through operator fusion and other graph transformations.
  • 🖥️ The presentation highlights a specific example of a PyTorch RISC-V co-processor setup, illustrating how to assess and optimize operations on accelerators.
  • 🔄 PyTorch's PrivateUse1 interface allows operations to be intercepted so their efficiency on target accelerators can be assessed.
  • 🏗️ The oneAPI framework, which evolved from OpenCL, provides a hardware abstraction layer that eases implementation across different platforms.
  • 🔍 The talk underscores the importance of prototyping and testing before full implementation, utilizing tools like QEMU for modeling accelerators.
  • 🧑‍🤝‍🧑 Collaboration with customers is essential for refining the toolchain and ensuring compatibility with various AI models.
  • 📅 Resources and tutorials are available for developers looking to implement these methodologies in their own projects, indicating a commitment to community support.

Q & A

  • What is the primary focus of the presentation?

    -The presentation focuses on how to make AI systems work efficiently on various platforms, particularly through neural network frameworks and optimization of the dispatch process.

  • What role does the graph evaluator play in an AI system?

    -The graph evaluator walks over the neural network graph, taking inputs, generating outputs, and deciding where to execute operations, either on the CPU or an accelerator.
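
As a concrete illustration of that graph view (not code from the talk), PyTorch's torch.fx can trace a small model into a graph whose nodes an evaluator or dispatcher can walk one operation at a time; the tiny model below is purely illustrative.

```python
import torch
import torch.nn as nn
from torch import fx

# A tiny network standing in for a real model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Trace the module into an FX graph: each node is one operation.
gm = fx.symbolic_trace(TinyNet())

# Walk the graph the way a graph evaluator would; this is the point at
# which a dispatcher can decide, node by node, between CPU and accelerator.
for node in gm.graph.nodes:
    print(node.op, node.target)
```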

  • What are some tools mentioned for optimizing neural networks?

    -Tools mentioned include Apache TVM, OpenXLA compiler, and the oneAPI interface, which help optimize neural network performance.
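
For a flavor of how one of these tools is typically driven (a hedged sketch, not taken from the presentation; TVM's Relay API and the RISC-V target triple below are assumptions that vary by version), Apache TVM can import a traced PyTorch model and compile it with graph-level optimizations such as operator fusion:

```python
import torch
import tvm
from tvm import relay

# A small stand-in model, traced so TVM can import it.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4)
).eval()
example = torch.randn(1, 16)
scripted = torch.jit.trace(model, example)

# Import the traced model into TVM's Relay IR.
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 16))])

# Compile at opt_level=3, where operator fusion and other graph
# transformations are applied. The RISC-V triple is illustrative and
# assumes an LLVM build with the RISC-V backend enabled.
target = "llvm -mtriple=riscv64-unknown-linux-gnu"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```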

  • How does the presenter describe the optimization process in AI systems?

    -The optimization process involves parallelism in the dispatcher and graph-level transformations, focusing first on making the dispatch process efficient.
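
One lightweight way to see the graph-level half of that in PyTorch 2.x (an illustrative sketch, not the speaker's toolchain) is a custom torch.compile backend: it receives the captured FX graph, which is exactly where fusion or accelerator-offload decisions would be inserted, and here it simply prints the graph before handing execution back to the default path.

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # Graph-level transformations (operator fusion, partitioning
    # subgraphs onto an accelerator, ...) would be applied here.
    print(gm.graph)
    return gm.forward  # fall back to running the captured graph as-is

@torch.compile(backend=inspect_backend)
def f(x, w):
    return torch.relu(x @ w) + 1.0

f(torch.randn(4, 8), torch.randn(8, 8))
```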

  • What technology did they use for prototyping with a customer?

    -They used QEMU to model the accelerator and test the efficiency of operations in a simulated environment before the actual silicon was available.
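
For a flavor of how that kind of pre-silicon testing can be scripted (an assumption-laden sketch, not the speaker's setup), user-mode QEMU can run a cross-compiled RISC-V test binary; the binary name and sysroot path are placeholders, and timings under emulation indicate functional behavior rather than silicon performance.

```python
import subprocess
import time

# Placeholders: a cross-compiled RISC-V benchmark binary and its sysroot.
BINARY = "./build-riscv64/op_benchmark"
SYSROOT = "/opt/riscv/sysroot"

start = time.perf_counter()
# qemu-riscv64 is QEMU's user-mode emulator for 64-bit RISC-V;
# -L points it at the sysroot used for dynamic linking.
result = subprocess.run(
    ["qemu-riscv64", "-L", SYSROOT, BINARY],
    capture_output=True,
    text=True,
    check=True,
)
elapsed = time.perf_counter() - start

print(result.stdout)
print(f"wall-clock time under emulation: {elapsed:.3f} s")
```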

  • What is the significance of the 'private use' interface in PyTorch?

    -The PrivateUse1 interface allows specific PyTorch operations to be intercepted and redirected to alternative implementations, which is crucial for testing their efficiency on the target hardware.
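
A minimal sketch of what that interception can look like in recent PyTorch (the backend name "my_npu", the chosen operator, and the CPU fallback are illustrative assumptions; a production backend registers real kernels from C++ via TORCH_LIBRARY_IMPL and supplies its own device runtime):

```python
import torch

# Give the spare PrivateUse1 dispatch key a friendly device name.
torch.utils.rename_privateuse1_backend("my_npu")

# Register a Python implementation of one aten operator under the
# PrivateUse1 key. A real backend would call into the accelerator's
# runtime here; this stub just logs and falls back to the CPU kernel.
lib = torch.library.Library("aten", "IMPL")

def add_for_accelerator(a, b, alpha=1):
    print("intercepted aten::add.Tensor for the accelerator backend")
    return torch.add(a.cpu(), b.cpu(), alpha=alpha)

lib.impl("add.Tensor", add_for_accelerator, "PrivateUse1")
```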

  • What challenges does the presenter identify regarding proprietary solutions?

    -The presenter notes that many operations and solutions are proprietary, making it challenging for open-source companies to develop their own implementations.

  • How does oneAPI relate to OpenCL and other frameworks?

    -oneAPI serves as a unified interface for various hardware accelerators, providing a hardware abstraction layer that simplifies development across different platforms.
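
To make the hardware-abstraction point concrete (a hedged sketch assuming Intel's dpctl Python bindings for the oneAPI/SYCL runtime are installed; none of this is from the talk), the same few lines enumerate every device the runtime exposes, whichever low-level backend drives it:

```python
import dpctl

# List every SYCL device visible through the oneAPI runtime, regardless
# of whether an OpenCL, Level Zero or other backend sits underneath.
for dev in dpctl.get_devices():
    print(dev.backend, dev.device_type, dev.name)
```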

  • What future developments does the presenter anticipate for AI acceleration?

    -The presenter anticipates improvements in multi-tenancy and more advanced parallelism solutions as the AI industry continues to evolve.

  • What advice does the presenter give for those interested in implementing these technologies?

    -The presenter suggests exploring the available documentation and tutorials for PyTorch and oneAPI, and reaching out for assistance to navigate the complexities of these technologies.

Related Tags
AI Systems · Open Source · Performance Optimization · Neural Networks · Machine Learning · Tech Presentation · Parallel Processing · Software Engineering · Compiler Design · Research Collaboration