Awesome resources for learning more about Apache Arrow, focussed on the arrow R package.
If you have any suggestions for other resources to add here, please submit a PR!
Key:
👩‍🏫 Workshop
📄 Blog post
📽️ Video
🎞️ Slides
- "Larger-Than-Memory Data Workflows with Apache Arrow" - UseR! 2022 conference workshop 👩‍🏫
- "Doing More with Data: An Introduction to Arrow for R Users" by Danielle Navarro 📽️
- "Getting started with Apache Arrow" by Danielle Navarro 📄
- "Efficient Data Analysis on Larger-than-Memory Data with DuckDB and Arrow" by Tom Mock 📽️
- "Bigger data with arrow and duckdb" by Tom Mock & Edgar Ruiz 🎞️
- "New Directions for Apache Arrow" by Wes McKinney 📽️
- "Bigger Data With Ease Using Apache Arrow" by Neal Richardson 📽️
- "Apache Arrow: Enabling Data Engineering Tasks in R" by Ian Cook 📽️
- "Data serialisation in R" by Danielle Navarro 📄
- "Data types in Arrow and R" by Danielle Navarro 📄
- "Arrays and tables in Arrow" by Danielle Navarro 📄
- "Binding Apache Arrow to R" by Danielle Navarro 📄
- "Arrow New Feature Showcase: show_exec_plan()" by Nic Crane 📄
- "Creating an Arrow dataset: An exploration of the file formats that Arrow can read and write." by François Michonneau 📄
- "Creating an Arrow dataset (part 2): How does partitioning impact query performance?" by François Michonneau 📄
- "Understanding the Parquet file format" by Colin Gillespie 📄
- "Folks, C’mon, Use Parquet" by Piotr Storożenko 📄