Data Engineering
Represent, send, store and search multimodal data
Malloy is an experimental language for describing data relationships and transformations.
Transforms PDFs, documents, and images into enriched structured data
BlackJAX is a Bayesian Inference library designed for ease of use, speed and modularity.
Simple, modern and fast file watching and code reload in Python.
A guide to PySpark code style, presenting common situations and the associated best practices, based on the topics that recur most often across the PySpark repos we've encountered.
Tool for probabilistically linking the records of individual entities (e.g. people) within and across datasets
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
A simple and efficient tool to parallelize Pandas operations on all available CPUs
Recipes for using Python's polars library