# Pallas TPU

TPU-specific documentation.

## Guides

- Writing TPU kernels with Pallas
  - What is a TPU?
  - Noteworthy properties and restrictions
  - Supported operations
- TPU Pipelining
  - TPU and its memory spaces
  - TPU-specific Pipelining Features
- Matrix Multiplication
  - Background
  - Your first matrix multiplication kernel
  - Matrix multiplication performance
  - Performance of pipelined kernels
  - Templating the matrix multiplication
  - Conclusion
- Scalar Prefetch and Block-Sparse Computation
  - Dynamic Block Indexing with Scalar Prefetch
  - Example: Block Dynamic Slice with Scalar Prefetch
  - Sparse Kernels: Representing Sparse Data
  - Example: Sparse @ Dense Matrix Multiplication
  - Sparse Access Patterns on Dense Data
  - Example: Dense @ Dense Matrix Multiplication with a Block-Sparse Output Mask
- Distributed Computing in Pallas for TPUs
  - TPU Topologies
  - Remote Direct Memory Access (RDMA) Model
  - Advanced Techniques
  - Final Notes
- Pallas Core-specific Programming
  - Environment setup
  - A simple per-core kernel
  - Pipelining with core_map
  - Scalar prefetch
  - Mapping over SparseCores
- Pseudo-Random Number Generation
  - Using the jax.random API
  - Using the hardware PRNG
  - Block-invariant sampling