🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Agentic AI Framework for Java Developers
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
A high-throughput and memory-efficient inference and serving engine for LLMs
Common in-memory tensor structure
nanobind: tiny and efficient C++/Python bindings
MSCCL++: A GPU-driven communication stack for scalable AI applications
Fast and memory-efficient exact attention
FlashMLA: Efficient Multi-head Latent Attention Kernels
Development repository for the Triton language and compiler
Fast and memory-efficient exact attention
FlashInfer: Kernel Library for LLM Serving
A modern formatting library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
CUDA Templates and Python DSLs for High-Performance Linear Algebra