Daily AI Digest — 2026-05-16

Published

May 16, 2026

Hacker News Signals

When “idle” isn’t idle: how a Linux kernel optimization became a QUIC bug

Cloudflare details a subtle interaction between the Linux kernel’s TCP/UDP receive coalescing behavior and QUIC’s loss recovery logic that could spiral into connection collapse. The core issue: the kernel’s SO_RCVTIMEO and busy-poll / NAPI mechanisms can batch packet delivery in ways that skew the inter-packet arrival timestamps seen by userspace. QUIC implementations use these timestamps to drive RTT estimation and the pacing/CC feedback loop. When the kernel holds packets during an “idle” coalescing window and then delivers them in a burst, the QUIC stack interprets the burst as a congestion signal, backs off, and — in the worst case — the CC state machine enters a self-reinforcing reduction cycle (the “death spiral”).

The fix required distinguishing between kernel-side coalescing delay and genuine network delay at the socket receive path. Cloudflare patched their QUIC stack to account for the kernel’s SO_TIMESTAMPING hardware/software timestamp delta, stripping artificial queuing delay before feeding timestamps into RTT samples. They also adjusted the idle detection heuristic to not treat post-coalescing bursts as reordering events.
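
The correction described above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's code: the `AckSample` type and both function names are hypothetical, and it assumes the kernel receive timestamp (for example, one obtained through SO_TIMESTAMPING) and the userspace read time come from the same clock.

```python
from dataclasses import dataclass


@dataclass
class AckSample:
    """Timestamps in seconds, all on one clock, for a single acknowledged packet."""
    sent_at: float          # when the acknowledged packet left the sender
    kernel_rx_at: float     # when the kernel received the ACK
    userspace_rx_at: float  # when the application actually read the ACK


def naive_rtt(s: AckSample) -> float:
    # Includes any time the ACK spent queued in the kernel before being read.
    return s.userspace_rx_at - s.sent_at


def corrected_rtt(s: AckSample) -> float:
    # Remove host-side queuing delay so that only network delay remains.
    host_delay = max(0.0, s.userspace_rx_at - s.kernel_rx_at)
    return naive_rtt(s) - host_delay


# An ACK that arrived after 20 ms but sat in a coalescing window for 25 ms more.
sample = AckSample(sent_at=0.000, kernel_rx_at=0.020, userspace_rx_at=0.045)
print(f"naive={naive_rtt(sample) * 1000:.1f} ms "
      f"corrected={corrected_rtt(sample) * 1000:.1f} ms")
```

With the naive sample, a 25 ms coalescing delay more than doubles the apparent RTT, which is the kind of inflation that can push a congestion controller into backing off.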

The broader lesson is that QUIC’s design assumption — that userspace owns all timing decisions — breaks when the OS abstracts away packet scheduling. TCP stacks have decades of kernel integration that handles this implicitly; QUIC in userspace re-exposes all of it. This is a known pain point for any high-performance QUIC implementation (see also: recvmmsg batch sizing, GRO interaction). Anyone building a QUIC stack on Linux needs to audit every timestamping path and validate behavior under NAPI polling modes, not just under standard blocking I/O.

Source: https://blog.cloudflare.com/quic-death-spiral-fix/


Bun Rust rewrite: “codebase fails basic miri checks, allows for UB in safe rust”

A GitHub issue filed against Bun’s nascent Rust components alleges that code in the repository violates Rust’s safety guarantees even in safe code (that is, outside any unsafe block), using patterns that Miri, the interpreter that detects undefined behavior in Rust programs, flags as UB. The filed examples include aliased mutable references constructed via raw pointer casts and then re-wrapped in safe references, which violates the stacked-borrows model, and integer-to-pointer transmutes used to smuggle data through type-erased interfaces.

The technical substance: Miri checks code against the stacked-borrows aliasing model by default and against tree-borrows on request. Both are proposed models that are stricter than what LLVM actually exploits today, but they are the closest available statement of Rust’s intended memory model. Code that contains no unsafe blocks of its own can still trigger UB if it calls library code whose internal unsafe assumes the caller upholds certain invariants. The issue suggests some of Bun’s Rust code invokes such APIs without upholding those invariants, effectively laundering unsafe through a safe interface.

This matters because Bun is marketed partly on safety relative to its C++ predecessors. If the Rust components fail Miri, the safety argument weakens considerably. A Miri failure does not guarantee exploitability, but it does mean the optimizer may apply transformations that are legal under the language rules yet change the program’s behavior in surprising ways. Stacked-borrows violations in particular are the class of bug that surfaces as heisenbugs under LTO or PGO.

The comments on the issue are contentious, with maintainers disputing the severity and whether Miri’s experimental tree-borrows model is the right baseline. The core technical disagreement — whether Miri-clean code is a hard requirement or an aspirational goal for a performance-critical runtime — is unresolved upstream.

Source: https://github.com/oven-sh/bun/issues/30719


A 0-click exploit chain for the Pixel 10

Project Zero published a full 0-click exploit chain targeting the Pixel 10, requiring no user interaction. The chain combines three vulnerabilities: a memory corruption bug in the cellular baseband firmware (reachable over-the-air via malformed 5G NAS messages), a privilege escalation from the baseband isolation domain into the application processor via a shared memory interface, and a sandbox escape in the Android media stack triggered by a malformed media metadata packet delivered post-baseband compromise.

The baseband entry point is a heap overflow in the NAS protocol parser triggered by a crafted REGISTRATION ACCEPT message with a malformed PDU session establishment accept IE. The overflow corrupts adjacent heap metadata, enabling controlled write primitives. From there, the researchers abuse a shared DMA buffer interface (present for modem-AP communication) that lacks sufficient bounds checks on the AP side, achieving code execution in a privileged Android process.

The exploit does not require the phone to be in active use — receiving a 5G signal is sufficient for the over-the-air trigger. This is the canonical “0-click” threat model: the attack surface is always on, antenna up.

Technically notable: the baseband ASLR entropy is low (the firmware is large and partially position-dependent), making the heap-shape manipulation more reliable than it would be on a general-purpose OS. Project Zero also documents the exploit’s unreliability against the Pixel 10’s memory tagging extension (MTE) deployment, which caught the corruption in testing configurations — a concrete data point for MTE’s practical efficacy against this bug class.

The post includes full PoC code and patch diffs. Patches shipped in the May 2026 Android security bulletin.

Source: https://projectzero.google/2026/05/pixel-10-exploit.html


I designed a nibble-oriented CPU in Verilog to build a scientific calculator

This project implements a custom 4-bit (nibble) CPU in Verilog targeting an FPGA, with the specific goal of running a scientific calculator. The ISA is nibble-oriented: the data path is 4 bits wide, memory is nibble-addressable, and arithmetic is done via multi-nibble BCD (binary-coded decimal) operations to avoid the decimal-to-binary conversion overhead that plagues fixed-width binary ALUs when displaying human-readable results.

The ALU supports nibble-serial addition with carry propagation, BCD correction (add 6 if nibble sum exceeds 9), and a small set of shift/rotate operations. Multiplication and transcendental functions (sin, log, etc.) are implemented in microcode — the CPU has a two-level control store, with the top level decoding opcodes into microinstruction sequences stored in ROM. This is architecturally similar to early calculator chips (HP’s Nut series, TI’s TMS0100) which were also nibble-serial for exactly this reason.
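
The nibble-serial addition with BCD correction can be sketched as follows. This models the arithmetic only, not the project's Verilog or microcode; the function names and the least-significant-digit-first layout are assumptions made for illustration.

```python
def bcd_add(a, b):
    """Add two equal-length BCD numbers given as digit lists, least significant first."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of digits")
    result = []
    carry = 0
    for da, db in zip(a, b):
        s = da + db + carry   # plain binary sum of two digits plus carry, range 0..19
        if s > 9:
            s += 6            # BCD correction: skip the six unused codes 0xA..0xF
        carry = s >> 4        # after correction, bit 4 is the decimal carry
        result.append(s & 0xF)
    result.append(carry)      # final carry becomes the most significant digit
    return result


def to_digits(n, width):
    """Split a non-negative integer into `width` decimal digits, least significant first."""
    return [(n // 10 ** i) % 10 for i in range(width)]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


total = bcd_add(to_digits(1995, 4), to_digits(2026, 4))
print(from_digits(total))
```

Each loop iteration corresponds to one pass through a 4-bit adder, which is why wide operands take multiple cycles on a nibble-serial datapath.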

The Verilog is structured with a pipelined fetch/decode/execute, though the nibble-serial datapath means multi-cycle execution for wide operands is fundamental rather than optional. The repository includes the assembler (written in Python), a simulator for pre-FPGA validation, and the calculator firmware itself in the custom assembly language.

This is a legitimate from-scratch CPU design, not a soft-core instantiation of an existing ISA. The BCD-native approach is an interesting constraint: it keeps the ALU simple at the cost of more complex BCD arithmetic microcode, which is the historically established tradeoff for human-interface numeric devices. The project is well documented, with architecture diagrams and an ISA reference.

Source: https://github.com/gdevic/FPGA-Calculator


Building ML framework with Rust and Category Theory

This post attempts to use category-theoretic abstractions to structure a transformer implementation in Rust. The central idea is to model neural network layers as morphisms in a category where objects are tensor types (parametrized by shape and dtype), and composition of morphisms corresponds to layer stacking. Functors map between categories of computation graphs and categories of runtime tensor operations.

Concretely, the author defines a Layer trait in Rust that is parameterized over input and output types, with forward as the morphism application. Composition is via a compose combinator that chains layers and enforces type-level shape compatibility — this is the part that benefits from Rust’s type system: shape mismatches become compile errors rather than runtime panics. The attention mechanism is expressed as a natural transformation between functors representing query/key/value projections.
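
The composition idea can be sketched in Python, with one important caveat: the post's Rust version rejects shape mismatches at compile time, whereas this sketch can only reject them when the layers are composed (before any data flows). The `Layer` class, `compose` function, and toy layers are all hypothetical names, not the author's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class Layer:
    """A morphism from tensors of shape `src` to tensors of shape `dst`."""
    src: Shape
    dst: Shape
    forward: Callable[[list], list]


def compose(f: Layer, g: Layer) -> Layer:
    """Return the layer "f, then g", rejecting pairs whose shapes do not line up."""
    if f.dst != g.src:
        raise TypeError(f"shape mismatch: {f.dst} cannot feed {g.src}")
    return Layer(f.src, g.dst, lambda x: g.forward(f.forward(x)))


# Three toy layers operating on flat lists of numbers.
a = Layer((4,), (4,), lambda x: [2 * v for v in x])               # scale
b = Layer((4,), (2,), lambda x: [x[0] + x[1], x[2] + x[3]])       # pairwise sum
c = Layer((2,), (2,), lambda x: [-v for v in x])                  # negate

left = compose(compose(a, b), c)
right = compose(a, compose(b, c))
print(left.forward([1, 2, 3, 4]), right.forward([1, 2, 3, 4]))
```

Both groupings produce the same result, which is the associativity law the post proposes as a basis for graph rewrites; calling `compose(b, a)` raises `TypeError` because a `(2,)` output cannot feed a `(4,)` input.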

The category theory framing is more pedagogical than operational here. The actual tensor operations bottom out in ndarray or similar, and the categorical abstraction does not automatically yield autodiff — the author notes this and leaves it as future work, which is the crux of whether this approach can scale. Differentiable programming requires either a traced computation graph (monad-like structure over operations) or a dual-number approach; neither maps cleanly onto the simple functor composition described.

The interesting engineering claim is that the type-level shape tracking catches a real class of bugs at compile time with zero runtime overhead, and that the categorical composition laws (associativity of compose) give a principled basis for graph optimization passes. Whether category theory is doing necessary work here versus being a framing for well-known type-driven design patterns is left to the reader.

Source: https://hghalebi.github.io/category_theory_transformer_rs/


SQL patterns I use to catch transaction fraud

A practical writeup of SQL-based fraud detection patterns implemented directly in a relational database, without a dedicated ML pipeline. The patterns are window-function heavy and rely on temporal self-joins.

Key techniques described:

Velocity checks: Count transactions per user/card in a rolling window using COUNT(*) OVER (PARTITION BY user_id ORDER BY ts RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW). Thresholding this catches card-testing attacks (many small transactions in a short window).

Geographic impossibility: Self-join on consecutive transactions ordered by timestamp, compute the haversine distance between merchant coordinates, and divide by the time delta to get an implied travel speed. Pairs implying more than 900 km/h are flagged as physically impossible travel, which suggests the card is being used in two places at once.

Merchant category deviation: For each user, build a baseline distribution of merchant category codes (MCCs) over a 90-day lookback, then flag transactions in MCCs with low historical frequency using a simple z-score on the empirical frequency. This catches account takeovers that shift spending patterns.

Graph-based shared-attribute clustering: Transactions sharing device fingerprint, email, or IP are grouped via recursive CTEs or CONNECT BY (Oracle) / WITH RECURSIVE (Postgres). Clusters with high chargeback rates propagate risk scores to new members — a lightweight graph propagation step in SQL.
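
The first two patterns above are easy to prototype outside the database before committing to SQL. The sketch below is a plain-Python rendering of the same logic, not the author's queries: the function names, the tuple layout, and the sample coordinates are assumptions, and timestamps are assumed to be sorted and distinct.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(txs, max_kmh=900.0):
    """txs: list of (ts_seconds, lat, lon) for one card, sorted by ts.
    Returns the indices of transactions whose implied speed exceeds max_kmh."""
    flagged = []
    for i in range(1, len(txs)):
        (t0, lat0, lon0), (t1, lat1, lon1) = txs[i - 1], txs[i]
        dist = haversine_km(lat0, lon0, lat1, lon1)
        hours = (t1 - t0) / 3600
        speed = dist / hours if hours > 0 else (math.inf if dist > 0 else 0.0)
        if speed > max_kmh:
            flagged.append(i)
    return flagged


def velocity_counts(timestamps, window_s=3600):
    """For each timestamp, count events in [ts - window_s, ts], like a rolling COUNT(*)."""
    window = deque()
    counts = []
    for ts in timestamps:
        window.append(ts)
        while window[0] < ts - window_s:
            window.popleft()
        counts.append(len(window))
    return counts


# London, then New York one hour later, then New York again ten minutes after that.
card = [(0, 51.5074, -0.1278), (3600, 40.7128, -74.0060), (4200, 40.7130, -74.0050)]
print(impossible_travel(card))
print(velocity_counts([0, 600, 1200, 4000, 4100]))
```

Thresholding the velocity counts (for example, flagging any count above a tuned limit) gives the card-testing rule; the travel check flags only the London to New York hop.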

The post is honest about limitations: SQL rules are static, require manual threshold tuning, and have no feature learning. They are, however, auditable, fast to deploy, and interpretable for dispute resolution — properties that matter operationally. The author recommends using these as a first-pass filter before any ML scoring, which is standard practice in production fraud systems.

Source: https://analytics.fixelsmith.com/posts/sql-fraud-patterns/


Radicle: Sovereign code forge built on Git

Radicle is a peer-to-peer code collaboration platform where the forge infrastructure (issues, patches, code review) is stored as Git objects and replicated via a gossip protocol, with no central server required. Each repository is identified by a repository ID (the RID) derived from its initial identity document, while nodes and authors are identified by public keys. All collaboration artifacts (issues, comments, patch proposals) are stored as Git objects under dedicated refs in the repository and signed with the author’s key.

The replication layer uses a custom protocol (over TCP or Tor) where nodes announce and pull refs they are seeding. There is no DHT; discovery is semi-manual (seed nodes, known peers). A “delegate” model handles repository governance: delegates are the public keys authorized to update the canonical branch set, enforced by signature verification at fetch time.

The patch/review workflow mirrors GitHub PRs but is stored entirely as signed Git objects: a patch is a ref pointing to a commit range plus a signed metadata object. Comments and review state are additional signed objects. All of this is content-addressed and append-only in practice (you can’t rewrite another user’s signed object).

The technical tradeoffs are real: without a central index, discovery and search are hard. The gossip-based replication means latency for propagating updates depends on how well-seeded a repository is. The signing model requires key management that most developers are not accustomed to. The implementation is in Rust, uses libgit2 for Git operations, and the protocol is documented in the repository.

This is a credible technical attempt at a decentralized forge, not vaporware — the CLI and seeding infrastructure are functional and in use by a small community.

Source: https://radicle.dev/


Porting 3D Movie Maker to Linux

Microsoft’s 3D Movie Maker (1995) is a children’s 3D animation tool whose source was released on GitHub in 2022. This post documents a full port to Linux, which required addressing several layers of Windows-specific dependencies.

The codebase uses BRender, a third-party 3D engine from Argonaut Software (a software rasterizer that predates Direct3D’s dominance), a proprietary audio system, and a UI toolkit built on MFC-adjacent Win32 abstractions. The port strategy was not Wine but a genuine recompile: replace Win32 API calls with SDL2 for windowing/input, rewrite the audio layer against OpenAL, and stub or reimplement the handful of COM-style interfaces used for plugin loading.

The most technically involved part was the BRender renderer, which uses x86 assembly for its inner loops (fixed-point span rasterization, perspective-correct texture mapping without an FPU). The author replaced these with portable C equivalents. The resulting performance regression does not matter on modern hardware, but the rewrite required careful attention to fixed-point semantics: BRender uses 16.16 fixed-point throughout, and the C replacement had to match the original overflow and truncation behavior to avoid rendering artifacts.
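
To see why overflow and truncation behavior matter, here is a small model of 16.16 fixed-point multiplication. It is an illustration of the number format only; the post does not state which exact rounding and wrapping rules the original assembly used, so the choices below (arithmetic right shift, wrap to signed 32 bits) and all function names are assumptions.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def wrap32(x):
    """Wrap an arbitrary Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def to_fixed(v):
    """Convert a float to 16.16; int() truncates toward zero."""
    return wrap32(int(v * ONE))


def to_float(x):
    return x / ONE


def fixed_mul(a, b):
    """Full-width product, arithmetic shift right by 16, then wrap to 32 bits.
    The shift rounds toward negative infinity."""
    return wrap32((a * b) >> FRAC_BITS)


def fixed_mul_div(a, b):
    """Variant that truncates toward zero, as C integer division by 65536 would."""
    return wrap32(int((a * b) / ONE))


print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))      # in-range product
print(to_float(fixed_mul(to_fixed(200.0), to_fixed(200.0))))   # overflow wraps around
print(fixed_mul(to_fixed(-0.5), 1), fixed_mul_div(to_fixed(-0.5), 1))
```

The last line shows the two variants disagreeing by one least-significant bit on a negative product, and the second line shows a product that silently wraps. Differences of this size are enough to shift texel or span boundaries, which is why a portable replacement has to pick the same rules as the code it replaces.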

The Win32 threading model (fibers used as coroutines in a few places) was replaced with ucontext-based coroutines on Linux. The resource format (.3th, .chs files) is entirely custom binary and required no changes — the parser was C and ported cleanly.

The result runs natively on Linux with full functionality. The post is a good case study in porting legacy Windows applications: the graphics and audio layers are where the work is, while the application logic usually ports mechanically.

Source: https://benstoneonline.com/posts/porting-3d-movie-maker-to-linux/

Noteworthy New Repositories

raindrop-ai/workshop

A framework for writing and executing evaluations of coding agents, designed to close the feedback loop between agent behavior and measurable correctness. The core idea is that coding agents (LLM-driven code-generation loops) are notoriously hard to evaluate without harness infrastructure — workshop provides scaffolding for defining eval tasks, running agents against them inside sandboxed environments, and collecting structured pass/fail metrics. The repository exposes a CLI and a Python API for composing eval suites that can include unit tests, linting checks, and behavioral assertions. Evals are defined declaratively and can be parameterized across model backends, making it straightforward to compare agent performance across providers or prompt strategies. The architecture separates task specification from execution environment, so the same eval definition runs locally or in CI. The tool is particularly relevant for teams iterating on system prompts, tool use policies, or fine-tuned coding models who need reproducible regression baselines rather than ad-hoc vibe checks. Integration with existing test runners (pytest, etc.) means adoption friction is low for projects that already have test infrastructure.

Source: https://github.com/raindrop-ai/workshop


JuliusBrussee/cavemem

Persistent, compressed memory store designed to sit beneath coding assistants (Claude Code, Cursor, Copilot, etc.) and survive across sessions. The problem it solves: LLM coding agents have no memory between invocations, so context about project conventions, past decisions, and in-progress work must be re-injected manually or lost entirely. cavemem stores arbitrary key-value memory entries locally, compresses them (using standard byte-level compression), and exposes a fast retrieval interface so agents can query relevant context at the start of each session. The local-by-default design means no data leaves the machine, which matters for proprietary codebases. Retrieval is structured around semantic or keyword lookup rather than full-context replay, keeping prompt injection size bounded. The implementation is lightweight — no external database dependency — and the storage format is portable across agents that implement the MCP (Model Context Protocol) or compatible tool-call interfaces. The practical use case is preserving architectural decisions, ticket context, and debug history so a resumed session behaves like a continuation rather than a cold start.

Source: https://github.com/JuliusBrussee/cavemem


deeplethe/forkd

A microVM sandbox manager that exploits copy-on-write snapshotting to fork a pre-warmed parent VM in approximately 101 ms. The central insight is that cold-starting a microVM (e.g., Firecracker) for each sandboxed workload is slow — typically seconds — because the guest kernel, runtime, and dependencies all need to initialize. forkd solves this by keeping a parent VM running in a warmed state with dependencies pre-loaded, then forking it on demand using memory snapshot and CoW semantics, similar to how QEMU/KVM snapshots work but optimized for low-latency cloning. Each child VM inherits the parent’s memory pages without copying them until written, so startup cost reduces to the fork syscall overhead plus minimal divergence initialization. The 101 ms figure represents the measured fork-to-ready latency in their benchmark configuration. This architecture is directly applicable to code execution sandboxes (for LLM agents running untrusted code), CI runners, and any multi-tenant workload where isolation is required but cold-start latency is a bottleneck. The project targets Linux with KVM support.
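
The copy-on-write idea is the same one that makes an ordinary process fork cheap, so it can be illustrated at process level. The sketch below is only an analogy using `os.fork` on a POSIX system; it does not use forkd, Firecracker, or KVM, and the function and variable names are hypothetical.

```python
import os


def fork_child(state):
    """Fork this process; the child edits its copy of `state` and reports back over a pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with the parent's memory, shared copy-on-write.
        os.close(r)
        state["tenant"] = "child"  # the write gives the child private copies of touched pages
        os.write(w, f"{state['tenant']}:{len(state['warm'])}".encode())
        os._exit(0)                # leave immediately without running parent cleanup
    # Parent: read the child's report, then reap it.
    os.close(w)
    with os.fdopen(r) as pipe:
        message = pipe.read()
    os.waitpid(pid, 0)
    return message


# "Pre-warm" the parent with data that every child should start with.
parent_state = {"tenant": "parent", "warm": list(range(100_000))}
print(fork_child(parent_state))
print(parent_state["tenant"])
```

The child sees the pre-warmed data without reloading it, and its write does not leak back into the parent. A microVM fork applies the same principle to guest memory rather than to a single process.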

Source: https://github.com/deeplethe/forkd


DioCrafts/OpenFoundry

A self-hosted, open-source alternative to Palantir Foundry covering the core data platform primitives: data source connectors, ontology modeling, pipeline authoring, dashboards, and an AI decision layer. The architecture is built around the concept of an ontology — a typed, relationship-aware schema that sits above raw data sources and provides a unified object model that pipelines and dashboards can reference without re-specifying schema per use case. Pipelines are constructed visually or via configuration and can execute transforms across heterogeneous sources. The AI layer allows attaching LLM-driven inference or decision logic to ontology objects, enabling patterns like automated anomaly flagging or natural-language querying over structured enterprise data. Being self-hosted eliminates the primary objection to Foundry adoption — vendor lock-in and opaque pricing — and makes it accessible to organizations that cannot push sensitive data to Palantir’s cloud. The stack is designed for organizations with data engineering teams who need a governed, lineage-aware platform without building one from scratch.

Source: https://github.com/DioCrafts/OpenFoundry


Evokoa/pgGraph

An extension layer that exposes graph database semantics over an existing PostgreSQL instance without migrating data or adding a separate graph engine. The approach uses PostgreSQL’s recursive CTEs and standard relational tables to represent nodes and edges, then wraps them in a query interface resembling graph query patterns (traversal, pathfinding, neighbor expansion). This sidesteps the operational burden of running Neo4j or similar alongside Postgres for workloads that need occasional graph queries — knowledge graphs, dependency resolution, social graph analytics — while keeping ACID guarantees and existing tooling. The implementation relies on edge and node tables with indexed foreign keys, and the query layer compiles graph traversal operations into optimized recursive SQL. Because it sits on top of Postgres rather than replacing it, existing data models can incrementally adopt graph queries by registering relationships without schema rewrites. The tradeoff relative to a native graph engine is that deep traversals over large graphs will be slower, but for moderate-scale graphs colocated with relational data, the operational simplicity is a significant advantage.
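
The kind of recursive SQL that a graph traversal compiles down to can be sketched as below. This is not pgGraph's API or generated output: the table names, the `reachable` helper, and the sample graph are assumptions, and SQLite (via Python's standard library) stands in for PostgreSQL because both support WITH RECURSIVE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                        dst INTEGER REFERENCES nodes(id));
    CREATE INDEX edges_src ON edges(src);
    INSERT INTO nodes VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4), (1, 3), (4, 1);
""")

# Neighbor expansion up to a depth limit. UNION drops duplicate (id, depth) rows,
# and the depth bound guarantees termination even though the graph has a cycle.
REACHABLE = """
WITH RECURSIVE walk(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, w.depth + 1
    FROM walk AS w
    JOIN edges AS e ON e.src = w.id
    WHERE w.depth < ?
)
SELECT id, MIN(depth) FROM walk GROUP BY id ORDER BY id
"""


def reachable(start, max_depth):
    """Return (node_id, shortest_hop_count) for every node within max_depth of start."""
    return conn.execute(REACHABLE, (start, max_depth)).fetchall()


print(reachable(1, 3))
```

Every level of the traversal is another join against the edge table, which is why deep traversals over large graphs cost more here than in an engine with native adjacency storage.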

Source: https://github.com/Evokoa/pgGraph


orhun/ratty

A terminal emulator written in Rust that uses GPU rendering (via wgpu or a comparable graphics API) and adds native support for inline 3D graphics alongside standard terminal content. Most GPU-accelerated terminals (Alacritty, WezTerm) use the GPU solely for 2D glyph rasterization and compositing; ratty extends this to allow terminal applications to render 3D geometry inline within the terminal viewport. The technical mechanism involves treating portions of the terminal surface as framebuffer regions that accept 3D draw calls, integrated with the normal text rendering pipeline. This enables use cases like inline data visualization with 3D plots, model geometry inspection, or debugging 3D pipelines without leaving the terminal. The GPU rendering path also provides the standard benefits: sub-millisecond frame times for scrolling and text rendering, hardware-accelerated compositing, and accurate color rendering. Being written in Rust gives it memory safety and low overhead. The project is notable as an exploratory push on what terminal emulators can render natively rather than delegating to external GUI windows.

Source: https://github.com/orhun/ratty


Open-Less/openless

A push-to-talk voice input utility for macOS and Windows that captures speech while a hotkey is held, transcribes it, runs the transcript through an LLM for polish (grammar correction, punctuation, light rephrasing), and injects the result at the cursor position in whichever application is focused. The pipeline is: audio capture -> local or API-backed speech recognition (Whisper-class model) -> LLM post-processing -> simulated keyboard input via OS accessibility APIs. The key design choice is universality via cursor injection: because output is typed into the focused window rather than routed through a clipboard or browser extension, it works in any application — IDEs, email clients, terminals, chat apps — without per-app integration. The open-source nature allows substituting the transcription backend (local Whisper vs. cloud API) and the LLM polishing step (local model vs. hosted). Latency is dominated by transcription and LLM inference; local model paths trade accuracy for privacy and offline capability. The use case is faster prose input for users who dictate faster than they type but want grammatically clean output without manual editing.

Source: https://github.com/Open-Less/openless


Storybloq/storybloq

A cross-session context management system specifically for Claude Code, implemented as a CLI tool, an MCP server, and a /story skill (slash command). The problem: Claude Code sessions are ephemeral, so ticket context, architectural decisions, in-progress work, and handover notes evaporate when a session ends. storybloq persists this context in a .story/ directory in the project repository, structured around tickets, issues, handovers, and roadmap entries. The MCP server component means Claude Code can read and write story entries natively through tool calls, so the agent can update its own context log mid-session and retrieve relevant history at session start without manual prompt engineering. Storing context in .story/ as flat files makes it version-controllable alongside code, enabling team handovers where one developer’s session context is readable by a colleague’s subsequent session. The architecture is deliberately minimal (no database, no external service), which keeps it auditable and portable. The design is complementary to cavemem (above) but is structured around project management primitives rather than general key-value memory.

Source: https://github.com/Storybloq/storybloq