Daily AI Digest — 2026-05-23

Published

May 23, 2026

Hacker News Signals

A blueprint for formal verification of Apple corecrypto

Apple’s security engineering blog describes their approach to formally verifying corecrypto, the cryptographic library underpinning iOS, macOS, and related platforms. The core challenge: corecrypto is written in C and assembly, targets multiple architectures, and must be verified against both functional correctness and security properties (e.g., absence of timing side-channels, correct constant-time behavior).

The effort uses a combination of tools. SAW (Software Analysis Workbench) is used to verify C and LLVM bitcode against cryptographic specifications written in Cryptol, a domain-specific language for expressing bit-level cryptographic algorithms. For assembly-level verification, they use a custom framework that models the semantics of AArch64 and x86-64 instructions directly, allowing proofs to cover the optimized assembly paths that performance-critical code actually executes — not just the C source.

The verification pipeline roughly looks like: write a Cryptol spec for the algorithm (e.g., AES-GCM, HMAC-SHA2), compile the C implementation to LLVM IR, use SAW’s symbolic execution to generate proof obligations, then discharge those obligations with automated solvers (primarily the SMT solver Z3 and the ABC verification tool). Assembly verification requires additional lifting steps to get assembly into a form amenable to symbolic reasoning.

Key properties targeted include: cryptographic functional equivalence (the implementation matches the spec on all inputs), memory safety properties, and constant-time execution (critical for side-channel resistance). Constant-time proofs require a leakage model that specifies which operations produce observable timing variation, such as secret-dependent branches and memory accesses.
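To make "matches the spec on all inputs" concrete, here is a minimal Python sketch, not taken from the Apple post. It compares a branch-free select against a reference definition by enumerating an 8-bit domain. The names `spec_select` and `ct_select` are hypothetical, and exhaustive enumeration stands in for the symbolic proof that SAW and a solver would produce on real word sizes. Python integers are not constant-time, so this only illustrates the equivalence half of the property.

```python
MASK8 = 0xFF  # work on 8-bit values so the whole input space can be enumerated


def spec_select(cond: int, a: int, b: int) -> int:
    """Reference behavior: return a when cond is 1, otherwise b."""
    return a if cond == 1 else b


def ct_select(cond: int, a: int, b: int) -> int:
    """Branch-free select: no control flow depends on cond."""
    mask = (-cond) & MASK8  # 0xFF when cond == 1, 0x00 when cond == 0
    return (a & mask) | (b & ~mask & MASK8)


def equivalent_on_all_inputs() -> bool:
    """Compare implementation and spec on every input in the 8-bit domain."""
    return all(
        ct_select(c, a, b) == spec_select(c, a, b)
        for c in (0, 1)
        for a in range(256)
        for b in range(256)
    )
```

Enumeration is feasible here only because the domain has 2 x 256 x 256 points; for 64-bit or wider inputs the same question has to be answered symbolically.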

They acknowledge scale challenges: full verification of every primitive is not yet achieved, and proof maintenance against code changes is ongoing engineering work. The assembly verification in particular requires significant manual scaffolding per architecture.

This is a notable industrial deployment of formal methods beyond toy examples — covering production cryptographic code with real security stakes.

Source: https://security.apple.com/blog/formal-verification-corecrypto/

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

The paper addresses a concrete performance problem: transformer inference involves many small operations (layer norm, attention masking, activation functions, residual adds) that are memory-bandwidth-bound when executed as separate GPU kernels. CODA proposes representing these operations as epilogue programs fused directly into the output stage of matrix multiplication (GEMM) kernels, eliminating the separate kernel launches and the associated global memory round-trips.

The key insight is that CUTLASS and cuBLASLt already expose “epilogue” hooks: computations applied to the GEMM output tile while it is still in shared memory or registers, before it is written back to DRAM. CODA formalizes a representation where the entire surrounding transformer computation (normalization, bias adds, nonlinearities, residual connections) is compiled into a sequence of epilogue instructions. This is essentially a small stack-based program that executes per output tile of the GEMM.
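The fusion idea can be sketched in plain Python, under the assumption that a per-tile dictionary stands in for registers or shared memory. This is not CODA's representation or CUTLASS's API. The function names `gemm_with_epilogue` and `make_epilogue` are hypothetical; the point is only that bias, activation, and residual are applied before the single write-back.

```python
from typing import Callable, List

Matrix = List[List[float]]


def gemm_with_epilogue(a: Matrix, b: Matrix, tile: int,
                       epilogue: Callable[[float, int, int], float]) -> Matrix:
    """Tiled matmul that runs `epilogue` on each output element while the
    tile is still in the local accumulator, before it is written back."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = range(i0, min(i0 + tile, m))
            cols = range(j0, min(j0 + tile, n))
            # Accumulator tile: a stand-in for registers / shared memory.
            acc = {(i, j): sum(a[i][p] * b[p][j] for p in range(k))
                   for i in rows for j in cols}
            # Fused epilogue: one write to `out`, no intermediate matrix.
            for (i, j), value in acc.items():
                out[i][j] = epilogue(value, i, j)
    return out


def make_epilogue(bias: List[float], residual: Matrix) -> Callable[[float, int, int], float]:
    """Build an epilogue 'program': bias add, then ReLU, then residual add."""
    def run(value: float, i: int, j: int) -> float:
        return max(value + bias[j], 0.0) + residual[i][j]
    return run
```

An unfused version would materialize the raw product, then make three more passes over it; the fused version touches each output element once.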

The compiler component takes a subgraph of the transformer’s computation DAG, identifies which nodes can be expressed as element-wise or reduction operations over GEMM output tiles, and generates the corresponding CUTLASS epilogue visitor tree. Reductions that span tiles (e.g., layer norm statistics) require a two-pass approach: the first GEMM pass computes partial statistics into shared memory, and the second normalizes using those statistics — both within the same kernel launch.
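The two-pass reduction can be illustrated with a short Python sketch for a single output row split across tiles. It is a simplification under stated assumptions: the function name is hypothetical, the partial statistics live in a Python list rather than GPU memory, and variance is computed as E[x^2] - mean^2, which is simple but less numerically stable than Welford-style updates.

```python
import math
from typing import List


def layer_norm_two_pass(row: List[float], tile: int, eps: float = 1e-5) -> List[float]:
    """Normalize one output row whose columns are split across tiles."""
    # Pass 1: each tile contributes a partial sum and a partial sum of squares.
    partials = [
        (sum(row[j:j + tile]), sum(x * x for x in row[j:j + tile]))
        for j in range(0, len(row), tile)
    ]
    total = sum(s for s, _ in partials)
    total_sq = sum(q for _, q in partials)
    mean = total / len(row)
    var = total_sq / len(row) - mean * mean
    # Pass 2: every tile normalizes its own elements with the shared statistics.
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]
```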

Results show significant speedups on A100 for standard transformer blocks: they report roughly 1.5-2x speedup on the combined attention + FFN computation versus PyTorch eager mode, and favorable comparisons against FlashAttention-2 on specific configurations. Compilation overhead is manageable due to CUTLASS’s template-based code generation.

Limitations: the approach is tightly coupled to CUTLASS’s epilogue API, making portability to non-NVIDIA hardware non-trivial. Very large models where GEMM is already compute-bound will see smaller gains.

Source: https://arxiv.org/abs/2605.19269

Multi-Stream LLMs: parallelizing prompts, thinking, and I/O

Standard autoregressive LLM inference processes everything sequentially: the system prompt, any chain-of-thought reasoning, tool call outputs, and generation all run in a single token stream. This paper proposes a multi-stream architecture where distinct logical components of the context — the fixed system prompt, the model’s internal reasoning (“thinking”), and external I/O results — are maintained as separate KV-cache streams that are processed concurrently where dependencies permit.

The dependency structure is the key technical contribution. A reasoning stream can begin processing the system prompt and partial user input immediately, while an I/O stream handles tool calls or retrieval asynchronously. The streams share attention only at explicitly defined synchronization points, implemented via masked attention: tokens in stream $i$ attend to tokens in stream $j$ only if a directed dependency edge $i \to j$ exists in the stream graph. Formally, the attention mask $M \in \{0,1\}^{T \times T}$ is defined such that $M_{t,t'} = 1$ iff the stream of token $t$ has a dependency on the stream of token $t'$, or $t$ and $t'$ share a stream.
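The mask definition translates directly into code. The sketch below is a literal reading of that definition, with hypothetical names (`build_stream_mask`, `stream_of`, `deps`) and a nested list in place of a tensor. It omits causal ordering, which a real decoder would presumably compose on top.

```python
from typing import Dict, List, Set


def build_stream_mask(stream_of: List[str],
                      deps: Dict[str, Set[str]]) -> List[List[int]]:
    """M[t][u] = 1 iff token t's stream equals, or depends on, token u's stream.

    stream_of[t] names the stream of token t; deps[s] is the set of streams
    that stream s has a directed dependency edge to.
    """
    size = len(stream_of)
    return [
        [
            1 if (stream_of[t] == stream_of[u]
                  or stream_of[u] in deps.get(stream_of[t], set())) else 0
            for u in range(size)
        ]
        for t in range(size)
    ]
```

For example, with tokens assigned to `["sys", "sys", "think", "io"]` and both `think` and `io` depending on `sys`, the `think` token can see the system prompt and itself but not the `io` token.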

This enables wall-clock latency reduction: the reasoning stream can prefill and begin generating while waiting for I/O, avoiding idle GPU cycles. The paper reports that on agentic benchmarks involving tool use, end-to-end latency drops by roughly 30-40% compared to sequential execution, with no degradation in task accuracy, because the model can interleave thinking with waiting rather than blocking.

Training requires multi-stream attention masks during fine-tuning or pretraining; the authors describe a data construction procedure for generating multi-stream training examples from existing chain-of-thought datasets.

Open question: how to handle cases where early reasoning conclusions must be revised after I/O results arrive — the paper’s synchronization model requires careful stream graph design to avoid stale reasoning.

Source: https://arxiv.org/abs/2605.12460

Reverse-engineering Docker Sandbox’s undocumented MicroVM API

Rivet’s engineering blog documents their reverse-engineering of the API that Docker’s sandbox product (used in Docker Desktop and related tooling) uses internally to manage Firecracker-based microVMs. Docker does not publicly document this API, but Rivet needed it to integrate microVM lifecycle management into their own platform.

The methodology is standard but well-executed: intercept HTTP traffic between the Docker Desktop frontend and its local daemon using a proxy, observe the REST API calls made during microVM create/start/stop/destroy operations, and infer the schema from the JSON payloads. They supplemented this with binary inspection of the daemon executable to identify endpoint routes and cross-reference with Firecracker’s own documented VMM API (which the Docker layer wraps).

Key findings: the internal API is a thin REST wrapper over Firecracker’s VMM API, with Docker-specific extensions for volume mounts and network namespace management. The microVM boot configuration exposes kernel boot args, root filesystem path (as a block device), memory and vCPU counts, and vsock configuration. Networking is handled via CNI plugins invoked outside the microVM API itself.
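For orientation, the boot configuration listed above corresponds to a short sequence of requests in Firecracker's own public API. The sketch below only builds those requests as data and sends nothing. It targets Firecracker directly, not Docker's undocumented wrapper, whose routes are not reproduced here. Endpoint and field names are given as I recall them from Firecracker's API and should be checked against its specification; all paths and sizes in the usage are placeholders.

```python
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def firecracker_boot_requests(kernel: str, rootfs: str, vcpus: int, mem_mib: int,
                              boot_args: str, vsock_uds: str) -> List[Request]:
    """Ordered requests for Firecracker's public API (not Docker's wrapper)."""
    return [
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel, "boot_args": boot_args}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/vsock",
         {"guest_cid": 3, "uds_path": vsock_uds}),
        # Configuration must be complete before the instance is started.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]
```

Each tuple would be sent over Firecracker's Unix domain socket in order; the final `InstanceStart` action boots the guest.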

The post includes concrete curl examples for the full VM lifecycle, which is the practical value — you can provision a Firecracker microVM through Docker’s daemon without using Docker’s own CLI, enabling integration with custom orchestration layers. They note that because this is undocumented, it may break without notice across Docker Desktop versions.

The broader engineering point: Firecracker’s own API is public and well-documented; Docker’s wrapper adds relatively little, making the reverse-engineering straightforward once you know what to look for.

Source: https://rivet.dev/blog/2026-02-04-we-reverse-engineered-docker-sandbox-undocumented-microvm-api/

Models.dev: open-source database of AI model specs, pricing, and capabilities

Models.dev is a structured, open-source data repository (YAML/JSON) cataloging LLM and multimodal model metadata: context window size, input/output modalities, pricing per million tokens (input and output separately), rate limits, provider API identifiers, and capability flags (function calling, vision, streaming support, etc.). The project targets developers building applications that need to programmatically select or compare models.

The technical substance is in the schema design. Each model entry is a typed record with fields for the provider, model ID string (as it appears in the API), context length, pricing tiers (distinguishing prompt/completion costs, and in some cases cached/batch pricing), and a capabilities object. The repository is versioned and intended to be consumed as a dependency — either directly from the JSON files or via a thin wrapper library.

The value proposition over scraping provider documentation pages is freshness, maintainability via community PRs, and a consistent schema across providers (OpenAI, Anthropic, Google, Mistral, Cohere, and others). Provider docs use inconsistent terminology and update without notice; a community-maintained canonical source reduces per-project scraping logic.

Technically, this is a data engineering problem more than an ML one: the hard part is keeping pricing accurate (providers change it frequently, sometimes mid-month), handling discontinuous pricing tiers (some providers charge differently for prompts under vs. over a certain length), and modeling capability differences between model versions (e.g., gpt-4o vs. gpt-4o-mini differ not just in price but in supported features).
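A typed record with discontinuous pricing tiers might look like the following Python sketch. The field names and the tier structure are assumptions for illustration, not the actual Models.dev schema, and the prices in the usage are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PriceTier:
    max_prompt_tokens: Optional[int]  # None means no upper bound
    input_per_mtok: float             # USD per million input tokens
    output_per_mtok: float            # USD per million output tokens


@dataclass(frozen=True)
class ModelEntry:
    provider: str
    model_id: str                     # the string used in the provider's API
    context_length: int
    tiers: Tuple[PriceTier, ...]      # ordered from smallest to largest bound
    capabilities: frozenset = frozenset()

    def cost(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Price a request using the first tier whose bound covers the prompt."""
        for tier in self.tiers:
            if tier.max_prompt_tokens is None or prompt_tokens <= tier.max_prompt_tokens:
                return (prompt_tokens * tier.input_per_mtok
                        + completion_tokens * tier.output_per_mtok) / 1_000_000
        raise ValueError("prompt exceeds every pricing tier")
```

Here the whole request is priced at the tier selected by prompt length. Some providers may instead price only the tokens above the threshold at the higher rate, which is the sort of difference a shared schema has to encode explicitly.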

Limitations: no automated sync from provider APIs means data freshness depends entirely on contributor activity. There is currently no provenance field indicating when a price was last verified.

Source: https://github.com/anomalyco/models.dev

The foundations of a provably secure operating system (PSOS) (1979)

This 1979 SRI technical report by Neumann, Robinson, Levitt, Boyer, and others describes the design of PSOS (Provably Secure Operating System), one of the earliest serious attempts to apply formal verification methods to an operating system kernel. The context is the era immediately following Bell-LaPadula and the early DARPA-funded formal methods push — the same period that produced the Kernel design at UCLA and the formalization work on the Kernelized Secure Operating System.

PSOS is organized around a strict hierarchical layering of abstract machine levels, each defined by a formal specification written in SPECIAL, the specification language of SRI’s Hierarchical Development Methodology (HDM). Each level introduces new objects and operations on those objects, with the security policy (mandatory access control based on Bell-LaPadula confidentiality labels) enforced at the hardware/firmware interface and preserved by proof obligations between adjacent layers.

The key technical ideas: capability-based addressing for all object references (eliminating ambient authority), a type system enforced at the hardware level that prevents capability forgery, and a proof methodology where each layer’s implementation is shown to correctly implement its formal specification relative to the layer below. The verification approach predates modern tools like Coq or Isabelle — proofs were conducted semi-formally, with machine-checked components done using Boyer-Moore (early NQTHM).

What makes it historically significant: PSOS explicitly separates the security policy specification from the mechanism, and recognizes that verifying the mechanism implements the policy requires both a formal policy model and a formal system model. This decomposition is still the standard architecture for systems like seL4 decades later.

The report is candid about incompleteness — full machine-checked proofs were not achieved — but the conceptual framework it establishes remains influential.

Source: http://www.csl.sri.com/users/neumann/psos.pdf

AI has a multiplying effect on existing technical skills

Josh Comeau’s essay, framed around launching a course, makes a substantive engineering argument: AI coding assistants function as force multipliers on existing skill, not as skill substitutes. The core claim is empirical and worth engaging with technically.

The argument proceeds by example. A developer who understands CSS layout can use an LLM to generate complex grid or animation code quickly because they can evaluate the output, recognize when the model has settled on a wrong approach, and provide corrective prompts. A developer who does not understand CSS layout cannot do this effectively: they cannot distinguish a plausible-looking wrong answer from a correct one, and they cannot decompose the problem into prompt-friendly subproblems.

This maps onto a known property of LLM code generation: models produce locally coherent code that may be globally wrong (incorrect logic, subtle bugs, violated invariants). Catching these errors requires the domain knowledge the model ostensibly replaces. The multiplication metaphor is apt in the mathematical sense: multiply by zero (no existing skill) and you still get zero useful output, but multiply a large skill base and you get substantially amplified throughput.

The practical implication for skill development: learning fundamentals remains high-value even in an AI-assisted workflow, possibly more so, because the marginal return on domain knowledge increases when AI handles mechanical production. This runs counter to the common framing that AI reduces the value of learning underlying technology.

The essay is anecdotal rather than empirical, but the mechanism it describes is consistent with how retrieval-augmented and prompt-chained systems actually fail in practice — users without domain knowledge cannot formulate the retrieval queries or intermediate prompts needed to correct model errors.

Source: https://www.joshwcomeau.com/email/wham-launch-005-elephant-2-p/

Slumber: a TUI HTTP client

Slumber is a terminal-based HTTP client written in Rust, positioned as a keyboard-driven alternative to Postman or Insomnia. The technical design centers on a YAML-based request collection format that is version-control friendly — collections are plain text files, diffable and committable alongside code.

The TUI is built using Ratatui (the maintained fork of tui-rs), which provides an immediate-mode widget system over crossterm’s raw terminal I/O. The architecture follows a standard Elm-like message-passing model: user input and async HTTP response completions post messages to a central event loop, which drives state updates and re-renders. This sidesteps threading issues in terminal UIs where both keyboard input and network I/O must be multiplexed.
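The message-passing pattern is language-independent, so a small Python sketch can show its shape. This is not Slumber's code: the message kinds, the `State` fields, and the function names are all hypothetical. Key presses and HTTP completions arrive on one queue, and a pure `update` function is the only place state changes.

```python
import queue
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Message = Tuple[str, object]  # (kind, payload)


@dataclass(frozen=True)
class State:
    selected: int = 0
    status: Optional[int] = None
    running: bool = True


def update(state: State, msg: Message) -> State:
    """Pure transition function: old state plus message gives new state."""
    kind, payload = msg
    if kind == "key" and payload == "down":
        return replace(state, selected=state.selected + 1)
    if kind == "key" and payload == "q":
        return replace(state, running=False)
    if kind == "http_done":
        return replace(state, status=payload)
    return state


def run(inbox: "queue.Queue[Message]") -> State:
    """Central loop: block on the inbox, apply the message, then redraw."""
    state = State()
    while state.running:
        state = update(state, inbox.get())
        # A real TUI would redraw the whole frame from `state` here.
    return state
```

Because input threads and network tasks only ever enqueue messages, no lock is needed around the state itself.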

Request definitions support template variables with a {{variable}} syntax, chaining (extracting values from previous responses via JSONPath or regex and injecting them into subsequent requests), and profiles for environment switching (dev/staging/prod base URLs and auth tokens). Authentication schemes (Bearer, Basic, custom headers) are defined declaratively in the YAML.
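Profiles and chaining combine as follows in a minimal Python sketch. It assumes double-brace placeholders and uses a dotted key path in place of full JSONPath. The helper names `render` and `extract`, the profile contents, and the response body are all hypothetical, and none of this is Slumber's implementation.

```python
import json
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}} with its value; unknown names are an error."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"unknown template variable: {key}")
        return values[key]
    return PLACEHOLDER.sub(substitute, template)


def extract(body: str, path: str) -> str:
    """Pull a value out of a JSON response using a dotted key path."""
    node = json.loads(body)
    for key in path.split("."):
        node = node[key]
    return str(node)
```

A chained request first calls `extract` on the login response, merges the result into the active profile's values, then calls `render` on the next request's URL and headers.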

The response viewer supports syntax-highlighted JSON/XML rendering, raw body view, and header inspection. Response history is persisted to a local SQLite database, allowing review of prior requests without re-execution — useful for debugging intermittent API behavior.
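Persisting history to SQLite needs very little code. The Python sketch below uses an invented table layout (`exchanges` with `recipe`, `status`, `body`), not Slumber's actual schema, and defaults to an in-memory database so it has no side effects.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS exchanges (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               recipe TEXT NOT NULL,
               status INTEGER NOT NULL,
               body TEXT NOT NULL)"""
    )
    return conn


def record(conn: sqlite3.Connection, recipe: str, status: int, body: str) -> None:
    """Store one completed request/response exchange."""
    conn.execute(
        "INSERT INTO exchanges (recipe, status, body) VALUES (?, ?, ?)",
        (recipe, status, body),
    )
    conn.commit()


def history(conn: sqlite3.Connection, recipe: str) -> List[Tuple[int, str]]:
    """Return past responses for a recipe, newest first, without re-sending."""
    rows = conn.execute(
        "SELECT status, body FROM exchanges WHERE recipe = ? ORDER BY id DESC",
        (recipe,),
    )
    return rows.fetchall()
```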

Compared to curl-based workflows, Slumber trades scriptability for interactivity; compared to GUI clients, it trades visual layout for terminal composability (tmux integration, SSH-remote use). The YAML collection format is the main differentiator over curl: it provides persistence and reuse without requiring a running GUI application.

Slumber currently lacks support for WebSockets and gRPC, which limits its applicability outside HTTP/1.1 REST workflows.

Source: https://slumber.lucaspickering.me

Noteworthy New Repositories

facex-engine/facex

A complete face analysis pipeline compiled entirely to WebAssembly, running client-side with no server round-trips. The stack covers face detection, a 576-point 3D facial mesh, face recognition (embedding-based identity matching), liveness anti-spoofing, and smile detection. All inference runs in-browser via WASM, which eliminates network latency, avoids sending biometric data off-device, and removes backend infrastructure requirements entirely.

The 576-point 3D mesh is notably denser than the 468-point MediaPipe Face Mesh baseline, suggesting either a custom topology or an augmented landmark set for higher-fidelity geometry. Anti-spoofing at this level typically relies on texture analysis or depth cues inferred from the mesh, both of which are plausible given the landmark density. The recognition component presumably produces fixed-dimension embeddings for cosine or L2 identity comparison.
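Embedding comparison of the kind presumed above reduces to a few lines. This Python sketch shows cosine similarity with a decision threshold. The 0.6 threshold and the function names are placeholders, not values from the facex project, and a real deployment would tune the threshold on labeled pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def same_identity(a: Sequence[float], b: Sequence[float],
                  threshold: float = 0.6) -> bool:
    """Declare a match when similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold
```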

This is a credible choice for privacy-sensitive deployments (healthcare intake forms, local identity verification) where biometric data must not leave the user’s device, or for offline-capable progressive web apps. The Apache 2.0 license removes commercial friction. The main engineering constraint is WASM bundle size and inference latency on low-end mobile hardware — neither of which is benchmarked in the current README.

Source: https://github.com/facex-engine/facex

opensquilla/opensquilla

OpenSquilla is an AI agent framework oriented around token efficiency — the premise being that most agent loops waste context budget on redundant scaffolding, verbose tool outputs, and poorly compressed state. The project targets higher “intelligence density” per token by rethinking how context is allocated across agent steps.

Architecturally, this class of system typically achieves token reduction through aggressive summarization of intermediate results, selective retrieval rather than full-history replay, and structured state representations that compress working memory. Without access to a detailed technical spec, the distinguishing claim is that the same token budget yields more effective reasoning — implying either a more compact prompting discipline, a smarter memory manager, or both.
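One of the generic techniques named above, summarizing older steps to fit a budget, can be sketched in Python as follows. This is not OpenSquilla's mechanism, which the description does not specify. The whitespace token count and the caller-supplied `summarize` function are stand-ins for a real tokenizer and a real summarizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real tokenizer would differ."""
    return len(text.split())


def compact_history(steps: List[str], budget: int,
                    summarize: Callable[[str], str]) -> List[str]:
    """Summarize steps oldest-first until the history fits the budget.

    Best effort: if summarizing every step is still not enough, the
    result may remain over budget.
    """
    result = list(steps)
    for i in range(len(result)):
        if sum(count_tokens(step) for step in result) <= budget:
            break
        result[i] = summarize(result[i])
    return result
```

Recent steps stay verbatim because they are the most likely to be referenced by the next action; older steps degrade first.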

For practitioners building agents that hit context-window limits or incur high API costs on long-horizon tasks, token efficiency directly translates to cost and capability. Existing frameworks (LangChain, AutoGen, CrewAI) tend to prioritize expressiveness over frugality; a framework that treats token budget as a first-class constraint has a real niche. The 1.5k-star traction in early release suggests the positioning resonates. Worth evaluating against concrete long-horizon benchmarks (e.g., SWE-bench, GAIA) to validate the intelligence-density claim empirically.

Source: https://github.com/opensquilla/opensquilla

chorus-codes/chorus

Chorus implements multi-LLM peer review as a CLI-level wrapper. The workflow: you issue a code-writing or architecture decision command through your existing CLI tool, and Chorus intercepts the output, convenes 2-4 additional LLMs (configurable providers), and synthesizes their critiques before the result is committed. The “bring your own CLI” design means it is composable with Cursor, Aider, or any shell-based coding assistant without forking those tools.

The technical substance is in the aggregation layer. Naive voting across LLMs produces low-signal consensus; the interesting engineering question is how Chorus resolves disagreement — whether by weighted confidence, a meta-LLM judge, or structured diff-based critique synthesis. Multi-model ensembling for code review is known to catch classes of bugs that single-model review misses, particularly on edge cases where one model’s training data is sparse.
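As one possible resolution strategy among those listed, confidence-weighted aggregation can be sketched in Python. How Chorus actually resolves disagreement is not stated, so the `Review` shape, the `min_support` cutoff, and the issue labels below are purely illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Review = Tuple[str, str, float]  # (reviewer, issue label, confidence in [0, 1])


def aggregate(reviews: List[Review], min_support: float = 1.0) -> List[Tuple[str, float]]:
    """Sum confidence per issue and keep issues whose total reaches min_support.

    Results are ordered by descending support, then by label.
    """
    support: Dict[str, float] = defaultdict(float)
    for _reviewer, issue, confidence in reviews:
        support[issue] += confidence
    kept = [(issue, score) for issue, score in support.items() if score >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

With this rule a single highly confident reviewer can surface an issue alone, while low-confidence remarks need corroboration from another model.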

This is directly useful for high-stakes code paths where a single model’s blind spots are unacceptable. The cost is additional API latency and token spend proportional to the number of reviewer models. The design is honest about that tradeoff rather than obscuring it. Practically, it occupies the space between “one LLM writes code” and “full multi-agent debate loop,” with minimal workflow disruption.

Source: https://github.com/chorus-codes/chorus

tonbo-io/ursula

Ursula is a distributed event stream server that exposes an HTTP interface and uses S3-compatible object storage as its persistence backend. This positions it in the same architectural family as systems like Kafka Tiered Storage or Streamhouse, but with object storage as the primary (not tiered) store, which substantially reduces operational complexity at the cost of latency.

The HTTP-over-S3 design means: no ZooKeeper or Raft cluster to manage, horizontal read scaling via S3’s native replication, and straightforward multi-region durability. The tradeoff is that S3 PUT/GET latency (typically 10-100ms) sets a floor on stream ingestion and consumption latency — making this unsuitable for sub-millisecond event processing but reasonable for analytics pipelines, audit logs, or ML feature pipelines where seconds of lag are acceptable.
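The latency floor is why object-store-backed streams batch writes. The Python sketch below models that with a dictionary standing in for a bucket. The segment key layout, the batch size, and the class name are assumptions for illustration and say nothing about Ursula's actual storage format.

```python
from typing import Dict, List


class ObjectStoreStream:
    """Append-only stream whose segments are immutable objects keyed by base offset."""

    def __init__(self, store: Dict[str, List[str]], name: str, batch_size: int = 3) -> None:
        self.store = store            # stand-in for an S3-compatible bucket
        self.name = name
        self.batch_size = batch_size
        self.buffer: List[str] = []
        self.next_offset = 0

    def append(self, event: str) -> None:
        """Buffer the event; write a segment once the batch is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """One PUT per batch amortizes the per-request latency."""
        if self.buffer:
            key = f"{self.name}/{self.next_offset:020d}"  # zero-padded so keys sort by offset
            self.store[key] = list(self.buffer)
            self.next_offset += len(self.buffer)
            self.buffer.clear()

    def read_from(self, offset: int) -> List[str]:
        """Return all flushed events at or after the given offset."""
        events: List[str] = []
        for key in sorted(k for k in self.store if k.startswith(self.name + "/")):
            base = int(key.rsplit("/", 1)[1])
            for i, event in enumerate(self.store[key]):
                if base + i >= offset:
                    events.append(event)
        return events
```

Larger batches mean fewer PUTs and lower cost, at the price of events sitting in the buffer longer before they become durable and visible.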

The project is from tonbo-io, the same organization behind the Tonbo embedded LSM storage engine in Rust, which suggests the implementation is likely Rust-based and inherits that ecosystem’s performance and correctness guarantees. For teams that want Kafka-like semantics without Kafka’s operational surface area, and whose workloads tolerate object-storage latency, this is a practically interesting alternative. The main open question is how compaction and retention are handled given S3’s lack of native range-delete semantics.

Source: https://github.com/tonbo-io/ursula

perplexityai/bumblebee

Bumblebee is a read-only static scanner that inventories on-disk developer artifacts — installed packages, browser/editor extensions, CLI tools, and IDE plugins — and cross-references them against known software supply-chain compromise records. It does not execute anything; it reads metadata from standard filesystem locations (npm’s node_modules, VS Code’s extension directory, pip’s site-packages, etc.) and emits a report of matched indicators.
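The read-only inventory-and-match idea can be sketched for one ecosystem in Python. This is not Bumblebee's code: the function names and the indicator format (a set of name/version pairs) are assumptions, and the glob deliberately ignores scoped packages under `@scope/` to stay short.

```python
import json
from pathlib import Path
from typing import Iterator, List, Set, Tuple

Artifact = Tuple[str, str]  # (package name, version)


def installed_npm_packages(root: Path) -> Iterator[Artifact]:
    """Read name/version from package.json files under node_modules.

    Only metadata is read; nothing is imported, executed, or modified.
    """
    for manifest in root.glob("**/node_modules/*/package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip it
        if isinstance(data, dict) and "name" in data and "version" in data:
            yield (str(data["name"]), str(data["version"]))


def match_indicators(root: Path, compromised: Set[Artifact]) -> List[Artifact]:
    """Return installed artifacts that appear in the compromise records."""
    return sorted(set(installed_npm_packages(root)) & compromised)
```

Matching on what is actually on disk, not on what a manifest declares, is what lets this style of scan see globally installed tools and transitive installs.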

The “read-only” constraint is a deliberate trust property: the tool cannot modify state, making it safe to run in CI or on production developer workstations without side-effect risk. The supply-chain compromise database it queries is the key differentiator — its freshness and coverage determine actual detection value, and how that database is maintained (and whether it can be updated offline) is the most important unanswered question.

This fills a practical gap. Existing SCA tools (Snyk, Dependabot, OSV-Scanner) focus on declared dependencies in manifest files; Bumblebee targets actually-installed artifacts, including developer tooling that never appears in a package.json. That lateral surface (a compromised VS Code extension or a malicious npm global binary) is precisely the vector used in several high-profile 2023-2024 supply-chain attacks. That the project comes from Perplexity suggests it will see ongoing maintenance. It is suitable as a lightweight complement to heavier SAST/SCA pipelines.

Source: https://github.com/perplexityai/bumblebee

regent-vcs/re_gent

re_gent is a version control system designed specifically for the file-system interaction patterns of AI coding agents. Standard Git is poorly suited to agentic workflows: agents make many rapid, speculative edits across multiple files, often backtrack, and may run parallel exploratory branches simultaneously. Git’s commit-centric model with manual staging is too coarse and too human-oriented for this use pattern.

re_gent likely addresses this by providing finer-grained, automatically captured checkpoints (continuous or event-driven snapshotting rather than explicit commits), structured undo/redo semantics that agents can invoke programmatically, and possibly a branching model optimized for parallel speculative execution. The ability for an agent to atomically roll back a failed refactoring attempt or checkpoint state before running tests is directly valuable.
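The checkpoint-and-rollback semantics described above can be sketched in Python with an in-memory file map. This is a guess at the shape of such an API, not re_gent's interface: the `Workspace` class and its methods are hypothetical, and full-copy snapshots stand in for whatever storage scheme a real tool would use.

```python
from typing import Dict, List


class Workspace:
    """In-memory file tree with cheap checkpoints; a stand-in for on-disk snapshotting."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self._checkpoints: List[Dict[str, str]] = []

    def write(self, path: str, content: str) -> None:
        """Create or overwrite a file."""
        self.files[path] = content

    def checkpoint(self) -> int:
        """Snapshot the current tree and return an id the agent can keep."""
        self._checkpoints.append(dict(self.files))
        return len(self._checkpoints) - 1

    def restore(self, checkpoint_id: int) -> None:
        """Atomically replace the tree with the snapshot, dropping later edits."""
        self.files = dict(self._checkpoints[checkpoint_id])
```

An agent would call `checkpoint()` before a risky refactor, run the tests, and call `restore()` with the returned id if they fail.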

The broader engineering problem is real: as coding agents gain longer context windows and autonomy over larger codebases, the lack of reliable state management becomes a primary failure mode. Git hooks and worktrees are the current workaround, but they require manual scaffolding per project. A purpose-built VCS that exposes a programmatic API for checkpoint, branch, diff, and restore operations — with agent-friendly semantics — would meaningfully reduce failure recovery costs in agentic coding pipelines. The 592-star early interest is consistent with this being a recognized pain point.

Source: https://github.com/regent-vcs/re_gent

Avarok-Cybersecurity/atlas

Atlas is a pure-Rust neural network inference engine, positioned as a safe, dependency-light alternative to runtimes that wrap C/C++ backends (ONNX Runtime, TensorRT, llama.cpp). Pure-Rust inference engines offer memory safety guarantees by construction, simplified cross-compilation (including to WASM and embedded targets), and no FFI boundary overhead or unsafety.

The implementation likely covers a standard operator set for transformer and feed-forward architectures, with BLAS/SIMD acceleration via Rust crates (e.g., ndarray with BLAS backends, or custom SIMD via std::simd). The cybersecurity provenance of the organization (Avarok builds post-quantum networking tooling) suggests particular attention to correctness and auditability — properties that matter when inference is embedded in security-critical pipelines.

The main engineering tradeoff versus established runtimes is performance: hand-optimized C kernels and GPU backends in TensorRT or llama.cpp will outperform a pure-Rust CPU implementation for large models. Atlas’s niche is smaller models in environments where the Rust toolchain is already the deployment target, FFI is undesirable, or auditability of the inference path is a requirement. Relevant use cases include edge inference on constrained hardware, WASM-based deployment, and security tools that need embedded model inference without a C dependency chain.

Source: https://github.com/Avarok-Cybersecurity/atlas

mkbula/HideMyData

HideMyData is a native macOS application for on-device PII redaction from documents and images. The pipeline combines Apple’s Vision framework for OCR (running entirely on-device via Core ML) with an OpenAI-based privacy filter model to classify and remove sensitive entities — names, addresses, identifiers, financial data — from extracted text before it leaves the machine or is forwarded downstream.

The architectural split is intentional: Vision OCR keeps raw document content local, avoiding the most sensitive exposure point, while the OpenAI filter model handles the semantic classification task. Whether the OpenAI call sends redacted intermediate representations or full text is a critical implementation detail that determines the actual privacy guarantee — this is worth auditing in the source before trusting it with genuinely sensitive documents.
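One way to narrow that exposure is to mask pattern-detectable identifiers locally before any text is handed to a remote classifier. The Python sketch below shows the idea with two regular expressions. It is not HideMyData's pipeline; the patterns and labels are illustrative and far from complete PII coverage.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def redact_locally(text: str) -> str:
    """Replace pattern-matched identifiers with labels before text leaves the device."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Names and addresses do not follow fixed patterns, which is why a semantic classifier is still needed afterward; the local pass only reduces what that classifier gets to see.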

For macOS-native workflows involving legal documents, medical records, or financial statements, this is more ergonomic than scripting a combination of Tesseract and a local NER model. The Vision framework’s on-device OCR quality has improved substantially in recent macOS releases and handles handwriting and mixed layouts well. The main limitation is the OpenAI dependency for the classification step, which introduces a network call and API cost. A fully local alternative using a small on-device NER model (e.g., via Core ML export of a distilled BERT) would strengthen the privacy story considerably.

Source: https://github.com/mkbula/HideMyData