Zero copy data structures
https://www.hytradboi.com/2025/df37d71b-0552-47f9-af36-f53c9ee09f8f-zero-copy-data-structures
Keep an eye on the conference, https://www.hytradboi.com/2025
Dear readers, you probably want to subscribe to these link-aggregator channels. I've been reading them for a few years and heartily recommend them.
- https://www.tg-me.com/axisofordinary
- https://www.tg-me.com/j_links
The Functional Machine Calculus
Willem Heijltjes
This paper presents the Functional Machine Calculus (FMC) as a simple model of higher-order computation with "reader/writer" effects: higher-order mutable store, input/output, and probabilistic and non-deterministic computation.
The FMC derives from the lambda-calculus by taking the standard operational perspective of a call-by-name stack machine as primary, and introducing two natural generalizations. One, "locations", introduces multiple stacks, which each may represent an effect and so enable effect operators to be encoded into the abstraction and application constructs of the calculus. The second, "sequencing", is known from kappa-calculus and concatenative programming languages, and introduces the imperative notions of "skip" and "sequence". This enables the encoding of reduction strategies, including call-by-value lambda-calculus and monadic constructs.
https://arxiv.org/abs/2212.08177
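As a rough intuition for the "locations" generalization, here is a toy multi-stack machine in Python. This is not the FMC itself: real FMC terms push terms (which is what makes the calculus higher-order), and the encoding of abstraction and application is omitted; we push plain values to keep the sketch short.

```python
# Toy multi-stack machine, a loose sketch of the FMC's "locations" idea.
# A term is a nested tuple:
#   ("push", loc, val, cont) | ("pop", loc, var, cont) | ("end",)
def run(term, env=None, stacks=None):
    env = dict(env or {})
    stacks = stacks if stacks is not None else {}
    while term[0] != "end":
        if term[0] == "push":                 # [v]loc . cont
            _, loc, val, term = term
            stacks.setdefault(loc, []).append(env.get(val, val))
        else:                                 # ("pop", loc, var, cont): <x>loc . cont
            _, loc, var, term = term
            env[var] = stacks[loc].pop()
    return stacks

# Mutable store as a location: "get" pops the cell and pushes the value back;
# output is just a push onto a dedicated "out" location.
prog = ("push", "cell", 1,                    # cell := 1
        ("pop", "cell", "x",                  # x := contents of cell
         ("push", "cell", "x",                # restore the cell
          ("push", "out", "x",                # emit x
           ("end",)))))
print(run(prog))                              # {'cell': [1], 'out': [1]}
```

With one anonymous location this is just the usual call-by-name stack machine; each extra named location behaves like an effect cell, which is the point of the encoding.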
Collapsing Towers of Interpreters
Nada Amin, Tiark Rompf
Given a tower of interpreters, i.e., a sequence of multiple interpreters interpreting one another as input programs, we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass. In the real world, a use case might be Python code executed by an x86 runtime, on a CPU emulated in a JavaScript VM, running on an ARM CPU. Collapsing such a tower can not only exponentially improve runtime performance, but also enable the use of base-language tools for interpreted programs, e.g., for analysis and verification. In this paper, we lay the foundations in an idealized but realistic setting.
https://dl.acm.org/doi/10.1145/3158140
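A one-level caricature of the idea: instead of an interpreter that re-dispatches on the AST at every call, stage it so that dispatch happens once and returns a closure, the "residual program" with the interpretive overhead stripped out. This is only the flavor of the technique; the paper's contribution is making it work through a whole tower of interpreters, reflectively.

```python
# A tiny expression language: ("lit", n) | ("var",) | ("add", a, b)
def interp(e, x):                  # interpreter: tag dispatch on every evaluation
    tag = e[0]
    if tag == "lit": return e[1]
    if tag == "var": return x
    return interp(e[1], x) + interp(e[2], x)

def stage(e):                      # staged: dispatch once, return a residual closure
    tag = e[0]
    if tag == "lit":
        n = e[1]
        return lambda x: n
    if tag == "var":
        return lambda x: x
    f, g = stage(e[1]), stage(e[2])
    return lambda x: f(x) + g(x)   # no tag tests remain at run time

e = ("add", ("var",), ("lit", 2))  # x + 2
fast = stage(e)
print(interp(e, 40), fast(40))     # 42 42
```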
Specializing Data Access in a Distributed File System (Generative Pearl)
P. Das, A. Xhebraj, T. Rompf
https://dl.acm.org/doi/10.1145/3689484.3690736
https://www.youtube.com/watch?v=YeWCPRXWIuA
Modern distributed file systems partition and replicate data amongst a cluster of participating nodes to enable processing of larger-than-memory datasets and to provide fault tolerance. Typically, these systems maintain designated coordinator nodes that track the state of the cluster and route user operations to the nodes holding the requested data.
One way to improve performance in this scenario is to provide the client with the cluster topology and data distribution information so that they can bypass the designated coordinator and directly access the data. Loading the files directly from the underlying file system instead of having to go through high-level APIs can lead to significantly better performance <...>
In this paper, we propose a system, DDLoader, that embeds the information about the data distribution into the language using the Lightweight Modular Staging metaprogramming framework for Scala. Multi-stage programming techniques allow developers to build applications while delegating the task of generating low-level file loading details to the generated code. This hides the implementation details of the underlying distributed file system while retaining the performance of handwritten examples. Further, metaprogramming erases the distinction between the data loading and data processing phases of the applications typically built over distributed file systems, allowing whole program optimization, which was not possible previously.
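The staging move can be caricatured without LMS or Scala: if the partition map is known when the loader is generated, the coordinator lookup is resolved at generation time, leaving residual code that contacts the data node directly. The partition map, file names, and node ids below are all made up for illustration.

```python
# Hypothetical partition map, known at code-generation time.
PARTITION = {"users.parquet": "node3", "logs.parquet": "node1"}

def stage_loader(path):
    # "Generation time": resolve the coordinator lookup once...
    node = PARTITION[path]
    # ...and return residual code that goes straight to the data node.
    def load():
        return f"read {path} directly from {node}"   # stand-in for a real RPC
    return load

load_users = stage_loader("users.parquet")
print(load_users())   # read users.parquet directly from node3
```

In DDLoader the same collapse additionally lets data loading fuse with the downstream processing code, since both are generated in one stage.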
FP²: Fully in-Place Functional Programming
Anton Lorenzen, Daan Leijen, Wouter Swierstra
As functional programmers we always face a dilemma: should we write purely functional code, or sacrifice purity for efficiency and resort to in-place updates? This paper identifies precisely when we can have the best of both worlds: a wide class of purely functional programs can be executed safely using in-place updates without requiring allocation, provided their arguments are not shared elsewhere.
We describe a linear _fully in-place_ (FIP) calculus where we prove that we can always execute such functions in a way that requires no (de)allocation and uses constant stack space. Of course, such a calculus is only relevant if we can express interesting algorithms; we provide numerous examples of in-place functions on datastructures such as splay trees or finger trees, together with in-place versions of merge sort and quick sort.
We also show how we can generically derive a map function over _any_ polynomial data type that is fully in-place. Finally, we have implemented the rules of the FIP calculus in the Koka language. Using the Perceus reference counting garbage collection, this implementation dynamically executes FIP functions in-place whenever possible.
https://dl.acm.org/doi/abs/10.1145/3607840
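The flavor of "fully in-place" shows up already on list reversal: in the purely functional accumulator version, each input cell dies exactly when a result cell is born, so an FIP execution can reuse the cell instead of allocating. A Python sketch of both views (in Koka this reuse happens automatically, via Perceus, whenever the cells are unshared):

```python
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

def rev(xs, acc=None):            # purely functional spec: allocates one Cons per step
    return acc if xs is None else rev(xs.tail, Cons(xs.head, acc))

def rev_fip(xs):                  # same recursion, but each cell is reused in place:
    acc = None                    # zero allocation, constant stack (tail call = loop)
    while xs is not None:
        xs.tail, acc, xs = acc, xs, xs.tail   # classic pointer reversal
    return acc

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out

print(to_list(rev_fip(Cons(1, Cons(2, Cons(3, None))))))   # [3, 2, 1]
```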
Denotational design with type class morphisms
Conal Elliott
Type classes provide a mechanism for varied implementations of standard interfaces. Many of these interfaces are founded in mathematical tradition and so have regularity not only of types but also of properties (laws) that must hold. Types and properties give strong guidance to the library implementor, while leaving freedom as well. Some of this remaining freedom is in how the implementation works, and some is in what it accomplishes.
To give additional guidance to the what, without impinging on the how, this paper proposes a principle of type class morphisms (TCMs), which further refines the compositional style of denotational semantics. The TCM idea is simply that the instance’s meaning is the meaning’s instance. This principle determines the meaning of each type class instance, and hence defines correctness of implementation. It also serves to transfer laws about a type’s semantic model, such as the class laws, to hold for the type itself. In some cases, it also provides a systematic guide to implementation, and in some cases, valuable design feedback.
The paper is illustrated with several examples of types, meanings, and morphisms.
http://conal.net/papers/type-class-morphisms/
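The slogan "the instance's meaning is the meaning's instance" can be made concrete on finite maps. Take the meaning of a map d to be the partial function k ↦ d.get(k); then the Functor (fmap) instance on the representation is correct precisely when mu(fmap f d) = fmap f (mu d). A Python check of that morphism law, with dicts and Optional-returning functions standing in for the Haskell types in the paper:

```python
def mu(d):                       # meaning of a Map k v: a partial function k -> Optional v
    return lambda k: d.get(k)

def fmap_map(f, d):              # candidate Functor instance on the representation
    return {k: f(v) for k, v in d.items()}

def fmap_fun(f, g):              # Functor on the meaning type k -> Optional v
    return lambda k: None if g(k) is None else f(g(k))

d = {"a": 1, "b": 2}
f = lambda v: v * 10
lhs = mu(fmap_map(f, d))         # the instance's meaning
rhs = fmap_fun(f, mu(d))         # the meaning's instance
assert all(lhs(k) == rhs(k) for k in ["a", "b", "missing"])   # TCM law, pointwise
```

Because the law pins down `fmap_map` up to the meaning, the Functor laws for dicts come for free from the Functor laws of the semantic model.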
https://www.youtube.com/watch?v=g3qd4zpm1LA
A brief overview of two new features for "OCaml 6.0" that Jane Street is finalizing.
The first one is unboxed types with "levity polymorphism" (more or less) and stack allocation.
Safe stack allocation requires tracking values that don't escape scope (and scope inference) which largely amounts to linearity. This leads to the second feature: race-free parallel (concurrent) programming.
Apparently, it requires tracking more properties than just linearity, but once you've added kinds and modes to your type system, it's hard to stop half-way... 😁
Making OCaml Safe for Performance Engineering
Harnessing the Universal Geometry of Embeddings
We present the first method to translate text embeddings across different spaces without any paired data or encoders.
Our method, vec2vec, reveals that all encoders—regardless of architecture or training data—learn nearly the same representations!
We demonstrate how to translate between these black-box embeddings without any paired data, maintaining high fidelity.
Using vec2vec, we show that vector databases reveal (almost) as much as their inputs.
Given just vectors (e.g., from a compromised vector database), we show that an adversary can extract sensitive information (e.g., PII) about the underlying text.
Strong Platonic Representation Hypothesis (S-PRH)
We thus strengthen Huh et al.'s PRH to say:
The universal latent structure of text representations can be learned and harnessed to translate representations from one space to another without any paired data or encoders.
https://arxiv.org/abs/2505.12540
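A toy version of the geometric premise: if two encoders are (near-)isometric images of one shared latent space, their similarity matrices coincide, and that shared structure is what makes unpaired translation conceivable at all. Sketch with random orthogonal "encoders" (nothing here is the vec2vec method itself, which learns the translation with adversarial objectives):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 8))                      # shared latent "meanings" of 5 texts

def rand_isometry(d_in, d_out):
    # QR of a random matrix gives orthonormal columns, so A @ A.T == I.
    Q, _ = np.linalg.qr(rng.normal(size=(d_out, d_in)))
    return Q.T                                   # shape (d_in, d_out)

A, B = rand_isometry(8, 16), rand_isometry(8, 16)
Ea, Eb = Z @ A, Z @ B                            # two "encoders'" embedding spaces

# Their Gram (pairwise similarity) matrices coincide exactly:
# (Z A)(Z A)^T = Z A A^T Z^T = Z Z^T, independent of the encoder.
assert np.allclose(Ea @ Ea.T, Eb @ Eb.T)
```

Real encoders are of course not exact isometries of a common latent space; vec2vec's empirical claim is that they are close enough for a learned translator to exploit.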
Unpublished lecture notes in a secret place. All those moments will be lost in time, like tears in rain. Enjoy!
https://www.cs.cmu.edu/~rwh/courses/atpl/pdfs/
Scalable Pattern Matching in Computation Graphs
Luca Mondada
Pablo Andrés-Martínez
Graph rewriting is a popular tool for the optimisation and modification of graph expressions in domains such as compilers, machine learning and quantum computing. The underlying data structures are often port graphs—graphs with labels at edge endpoints. A pre-requisite for graph rewriting is the ability to find graph patterns. We propose a new solution to pattern matching in port graphs. Its novelty lies in the use of a pre-computed data structure that makes the pattern matching runtime complexity independent of the number of patterns. This offers a significant advantage over existing solutions for use cases with large sets of small patterns.
Our approach is particularly well-suited for quantum superoptimisation. We provide an implementation and benchmarks showing that our algorithm offers a 20x speedup over current implementations on a dataset of 10000 real world patterns describing quantum circuits.
https://arxiv.org/abs/2402.13065
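The paper's contribution is a pre-computed automaton over port graphs; the payoff can be illustrated with a much simpler one-dimensional analogue: index patterns by an anchor label, so that per-position work touches only the patterns anchored there rather than scanning the whole pattern set. Gate names and patterns below are made up.

```python
from collections import defaultdict

# Hypothetical patterns: short gate sequences, keyed by name.
patterns = {"p1": ("H", "CX"), "p2": ("X", "X")}

# Pre-computation: index every pattern by the label of its first gate.
index = defaultdict(list)
for name, pat in patterns.items():
    index[pat[0]].append((name, pat))

def matches(circuit):
    found = []
    for i, gate in enumerate(circuit):
        for name, pat in index[gate]:          # only patterns anchored at this label
            if tuple(circuit[i:i + len(pat)]) == pat:
                found.append((name, i))
    return found

print(matches(["H", "CX", "X", "X"]))          # [('p1', 0), ('p2', 2)]
```

The real algorithm generalizes this from sequences to port graphs, where choosing canonical anchors and traversal orders is the hard part.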
PONV Daily
https://fixupx.com/PLDI/status/1937100088960246054
1. Destabilizing Iris (https://doi.org/10.1145/3729284)
2. Tree Borrows (https://doi.org/10.1145/3735592)
3. Principal Type Inference under a Prefix: A Fresh Look at Static Overloading (https://doi.org/10.1145/3729308)
4. Verifying General-Purpose RCU for Reclamation in Relaxed Memory Separation Logic (https://doi.org/10.1145/3729246)
5. Practical Type Inference with Levels (https://doi.org/10.1145/3729338)
6. AWDIT: An Optimal Weak Database Isolation Tester (https://doi.org/10.1145/3742465)