Publications

See also my Google Scholar user page

2025-07-09: Paralegal: Practical Static Analysis for Privacy Bugs [poster] [artifact] [preprint]

Justus Adam, Carolyn Zech, Livia Zhu, Sreshtaa Rajesh, Nathan Harbison, Mithi Jethwa, Will Crichton, Malte Schwarzkopf, Shriram Krishnamurthi in 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI'25)
Show abstract:
Finding privacy bugs in software today usually requires onerous manual audits. Code analysis tools could help, but existing tools aren't sufficiently practical and ergonomic to be used.

Paralegal is a static analysis tool to find privacy bugs in Rust programs. Key to Paralegal's practicality is its distribution of work between the program analyzer, privacy engineers, and application developers. Privacy engineers express a high-level privacy policy over markers, which application developers then apply to source code entities. Paralegal extracts a Program Dependence Graph (PDG) from the program, leveraging Rust's ownership type system to model the behavior of library code. Paralegal augments the PDG with the developers' markers and checks privacy policies against the marked PDG.

In an evaluation on eight real-world applications, Paralegal found real privacy bugs, including two previously unknown ones. Paralegal supports a broader range of policies than information flow control (IFC) and CodeQL, a widely-used code analysis engine. Paralegal is fast enough to deploy interactively, and its markers are easy to maintain as code evolves.
2023-07-10: K9db: Privacy-Compliant Storage For Web Applications By Construction

Kinan Dak Albab, Ishan Sharma, Justus Adam, Benjamin Klimnik, Aaron Jeyaraj, Raj Paul, Artem Agvanian, Leonhard Spiegelberg, Malte Schwarzkopf in 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)
Show abstract:
Data privacy laws like the EU's GDPR grant users new rights, such as the right to request access to and deletion of their data. Manual compliance with these requests is error-prone and imposes costly burdens especially on smaller organizations, as non-compliance risks steep fines.

K9db is a new, MySQL-compatible database that complies with privacy laws by construction. The key idea is to make the data ownership and sharing semantics explicit in the storage system. This requires K9db to capture and enforce applications' complex data ownership and sharing semantics, but in exchange simplifies privacy compliance. Using a small set of schema annotations, K9db infers storage organization, generates procedures for data retrieval and deletion, and reports compliance errors if an application risks violating the GDPR.

Our K9db prototype successfully expresses the data sharing semantics of real web applications, and guides developers to getting privacy compliance right. K9db also matches or exceeds the performance of existing storage systems, at the cost of a modest increase in state size.
2020-11-19: Type checking beyond type checkers, via Slice & Run [preprint]

Justus Adam, Stephen Kell in 11th Workshop on Tools for Automatic Program Analysis
Show abstract:
Type checkers are the most commonly used form of static analysis, but their design is coupled to the rest of the language, making it hard or impossible to bring new kinds of reasoning to existing unmodified code.

We propose a novel approach to checking advanced type invariants and properties in unmodified source code, while retaining the speed and ease of simple syntax directed type checkers. The insight is that by combining a relatively deep program analysis (symbolic execution) with an unusual choice of program abstraction (based on program slicing), it appears possible to reconstitute type-checking as a special case of a more general analysis, which the abstraction reduces mostly to 'type-level' reasoning, largely discarding the 'value-level' work of the program.

We implement a first simple prototype that demonstrates this idea by checking the safety of generic pointers in C, potentially bringing benefits roughly comparable with Milner's polymorphism.
2019-12-12: Ohua-powered, Semi-transparent UDF's in the Noria Database [pdf] [extended abstract]

Masters Thesis Justus Adam at Technische Universität Dresden
Show abstract:
The Structured Query Language (SQL) is the de facto standard for database programming. SQL is a declarative language, letting programmers write short queries easily and allowing for aggressive performance optimizations.

However this style diverges greatly from the widely used sequential programming style. This makes complex queries hard to write and understand [21].

SQL also does not allow for the use of state. Algorithms that use state to achieve efficiency are impossible to implement [12]. State can only be used in so called User Defined Functions (UDF) which come with a significant performance penalty, because the optimizations are no longer available.

In this work we improve upon the integration of imperative User Defined Functions. We use a the parallelizing compiler Ohua to generate query fragments from the UDF's. These fragments run like native SQL queries and can benefit from database optimizations, such as parallelizing.

The fragments contain operators generated by the compiler, which are state enabled. This allows us to express efficient stateful queries, as in [12].
2019-07-23: STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism

Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon in Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell
Show abstract:
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.

Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.

In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.
2018-02-24: Supporting Fine-grained Dataflow Parallelism in Big Data Systems.

Sebastian Ertel, Justus Adam, Jeronimo Castrillon in PMAM@ PPoPP
Show abstract:
Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.
2018-02-24: Compiling for concise code and efficient I/O

Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon in Proceedings of the 27th International Conference on Compiler Construction
Show abstract:
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.
2018: Level Graphs

Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon
Show abstract:
Benchmarks are needed in order to test compiler and language-based approaches to optimize concurrency. These have tobe varied, yield reproducible results and allow comparisonbetween different approaches. In this paper, we propose aframework for generating synthetic benchmarks that aims atattaining these goals. Based on generating code from randomgraphs, our framework operates at a high level of abstraction.We test our benchmarking framework with a usecase, wherewe compare three state-of-the-art systems to optimize I/Oconcurrency in microservice-based software architectures.We show how using our benchmarks we can reliably comparebetween approaches, and even between the same approachusing different coding styles.
2017-05-09: Control Flow and Side Effects support in a Framework for automatic I/O batching

Bachelors Thesis Justus Adam at Technische Universität Dresden
Show abstract:
The largest source of latency for many modern systems, particularly services, is the inherent latency of I/O operations. Disk I/O, database access and network bound communication is frequent, especially in data processing systems and distributed service infrastructures. Making I/O efficient is cumbersome, involves asynchronous programming and caching and hinders modularity when attempting to perform batch jobs. There have been a few solutions proposed to automate this, first and foremost Haxl, a Haskell library. We proposed an alternate, improved implementation called Ÿauhau. It leverages dataflow to produce even better performance while being independent from any particular code writing style in a minimal Embedded Domain-Specific Language (EDSL). In this thesis we first explore the techniques used by Ÿauhau to make I/O more efficient. Secondly we identify, explain and solve issues in Ÿauhau arising from control flow. Furthermore we suggest new optimisations for the compiler and implement safety guards for handling side effects.