See also my Google Scholar user page.

Type checking beyond type checkers, via Slice & Run [preprint]

Justus Adam, Stephen Kell in 11th Workshop on Tools for Automatic Program Analysis 2020-11-19

Show abstract:

Type checkers are the most commonly used form of static analysis, but their design is coupled to the rest of the language, making it hard or impossible to bring new kinds of reasoning to existing unmodified code.

We propose a novel approach to checking advanced type invariants and properties in unmodified source code, while retaining the speed and ease of simple syntax directed type checkers. The insight is that by combining a relatively deep program analysis (symbolic execution) with an unusual choice of program abstraction (based on program slicing), it appears possible to reconstitute type-checking as a special case of a more general analysis, which the abstraction reduces mostly to ‘type-level’ reasoning, largely discarding the ‘value-level’ work of the program.

We implement a first simple prototype that demonstrates this idea by checking the safety of generic pointers in C, potentially bringing benefits roughly comparable with Milner’s polymorphism.

Ohua-powered, Semi-transparent UDF’s in the Noria Database [pdf]

Masters Thesis Justus Adam at Technische Universität Dresden 2019-12-12

2-page Extended Abstract as short introduction

Show abstract:

The Structured Query Language (SQL) is the de facto standard for database programming. SQL is a declarative language, letting programmers write short queries easily and allowing for aggressive performance optimizations.

However this style diverges greatly from the widely used sequential programming style. This makes complex queries hard to write and understand [21].

SQL also does not allow for the use of state. Algorithms that use state to achieve efficiency are impossible to implement [12]. State can only be used in so called User Defined Functions (UDF) which come with a significant performance penalty, because the optimizations are no longer available.

In this work we improve upon the integration of imperative User Defined Functions. We use a the parallelizing compiler Ohua to generate query fragments from the UDF’s. These fragments run like native SQL queries and can benefit from database optimizations, such as parallelizing.

The fragments contain operators generated by the compiler, which are state enabled. This allows us to express efficient stateful queries, as in [12].

STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism

Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon in Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell 2019-07-23

Show abstract:

Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.

Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.

In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.

Supporting Fine-grained Dataflow Parallelism in Big Data Systems.

Sebastian Ertel, Justus Adam, Jeronimo Castrillon in PMAM@ PPoPP 2018-02-24

Show abstract:

Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.

Compiling for concise code and efficient I/O

Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon in Proceedings of the 27th International Conference on Compiler Construction 2018-02-24

Show abstract:

Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.

Level Graphs

Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon 2018

Show abstract:

Benchmarks are needed in order to test compiler and language-based approaches to optimize concurrency. These have tobe varied, yield reproducible results and allow comparisonbetween different approaches. In this paper, we propose aframework for generating synthetic benchmarks that aims atattaining these goals. Based on generating code from randomgraphs, our framework operates at a high level of abstraction.We test our benchmarking framework with a usecase, wherewe compare three state-of-the-art systems to optimize I/Oconcurrency in microservice-based software architectures.We show how using our benchmarks we can reliably comparebetween approaches, and even between the same approachusing different coding styles.

Control Flow and Side Effects support in a Framework for automatic I/O batching

Bachelors Thesis Justus Adam at Technische Universität Dresden 2017-05-09

Show abstract:

The largest source of latency for many modern systems, particularly services, is the inherent latency of I/O operations. Disk I/O, database access and network bound communication is frequent, especially in data processing systems and distributed service infrastructures. Making I/O efficient is cumbersome, involves asynchronous programming and caching and hinders modularity when attempting to perform batch jobs. There have been a few solutions proposed to automate this, first and foremost Haxl, a Haskell library. We proposed an alternate, improved implementation called Ÿauhau. It leverages dataflow to produce even better performance while being independent from any particular code writing style in a minimal Embedded Domain-Specific Language (EDSL). In this thesis we first explore the techniques used by Ÿauhau to make I/O more efficient. Secondly we identify, explain and solve issues in Ÿauhau arising from control flow. Furthermore we suggest new optimisations for the compiler and implement safety guards for handling side effects.