Parallel processing at a very large scale, with OCaml.

14/4/2023

Speaker

Louis Roché

Abstract

Ahrefs crawls the entire web 24/7 (much like search engines do), storing petabytes of information about live websites: how they link to each other and what keywords they rank for in search results. With 10 years of data now accumulated, extracting useful information to produce meaningful reports is a challenge. We will present how we leverage close to 3000 servers, 200PB of SSDs, 2.5PB of RAM, and simple technical solutions within the constraints of a small engineering team. We will also explain why we committed to OCaml as our main tool.

Bio

Louis has been working as an OCaml developer for 8 years. He works on a variety of projects in the data processing team, as well as on tooling, mentoring, and open source.