Hi

I’m a computational biologist with a passion for biology and nature. My interests lie at the intersection of genomics and data science, and I’m excited to share some of my projects here. I write about things I learn and find interesting. This blog is my way of getting some hands-on time with a wider range of topics.

Immich - ML supported tagging plugin

For the last decade or so, I have used Seafile, ownCloud and then Nextcloud to self-host my data on a small home server. This has worked wonderfully, and I have nothing but respect for the community that built these powerful tools. But one thing that never worked as smoothly as I wanted was the photo upload from my smartphone to Nextcloud. The upload works, and it rarely fails, but it’s never instant....

December 12, 2025 · 8 min

MHC and Viruses - Molecular Mimicry

I saw this article, “Molecular mimicry as a mechanism of viral immune evasion and autoimmunity”, and was immediately interested in reproducing Figure 1b. In it, the authors investigate peptide similarity between viruses and the human proteome. They say certain viruses might have adjusted their peptide use to match peptides found in the human proteome, so that they can evade MHC recognition. This is a biologically cool mechanism, and the method they used is rather simple....

November 30, 2025 · 19 min

Face Detection with Python

In the past I have explored what I can do with image embeddings and used them to train a very usable set of classifiers that sort out random photos and nature photos from my camera roll. If you want to read about that, you can find the blog post here: openpaul.github.io/posts/2025-04-06-image-sorting and a small intro to embeddings here: openpaul.github.io/posts/2024-09-28-image-embeddings/ Recently I became interested in detecting faces and identifying people in my photos locally....

October 19, 2025 · 19 min

Python et al. - Getting to a scientific plot on a new machine

From time to time you and I are lucky enough to start fresh. A new MacBook, a new Linux laptop or maybe a new server? And of course, we need to quickly get it up and running to create lovely plots. With Anaconda, Conda, Mamba, uv, Python, virtualenvs and more it can get confusing quickly. While all of this will change over time, today I want to disentangle the status quo as of summer 2025 and maybe create a bit of order in this chaos....

June 12, 2025 · 9 min

Nextflow and nf-core

When analysing data, especially when analysing complicated genomics data, one quickly learns to appreciate the benefits of well-written workflows. In the past, I have developed my own bash, Snakemake and Nextflow pipelines. But since then some people from the bioinformatics community have put in enormous effort to create general standardized pipelines that anyone can use. For Snakemake this effort is called workflow catalogue and for Nextflow it is called nf-core....

May 21, 2025 · 5 min

Composing with Plotnine

Composing plots with plotnine has just become possible. Well, not quite yet. As of the writing of this blog post, the latest development version of plotnine is plotnine==v0.15.0a1. And there is a discussion issue open where the feature has been teased: https://github.com/has2k1/plotnine/discussions/929 Copying the example from that issue, we can reproduce the tiling mentioned in the post: from plotnine import * from plotnine.data import mtcars p1 = ggplot(mtcars) + geom_point(aes("wt", "mpg")) + labs(tag="a)") p2 = ( ggplot(mtcars) + geom_boxplot(aes("wt", "disp", group="gear")) + labs(tag="b)") ) p3 = ggplot(mtcars) + geom_smooth(aes("disp", "qsec")) + labs(tag="c)") p4 = ggplot(mtcars) + geom_bar(aes("carb")) (p1 | p2 | p3) / p4 /home/paul/miniforge3/envs/post_plotnine/lib/python3....

May 17, 2025 · 5 min

Plotting With Python

Plotting with Python is a nightmare. At least that’s what I thought after almost a decade of plotting experience with R. In R, ggplot2 is the undefeated champion of plotting. I learned ggplot2 syntax in 2014 and it is beautiful. Like building a tower, in ggplot2 one builds layer on layer until the plot is done. All in one expression. And ggplot2 has sane defaults: you add a color to the data, and you get a legend....

April 17, 2025 · 5 min

Sorting Images with ML

I got a lot of positive feedback about the last blog post, where I showed that CLIP embeddings are a cool tool for image classification. So I thought I would share how you can use that knowledge to actually sort your images into your own categories on your own machine. In this post, I share the script that I currently use to train my own classifier and then sort images based on it....

April 6, 2025 · 4 min

Playing With Embeddings

Embeddings are very cool. Today I would like to share how I used embeddings and classical machine learning to bring order into my picture library. I, like many other people, use my phone camera quite liberally. I take pictures I want to keep, and I send pictures of price tags to friends and family for comparison. I document successful recipes and take pictures of documents as a digital copy to file away....

September 28, 2024 · 12 min

Processing Single-Cell data from Mouse

Single-cell RNA sequencing (scRNA-Seq) is a fascinating way of getting insights into the molecular processes guiding an individual cell. While RNA won’t provide the full picture of the inner workings of a cell (proteins, hormones and nutrients will have a say in that too), it certainly is a part of the puzzle. For quite some time, researchers have had the ability to look not only at one cell, but at hundreds or thousands of cells....

September 23, 2024 · 14 min