Plotting with Python is a nightmare. At least that’s what I thought after almost a decade of plotting experience with R. In R, ggplot2 is the undefeated champion of plotting.

I learned ggplot2 syntax in 2014 and it is beautiful. Like building a tower, in ggplot2 one builds layer on layer until the plot is done. All in one expression. And ggplot2 has sane defaults: You add a color to the data, and you get a legend. If you use a shape, you get a legend.

Matplotlib and seaborn might be the Python plotting libraries of your choice, but in my opinion they can't hold a candle to ggplot2. Their syntax is complicated, everything is cumbersome, and they simply do not behave as I want them to. I really could not get used to them.

So I was very happy to learn that I am very much not alone: Plotnine is a project bringing the ggplot2 syntax to Python. What a blessing.

So today I thought I would use plotnine to style a plot and show how it is used.

Plotnine: The Basics

After installing plotnine you have a choice: really commit to the bit and import the features you need directly, or only import plotnine as a module:

from plotnine import ggplot

Or

import plotnine

I prefer importing plotnine under an alias, which I think keeps the code clean and avoids name collisions:

import plotnine as p9

And that's what I will use for this blog post. I just wanted to make you aware that we do have a choice.

Plotnine integrates tightly with pandas and expects the data to be a pandas DataFrame or something exposing a .to_pandas() method, like a Polars DataFrame.

A simple plot

To dive into plotnine I chose to recreate the style of panel C from Figure 1 of the AlphaFold3 paper (Abramson et al. (2024), www.nature.com/articles/s41586-024-07487-w). As the goal is not to recreate the analysis, I will invent data that roughly matches the figure, simply for the purpose of this blog post.

Figure: AlphaFold3, Figure 1c

I created a new conda environment and installed plotnine and polars:

mamba create -n post_plotnine python==3.12 uv
mamba activate post_plotnine
uv pip install polars==1.27.1 plotnine==0.14.5 pyarrow==19.0.1

Then to simulate some data:

import polars as pl
import plotnine as p9
import numpy as np

def simulate_results(median: int, variance: int, n: int) -> np.ndarray:
    return np.random.normal(loc=median, scale=np.sqrt(variance), size=n)
    
ligands_df_summary = pl.DataFrame({
    "model": ["AF3", "AutoDock", "RoseTTAFold"],
    "median": [78, 52, 43],
    "variance": [10, 20, 30],
    "n": [428, 428, 427]
})


ligands_simulated: list[pl.DataFrame] = []
for row in ligands_df_summary.iter_rows(named=True):
    median = row["median"]
    variance = row["variance"]
    n = row["n"]
    simulated_data = simulate_results(median, variance, n)
    
    ligands_simulated.append(pl.DataFrame({
        "model": row["model"],
        "simulated_data": simulated_data
    }))
ligands_simulated_df = pl.concat(ligands_simulated)
ligands_simulated_df.head()
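The simulation above draws fresh random numbers on every run, so the plots will differ slightly each time. If you want repeatable numbers, a seeded NumPy generator is one option. This is a sketch and not what I used for the plots in this post; simulate_results_seeded and its seed parameter are names I made up here:

```python
import numpy as np

def simulate_results_seeded(mean: float, variance: float, n: int, seed: int = 42) -> np.ndarray:
    """Draw n normally distributed values; the same seed gives the same draws."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=np.sqrt(variance), size=n)

# Two calls with identical arguments return identical arrays
first = simulate_results_seeded(78, 10, 428)
second = simulate_results_seeded(78, 10, 428)
```

Note that reusing one seed for every model would make the three samples shifted and scaled copies of the same draws, so in the loop I would pass a different seed per model.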

Now that the data has been simulated, I can get started on the plotting. Let's start with a plot using nothing but the default settings of plotnine. I chose a boxplot, as it comes closest to representing the data the authors want to show:

(
    p9.ggplot(ligands_simulated_df, p9.aes(x="model",y = "simulated_data", fill="model")) 
    + p9.geom_boxplot()
)


That was easy. I hope you can appreciate how many things plotnine did for us. We have axis labels, we have a legend. We have a decent color scheme. But besides that, the plot is missing a few key things. To name the obvious:

  • The colors are all wrong
  • We don't have the n values
  • We are missing the title

To get the style to match, we will need to dive into theming. I recommend having a look at the documentation of ggplot2 or plotnine for the theme options. For example here: plotnine.org/reference/theme.html

In the following code section I have done a lot of things. I will not dive into the details, as I think reading the code and playing with it will be the best teacher.

After the code I discuss a few points as a conclusion:

model_colors = {
    "AF3": "#9cdcff",
    "AutoDock": "#c3c3c3",
    "RoseTTAFold": "#c3c3c3",
}

significance_data = pl.DataFrame({
    "y": [90, 70],
    "x": [1, 2],
    "xend": [2, 3],
    "label": ["***", "**"],
}).with_columns((pl.col("xend")+(pl.col("x")-pl.col("xend"))/2).alias("x_center"))

(
    p9.ggplot(ligands_simulated_df.group_by("model").agg(pl.col("simulated_data").mean().alias("mean")), 
              p9.aes(x="model",y = "mean", fill="model")) 
    + p9.geom_col(width=0.7)
    + p9.scale_fill_manual(values=model_colors)
    + p9.labs(
        title = "Ligands PoseBusters set",
        x = "",
        y = "Success (%)",
    )
    + p9.geom_errorbar(
        mapping=p9.aes(x="model", y="mean", ymin="ymin", ymax="ymax"),
        data = (ligands_simulated_df
                .group_by("model")
                .agg(pl.len().alias("n"),
                     pl.col("simulated_data").mean().alias("mean"),
                     pl.col("simulated_data").std().alias("sd"),
                     )
                .with_columns([
                    (pl.col("mean") - 1.96 * pl.col("sd")).alias("ymin"),
                    (pl.col("mean") + 1.96 * pl.col("sd")).alias("ymax")]
                )
        
        ),
        width=0.001,
        size=0.6,
        position=p9.position_dodge(0.9),
    )
    + p9.scale_y_continuous(
        breaks=[0,20,40,60,80,100],
        limits=(0,100),
        expand=(0,0),
    )
     + p9.scale_x_discrete(
        breaks=["AF3", "AutoDock", "RoseTTAFold"],
        labels=["AF3\n2019 cut-off\n" + r"$\it{n}$ = " + ligands_df_summary.filter(pl.col("model") == "AF3").get_column("n").to_numpy()[0].astype(str),    
                "AutoDock\nVina\n" + r"$\it{n}$ = " + ligands_df_summary.filter(pl.col("model") == "AutoDock").get_column("n").to_numpy()[0].astype(str),    
                "RoseTTAFold\nAll-Atom\n" + r"$\it{n}$ = " + ligands_df_summary.filter(pl.col("model") == "RoseTTAFold").get_column("n").to_numpy()[0].astype(str),   
        ],
        expand=(0.04, 0.04))
    + p9.geom_segment(
        data=significance_data,
        mapping=p9.aes(x="x", y="y", xend="xend", yend="y"),
        inherit_aes=False,
        color="black",
        size=0.5,
    )
    + p9.geom_text(
        data=significance_data,
        mapping=p9.aes(x="x_center", y="y", label="label"),
        inherit_aes=False,
        nudge_y=1.5,
        size=11,
        color="black",
    )
    + p9.theme_bw()
    + p9.theme(
            panel_grid=p9.element_blank(),          # Remove any grid lines
            panel_border=p9.element_blank(),        # Remove the border around the plot
            axis_line_x=p9.element_line(color="black"),
            axis_line_y=p9.element_line(color="black"),
            axis_text=p9.element_text(linespacing=1.2,
                                      color="black",
                                      size=11),
            axis_ticks_pad_major_x=5,
            axis_ticks_pad_major_y=1,
            axis_ticks_length=7,
            legend_key_size=8,
            legend_title=p9.element_blank(),
            legend_position="inside",
            legend_position_inside=(1,1),           # Set to (0,1) for top left
            legend_text=p9.element_text(size=11),
            figure_size=(3.9, 4),
            dpi=100,
            plot_title=p9.element_text(size=11, ha="center"),
            
    )
)


The recreated plot comes very close to the style shown in the paper. A few details are off. The x-axis labels in the original plot are wider than the x-axis. This is not the case for my version. Maybe with a bit more work, this would be possible.
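On the subject of the x-axis labels: the three labels in my code follow one pattern, so they could also be built by a small helper instead of three long lines. This is a sketch, not the code I ran; make_label and the two dictionaries are names I made up here, and the n values are copied from the summary table above:

```python
def make_label(model: str, subtitle: str, n: int) -> str:
    """Build a three-line axis label: model, subtitle, and an italic n via mathtext."""
    return f"{model}\n{subtitle}\n" + r"$\it{n}$ = " + str(n)

# Second label line and sample size per model, taken from the plot code and summary table
subtitles = {"AF3": "2019 cut-off", "AutoDock": "Vina", "RoseTTAFold": "All-Atom"}
n_values = {"AF3": 428, "AutoDock": 428, "RoseTTAFold": 427}

labels = [make_label(model, subtitles[model], n_values[model]) for model in subtitles]
```

The resulting list could be passed as the labels argument of p9.scale_x_discrete, with list(subtitles) as the breaks.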

My error bars also show completely the wrong data and are more of a placeholder. But they are built in such a way that other data sources could be plugged in if they are available.
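If real per-sample values were available, one common choice for the error bars would be a 95% confidence interval of the mean, which uses the standard error (sd divided by the square root of n) rather than the raw sd I used in my placeholder. A minimal sketch with made-up numbers; mean_ci95 is a hypothetical helper, and the 1.96 factor assumes a normal approximation:

```python
import math
import statistics

def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) of a normal-approximation 95% CI of the mean."""
    mean = statistics.fmean(values)
    # Standard error of the mean: sample sd divided by sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem

# Invented values, for illustration only
mean, ymin, ymax = mean_ci95([74.0, 78.0, 80.0, 76.0, 82.0])
```

The three returned values map onto the y, ymin and ymax columns that geom_errorbar reads in the plot code.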

Overall I am very happy with how the final plot looks and how I got there. I hope this post showed that with a bit of theming and a vision, plotnine can be a powerful plotting tool that can produce publication-grade plots.

In a further blog post I hope to find out whether plotnine can also produce publication-grade figures: can we build panels and add panel labels, and how flexible is it?