I got a lot of positive feedback about the last blog post, where I showed that CLIP embeddings are a cool tool for image classification. So I thought I'd share how you can use that knowledge to actually sort your images into your own categories on your own machine.
In this post, I share the script I currently use to train my own classifier and then sort images with it. While this approach works well for me, it does not fine-tune the underlying model, which would likely result in even higher accuracy.
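To make the approach a bit more concrete before we get to the script: the heavy lifting is done by frozen CLIP embeddings, with a small classifier trained on top. As a minimal sketch (not the actual script; it assumes the transformers, torch, Pillow and scikit-learn packages, and the file names are placeholders), the core idea looks like this:
# Minimal sketch: train a small classifier on frozen CLIP image embeddings.
# The CLIP model itself is never fine-tuned; only the classifier on top learns.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    """Return the CLIP image embedding for a single image."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)[0]

# Placeholder training data: (image path, class label) pairs.
train = [("cars/01.jpg", "cars"), ("horses/01.jpg", "horses")]
X = torch.stack([embed(path) for path, _ in train]).numpy()
y = [label for _, label in train]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(embed("some_photo.jpg").numpy().reshape(1, -1)))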
Just the script
If you only want the script, you can find it here: github.com/openpaul/photosort
Sharing Python code
Sharing Python code has always come with challenges the moment external dependencies are involved. I will not go over all of these issues and the options for solving them. For this blog post, I only highlight the most recent and most helpful solution: the new Python environment manager uv.
In short: uv is a fast Python environment manager, akin to pyenv or conda. I have been using it for several months now and can comfortably say: if all you need is Python and dependencies, uv is the best way to handle them. It’s amazing. Fast, easy to use, and it just works.
If you have non-Python dependencies, you might still need to use conda. But for this project I suggest you obtain a copy of uv: github.com/astral-sh/uv.
The reason I recommend getting uv is the PEP 723 standard, which allows inline dependencies. Others have written extensively about this, and I recommend you read up on it here: thisdavej.com/share-python-scripts-like-a-pro-uv-and-pep-723-for-easy-deployment/
In short, this lets you share a single file that declares its own dependencies; uv will execute it, create a virtual environment, and you won’t have to worry about it any more. So nice.
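To give you an idea of what that looks like: the metadata lives in a comment block at the top of the Python file. The dependencies listed here are just an example, the ones in photo_sort.py differ:
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "torch",
#     "transformers",
#     "pillow",
# ]
# ///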
To run the script (and install all dependencies), simply install uv and then execute:
mv photo_sort.py $HOME/.local/bin/photosort
chmod +x $HOME/.local/bin/photosort
photosort -h
Thanks to the first line of the script:
#!/usr/bin/env -S uv run --script
the script is executed with uv, which in turn installs all dependencies.
Dependencies
PyTorch
As I am using ROCm on an AMD GPU, you might need to adjust the torch dependency to match your machine. For that, I recommend reading docs.astral.sh/uv/guides/integration/pytorch/.
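Heavily paraphrasing that guide rather than quoting the script: uv lets you point the torch dependency at a hardware-specific index from within the inline metadata, roughly like this. The index name and ROCm version here are just examples; check the guide for the exact syntax and the right index for your GPU:
# /// script
# dependencies = ["torch"]
#
# [[tool.uv.index]]
# name = "pytorch-rocm"
# url = "https://download.pytorch.org/whl/rocm6.2"
# explicit = true
#
# [tool.uv.sources]
# torch = { index = "pytorch-rocm" }
# ///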
FFmpeg
The script also calls FFmpeg, which is not a Python dependency; you will need to install it separately if you want to classify videos as well.
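In case you are curious how the video part works: the script essentially asks FFmpeg for a handful of stills and classifies those (more on the voting in the conclusion). A hedged sketch of such an FFmpeg call, not the script's exact invocation, could look like this:
# Sketch: extract one frame per second from a video via the ffmpeg CLI.
# Assumes ffmpeg is on PATH; the output pattern and frame rate are illustrative.
import subprocess
from pathlib import Path

def extract_stills(video: str, out_dir: str, fps: int = 1) -> list[Path]:
    """Dump stills from `video` into `out_dir` and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", video, "-vf", f"fps={fps}", str(out / "frame_%04d.jpg")],
        check=True,
        capture_output=True,
    )
    return sorted(out.glob("frame_*.jpg"))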
Training
For training the classifier, the script expects examples for each class sorted into a folder structure where the folder name becomes the class label:
data
├── cars
├── horses
├── class_3
├── ...
└── class_N
You can name those folders however you want. I suggest having at least 50 images for each class. If there is an imbalance between classes, the training will upsample underrepresented classes to correct for it.
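Under the hood, a layout like this maps neatly onto (image, label) pairs, and the imbalance correction can be as simple as repeating images from the smaller classes. A rough sketch of that idea, not the script's actual code, looks like this:
# Sketch: turn a folder-per-class layout into (path, label) pairs and
# upsample smaller classes by repeating images. Extensions are illustrative.
import random
from collections import defaultdict
from pathlib import Path

def load_dataset(root: str) -> list[tuple[Path, str]]:
    by_class: dict[str, list[Path]] = defaultdict(list)
    for class_dir in Path(root).iterdir():
        if class_dir.is_dir():
            by_class[class_dir.name] = [
                p for p in class_dir.iterdir()
                if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
            ]

    # Repeat images (with replacement) until every class matches the largest one.
    largest = max(len(paths) for paths in by_class.values())
    samples: list[tuple[Path, str]] = []
    for label, paths in by_class.items():
        extra = random.choices(paths, k=largest - len(paths)) if paths else []
        samples.extend((p, label) for p in paths + extra)
    return samples

data = load_dataset("./data")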
The training can then be started by simply running:
photosort -v train ./data
This will launch a full training run. Embeddings and models are stored in your home folder under .local/share/imageclassifier. To fully “uninstall” the script later, remove that folder and run uv cache clean to remove all environments.
Classification
Now that the model is trained, it is easy to classify a folder of images. The script will try to create this folder structure:
2025
└── 01
├── cars
├── horses
├── class_3
└── class_N
Images are then moved or copied into the folders.
If that is not what you want, you can simply adjust the script to do what you need it to do.
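If you do want to tinker: the core of that routing is just building a year/month/class path and moving (or copying) the file there. A simplified sketch, leaving out the checks a real script needs, might be:
# Sketch: move a classified image into <output>/<year>/<month>/<label>/.
# The date and label would come from the EXIF parsing and the classifier.
import shutil
from datetime import datetime
from pathlib import Path

def sort_image(image: Path, output: Path, label: str, taken: datetime, move: bool = True) -> Path:
    target_dir = output / f"{taken.year:04d}" / f"{taken.month:02d}" / label
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / image.name
    if move:
        shutil.move(str(image), target)
    else:
        shutil.copy2(image, target)
    return target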
The classifier is run by:
photosort -v classify -i "./input_folder" -o "./output_folder" --move
See help for more options.
Automation
As my images are constantly uploaded from my phone to my server, I have a cronjob running this script. That makes the experience pretty seamless, and my photos are always sorted the way I like them:
*/14 * * * * flock -n /tmp/photosort.lock photosort -vv classify -m classic -i "./input_folder" -o "./output_folder" --move >> $HOME/.classify.log 2>> $HOME/.classify_error.log
Using flock ensures that only one sorting run is active at a time, avoiding conflicts.
Conclusion
There are many things about this script that are quite ad hoc. For example, the video classification is a majority vote over extracted stills, which works well for me but is not part of the actual training of the model. So there is certainly a better way of handling that.
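The voting itself really is as simple as it sounds: classify a few stills and take the most common label. In sketch form (classify_frame stands in for the real classifier):
# Sketch: majority vote over per-frame predictions; ties fall to whichever
# label the Counter happens to return first.
from collections import Counter
from pathlib import Path
from typing import Callable

def classify_video(frames: list[Path], classify_frame: Callable[[Path], str]) -> str:
    votes = Counter(classify_frame(frame) for frame in frames)
    label, _count = votes.most_common(1)[0]
    return label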
Also, I extract the date information from EXIF and, as a fallback, from the filename. For my cameras those rules work, but I expect they are not perfect for everyone.
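As a rough illustration of that fallback logic (not the script's exact rules): reading the EXIF date with Pillow and falling back to a date-like pattern in the filename can look like this:
# Sketch: get a photo's date from EXIF, falling back to a YYYYMMDD pattern
# in the filename. The tag and regex are common conventions, not the script's rules.
import re
from datetime import datetime
from pathlib import Path
from PIL import Image

def photo_date(path: Path) -> datetime | None:
    stamp = Image.open(path).getexif().get(306)  # tag 306 = DateTime, "YYYY:MM:DD HH:MM:SS"
    if stamp:
        try:
            return datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
        except ValueError:
            pass
    match = re.search(r"(\d{4})(\d{2})(\d{2})", path.name)
    if match:
        year, month, day = (int(g) for g in match.groups())
        return datetime(year, month, day)
    return None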