-
📝 Workshop on medical image segmentation
Teaching university students about medical image segmentation in Python
Alexandre Calado and I had the opportunity to teach a short course on medical image segmentation at a small student conference organised by the Núcleo de Estudantes do Departamento de Física at Universidade de Coimbra.
It was a great opportunity to brush up on what we know and how we can best communicate it. Since it was meant to be a hands-on workshop, I figured a small Kaggle notebook would be the perfect way to get everyone’s hands dirty with a club classic - the Medical Segmentation Decathlon. So I coded a small tutorial on medical image segmentation. It’s released under the Apache License 2.0, so feel free to use it with the appropriate attribution 🤗
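To give a flavour of the kind of thing the notebook walks through, here is a minimal sketch (not the actual tutorial code) that loads one Medical Segmentation Decathlon case with nibabel and scores a naive threshold mask against the ground truth - the task folder and file names are illustrative:

```python
# A minimal sketch, not the workshop notebook itself.
# Assumes an MSD-style task folder; paths are illustrative.
import nibabel as nib
import numpy as np

image = nib.load("Task01_BrainTumour/imagesTr/BRATS_001.nii.gz")
label = nib.load("Task01_BrainTumour/labelsTr/BRATS_001.nii.gz")

volume = image.get_fdata()        # MSD Task01 is 4D: (H, W, D, modalities)
ground_truth = label.get_fdata()  # 3D integer mask

# Naive baseline: threshold the first modality at its 95th percentile.
channel = volume[..., 0]
prediction = channel > np.percentile(channel, 95)

# Dice coefficient against the binarised ground truth.
gt = ground_truth > 0
dice = 2 * np.logical_and(prediction, gt).sum() / (prediction.sum() + gt.sum())
print(f"Dice: {dice:.3f}")
```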
-
📖 Implementing academ.ai, a local retrieval system for academic papers
Semantic and hybrid retrieval for academic papers - search across >100,000 papers and find the relevant ones without losing your mind to keyword-based search.
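For a taste of what “hybrid” means here, a minimal sketch (not the academ.ai code itself) that blends BM25 keyword scores with dense embedding similarity - the model name and the 50/50 score blend are illustrative assumptions:

```python
# A minimal sketch of hybrid retrieval, not the academ.ai code.
# Model name and the 50/50 score blend are illustrative assumptions.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

papers = [
    "Radiomics reproducibility in CT imaging",
    "Transformers for long-document retrieval",
    "U-Net variants for medical image segmentation",
]

# Sparse side: BM25 over whitespace tokens.
bm25 = BM25Okapi([p.lower().split() for p in papers])

# Dense side: normalised sentence embeddings, so dot product = cosine.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(papers, normalize_embeddings=True)

def hybrid_search(query, alpha=0.5):
    sparse = np.asarray(bm25.get_scores(query.lower().split()))
    sparse = sparse / (sparse.max() + 1e-9)  # rescale to roughly [0, 1]
    dense = doc_emb @ model.encode([query], normalize_embeddings=True)[0]
    scores = alpha * sparse + (1 - alpha) * dense
    return sorted(zip(scores, papers), reverse=True)

for score, title in hybrid_search("segmenting tumours in MRI scans"):
    print(f"{score:.3f}  {title}")
```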
Curious? Read more
-
📖 LLMs can't innovate
They sure can write gooder than me. But is innovation really their strongest suit? And most importantly - why does that matter?
Curious? Read more
-
📝 Auto-METRICS - a proof of concept
Automatic assessment of methodological quality in radiomics research
Recently, I received my first (obviously) LLM-written peer review. It was quite blatant. What’s worse: it wasn’t good - at all! Funnily enough, I had been working on something related: Auto-METRICS, a tool for the automatic, standardised assessment of scientific quality in radiomics research using the METRICS framework.
To show its utility, we make use of two unique, recent datasets on reproducibility in radiomics studies - Akinci D’Antonoli et al. (2025) and Kocak (2025). Together, they feature a really strong set of METRICS ratings - from raters with different levels of expertise and training - for more than 50 publications. This allowed us to systematically compare human and LLM raters.
The main takeaways:
- Human raters agree with LLMs at about the same rate as they agree with other human raters (see the toy sketch after this list) ✅
- Prompt iterations: clarifying the radiomics guidelines can lead to better agreement with human raters. However, these improvements were quite limited! 📈
- Too nice: LLM ratings tended to be slightly higher than those given by human raters 😇
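To make the first takeaway concrete, here is a toy sketch of how such a comparison can be run - Cohen’s kappa over dummy per-item ratings, not the study data:

```python
# Toy illustration with dummy ratings (NOT the study data):
# compare human-human vs human-LLM inter-rater agreement.
from sklearn.metrics import cohen_kappa_score

human_a = [1, 0, 1, 1, 0, 1, 0, 1]  # binary per-item METRICS ratings
human_b = [1, 0, 1, 0, 0, 1, 0, 1]
llm     = [1, 0, 1, 1, 0, 1, 1, 1]

print("human-human kappa:", cohen_kappa_score(human_a, human_b))
print("human-LLM kappa:  ", cohen_kappa_score(human_a, llm))
```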
I tested our tool - Auto-METRICS - here (all you need is a free Google Gemini API key) and found it really helpful for getting an initial METRICS assessment that I could then easily confirm. The key? Enhance, don’t replace - having good initial ratings was super helpful in reaching a final, human-made classification.
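For the curious, a hypothetical sketch of the kind of Gemini call such a tool rests on - the model name, prompt wording, and input file are illustrative assumptions, not the actual Auto-METRICS internals:

```python
# A hypothetical sketch of the kind of call such a tool relies on,
# NOT the actual Auto-METRICS internals. Model name, prompt wording
# and input file are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_FREE_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

paper_text = open("radiomics_paper.txt").read()  # illustrative input

prompt = (
    "You are assessing a radiomics study against one METRICS item.\n"
    "Item: Is the image acquisition protocol adequately described?\n"
    "Answer 'yes' or 'no' with a one-sentence justification.\n\n"
    f"Paper:\n{paper_text}"
)

response = model.generate_content(prompt)
print(response.text)  # an initial rating for a human rater to confirm
```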
Curious? Read more about Auto-METRICS at medRxiv.