flow

notes 📝 and articles, here 📖 and elsewhere 🔗

📖 Thinking critically about teaching critical thinking

Is a pedagogy of critical thinking possible? I tried to find out

Curious? Read more

Jan 9, 2026

•

science
📝 Workshop on medical image segmentation

teaching university students about medical image segmentation in Python

Alexandre Calado and I had the opportunity to teach a short course on medical image segmentation at a small student conference organised by the Núcleo de Estudantes do Departamento de Física at Universidade de Coimbra.

It was a great opportunity to brush up on what we know and how we can best communicate it. Given that it was supposed to be a hands-on workshop, I thought that a small Kaggle notebook could be a great way of getting everyone’s hands dirty with a club classic - the Medical Segmentation Decathlon. So I coded a small tutorial on medical image segmentation. It’s released under the Apache License 2.0, so feel free to use it with the appropriate attribution 🤗

Nov 27, 2025

•

teaching
📖 Implementing academ.ai, a local retrieval system for academic papers

Semantic and hybrid retrieval for academic papers - look across >100,000 papers and find the relevant ones without losing your mind with keyword-based search.

Curious? Read more

Sep 16, 2025

•

tech apps
📖 LLMs can't innovate

They sure can write gooder than me. But is innovation really their strongest suit? And most importantly - why does that matter?

Curious? Read more

Jun 17, 2025

•

tech
📝 Auto-METRICS - a proof of concept

Automatic assessment of methodological quality in radiomics research

Recently, I received my first (obvious) LLM peer review. It was quite blatant. What’s worse: it wasn’t good - at all! Funnily enough, I had been working on something related: Auto-METRICS, a tool for automatic standardised assessment of scientific research quality in radiomics research using the METRICS framework.

To show its utility, we made use of two unique, recent datasets on reproducibility in radiomics studies - Akinci D’Antonoli et al. (2025) and Kocak (2025). Together, they feature a really good set of METRICS ratings - for different levels of expertise and training - for more than 50 publications. This allowed us to systematically compare human and LLM raters.

The main takeaways:
- Human raters agree with LLMs at the same rate that they agree with other human raters ✅
- Prompt iterations: clarifying radiomics guidelines can lead to better agreement with human raters. However, these improvements were quite limited! 📈
- Too nice: LLM ratings tended to be slightly higher than those offered by human raters 😇

I tested our tool - Auto-METRICS - here (all you need is a free Google Gemini API key) and found it really helpful for getting an initial METRICS assessment that I could easily confirm. The key? Enhance, don’t replace - having good initial ratings was super helpful in getting a final, human-based classification.

Curious? Read more about Auto-METRICS at medRxiv.

Apr 22, 2025

•

apps
