-
📝 Auto-METRICS - a proof of concept
Automatic assessment of methodological quality in radiomics research
Recently, I received my first (obvious) LLM peer review. It was quite blatant. What’s worse: it wasn’t good - at all! Funnily enough, I had been working on something related: Auto-METRICS, a tool for automatic standardised assessment of scientific research quality in radiomics research using the METRICS framework.
To show its utility, we make use of two unique, recent datasets on reproducibility in radiomics studies - Akinci D’Antonoli et al. (2025) and Kocak (2025). Together, they provide a really strong set of METRICS ratings - from raters with different levels of expertise and training - for more than 50 publications. This allowed us to systematically compare human and LLM raters.
The main takeaways:
- Human raters agree with LLMs at the same rate that they agree with other human raters ✅
- Prompt iterations: clarifying radiomics guidelines can lead to better agreement with human raters. However, these improvements were quite limited! 📈
- Too nice: LLM ratings tended to be slightly higher than those offered by human raters 😇
I tested our tool - Auto-METRICS - here (all you need is a free Google Gemini API key) and found it really helpful for getting an initial METRICS assessment which I could then easily verify. The key? Enhance, don’t replace - having good initial ratings was super helpful in reaching a final, human-made classification.
Curious? Read more about Auto-METRICS at medRxiv.
-
📝 How to develop a radiomics signature
Presentation and workshop for medical doctors and biomedical researchers on applying machine learning to radiology data
In 2024, I participated as faculty in the “How to develop a radiomics signature” course organised for the European Society of Gastrointestinal and Abdominal Radiology. Here, I gave a short presentation on machine-learning model development and tuning, as well as two practical programming sessions: one on radiomic feature extraction and the other on model development and (fine-)tuning. Both are freely available.
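The model development and tuning step can be sketched roughly as below. This is an illustrative example on synthetic data, not the actual course material: it assumes radiomic features have already been extracted into a feature matrix, and the estimator, penalty, and parameter grid are arbitrary choices for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for an extracted radiomic feature matrix: 100 patients, 50 features.
X, y = make_classification(n_samples=100, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then fit an L1-penalised classifier (sparse feature selection).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(penalty="l1", solver="liblinear")),
])

# Tune the regularisation strength with 5-fold cross-validation.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
test_acc = search.score(X_test, y_test)
```

Evaluating `search.score` on a held-out test set, rather than on the cross-validation folds used for tuning, is the part most often gotten wrong in radiomics signatures.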
-
📝 Some notes on programming and coding with AI assistants
Most IDEs come pre-packaged with their own AI assistant for coding. But who should use them?
Recently, I have been wondering about AI coding assistants (AICA). A lot of what I do day-to-day is coding, so a convenient assistant sounds appealing.
I have tried a few and my general impression is simple: they work well, but they frequently introduce bugs which I have to fix. Ultimately, I have a hard time telling whether my coding has actually improved - on one hand I can code faster; on the other I have to fix more bugs.
So I took a quick dive into the literature (some early works trying to predict productivity gains exist but, alas, behind these keystrokes lies an empiricist):
- GitClear conducted their own study into this, showing that post-AICA code has a higher churn rate (i.e. code that is revised or reverted shortly after being written, requiring maintenance further down the line) [1]
- Some works examine whether AICA solutions are correct without human verification - they are not [2, 3] - with a recent work showing that only 30% of code suggestions are accepted [4]
- In terms of productivity, the same work claims a 30% increase (!) [4]. However, productivity is measured as the acceptance rate of AICA suggestions, which, despite being a rather crude metric, is associated with perceived productivity [5]
From other works on the impact of coding assistants on productivity, we get a heterogeneous picture:
- a study asking participants to implement an HTTP server in JavaScript as quickly as possible showed that people using AICA were able to get the job done 50% faster [6]
- a study from Uplevel (covering 800 developers at its customers) saw a 40% increase (!) in bugs with very little impact on efficiency (PR cycles reduced by 1.7 minutes) and no impact on burnout rate [7]
- a study showed an increase in PRs but also a decrease in build success rates [8]
- a study surveying developers found that they are generally positive about AICA, but somewhat fearful that more junior developers won’t have the opportunity to build up their skills if they rely too heavily on these tools; additionally, they were keener to praise its efficiency gains on easy or repetitive tasks (i.e. the usual boilerplate code), which leaves more time for learning and creative thinking [9]
- a study on developers in the public sector highlights a similar trend, with developers feeling that they have more time to focus on more meaningful and rewarding tasks if they use AICA [10]
- a study showed that GitHub Copilot has a more positive impact when used by experts, and that it can be harmful for novices, as it suggests buggy or overly complicated code which novices cannot fully comprehend [11]
My take on this: it can be useful, but a lot of the benefits are overstated, and novice developers benefit greatly from steering clear of it while learning. The benefits appear to lie along a speed-quality tradeoff curve, and each developer will occupy a different position on that curve.
- Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality
- Exploring the Verifiability of Code Generated by GitHub Copilot
- Sea Change in Software Development: Economic and Productivity Analysis of the AI-Powered Developer Lifecycle
- The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
- The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers
- Understanding the impact of an AI coding assistant, GitHub’s Copilot, on developers and their work experiences
- Harnessing the Potential of Gen-AI Coding Assistants in Public Sector Software Development
-
🔗 Where did the cancer microbiome go?
A scientific paper published in 2020 promised to revolutionise cancer diagnosis. When other teams could not obtain the same results, doubts were cast on the research and on the company that was founded based on its conclusions.
Published in Shifter
-
🔗 The tools change but the struggles continue - Artificial Intelligence and what it means for your job
The new Artificial Intelligence models have captured everyone’s attention. With their unprecedented capacity to generate text, fears have emerged that they could destroy jobs en masse. Is that threat real? Or are the assumptions behind that fear more social and political than technological? Written with João Gabriel Ribeiro
Published in Shifter