March 3, 2026

Episode 507: Not All Hammers Are Equal: Benchmarking AI for AL Code


One developer decided to stop guessing which AI model is best for AL coding — and built a system to find out. In this episode of Dynamics Corner, Brad and Kristoffer sit down with Torben Leth, the creator of CentralGage, an open-source benchmarking tool that ranks LLMs specifically on their ability to write AL code for Business Central.

Torben walks through how he built an automated testing pipeline that gives each model multiple passes, compiles the output, runs pre-built AL tests, and scores everything from zero to 100. What he discovered about why developers swear by completely different models might change how you think about your own setup, and he's found a way to use those failures to patch a cheaper model's blind spots so it rivals the top performers.

Plus: gamertags embroidered on wedding suits, chili plants managed by Home Assistant, and the philosophical question nobody in this space can seem to answer — should we even try to keep up?

Find Torben: blog.sshadows.dk | LinkedIn | X: @Sshadows

CentralGage: ai.sshadows.dk


Support the show

#MSDyn365BC #BusinessCentral #BC #DynamicsCorner

Follow Kris and Brad for more content:
https://matalino.io/bio
https://bprendergast.bio.link/