<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>AI on Alexander Junge&#39;s website</title>
    <link>https://www.alexanderjunge.net/tags/ai/</link>
    <description>Recent content in AI on Alexander Junge&#39;s website</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-US</language>
    <lastBuildDate>Sun, 15 Feb 2026 00:00:00 +0000</lastBuildDate>
    
	<atom:link href="https://www.alexanderjunge.net/tags/ai/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Stingray racing and the rising bar for software</title>
      <link>https://www.alexanderjunge.net/blog/stingrays-software/</link>
      <pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/stingrays-software/</guid>
      <description>This weekend, our five-year-old son really wanted us to make our own racing game, and what he wanted was super clear: only stingrays (his favorite pet animal) racing each other, and they should say &amp;ldquo;oh my days&amp;rdquo; when they crash. Using a vibe coding tool, we pretty much zero-shotted a working game. Amazing that you can just do this in 10 minutes these days! You can see the result above.</description>
    </item>
    
    <item>
      <title>Show me the prompt: PydanticAI</title>
      <link>https://www.alexanderjunge.net/blog/show-me-the-prompt-pydanticai/</link>
      <pubDate>Sun, 08 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/show-me-the-prompt-pydanticai/</guid>
      <description>I am a happy user of pydantic and instructor, the latter being a well-scoped tool with a well-defined surface area that uses pydantic for structured outputs and validation with large language models. This week saw the release of PydanticAI, which looks like a nice abstraction for building agents on top of pydantic. I think this personal preference is largely due to my preexisting bias toward already using pydantic and instructor.</description>
    </item>
    
    <item>
      <title>After evals, flywheels</title>
      <link>https://www.alexanderjunge.net/blog/after-evals-flywheels/</link>
      <pubDate>Mon, 08 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/after-evals-flywheels/</guid>
      <description>Let&amp;rsquo;s assume you have a few evals for your AI product in place that allow you to get a good idea of how well the underlying (set of) model(s) is doing, and that you have attracted a few first users. And now what?
Start building data flywheels! Flywheels are mechanisms that let you leverage the data you have to improve your model and attract more users. They are the key to scaling your AI product, and the mechanism looks like this:</description>
    </item>
    
    <item>
      <title>Recommended tutorial on achieving Structured Outputs in DSPy</title>
      <link>https://www.alexanderjunge.net/blog/structured-outputs-dspy-tutorial/</link>
      <pubDate>Fri, 05 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/structured-outputs-dspy-tutorial/</guid>
      <description>I am a big fan and active user of instructor, which makes the outputs of inherently probabilistic LMs more structured and reliable. Without approaches like this, making LMs work reliably in certain production settings would be a true nightmare.
This is a really cool intro video by Connor Shorten (of Weaviate) on structured outputs in general, and on using instructor and DSPy specifically. DSPy is another cool project and idea that, simply put, makes working with LM-based systems feel like ML engineering again, not like a game of tipping LMs to produce the right output or threatening to harm a kitten (aka prompt engineering).</description>
    </item>
    
    <item>
      <title>Recommended read: Your AI Product Needs Evals by Hamel Husain</title>
      <link>https://www.alexanderjunge.net/blog/starting-evals-hamel/</link>
      <pubDate>Wed, 03 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/starting-evals-hamel/</guid>
      <description>I really like this blog post by Hamel Husain entitled &amp;lsquo;Your AI Product Needs Evals&amp;rsquo;. In particular, what sets this one apart from other posts on the topic is that it is very practical and actionable (by centering the post around a specific case study). It helps that this post is not written by an LLM tool provider that primarily tries to sell their new fancy tool to you and only secondarily provides some useful advice (not mentioning any names here).</description>
    </item>
    
    <item>
      <title>Fine-tuning LMs as a way to move compute back from inference to training</title>
      <link>https://www.alexanderjunge.net/blog/finetuning-inference-training/</link>
      <pubDate>Mon, 01 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/finetuning-inference-training/</guid>
      <description>As I was reading this review paper on tool use for language models (LMs) over the Easter holidays, a thought crossed my mind:
There is an interesting trend when working with LMs in production to perform more and more computation at inference time. For example, tool-using agents, multi-agent systems, elaborate state machines, and ever more complicated RAG systems like CRAG are becoming popular since they tend to give better responses. The goal is often to provide deliberative System-2-like responses rather than instinctive System-1-like responses (following the System 1 and System 2 distinction popularized by the late Daniel Kahneman and Amos Tversky; excuse the anthropomorphism here).</description>
    </item>
    
    <item>
      <title>The Berkeley Function-Calling Leaderboard</title>
      <link>https://www.alexanderjunge.net/blog/function-calling-leaderboard/</link>
      <pubDate>Sun, 24 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/function-calling-leaderboard/</guid>
      <description>Function-calling (aka tool-use) is essential to enable LLMs to run internet searches, write and execute code, generate images, use a calculator, &amp;hellip; whenever that makes sense to solve the current task. The Berkeley Function-Calling Leaderboard gives a good overview of which LLMs perform best on function-calling benchmarks.
Here is what the leaderboard looks like currently:
Noteworthy: A few things stand out to me:
 an open source model, Gorilla OpenFunctions v2, that is Apache 2.</description>
    </item>
    
    <item>
      <title>RAG 2.0?</title>
      <link>https://www.alexanderjunge.net/blog/rag20maybe/</link>
      <pubDate>Sat, 23 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/rag20maybe/</guid>
      <description>This blog post on &amp;ldquo;RAG 2.0&amp;rdquo; by Contextual AI got me thinking. I am not sure it makes sense for anyone to be &amp;ldquo;announcing&amp;rdquo; (or even defining) RAG 2.0, but there are a few tidbits in this post hinting at a potentially more powerful, general approach to RAG they are working on. The article is light on technical details and heavy on claimed &amp;ldquo;state-of-the-art&amp;rdquo; results on various benchmarks.
However, I very much agree that a) defining evaluation datasets first, and b) then optimizing RAG performance end-to-end is the right approach to improving (pre-)production systems.</description>
    </item>
    
    <item>
      <title>Building AI tools for an audience of one</title>
      <link>https://www.alexanderjunge.net/blog/ai-tools-for-one/</link>
      <pubDate>Fri, 22 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/ai-tools-for-one/</guid>
      <description>With modern AI tools just being an API call away for most developers, the bottleneck to creating entirely new, powerful experiences is no longer a lack of access to technology but a lack of human creativity, understanding of the problem space, and an inability to translate what is technically possible to user value.
As a developer in this space, it is super important to me to build tools just for myself.</description>
    </item>
    
    <item>
      <title>Short: Differential privacy in a RAG setting</title>
      <link>https://www.alexanderjunge.net/blog/short-diff-privacy-rag/</link>
      <pubDate>Thu, 21 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/short-diff-privacy-rag/</guid>
      <description>Why: The usefulness of modern AI systems dramatically increases when the underlying AI models have access to recent, relevant data in addition to the information captured in the models&amp;rsquo; internal parameters. This is the core idea behind both in-context learning and Retrieval Augmented Generation.
Here is an interesting LlamaIndex blog post looking at a scenario where three parties want to share data but cannot do so freely for privacy reasons.</description>
    </item>
    
    <item>
      <title>Short: RAFT</title>
      <link>https://www.alexanderjunge.net/blog/raft/</link>
      <pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/raft/</guid>
      <description>RAFT: Adapting Language Model to Domain Specific RAG. From the Gorilla LLM project, Retrieval Aware Fine-Tuning (RAFT) combines retrieval-augmented generation and fine-tuning to adapt language models to domain-specific knowledge.
Blog post: here
Paper: here
Why: Retrieval Augmented Generation (RAG) and fine-tuning are two of the most important concepts in the NLP domain when it comes to exposing large language models to recent, domain-specific information.
The Retrieval Aware Fine-Tuning (RAFT) model is a combination of both of these concepts and generalizes Retriever Aware Training (RAT).</description>
    </item>
    
    <item>
      <title>Major news: I am co-founding amass</title>
      <link>https://www.alexanderjunge.net/blog/starting-amass/</link>
      <pubDate>Wed, 06 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://www.alexanderjunge.net/blog/starting-amass/</guid>
      <description>Major personal news today:
I have joined amass as a co-founder solving challenges every life sciences researcher and R&amp;amp;D organization is facing:
 How to keep up with scientific knowledge, in all its shapes and sizes, both internally and externally? How to make optimal decisions by synthesizing new insights based on this knowledge?   To solve these, amass is creating a scientific memory relying on recent advances in AI such as LLMs and multimodal representations.</description>
    </item>
    
  </channel>
</rss>