Stop Trying to Convince Your LLM to Be a Workflow Automation Tool

Let’s clear the air on AI for a moment.

GPT stands for Generative Pre-trained Transformer. If you want to know more about that kind of Transformer, read the paper “Attention Is All You Need”. GPTs like Claude, ChatGPT, and Gemini predict the next most likely token based on their corpus of knowledge and your prompt — that’s it. In my opinion, general-purpose LLMs have peaked. Chasing the last 10% of “intelligence” is a fruitless race when what we need are purpose-driven tools.

Please don't use LLMs for math. They're not designed for mathematical precision; they're designed for pattern matching. Some LLM-as-a-service providers like OpenAI and Anthropic are starting to put checks in place that try to interpret mathematical requests and route them to a proper handler, but those checks aren't reliable.

If I need to programmatically solve an equation, I’d call a Python script to do it. The LLM’s job should be to understand a request and route it to the right function or tool, like taking a prompt to search a database and turning it into a structured, validated database query. Let it interpret the prompt and pass the data on to a tool built specifically to handle that data.
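Here’s a minimal sketch of what that routing can look like. The `complete()` helper, the tool names, and the JSON schema are all assumptions for illustration, not any particular library’s API; wire `complete()` to whatever model you actually run.

```python
import json

# Hypothetical helper that sends a prompt to whatever LLM you run
# (a local llama-3.1-8b, a hosted API, etc.) and returns its text response.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to your model of choice")

# The only job we give the LLM: turn a natural-language request into
# a structured, validated call to a purpose-built tool.
def route(user_request: str) -> dict:
    prompt = (
        "Convert the request into JSON with keys 'tool' "
        "(one of: 'sql_search', 'calculator') and 'arguments'. "
        "Respond with JSON only.\n\n"
        f"Request: {user_request}"
    )
    call = json.loads(complete(prompt))

    # Validate before anything touches a real system.
    if call.get("tool") not in {"sql_search", "calculator"}:
        raise ValueError(f"Unknown tool: {call.get('tool')}")
    return call

# The math itself happens in plain Python, not in the model.
def calculator(arguments: dict) -> float:
    return float(arguments["a"]) + float(arguments["b"])
```

Once the call is validated, the tool does the actual work; the model never touches the arithmetic or the database itself.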

I’m not going to try to convince you that, despite costing something like 0.15% of what GPT-5 cost to train, llama-3.1-8b is basically comparable at interpreting intent, generating language, and summarizing data in most cases; you can try it yourself at lmarena.ai. I use llama-3.1-8b at home for general-purpose tasks, and frankly, installing dual-pane windows saved me more on electricity in one month than running that model has in a full year.

On top of how efficient that already is, the popularization of quantization for inference means models can use even less memory. While that may sound sci-fi, it’s more or less adhering to a basic database design principle: use the precision that fits your use case. For instance, going from float32 to int8 can cut the memory demand for inference to roughly a quarter for some models. For most of the tasks I use LLMs for, the response is usually so structured I couldn’t tell you the difference between a quantized model and an unquantized one.
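To make the precision analogy concrete, here’s a toy NumPy sketch of symmetric int8 quantization on a stand-in weight matrix. It isn’t a real inference stack, and the matrix size is arbitrary; it just shows where the memory savings come from.

```python
import numpy as np

# A toy "weight matrix" in float32, standing in for one model layer.
weights_fp32 = np.random.randn(4096, 4096).astype(np.float32)

# Symmetric int8 quantization: map the float range onto [-127, 127]
# with a single per-tensor scale factor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize at use time; the rounding error stays small relative to the weights.
dequantized = weights_int8.astype(np.float32) * scale

print(f"float32: {weights_fp32.nbytes / 1e6:.1f} MB")  # ~67.1 MB
print(f"int8:    {weights_int8.nbytes / 1e6:.1f} MB")  # ~16.8 MB, a 4x reduction
print(f"mean abs error: {np.abs(weights_fp32 - dequantized).mean():.5f}")
```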

If you’re using LLMs, I would wager you probably don’t need the level of precision you’re paying for. Identify where you need natural language processing and natural language responses; that’s where you should use LLMs. Try out different model sizes: start low and work your way up until you consistently get a reliable response. There are lots of FOSS solutions for interacting with them in different ways. Focus your LLM by using detailed system prompts or .md context files, requiring structured output, and using RAG-based architectures where possible.
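As one example of focusing a model with a system prompt and structured output, here’s a sketch that assumes a local Ollama server with llama3.1:8b pulled; the endpoint, model tag, and field names are my assumptions, so swap in whatever you actually run.

```python
import json
import requests

# Assumes a local Ollama server; adjust the URL and model tag to your setup.
OLLAMA_URL = "http://localhost:11434/api/chat"

system_prompt = (
    "You extract structured data. Respond only with JSON containing "
    "the keys 'customer', 'product', and 'quantity'."
)

response = requests.post(OLLAMA_URL, json={
    "model": "llama3.1:8b",
    "stream": False,
    "format": "json",  # ask the server to constrain output to valid JSON
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hey, Dana wants three of the blue widgets."},
    ],
})
response.raise_for_status()

order = json.loads(response.json()["message"]["content"])
print(order)  # e.g. {"customer": "Dana", "product": "blue widgets", "quantity": 3}
```

A small quantized model handles this kind of constrained extraction comfortably; the structure does most of the heavy lifting.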

If you want your automations to think without overcomplicating them, stop forcing LLMs to be everything. Let them be our Babel fish for computers. Combine them with real workflow tools like n8n.io.

🔗 Dive deeper into how LLMs and n8n can work together efficiently in my post: Read more here