DDBM Blog

How I use AI as a Data Engineer

Written by Darko Monzio Compagnoni | 26 Feb, 2026

I'm writing this post between projects. It's one of those "bench" moments that I've learned to use for reflection and skill-building. This time, I want to share something I've been thinking about for months: how AI has changed the way I work as a data engineer.

This isn't a think piece about whether AI will replace us. It won't. But it has changed what a productive day looks like for me, and I think it's worth being honest about how.

AI is not a search engine

The biggest mistake I used to make with AI tools was treating them like Google. You type a question and expect a perfect answer. That's not how it works.

I treat AI as a peer. A very fast, very knowledgeable colleague who sometimes confidently tells me things that are wrong. Just like a real colleague, the quality of the conversation depends on how well I communicate what I need.

This means I don't just ask "write me a dbt model." I provide context: the schema, the business rules, the constraints. I ask for an outline before the code. I correct its assumptions. I iterate. A good result usually takes 3-5 exchanges, not one.

High-value output is negotiated, not requested.

Where AI helps me

Here are the tasks where AI saves me the most time on a project:

Refactoring legacy SQL

On my last project, I worked with dbt and Snowflake at a large logistics company. The codebase had plenty of SQL that had grown organically over time. Triple-nested joins, hardcoded schemas, and no modularity.

AI is great at this kind of grunt work. I paste in a messy query, explain the target architecture (staging → intermediate → mart), and ask it to propose a decomposition plan. It gives me a starting point in minutes that would've taken me an hour to sketch out manually. I then review, adjust, and iterate.

The key word here is starting point. I never deploy AI-generated code without reviewing it. But having that first draft to react to? That's where the time savings come from.
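To make the decomposition concrete, here's the kind of staging model that typically falls out of that process. This is a sketch, not code from the actual project: the source, table, and column names are invented for illustration.

```sql
-- Hypothetical staging model (models/staging/stg_shipments.sql).
-- One source, columns renamed and typed, no joins or business logic yet;
-- the joins move to an intermediate model, aggregations to the mart.
with source as (
    select * from {{ source('logistics', 'raw_shipments') }}
),

renamed as (
    select
        shipment_id,
        order_id,
        cast(shipped_at as timestamp_ntz) as shipped_at_utc,
        upper(carrier_code)               as carrier_code
    from source
)

select * from renamed
```

The triple-nested joins from the legacy query then become explicit refs between thin models, which is exactly the structure that's easy to describe to an AI and easy to review afterwards.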

Writing tests and YAML documentation

This is probably where AI has the highest ROI for me. Writing schema.yml files with proper column descriptions, not_null tests, unique tests, and relationship tests is important but repetitive. AI handles it well because the patterns are predictable.

I provide the model SQL, and I ask for the full YAML block with business-value descriptions. Not just "the order date" but "the UTC timestamp when the customer completed checkout, used as the primary date dimension in revenue reporting." That level of description matters for the team, and AI can generate it faster than I can type it.
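For anyone who hasn't seen one, this is roughly the shape of output I ask for. The model and column names here are illustrative, not from a real project:

```yaml
# Hypothetical schema.yml for an orders mart model; names are examples only.
version: 2

models:
  - name: fct_orders
    description: One row per completed customer order.
    columns:
      - name: order_id
        description: Surrogate key for the order.
        tests:
          - unique
          - not_null
      - name: order_completed_at_utc
        description: >
          The UTC timestamp when the customer completed checkout,
          used as the primary date dimension in revenue reporting.
        tests:
          - not_null
      - name: customer_id
        description: Foreign key to dim_customers.
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

Generating twenty of these blocks by hand is an afternoon; reviewing twenty AI drafts is an hour.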

Generating Jinja macros

On the same project, I built a dbt macro that selected the Snowflake warehouse size based on the timeframe of data to process. The logic wasn't complex, but getting the Jinja syntax right, with all the edge cases (NULL handling, default values, variable scoping), took some back-and-forth with AI.
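To give a flavor of what that looks like, here is a simplified sketch of the idea. The warehouse names, thresholds, and default are made up for illustration; the real macro handled more cases.

```sql
{# Hypothetical macro: pick a Snowflake warehouse based on how many days
   of data a run needs to process. Names and thresholds are illustrative. #}
{% macro select_warehouse(days_to_process=none) %}
    {# Default to a single day when no timeframe is passed in #}
    {% set days = days_to_process if days_to_process is not none else 1 %}
    {% if days <= 7 %}
        {% set wh = 'WH_XS' %}
    {% elif days <= 90 %}
        {% set wh = 'WH_M' %}
    {% else %}
        {% set wh = 'WH_XL' %}
    {% endif %}
    {% do run_query('use warehouse ' ~ wh) %}
{% endmacro %}
```

The subtleties AI helped with were exactly the ones in the comments: what happens when the variable is unset, and where Jinja's `set` scoping breaks if you assign inside a loop.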

I've found that AI is particularly helpful when I know what I want to build but need help with the how of a specific syntax or framework.

Debugging and code review

When I'm stuck on a query that doesn't behave as expected, I've started asking AI to "explain lines 15-20" rather than staring at the code for twenty minutes. This targeted approach works much better than asking "what's wrong with this code?"

I also use it as a first-pass code reviewer. I ask it to act as a skeptical reviewer and find three reasons the logic might fail in production. It doesn't catch everything, but it catches the obvious things: missing WHERE clauses, potential NULL issues, non-idempotent operations. That frees up my human reviewers to focus on the business logic.
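The NULL issues in particular follow a pattern an AI reviewer reliably flags. A contrived example (the tables are hypothetical):

```sql
-- A classic NULL trap: filtering on the right-hand table in WHERE
-- silently turns the LEFT JOIN into an INNER JOIN, because rows with
-- no matching refund have r.refund_amount = NULL and fail the predicate.
select
    o.order_id,
    r.refund_amount
from orders o
left join refunds r
    on r.order_id = o.order_id
where r.refund_amount > 0   -- drops every order without a refund

-- If the intent was "all orders, with refunds where they exist",
-- the predicate belongs in the join condition instead:
--   left join refunds r
--       on r.order_id = o.order_id
--      and r.refund_amount > 0
```

It's a trivial bug, but it's exactly the kind that survives a tired human review and not a prompted skeptical one.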

Where AI falls short

Being honest about the limitations is just as important as knowing where to use it.

It's a yes-man. AI tools are trained to be helpful and agreeable. If I ask "is this the best approach?" it will almost always say yes. I've learned to never give it my solution first. I ask for its approach, then compare. The quality of feedback improves dramatically this way.

It hallucinates. On my last project, the team used GitHub Copilot because the premium version doesn't use customer code for training (a strict company policy). Even with Copilot, I've seen it suggest functions that don't exist or were deprecated several versions ago. You always need to verify against the actual documentation.

It drifts. In long conversations, AI gradually forgets the rules you set at the beginning. I told it to use snake_case consistently, and by message 10, it had switched to camelCase. The workaround is simple: either use a custom system prompt (like Gemini Gems) to lock in your standards, or add a periodic reminder to the conversation.

Security is your responsibility. I never paste API keys, passwords, or PII into any AI tool. For sensitive projects, I anonymize table and column names before sharing code. This is a non-negotiable habit.

My workflow in practice

If I had to summarize how AI fits into my daily work, it looks like this:

I start a task by dumping context into the AI: schema, business rules, constraints. Then I ask for an outline or approach, not the final code. I review the approach, correct what's wrong, and only then ask for the implementation. Once the code is there, I refine it iteratively: fix a join key here, add idempotency there, improve the formatting last.

This process of sandboxing the context, refining the approach, and chaining focused prompts consistently produces better results than a single "write me the code" prompt. It takes 10-15 minutes of iteration, but it replaces what would have been 1-2 hours of writing from scratch.

What this means for my career

In my previous blog post, I mentioned a conversation with an in-house Data Architect who recommended gaining more development experience to eventually move into a strategic role. AI is accelerating that path.

By automating the repetitive parts of development (boilerplate code, test generation, documentation), I have more time to focus on architecture decisions, stakeholder conversations, and understanding the business problem. These are the skills that lead to a strategic role, and AI is freeing up the bandwidth to develop them on the job rather than only during bench time.

I don't see AI as a threat. I see it as a tool that shifts my value from typing code to making decisions about code. And that feels like the right direction.

Practical advice for data engineers starting with AI

If you're a data engineer who hasn't yet integrated AI into your workflow, here's what I'd suggest based on my experience:

  • Start with refactoring, not greenfield. AI performs best when it has existing code to react to. Pasting in a messy SQL query and asking for a modular rewrite is a better first use case than asking it to build something from nothing.
  • Provide context, always. The difference between a generic answer and a useful one is the context you provide. Schema, environment (Snowflake, Postgres, etc.), naming conventions, business rules. The more you give, the better the output.
  • Verify everything. AI is a draft generator, not a deployment pipeline. Review every line. Test every query. Don't let speed create technical debt.
  • Learn to iterate. One prompt rarely gives you what you need. Treat it as a conversation. The skill isn't in writing the perfect prompt. It's in knowing how to refine the output through 3-5 focused exchanges.

Fun fact: traffic on Stack Overflow has declined sharply since the releases of GitHub Copilot in October 2021 and ChatGPT in November 2022. The way we find answers is changing. The engineers who learn to work with AI, while keeping their critical thinking sharp, will be the ones who thrive.

At DDBM, we help companies build modern data architectures using tools like Snowflake, dbt, and Matillion. If you're curious about how AI-augmented data engineering could help your team, feel free to reach out.