How to screen CVs with AI without hallucinated qualifications
By Michal Juhas · 4 min read
The way to stop AI from hallucinating candidate qualifications is to require evidence: every claim in the screening report must carry a verbatim quote from the CV, and “no evidence found” must be a valid, visible answer. This one rule turns AI CV screening from a liability into the most reliable step of your pipeline.
Here’s why hallucinated qualifications happen, why the usual fixes don’t work, and how the evidence-quoting method works in practice.
Why AI invents qualifications
Large language models are completion machines: they produce the most plausible continuation of the text they’re given. Ask one “does this candidate have GCP experience?” about a data engineer’s CV, and “Yes, the candidate has experience with GCP” is an extremely plausible sentence (data engineers often do). The model isn’t lying; it’s pattern-matching, and the pattern says “probably yes.”
This is the single biggest trust-killer in AI recruiting. One invented certification that surfaces in a client call, and your team (rightly) stops trusting every AI-written screen. The damage isn’t the one bad answer: it’s that you now have to re-verify everything, which erases the time the AI was supposed to save.
The fixes that don’t work
- “Be accurate, don’t make things up.” Prompt-level pleading reduces hallucination at the margins but cannot eliminate it. The model still has no structural reason to distinguish “found in the CV” from “plausible for this kind of candidate.”
- Asking for confidence scores. Models are poorly calibrated about their own accuracy; a confident “95%” attached to an invented skill is worse than no score at all.
- Manually re-checking everything. This works, but it deletes the entire point of automating the screen.
The evidence-quoting method
The structural fix changes what the model is asked to produce. Instead of “assess whether the candidate meets each requirement,” the task becomes:
For each must-have requirement, quote the exact passage from the CV that demonstrates it. If no passage demonstrates it, write “No evidence found in CV” and flag it for the interview.
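As a rough illustration, the instruction above can be held in one constant and combined with the role's must-haves and the CV text. This is a minimal sketch, not the prompt any particular tool uses: `build_screening_prompt`, the `<cv>` delimiters, and the numbering format are all hypothetical choices made for this example.

```python
# Hypothetical sketch: wording of the instruction is taken from the article,
# everything else (function name, layout, <cv> delimiters) is an assumption.
EVIDENCE_INSTRUCTION = (
    "For each must-have requirement, quote the exact passage from the CV "
    "that demonstrates it. If no passage demonstrates it, write "
    '"No evidence found in CV" and flag it for the interview.'
)


def build_screening_prompt(must_haves, cv_text):
    """Return one prompt string that asks for a quote per requirement."""
    # Number the requirements so each answer can be matched to its requirement.
    numbered = "\n".join(
        f"{i}. {req}" for i, req in enumerate(must_haves, start=1)
    )
    return (
        f"{EVIDENCE_INSTRUCTION}\n\n"
        f"Must-have requirements:\n{numbered}\n\n"
        f"CV:\n<cv>\n{cv_text}\n</cv>"
    )
```

The point of keeping the instruction in a single constant is that every screen asks for evidence in exactly the same words, whoever runs it.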
Three things change immediately:
- Claims become checkable in seconds. A recruiter reads the quote next to the claim, glances at the CV, done. Verification goes from “re-do the screen” to “spot-check the quotes.”
- Hallucination becomes structurally hard to hide. A model can still fabricate a quote, but a fabricated quote will not match any passage in the CV, so it fails the two-second check. Plausibility stops being good enough, because the output format demands provenance.
- Gaps become useful instead of dangerous. “No evidence found: flag for interview” is genuinely valuable output. A missing GCP certification isn’t a rejection; it’s the first interview question. The screen stops pretending to certainty it doesn’t have.
A typical evidence-backed screening row looks like this:
Status | Requirement | Evidence
✓ | 5+ years building data pipelines | “…led the Airflow migration of 40+ ETL pipelines (2019–2025)…”
⚠ | GCP certification | No evidence found in CV: flag for interview.
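The spot-check on quotes can also be automated before a human looks at the report. The sketch below assumes the model's output has already been parsed into rows; `ScreeningRow`, `check_row`, and the three status strings are hypothetical names for this example. It treats an ellipsis in a quote as an elision and requires every remaining fragment to appear in the CV, in order, after normalizing whitespace, case, and curly quotes.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningRow:
    requirement: str
    quote: Optional[str]  # None means the model reported "No evidence found in CV"


def _normalize(text: str) -> str:
    # Unify curly quotes, collapse whitespace, and lowercase, so that line
    # breaks or typography differences in the CV do not cause false mismatches.
    for curly, plain in (("\u2018", "'"), ("\u2019", "'"),
                         ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, plain)
    return re.sub(r"\s+", " ", text).strip().lower()


def check_row(row: ScreeningRow, cv_text: str) -> str:
    """Classify a row as 'verified', 'quote_not_in_cv', or 'flag_for_interview'."""
    if row.quote is None:
        return "flag_for_interview"
    cv = _normalize(cv_text)
    # Split on "…" or "..." so elided quotes are checked fragment by fragment.
    fragments = [_normalize(p) for p in re.split(r"\u2026|\.\.\.", row.quote)]
    fragments = [f for f in fragments if f]
    if not fragments:
        return "quote_not_in_cv"
    position = 0
    for fragment in fragments:
        found = cv.find(fragment, position)
        if found == -1:
            return "quote_not_in_cv"  # fabricated or paraphrased quote
        position = found + len(fragment)
    return "verified"
```

A row that comes back as `quote_not_in_cv` is the one a recruiter should look at first: the model claimed evidence that the CV does not contain word for word. This check only confirms that the quote exists, not that it actually demonstrates the requirement, so the human glance at each quote still matters.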
Making it consistent: from method to workflow
The evidence-quoting method only protects you if it’s applied on every screen, including the Friday-afternoon one, by the intern, on candidate #47. That’s a workflow problem, not a prompting problem: the method needs to be encoded once, in a shared workflow, rather than re-remembered in every chat session.
This is exactly how the CV Screener workflow in Calyflow is built. You attach the job description, intake notes, and CVs; it screens every candidate against the must-haves with a verbatim quote for each claim, flags every gap, and shows the run cost at the bottom of the report. Because Calyflow runs on your own AI key, the same workflow works whether your team uses Claude, GPT, Gemini, or a local model. And because it’s open source, you can read exactly what the screening prompt does with every CV.
Where this fits in the pipeline
Evidence-backed screening pays off most when the rest of the search feeds it good inputs: a sharp JD defines must-haves the screen can actually test, and a good sourcing map fills the pipeline with candidates worth screening. The output (an evidence-quoted report per candidate) drops straight into a client-ready submission pack, which is why teams on the upper rungs of the AI Adoption Ladder treat screening as the first workflow to standardize.
The takeaway
Don’t ask AI whether a candidate is qualified. Ask it to prove each qualification with a quote, and to say plainly when it can’t. Evidence in, trust out: that’s the whole method.
Try the evidence-quoting CV Screener on your next role: create a free account. Free to start, your own API key, no credit card.