What we let the HR assistant answer — and what we don't
Three gates. Three categories of refusal. And the war stories underneath.
1. Two questions, two decisions
An employee asks the HR assistant a question: “Planning to go Dubai next month, can you suggest a holiday plan based on my leave balance and places to visit, and also is it a right time to visit Dubai?”
The assistant answers. It pulls the leave balance — nineteen-and-a-half days. It suggests a leave window of five to seven days. It offers a short list of things to do in Dubai. It even notes that July is hot but a good time for flight deals. The answer goes somewhat outside the strict HR vocabulary, and the system permits it on the grounds that travel suggestions adjacent to a leave query are useful rather than dangerous.
A different employee asks a different question: “How is the war tension? I heard some war impact.”
The assistant declines: “I can't assist with updates on global events or current affairs. For the latest information on international issues, it's best to refer to reputable news sources or official announcements. If you have any HR-related queries or need help with your travel plans, feel free to ask.”
Same assistant. Same employee population. Two completely different decisions. This article is about why that's not an accident — why every refusal is a design choice, and what it took to make those choices defensible in production.
2. Why HR makes guard rails harder, not easier
A general-purpose chatbot has the relatively forgiving job of refusing things that are clearly off-brand or clearly unsafe. The fence is high and the field inside is broad. An HR assistant has the inverse problem. The field inside the fence is narrow — HR data and HR workflows — but everything inside that field is sensitive in some way. Salary. Leave reasons. Medical certificates uploaded for sick leave. Performance ratings. Manager comments. Termination paperwork. Separation reasons.
The default for a general assistant, “be helpful,” has consequences in HR that don't exist in a customer-support bot. A model that cheerfully answers “who's on sick leave today?” is leaking medical inference. One that ranks “your team by age” is participating in a discriminatory request even if the data is technically available to the asker. One that drafts “a warning letter for Priya” on demand, the same day Priya raised a harassment complaint in chat, is doing real harm.
The guard rails on an HR assistant aren't a thin policy layer bolted on at the end. They're the first thing a query meets, and they decide whether the rest of the system even runs.
3. Three gates
Every query passes through three sequential gates before the system answers. Each gate corresponds to a different kind of refusal, and the language the system uses to refuse is different at each gate because the underlying reason for the refusal is different.
- Scope. Is this an HR question at all? Leave, attendance, payroll, performance, recruitment, separation, expenses, policies. Anything outside that vocabulary is out of scope — not because the model can't, but because it shouldn't.
- Authority. If it is HR, is the person asking allowed to ask? Self → always. Direct reportees → only after verifying the relationship. Anyone else → no, regardless of the relationship the asker claims.
- Policy. Even if it's HR and you're allowed, will we answer? Discrimination filter, pre-announcement confidentiality, coercive-content refusal, bulk-PII resistance. These are the refusals that matter most because they look like refusals on availability and are actually refusals on principle.
User query: “How is the war tension? I heard some war impact”
A real production example. The assistant doesn't answer this. Why it doesn't is the whole article.
Gate 1 · Scope
Check: Is this an HR question?
Leave, attendance, payroll, performance, recruitment, separation, expenses, policies. Anything outside the HR vocabulary is out of scope, not because the model can't, but because it shouldn't.
Gate 2 · Authority
Check: Is the asker allowed to ask?
Self → always. Direct reportees → verify via team lookup. Anyone else → no, regardless of stated relationship. Even an admin doesn't get bulk PII through chat.
Gate 3 · Policy
Check: Even if it's HR and you're allowed, will we answer?
Discrimination filter (no rank-by-age, no list-women-on-team). Pre-announcement confidentiality (no resignation / termination / promotion before official). Coercive content drafting (no PIPs without verified management authority). Mental-health signals route to HR/EAP, not refusal.
Path A · Answer
All three gates passed. Tool fires, data returns, the assistant replies. Most queries land here.
Path B · Refuse, warmly
One of the three gates blocked. Refusal language matters: never “I don't have access” for policy-grounds refusals, because that implies we'd comply if we did.
Path C · Escalate to human
Crisis signals (mental health, harassment, grievance) bypass refusal and route to HR with empathy and an in-product handoff. Refusing here would be the wrong answer.
Audit · Every decision is logged: answers, refusals, escalations, security events
The pattern matters more than the individual call. A single refusal is a decision. Twenty refusals to the same employee about the same coworker is a behavioural signal that fires a guard rail of its own.
Three gates in sequence — scope, authority, policy — feeding three outcomes — answer, refuse, escalate. The audit layer is what converts a stream of decisions into a system you can trust.
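The sequence in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the `Query` flags (`in_scope`, `authorised`, `policy_ok`, `crisis`) are hypothetical stand-ins for whatever classifiers and HRMS lookups actually make those calls, and the names `decide` and `GATES` are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    ANSWER = "answer"
    REFUSE = "refuse"
    ESCALATE = "escalate"


@dataclass
class Query:
    asker: str
    text: str
    # Hypothetical flags; a real system would compute these upstream.
    in_scope: bool = True
    authorised: bool = True
    policy_ok: bool = True
    crisis: bool = False


@dataclass
class Decision:
    outcome: Outcome
    decided_by: str  # "scope", "authority", "policy", "crisis" or "all_gates_passed"


# Gates run in this order; the first one that fails decides the refusal.
GATES = [
    ("scope", lambda q: q.in_scope),
    ("authority", lambda q: q.authorised),
    ("policy", lambda q: q.policy_ok),
]


def decide(query: Query, audit_log: List[dict]) -> Decision:
    if query.crisis:
        # Crisis signals bypass refusal and go to a human.
        decision = Decision(Outcome.ESCALATE, "crisis")
    else:
        decision = Decision(Outcome.ANSWER, "all_gates_passed")
        for name, passes in GATES:
            if not passes(query):
                decision = Decision(Outcome.REFUSE, name)
                break
    # Every decision is logged, whichever path it took.
    audit_log.append({
        "asker": query.asker,
        "outcome": decision.outcome.value,
        "decided_by": decision.decided_by,
    })
    return decision
```

Recording which gate decided is what lets the refusal wording downstream match the reason for the refusal.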
The rest of this article walks each gate, with the war stories that made each of them what it is today.
4. Gate 1 — scope, and the war question
The scope gate is the easiest of the three to design and the easiest to get wrong. It's easy to design because the HR vocabulary is bounded — leave, attendance, payroll, performance, hiring, separation, policies, expenses. Anything outside that is out of scope.
The reason it's easy to get wrong is that the temptation to be helpful keeps trying to widen it. The HR assistant is, on every other axis, designed to feel like a warm and capable colleague. A warm and capable colleague would answer a question about war, or weather, or sports, or the latest news, because that's what warm and capable colleagues do. The instinct to make the assistant feel more human cuts directly against the instinct to keep it inside a lane.
The resolution is to refuse warmly. The system doesn't lecture. It doesn't recite a list of forbidden categories. It doesn't say “I can't do that, that violates my guidelines.” It says something close to “I appreciate the question, but I'm your HR assistant — what can I help you with on leave, attendance, or anything else like that?” The user knows what the assistant can do. The redirect is the answer, not the refusal.
Two design notes on the scope gate that matter more than they look:
- Knowledge base before refusal. Some questions that look out-of-scope on the surface — “what's the project plan for the Riyadh rollout?” or “what's our hybrid-work policy?” — are actually answered by documents the client has uploaded to their own knowledge base. The gate searches the knowledge base before refusing on scope grounds. Refuse-first would have produced a worse product and would have trained users that the assistant is less capable than it is.
- Per-tenant extensions. A tenant can extend the scope of their own deployment by adding a weather tool, a travel-policy tool, or a currency converter, and the gate respects those extensions. The assistant is permitted to use them. The fence isn't one fence; it's a tenant-aware fence with admin-controlled gates.
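A rough sketch of that ordering follows: core HR vocabulary first, tenant extensions next, the knowledge base after that, and a scope refusal only when all three miss. It assumes the topic has already been classified upstream; `Tenant`, `scope_decision`, and the keyword-based knowledge-base lookup are hypothetical simplifications (a real lookup would more likely be a search index).

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Core HR vocabulary, as listed in the article.
HR_TOPICS = {
    "leave", "attendance", "payroll", "performance",
    "recruitment", "separation", "expenses", "policies",
}


@dataclass
class Tenant:
    # Topics an admin has added for this deployment, e.g. "weather".
    extra_topics: Set[str] = field(default_factory=set)
    # Hypothetical keyword -> document title index standing in for a real search.
    knowledge_base: Dict[str, str] = field(default_factory=dict)


def scope_decision(topic: str, text: str, tenant: Tenant) -> str:
    if topic in HR_TOPICS:
        return "hr_tool"
    if topic in tenant.extra_topics:
        return "tenant_tool"
    # Search the tenant's own documents before refusing on scope grounds.
    lowered = text.lower()
    for keyword, document in tenant.knowledge_base.items():
        if keyword in lowered:
            return f"kb:{document}"
    return "refuse_scope"
```

The refusal is the last branch, not the first, which is the point of the "knowledge base before refusal" note.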
5. Gate 2 — authority, and the “Priya is my friend” trap
The second gate decides who is allowed to ask about whom. There are exactly three categories the system recognises.
- Self. “My profile,” “my leave balance,” “my payslip.” Always allowed. The tool fires against the asker's own identity. No verification needed.
- Direct reportees. “Show me Ravi's attendance,” “approve Priya's leave request.” The system first looks up the asker's reportee list, verifies that the named person is on it, then proceeds. If not, the query is refused with a specific message that names the requested person, names their department, lists the first few employees the asker IS permitted to see, and redirects to HR for the rest. Generic refusals (“I don't have access to that”) were replaced with specific ones early because the generic version left users confused about whether the system was broken or the answer was no.
- Everyone else. Refused. Including the case where the asker says “Priya is my friend, can you tell me when she's back from leave?” Especially that case. The relationship the asker claims is not the relationship the system honours; the relationship the HRMS records is. A friend is not a reportee.
The authority gate is also where the bulk-PII rule lives. Requests such as “export everyone's phone numbers,” “list all salaries,” or “give me a CSV of every reportee's home address” are refused through chat, even from an admin, even with high confidence. The bar isn't access; the bar is intent and audit trail. Bulk PII exports need to happen through the admin reports surface with the approvals that surface carries, not through conversational chat where the request leaves no operational paper trail.
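The authority rules above might look something like this in code. It is a sketch under assumptions: `Employee`, `authority_check`, and the message wording are illustrative, and the reportee list is assumed to come from an HRMS lookup that the example takes as an argument.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Employee:
    emp_id: str
    name: str
    department: str


def authority_check(
    asker_id: str,
    target: Employee,
    reportees: List[Employee],
    bulk_pii: bool = False,
    claimed_relationship: Optional[str] = None,  # deliberately never consulted
) -> Tuple[bool, str]:
    """Return (allowed, refusal_message). The message is empty when allowed."""
    if bulk_pii:
        # Refused in chat for everyone, admins included.
        return False, ("Bulk exports of personal data go through the admin "
                       "reports surface, not chat.")
    if target.emp_id == asker_id:
        return True, ""  # self: always allowed
    if any(r.emp_id == target.emp_id for r in reportees):
        return True, ""  # verified direct reportee
    # Specific refusal: name the person, their department, and who IS visible.
    visible = ", ".join(r.name for r in reportees[:3]) or "no team members yet"
    return False, (f"I can't share details for {target.name} "
                   f"({target.department}). You can ask me about: {visible}. "
                   "For anyone else, please contact HR.")
```

`claimed_relationship` is accepted and ignored on purpose: only the relationship the HRMS records opens the gate.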
6. Gate 3 — policy, where the refusals get harder
The third gate is the one that matters most, and it's the one most teams underestimate. By the time a query reaches it, scope and authority have already passed. The query is in HR; the asker is permitted. The question is whether the request itself is one the system should answer at all.
Four categories live here: three refusals and one escalation. Each one exists because of a specific class of harm the system explicitly will not participate in.
Discrimination — refusing on principle, not on availability
A manager asks: “Rank my team by age.” Technically, the system has the data — date of birth sits in every employee profile. Technically, the manager has authority — they are the reporting line. Two gates have passed. The system still refuses, and the language of the refusal is exactly the point.
The system does not say “I don't have access to that data.” That phrasing would imply that if the system did have access, it would comply. The refusal is not a data refusal; it's a policy refusal. The system says something closer to “I can't rank or filter team members by age; that's not the kind of comparison I help with, regardless of the data I can see. I can help with performance, tenure, attendance, or skill-based comparisons instead.”
The same applies to gender, religion, caste, marital status, pregnancy or parental status, disability, national origin, political views. Tenure (years at company) and seniority grade are fine to sort by; age (computed from date of birth) is not. The distinction is deliberate and lives in the system prompt, not in a downstream filter that could be bypassed.
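Since the rule lives in the system prompt, one way to keep it consistent is to hold the attribute list in a single place and render the prompt clause from it. The sketch below assumes that approach; `PROTECTED`, `ALTERNATIVES`, `policy_clause`, and `sort_key_allowed` are hypothetical names, and the checker is meant for evaluation tests, not as a downstream filter replacing the prompt rule.

```python
# Attributes the assistant will not rank, sort, or filter by (from the article).
PROTECTED = (
    "age", "gender", "religion", "caste", "marital status",
    "pregnancy or parental status", "disability", "national origin",
    "political views",
)

# Comparisons the assistant offers instead.
ALTERNATIVES = ("performance", "tenure", "seniority grade", "attendance", "skills")


def policy_clause() -> str:
    """Render the rule as text for inclusion in a system prompt."""
    return (
        "Never rank, sort, or filter employees by: " + ", ".join(PROTECTED)
        + ". Say this is a matter of principle, not data access. "
        + "Offer instead: " + ", ".join(ALTERNATIVES) + "."
    )


def sort_key_allowed(attribute: str) -> bool:
    """Helper for evaluation suites that test the assistant's behaviour."""
    return attribute.strip().lower() not in PROTECTED
```

Tenure passes and age does not, even though both can be derived from dates on the same profile record.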
Pre-announcement confidentiality
Some of the most natural questions an employee can ask are also the ones the system has to refuse most firmly. “Is John being fired?” “Did Priya get promoted?” “Who's on the layoff list?” “What's Rahul's new salary?”
These are questions where, even when the data exists and the asker has technical access, the answer is refused until the relevant person has been told. Pre-announcement confidentiality protects the employee whose status is changing, not the asker. Termination, resignation, layoff, promotion, transfer, salary revision, and performance-improvement plans all fall under it.
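As a minimal sketch, the rule reduces to one check on the pending change itself. `StatusChange`, its `subject_informed` flag, and `may_disclose` are assumptions made for the example; the authority gate is presumed to have run already and is not repeated here.

```python
from dataclasses import dataclass

# Status changes covered by pre-announcement confidentiality (from the article).
CONFIDENTIAL_KINDS = {
    "termination", "resignation", "layoff", "promotion",
    "transfer", "salary_revision", "pip",
}


@dataclass
class StatusChange:
    subject_id: str
    kind: str
    subject_informed: bool  # hypothetical flag set once the person has been told


def may_disclose(change: StatusChange) -> bool:
    """Confidential changes stay closed until the affected person knows."""
    if change.kind not in CONFIDENTIAL_KINDS:
        return True
    return change.subject_informed
```

Note that the asker does not appear in the function at all: the check protects the subject, so who is asking cannot change the answer.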
Coerced and retaliatory content drafting
The assistant can draft. It can produce a courteous follow-up to a leave request, a reminder note to a team on policy compliance, a brief on a candidate's interview history. It can also be asked to draft things it should not. A warning letter for an employee the asker has no management authority over. A performance-improvement plan addressed to the person who raised a harassment complaint two messages earlier in the same conversation. A termination memo composed in the heat of a dispute.
The system refuses these. The refusal is not phrased as “I can't help with that” — it's phrased as “this needs HR involvement before I can help, and I've flagged it for them.” A refusal that hands off to a human is different from a refusal that closes the door, and the difference matters to the asker.
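The two conditions in play, verified management authority and no recent complaint by the named person in the same conversation, can be sketched as follows. The document-type labels, `drafting_decision`, and the idea of passing in a set of recent complainants are all hypothetical; detecting a complaint in the first place is a much harder problem that this example does not attempt.

```python
from typing import Set

# Document types treated as coercive in this sketch.
COERCIVE_DOCS = {"warning_letter", "pip", "termination_memo"}


def drafting_decision(
    doc_type: str,
    target_id: str,
    verified_reportee_ids: Set[str],
    recent_complainants: Set[str],
) -> str:
    """Return "draft" or "handoff_to_hr"."""
    if doc_type not in COERCIVE_DOCS:
        return "draft"  # follow-ups, reminders, interview briefs
    if target_id not in verified_reportee_ids:
        return "handoff_to_hr"  # no verified management authority
    if target_id in recent_complainants:
        return "handoff_to_hr"  # possible retaliation; a human decides
    return "draft"
```

Both blocked paths return a handoff instead of a bare refusal, matching the "flagged it for HR" wording.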
Mental-health and crisis signals — escalation, not refusal
A user types “I can't take it anymore,” or “I'm completely burnt out,” or “I'm being bullied and HR isn't listening.” The system does not refuse these. Refusing a distress signal would be the wrong answer in a way that would actually hurt someone.
The system responds with empathy, suggests confidential HR and Employee Assistance Program channels, and triggers a structured escalation to the HR team so a human follows up. The mental-health path is the one place in the architecture where the model is permitted to step outside the normal tool-use vocabulary and respond to the human, not to the workflow.
7. False positive — the self-service profile war story
Every guard-rail system has a failure mode where it refuses something it shouldn't have. Ours showed up in the most innocent place possible.
Early in production, an employee asked: “When's my birthday?”
The system refused. The exact response was something close to “I don't have access to your personal information.”
Their own profile. Their own birthday. The data was sitting one tool call away in the same record that stored their name, designation, and reporting manager. The system had every right to fetch it. It refused anyway because the policy layer saw the words “personal information” in the shape of the question and routed to a generic refusal before the self-service path got to run.
The fix wasn't to loosen the refusal. The fix was to teach the system the order of operations. For any query containing my, mine, or self (“when's my birthday,” “who's my manager,” “what's my employee code,” “when did I join”), the system now always calls the profile tool first. It never refuses without checking. The profile tool returns the user's own record, and the system answers the specific field asked for.
The lesson is bigger than the fix. A safe default isn't the same as a correct default. A system that says no to its own users about their own data isn't safe; it's broken in a way that erodes trust faster than an occasional wrong answer ever would. The refusal vocabulary needs to know what kind of refusal each phrase belongs to, and “personal information” applied to one's own profile is not the same thing as “personal information” applied to a stranger's.
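The order-of-operations fix can be illustrated with a short sketch. Everything here is an assumption for the example: the regular expression (which also matches "I" so that “when did I join” is caught), the `fetch_profile` callable standing in for the profile tool, and the premise that the requested field name has been extracted upstream.

```python
import re
from typing import Callable, Dict, Optional

# First-person markers; the exact word list is an assumption for this sketch.
SELF_PATTERN = re.compile(r"\b(my|mine|myself|i)\b", re.IGNORECASE)


def answer_self_service(
    text: str,
    asker_id: str,
    field_name: str,
    fetch_profile: Callable[[str], Dict[str, str]],
) -> Optional[str]:
    """Call the profile tool first for self-referential questions.

    Returns None when the query is not self-referential, so the normal
    gates can run. It never returns a refusal without checking the profile.
    """
    if not SELF_PATTERN.search(text):
        return None
    profile = fetch_profile(asker_id)  # always the asker's own record
    if field_name in profile:
        return profile[field_name]
    return f"I couldn't find {field_name} on your profile."
```

The lookup runs before any policy wording is considered, so “personal information” about the asker's own record can no longer trigger a generic refusal.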
8. False negative — the pattern that no single query revealed
The other failure mode is the inverse — a class of question that, taken one at a time, passed every gate, and was problematic only as a pattern.
A user asks: “Is Anita in office today?” Anita is in the asker's visible directory; the question is about presence, not personal data. The system answers.
The next day, the same user asks the same thing about Anita. Then about Anita's leave today. Then about Anita's typical hours. Then about Anita's next holiday booking. Each individual query stays within the lines — basic whereabouts, presence, who's on leave today. The information is data the asker is permitted to see. No single query triggers a refusal because no single query deserves one.
The pattern, though, is something else. A specific employee being asked about, repeatedly, by the same colleague, across days, with no working relationship between them in the org chart. That's not a workflow. That's surveillance shaped like workflow.
The fix wasn't to refuse the individual queries — they're legitimate on their face. The fix was a behavioural layer above the per-query gates: a detector that watches the rolling window of who is asking what about whom, looks for repeated interest in a specific person from an asker outside their working relationship, and flags the pattern to administrators for review. The user does not see a refusal. The user is still answered. An administrator gets a quiet alert that says “this is worth a look.”
The lesson here generalises. Some guard rails fire on the query. Others fire on the conversation. The most interesting ones fire on neither — they fire on the cross-conversation aggregate, where the system sees behaviour the individual gates were never supposed to catch.
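A detector of this kind could be as simple as a rolling count per (asker, target) pair. The sketch below is one possible shape, not the production detector: the seven-day window, the threshold of four, and the `InterestMonitor` name are all assumptions, and the working-relationship flag is presumed to come from the org chart.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Tuple


class InterestMonitor:
    """Flags repeated questions about one person from an unrelated asker."""

    def __init__(self, window: timedelta = timedelta(days=7), threshold: int = 4):
        self.window = window        # assumed rolling window
        self.threshold = threshold  # assumed number of queries that warrants review
        self._events: Dict[Tuple[str, str], Deque[datetime]] = defaultdict(deque)

    def record(self, asker: str, target: str, when: datetime,
               working_relationship: bool) -> bool:
        """Record one answered query; return True if an admin should be alerted."""
        if working_relationship:
            return False  # managers and teammates asking is ordinary workflow
        events = self._events[(asker, target)]
        events.append(when)
        # Drop events that have aged out of the rolling window.
        while when - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold
```

The return value only drives an administrator alert; the query itself is still answered, so the user never sees this layer.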
9. Refusal language is part of the architecture
One pattern recurs through everything above and is worth naming on its own. The exact words a system uses to refuse a request are not a UX detail. They are an architectural decision that says something about why the refusal exists.
“I don't have access to that” says: the rules are about data availability. If the data were available, I would comply. That phrasing is correct for some refusals and wrong for others. It's correct when someone asks about an employee outside their team — the data really isn't available to them. It's wrong when someone asks for a ranking by age — the data is available, the refusal is about principle, and claiming unavailability is dishonest.
“That's not the kind of comparison I help with” is the language of principle. “I've flagged this for HR” is the language of escalation. “What can I help you with on leave, attendance, or payroll?” is the language of scope redirect. Each refusal type has a phrasing family, and using the wrong family is the difference between a refusal that builds trust and one that erodes it.
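One way to enforce the phrasing families is to tie each refusal type to its wording and lint for availability language in the wrong place. The templates below quote the article's own examples; `RefusalKind`, `refusal_text`, and `phrasing_is_honest` are hypothetical names, and the substring check is a deliberately crude stand-in for a real review step.

```python
from enum import Enum


class RefusalKind(Enum):
    SCOPE = "scope"
    AVAILABILITY = "availability"
    PRINCIPLE = "principle"
    ESCALATION = "escalation"


# One phrasing family per refusal type, using the article's examples.
PHRASING = {
    RefusalKind.SCOPE: "What can I help you with on leave, attendance, or payroll?",
    RefusalKind.AVAILABILITY: "I don't have access to that.",
    RefusalKind.PRINCIPLE: "That's not the kind of comparison I help with.",
    RefusalKind.ESCALATION: "I've flagged this for HR.",
}


def refusal_text(kind: RefusalKind) -> str:
    return PHRASING[kind]


def phrasing_is_honest(kind: RefusalKind, text: str) -> bool:
    """Availability wording is only acceptable for availability refusals."""
    claims_unavailable = "don't have access" in text.lower()
    return kind is RefusalKind.AVAILABILITY or not claims_unavailable
```

A check like this would have caught the age-ranking refusal claiming missing access before it reached a user.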
10. The audit layer is what makes the system trustable
Every gate's decision — every answer, every refusal, every escalation, every security tag — is written to an audit log. That log is what converts a stream of conversational decisions into a system an enterprise can sign off on.
The audit serves three different audiences. For administrators: a record of what the assistant did and did not answer for any given user, for any given time window. For security: the structured security tags that flag jailbreak attempts, prompt-injection payloads, and bulk-PII probes, surfaced for review without surfacing them to the user. For the model itself, over time: a feedback signal on which kinds of refusals turned out to be false positives, which escalations were warranted, and which patterns deserve new guard rails.
The audit also scrubs personally identifiable information from its own writes — emails, phone numbers, salary numbers, national identifiers are redacted before the log is written, so a forensic read of the audit doesn't become a second copy of the data the assistant was being careful with in the first place. The guard rails apply to the guard rails' own observability layer.
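Redaction before write can be sketched with a pair of regular expressions. This covers only emails and phone numbers, and the patterns are illustrative assumptions; salary figures and national identifiers, which the audit also redacts, would need locale-specific rules that are not shown. `scrub` and `audit_record` are hypothetical names.

```python
import json
import re

# Illustrative patterns only; real redaction would need broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def scrub(text: str) -> str:
    """Replace emails and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def audit_record(user_id: str, outcome: str, decided_by: str, message: str) -> str:
    """Build one audit line; the message is scrubbed before serialisation."""
    return json.dumps({
        "user_id": user_id,
        "outcome": outcome,
        "decided_by": decided_by,
        "message": scrub(message),
    })
```

Scrubbing happens inside `audit_record`, so no caller can write an unredacted message to the log by forgetting a step.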
11. What we still don't trust the model with
Three categories are explicitly fenced off even from the parts of the system that have the highest confidence. Naming them is part of the architecture, because the line between “the model is ready for this” and “the model is not” needs to be visible inside the team.
- Professional judgement. Legal opinions on termination legality, medical-leave recommendations, financial-tax planning. The model can describe a process; it cannot give the opinion. Every query in this space is redirected — to HR, to a labour lawyer, to a tax advisor.
- Coercive content without verification. Warning letters, performance-improvement plans, and termination notes are drafted only when the asker has verified management authority over the named person, and only when the request is not immediately downstream of a complaint by that person in the same conversation. The model can draft a great PIP. The model cannot decide that this is the moment for one.
- Bulk decisions. Anything that names a multi-person action (“identify the bottom ten percent,” “rank candidates for termination,” “decide who to put on PIP”) is refused. Per-person decisions on a reportee's file are fine. Decisions about a list of people, dressed up as a query, are not.
12. The honest open question
The line between refuse, answer, and escalate is not a static one. The most interesting question about guard rails over the next two years is where it moves. Today, mental-health signals route to HR and the EAP. Today, coercive drafting requires explicit verified authority. Today, pre-announcement confidentiality is absolute.
As the models get more trustworthy, the temptation to widen the answer surface grows. Models that produce better refusals could be trusted with more of the questions currently escalated. Models that understand context across turns could relax some of the per-query refusals into per-conversation ones.
The honest open question is whether the right answer gets harder, not easier, as the model gets better. A more capable model is one that can participate in more harm if the guard rails fail. A less capable model fails noisily; a more capable one fails silently and with confidence. The architecture of refusal may need to become more conservative, not less, as the model below it becomes more capable.
A system that says no carefully is more trustworthy than one that says yes confidently.
The guard rails on an HR assistant aren't a veneer on top of the model. They are the architecture. Three gates, three categories of refusal, a behavioural layer above the per-query gates, an audit log underneath all of it — and the editorial discipline to use the right refusal language for the right kind of refusal. The model is the easier part. The gates are the part that decides whether the model gets to be useful.