Verified Truth for Airline AI: Why LLMs and RAG Aren’t Enough

In this episode of Inside PX42, Charles Skamser is joined by Catherine Spencer and Edward Hamilton for a candid discussion about one of the most important issues facing airline transformation leaders: why generic AI tools alone are not enough for high-consequence airline customer service.

The conversation goes beyond the hype around large language models and retrieval-augmented generation to explain why airlines need a verified truth layer early in the workflow—one that connects authoritative policy, live operational context, customer entitlements, and auditable decisioning. The hosts discuss how this approach applies to refunds, rebooking, baggage servicing, loyalty exceptions, disruption recovery, complaint handling, and regulatory response.

Designed for airline executives, digital leaders, customer-service leaders, and transformation teams, this episode ties the technical architecture to business outcomes: lower servicing cost, fewer repeat contacts, better compliance, stronger customer trust, and more defensible automation at scale.


Chapter 1

Why airline executives should care about trusted AI now

Charles Skamser

Welcome to Inside PX42, the industry-leading podcast that provides a roadmap for building the intelligent enterprise with AI Agents. I'm Charles Skamser, co-founder and CEO of PX42 Consulting, and with me, as always, are my PX42 Senior Consultants and AI Experts, Catherine Spencer and Edward Hamilton. Today, we are going to address the challenge of how major airlines can verify that their AI solutions are "telling the truth." I want to start with a scene a lot of airline executives know in their bones: it's 6:12 a.m., a bank of flights starts slipping, weather is hitting one hub, a maintenance issue hits another, the app is lighting up, the call center queue stretches, airport agents are improvising, and somewhere in that mess a customer asks a very simple question -- "Am I entitled to a refund, or not?"

Catherine Spencer

And that question sounds small until you remember what sits underneath it. Not just service. Revenue protection, regulatory exposure, loyalty, brand trust, operational recovery... all compressed into one answer delivered in one moment.

Edward Hamilton

Yes, and that's the crucial thing. The market keeps celebrating AI that sounds polished. But in an airline, polished language is almost the least interesting capability. The issue is whether the answer is CORRECT under the actual policy, the actual customer context, and the actual state of operations at that moment.

Charles Skamser

Exactly. And that's why we're doing this episode of Inside PX42. We're talking about establishing verified and trusted AI for the airline industry -- not AI as a generic technology play, not another chatbot demo, but AI that can operate in a high-volume, heavily regulated, margin-sensitive environment where bad decisions scale just as fast as good ones do. I've spent more than two decades advising C-level leaders in major airlines on technology strategy, operating performance, customer experience, and transformation. And what has changed -- really changed -- is that service, operations, compliance, and brand trust are no longer separate conversations. They're one conversation now.

Catherine Spencer

Charles, when you say you've watched that convergence over twenty-plus years, what's different in 2026 versus, say, even five years ago? Because airlines have always been complex.

Charles Skamser

They have. But the difference now is the speed and scale of digital interaction. Customer service used to be easier to quarantine as a support function. Now it flows across mobile, web, call centers, airport operations, partner ecosystems, and post-travel complaint handling. So when a cancellation happens, you are not merely solving a passenger interaction. You're touching fare rules, inventory, loyalty economics, refund systems, airport constraints, regulatory obligations, and financial exposure -- all at once.

Edward Hamilton

The phrase that sticks for me there is "all at once." Because that is precisely where generic AI tends to come unstuck. It can handle one thing at a time rather well. It struggles when truth is distributed across reservation systems, policy documents, live flight-status feeds, exception rules, and, frankly, old institutional habits.

Catherine Spencer

And airline executives should care now because the deployment pressure is real. Across the market, everyone's being pitched copilots, conversational assistants, intelligent workflow layers, retrieval-based service platforms, autonomous agents. Faster service, lower cost, better containment. Some of that value is real. But if you automate the wrong policy answer at scale, you haven't transformed service -- you've industrialized error.

Charles Skamser

That's beautifully put. Industrialized error. A wrong answer in airline customer service is not a harmless hallucination. It can deny a refund that should've been approved, miss a compensation obligation, mishandle a baggage claim, overlook a loyalty entitlement, botch an accessibility case, or create avoidable legal exposure in complaint response. So the core question for airline leaders is not, "Can AI talk fluently?" The question is: how do we reduce cost and improve service WITHOUT automating policy mistakes at scale?

Edward Hamilton

And I think it's worth saying bluntly -- if an airline cannot explain why a system made a refund decision, it has no business letting that system make one. That's the threshold. Explainability is not decorative in aviation-facing service operations. It's table stakes.

Catherine Spencer

Especially because in an airline, trust is cumulative and fragile. One bad service recovery interaction during disruption can undo years of loyalty messaging. A premium customer will forgive weather sooner than they forgive inconsistency.

Charles Skamser

Right. So today's conversation is really about a different foundation for enterprise AI in airlines: Verified Truth and Verified Policy-as-Code. If you're an airline CIO, COO, head of customer care, chief digital officer, loyalty executive, or transformation leader, this is the architecture question underneath the AI question. And if you get this wrong, you may get a flashy pilot... and a very expensive operating mess.

Chapter 2

The real airline problem is not AI fluency, it is policy truth

Charles Skamser

Let's make the problem concrete. Airline customer service is not one queue with one answer base. It spans refunds, rebooking, baggage, travel credits, vouchers, fee waivers, accessibility accommodations, unaccompanied minor handling, loyalty recognition, complaint intake, regulatory response, and irregular operations at scale. And in each of those domains, the answer rarely depends on one policy or one system.

Edward Hamilton

"One policy or one system" -- that's the bit people underestimate. Because a missed connection, for example, may depend on fare conditions, whether the ticket was partially used, whether a partner carrier is involved, whether the disruption was controllable, what inventory exists now, and whether the passenger has some elite entitlement layered on top.

Catherine Spencer

And if even one of those elements is wrong -- "partially used ticket," let's grab that one -- the answer can sound perfectly sensible and still be operationally false. That's what makes airline service so unforgiving. Plausible is not safe.

Charles Skamser

Exactly. Most airlines do not have one clean, authoritative policy layer governing all of this. They have fragments: knowledge bases, PDF manuals, airport bulletins, legal interpretations, operations playbooks, CRM scripts, refund-system rules, loyalty guidance, training materials, and, let's be honest, institutional memory. Some sources are current. Some are partially current. Some conflict by geography or channel. Some were never properly retired.

Catherine Spencer

Ah yes -- the famous policy source called "that's how the team in Frankfurt has always done it." Every enterprise has a version of that, but in airlines the consequences are sharper.

Edward Hamilton

Quite. And this is why generic chatbots and conventional RAG systems are insufficient for truth-critical servicing. RAG can retrieve relevant passages. Useful, certainly. But retrieved passages are not the same thing as runtime decision integrity. The model may retrieve the general rule and miss the exception. Or retrieve two conflicting passages and have no principled way to decide which governs.

Charles Skamser

That's a big one. The market keeps acting as if better search equals better decisioning. It doesn't. Better search answers, "What content can I find?" Airlines need a system that answers, "What is TRUE in this specific scenario according to the authoritative policy, the live operating facts, and the applicable constraints?" That's a completely different requirement.

Catherine Spencer

Charles, give me a practical failure mode. Not theoretical -- the kind of thing an executive should picture happening on a Tuesday.

Charles Skamser

Sure. Here's one. A refund policy gets retrieved from an outdated knowledge article instead of the authoritative current policy model. The assistant denies the refund. That denial then flows downstream as if it were true -- into complaint handling, finance, reporting. Maybe loyalty doesn't recognize the customer should've received a higher-touch recovery. Maybe the complaint team later inherits the case and never sees that the root determination came from stale policy. By the time someone catches it, you've created repeat contacts, dissatisfaction, supervisory rework, maybe regulatory exposure.

Edward Hamilton

"Flows downstream as if it were true" -- that's the terrifying phrase. Because once one agent, human or machine, labels something as established fact, downstream agents begin treating it as ground reality. In a multi-agent environment, one incorrect eligibility determination can cascade across refunds, loyalty, case closure, complaint handling, and reporting.

Catherine Spencer

And that cascade is where labor-savings slides become fantasy. Because the model on the spreadsheet counts a cheaper first interaction, but not the second call, the supervisor intervention, the complaint, the remediation expense, or the premium customer who quietly decides never again.

Charles Skamser

Yes! That's the hidden cost structure. Another failure mode: conflicting sources. One source states the general rule. Another captures the exception that actually governs the case. A traditional RAG implementation may retrieve one but not the other, or both without understanding priority. The output sounds complete. It is still wrong. And because it sounds confident, it can be MORE dangerous than an obvious failure.
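The conflicting-sources failure mode can be made concrete with a minimal sketch in Python. Everything here is an assumption for illustration: the `Passage` type, the `authoritative` flag, the source names, and the precedence order (an applicable exception overrides the general rule) are hypothetical, not anything the speakers or any vendor specify. The point is only that the choice between passages is made by an explicit rule, and that unresolved conflict is escalated rather than answered fluently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str          # hypothetical source name
    kind: str            # "general" or "exception"
    authoritative: bool  # assumed flag set by a policy owner, not by the retriever
    answer: str


def governing(passages):
    """Return the passage that governs, or raise so a human can decide."""
    approved = [p for p in passages if p.authoritative]
    if not approved:
        raise LookupError("no authoritative source retrieved; escalate")
    # Assumed precedence: an applicable exception overrides the general rule.
    exceptions = [p for p in approved if p.kind == "exception"]
    candidates = exceptions or approved
    if len({p.answer for p in candidates}) > 1:
        raise LookupError("authoritative sources disagree; escalate")
    return candidates[0]


retrieved = [
    Passage("stale-kb-article", "general", False, "deny refund"),
    Passage("fare-rule", "general", True, "offer travel credit"),
    Passage("disruption-exception", "exception", True, "offer refund"),
]
print(governing(retrieved).answer)
```

In this toy setup the stale article is ignored because it is not authoritative, and the exception wins over the general fare rule. Two authoritative exceptions with different answers would raise instead of producing a confident reply.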

Edward Hamilton

Then there's live operational state. A rebooking or refund rule may be perfectly encoded in principle, but if the system isn't connected to reservation state, flight operation, inventory position, payment state, and customer history, the policy can be "correct" abstractly and applied incorrectly practically. That's a marvellous way to create very expensive nonsense.

Catherine Spencer

So what airline executives actually need is not a prettier front end on top of fragmented truth. They need a runtime architecture that can reconcile policy, customer state, operational state, and business consequence in the moment of decision.

Charles Skamser

That's it. And that is why I keep saying the airline industry does not need another thin generative layer sitting on top of stale SOPs and loosely governed workflow logic. It needs a truth-centered transformation architecture. Not digital self-service theater. Trusted servicing at scale.

Chapter 3

What Verified Truth means in practice

Edward Hamilton

All right, let's define the thing properly. Verified Truth, as Charles uses it, is not a slogan. It's an operating requirement. It means AI-generated recommendations, customer responses, workflow decisions, and automated actions are grounded in validated enterprise truth before they are delivered or executed.

Catherine Spencer

And there are five dimensions, yes? Let's do them carefully, because this is the spine of the whole conversation.

Charles Skamser

Five dimensions. First, Source Truth. The system must know which authoritative source supports the answer or action. Not "some retrieved passage." An approved policy, a fare rule, a tariff, a regulatory obligation, a contract term, a customer entitlement, a live operational record. Second, Policy Truth. The policy applied must be the RIGHT policy, the correct version, in force at that time, and actually applicable to the scenario.

Edward Hamilton

"Correct version, in force at that time" -- that's where effective dates matter, yes? Because a policy may be valid in April and invalid in June. If the system cannot bind a decision to the policy version that actually governed on the date of the event, the audit trail is already compromised.

Charles Skamser

Precisely. Third is Context Truth. What is actually true about the customer and event at runtime? Itinerary status, disruption condition, fare family, ticket attributes, loyalty tier, airport context, partner involvement, payment state, service history, prior adjustments, exception conditions. Fourth is Decision Truth. The system must explain why it reached the answer -- the reasoning path, the facts used, the rules and constraints applied. Fifth is Action Truth. If the system takes action -- refund, rebook, waive a fee, escalate a case, trigger compensation, route a complaint -- that action must be authorized, compliant, recorded, and auditable.
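The effective-date requirement under Policy Truth lends itself to a short sketch. This one is in Python and rests on assumptions: the `PolicyVersion` type, the policy identifier, and the dates are all hypothetical. It shows one way a decision could be bound to exactly one policy version in force on the date of the event, and refuse to proceed when zero or several versions match.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str                # hypothetical identifier
    version: str
    effective_from: date
    effective_to: Optional[date]  # None means still in force


def version_in_force(versions, policy_id, event_date):
    """Return the single version of a policy that governed on event_date."""
    matches = [
        v for v in versions
        if v.policy_id == policy_id
        and v.effective_from <= event_date
        and (v.effective_to is None or event_date <= v.effective_to)
    ]
    if len(matches) != 1:
        # Zero or overlapping versions is a governance defect, not a guess.
        raise LookupError(f"{len(matches)} versions match {policy_id} on {event_date}")
    return matches[0]


registry = [
    PolicyVersion("refund-disruption", "v1", date(2026, 1, 1), date(2026, 5, 31)),
    PolicyVersion("refund-disruption", "v2", date(2026, 6, 1), None),
]
print(version_in_force(registry, "refund-disruption", date(2026, 4, 15)).version)
```

Keying the lookup on the event date rather than on today's date is what lets a later audit reproduce the same answer.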

Catherine Spencer

Let me try to play that back in a more operational way. Source Truth is: where did this come from? Policy Truth is: was it the correct rule? Context Truth is: was it the correct rule for THIS passenger in THIS moment? Decision Truth is: can we reconstruct why? Action Truth is: was the outcome actually permitted and logged?

Charles Skamser

That's exactly right.

Edward Hamilton

And if any one of those five is missing, you don't have trusted AI. You have a nice interface sitting atop ambiguity.
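One way to picture the "all five or nothing" test is a decision record that cannot be released while any dimension is empty. The Python sketch below is an assumption-laden illustration: the `DecisionRecord` fields and the `missing_truths` helper are hypothetical names, and a real system would validate each field far more deeply than checking that it is present.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecisionRecord:
    source_ref: Optional[str] = None       # Source Truth: which authoritative source
    policy_version: Optional[str] = None   # Policy Truth: which version governed
    context_facts: dict = field(default_factory=dict)  # Context Truth: runtime facts
    reasoning: list = field(default_factory=list)      # Decision Truth: why
    action_authorized: bool = False        # Action Truth: permitted...
    action_logged: bool = False            # ...and recorded


def missing_truths(rec):
    """List the dimensions a record fails to establish (empty list = releasable)."""
    checks = {
        "source": bool(rec.source_ref),
        "policy": bool(rec.policy_version),
        "context": bool(rec.context_facts),
        "decision": bool(rec.reasoning),
        "action": rec.action_authorized and rec.action_logged,
    }
    return [name for name, ok in checks.items() if not ok]


draft = DecisionRecord(
    source_ref="refund-disruption",
    policy_version="v2",
    context_facts={"disruption_cause": "controllable"},
    action_authorized=True,
)
print(missing_truths(draft))
```

The draft above carries a source, a policy version, and context, but no reasoning and no log entry, so it is reported as incomplete on the decision and action dimensions.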

Charles Skamser

Let's use airline examples. Refund eligibility: a premium customer on a partially used itinerary experiences a cancellation on the final leg due to a controllable maintenance issue. A basic chatbot might look at a general nonrefundable fare rule and offer a credit. A verified-truth architecture validates the disruption cause, recognizes the controllable nature of the event, identifies the partially used ticket state, checks loyalty tier, references the governing refund and reaccommodation policy, evaluates alternate inventory and partner options, and produces a policy-compliant set of actions.

Catherine Spencer

"Partially used final leg due to controllable maintenance" -- that exact combination is the difference between generic AI and enterprise AI. It's not one fact. It's the intersection.

Edward Hamilton

Baggage gives us another one. A delayed bag involving a partner segment. One system says the bag is in transit. Another says not loaded. An outdated reimbursement article says one thing; current liability rules imply another. Verified Truth reconciles chain-of-custody facts, carrier relationship, itinerary context, and applicable reimbursement policy before telling either the employee or the customer what can actually be offered.

Catherine Spencer

And accessibility cases are where the human stakes become especially obvious. If a system handles disability-related servicing incorrectly, that is not merely a workflow defect. That's a trust and governance failure with potentially serious legal consequences. So human oversight there is not optional window dressing.

Charles Skamser

Which brings us to production design. A truth-centered architecture has several layers working together. At the top is the experience layer -- web assistant, mobile assistant, contact-center copilot, airport-agent guidance, complaint tools. Beneath that is orchestration -- specialized agents for refunds, rebooking, loyalty, baggage, complaints, escalation, approvals. At the CENTER sits the Verified Truth Layer. That's where verified facts, provenance, truth relationships, and conflict reconciliation live.

Edward Hamilton

And next to that, crucially, the Verified Policy-as-Code layer. Policy source registry, canonical policy objects, decision tables, effective-date control, version control, exception libraries, simulation, scenario testing, rollback capability, runtime telemetry. In other words: not just encoded rules, but governed encoded rules.

Catherine Spencer

Charles, I want to grab "canonical policy objects" because that's jargon people may nod at without truly hearing it. What do you mean in plain English?

Charles Skamser

Fair challenge. It means we do NOT try to treat every SOP and narrative document as the master execution layer. Instead, we identify authoritative policy sources, normalize their key propositions into structured policy objects -- the rules, conditions, exceptions, approvals, obligations -- and then use that same truth foundation to generate both human-readable SOP guidance and machine-executable policy logic. One truth base, multiple outputs.

Catherine Spencer

So instead of rewriting the entire library of enterprise folklore first, you identify what is authoritative, normalize it, validate it, and then let both people and machines consume the synchronized version.
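The "one truth base, multiple outputs" idea can be sketched as a single structured object that renders both a sentence for a human and a check for a machine. The Python below is illustrative only: `PolicyObject`, its fields, the fact names, and the example outcome are hypothetical and do not represent any airline's actual rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyObject:
    policy_id: str    # hypothetical identifier
    conditions: dict  # fact name -> required value
    outcome: str

    def applies(self, facts):
        """Machine-executable form: do the runtime facts satisfy every condition?"""
        return all(facts.get(k) == v for k, v in self.conditions.items())

    def to_sop_text(self):
        """Human-readable form generated from the same conditions."""
        conds = " and ".join(f"{k} is {v}" for k, v in sorted(self.conditions.items()))
        return f"[{self.policy_id}] If {conds}, then {self.outcome}."


rule = PolicyObject(
    policy_id="REFUND-EX-1",
    conditions={"ticket_state": "partially_used", "disruption_cause": "controllable"},
    outcome="offer refund of unused segment",  # invented outcome, for illustration
)
facts = {"ticket_state": "partially_used", "disruption_cause": "controllable", "tier": "elite"}
print(rule.applies(facts))
print(rule.to_sop_text())
```

Because the guidance text and the executable check are both derived from `conditions`, changing the rule in one place changes what agents read and what the system enforces together.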

Charles Skamser

Yes. Because trying to manually rewrite every internal SOP before capturing AI value is often transformation drag. Airlines change too fast -- regulations evolve, airport procedures shift, waivers appear, partner arrangements move, loyalty rules change, operational contingencies vary with weather and seasonality. By the time the rewrite is finished... parts of it are already stale.

Edward Hamilton

That's an important point. The objective is not perfect information everywhere. It's authoritative information where decisions matter most first. That's a much more rational readiness model.

Charles Skamser

And this is also why PX42 typically recommends a human-in-the-loop rollout. Start with truth-based copilots. Let the system advise, show its reasoning, expose which facts and policy objects it used, and route higher-risk actions through approval bands. Over time, as accuracy and auditability are demonstrated, selected routines can move into tightly governed automation.
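Approval bands are easy to picture as a small routing function. The Python sketch below is a guess at the shape, not a description of PX42's method: the band names, the monetary thresholds, and the list of categories that always require a human are all assumptions chosen for illustration.

```python
from enum import Enum


class Band(Enum):
    AUTO = "auto"
    AGENT = "agent_approval"
    SUPERVISOR = "supervisor_approval"


# Assumed categories where a human always signs off, whatever the amount.
ALWAYS_HUMAN = {"accessibility", "regulatory_complaint"}


def approval_band(category, amount, auto_limit=0.0, agent_limit=200.0):
    """Route a proposed action to an approval band (thresholds are hypothetical)."""
    if category in ALWAYS_HUMAN:
        return Band.SUPERVISOR
    if amount <= auto_limit:
        return Band.AUTO
    if amount <= agent_limit:
        return Band.AGENT
    return Band.SUPERVISOR


print(approval_band("refund", 150.0))
print(approval_band("refund", 150.0, auto_limit=250.0))
```

Starting with `auto_limit=0.0` keeps every monetary action advisory. Raising it later is the code-level equivalent of moving a routine into governed automation once accuracy has been demonstrated.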

Catherine Spencer

Which is how trust should be earned in an airline. Not announced. Earned.

Chapter 4

Business value, ROI, and the hidden cost of getting AI wrong

Catherine Spencer

Let's talk money, because if this remains a philosophy conversation it won't survive the budget process. The business case in airline terms is lower contact-center cost, yes, but also better digital containment, fewer repeat calls, less rework, lower complaint burden, improved premium-customer retention, and reduced legal and regulatory exposure.

Charles Skamser

Exactly. And I want to emphasize something that gets lost all the time: the real value driver is not call deflection by itself. It's ACCURATE containment. If you reduce contact volume while increasing refund errors, repeat contacts, complaint escalation, loyalty dissatisfaction, or inconsistent answers across channels, you haven't transformed service. You've shifted cost and created hidden liabilities.

Edward Hamilton

"Accurate containment" is the phrase airline executives should probably write down. Because a superficially cheaper AI model can be much more expensive once you price in the downstream consequences of being wrong.

Catherine Spencer

Take irregular operations. During weather, maintenance disruptions, airport congestion, crew issues, partner failures -- contact volume surges. If the service AI misinterprets disruption rules or lacks live operating context, repeat contacts multiply, escalations rise, and the airport teams end up cleaning up digital mistakes under maximum stress. That hidden downstream cost can exceed the labor saving that justified the AI in the first place.

Charles Skamser

That's right. In disruption servicing, Verified Policy-as-Code creates reusable consistency. It encodes disruption rules and service-recovery logic into machine-executable form that can be validated, tested, updated, and reused across web, mobile, contact center, and airport operations. That protects service quality when inconsistency is most costly.

Edward Hamilton

Baggage is another quiet value pool. People often think baggage is merely a tracking issue. It isn't. It's chain of custody, compensation obligations, fee refund rules, service-recovery options, partner coordination, liability exposure. Even modest gains in reimbursement accuracy and resolution consistency can produce meaningful avoided cost -- and, perhaps more importantly, reduce the emotional intensity of the experience for the passenger.

Catherine Spencer

And loyalty servicing may be the most undercounted category of all. Elite recognition, waivers, companion benefits, upgrade exceptions, premium recovery. A single interaction may not look enormous financially, but across high-value segments it affects retention and lifetime value. Mishandle premium travelers often enough and the P&L eventually notices.


Charles Skamser

Complaint handling and regulatory response are also major value areas. When a complaint enters the system, especially in a regulated category, the airline doesn't just need polished prose. It needs a defensible reconstruction: what happened, which policy applied, what the customer was entitled to, how the case was handled, and whether the resolution was appropriate. A verified architecture can support fact gathering, policy linkage, explanation generation, and replay-ready case reconstruction.

Edward Hamilton

"Replay-ready" matters enormously. If legal, compliance, or a regulator asks later, "Why did you deny this?" the answer cannot be, "Well, the model looked at some documents and seemed fairly confident."

Catherine Spencer

So how should leaders measure success? Because average handle time alone is obviously not enough.

Charles Skamser

PX42 recommends measuring across five dimensions together. First, customer performance: digital containment, first-contact resolution, repeat-contact reduction, complaint volume, resolution speed, satisfaction, consistency across channels. Second, financial performance: cost per contact, avoided servicing expense, refund and compensation accuracy, reduced rework, reduced leakage, premium-customer retention effects, ROI. Third, operational performance: handle time, escalation rates, supervisor intervention, baggage-resolution cycle time, disruption-resolution efficiency, throughput during high-volume events.

Edward Hamilton

And fourth is policy and compliance performance -- policy adherence, decision replay success, complaint-response timeliness, audit completeness, traceability of exceptions, override patterns. Fifth is AI and truth-layer performance -- decision accuracy against test scenarios, drift detection, provenance completeness, conflict detection, truth-validation coverage, confidence thresholds for moving from advisory to semi-automated and then automated execution.
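"Decision accuracy against test scenarios" can be measured with a very small harness. The Python sketch below assumes a hypothetical `decide` function and hand-written scenarios; the facts, expected outcomes, and the toy decision logic are invented for illustration and stand in for whatever decision service is under test.

```python
def scenario_accuracy(decide, scenarios):
    """Run each scenario through decide() and report accuracy plus failing names."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    failures = [s["name"] for s in scenarios if decide(s["facts"]) != s["expected"]]
    accuracy = (len(scenarios) - len(failures)) / len(scenarios)
    return accuracy, failures


def toy_decide(facts):
    # Deliberately naive stand-in: it ignores whether the ticket is partially used.
    return "refund" if facts.get("disruption_cause") == "controllable" else "credit"


scenarios = [
    {"name": "controllable-unused", "expected": "refund",
     "facts": {"disruption_cause": "controllable", "ticket_state": "unused"}},
    {"name": "weather-unused", "expected": "credit",
     "facts": {"disruption_cause": "weather", "ticket_state": "unused"}},
    {"name": "weather-partial", "expected": "credit",
     "facts": {"disruption_cause": "weather", "ticket_state": "partially_used"}},
    {"name": "controllable-partial", "expected": "partial_refund",
     "facts": {"disruption_cause": "controllable", "ticket_state": "partially_used"}},
]
print(scenario_accuracy(toy_decide, scenarios))
```

Reporting the failing scenario names alongside the score matters more than the score itself, since a single missed edge case (here, the partially used ticket) is exactly the kind of error that scales.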

Catherine Spencer

I like that you grabbed all five, Edward, because if an airline measures only the first three, it can congratulate itself on efficiency while quietly accumulating risk in the back room.

Charles Skamser

Exactly. The hidden cost of weak customer-service AI is often larger than the visible cost of human labor. One wrong answer can create a second contact, a supervisor review, a complaint, a compensation dispute, a refund rework cycle, a loyalty dissatisfaction event, maybe even a regulatory response obligation. That's why the economic model has to be layered: direct service economics, policy-integrity economics, and enterprise-value economics.

Edward Hamilton

So, not "the bot saved two minutes," but "the system reduced cost without poisoning the rest of the enterprise." A slightly sterner KPI, but a far better one.

Chapter 5

Why PX42 and Reliath matter—and who Charles wants to work with next

Charles Skamser

This is where PX42's role becomes important. We are not approaching the airline opportunity as a one-size-fits-all chatbot deployment. We frame it as a truth-centered decision architecture for customer service and policy execution. That means strategic framing, operating model design, policy normalization, truth architecture, governance, phased adoption, and a very practical path from experimentation to production.

Catherine Spencer

And that "phased" part matters. Because the sensible roadmap is not: buy platform, ingest documents, declare transformation. It's current-state assessment, truth-gap analysis, authoritative-source mapping, canonical policy foundation, then a Verified Policy-as-Code pilot in one or two high-value domains, then copilot deployment with human approval structures, then multichannel rollout, then continuous optimization.

Edward Hamilton

And our partner Reliath sits at the center as the truth-based foundation, yes? Let's be specific about why that matters.

Charles Skamser

Yes. Reliath provides a truth-based AI platform that models verified facts rather than relying on token prediction as the primary unit of intelligence. It builds what it calls Truth Profiles. In its architecture, the basic unit is the factoid, not the token. It preserves provenance, assigns truth values to facts, distinguishes verified information from hypothetical or fabricated information, reconciles conflict, and supports reasoning from a truth-oriented foundation.

Edward Hamilton

"Factoid, not token" -- that's the architectural distinction listeners should catch. Most enterprise AI stacks still rely on retrieval and prompt construction to approximate truth. This approach makes truth structure central to the model itself.

Catherine Spencer

And for airlines, that means Reliath can serve as the Verified Truth Layer inside customer service and Policy-as-Code architecture -- supporting refunds, rebooking, baggage, disruption management, complaint handling, regulatory response, and other high-consequence interactions where provenance and replay are not optional luxuries.

Charles Skamser

Exactly. Each customer gets its own model and source of truth, with private deployment options including SaaS or on-premises depending on requirements. No cross-training between customers, no leakage of one client's information into another client's model. For airline environments, where privacy, control boundaries, and enterprise isolation matter, that's important.

Catherine Spencer

Charles, I'd also underline something from the PX42 perspective. Your differentiation isn't just the technology stack. It's the willingness to define the initiative correctly in the first place. Not "how do we deploy AI fast?" but "how do we build trusted decisioning in a domain where policy integrity, live context, and business outcome must stay aligned?"

Charles Skamser

That's exactly how we think about it. PX42 helps clients identify where AI can reduce cost, improve service quality, increase resilience, protect revenue, strengthen compliance, and improve executive visibility -- but we tie those outcomes to architecture and operating discipline. We don't believe the market needs more disconnected pilots or superficial generative overlays that fall apart when they hit enterprise complexity.

Edward Hamilton

And to be fair, Charles, there will be executives listening who wonder whether this is too rigorous, too heavy, too slow.

Charles Skamser

I get that question all the time. My answer is simple: in high-consequence airline servicing, rigor is cheaper than scaling bad logic. The price of safe automation is source mapping, version control, effective-date control, owner approval, testing of edge cases, conflict detection, simulation, rollback, replay. That may sound heavy in a demo culture. In production, it's what keeps you out of trouble.

Catherine Spencer

And it creates a more modern organization, frankly. One where policy ownership doesn't disappear -- it becomes operationalized. Human expertise, legal interpretation, service design... those become more visible and reusable, not less important.

Charles Skamser

That's right. So let me close with this very clearly. I am looking to work with airline executives who are genuinely committed to leveraging AI and fully understand the concept and value of Verified Truth. Not an airline executive willing to settle for another flashy demo. Not an airline executive who wants to bolt a chatbot onto fragmented policy and call it transformation. I want to work with an airline leadership team that is ready to build trusted AI, trusted decisions, and trusted automation the right way.

Edward Hamilton

A small number of airline executives who want to lead. Executives who are genuinely interested in establishing a legacy of being AI thought leaders.

Catherine Spencer

And for the right executives, the upside is substantial: lower cost, better service, stronger policy integrity, better recovery during disruption, more defensible complaint handling, and a foundation for governed automation that can actually scale.

Charles Skamser

Exactly. The future of airline AI will not be won by whoever has the flashiest interface. It will be won by the airline that can prove its systems are grounded in the right facts, the right policy, the right context, and the right control path. That's the move from AI that sounds impressive to AI that can actually be trusted. And in airlines, I think that's the difference that will define the next several years.

Catherine Spencer

Thanks for listening.

Edward Hamilton

Until next time.