First Principles of AI

A Practical Guide for Business Professionals

30 min read

Stefanos Damianakis
President, Zaruko

Introduction

Why This Article

AI is everywhere. It shows up in product pitches, investment decks, board presentations, and every other headline. The problem is that most explanations fall into one of two categories: too technical to be useful, or so simplified they're misleading.

Meanwhile, hype and snake oil are everywhere. Companies slap "AI-powered" on products that are barely automated. Vendors promise predictions that violate basic principles of what's learnable. Investors chase trends without understanding what the technology can actually do.

This article grew out of talks I gave over the past few years at regional business and technology events. The audiences were mostly business professionals: executives, investors, and business leaders trying to make sense of AI. The conversations after those talks convinced me that what people need isn't more hype or more jargon. They need a framework for thinking clearly about what AI can and cannot do. I've seen too many people get burned by AI claims that sounded plausible but violated basic principles.

You need a way to think for yourself about AI claims. That's what this article provides.

Why First Principles

First principles are foundational assumptions you treat as true for reasoning. In physics: conservation of energy. In geometry: two points define a line. You build everything else on top of these. They are the starting point.

First principles thinking means reasoning up from these foundations rather than by analogy or convention. Instead of asking "how have others solved this?" you ask "what is fundamentally true, and what can I build from there?"

Elon Musk described this approach using batteries as an example. A decade ago, battery packs cost several hundred dollars per kilowatt-hour, and conventional wisdom held that this was simply what batteries cost. First principles thinking asks a different question: what are batteries actually made of? Musk described a mental exercise where he broke battery packs down to their raw materials, priced on commodity markets: cobalt, nickel, aluminum, carbon, polymer separators. The notional materials cost came to roughly $80 per kilowatt-hour.

The gap between raw materials and finished product wasn't physics. It was manufacturing complexity, legacy processes, and supply chain inefficiencies. First principles revealed the opportunity that others missed because they accepted the status quo. Over the following decade, battery costs fell dramatically, though raw materials are only one component of pack cost.

The power of this approach: you can reason about situations you've never seen before. You don't memorize facts. You derive answers from foundations. When someone makes a claim, you check it against fundamentals. You spot nonsense because it violates first principles.

Applied to AI

AI is surrounded by hype, jargon, and marketing. Most people learn about AI through analogies: "it's like a brain," "it thinks," "it understands." These analogies mislead more than they help. They anthropomorphize a technology that works nothing like human cognition.

First principles cut through the fog. Once you understand how AI actually works at a fundamental level, you can evaluate any claim. You don't need to understand the math. You need to understand the core truths that constrain what's possible.

What You'll Get From This Article

By the end, you'll have a small set of foundational truths about how AI actually works. You'll have mental models to evaluate any AI product, pitch, or headline. You'll know the right questions to ask. And you'll have a framework that won't become obsolete as the technology evolves, because first principles don't change even when implementations do.

Who This Is For

This article is for business leaders evaluating AI products and vendors. It's for investors assessing AI companies and claims. It's for executives deciding where AI fits in their organizations. It's for anyone who wants to understand AI rather than simply accept what they're told about it.

What This Is Not

This is not a technical manual. It's not a guide to building AI systems. It's not a prediction of where AI is headed. It's the fundamentals, explained clearly, so you can reason for yourself.

---

Executive Summary

This one-page summary captures the core ideas. Read this first, then dive deeper into the sections that matter most for your situation.

The Ten Principles (One-Liners)

  1. AI learns from examples, not instructions. It finds patterns in data; nobody programs the rules.
  2. AI finds patterns, not meaning. It detects correlations without understanding cause or context.
  3. A model only knows what's in its training data. No data, no capability. Biased data, biased model.
  4. No signal, no learning. If the pattern doesn't exist or is buried in noise, AI can't find it.
  5. Generalization is the whole game. Memorizing training data is easy; performing on new data is hard.
  6. Different problems need different models. There's no single 'AI.' Wrong tool, wrong results.
  7. Confidence scores are not truth scores. A model can be 99% confident and completely wrong.
  8. AI can't predict the future (unless physics applies). Markets, politics, and black swans resist forecasting.
  9. Scale matters, but can't overcome missing signal. More data helps only if there's something to learn.
  10. Trust but verify. Use AI to accelerate work, not replace judgment.

Five Red Flags in Vendor Pitches

  • "99% accuracy" on complex, real-world problems
  • Predictions for inherently low-signal domains (stock prices, political events)
  • No explanation of training data or methodology
  • Demo-only results with no production track record
  • "AI-powered" used as pure marketing with no specifics

Five Questions to Ask in Any AI Demo

  • What was this trained on, and how recent is the data?
  • How was performance measured, and on what data?
  • What happens when it encounters something outside its training?
  • Has this been validated in production, not just in the lab?
  • What's the plan for handling drift and model decay over time?

---

Part I: Understanding AI

The Paradigm Shift

The Old Way: Programming

For decades, software worked the same way. Humans wrote explicit rules. Computers followed those rules exactly. If you wanted a computer to recognize spam, you wrote rules about what spam looks like: "if the subject line contains 'FREE MONEY' and the sender is unknown, mark as spam."
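
To see how quickly hand-written rules become unwieldy, here is a minimal sketch in Python of that spam rule. Every phrase and condition is a hypothetical rule a human had to think of and type in; real filters of this era accumulated thousands of them.

    # Rule-based spam detection: humans write every rule explicitly.
    def is_spam(subject: str, sender_known: bool) -> bool:
        suspicious_phrases = ["FREE MONEY", "ACT NOW", "YOU HAVE WON"]   # hand-picked, hypothetical
        shouting = subject.isupper() and len(subject) > 10               # another hand-written rule
        matches_phrase = any(p in subject.upper() for p in suspicious_phrases)
        return (matches_phrase and not sender_known) or shouting

    print(is_spam("FREE MONEY inside!!!", sender_known=False))      # True
    print(is_spam("Quarterly report attached", sender_known=True))  # False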

This approach still works well for many problems. Accounting software, database queries, word processors: these are all built on explicit rules that humans wrote down.

But some problems have too many rules to write by hand. How do you write rules to recognize a cat in a photo? A cat can be any color, any size, any position, partially hidden, photographed from any angle, in any lighting. The number of rules you'd need is effectively infinite. Humans can recognize cats instantly, but we can't articulate the rules we're using.

The New Way: Model Training

The breakthrough of modern AI is a different approach. Instead of writing rules, you provide examples. Instead of telling the computer what spam looks like, you show it thousands of emails labeled "spam" or "not spam." The computer finds the patterns itself.

Those learned patterns become the implicit "rules." Nobody wrote them down. Nobody can fully explain them. You can't look inside the model and read the rules; except in small, trivial models, all you'd find is a collection of numbers that humans can't meaningfully interpret. But the rules work.

This is the paradigm shift worth understanding: we went from writing code that does something to writing code that learns to do something. This single idea underpins everything else in this article.

Why This Matters

This shift means we can now solve problems that were impossible before. Image recognition, speech-to-text, language translation, game playing at superhuman levels: all of these became possible once we stopped trying to write rules and started training systems on examples.

But it also changes what "working" means. With traditional software, you can trace exactly why you got a result. With trained models, the rules are implicit in millions of numerical parameters. The model works, but nobody can fully explain why.

And critically: the model is only as good as the examples it learned from. Bad training data produces a bad model, no matter how sophisticated the algorithm or how powerful the hardware. Garbage in, garbage out still applies, but with higher stakes: flawed data doesn't just produce bad outputs, it can produce a model that seems to work and then fails catastrophically when deployed. The model learned the wrong patterns, and you may not discover that until it's too late.

What Didn't Change

Computers still just execute instructions. There's no magic, no understanding, no consciousness. It's math at enormous scale.

The instructions changed from "follow these business rules" to "adjust these parameters to minimize errors on training examples." But it's still computation. Unlike traditional software, many AI systems, especially generative models, can produce different outputs for the same input. This non-deterministic behavior is a feature, not a bug, and we'll explore what it means shortly.

Why this matters: AI isn't a thinking entity that might surprise us with genuine insight. It's a pattern-matching system that can only reflect patterns present in its training data.

How Training Works

The Basic Process

Before diving into the mechanics, a warning: the process I'm about to describe is straightforward, and that's precisely the trap. It's easy to think "I have data, I'll train a model, problem solved." This mindset leads to most AI failures. The questions you should be asking come before you ever start training: Is there a learnable pattern in this data? Do my inputs actually contain information that predicts my outputs? Is the signal strong enough to learn? We'll cover these questions in detail. But first, the mechanics (and where you should not start).

Training an AI model follows a consistent process. You start with a problem you want to solve: classify images, predict prices, generate text, whatever the task is.

Then you gather labeled examples. If you want to classify images of cats versus dogs, you need thousands of images labeled "cat" or "dog." If you want to predict house prices, you need historical data on houses and what they sold for.

You choose a model architecture appropriate to the problem. Different problems need different structures, which we'll cover later.

Then you train. The model makes predictions on your training examples, compares its predictions to the correct answers, and adjusts its internal parameters to reduce errors. This happens millions of times. Gradually, the model gets better at the task.

Finally, you test. You check performance on data the model hasn't seen during training. This is the moment of truth.
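
As a concrete (and deliberately simplified) sketch of these steps, here is the whole loop in a few lines of Python using the scikit-learn library and one of its bundled datasets. Real projects differ mostly in the data gathering and labeling, not in the shape of this loop; the dataset and model choice below are illustrative only.

    # Gather labeled examples, pick a model, train, then test on held-out data.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X, y = load_breast_cancer(return_X_y=True)            # labeled examples
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)             # hold back 20% for testing

    model = LogisticRegression(max_iter=5000)             # choose a model architecture
    model.fit(X_train, y_train)                           # training: adjust parameters to reduce error

    print("accuracy on data the model never saw:",
          round(accuracy_score(y_test, model.predict(X_test)), 3))   # the moment of truth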

The Goal: Generalization

The goal of training isn't to memorize your examples. It's to learn the underlying pattern so the model can handle new situations it wasn't explicitly trained on.

This is called generalization. A model that generalizes has learned something real about the problem. It can classify cats it's never seen before because it learned what "cat-ness" looks like, not just what those specific training images looked like.

Generalization is the whole game. Without it, you just have an expensive lookup table.

The Failure Mode: Overfitting

The most common failure is overfitting. The model learns the training data too well. It picks up noise, quirks, and coincidental patterns specific to that particular data set rather than the underlying pattern.

An overfit model performs brilliantly on training data and terribly on new data. It memorized the answers instead of learning the pattern.

This is where misleading AI demos hide. A vendor shows you impressive accuracy numbers, but those numbers are on the training data or a test set that's too similar to the training data. In the real world, the model falls apart.
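
Here is a toy illustration of the effect, using made-up noisy data: a model that is free to memorize looks nearly perfect on its own training data and much worse on fresh data from the same process, while a constrained model scores similarly on both.

    # An unconstrained model memorizes noise; a constrained one learns the pattern.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)        # true pattern + noise

    X_new = rng.uniform(-3, 3, size=(200, 1))                    # fresh data, same process
    y_new = np.sin(X_new[:, 0]) + rng.normal(scale=0.5, size=200)

    overfit = DecisionTreeRegressor(max_depth=None).fit(X, y)    # free to memorize every point
    modest = DecisionTreeRegressor(max_depth=3).fit(X, y)        # forced to keep it simple

    for name, m in [("overfit", overfit), ("modest ", modest)]:
        print(name, "train:", round(r2_score(y, m.predict(X)), 2),
              " new:", round(r2_score(y_new, m.predict(X_new)), 2))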

The Other Failure: Underfitting

The opposite problem is underfitting. The model hasn't learned enough. It's too simple for the problem, or it hasn't trained long enough. Performance is poor everywhere, training data and new data alike.

Underfitting is usually easier to spot and fix. Overfitting is the sneaky one.

The Testing Trap

Standard practice is to hold out some data for testing. Train on 80%, test on the remaining 20%. If the model performs well on the test set, you have some evidence it generalizes. In practice, teams use validation sets, cross-validation, and time-based splits for time series data to get more robust estimates.

But test sets can be flawed. If your test set is too similar to your training set, you're not really testing generalization. The real world is always messier than any test set.

And this is where things usually break.

Distribution shift happens when the data you encounter in production differs systematically from training data. Customer behavior changed. Market conditions shifted. You're serving a different population than you trained on.

One common form of distribution shift is non-stationarity: the world changes over time. Customer preferences evolve. Markets shift regimes. Competitors adapt. Regulations change. A model trained on last year's data learned last year's patterns. If those patterns no longer hold, the model fails, often without warning. "It worked before" is never a guarantee it will work tomorrow.

There's another subtle trap that catches even experienced teams: feedback loops. Once deployed, a model can change the very behavior it's trying to predict. A recommendation system influences what people buy, which changes the purchase data, which changes what the model learns. A fraud detection system catches certain patterns, so fraudsters adapt. A hiring algorithm affects who gets interviewed, which changes the pool of successful hires. These feedback loops cause models to drift, reinforce biases, or optimize for the wrong outcomes without anyone noticing.

"Works in the lab" doesn't mean "works in production." This gap has killed more AI projects than any technical limitation.

Signal and Noise

The Core Concept

Every dataset contains signal and noise. Signal is the pattern you want to learn. Noise is random variation that obscures the signal.

The ratio between them determines how learnable a problem is. High signal-to-noise means the pattern is clear and learning is possible. Low signal-to-noise means the pattern is buried or doesn't exist, making learning hard or impossible.

This is perhaps the most important concept for evaluating AI claims. Some problems are fundamentally learnable. Others are not. No amount of AI sophistication can overcome the absence of signal.
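
A small simulation makes this concrete. The setup below is artificial: a known linear pattern plus increasing amounts of random noise. The learning procedure never changes; only the signal-to-noise ratio does, and performance on new data collapses with it.

    # Same model, same amount of data; only the noise level changes.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 5))
    signal = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])            # the true pattern

    for noise_scale in [0.1, 1.0, 10.0, 100.0]:
        y = signal + rng.normal(scale=noise_scale, size=2000)    # pattern buried in noise
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = LinearRegression().fit(X_tr, y_tr)
        print(f"noise level {noise_scale:>5}: R^2 on new data = "
              f"{r2_score(y_te, model.predict(X_te)):.2f}")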

Why Some Problems Are Easy

Recognizing a cat in a photo is a high-signal problem. Cats have consistent features: ears, whiskers, body shape, movement patterns. The visual information reliably indicates "cat" or "not cat." The signal is strong, and with enough examples, a model can learn to detect it.

Speech recognition is similar. The acoustic patterns that correspond to words are consistent. There's noise in the form of accents, background sounds, and recording quality. But the signal is strong enough to cut through.

Why Some Problems Are Hard

Predicting tomorrow's stock price is a low-signal problem. Stock prices are influenced by countless factors: company performance, market sentiment, geopolitical events, interest rates, random human decisions, news that hasn't happened yet. Historical patterns don't reliably predict future movements because the underlying dynamics are too complex and too influenced by unknowable factors.

The signal, if it exists at all, is weak. Random noise dominates. You can find patterns in historical data, but those patterns don't reliably repeat because they were mostly noise to begin with.

Predicting undersea terrain from surface satellite data faces the same problem. The signal is low (the surface reveals little about what's far below) and the noise is high (hundreds of meters of water obscure the terrain details). The information you need simply isn't in the data you have.

The Spectrum of Learnability

Problems exist on a spectrum based on their signal and noise characteristics.

High signal, low noise (AI works well): Image classification, speech recognition, language translation, spam detection, medical imaging analysis. These problems have strong patterns and minimal interference.

Medium signal, medium noise (AI can help, with caveats): Demand forecasting with stable patterns, credit risk scoring, predictive maintenance with good sensor data, recommendation systems. There's signal here, but it's noisier. Results are useful but not definitive.

Low signal, high noise (AI struggles or fails): Stock price prediction, long-term weather forecasting, predicting individual human behavior, forecasting political events, predicting complex systems from limited observations. The signal is too weak or nonexistent.

Low signal, low noise (AI still struggles): Sometimes the data is clean but the effect you're looking for barely exists. Marketing campaigns with less than 1% uplift, rare events with few predictive features. There's nothing wrong with the data—there's just almost nothing to find.

Definitions:

  • Signal = stable, learnable structure that generalizes over time
  • Noise = irreducible randomness, measurement error, label ambiguity, and adversarial behavior

Signal and Noise: Impact on AI Success

  • High signal, low noise (expected success: high): image recognition with well-labeled data, spam detection, OCR
  • High signal, high noise (expected success: medium): speech recognition in far-field or multi-speaker environments, vision in poor lighting, real-time traffic prediction during disruptions
  • Medium signal, low noise (expected success: medium): demand forecasting for stable products, credit scoring under regulatory constraints
  • Medium signal, high noise (expected success: low): weather forecasting beyond ~10-14 days (skill drops steeply for most decision-relevant variables)
  • Low signal, low noise (expected success: low): weak effects in clean data, such as marketing uplift of <1% or rare events with few predictive features
  • Low signal, high noise (expected success: very low): short-horizon stock price movements, long-range political or geopolitical event forecasting

The Snake Oil Connection

Many AI scams exploit low-signal problems. Here's how it works: someone trains a model on historical data and finds patterns that seem predictive. The model performs well on the test set. They claim predictive power and sell it.

The problem? Those patterns were noise, not signal. They happened to appear in both the training and test data by coincidence. In production, when the model encounters genuinely new data, it fails. The patterns don't repeat because there was never a real underlying relationship to learn.

"We trained a model" doesn't mean "there was something to learn." This distinction is critical.

When evaluating AI claims, ask yourself: is there a plausible reason the inputs would predict the outputs? Do the inputs actually contain the information needed? If you can't explain why the signal should exist, be skeptical.

More data and bigger models cannot create signal that isn't there.

Probabilistic vs. Deterministic

Traditional Software is Deterministic

Traditional software is designed to be deterministic: given the same inputs and environment, you get the same output every time. 2 + 2 = 4, always. A spreadsheet formula produces the exact same result on every run. You can trace exactly why you got a result. Bugs are reproducible because the same inputs produce the same outputs.

This determinism is what makes traditional software trustworthy for critical applications. You can test it, verify it, and know it will behave the same way in production.

AI Models are Probabilistic

AI models work differently. The same input can yield different outputs, especially with generative models when sampling is enabled. Even when outputs are consistent, they're based on statistical likelihood, not certainty.

The model gives you its best guess, not the right answer. "Best guess" means it could be wrong.

A classification isn't "this is spam." It's "this is probably spam with 94% confidence." A generated answer isn't "the truth." It's "a plausible response given patterns in training data."
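
A toy contrast, with made-up numbers, shows the difference. The deterministic function below always returns the same value; the "model" is really a probability distribution over options, from which a system can either report its best guess with a confidence score or sample, producing different outputs on different runs.

    import random

    # Deterministic software: same input, same output, every time.
    def add(a, b):
        return a + b
    print(add(2, 2), add(2, 2))                        # always 4 4

    # A probabilistic model's "answer" is a distribution (numbers invented here).
    spam_scores = {"spam": 0.94, "not spam": 0.06}
    best = max(spam_scores, key=spam_scores.get)
    print(f"best guess: {best} ({spam_scores[best]:.0%} confidence)")   # a guess, not a fact

    # Generative models often sample from their distribution, so the same
    # prompt can yield different continuations on different runs.
    next_words = ["bank", "river", "account"]
    weights = [0.55, 0.30, 0.15]
    print(random.choices(next_words, weights=weights, k=1)[0])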

Why This Confuses People

We're used to computers being right. Software either works or crashes. It doesn't guess. When your calculator says 2 + 2 = 4, you don't wonder if it's confident enough to trust.

AI presents guesses with the same authority as facts. There's no warning label, no "I'm not sure," just an answer. The interface doesn't distinguish between high-confidence and low-confidence outputs. It all looks the same.

This creates a dangerous illusion of certainty.

The Practical Implication

Never treat AI output as ground truth without verification. Build processes that account for errors. The question isn't "is AI perfect?" It's "is it good enough for this use case, with appropriate checks?"

For some applications, 90% accuracy is transformative. For others, it's useless. Context matters. Human oversight matters. Verification matters.

Large Language Models

Why LLMs Get Their Own Section

Large language models like ChatGPT and Claude are the most visible AI today. They're what most people picture when they hear "AI." They also have specific characteristics and failure modes that the general framework doesn't fully cover.

What LLMs Actually Are

LLMs are trained to predict the next word (more precisely, the next token, which can be a whole word or a fragment of one). Given a sequence of text, what word is most likely to come next? That's the core task. They do this prediction over and over, each time feeding their output back as input to generate the next word. That's all they do, and they do it remarkably well.
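
The loop itself is almost embarrassingly simple. The sketch below is schematic: predict_next_token is a made-up stand-in for the real model, which would return (or sample from) a probability distribution over its whole vocabulary, conditioned on everything generated so far.

    # Schematic of autoregressive generation: predict one token, append it,
    # feed the longer text back in, and repeat.
    def generate(prompt: str, predict_next_token, max_tokens: int = 50) -> str:
        text = prompt
        for _ in range(max_tokens):
            token = predict_next_token(text)   # most likely (or sampled) continuation
            if token == "<end>":
                break
            text += token                      # the output becomes part of the next input
        return text

    # A canned stand-in so the sketch runs; a real model predicts, not replays.
    canned = iter([" The", " cat", " sat", ".", "<end>"])
    print(generate("Prompt:", lambda text: next(canned)))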

They learned from massive amounts of text: books, websites, articles, code, conversations. Billions of pages. Through this training, they developed the ability to generate fluent, contextually appropriate text on almost any topic.

They are not databases or search engines. And they are certainly not reasoning machines. Some researchers call them "stochastic parrots": systems that probabilistically repeat and recombine patterns from their training data without any understanding of what the words actually mean. They have no model of how the physical world works, no understanding of cause and effect, no concept of space, time, or objects. All they have is a language model that excels at predicting what word comes next.

What Makes Them Seem Smart

LLMs have seen so much text that they can mimic expertise in almost any domain. They're fluent, confident, and articulate. They can synthesize information across topics. They're excellent at format and structure.

This creates a powerful impression of intelligence. When a system can discuss quantum physics, write poetry, explain legal concepts, and debug code, it's hard not to think of it as "understanding" these things. But they don't understand any of it; the text they generate simply fits the statistical patterns of language they have learned.

The Reasoning Illusion

This is perhaps the most important misconception to correct: LLMs do not reason.

They can produce step-by-step explanations. They can show their work. They can walk through logic, math problems, and arguments. This looks like reasoning.

But here's what's actually happening: the model learned patterns from text written by humans who were reasoning (the training data includes many examples of step-by-step reasoning written out as text). It's reproducing the texture of reasoning, not performing reasoning. It predicts what reasoning text should look like, word by word. There's no internal logic engine, and no persistent memory beyond the limited context window. It doesn't self-verify.

Here's where it gets dangerous: the model can produce flawless-looking reasoning that's completely wrong. It can skip steps, make logical errors, or contradict itself without noticing. It has no way to check its own work. I've seen executives fooled by beautifully structured arguments that fell apart under scrutiny. Looking like reasoning and being reasoning are different things.

The test: real reasoning is consistent. Change the framing, get the same answer. LLMs are sensitive to framing. Change how you ask, get different answers. Real reasoning can catch its own errors. LLMs often double down on mistakes when challenged.

What Makes Them Fail

Hallucinations: LLMs generate plausible-sounding false information. They'll cite papers that don't exist, quote statistics they made up, and describe events that never happened. They do this confidently, with no indication they're fabricating.

No fact-checking mechanism: They're trained to sound right, not to be right. A well-written wrong answer looks identical to a well-written right answer.

Knowledge cutoff: They don't know what happened after their training data ended. They can't access current information unless given tools to do so.

Unreliable on math and logic: Despite appearances, they struggle with multi-step reasoning, arithmetic, and logical consistency.

Prompt Sensitivity

Small changes in how you phrase a question can yield very different answers. This is both a feature and a bug. It's why "prompt engineering" exists as a skill: getting good results from LLMs often depends on how you frame your request.

This sensitivity is a sign that the system isn't reasoning from principles. A human expert gives consistent answers regardless of how you phrase the question. An LLM's answer depends heavily on wording.

The Plausibility Trap

LLMs optimize for plausible, not true. Their training objective rewards generating text that sounds like the training data. Truth is correlated with plausibility but isn't the same thing.

This creates a trap: the better LLMs get at sounding good, the harder it is to spot errors. A poorly written wrong answer is easy to dismiss. A beautifully articulated wrong answer is persuasive.

How to Use Them Well

LLMs are excellent for drafts, brainstorming, synthesis, formatting, and getting unstuck. They're not reliable for facts, citations, or anything requiring precision.

Verify anything that matters. Treat output as a starting point, not a finished product. Use them to accelerate your work, not to replace your judgment.

---

Part II: First Principles of AI

Everything we've covered distills into these ten principles. Each one has practical implications for how you evaluate AI claims.

How to Use These Principles

Treat these as a checklist in vendor meetings, investment pitches, and project evaluations. When someone makes an AI claim, run through the principles: What was it trained on? Is there signal? Does it generalize? Can it handle distribution shift? Is the confidence warranted? You don't need to ask all ten questions every time. But knowing them gives you the foundation to spot overclaiming and ask the questions that matter.

Note: The "In practice" examples throughout this section are simplified composites based on real-world patterns, not specific case studies.

Principle 1

1. AI learns from examples, not instructions

This is the paradigm shift. AI systems are trained on examples, not programmed with rules. The model finds patterns in data; nobody writes those patterns down explicitly.

In practice: A healthcare company tried to build a diagnostic tool by having doctors write rules for identifying conditions. After years of effort, the rule-based system covered only a fraction of cases. They switched to training a model on thousands of labeled patient records and achieved broader coverage in months.

Implication

Always ask "what was it trained on?" The training data determines what the model can and cannot do.

Principle 2

2. AI finds patterns, not meaning

AI detects statistical correlations. It doesn't understand what those correlations mean. It can't reason about cause and effect. It matches patterns it's seen before.

In practice: A loan approval model learned that applicants who filled out forms in all-caps were higher default risks. The model was detecting correlation (certain demographics typed in caps), not causation. When the form was redesigned to normalize text input, the model's performance collapsed.

Implication

AI can be confidently wrong in ways that seem absurd to humans, because it's matching patterns without understanding context.

Principle 3

3. A model only knows what's in its training data

No training data, no capability. If something wasn't in the training set, the model can't know about it. Biased data produces biased models. Incomplete data produces incomplete models.

In practice: A resume screening model trained on a company's historical hiring data learned to favor candidates similar to past hires. Since the company had historically hired few women in technical roles, the model penalized resumes with women's college names and women's professional organizations.

Implication

Data quality matters more than algorithm sophistication. Ask about the data before asking about the model.

Principle 4

4. No signal means no learning; too much noise means no learning

The pattern must exist in the data for AI to find it, and it must be detectable above the noise. Some problems have weak or nonexistent signal. Others have signal buried under so much noise it can't be extracted. Either way, the result is the same: the model can't learn what isn't learnable.

There's another trap: signal that exists today may not exist tomorrow. Non-stationarity, where the underlying patterns in the world change over time, is one of the most common reasons AI systems fail after deployment. The model learned real patterns, but those patterns stopped being true. Markets shifted. Behavior changed. The signal vanished.

In practice: A telecom company built a churn prediction model that worked well for two years, then accuracy dropped sharply. Investigation revealed that a competitor had launched an aggressive promotion, fundamentally changing customer behavior. The patterns the model learned no longer existed.

Implication

Ask whether the inputs can actually predict the output, whether the signal is strong enough to rise above the noise, and whether the patterns are stable enough to persist. If any of these fail, the AI can't help you.

Principle 5

5. Generalization is the whole game

Memorizing examples is easy; learning the underlying pattern is hard. A model that doesn't generalize is useless in production, no matter how well it performs on training data.

In practice: A retailer's demand forecasting model showed 95% accuracy in testing. When deployed to a new region, accuracy dropped to 60%. The model had learned patterns specific to the original region's demographics, weather, and local competitors. It had memorized, not generalized.

Implication

Demand evidence of performance on truly new data, not just test sets that resemble training data.

Principle 6

6. Different problems need different models

There's no single "AI." Classification, regression, generation, reinforcement learning: these are different tools for different jobs. Using the wrong tool guarantees bad results.

In practice: A company tried to use a large language model to predict equipment failures because "it's the most advanced AI." LLMs are designed for text generation, not time-series sensor data. A simpler anomaly detection model trained on equipment sensor readings significantly outperformed the expensive LLM approach.

Implication

Be suspicious of one-size-fits-all claims. Ask what type of problem this is and why this approach fits.

Principle 7

7. Confidence scores are not truth scores

A model can output 99% confidence and be completely wrong. Confidence reflects how well the input matches patterns the model has seen, not whether the output is correct.

In practice: An image classification model trained on wildlife photos was shown a picture of a stuffed animal. It classified it as a real bear with 97% confidence. The model had never seen stuffed animals in training, so it matched the input to the closest pattern it knew. High confidence, completely wrong.
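
The mechanics behind this are worth seeing once. In a typical classifier, a softmax layer converts raw scores into probabilities that must sum to 100% across the classes the model knows. The scores below are invented to mirror the stuffed-bear example: with no "none of the above" option, an unfamiliar input still produces a confident-looking answer.

    # A softmax spreads 100% of confidence over known classes; there is no
    # built-in way to say "this input is unlike anything I was trained on."
    import numpy as np

    def softmax(scores):
        exp = np.exp(scores - np.max(scores))
        return exp / exp.sum()

    classes = ["bear", "deer", "fox"]
    raw_scores = np.array([5.0, 1.3, 0.3])   # invented scores for a stuffed-animal photo
    for label, p in zip(classes, softmax(raw_scores)):
        print(f"{label}: {p:.0%}")           # "bear" comes out around 97%, yet it isn't a bear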

Implication

Never treat AI output as ground truth without verification. High confidence doesn't mean high accuracy.

Principle 8

8. AI can't predict the future (unless physics applies)

AI can predict well when the underlying system has learnable structure: tomorrow's weather, equipment failures from sensor patterns, demand for stable products. But it struggles with domains where signal is weak, where predictions change behavior (reflexivity), or where genuine novelty dominates. Stock prices, political outcomes, economic shifts: these involve too many unknowable factors. In liquid public markets, persistent predictive edges are hard to sustain, and many apparent wins vanish after transaction costs and regime changes. If a vendor claims reliable short-horizon price prediction, demand rigorous out-of-sample validation and a live track record net of costs. Black swan events, regime changes, and genuinely new situations are outside what any model can handle.

In practice: Many AI-powered trading systems have shown impressive backtested returns, only to lose money in live trading. The models found patterns in historical data that didn't persist. The past is not a reliable guide to the future when the underlying dynamics change.

Implication

"Predicts the future" is almost always overclaiming. Be especially skeptical of AI claims about prediction.

Principle 9

9. Scale matters, but can't overcome missing signal

More data, more compute, and larger models generally improve performance. This is why big tech dominates frontier AI. But scale can't create signal that doesn't exist.

In practice: A startup claimed their massive dataset and GPU cluster would finally crack stock prediction. They had 10x more data than competitors. But more historical price data doesn't create predictive signal that isn't there. They burned through their funding and shut down.

Implication

Understand both the resource requirements and the fundamental limits of any AI approach.

Principle 10

10. Trust but verify

AI can be a powerful tool, but not an authority. Use it to accelerate work, not replace judgment. The cost of verification is almost always lower than the cost of blind trust.

In practice: A law firm used an LLM to draft a legal brief. The brief cited six court cases. A junior associate was asked to verify the citations before filing. Good thing: three of the cases didn't exist. The LLM had hallucinated plausible-sounding citations. The verification step took an hour and prevented a potential malpractice situation.

Implication

Build verification into your workflow as a standard practice, not as an afterthought.

---

Part III: Applying the Principles

The AI Zoo

Why One "AI" Doesn't Exist

AI is a collection of techniques, not a single technology. Talking about "AI" as if it's one thing is like talking about "vehicles" as if cars, boats, and airplanes are interchangeable. They're all vehicles, but you wouldn't use a boat to drive to work.

Understanding the major categories helps you match problems to solutions and spot mismatches in vendor claims.

Supervised Learning

The model learns from labeled examples. You show it inputs paired with correct outputs, and it learns to map inputs to outputs.

Classification: Is this email spam or not? Is this image a cat or a dog? Is this transaction fraudulent? The output is a category.

Regression: What price will this house sell for? How many units will we sell next month? The output is a number.

Supervised learning requires labeled data. Someone has to tell the model the right answer for the training examples. This labeling can be expensive and time-consuming.

Unsupervised Learning

The model finds structure in unlabeled data. No one tells it the right answer; it discovers patterns on its own.

Clustering: Group similar customers together. Find natural segments in your data.

Dimensionality reduction: Simplify complex data while preserving important patterns.

Unsupervised learning is useful for exploration and understanding your data, but less reliable for prediction.
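
As a minimal sketch of what "finding structure without labels" looks like, the example below clusters made-up customer data with scikit-learn's KMeans. Nobody tells the algorithm what the segments are; it groups points by similarity, and interpreting the groups is still a human job.

    # Cluster synthetic "customers" by two behaviors, with no labels provided.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    # Hypothetical features: [monthly spend, visits per month]
    customers = np.vstack([
        rng.normal(loc=[20, 2], scale=[5, 1], size=(50, 2)),      # occasional shoppers
        rng.normal(loc=[200, 12], scale=[30, 3], size=(50, 2)),   # frequent big spenders
    ])
    segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
    print(segments[:5], segments[-5:])   # two discovered groups, labeled 0 and 1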

Reinforcement Learning

The model learns by trial and error with rewards. It takes actions, receives feedback (rewards or penalties), and adjusts to maximize cumulative reward.

This is how AI learned to play games like chess and Go at superhuman levels. It's also used in robotics, optimization, and resource allocation.

Reinforcement learning requires a well-defined reward signal. If you can't clearly specify what success looks like, the model can't learn to achieve it.

Generative Models

The model learns to create new examples similar to its training data. This includes large language models, image generators, and audio synthesizers.

The current hype cycle lives here. ChatGPT, Claude, DALL-E, Midjourney, and similar tools are generative models.

Generative models are powerful but come with the hallucination and verification challenges we discussed earlier.

Matching Problems to Tools

Using the wrong category for your problem guarantees failure. A hammer is great for nails and useless for screws.

When someone says "we used AI," that means nothing without knowing which kind. A fraud detection system and a chatbot are both "AI," but they're completely different tools solving completely different problems.

How to Evaluate AI Claims

Questions to Ask

What was it trained on? The training data determines capabilities and limitations. Ask about data sources, data quality, and potential biases.

Is there a plausible reason the inputs would predict the output? If you can't explain why the signal should exist, be skeptical.

How was performance measured? Metrics on training data are meaningless. Ask about test methodology and real-world validation.

What does it do when it encounters something outside its training? Graceful degradation matters. Does it fail silently or flag uncertainty?

Who benefits from this working vs. appearing to work? Follow the incentives. Vendors have reasons to oversell.

Has it been tested in real-world conditions? Lab results and production results are different things.

Red Flags

"Predicts X with 99% accuracy" for complex, real-world phenomena. The real world is messy. Claims of near-perfect accuracy on hard problems are almost always overfitting or misleading metrics.

Predictions for low-signal problems. Stock prices, election outcomes, individual human behavior. If the signal isn't there, the prediction is worthless.

No explanation of training data or methodology. Legitimate AI practitioners can explain what they did. Black boxes are hiding something.

"AI-powered" as pure marketing. The phrase means nothing without specifics. What kind of AI? Solving what problem? Trained on what?

Demo-only results. If they can only show you controlled demonstrations and not production deployments, be cautious.

The Snake Oil Patterns

Claiming to find signal in noise. Selling predictions for fundamentally unpredictable things.

Overfitting to historical data presented as predictive power. The model memorized the past but can't predict the future.

Narrow benchmarks extrapolated to general capability. Beating a specific test doesn't mean solving a real problem.

Correlation sold as causation. The model found a correlation, but that doesn't mean one thing causes another.

Pattern-matching sold as "intelligence" or "understanding." Anthropomorphizing capabilities that are just statistics.

What AI Can and Cannot Do

What AI Does Well

Pattern recognition at scale. Finding patterns in data that humans would miss or couldn't process fast enough.

Classification and categorization. Sorting things into buckets based on learned patterns.

Synthesis and summarization. Condensing large amounts of information into digestible forms.

Translation between formats and modalities. Text to speech, speech to text, image to text, language to language.

Automation of well-defined cognitive tasks. Tasks that are routine and pattern-based, even if they require "judgment."

Optimization within known constraints. Finding the best solution when the problem is well-specified.

What AI Does Poorly (Today)

Reasoning about novel situations. When something falls outside training patterns, models struggle.

Understanding cause and effect. Correlation is not causation, and models can't tell the difference.

Handling edge cases gracefully. Unusual inputs often produce bizarre outputs.

Knowing what it doesn't know. Models lack metacognition. They can't reliably identify their own limitations.

Operating outside training distribution. Performance degrades unpredictably when data differs from training.

What AI Cannot Do (Today)

Reliably predict low-signal or novelty-driven domains. AI can forecast physical systems with learnable structure (weather, equipment failures, trajectories). But it cannot reliably predict markets, geopolitics, or any domain dominated by reflexivity, unknowable factors, or genuine novelty. Anyone claiming otherwise is selling something.

Handle black swans and regime changes. Genuinely novel events and structural shifts are outside any model's capability. The past is not always a guide to the future.

Replace human judgment on ambiguous, high-stakes decisions. When the stakes are high and the situation is unclear, human oversight is essential.

Guarantee correctness. AI outputs are probabilistic. They can always be wrong.

Understand meaning in the human sense. What looks like understanding is pattern matching. The comprehension isn't there.

---

Conclusion

These ten principles won't make you an AI expert. But they give you a foundation to build upon and to reason about AI claims yourself.

When someone pitches you an AI product, you now know the questions to ask. What was it trained on? Is there signal in this data? How does it handle novel situations? Has it been validated in production?

When you read headlines about AI breakthroughs, you can evaluate them against first principles. Does this claim violate something fundamental about how AI works? Is someone selling prediction in a low-signal domain? Are impressive-sounding numbers hiding overfitting?

The goal isn't to be cynical about AI. The technology is genuinely powerful for the right problems. The goal is to be clear-eyed: to know what AI can actually do, what it can't, and how to tell the difference.

First principles don't change even when implementations do. The specific techniques will evolve. New architectures will emerge. But the fundamental constraints remain. Training data still matters. Signal still matters. Generalization is still the whole game. Trust but verify is still the right approach.

Use these principles. Ask the hard questions. Don't accept hype. And when AI does work, when it genuinely solves a problem with strong signal and good generalization, put it to work. That's where the real value is.

---

Appendix: Glossary of AI Terms

Algorithm: A set of rules or instructions for solving a problem. In AI, this often refers to the training procedure or the model architecture.

Classification: A type of supervised learning where the model predicts which category an input belongs to.

Clustering: An unsupervised learning technique that groups similar data points together.

Context window: In language models, the amount of text the model can consider at once when generating a response.

Deep learning: Machine learning using neural networks with many layers. Most modern AI breakthroughs use deep learning.

Distribution shift: When the data encountered in production differs systematically from the training data, causing model performance to degrade.

Feature: An individual measurable property of the data. Features are the inputs the model uses to make predictions.

Generalization: A model's ability to perform well on new data it hasn't seen during training. The core goal of machine learning.

Gradient descent: The mathematical technique used to adjust model parameters during training to reduce errors.

Hallucination: When a model generates plausible-sounding but false information. Common in large language models, and disturbingly confident.

Hyperparameter: Settings that control the training process, such as learning rate or number of training iterations.

Inference: Using a trained model to make predictions on new data, whether in production or batch processing.

Label: The correct answer attached to a training example. Labels are required for supervised learning.

Large language model (LLM): An AI model trained on massive amounts of text to generate human-like language. Examples include GPT and Claude.

Machine learning: The field of AI focused on systems that learn from data rather than being explicitly programmed.

Model: The trained system that makes predictions. Contains learned parameters that encode patterns from training data.

Neural network: A model architecture inspired by biological neurons. Consists of layers of connected nodes that transform inputs into outputs.

Noise: Random variation in data that doesn't reflect the underlying pattern. Noise obscures signal.

Overfitting: When a model learns the training data too well, including its noise and quirks, and fails to generalize to new data. Looks great in demos, fails embarrassingly in production.

Parameter: A number inside the model that gets adjusted during training. Large models have billions of parameters.

Prompt: The input text given to a language model to generate a response.

Prompt engineering: The practice of crafting inputs to get better outputs from language models. Part art, part science, part frustration.

Regression: A type of supervised learning where the model predicts a numerical value rather than a category.

Reinforcement learning: A type of machine learning where the model learns through trial and error, receiving rewards or penalties.

Signal: The actual pattern in data that you want to learn. Strong signal makes problems learnable.

Signal-to-noise ratio: The strength of the learnable pattern relative to random variation. Determines how learnable a problem is.

Supervised learning: Machine learning using labeled examples where the correct answers are provided during training.

Test set: Data held back from training to evaluate how well the model generalizes. Only as good as how different it is from training data.

Token: A unit of text processed by a language model. Can be a word, part of a word, or punctuation.

Training: The process of adjusting model parameters using examples to improve performance.

Training data: The examples used to train a model. Quality and coverage of training data largely determine model capabilities.

Transfer learning: Using a model trained on one task as a starting point for a different but related task.

Underfitting: When a model is too simple or hasn't trained enough to capture the patterns in the data.

Unsupervised learning: Machine learning without labels, where the model finds patterns in data without being told what to look for.

Validation set: Data used during training to tune hyperparameters and check for overfitting. Separate from the test set.

Weights: Another term for parameters. The numerical values that encode what the model has learned.

Ready to apply these principles?

I help mid-market companies navigate AI transformation. Let's discuss how these first principles apply to your specific situation.
