Summary

Alexis King argues that the apparent tension between static typing and “graceful handling of malformed input” dissolves once you reframe input handling as parsing rather than validation. A validator returns () on success and throws on failure, leaving the calling code with no proof anything was checked. A parser returns a more-precise type that encodes the invariant in its structure, so the proof travels with the data and downstream code can rely on it without re-checking. The mantra “parse, don’t validate” is shorthand for: at every system boundary, narrow the input type until illegal states are unrepresentable, and do so once, eagerly.

Key Claims

  • A parser is “a function that consumes less-structured input and produces more-structured output.” Text parsing is one instance; any refinement function with this shape qualifies. Validators are degenerate parsers that throw away the refinement.

  • Validation discards information; parsing preserves it. Compare:

    validateNonEmpty :: [a] -> IO ()
    parseNonEmpty    :: [a] -> IO (NonEmpty a)

    Both fail on the empty list. Only the second hands the caller a value whose type witnesses the check. After validateNonEmpty xs, downstream code still holds a plain [a], so the Prelude head remains partial and the type checker cannot tell that the check ever ran; after parseNonEmpty xs, it holds a NonEmpty a, and head from Data.List.NonEmpty is total.

  • Strengthen argument types instead of weakening return types. The textbook fix for partial head :: [a] -> a is head :: [a] -> Maybe a, which forces every caller to handle a Nothing case it has often already ruled out. King prefers head :: NonEmpty a -> a — the precondition lives in the type, the function is total, and the caller can’t accidentally drop the proof.

  • Parse once at the boundary; trust the types afterward. Interleaving checks with processing leads to “shotgun parsing”: an invalid input can be partly acted on before the check that rejects it runs, leaving state half-updated. King borrows the term and the security framing from the 2016 LangSec paper “The Seven Turrets of Babel,” which treats shotgun parsing as a recurring source of vulnerabilities.

  • Design heuristics derived from the principle:

    1. Use representations that make illegal states unrepresentable (e.g. Map k v over [(k, v)] for unique keys).
    2. Push proofs upward; refine the data into its strongest representation as early as possible.
    3. Write functions on the data you wish you had, then bridge from input.
    4. Suspect functions returning m () whose primary role is error signaling — they likely should return a refined type.
    5. Avoid denormalized representations of data, especially mutable ones; a single source of truth eliminates a class of “two fields out of sync” bugs.
    6. When unrepresentability is impractical, fall back to abstract data types with smart constructors.
    7. Multiple parsing passes are fine when context demands it; what’s not fine is processing before all parsing completes.
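
A minimal Haskell sketch of the validate/parse contrast, the strengthened head, and heuristic 1. It assumes Either String for failure instead of the IO used in the signatures above, so the example stays pure; firstItem and parseUniqueKeys are hypothetical names chosen for illustration, not taken from the article. Data.Map.Strict comes from the containers package that ships with GHC.

```haskell
import           Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import qualified Data.Map.Strict    as Map

-- Validator: performs the check, then discards what it learned.
-- The caller gets back () and still holds a plain list.
validateNonEmpty :: [a] -> Either String ()
validateNonEmpty [] = Left "list cannot be empty"
validateNonEmpty _  = Right ()

-- Parser: performs the same check, but the result type records the outcome.
parseNonEmpty :: [a] -> Either String (NonEmpty a)
parseNonEmpty []       = Left "list cannot be empty"
parseNonEmpty (x : xs) = Right (x :| xs)

-- Strengthened argument type: total, with no Maybe in the result.
-- (Hypothetical name; it simply wraps Data.List.NonEmpty.head.)
firstItem :: NonEmpty a -> a
firstItem = NE.head

-- Heuristic 1 (hypothetical helper): parse an association list into a Map,
-- rejecting duplicate keys instead of silently keeping one of them.
parseUniqueKeys :: Ord k => [(k, v)] -> Either String (Map.Map k v)
parseUniqueKeys = go Map.empty
  where
    go acc [] = Right acc
    go acc ((k, v) : rest)
      | k `Map.member` acc = Left "duplicate key"
      | otherwise          = go (Map.insert k v acc) rest
```

Loaded in GHCi, fmap firstItem (parseNonEmpty xs) needs no empty-list branch after the parse, whereas a successful validateNonEmpty xs leaves the caller with the same [a] it started with.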

Notable Quotes

The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system.

Write functions on the data representation you wish you had, not the data representation you are given.

The set of remaining failure modes during execution is minimal by comparison, and they can be handled with the tender care they require.

My Reactions

The conceptual move is small but the consequences are large. Three reactions:

  1. The principle generalizes well past Haskell. It maps cleanly onto Pydantic / dataclasses-with-validators in Python, branded types in TypeScript, NewType patterns, and even runtime guards in dynamic languages — anywhere you can return a narrower type after a check, the same trick works. King writes from the Haskell idiom, but the heuristic is language-agnostic. Worth keeping in mind when reviewing API boundaries in any typed-or-typed-enough codebase.

  2. It changes how you think about “error handling at the edges.” The common DDD / Hexagonal advice is “validate at the boundary.” King’s reframe makes that advice operational: validation alone is not enough; the boundary must produce a different type, otherwise the invariant evaporates the moment data flows inward. That’s a sharper bar.

  3. The connection to LangSec is the strongest argument. “Shotgun parsing causes security bugs” is a much harder claim to dismiss than “this style is more elegant.” Anyone resisting the principle on aesthetic grounds has to also explain why the same anti-pattern that causes RCEs in their parser is fine in their business logic.
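
To make reaction 2 concrete, here is a rough sketch of a boundary that produces a different type. Everything in it is hypothetical (RawConfig, Config, Port, parseConfig, primaryEndpoint); none of these names come from the article, and Either String stands in for whatever error type a real system would use.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Text.Read (readMaybe)

-- What arrives at the edge: every field is an unchecked string.
data RawConfig = RawConfig
  { rawPort  :: String
  , rawHosts :: [String]
  }

-- What the core works with. In a real module the Port constructor would be
-- left out of the export list so parsePort is the only way to build one.
newtype Port = Port Int deriving (Eq, Show)

data Config = Config
  { port  :: Port
  , hosts :: NonEmpty String
  } deriving (Eq, Show)

parsePort :: String -> Either String Port
parsePort s = case readMaybe s of
  Just n | n >= 1 && n <= 65535 -> Right (Port n)
  _                             -> Left "port must be an integer in 1..65535"

-- The boundary: RawConfig goes in, Config (a different type) comes out.
parseConfig :: RawConfig -> Either String Config
parseConfig raw = do
  p  <- parsePort (rawPort raw)
  hs <- case rawHosts raw of
          []       -> Left "at least one host is required"
          (h : ht) -> Right (h :| ht)
  Right (Config p hs)

-- Core logic accepts only Config, so it cannot be handed unchecked data
-- and needs no defensive checks of its own.
primaryEndpoint :: Config -> String
primaryEndpoint (Config (Port n) (h :| _)) = h ++ ":" ++ show n
```

If parseConfig merely returned () on success, primaryEndpoint would have to take RawConfig and repeat both checks, which is exactly the evaporating invariant described in reaction 2.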

The article’s load-bearing assumption is that your type system can express the invariants you care about. For invariants beyond what the type system reaches (e.g. “sorted list,” “balanced tree” without dependent types), the practical path is smart-constructor abstract types — parsing in spirit even when not in form.
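
A small sketch of that fallback for the “sorted list” case, under my own naming (SortedList, parseSorted, smallest, and merge are all hypothetical). The type system does not prove sortedness here; the guarantee rests on keeping the constructor private to one module.

```haskell
-- In a real codebase this would sit in its own module with an export list
-- such as (SortedList, parseSorted, toList, smallest, merge), deliberately
-- omitting the UnsafeSorted constructor.
newtype SortedList a = UnsafeSorted [a] deriving (Eq, Show)

-- Smart constructor: the only sanctioned way in. It checks the invariant
-- once and returns a distinct type on success.
parseSorted :: Ord a => [a] -> Maybe (SortedList a)
parseSorted xs
  | and (zipWith (<=) xs (drop 1 xs)) = Just (UnsafeSorted xs)
  | otherwise                         = Nothing

toList :: SortedList a -> [a]
toList (UnsafeSorted xs) = xs

-- Relies on the invariant: the first element is the minimum, no scan needed.
smallest :: SortedList a -> Maybe a
smallest (UnsafeSorted [])      = Nothing
smallest (UnsafeSorted (x : _)) = Just x

-- Preserves the invariant without re-checking either input.
merge :: Ord a => SortedList a -> SortedList a -> SortedList a
merge (UnsafeSorted ls) (UnsafeSorted rs) = UnsafeSorted (go ls rs)
  where
    go [] ys = ys
    go xs [] = xs
    go (x : xs) (y : ys)
      | x <= y    = x : go xs (y : ys)
      | otherwise = y : go (x : xs) ys
```

The trade-off compared with NonEmpty is that every function inside the module has to be audited by hand to confirm it preserves sortedness; code outside the module gets the same guarantees as with a structurally enforced invariant.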

Connections

  • Test-Driven Development: both philosophies push verification leftward, but TDD checks behavior by running the code, while parsing lets the compiler check that downstream code only ever receives well-formed data. Complementary.
  • Unit Testing: many test cases exist solely to assert “this function rejects invalid input.” Parse-don’t-validate removes that class of test for downstream functions, because the invalid input can no longer be constructed; the rejection tests that remain concentrate on the parser at the boundary.
  • Test Doubles: boundary parsing reduces the surface area where mocks are needed, since downstream code can assume well-formed inputs unconditionally.
  • Promotion candidates (no atomic notes yet, worth writing if the topic recurs):
    • Parse, Don’t Validate: restate the principle as a single declarative claim in Atlas/Notes/. Likely the most-citable atom from this article.
    • Make Illegal States Unrepresentable: older Yaron Minsky / OCaml-community phrasing of the adjacent idea; deserves its own atomic note alongside this one.
    • Shotgun Parsing Causes Security Bugs: the LangSec framing as a standalone claim.

References King Cites