Ad Hoc Selection vs Random Sampling
Two ways to pick what you measure. One is rigorous and defensible; the other is convenient and biased. Random Sampling wins, and it isn't close.
The short answer
Random Sampling over Ad Hoc Selection for most cases. Random Sampling gives every unit a known, nonzero chance of inclusion, which is the only thing that lets you generalize from a sample to a population with quantifiable error.
- Pick Ad Hoc Selection if you are doing fast, throwaway exploration or a qualitative probe, or if you genuinely cannot enumerate a sampling frame and are honest that the result generalizes to nothing
- Pick Random Sampling if you need to estimate a population parameter, report confidence intervals, run an experiment, or defend a number to anyone who can read a stats textbook (i.e., almost always)
- Also consider: Stratified or cluster sampling when the population is uneven or random access is expensive; they keep the inference guarantees while controlling cost.
— Nice Pick, opinionated tool recommendations
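The stratified option from the short answer can be sketched in a few lines of Python. This is a minimal sketch that assumes every unit's stratum label is known in advance and uses proportional allocation (the same sampling fraction inside every stratum); the function name `stratified_sample` and the customer-tier frame are hypothetical, and cluster sampling is not shown.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Proportional-allocation stratified random sample (illustrative sketch).

    frame      -- list of every unit in the sampling frame
    stratum_of -- function returning a unit's stratum label (assumed known for all units)
    fraction   -- sampling fraction applied inside every stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for units in strata.values():
        # Same fraction per stratum; take at least one unit so small strata are never skipped.
        k = max(1, round(fraction * len(units)))
        # Simple random sampling within the stratum: inclusion probability is k / len(units).
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical frame: 1,000 customers, 100 enterprise and 900 free-tier.
frame = [(customer_id, "enterprise" if customer_id < 100 else "free")
         for customer_id in range(1_000)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], fraction=0.1, seed=42)

enterprise_count = sum(1 for unit in sample if unit[1] == "enterprise")
print(len(sample), enterprise_count)  # 100 10
```

Because each stratum is sampled at a known rate, the small enterprise tier is guaranteed its share of the sample instead of depending on luck. If you sample strata at different rates, weight each unit by the inverse of its inclusion probability when estimating population totals or means.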
What they actually are
Neither of these is a tool you install — they're competing strategies for choosing which units you observe. Ad Hoc Selection means you pick based on convenience, gut, or whatever is in front of you: the first 50 rows, the customers who answered, the servers you happened to SSH into. Random Sampling means every unit in a defined frame has a known probability of selection, enforced by an actual randomizer, not your judgment. The distinction sounds academic until you try to make a claim. Ad hoc tells you about the things you grabbed. Random sampling tells you about the population those things came from. That gap — from 'this batch' to 'all of it' — is the entire reason sampling theory exists, and ad hoc selection quietly refuses to cross it while pretending it has.
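To make the difference concrete, here is a minimal Python sketch that repeats each selection rule many times and measures how often every unit gets picked. The 100-unit frame and the helper names are hypothetical; "first k rows" stands in for ad hoc selection, and `random.Random.sample` performs simple random sampling without replacement.

```python
import random

def ad_hoc_pick(frame, k, rng=None):
    # "Whatever is in front of you": the first k rows. rng is unused; nothing is random.
    return frame[:k]

def simple_random_pick(frame, k, rng):
    # Simple random sampling without replacement: the randomizer chooses, and every
    # unit's inclusion probability is k / len(frame) by design.
    return rng.sample(frame, k)

def inclusion_rates(picker, frame, k, trials=10_000, seed=0):
    """Repeat a selection rule and return how often each unit was included."""
    rng = random.Random(seed)
    hits = {unit: 0 for unit in frame}
    for _ in range(trials):
        for unit in picker(frame, k, rng):
            hits[unit] += 1
    return {unit: count / trials for unit, count in hits.items()}

frame = list(range(100))  # hypothetical sampling frame of 100 unit IDs
ad_hoc_rates = inclusion_rates(ad_hoc_pick, frame, k=10)
random_rates = inclusion_rates(simple_random_pick, frame, k=10)

print(ad_hoc_rates[0], ad_hoc_rates[99])  # 1.0 0.0
print(min(random_rates.values()), max(random_rates.values()))  # both near 10 / 100 = 0.1
```

Under the ad hoc rule, units 10 through 99 have an inclusion probability of exactly zero, so the sample can say nothing about them. Under random sampling every unit lands near the known design probability of 0.1, which is what makes inference back to the whole frame possible.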
Bias: the part everyone underestimates
Ad Hoc Selection's fatal flaw is that its bias is invisible and unmeasurable. When you sample the customers who replied, you've selected for people with time and opinions; your satisfaction score is inflated and you'll never know by how much. When you test on the data files that loaded easily, you've excluded the corrupt ones, which are exactly where your bug lives. Random Sampling doesn't make error disappear, but it replaces unknown selection bias with sampling error you can quantify with a confidence interval and shrink by increasing the sample size n (the interval's width falls roughly in proportion to 1/√n). That's the trade: ad hoc gives you a number with an unknown, uncorrectable error; random gives you a number with a known, shrinkable one. One is a guess wearing a lab coat. The other is a measurement. Pretending they're equivalent because both produce a percentage is how bad decisions get a spreadsheet.
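A small simulation shows how the customers-who-replied effect plays out. This is a minimal Python sketch under a made-up response model: scores are uniform on 1 to 5, and the chance of replying is assumed to be 10% per satisfaction point (10% at score 1, 50% at score 5). The numbers are illustrative, not data.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 10,000 customers with satisfaction scores from 1 to 5.
population = [rng.randint(1, 5) for _ in range(10_000)]
true_mean = statistics.mean(population)

def replies(score):
    # Assumed response mechanism: happier customers are more likely to answer.
    return rng.random() < 0.1 * score

# Ad hoc: analyze whoever replied.
respondents = [score for score in population if replies(score)]
ad_hoc_mean = statistics.mean(respondents)

# Random: draw the same number of customers uniformly from the whole frame.
random_sample = rng.sample(population, len(respondents))
random_mean = statistics.mean(random_sample)

print(f"true {true_mean:.2f}  ad hoc {ad_hoc_mean:.2f}  random {random_mean:.2f}")
```

Under this assumed model the respondents' expected mean is (1 + 4 + 9 + 16 + 25) / (1 + 2 + 3 + 4 + 5) = 55/15 ≈ 3.67 against a population mean near 3, and collecting more replies does not close that gap. The random sample's error, by contrast, is pure sampling noise that shrinks as n grows.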
Cost and the honest case for ad hoc
Random Sampling isn't free. You need an enumerable frame, a way to reach any selected unit, and the discipline to chase the awkward ones instead of substituting an easier neighbor — and that last part is where most 'random' studies quietly rot into ad hoc. If you can't list the population or can't access a randomly chosen unit, true random sampling is impossible, not just expensive. That's the one honest home for Ad Hoc Selection: early exploration, debugging, qualitative probes, or smoke tests where you're hunting for the existence of a problem, not estimating its rate. Grabbing five logs to see if a feature works is fine. Grabbing five logs and reporting '4% error rate' is malpractice. The sin isn't using convenient data; it's using convenient data and then making a population claim you didn't earn.
The verdict
Random Sampling wins because it's the only one of the two that earns the word 'representative.' Ad Hoc Selection is faster, cheaper, and occasionally the only option — but the moment you want to generalize, report a metric, or convince a skeptic, its convenience becomes a liability you can't price. The tell is simple: if your conclusion contains a number meant to describe a whole population, you needed random (or stratified/cluster) sampling, full stop. If you're just poking around to see what's there, ad hoc is fine and nobody should pretend otherwise. Most people reach for ad hoc because the frame is annoying to build, then launder the result into a confident statistic. Don't. Build the frame, randomize, and report an interval — or admit your number describes only the pile you happened to scoop. There is no third, comfortable option.
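The "build the frame, randomize, report an interval" workflow fits in a few lines of Python. This is a sketch, not a prescribed method: it uses the Wald (normal-approximation) 95% interval for a proportion, which is adequate for large samples with the proportion away from 0 and 1. The log records are simulated and reduced to an is-error flag, and `wald_interval` is a hypothetical helper name.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Point estimate and approximate 95% Wald interval for a proportion.

    Fine for large n with the proportion away from 0 and 1; for small samples
    a Wilson interval is the usual safer choice.
    """
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot leave that range.
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

rng = random.Random(11)

# 1. Build the frame: hypothetical log records, True where the record is an error.
frame = [rng.random() < 0.04 for _ in range(50_000)]

# 2. Randomize: simple random sample without replacement.
sample = rng.sample(frame, 1_000)

# 3. Report an interval, not just a point estimate.
p_hat, low, high = wald_interval(sum(sample), len(sample))
print(f"error rate {p_hat:.3f}, 95% CI [{low:.3f}, {high:.3f}]")

# Quadrupling n (same observed rate) halves the interval width: error shrinks like 1/sqrt(n).
_, lo_100, hi_100 = wald_interval(20, 100)
_, lo_400, hi_400 = wald_interval(80, 400)
print(round((hi_100 - lo_100) / (hi_400 - lo_400), 6))  # 2.0
```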
Quick Comparison
| Factor | Ad Hoc Selection | Random Sampling |
|---|---|---|
| Generalizes to a population | No — describes only the units you grabbed | Yes — known selection probabilities enable inference |
| Bias | Present, invisible, and uncorrectable | Converted to measurable, shrinkable sampling error |
| Cost and setup effort | Cheap — use whatever's convenient | Needs a frame, access, and randomizer discipline |
| Defensibility of results | Falls apart under any statistical scrutiny | Supports confidence intervals and peer review |
| Fit for early exploration/debugging | Excellent — fast existence checks | Overkill when you only need to spot a problem |
Random Sampling gives every unit a known, nonzero chance of inclusion, which is the only thing that lets you generalize from a sample to a population with quantifiable error. Ad Hoc Selection — grabbing whatever is convenient — bakes in selection bias you cannot measure or correct, so your "findings" describe your sampling whims, not reality.
Disagree? nice@nicepick.dev