Between a Thermometer and a Crystal Ball: The Case for Responsible Genetic Testing

Why consumer SNP testing for complex traits can be valuable—if we're honest about what it can and can't do

Opinion · Expert Analysis
Tags: genetics, SNP testing, ethics, polygenic scores, consumer genomics, reproductive genetics

Consumer SNP testing for traits like "tendency toward same-sex sibships" or "intelligence" lives in an awkward middle ground. Done well, it's a thermometer—a tool that measures small temperature shifts you can use to make slightly better choices. Done poorly, it's a crystal ball—a prop that flatters certainty where none exists. I'm in favor of offering these tests, but only if we're honest about what they can and can't do.

What these tests can actually tell you

Most complex traits are polygenic and context-dependent. Single SNPs rarely decide outcomes; at best, they tilt probabilities. Polygenic scores (PGS) built from many variants capture a modest share of trait variance in the cohorts they were trained on and often lose accuracy across ancestries, environments, and time. That's not a failure of genetics; it's a reality check on biology.

So the right framing is probabilistic and incremental: "Given your genotype, your odds nudge up or down a bit." If that nudge helps you plan, great. If you expect certainty, you'll be disappointed.

Two case studies: Sex-of-offspring patterns and intelligence

1) "Multiple-birth" signals, clarified

The phrase is fuzzy, so let's separate two ideas:

Sex-of-offspring clustering across births (e.g., families with many girls in a row). Here, data suggest a family-level tilt—think "weighted coin"—that becomes visible only after multiple same-sex births. Genotyping might identify maternal variants associated with female-only or male-only sibships, but the effect sizes are small and clearest in families with a history. A report that quantifies how your odds shift (with intervals) is useful; a promise to "predict the next baby's sex" is not.

Twin propensity (especially dizygotic twinning). There are known maternal genetic contributors that influence ovulation biology and modestly raise the chance of fraternal twins. Again: modest is the operative word. A responsible service would return "baseline odds" and "genotype-adjusted odds," plus how much uncertainty remains.

In both sub-domains, the ethical lane is expectation-setting, not selection. You're helping families understand probabilities—like weather forecasts—so they can calibrate decisions (timing, financial planning, clinical conversations), not engineer outcomes.
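The "baseline odds" versus "genotype-adjusted odds" framing above reduces to one standard piece of arithmetic: convert the baseline probability to odds, multiply by the variant's odds ratio, and convert back. A minimal sketch follows; the baseline, odds ratio, and standard error here are hypothetical placeholders, not real effect sizes for any twinning or sibship variant.

```python
import math

def adjusted_probability(baseline_p: float, odds_ratio: float) -> float:
    """Convert a baseline probability and an odds ratio into an
    adjusted probability, working on the odds scale."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    adjusted_odds = baseline_odds * odds_ratio
    return adjusted_odds / (1.0 + adjusted_odds)

def odds_ratio_ci(log_or: float, se: float) -> tuple:
    """95% confidence interval for an odds ratio, from its
    log-scale estimate and standard error."""
    return (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))

# Hypothetical example: ~1.2% baseline chance of dizygotic twins,
# a variant with a made-up odds ratio of 1.3 (SE of log-OR = 0.10).
baseline = 0.012
or_hat, se = 1.3, 0.10
lo, hi = odds_ratio_ci(math.log(or_hat), se)
print(f"baseline:        {baseline:.1%}")
print(f"adjusted:        {adjusted_probability(baseline, or_hat):.1%}")
print(f"adjusted 95% CI: {adjusted_probability(baseline, lo):.1%}"
      f" to {adjusted_probability(baseline, hi):.1%}")
```

Note how small the move is even with a non-trivial odds ratio: a rare outcome stays rare, which is exactly the "modest is the operative word" point, and why the interval belongs next to the point estimate.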

2) Intelligence, translated

What people usually want is a number for "intelligence." What genomics can responsibly provide today is a polygenic proxy for education-related traits. That proxy is informative at the population level (groups differ in average risk/propensity), but noisy for individuals. It also entangles biology with environment (school quality, parental education, socioeconomic status) and can reflect genetic nurture (parents' genotypes shaping the environment they provide). Portability across ancestries is limited because most discovery data are Eurocentric.

If you ship an "intelligence" score without this context, you're selling numerology. If you present it as a contextualized, bounded estimate—with uncertainty, ancestry caveats, and concrete, non-deterministic guidance (sleep, reading environment, early math/language exposure, screening for learning differences)—it becomes a starting point for support, not a label.

How to design these services so they help rather than harm

Lead with base rates, then show the delta. Report the population baseline (e.g., chance of dizygotic twins for your age group; probability of another girl after two girls) and then the genotype-adjusted change. If the change is tiny, say so plainly.

Show uncertainty as a first-class output. Confidence intervals, calibration plots, and—when possible—simple visuals that compare your range with population ranges. Treat "±" as a feature, not a bug.

Disclose measurement choices. For cognitive traits, name the actual outcome (e.g., years of education, standardized test composite), the cohorts used, and known limits (ancestry transferability, cohort era effects). For sex-of-offspring and twin propensity, be explicit about whether the model uses single SNPs, a score, or family history.

Avoid outcome reification. Intelligence is not a single on/off gene; "likelihood of twins" is not destiny. Keep language probabilistic ("higher/lower odds," "modest increase") and avoid categorical labels.

Guardrail use with minors. Returning non-medical "intelligence" scores for children risks self-fulfilling prophecies and stigma. A conservative policy is to restrict such reports to adults, or to return environmental guidance only for minors.

Center utility, not spectacle. Pair results with actions that are low-risk and high-benefit regardless of genotype: evidence-based prenatal counseling, financial planning checklists, reading and numeracy interventions, sleep and exercise routines. Let genetics guide priorities, not dictate identity.

Be ancestry-aware. If a model is trained in one ancestry and you can't show performance in another, either don't report a score or label it as "insufficiently validated." This is both scientifically honest and ethically necessary.
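The ancestry gating above is a policy decision, and it can be enforced in code rather than left to copywriting. A sketch of that gate follows; the class names, field names, and threshold are illustrative assumptions, not an established standard or any vendor's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AncestryValidation:
    ancestry: str
    r_squared: Optional[float]  # measured predictive performance, if any

# Hypothetical policy threshold; a real service would justify its own.
MIN_R_SQUARED = 0.01

def score_label(user_ancestry: str,
                validations: List[AncestryValidation]) -> str:
    """Return 'reportable' only when the model has measured performance
    in the user's ancestry group; otherwise flag the score."""
    for v in validations:
        if v.ancestry == user_ancestry and v.r_squared is not None:
            if v.r_squared >= MIN_R_SQUARED:
                return "reportable"
            return "insufficiently validated (performance below threshold)"
    return "insufficiently validated (no ancestry-matched evaluation)"

# Hypothetical validation table: evaluated in one group, not the other.
vals = [AncestryValidation("EUR", 0.08), AncestryValidation("EAS", None)]
print(score_label("EUR", vals))
print(score_label("EAS", vals))
```

The design choice worth copying is the default: absence of evidence yields a flag, not a score.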

Invite second opinions. Provide references, a clinician-facing PDF, and clear routes to genetic counselors. Treat the result as an input to a conversation, not the end of it.

What a "good" report looks like

A credible output doesn't wow with a single number. It offers a compact stack:

Baseline: "At age 32, typical dizygotic twin odds are ~X%."

Genotype effect: "Your variants increase that by ~Y percentage points (95% CI: A–B)."

Context: "Odds also depend on parity, fertility treatments, and BMI; our model doesn't include those."

Decision helper: "If you're planning pregnancy in the next 12 months, here's what to discuss with your OB."

Ethical note: "These probabilities are not recommendations to pursue or avoid conception strategies."
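The five-layer stack above can be carried as one structure so that no layer can be silently dropped from a report. A minimal sketch, with illustrative field names and placeholder text rather than a proposed standard:

```python
from dataclasses import dataclass

@dataclass
class LayeredReport:
    baseline: str
    genotype_effect: str
    context: str
    decision_helper: str
    ethical_note: str

    def render(self) -> str:
        """Emit all five layers in order; none are optional."""
        return "\n".join([
            f"Baseline: {self.baseline}",
            f"Genotype effect: {self.genotype_effect}",
            f"Context: {self.context}",
            f"Decision helper: {self.decision_helper}",
            f"Ethical note: {self.ethical_note}",
        ])

# Placeholder content only; real values come from validated models.
report = LayeredReport(
    baseline="At age 32, typical dizygotic twin odds are ~X%.",
    genotype_effect="Your variants shift that by ~Y points (95% CI: A-B).",
    context="Parity, fertility treatments, and BMI are not modeled.",
    decision_helper="Topics to discuss with your OB in the next 12 months.",
    ethical_note="These probabilities are not conception recommendations.",
)
print(report.render())
```

Making every field required is the point: a report object with no ethical note or no uncertainty simply cannot be constructed.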

For intelligence-adjacent traits:

Construct clarity: "This score correlates with years of education in large cohorts; it is not an IQ test."

Expected accuracy: "Explains a modest share of differences in the training cohorts; real-world individual prediction is noisy."

Equity statement: "Performance is weaker outside the ancestry groups used to build it; we report your ancestry-specific confidence if available."

Actionables: "Evidence-based ways to support learning that matter regardless of genetics."

Why offer these tests at all?

Because quantified uncertainty beats folklore. Families already form beliefs—"we always have girls," "twins run on my mother's side," "he's just a math kid"—and act on them. A careful genetic readout can replace myths with numbers and caveats, nudging choices toward reality. That's valuable, even when the signal is small.

The catch is discipline. If a company can't resist hype, it shouldn't sell these panels. If it can commit to transparent methods, sober claims, and useful next steps, then SNP services for sensitive traits can move from parlor trick to practical tool—not a verdict on who we are, but a clearer map of the terrain we're navigating.