I Took the Meta Data Science Interview: A Real, First-Person Review

I went through the Meta data science interview this spring. For an even deeper dive into that exact gauntlet, you can check out my extended Meta interview diary. I was nervous. I was excited. I was very caffeinated. Here’s my honest take, with real questions I got, the parts I loved, the parts that stung, and what I’d do if I ran it back tomorrow.

For a crowdsourced peek at what other candidates have faced recently, I also skimmed this running list of Meta Data Scientist interview questions on Glassdoor and found it surprisingly on-point with my own experience.

Quick context

  • Role: Data Scientist, Product (consumer side)
  • Format: Remote, then virtual “onsite”
  • Rounds: Recruiter screen, hiring manager chat, SQL/stats, product sense, and a final loop

You know what? It felt more like a product workout than a test. That’s good and bad.


How the process flowed for me

  • Recruiter screen (30 min): background, timeline, level check
  • Hiring manager (45 min): team fit, how I think, a tiny case
  • SQL + stats (60 min): live query writing, metrics, test logic
  • Product sense (60 min): pick a feature, define success, trade-offs
  • Final loop (2 hours): a mix of all, plus behavioral

The recruiter step can be a black box; I’ve chronicled some eyebrow-raising recruiter tales in this separate post.

They were on time. They kept a friendly tone. Still, the pace was quick. I ate cold noodles in the 8-minute break. It tasted like stress and sesame oil.


Real examples from my rounds

1) SQL round: small tables, real stakes

They gave me two tables:

  • users(user_id, country, created_at)
  • messages(sender_id, receiver_id, sent_at)

Task 1: Find daily active senders (unique sender_id) for the last 7 days.

I wrote something like:

select
  date_trunc('day', sent_at) as day,
  count(distinct sender_id) as daily_active_senders
from messages
where sent_at >= current_date - interval '7' day
group by 1
order by 1;

Then they asked: “How would you handle late events?” I said I’d allow a fixed lateness window (say, 48 hours), backfill each affected day once its window closes, and mark the backfill run. Not perfect, but honest.
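
To make that concrete, here is roughly what I meant, written against a hypothetical daily_active_senders results table (the table and its columns are my invention for this sketch): keep re-materializing the trailing two days, i.e. the 48-hour lateness window, and stamp each row with the run that wrote it.

-- Re-run only the days whose lateness window is still open (hypothetical table).
delete from daily_active_senders
where day >= current_date - interval '2' day;

insert into daily_active_senders (day, active_senders, backfill_run_at)
select
  date_trunc('day', sent_at) as day,
  count(distinct sender_id) as active_senders,
  now() as backfill_run_at
from messages
where sent_at >= current_date - interval '2' day
group by 1;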

Task 2: Top 3 countries by week for senders.

I joined messages to users, bucketed by week, and used a rank window. I almost tripped on null country values; I called that out and filtered them. Flagging pitfalls out loud seemed to help.
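
Reconstructed from memory, the query looked roughly like this. Note that rank() keeps ties, so a week can return more than three rows; row_number() would force exactly three.

select *
from (
  select
    date_trunc('week', m.sent_at) as week,
    u.country,
    count(distinct m.sender_id) as senders,
    rank() over (
      partition by date_trunc('week', m.sent_at)
      order by count(distinct m.sender_id) desc
    ) as country_rank
  from messages m
  join users u
    on u.user_id = m.sender_id
  where u.country is not null
  group by 1, 2
) ranked
where country_rank <= 3
order by week, country_rank;

The inner join also quietly drops senders with no matching users row; left vs inner is exactly the kind of choice they watch for (see the list below).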

What they watched:

  • Clean joins (left vs inner)
  • Date grain
  • Nulls and late data
  • Why this metric, not that one

Tiny miss: I used the wrong date function at first (old habit from BigQuery). I corrected it fast and explained the change.


2) Product sense: Should we launch a “typing indicator” tweak in Messenger?

Prompt: A new typing dot shows sooner. Will this help?

I framed it:

  • Goal: More good chats, not just more taps
  • Primary metric: Reply rate per conversation (sketched below)
  • Secondaries: Response time, messages per user per day, session length
  • Guardrails: Blocks, reports, send fails, crash rate

I said I’d read the risk as “more pressure and ghosting.” So I’d watch new user cohorts and teen users closely. Different groups feel nudges in different ways. I shared one real story: I once shipped a nudge that looked friendly but made anxious users bounce. That got a nod.

Regional behavior differences matter too: many Asian chat communities use typing indicators and stickers in slightly different ways than Western audiences, which is one more reason to read results by segment instead of leaning on a single global average.
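
To pin the primary metric down before getting to the decision rule, here is roughly how I would define reply rate per conversation against a messages table like the one from the SQL round. The pair-level notion of a “conversation” and the 24-hour reply window are my assumptions for this sketch, not anything the interviewers specified.

-- Reply rate = share of (sender, receiver) conversations started in the last 7 days
-- where the receiver sent anything back within 24 hours of the first message.
with convo as (
  select
    m.sender_id,
    m.receiver_id,
    min(m.sent_at) as first_sent_at
  from messages m
  where m.sent_at >= current_date - interval '7' day
  group by 1, 2
),
replied as (
  select
    c.sender_id,
    c.receiver_id,
    bool_or(r.sent_at <= c.first_sent_at + interval '24' hour) as got_reply
  from convo c
  left join messages r
    on r.sender_id = c.receiver_id
   and r.receiver_id = c.sender_id
   and r.sent_at >= c.first_sent_at
  group by 1, 2
)
select avg(case when got_reply then 1.0 else 0.0 end) as reply_rate
from replied;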

Decision rule I gave:

  • Launch if reply rate +1% or more
  • No guardrail worse than −0.5%
  • Stable for two full weekly cycles
  • No extreme subgroup harm

They pushed: “Why +1%?” I said it’s a guess, tied to revenue and habit strength, but I’d pre-register the threshold and not wiggle it.
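
Written down, the pre-registered rule is tiny, which is sort of the point. The experiment_readout table and its column names here are hypothetical; the thresholds are the ones listed above.

-- One-row summary of the A/B readout (hypothetical table and columns).
select
  case
    when reply_rate_lift >= 0.01     -- reply rate up at least 1%
     and worst_guardrail >= -0.005   -- no guardrail down more than 0.5%
     and stable_weeks    >= 2        -- held for two full weekly cycles
     and no_subgroup_harm            -- boolean: no extreme harm in key cohorts
    then 'launch'
    else 'hold'
  end as decision
from experiment_readout;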


3) Experiment design: New Stories sticker

We tested a new sticker in Stories.

  • Unit: User level, 50/50 split
  • Exposure: All Story creators; feed viewers unaffected
  • Length: 2–3 weeks (to catch weekends)
  • Power: Aim for 80%. Based on past launches, we expected a lift of around +1.5%. I said we’d need hundreds of thousands of users per arm, gave a rough number (napkin math below), then added, “I’d confirm with our internal calculator.”
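
The napkin math behind “hundreds of thousands per arm,” using the standard two-proportion approximation. The 20% baseline (share of creators posting on a given day) is purely my illustrative assumption, treating the metric as a simple rate is itself a simplification, and I’m reading the +1.5% as a relative lift with the usual two-sided alpha of 0.05; the 80% power comes from the setup.

-- n per arm ≈ 2 * (z_alpha/2 + z_beta)^2 * p(1-p) / delta^2, with z values 1.96 and 0.84.
select
  ceil(
    2 * power(1.96 + 0.84, 2) * p * (1 - p) / power(p * 0.015, 2)
  ) as users_per_arm
from (select 0.20 as p) assumptions;
-- Roughly 280,000 per arm with these inputs, which is why I still deferred to the internal calculator.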

Metrics:

  • Primary: Stories posted per creator
  • Quality: Completion rate, replays
  • Creator stickiness: 7-day active creators
  • Guardrails: Report rate, time spent by viewers, crash rate

I also called out interference: If creators post more, viewers may change their time spent. If that spillover’s big, I’d consider a geo split or clustering by friend graph. Not always possible, but at least I named it.

Stats call:

  • Two-sided test
  • If the metric is skewed, trim the top 1% or use a bootstrap (trim sketched below)
  • Pre-register the plan
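
For the trim option, a sketch against a hypothetical per-creator readout table, creator_metrics(user_id, variant, stories_posted); the bootstrap version I would run offline in Python rather than in SQL.

-- Compare a 1%-trimmed mean of stories posted alongside the raw mean, with one
-- pooled cutoff so both arms are trimmed at the same point.
with cutoff as (
  select percentile_cont(0.99) within group (order by stories_posted) as p99
  from creator_metrics
)
select
  c.variant,
  avg(c.stories_posted) as raw_mean,
  avg(c.stories_posted) filter (where c.stories_posted <= x.p99) as trimmed_mean
from creator_metrics c
cross join cutoff x
group by c.variant;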


4) Stats brain-teaser: Significance vs truth

They gave me a tiny case: one arm has a 10% click rate, the other 10.3%, with p = 0.03. Should we ship?

My answer:

  • Yes, it’s “significant,” but magnitude matters (quick math below)
  • Check power, seasonality, and novelty effects
  • Look for p-hacking signs (many peeks, many metrics)
  • Check subgroups for harm
  • If effect is tiny and costs are real, I’d run a holdout post-launch
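
The quick arithmetic behind “magnitude matters”: the lift is only +0.3 points, about +3% relative, and with these rates a p-value near 0.03 corresponds to something like 95k users per arm. That per-arm figure is my back-of-envelope assumption, picked because it reproduces the stated p-value; only the click rates and the p-value came from the prompt.

-- Two-proportion z sanity check (sample size assumed; rates and p from the prompt).
select
  p2 - p1         as absolute_lift,   -- 0.003, i.e. +0.3 points
  (p2 - p1) / p1  as relative_lift,   -- +3%
  (p2 - p1) / sqrt(
    ((p1 + p2) / 2) * (1 - (p1 + p2) / 2) * (2.0 / n)
  )               as z_score          -- ~2.17, i.e. two-sided p ~ 0.03
from (select 0.100 as p1, 0.103 as p2, 95000.0 as n) assumed;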

They liked that I didn’t chase p-values without context.


5) Behavioral: When I killed my own project

Story:

  • Situation: I led a feed tweak that lifted clicks but raised hides by 2%
  • Task: Decide fast, with little time
  • Action: I paused the ramp, showed the hide spike, and shared 3 follow-ups
  • Result: We fixed ranking rules, relaunched later, and hit +0.8% with no harm

I shared how it felt. It stung. But it built trust. I think that mattered more than a win.


What I liked

  • Real product talk. Not just theory.
  • Friendly interviewers who asked “why” a lot.
  • Clear structure. I knew what was next.
  • Hands-on SQL that felt close to work.

What bugged me

  • Time crunch. Good ideas got cut short.
  • One tool quirk (date functions) ate minutes.
  • Little space for quick data pulls; it was all whiteboard-ish.
  • Fatigue by the last loop. My brain felt like oatmeal.

How I prepped (and what actually helped)

  • Daily SQL reps on real-ish tables (users, events, sessions). I used Mode and a small local Postgres.
  • Wrote one “metric sheet” per product: Messenger, Feed, Reels. Just basic funnels and guardrails.
  • A/B test drills: I used a simple power calculator and ran toy sims in Python. Even napkin math helps.
  • Product teardowns: 15 minutes a night. What would I change? Why?
  • Mock chats with a friend who kept asking “so what?”
  • Skimmed the Meta data scientist interview guide, which neatly maps out each round and packs in sample questions.

I also browsed a handful of concise interview breakdowns on vhfdx.net to sanity-check my approach against other candidates’ experiences. A few that stood out were a brutally honest look at the Costco data science internship and a first-hand review of a New York data science internship.

Big help: Saying my assumptions out loud. They care more about your story than your script.


What I’d do differently next time

  • Set a stopwatch for every answer: 2–3 minutes per part
  • Lead with the decision rule, then the details
  • Keep a tiny checklist on a sticky note: metric, guardrail, power, risks
  • Practice a few “late data” takes and time zone traps
  • Eat a real snack between rounds (not cold noodles)

Scorecard

  • Depth of product talk: 4.5/5
  • SQL realism: 4/5
  • Fairness and tone: 5/5
  • Time to think: 3/5

Overall: 4.3/5. Hard, fair, and kind of fun.