
Data Scientist

Learn the Data Scientist role for free: what to study, where to work, salary ranges, a fit test, and a full exam.

Published: January 1, 2026 | Updated: January 1, 2026

What does a Data Scientist do?

Data Scientists turn raw data into decisions. They explore and clean data, design experiments, build models, evaluate impact, and communicate insights so teams can ship better products and policies. The goal is not “a fancy model” but measurable business and user value.

A week in the life (example)
  • Mon: Clarify a stakeholder question, define success metrics, draft a quick analysis plan.
  • Tue: Query data with SQL, clean/transform in Python, visualize trends.
  • Wed: Train baseline and improved ML models, compare metrics with cross-validation.
  • Thu: Design or analyze an A/B test; check assumptions and power.
  • Fri: Write a short findings memo with charts and next-step recommendations; share in a review.

Day-to-day responsibilities

  • Translate vague problems into clear, measurable questions and metrics.
  • Pull, clean, and explore data (SQL + Python).
  • Build features and train models; choose appropriate algorithms.
  • Evaluate with rigorous metrics; stress-test assumptions.
  • Design and analyze experiments or quasi-experiments.
  • Create clear visuals and narratives for non-technical audiences.
  • Ship insights and models with engineering and product teams.

Typical deliverables

  • Exploratory analysis notebooks and data quality checks.
  • Metric definitions and dashboards.
  • Experiment design docs and impact analyses.
  • Production-ready model artifacts (code, params, monitoring plan).
  • Decision memos with visuals and recommendations.
Portfolio-friendly artifacts you can include
  • Well-documented Jupyter notebooks with a clear README.
  • Before/after charts tied to a specific decision.
  • Experiment analysis with power, effect size, and limitations.
  • Model card: problem, data, metrics, fairness checks, risks, maintenance.

Where you can work

  • Industries: tech, finance, healthcare, retail/e-commerce, media, logistics, energy, government, NGOs.
  • Teams: product analytics, growth, risk/fraud, recommendation systems, operations, marketing, research, platform/ML infra.
  • Company sizes: startups (broad scope), scale-ups (fast iteration), enterprises (deep specialization).

Hiring expectations by level

Junior / Associate
  • Scope: Well-defined tasks; close mentorship.
  • Signals: Solid SQL/Python, core stats, can clean data and produce a baseline model.
  • Interview focus: SQL querying, probability/stats, simple modeling, communication clarity.
Mid-level
  • Scope: End-to-end projects from problem framing to decision.
  • Signals: Chooses proper metrics, avoids leakage, runs experiments, explains trade-offs.
  • Interview focus: Case studies, model evaluation, experiment design, stakeholder communication.
Senior / Lead
  • Scope: Ambiguous problem spaces; sets standards; mentors others.
  • Signals: Strong judgment, drives measurable impact, balances rigor with speed.
  • Interview focus: Strategy, system-level thinking, influencing, risk/fairness, production considerations.

Salary ranges (rough)

  • Junior: ~$70k–$110k base.
  • Mid-level: ~$110k–$160k base.
  • Senior/Lead: ~$150k–$220k+ base.
  • Staff/Principal or high-comp markets: total comp can reach ~$200k–$300k+.

Varies by country/company; treat as rough ranges.

Skill map

Master these in order. Each skill includes a mini task to practice.

  • Statistics: Descriptive stats, distributions, confidence intervals, hypothesis testing. Mini task: Compute mean/median/IQR and a 95% CI for a numeric column; explain what the CI means in plain language.
  • Probability: Conditional probability, Bayes, independence. Mini task: Given P(A)=0.3, P(B)=0.5, P(A and B)=0.15, compute P(A|B) and interpret.
  • Python (pandas and numpy): Data wrangling, vectorization, reproducible notebooks. Mini task: Write a function to clean missing values and outliers, then unit test it.
  • Machine Learning Algorithms: Linear/logistic regression, trees, ensembles, clustering, basic NLP/recommender systems. Mini task: Train a baseline logistic regression, then a gradient-boosted model; compare their metrics.
  • Feature Engineering: Encodings, scaling, text/time features, leakage prevention. Mini task: Create target-aware time splits and demonstrate that leakage is avoided.
  • Model Evaluation: Metrics for regression/classification, cross-validation, calibration, monitoring. Mini task: Plot ROC and PR curves on an imbalanced dataset; discuss which metric you’d report and why.
  • Experiment Design: A/B tests, power, sample size, non-inferiority, pitfalls. Mini task: Calculate required sample size for a target effect and significance.
  • SQL: Joins, window functions, CTEs, performance basics. Mini task: Build a weekly retention query using window functions.
  • Visualization: Clear charts, color use, storytelling, reproducible plots. Mini task: Turn a cluttered plot into a clean, annotated chart with a one-sentence takeaway.
  • Communication: Executive summaries, decision memos, stakeholder alignment. Mini task: Write a 5-sentence summary of a model’s impact for non-technical readers.
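
Two of the mini tasks above (the 95% CI and the conditional probability) fit in a few lines of Python using only the standard library; the numeric column here is made up for illustration:

```python
import statistics
from statistics import NormalDist

# Hypothetical numeric column (made-up values for illustration).
values = [12, 15, 14, 10, 18, 21, 13, 16, 14, 19, 11, 17]

mean = statistics.mean(values)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1

# 95% CI for the mean (normal approximation; fine as a sketch).
z = NormalDist().inv_cdf(0.975)  # ~1.96
se = statistics.stdev(values) / len(values) ** 0.5
ci = (mean - z * se, mean + z * se)
# Plain-language reading: if we repeated this sampling many times,
# about 95% of intervals built this way would contain the true mean.

# Probability mini task: P(A|B) = P(A and B) / P(B).
p_a, p_b, p_ab = 0.3, 0.5, 0.15
p_a_given_b = p_ab / p_b  # equals P(A), so A and B are independent
```

Noticing that P(A|B) equals P(A), and saying what that means (independence), is the "interpret" part of the task.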
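
The Python wrangling mini task can also be sketched without pandas; `clean_numeric` below is a hypothetical helper that drops missing values and clips outliers to the usual IQR fences, with unit-test-style checks at the end:

```python
import statistics

def clean_numeric(values, k=1.5):
    """Drop missing values, then clip outliers to the IQR fences.

    `k` controls fence width: fences are Q1 - k*IQR and Q3 + k*IQR.
    """
    present = [v for v in values if v is not None]
    if len(present) < 4:
        return present  # too few points to estimate quartiles
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in present]

# Unit-test style checks (the point of the mini task).
cleaned = clean_numeric([1, 2, None, 3, 4, 5, 6, 7, 8, 1000])
assert None not in cleaned
assert max(cleaned) < 1000  # the outlier was clipped to the upper fence
```

In a real project the same function would operate on a pandas Series, and the asserts would live in a pytest file.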

Practical projects to build your portfolio

  1. Impactful A/B Test Analysis
    Goal: Evaluate a product change and recommend rollout.
    Deliverable: Design doc, power calc, analysis notebook, memo.
    Success checklist
    • Clear hypothesis and primary metric.
    • Pre-registered analysis plan and stopping rule.
    • Effect size with CI and practical interpretation.
    • Limitations and follow-ups.
  2. End-to-End Predictive Model
    Goal: Predict churn or conversion and quantify value.
    Deliverable: Reproducible pipeline, model card, monitoring plan.
    Success checklist
    • Time-aware splits to avoid leakage.
    • Baseline vs improved model with fair comparison.
    • Business translation: expected ROI or cost savings.
    • Fairness/ethics considerations.
  3. Customer Segmentation
    Goal: Unsupervised clusters that inform strategy.
    Deliverable: EDA + clustering + profile deck.
    Success checklist
    • Feature scaling rationale.
    • Cluster stability and number selection.
    • Actionable segment narratives.
  4. Metric Definition and Dashboard
    Goal: Define North Star and guardrail metrics and build a dashboard.
    Deliverable: Metric spec + dashboard with annotations.
    Success checklist
    • Precise metric formulas and data sources.
    • Sampling/latency constraints documented.
    • Anomaly alerts and ownership noted.
  5. Recommendation or NLP Mini-System
    Goal: Simple recommender or text classifier with offline and simulated online evaluation.
    Deliverable: Notebook, evaluation report, ethics notes.
    Success checklist
    • Cold-start considerations.
    • Business-meaningful metrics, not just accuracy.
    • Bias and safety checks.
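
For project 2, the time-aware split is the item that most often goes wrong; here is a minimal sketch of the idea in plain Python, with hypothetical records and cutoff date:

```python
from datetime import date

# Hypothetical labeled events: (event_date, label).
records = [
    (date(2025, 1, 5), 0),
    (date(2025, 2, 10), 1),
    (date(2025, 3, 1), 0),
    (date(2025, 4, 20), 1),
    (date(2025, 5, 15), 0),
]

def time_split(rows, cutoff):
    """Train on everything strictly before `cutoff`, test on the rest.

    Unlike a random split, no test-period information can leak into
    training, which mirrors how the model will actually be used.
    """
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

train, test = time_split(records, date(2025, 4, 1))
# Leakage check: every training example predates every test example.
assert max(r[0] for r in train) < min(r[0] for r in test)
```

A random split would mix future rows into training and overstate performance; the final assert encodes the no-leakage property you want to preserve.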

Interview preparation checklist

  • Review: SQL joins/window functions; vectorized pandas; stats tests; evaluation metrics; experiment design.
  • Practice: 10–15 SQL and modeling problems under time constraints.
  • Case drills: Structure ambiguous problems into metrics, plan, risks, decision.
  • Story bank: 4–6 STAR stories (impact, conflict, ambiguity, failure, leadership).
  • Portfolio rehearsal: 2 projects with crisp problem, solution, results, lessons.
  • Communication: One-slide summary per project; 2-minute and 5-minute versions.
Mini tasks to get ready fast
  • Write a 7-line SQL query using a window function and explain each line.
  • Explain ROC vs PR to a PM in 60 seconds.
  • Draft a one-paragraph experiment design with metric, sample size, stopping rule.
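
The first mini task (a short SQL query using a window function, explained line by line) might look like the sketch below; it runs against an in-memory SQLite table with made-up events, and the line-by-line explanation lives in the SQL comments:

```python
import sqlite3

# Hypothetical events table; the query ranks each user's events by day.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INT, event_day TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2025-01-01"), (1, "2025-01-08"), (2, "2025-01-01")],
)

query = """
SELECT                               -- one row per event
  user_id,                           -- who did it
  event_day,                         -- when they did it
  ROW_NUMBER() OVER (                -- window function: numbering...
    PARTITION BY user_id             -- ...restarts for each user
    ORDER BY event_day               -- ...in chronological order
  ) AS visit_number                  -- 1 = first visit, 2 = second, ...
FROM events
"""
rows = con.execute(query).fetchall()
```

The same PARTITION BY / ORDER BY pattern, with LAG or LEAD in place of ROW_NUMBER, is the backbone of weekly-retention queries.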
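
The experiment-design mini task needs a sample-size number; the standard normal-approximation formula for comparing two proportions can be sketched with the standard library (the 10% to 12% conversion scenario is an assumption for illustration):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-proportion z-test.

    Standard normal-approximation formula; p1 is the baseline rate,
    p2 the rate you want to be able to detect.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assumed scenario: detect a lift from 10% to 12% conversion.
n = n_per_arm(0.10, 0.12)  # roughly 3.8k users per arm at 80% power
```

Halving the detectable lift roughly quadruples the required sample, which is why the target effect should be agreed on before the test starts.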

Common mistakes and how to avoid them

  • Optimizing the wrong metric: Align with a primary metric and guardrails before coding.
  • Data leakage: Use time-based splits and freeze feature windows.
  • Overfitting: Prefer simpler baselines; use cross-validation and calibration.
  • Unclear narratives: Lead with the decision and impact; put math in the appendix.
  • Ignoring uncertainty: Report CIs/intervals and practical significance.
  • No reproducibility: Use seeds, requirements, and a simple Makefile or notebook index.
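
The reproducibility point is cheap to get right; here is a minimal seeding sketch in plain Python (the same pattern applies to numpy and ML libraries):

```python
import random

SEED = 42  # fix once, at the top of the notebook or script

def sample_run(seed):
    rng = random.Random(seed)  # local RNG: no hidden global state
    data = [rng.gauss(0, 1) for _ in range(1000)]
    return sum(data) / len(data)

# Two runs with the same seed give identical results, which makes
# analyses reviewable and bugs bisectable.
assert sample_run(SEED) == sample_run(SEED)
```

Using a local `random.Random(seed)` instead of the global RNG also keeps library code from silently consuming your random stream.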

Learning path

  1. Weeks 1–2: SQL + Python data wrangling. Build a small EDA notebook.
  2. Weeks 3–4: Statistics + Probability for inference and decisions.
  3. Weeks 5–7: ML algorithms + Feature engineering; compare baselines.
  4. Weeks 8–9: Model evaluation + monitoring; handle imbalance and drift.
  5. Weeks 10–11: Experiment design; plan/assess an A/B test.
  6. Week 12+: Communication + portfolio polish; apply for roles.
Fast track (already comfortable with Python)
  • Focus on SQL, evaluation, and experimentation first.
  • Ship one end-to-end project and one strong A/B analysis.

Who this is for

  • Analytical thinkers who enjoy turning ambiguity into structure.
  • People comfortable with coding and statistics.
  • Professionals transitioning from analytics, engineering, or research.

Prerequisites

  • Comfortable with basic algebra and Python fundamentals.
  • Willingness to write SQL and explain results to non-technical peers.
  • Curiosity and patience with messy, real-world data.

Next steps

  • Take the Fit Test on this page to gauge your match.
  • Skim the Exam to see what “job-ready” looks like.
  • Choose one portfolio project and schedule time to complete it.

Pick a skill to start

Scroll to the Skills section below and begin with the first skill. Small daily practice beats marathon study sessions.
