Lineage Graph

Code Ocean

The Product

Code Ocean empowers computational scientists to run, reproduce, and share analyses. As projects scaled, so did complexity: dozens of Capsules (Code + Environment + Data) interacted invisibly with other datasets and repositories, so teams struggled to trace dependencies, debug failures, or reproduce results confidently.

The Lineage Graph was conceived to bring clarity and trust: a dynamic visualization mapping every relationship between code, data, results, and pipelines in real time.

My Role

As a UX designer, I worked on all phases of the project: from user research and competitor analysis, through wireframes, surveys, and usability tests on prototypes, to the final design and handoff to the development team.

I worked closely with product managers, customer support, customer success, and developers.

The Process

The Challenge

Users lacked a unified view of dependencies. When a dataset or script changed, there was no clear way to see:
• Which results or Capsules depended on it.
• Whether outputs were still valid.
• Who was responsible for updates.

Result:
Slow research cycles → Increased reproducibility risk → Team friction

“I just want to see what feeds what, without reading five scripts.”

— Data Scientist, Bioinformatics Team

The Research

Competitive Benchmarking

User Interviews

Heuristic Analysis

Findings

  1. Cognitive overload: Users abandoned DAGs (Directed Acyclic Graphs) once graphs exceeded ~50 nodes.

  2. Traceability gaps: Users couldn’t quickly answer “what created this file?” without digging through many scripts.

  3. Role diversity: Scientists needed readability, not raw DAG syntax.

  4. Tool gaps:

    • Airflow → powerful but intimidating for non-engineers.

    • Neo4j Bloom → friendly but too generic for code/data context.

    • Alation → excellent progressive expansion, limited workflow awareness.

Takeaways

  1. Apply progressive disclosure and focus+context to tame complexity.

  2. Use Gestalt grouping for readability.

  3. Use Shneiderman’s Mantra for interaction flow: Overview → Zoom & Filter → Details on Demand.

  4. Apply Tufte’s data-ink ratio: minimal yet informative visuals.

The Design

Design Principles

  1. Clarity: every line communicates purpose.

  2. Progressive disclosure: start calm, expand only as curiosity grows.

  3. Cognitive efficiency: structure and color guide attention, not decoration.

  4. Reproducibility by design: every node works as evidence: who, what, when, how.
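
To make the fourth principle concrete, here is a minimal sketch of a node that carries its own evidence. It is an illustration only: the class name, the field names, and the sample values are all hypothetical and do not reflect Code Ocean's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageNode:
    """One graph node with its provenance attached (hypothetical fields)."""

    node_id: str
    kind: str             # what: e.g. "capsule", "dataset", "result", "pipeline"
    created_by: str       # who
    created_at: datetime  # when
    produced_by: str      # how: e.g. the run that produced this node

    def evidence(self) -> str:
        """Summarize who, what, when, and how in a single line."""
        return (
            f"{self.kind} {self.node_id}: created by {self.created_by} "
            f"on {self.created_at:%Y-%m-%d} via {self.produced_by}"
        )


# Made-up sample values, for illustration only.
node = LineageNode(
    node_id="results-42",
    kind="result",
    created_by="dana",
    created_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    produced_by="run-7",
)
summary = node.evidence()
```

A line like this is the kind of content a details panel could surface on demand, keeping the canvas itself calm.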

Architecture

  • Graph canvas (core view).

  • Layer controls.

  • Advanced filters (Phase 2).

  • “Details Panel” (on demand).

  • Breadcrumb highlights on hover for orientation.

Solution Overview

  • Visual Vocabulary: Shapes & color convey type, not status.

  • Calm Overview + Expandable Depth: Inspired by Alation’s progressive disclosure patterns.

  • Focus + Context: Clicking or hovering over a node highlights all of its upstream paths.

  • Role-Based Layers: One graph, many audiences.

  • Real-Time Synchronization: Nodes show live run state.
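
The Focus + Context behavior can be sketched as a graph traversal: starting from the focused node, walk the edges backwards and collect everything that feeds it. This is a minimal sketch under assumed names (the function, the edge-list format, and the sample node IDs are all hypothetical), not the production implementation.

```python
from collections import deque


def upstream_nodes(edges, focus):
    """Return every node that feeds `focus`, directly or indirectly.

    `edges` is a list of (source, target) pairs, read as "source feeds target".
    """
    # Index each node's direct parents so we can walk edges backwards.
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    # Breadth-first walk; `seen` also guards against revisiting shared ancestors.
    seen = set()
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Made-up lineage: two inputs feed a capsule whose output feeds a figure.
edges = [
    ("raw_reads", "align_capsule"),
    ("reference_genome", "align_capsule"),
    ("align_capsule", "aligned_data"),
    ("aligned_data", "stats_capsule"),
    ("stats_capsule", "figure_1"),
    ("other_dataset", "unrelated_capsule"),
]
highlighted = upstream_nodes(edges, "figure_1")
```

Here `highlighted` holds the five nodes that lead to `figure_1`, while `other_dataset` and `unrelated_capsule` stay dimmed as context.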

Validation & Outcomes

Results from Usability Sessions

  • Tasks: trace upstream sources, verify reproducibility, assess impact of change.

  • Outcome:

    • 90% task completion success.

    • Average trace time reduced from ~6 min → 2 min.

    • “Feels calm and obvious” repeated across testers.

“This finally matches how I think: start small, follow the thread, open detail when I need it.”

— Computational Scientist, internal pilot

Reflection

Designing for technical users reaffirmed that:

  • People want clarity, not complexity. Even engineers appreciate a “calm default.”

  • Progressive disclosure builds trust. A predictable reveal rhythm reduces stress.

  • Reproducibility is emotional. Seeing lineage instantly builds confidence in results.

Collaboration with engineering was crucial: graph performance required virtualized rendering and dynamic clustering, ensuring UX fluidity even with 1,000+ nodes.
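
As a rough illustration of the dynamic clustering idea, the sketch below collapses nodes into one cluster per group once the graph grows past a threshold. Everything here is an assumption made for the example: the function name, the grouping label, and the default threshold of 50 (borrowed from the research finding that users abandoned graphs beyond roughly 50 nodes). The engineering team's actual approach is not described here.

```python
def cluster_by_group(nodes, edges, max_visible=50):
    """Collapse nodes into one cluster per group when the graph is too large.

    `nodes` maps node_id -> group label; `edges` is a list of (source, target).
    Returns (visible_nodes, visible_edges) as sets.
    """
    # Small graphs are shown as they are.
    if len(nodes) <= max_visible:
        return set(nodes), set(edges)

    def cluster_id(node_id):
        return f"cluster:{nodes[node_id]}"

    visible_nodes = {cluster_id(node_id) for node_id in nodes}
    # Keep only edges that cross groups; edges inside a group are hidden
    # until the user expands that cluster.
    visible_edges = {
        (cluster_id(source), cluster_id(target))
        for source, target in edges
        if nodes[source] != nodes[target]
    }
    return visible_nodes, visible_edges


# Made-up example with a tiny threshold so the collapse is visible.
nodes = {"a1": "ingest", "a2": "ingest", "b1": "analysis", "b2": "analysis"}
edges = [("a1", "a2"), ("a1", "b1"), ("a2", "b2"), ("b1", "b2")]
clustered_nodes, clustered_edges = cluster_by_group(nodes, edges, max_visible=3)
```

With the threshold set to 3, the four nodes collapse into two clusters joined by a single edge, which is the kind of calm overview a user could then expand on demand.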

The Lineage Graph transformed hidden complexity into navigable clarity.

It unified scientists and engineers under a single visual truth, reduced cognitive friction, and redefined reproducibility as an interactive experience.

Thank you :)

“Good design is obvious. Great design is transparent.”

— Joe Sparano
