Lineage Graph
Code Ocean


The Product
Code Ocean empowers computational scientists to run, reproduce, and share analyses. As projects scaled, so did complexity: dozens of Capsules (Code + Environment + Data) interacted invisibly with other datasets and repositories, so teams struggled to trace dependencies, debug failures, or reproduce results confidently.
The Lineage Graph was conceived to bring clarity and trust: a dynamic visualization mapping every relationship between code, data, results, and pipelines in real time.
My Role
As a UX designer, I worked on all phases of the project, from user research and competitor analysis, through wireframes, surveys, and usability tests on prototypes to the final design and delivery to the Development team.
I worked closely with product managers, customer support, customer success, and developers.
The Process
The Challenge
Users lacked a unified view of dependencies. When a dataset or script changed, there was no clear way to see:
• Which results or capsules depended on it?
• Whether outputs were still valid?
• Who was responsible for updates?
Result:
Slow research cycles -> Increased reproducibility risk -> Team friction

“I just want to see what feeds what, without reading five scripts.”
— Data Scientist, Bioinformatics Team
The Research

Competitive Benchmarking

User Interviews

Heuristic Analysis
Findings
Cognitive overload: Users abandoned DAGs (Directed Acyclic Graphs) once graphs exceeded ~50 nodes.
Traceability gaps: Users couldn’t quickly answer “what created this file?” without investigating many scripts.
Role diversity: Scientists needed readability, not raw DAG syntax.
Tool gaps:
Airflow → powerful but intimidating for non-engineers.
Neo4j Bloom → friendly but too generic for code/data context.
Alation → excellent progressive expansion, limited workflow awareness.
Takeaways
Apply progressive disclosure and focus+context to tame complexity.
Use Gestalt grouping for readability.
Use Shneiderman’s Mantra for interaction flow: Overview → Zoom & Filter → Details on Demand.
Apply Tufte’s data-ink ratio: minimal yet informative visuals.
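To make the progressive-disclosure takeaway concrete, here is a minimal sketch of one way a "calm" starting view could be computed: show only the nodes within a few hops of a focus node, and widen the radius when the user asks for more. The edge list, node names, and the `visible_nodes` helper are all hypothetical and for illustration only; this is not the shipped implementation.

```python
from collections import deque

# Hypothetical lineage edges (source -> target); names are illustrative only.
EDGES = [
    ("raw_reads", "align_capsule"),
    ("reference_genome", "align_capsule"),
    ("align_capsule", "aligned_bam"),
    ("aligned_bam", "variant_capsule"),
    ("variant_capsule", "variants_vcf"),
    ("variants_vcf", "report_capsule"),
]


def visible_nodes(edges, focus, depth):
    """Return the nodes within `depth` hops of `focus`, in either direction."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    seen = {focus}
    queue = deque([(focus, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


# Start calm (1 hop), then expand on request (2 hops).
calm_view = visible_nodes(EDGES, "aligned_bam", depth=1)
expanded_view = visible_nodes(EDGES, "aligned_bam", depth=2)
```

In this toy graph the calm view holds three nodes and the expanded view six, so the user never meets the full graph unless they keep following the thread.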

The Design
Design Principles
Clarity: every line communicates purpose.
Progressive disclosure: start calm, expand only as curiosity grows.
Cognitive efficiency: structure and color guide attention, not decoration.
Reproducibility by design: every node works as evidence: who, what, when, how.


Architecture
Graph canvas (core view).
Layer controls.
Advanced filters (Phase 2).
“Details Panel” (on demand).
Breadcrumb highlights on hover for orientation.
Solution Overview
Visual Vocabulary: Shapes & color convey type, not status.
Calm Overview + Expandable Depth: Inspired by Alation’s progressive disclosure patterns.
Focus + Context: Clicking or hovering over a node highlights all of its upstream paths.
Role-Based Layers: One graph, many audiences.
Real-Time Synchronization: Nodes show live run state.
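The Focus + Context behavior above amounts to an ancestor walk over the lineage graph. The sketch below shows the idea under simple assumptions: the `PARENTS` mapping, the node names, and the `upstream_of` helper are hypothetical, and the real product may represent and traverse lineage differently.

```python
# Hypothetical lineage: each node maps to the nodes it was directly produced from.
PARENTS = {
    "figure_png": ["plot_capsule"],
    "plot_capsule": ["stats_csv", "plot_script"],
    "stats_csv": ["stats_capsule"],
    "stats_capsule": ["raw_dataset", "stats_script"],
    "unrelated_result": ["other_capsule"],
}


def upstream_of(parents, node):
    """Collect every ancestor of `node` by walking parent links depth-first."""
    highlighted = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in highlighted:
                highlighted.add(parent)
                stack.append(parent)
    return highlighted


# Nodes to highlight when the user focuses on "figure_png".
highlight = upstream_of(PARENTS, "figure_png")

# Everything else stays visible but dimmed, preserving context.
all_nodes = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}
dimmed = all_nodes - highlight - {"figure_png"}
```

Keeping the non-highlighted nodes on screen in a dimmed state, rather than hiding them, is what keeps the "context" half of the pattern intact.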

Validation & Outcomes
Results from Usability Sessions
Tasks: trace upstream sources, verify reproducibility, assess impact of change.
Outcome:
90% task completion rate.
Average trace time reduced from ~6 min to 2 min.
“Feels calm and obvious” repeated across testers.
“This finally matches how I think: start small, follow the thread, open detail when I need it.”
— Computational Scientist, internal pilot

Reflection
Designing for technical users reaffirmed that:
People want clarity, not complexity. Even engineers appreciate a “calm default.”
Progressive disclosure builds trust. A predictable reveal rhythm reduces stress.
Reproducibility is emotional. Seeing lineage instantly builds confidence in results.
Collaboration with engineering was crucial: graph performance required virtualized rendering and dynamic clustering, ensuring UX fluidity even with 1,000+ nodes.
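As a rough illustration of the clustering idea only (virtualized rendering is not covered here), the sketch below collapses nodes into named groups once a graph exceeds a size threshold and keeps only the edges that cross group boundaries. It uses a fixed, precomputed grouping as a simplification; the `NODE_GROUP` mapping, the `cluster_graph` helper, and the tiny threshold are assumptions chosen for the example, not the engineering team's implementation.

```python
# Hypothetical nodes tagged with the group they belong to; names are illustrative.
NODE_GROUP = {
    "clean.py": "preprocess",
    "normalize.py": "preprocess",
    "clean_data": "preprocess",
    "train.py": "model",
    "weights": "model",
    "report": "results",
}
EDGES = [
    ("clean.py", "clean_data"),
    ("normalize.py", "clean_data"),
    ("clean_data", "train.py"),
    ("train.py", "weights"),
    ("weights", "report"),
]


def cluster_graph(node_group, edges, max_nodes):
    """Collapse nodes into their groups when the graph exceeds `max_nodes`."""
    if len(node_group) <= max_nodes:
        # Small enough to draw as-is.
        return sorted(node_group), sorted(edges)
    clusters = sorted(set(node_group.values()))
    # Keep only edges that cross cluster boundaries, deduplicated.
    cluster_edges = sorted(
        {
            (node_group[src], node_group[dst])
            for src, dst in edges
            if node_group[src] != node_group[dst]
        }
    )
    return clusters, cluster_edges


# With a threshold of 4, the six-node graph collapses to three clusters.
nodes, links = cluster_graph(NODE_GROUP, EDGES, max_nodes=4)
```

The threshold here is deliberately tiny so the collapse is visible; in practice it would be tuned against the point where users stop reading the graph.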
The Lineage Graph transformed hidden complexity into navigable clarity.
It unified scientists and engineers under a single visual truth, reduced cognitive friction, and redefined reproducibility as an interactive experience.
Thank you :)
“Good design is obvious. Great design is transparent.”
— Joe Sparano

