refactor: migrate voxel data storage to DataAllocator for CUDA

This commit is contained in:
AndreaRigoni
2026-02-28 10:05:39 +00:00
parent 07915295cb
commit 52580d8cde
14 changed files with 1484 additions and 1022 deletions

60
docs/usage/usage.md Normal file

@@ -0,0 +1,60 @@
# Usage and Installation Guide
## Requirements
### Compiling with CUDA Support
The library supports running VoxImage filtering operations directly on CUDA cores via transparent RAM/VRAM memory transfers.
By default, the `CMakeLists.txt` build system sets `USE_CUDA=ON` and will attempt to locate `nvcc` and the NVIDIA CUDA Toolkit. If the toolkit is missing, the CMake configure step will fail unless you explicitly configure the project with `-DUSE_CUDA=OFF`.
### 1. Installing CUDA Environment via Micromamba
If you are developing inside an isolated Conda/Micromamba environment (e.g., `mutom`), you can inject the CUDA compilers directly into your environment rather than relying on global system dependencies:
```bash
# Add the conda-forge channel if not already available
micromamba config append channels conda-forge
# Install nvcc and the necessary CUDA toolkit components
micromamba install cuda-nvcc
```
Verify your installation:
```bash
nvcc --version
```
### 2. Building the Project
Configure and compile the project using standard CMake flows:
```bash
mkdir -p build && cd build
# Configure CMake
# (Optional) Explicitly toggle CUDA: cmake -DUSE_CUDA=ON ..
cmake ..
# Compile the project and tests
make -j $(nproc)
```
### 3. Validating CUDA Support
You can verify that the CUDA kernels are launching correctly and allocating device memory through `DataAllocator` by running the mathematical unit tests.
```bash
# From the build directory
./src/Math/testing/VoxImageFilterTest
# Output should show:
# "Data correctly stayed in VRAM after CUDA execution!"
```
## How It Works Under The Hood
The `DataAllocator<T>` container wraps memory allocations so that a buffer can live transparently in either CPU RAM or GPU VRAM. Standard iteration over the container implicitly calls `MoveToRAM()`, pulling the data back to the host before it is accessed.
Filters compiled with `#ifdef USE_CUDA` explicitly call `<buffer>.MoveToVRAM()` so that the data is allocated directly on the device; when CUDA support is unavailable, the filters fall back to host-side iteration automatically. Chaining CUDA-enabled filters keeps the data resident in VRAM across the whole pipeline, avoiding costly host copies between filter stages.