Usage and Installation Guide

Requirements

Compiling with CUDA Support

The library can run VoxImage filtering operations on an NVIDIA GPU through CUDA, moving data between host RAM and device VRAM transparently.

By default, the CMakeLists.txt build system sets USE_CUDA=ON and will attempt to locate nvcc and the NVIDIA CUDA Toolkit. If the toolkit is missing, CMake will fail unless you explicitly configure the project with -DUSE_CUDA=OFF.

1. Installing the CUDA Environment via Micromamba

If you are developing inside an isolated Conda/Micromamba environment (e.g., mutom), you can inject the CUDA compilers directly into your environment rather than relying on global system dependencies:

# Add the conda-forge channel if not already available
micromamba config append channels conda-forge

# Install nvcc and the necessary CUDA toolkit components
micromamba install cuda-nvcc

Verify your installation:

nvcc --version

2. Building the Project

Configure and compile the project using the standard CMake workflow:

mkdir -p build && cd build

# Configure CMake
# (Optional) Explicitly toggle CUDA: cmake -DUSE_CUDA=ON ..
cmake ..

# Compile the project and tests
make -j $(nproc)

3. Validating CUDA Support

To verify that the CUDA kernels launch correctly and that DataAllocator allocates device memory, run the VoxImage filter unit test from the Math module:

# From the build directory
./src/Math/testing/VoxImageFilterTest

# Output should show:
# "Data correctly stayed in VRAM after CUDA execution!"

How It Works Under The Hood

The DataAllocator<T> container wraps memory allocations so that a buffer can reside in either CPU RAM or GPU VRAM. Standard host-side iteration pulls the data back to RAM through implicit MoveToRAM() calls.

Filters guarded by #ifdef USE_CUDA explicitly call <buffer>.MoveToVRAM(), so their buffers are placed on the device before the kernel runs. The fallback to host computation is handled automatically. Chaining such filters keeps the data in VRAM from one filter to the next, avoiding costly copies back to the host between steps.
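
The behaviour described above can be illustrated with a small host-only sketch. It is not the uLib implementation: LazyBuffer, ScaleFilter, DeviceData(), location() and transfers() are hypothetical names introduced for illustration, and "VRAM" is simulated with a second host vector so the example compiles without a GPU. Only the method names MoveToRAM() and MoveToVRAM() come from this guide. The #ifdef USE_CUDA guard is omitted to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DataAllocator<T>; NOT the uLib implementation.
// "VRAM" is simulated with a second host vector so no GPU is required.
template <typename T>
class LazyBuffer {
public:
    enum class Location { RAM, VRAM };

    explicit LazyBuffer(std::size_t n, T value = T()) : ram_(n, value) {}

    // Copy to the simulated device only if the data is not already there.
    void MoveToVRAM() {
        if (loc_ == Location::VRAM) return;
        vram_ = ram_;
        loc_ = Location::VRAM;
        ++transfers_;
    }

    // Copy back to the host only if the data currently lives on the device.
    void MoveToRAM() {
        if (loc_ == Location::RAM) return;
        ram_ = vram_;
        loc_ = Location::RAM;
        ++transfers_;
    }

    // Host-side element access implicitly pulls the data back to RAM.
    T& operator[](std::size_t i) {
        MoveToRAM();
        return ram_[i];
    }

    // Simulated device pointer; only valid after MoveToVRAM() has been called.
    T* DeviceData() { return vram_.data(); }

    std::size_t size() const { return ram_.size(); }
    Location location() const { return loc_; }
    int transfers() const { return transfers_; }  // number of simulated copies

private:
    std::vector<T> ram_;
    std::vector<T> vram_;
    Location loc_ = Location::RAM;
    int transfers_ = 0;
};

// Hypothetical filter: moves the buffer to the simulated device, then
// processes it in place. A real CUDA filter would launch a kernel here.
inline void ScaleFilter(LazyBuffer<float>& buf, float factor) {
    buf.MoveToVRAM();
    float* d = buf.DeviceData();
    for (std::size_t i = 0; i < buf.size(); ++i) d[i] *= factor;
}
```

In this sketch, calling ScaleFilter twice in a row on the same buffer triggers a single simulated transfer, because the second call finds the data already on the device. A second transfer happens only when host code reads an element through operator[], which is the kind of chaining benefit the section describes.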