refactor: migrate voxel data storage to DataAllocator for CUDA

docs/usage/usage.md

# Usage and Installation Guide

## Requirements

### Compiling with CUDA Support

The library supports running `VoxImage` filtering operations directly on CUDA devices via transparent RAM/VRAM memory transfers.

By default, the `CMakeLists.txt` build system sets `USE_CUDA=ON` and will attempt to locate `nvcc` and the NVIDIA CUDA Toolkit. If the toolkit is missing, CMake configuration will fail unless you explicitly configure the project with `-DUSE_CUDA=OFF`.

### 1. Installing the CUDA Environment via Micromamba

If you are developing inside an isolated Conda/Micromamba environment (e.g., `mutom`), you can install the CUDA compilers directly into your environment rather than relying on global system dependencies:

```bash
# Add the conda-forge channel if not already available
micromamba config append channels conda-forge

# Install nvcc and the necessary CUDA toolkit components
micromamba install cuda-nvcc
```

Verify your installation:

```bash
nvcc --version
```

### 2. Building the Project

Configure and compile the project using the standard CMake flow:

```bash
mkdir -p build && cd build

# Configure CMake
# (Optional) Explicitly toggle CUDA: cmake -DUSE_CUDA=ON ..
cmake ..

# Compile the project and tests
make -j $(nproc)
```

### 3. Validating CUDA Support

You can verify that the CUDA kernels launch correctly and allocate device memory through `DataAllocator` by running the mathematical unit tests:

```bash
# From the build directory
./src/Math/testing/VoxImageFilterTest

# Output should show:
# "Data correctly stayed in VRAM after CUDA execution!"
```

## How It Works Under The Hood

The `DataAllocator<T>` container wraps memory allocations so that buffers can transparently reside in either CPU RAM or GPU VRAM. Standard iteration automatically pulls data back to the host via implicit `MoveToRAM()` calls.

Filter paths guarded by `#ifdef USE_CUDA` call `MoveToVRAM()` on their buffers to allocate and operate directly on the device; when CUDA is unavailable, the filters fall back to host-side iteration automatically. Chaining CUDA-enabled filters keeps the data resident in VRAM across operations, avoiding costly host copies between steps.