algorithm chain for ram-vram
This commit is contained in:
@@ -1,7 +1,338 @@
|
||||
# Aggoritm definition
|
||||
# Algorithm Infrastructure
|
||||
|
||||
## Overview
|
||||
|
||||
An algorithm in the uLib infrastructure is a class for containing a functional that can be dynamically loaded into memory as a plug-in.
|
||||
It derives from the base Object class and therefore can contain properties that define the serialization of operating parameters or the implementation of widgets for interactive parameter manipulation.
|
||||
The algorithm class is designed to be inserted into a Task, i.e., a class for managing the execution of scheduled operations. A task contains a Run and Stop method to start and stop execution. Furthermore, a task can be configured to work in cyclic or asynchronous mode: in cyclic mode it will be possible to define a cycle time, while in asynchronous mode a task can be hooked to a signal-slot of the Object structure or to a condition variable defined in the monitor pattern (Monitor.h).
|
||||
It derives from the base `Object` class (`Core/Object.h`) and therefore can contain properties that define the serialization of operating parameters or the implementation of widgets for interactive parameter manipulation.
|
||||
|
||||
The algorithm class is designed to be inserted into an `AlgorithmTask`, a class for managing the execution of scheduled operations. A task contains `Run` and `Stop` methods to start and stop execution. A task can be configured to work in two modes:
|
||||
|
||||
- **Cyclic mode**: the algorithm is executed periodically with a configurable cycle time.
|
||||
- **Asynchronous mode**: the task waits for a trigger before each execution. Triggers can come from the uLib signal-slot system (`Object::connect`) or from a condition variable as defined in the monitor pattern (`Core/Monitor.h`).
|
||||
|
||||
The algorithm is defined as a template class on two types `T_enc` and `T_dec`. The encoder is a type for data input or another algorithm that is chained with this one and outputs data in a compatible format. The decoder is the type of data output or a downstream algorithm compatible with it.
|
||||
|
||||
## Class Hierarchy
|
||||
|
||||
```
|
||||
Object (Core/Object.h)
|
||||
|
|
||||
+-- Algorithm<T_enc, T_dec> (Core/Algorithm.h)
|
||||
| |
|
||||
| +-- VoxImageFilter<VoxelT, CrtpImplT> (Math/VoxImageFilter.h)
|
||||
| |
|
||||
| +-- VoxFilterAlgorithmLinear (Math/VoxImageFilterLinear.hpp)
|
||||
| +-- VoxFilterAlgorithmMedian (Math/VoxImageFilterMedian.hpp)
|
||||
| +-- VoxFilterAlgorithmAbtrim (Math/VoxImageFilterABTrim.hpp)
|
||||
| +-- VoxFilterAlgorithmSPR (Math/VoxImageFilterABTrim.hpp)
|
||||
| +-- VoxFilterAlgorithmThreshold (Math/VoxImageFilterThreshold.hpp)
|
||||
| +-- VoxFilterAlgorithmBilateral (Math/VoxImageFilterBilateral.hpp)
|
||||
| +-- VoxFilterAlgorithmBilateralTrim(Math/VoxImageFilterBilateral.hpp)
|
||||
| +-- VoxFilterAlgorithm2ndStat (Math/VoxImageFilter2ndStat.hpp)
|
||||
| +-- VoxFilterAlgorithmCustom (Math/VoxImageFilterCustom.hpp)
|
||||
|
|
||||
+-- Thread (Core/Threads.h)
|
||||
|
|
||||
+-- AlgorithmTask<T_enc, T_dec> (Core/Algorithm.h)
|
||||
```
|
||||
|
||||
## Algorithm (`Core/Algorithm.h`)
|
||||
|
||||
### Template Parameters
|
||||
|
||||
```cpp
|
||||
template <typename T_enc, typename T_dec>
|
||||
class Algorithm : public Object;
|
||||
```
|
||||
|
||||
- **`T_enc`** (Encoder): the input data type. Can be a raw data type or a pointer to a data structure. When chaining algorithms, the upstream algorithm's `T_dec` must be compatible with this algorithm's `T_enc`.
|
||||
- **`T_dec`** (Decoder): the output data type. Produced by `Process()` and consumed by the next algorithm in the chain.
|
||||
|
||||
### Core Interface
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `virtual T_dec Process(const T_enc& input) = 0` | Pure virtual. Implement the algorithm logic here. |
|
||||
| `T_dec operator()(const T_enc& input)` | Calls `Process()`. Enables functional syntax: `result = alg(data)`. |
|
||||
|
||||
### Algorithm Chaining
|
||||
|
||||
Algorithms can be linked in processing pipelines via encoder/decoder pointers:
|
||||
|
||||
```cpp
|
||||
Algorithm* upstream; // SetEncoder() / GetEncoder()
|
||||
Algorithm* downstream; // SetDecoder() / GetDecoder()
|
||||
```
|
||||
|
||||
This allows building chains like:
|
||||
|
||||
```
|
||||
[RawData] --> AlgorithmA --> AlgorithmB --> [Result]
|
||||
encoder decoder
|
||||
```
|
||||
|
||||
### Signals
|
||||
|
||||
| Signal | Emitted when |
|
||||
|--------|-------------|
|
||||
| `Started()` | The algorithm begins processing (caller responsibility). |
|
||||
| `Finished()` | The algorithm completes processing (caller responsibility). |
|
||||
|
||||
### Device Preference (CUDA)
|
||||
|
||||
Algorithms report their preferred execution device via `GetPreferredDevice()`:
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `virtual MemoryDevice GetPreferredDevice() const` | Returns `RAM` or `VRAM`. Subclasses override. |
|
||||
| `void SetPreferredDevice(MemoryDevice dev)` | Manually set the device preference. |
|
||||
| `bool IsGPU() const` | Shorthand for `GetPreferredDevice() == VRAM`. |
|
||||
|
||||
GPU-based algorithms are responsible for calling `cudaDeviceSynchronize()` inside their `Process()` implementation before returning, so that results are available to the caller or downstream algorithm.
|
||||
|
||||
### Example: Defining a Custom Algorithm
|
||||
|
||||
```cpp
|
||||
class MyFilter : public Algorithm<VoxImage<Voxel>*, VoxImage<Voxel>*> {
|
||||
public:
|
||||
const char* GetClassName() const override { return "MyFilter"; }
|
||||
|
||||
VoxImage<Voxel>* Process(VoxImage<Voxel>* const& image) override {
|
||||
// ... filter the image in-place ...
|
||||
return image;
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
## AlgorithmTask (`Core/Algorithm.h`)
|
||||
|
||||
`AlgorithmTask` manages the execution of an `Algorithm` within a scheduled, threaded context. It inherits from `Thread` (`Core/Threads.h`) and uses `Mutex` (`Core/Monitor.h`) for synchronization.
|
||||
|
||||
### Template Parameters
|
||||
|
||||
```cpp
|
||||
template <typename T_enc, typename T_dec>
|
||||
class AlgorithmTask : public Thread;
|
||||
```
|
||||
|
||||
Must match the `Algorithm<T_enc, T_dec>` it manages.
|
||||
|
||||
### Configuration
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `void SetAlgorithm(AlgorithmType* alg)` | Set the algorithm to execute. |
|
||||
| `void SetMode(Mode mode)` | `Cyclic` or `Async`. |
|
||||
| `void SetCycleTime(int ms)` | Period for cyclic mode (milliseconds). |
|
||||
|
||||
### Execution Modes
|
||||
|
||||
#### Cyclic Mode
|
||||
|
||||
The algorithm's `Process()` is called periodically. The cycle waits on a `condition_variable_any` with timeout, so `Stop()` can interrupt immediately without waiting for the full cycle.
|
||||
|
||||
```cpp
|
||||
AlgorithmTask<int, int> task;
|
||||
task.SetAlgorithm(&myAlgorithm);
|
||||
task.SetMode(AlgorithmTask<int, int>::Cyclic);
|
||||
task.SetCycleTime(100); // every 100ms
|
||||
task.Run(inputData);
|
||||
// ... later ...
|
||||
task.Stop();
|
||||
```
|
||||
|
||||
#### Asynchronous Mode
|
||||
|
||||
The task thread blocks on a condition variable until `Notify()` is called. Each notification triggers exactly one `Process()` invocation.
|
||||
|
||||
```cpp
|
||||
task.SetMode(AlgorithmTask<int, int>::Async);
|
||||
task.Run(inputData);
|
||||
|
||||
// Trigger manually:
|
||||
task.Notify();
|
||||
|
||||
// Or connect to a signal:
|
||||
task.ConnectTrigger(sender, &SenderClass::DataReady);
|
||||
// Now each emission of DataReady() triggers one Process() call.
|
||||
```
|
||||
|
||||
### Lifecycle
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `void Run(const T_enc& input)` | Starts the background thread with the given input. |
|
||||
| `void Stop()` | Requests stop and joins the thread. |
|
||||
| `bool IsRunning()` | Inherited from `Thread`. |
|
||||
|
||||
### Signals
|
||||
|
||||
| Signal | Emitted when |
|
||||
|--------|-------------|
|
||||
| `Stopped()` | The task thread has completed (after last `Process()` and before thread exit). |
|
||||
|
||||
### Signal-Slot Triggering
|
||||
|
||||
`ConnectTrigger()` connects any uLib `Object` signal to the task's `Notify()` method:
|
||||
|
||||
```cpp
|
||||
task.ConnectTrigger(detector, &Detector::EventReady);
|
||||
```
|
||||
|
||||
This uses the uLib signal system (`Core/Signal.h`), not Qt signals. The connection is type-safe and works with the `Object::connect` infrastructure.
|
||||
|
||||
## VoxImageFilter (`Math/VoxImageFilter.h`)
|
||||
|
||||
`VoxImageFilter` specializes `Algorithm` for kernel-based volumetric image filtering. It uses CRTP (Curiously Recurring Template Pattern) so that concrete filters provide their `Evaluate()` method without virtual dispatch overhead in the inner loop.
|
||||
|
||||
### Template Parameters
|
||||
|
||||
```cpp
|
||||
template <typename VoxelT, typename CrtpImplT>
|
||||
class VoxImageFilter : public Abstract::VoxImageFilter,
|
||||
public Algorithm<VoxImage<VoxelT>*, VoxImage<VoxelT>*>;
|
||||
```
|
||||
|
||||
- **`VoxelT`**: the voxel data type (must satisfy `Interface::Voxel` — requires `.Value` and `.Count` fields).
|
||||
- **`CrtpImplT`**: the concrete filter subclass. Must implement:
|
||||
```cpp
|
||||
float Evaluate(const VoxImage<VoxelT>& buffer, int index);
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
1. `Process(image)` creates a read-only buffer copy of the input image.
|
||||
2. For each voxel in parallel (OpenMP), it calls `CrtpImplT::Evaluate(buffer, index)`.
|
||||
3. `Evaluate()` reads from the buffer using the kernel offsets and writes the result.
|
||||
4. The filtered image is returned (in-place modification).
|
||||
|
||||
```
|
||||
Process(image)
|
||||
|
|
||||
+-- buffer = copy of image (read-only snapshot)
|
||||
|
|
||||
+-- #pragma omp parallel for
|
||||
| for each voxel i:
|
||||
| image[i].Value = CrtpImplT::Evaluate(buffer, i)
|
||||
|
|
||||
+-- return image
|
||||
```
|
||||
|
||||
### Kernel System
|
||||
|
||||
The `Kernel<VoxelT>` class stores convolution weights and precomputed index offsets:
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `SetKernelNumericXZY(values)` | Set kernel weights from a flat vector (XZY order). |
|
||||
| `SetKernelSpherical(shape)` | Set weights via a radial function `f(distance^2)`. |
|
||||
| `SetKernelWeightFunction(shape)` | Set weights via a 3D position function `f(Vector3f)`. |
|
||||
|
||||
### CUDA Support
|
||||
|
||||
Concrete filters can override `Process()` with a CUDA implementation:
|
||||
|
||||
```cpp
|
||||
#if defined(USE_CUDA) && defined(__CUDACC__)
|
||||
VoxImage<VoxelT>* Process(VoxImage<VoxelT>* const& image) override {
|
||||
if (this->GetPreferredDevice() == MemoryDevice::VRAM) {
|
||||
// Launch CUDA kernel, synchronize, return
|
||||
} else {
|
||||
return BaseClass::Process(image); // CPU fallback
|
||||
}
|
||||
}
|
||||
#endif
|
||||
```
|
||||
|
||||
The base class `GetPreferredDevice()` automatically returns `VRAM` when the image or kernel data resides on the GPU, enabling transparent device dispatch.
|
||||
|
||||
Filters with CUDA implementations: `VoxFilterAlgorithmLinear`, `VoxFilterAlgorithmAbtrim`, `VoxFilterAlgorithmSPR`.
|
||||
|
||||
### Concrete Filters
|
||||
|
||||
| Filter | File | Description |
|
||||
|--------|------|-------------|
|
||||
| `VoxFilterAlgorithmLinear` | `VoxImageFilterLinear.hpp` | Weighted linear convolution (FIR filter). CUDA-enabled. |
|
||||
| `VoxFilterAlgorithmMedian` | `VoxImageFilterMedian.hpp` | Median filter with kernel-weighted sorting. |
|
||||
| `VoxFilterAlgorithmAbtrim` | `VoxImageFilterABTrim.hpp` | Alpha-beta trimmed mean filter. CUDA-enabled. |
|
||||
| `VoxFilterAlgorithmSPR` | `VoxImageFilterABTrim.hpp` | Robespierre filter: trimmed mean applied only to outlier voxels. CUDA-enabled. |
|
||||
| `VoxFilterAlgorithmThreshold` | `VoxImageFilterThreshold.hpp` | Binary threshold filter. |
|
||||
| `VoxFilterAlgorithmBilateral` | `VoxImageFilterBilateral.hpp` | Edge-preserving bilateral filter (intensity-weighted Gaussian). |
|
||||
| `VoxFilterAlgorithmBilateralTrim` | `VoxImageFilterBilateral.hpp` | Bilateral filter with alpha-beta trimming. |
|
||||
| `VoxFilterAlgorithm2ndStat` | `VoxImageFilter2ndStat.hpp` | Local variance (second-order statistic). |
|
||||
| `VoxFilterAlgorithmCustom` | `VoxImageFilterCustom.hpp` | User-supplied evaluation function via function pointer. |
|
||||
|
||||
### Example: Using a Filter with AlgorithmTask
|
||||
|
||||
```cpp
|
||||
// Create filter and configure kernel
|
||||
VoxFilterAlgorithmLinear<Voxel> filter(Vector3i(3, 3, 3));
|
||||
std::vector<float> weights(27, 1.0f); // uniform 3x3x3
|
||||
filter.SetKernelNumericXZY(weights);
|
||||
|
||||
// Direct use
|
||||
filter.SetImage(&image);
|
||||
filter.Run();
|
||||
|
||||
// Or via Algorithm interface
|
||||
VoxImage<Voxel>* result = filter.Process(&image);
|
||||
|
||||
// Or scheduled in a task
|
||||
AlgorithmTask<VoxImage<Voxel>*, VoxImage<Voxel>*> task;
|
||||
task.SetAlgorithm(&filter);
|
||||
task.SetMode(AlgorithmTask<VoxImage<Voxel>*, VoxImage<Voxel>*>::Cyclic);
|
||||
task.SetCycleTime(500);
|
||||
task.Run(&image);
|
||||
```
|
||||
|
||||
## Structural Benefits
|
||||
|
||||
### 1. Uniform Processing Interface
|
||||
|
||||
Every algorithm — from a simple threshold to a GPU-accelerated convolution — exposes the same `Process(input) -> output` interface. Client code does not need to know the concrete type:
|
||||
|
||||
```cpp
|
||||
Algorithm<VoxImage<Voxel>*, VoxImage<Voxel>*>* alg = &anyFilter;
|
||||
alg->Process(&image);
|
||||
```
|
||||
|
||||
### 2. Pipeline Composition
|
||||
|
||||
The encoder/decoder chaining allows building data processing pipelines where each stage transforms data and passes it to the next. Type safety is enforced at compile time through template parameters.
|
||||
|
||||
### 3. Scheduled and Event-Driven Execution
|
||||
|
||||
`AlgorithmTask` decouples the algorithm from its execution schedule. The same algorithm can be:
|
||||
- Called directly (`Process()`)
|
||||
- Run periodically (Cyclic mode for monitoring/acquisition)
|
||||
- Triggered by events (Async mode for reactive processing)
|
||||
|
||||
### 4. Transparent CPU/GPU Dispatch
|
||||
|
||||
The `MemoryDevice` preference and `GetPreferredDevice()` virtual allow the same algorithm interface to dispatch to CPU or GPU implementations. The `DataAllocator` transparently manages RAM/VRAM transfers, and concrete filters override `Process()` with CUDA kernels when data is on the GPU.
|
||||
|
||||
### 5. Integration with the Object System
|
||||
|
||||
Since `Algorithm` inherits from `Object`, algorithms gain:
|
||||
- **Properties**: serializable parameters via the `Property<T>` system, enabling persistent configuration and GUI widget generation.
|
||||
- **Signals**: `Started`/`Finished` notifications for connecting to monitoring or logging.
|
||||
- **Serialization**: save/load algorithm configuration via Boost archives.
|
||||
- **Instance naming**: `SetInstanceName()` for runtime identification in contexts.
|
||||
|
||||
### 6. CRTP Performance for Inner Loops
|
||||
|
||||
`VoxImageFilter` uses CRTP to dispatch to `Evaluate()` without virtual function overhead. The per-voxel evaluation runs at full speed inside OpenMP parallel loops, while the outer `Process()` method remains virtual for polymorphic use through the Algorithm interface.
|
||||
|
||||
## Dependencies
|
||||
|
||||
```
|
||||
Core/Object.h — base class, properties, signals, serialization
|
||||
Core/Signal.h — signal-slot connection infrastructure
|
||||
Core/Monitor.h — Mutex, condition variables, ULIB_MUTEX_LOCK
|
||||
Core/Threads.h — Thread base class for AlgorithmTask
|
||||
Core/DataAllocator.h — MemoryDevice enum, RAM/VRAM data management
|
||||
Math/VoxImage.h — volumetric image container
|
||||
Math/VoxImageFilter.h — kernel-based filter framework
|
||||
```
|
||||
|
||||
|
||||
The algorithm in particular is defined as a template class on two types T_enc, T_dec. The encoder is a type for data input or another algorithm that is chained with this one that outputs data in the format compatible with input. The decoder is the type of data output or an algorithm compatible with it.
|
||||
|
||||
Reference in New Issue
Block a user