XMCTS data cleaning
This repository contains a project to detect and correct faults in the data acquisition of the XMCTS diagnostic of W7-X.
Getting started
First of all, you should create a Python environment in which to install the needed dependencies. The recommended way is to use venv; one can create a venv by executing the following command:
python -m venv path\to\venv\env-name
Once the virtual environment has been created and activated, one can build the package by moving into the project's main folder and executing the following command:
python -m build
Following this, a folder called dist is created, and one can then install the package by running:
python -m pip install dist/xmctsguard-vnum.tar.gz
where vnum is the current version of the package, starting from 0.1.0.
Once this operation is completed, all of the required libraries should have been installed and the package can be used. For instance, the GUI can be launched by executing the command XMCTSGuard-gui.
Structure
The project is structured in the following way:
XMCTSGuard/
├── src/
│   └── XMCTSGuard/
│       ├── __init__.py        # Makes this a package
│       ├── main.py            # Entry point to launch the GUI
│       │
│       ├── gui/               # All things visual
│       │   ├── __init__.py
│       │   ├── main_window.py # The main GUI class
│       │   ├── widgets.py     # Custom buttons, sliders, etc.
│       │   └── helpers.py     # GUI-only helper functions
│       │
│       ├── engine/            # The "brain" (neural network)
│       │   ├── __init__.py
│       │   ├── model.py       # The NN class
│       │   ├── trainer.py     # Training logic
│       │   ├── database.py    # Data loading
│       │   └── callbacks.py   # Training monitors
│       │
│       └── analysis/          # The bridge
│           ├── __init__.py
│           └── processors.py  # Functions that use the NN to analyze data
│
├── data/                      # Local storage for datasets (git-ignored)
├── tests/                     # Test files
├── pyproject.toml             # Build config
└── README.md
Usage
The project leverages a neural network (NN) based on an autoencoder (AE) architecture, contained in engine/model.py, to produce an ansatz of what the correct brightness "profile" should look like. The correlation between the ansatz and the measured profile is then computed; this gives a metric for how far the measured profile lies from the usual distribution of profiles in W7-X. Moreover, the ansatz is also used to compute the distance of each diode from the reconstructed brightness. By applying a threshold on the residuals between these two curves, it is possible to highlight the outliers in the measured profile and correct their values, knowing the gain ratio between the old pulse and the new pulse.
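The residual-thresholding step described above can be sketched as follows. The function names and the sigma-based threshold are illustrative assumptions, not the project's actual API (the real logic lives in analysis/processors.py):

```python
import numpy as np

def profile_correlation(measured, ansatz):
    """Pearson correlation between the measured profile and the AE ansatz."""
    return np.corrcoef(measured, ansatz)[0, 1]

def find_outliers(measured, ansatz, n_sigma=3.0):
    """Flag diodes whose residual from the ansatz exceeds n_sigma
    standard deviations of the residual distribution (threshold rule assumed)."""
    residuals = measured - ansatz
    return np.abs(residuals) > n_sigma * residuals.std()

def correct_outliers(measured, old_pulse, gain_ratio, mask):
    """Replace flagged diodes with the old-pulse values scaled by the
    gain ratio between the old pulse and the new pulse."""
    corrected = measured.copy()
    corrected[mask] = old_pulse[mask] * gain_ratio
    return corrected
```

Here a diode is flagged when its residual stands out from the residual distribution of the whole profile; the flagged values are then rebuilt from the previous pulse via the gain ratio.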
Once the analysis is completed, one can save (cache) the "new" data, using the same format as the qxtdataaccess GUI, so that it is usable for other purposes.
Neural Network
The NN used in this project, previously introduced, is developed using the lightning Python library (lightning docs), which is based on PyTorch.
All of the necessary code for the NN is contained in the src/XMCTSGuard/engine folder. The model.py file contains the network structure, trainer.py contains the functions used for training the neural network, and database.py has the data loader and dataset classes for reading files and reshaping them into an NN-compliant format.
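A minimal sketch of what such an autoencoder can look like is shown below. The layer sizes and class name are illustrative assumptions, not the actual architecture in model.py, and plain PyTorch is used here for brevity, whereas the project wraps the model in a lightning module:

```python
import torch
from torch import nn

class BrightnessAE(nn.Module):
    """Illustrative autoencoder: compress a brightness profile into a small
    latent vector and reconstruct it. Sizes are assumptions, not the
    project's real architecture."""

    def __init__(self, n_diodes=32, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_diodes, 16), nn.ReLU(), nn.Linear(16, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, n_diodes)
        )

    def forward(self, x):
        # Reconstruct the input profile from its latent representation
        return self.decoder(self.encoder(x))

def reconstruction_loss(model, x):
    """MSE between the input profile and its reconstruction,
    the usual AE training objective."""
    return nn.functional.mse_loss(model(x), x)
```

A profile that the network reconstructs poorly (high loss) is unlike the profiles seen during training, which is exactly what makes the reconstruction usable as an ansatz for fault detection.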
Training
The training of the NN is done by running the script train.py in the engine folder. A config dictionary holds the various parameters that can be tweaked for each run. Before training, a database for the training procedure should be created; this can be done by running the function consolidate_pulses contained in the src/XMCTSGuard/engine/pulse_dataset.py script. The training procedure itself can be run by calling the train_autoencoder function in src/XMCTSGuard/engine/train.py. Refer to this function's documentation for more information.
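The workflow above can be summarised as follows. The config keys and call signatures shown here are assumptions for illustration only; the authoritative ones are in train.py itself:

```python
# Hypothetical training configuration; the real keys live in the `config`
# dictionary inside train.py.
config = {
    "epochs": 100,          # assumed key: number of training epochs
    "batch_size": 64,       # assumed key: samples per batch
    "learning_rate": 1e-3,  # assumed key: optimizer step size
}

# The call sequence mirrors the steps above (signatures assumed):
#   from XMCTSGuard.engine.pulse_dataset import consolidate_pulses
#   from XMCTSGuard.engine.train import train_autoencoder
#   consolidate_pulses(...)        # 1. build the training database
#   train_autoencoder(config)      # 2. run the training procedure
```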
Attention
It may happen that the training routine, if run locally on IPP's PCs, tries to select the available GPUs even when it is not actually possible to use them. If this is the case, before running the training procedure one should run the following command:
export CUDA_VISIBLE_DEVICES=''
so that the training routine cannot select said GPUs.
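The same effect can be obtained from inside a Python session, provided the variable is set before the first CUDA query (i.e. before importing the training code), otherwise torch may already have enumerated the GPUs:

```python
import os

# Hide all CUDA devices from subsequently imported code,
# equivalent to `export CUDA_VISIBLE_DEVICES=''` in the shell.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```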
Training in HPC
It is also possible to train the model on the Raven HPC, although for a single training run, without hyperparameter optimization, such a powerful machine is not necessary.
The steps to deploy and train the model on the Raven HPC, which has NVIDIA GPUs available, are thoroughly described in the file HPCProcedures.md.
Jupyter Notebooks
Jupyter notebooks are a useful tool for exploring the code and seeing hands-on examples of how the various steps work together. However, they can be messy inside a git repository: to avoid embedding a great amount of useless data and plots in version control, the use of the nbstripout package is strongly recommended. A possible setup is described in the nbstripout documentation under the 'Using as a Git filter' section.
Notice
For any problems, doubts, or errors with the code, contact luca.orlandi@igi.cnr.it.