
This toolbox extends frequency domain quantitative electroencephalography (qEEG) methods pursuing higher sensitivity to detect Brain Developmental Disorders. Prior qEEG work lacked integration of cross-spectral information omitting important functional connectivity descriptors.


tperezdevelopment/HarMNqEEG


HarMNqEEG Toolbox

Global EEG Normative Project

Global Brain Consortium Homepage Link
Original code Author: Ying Wang, Min Li
Project leader:       Pedro Antonio Valdes-Sosa 
Researchers:          Jorge F. Bosch-Bayard, Lidice Galan Garcia  
Cbrain Tool Author:   Eng. Tania Perez Ramirez <[email protected]>
Copyright(c):         2022 Ying Wang, [email protected],
                      Min Li, [email protected]
Joint China-Cuba LAB, UESTC, CNEURO (Cuban Center for Neurosciences)

DOI

Reference Paper


doi:10.1016/j.neuroimage.2022.119190. Epub 2022 Apr 7. PMID: 35398285.

HarMNqEEG Toolbox Description

This toolbox extends frequency-domain quantitative electroencephalography (qEEG) methods, pursuing higher sensitivity to detect Brain Developmental Disorders. Prior qEEG work lacked integration of cross-spectral information, omitting important functional connectivity descriptors. Lack of geographical diversity precluded accounting for site-specific variance, increasing qEEG nuisance variance. We ameliorate these weaknesses in two ways. (i) We create lifespan Riemannian multinational qEEG norms for cross-spectral tensors. These norms result from the HarMNqEEG project fostered by the Global Brain Consortium. We calculated the norms with data from 9 countries, 12 devices, and 14 studies, including 1564 subjects. Developmental equations for the mean and standard deviation of traditional and Riemannian qEEG descriptive parameters (DPs) were calculated using additive mixed-effects models. We demonstrate qEEG “batch effects” and provide methods to calculate harmonized z-scores. (ii) We show that harmonized Riemannian norms produce z-scores with increased diagnostic accuracy. These results contribute to developing bias-free, low-cost neuroimaging technologies applicable in various health settings.

In this first version, the harmonized qEEG is limited to the 19 channels of the S1020 montage. At present, the toolbox accepts input EEG data in EEG-BIDS, EDF+, BDF+, PLG, and EEGLAB SET formats, and in a predefined TEXT format. For inputs that are not in an EEG-BIDS structure, the derivatives are stored in the same directory as the raw EEG file.

The toolbox also contains the definition of the Harmonized qEEG derivatives for the EEG-BIDS format. The derivatives are stored in a structure compliant with the BIDS definition for derivatives, in the Hierarchical Data Format (HDF). The functions for creating and loading the HarMNqEEG derivatives can be found in the directory "derivatives_functions".

HarMNqEEG Toolbox Installation and Requirements

  1. MATLAB version: 2021b
  2. Clone the repository or download the .zip folder.
  3. Unzip the folder and add the HarMNqEEG folder to your MATLAB path.
  4. Call the main function HarMNqEEG_main.m.

HarMNqEEG Toolbox Example

The folder example_data contains the following subfolders and the test_HarMNqEEG.m file, which runs the tool on the example data.

  • Subfolder: with_cross_spectra_generated. Use this folder as the raw_data_path parameter. It contains 2 subjects, one folder per subject, for testing the tool. Each subject folder holds a .mat file with the cross spectra already generated. When raw_data_path points to the output of the data_gatherer (GitHub location of the script: https://github.com/CCC-members/BC-V_group_stat/blob/master/data_gatherer.m), the generate_cross_spectra parameter must be 0.
  • Subfolder: without_the_cross_spectra_generated. Use this folder as the raw_data_path parameter. It contains 2 subjects, one folder per subject, for testing the tool. Each subject folder holds a .mat file with the input data for the cross spectra. The generate_cross_spectra parameter must be 1.
  • Subfolder: raw_data. This folder contains data in different formats: BIDS, .set, .plg, .txt. To run this data, select the file raw_data_table.tsv or raw_data_table.mat as the subjects_metadata parameter.

HarMNqEEG Cbrain Plugin and Docker Image

Important note

This is the MATLAB code used to develop the Cbrain plugin, which is why it has no graphical interface. The example_data folder contains an example script.

  1. HarMNqEEG.json
  2. HarMNqEEG docker image

Input Parameters:

Important note

The channel montage must follow the 10-20 system.

Data Gatherer

  • generate_cross_spectra: Boolean parameter. Default False. If False (0), the raw_data_path folder must contain the data_gatherer output. If True (1), the toolbox calculates the cross spectra from the raw data.
  • raw_data_path : This parameter is required. Folder path of the raw data. The content of raw_data_path depends on the generate_cross_spectra parameter:
    • 1- If generate_cross_spectra is False (0), this folder must contain the data_gatherer output, with the cross spectra already generated. (See more: https://github.com/CCC-members/BC-V_group_stat/blob/master/data_gatherer.m)
    • 2- If generate_cross_spectra is True (1), raw_data_path can contain the following formats:
      • 2.1- A Matlab structure (*.mat) with the following parameters:
        • - data : an artifact-free EEG scalp data matrix, organized as nd x nt x ne, where
          • nd : number of channels
          • nt : epoch size (# of instants of times in an epoch)
          • ne : number of epochs
        • - sampling_freq : sampling frequency in Hz, e.g., 200
        • - cnames : a cell array containing the names of the channels. The expected names are:
          • 'Fp1' 'Fp2' 'F3' 'F4' 'C3' 'C4' 'P3' 'P4' 'O1' 'O2' 'F7' 'F8' 'T3' 'T4' 'T5' 'T6' 'Fz' 'Cz' 'Pz'
          If the channels come in another order, they are re-arranged according to the expected order
        • - data_code : the name of the original data file, used only for identification.
        • - reference : a string containing the name of the reference of the data.
        • - age : subject's age at recording time
        • - sex : subject's sex
        • - country : country providing the data
        • - eeg_device : EEG hardware where the data was recorded
      • 2.2- An ASCII file (*.txt) with a fixed structure which contains the data of an EEG file. In that case, the file needs to have the extension ".txt" and must have the following structure:
        • - NAME
        • - SEX
        • - AGE
        • - SAMPLING_FREQ
        • - EPOCH_SIZE
        • - NCHANNELS
        • - MONTAGE=
          • Fp1-REF
          • Fp2-REF
          • F3_-REF
          and so on. The program expects NCHANNELS lines with the channel names.
        • After the channel names comes the EEG data, where each line is an instant of time and each column represents a channel. If the EEG contains 30 segments of 512 points each and 19 channels, then 30*512 lines of 19 columns of numbers (either float or integer) are expected.
      • 2.3- Generic data formats (*.edf)
      • 2.4- Biosemi (*.bdf)
      • 2.5- EEGLAB format (*.set)
      • 2.6- MEDICID neurometrics system (*.plg)
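
The mapping between the TEXT layout (one line per instant of time, one column per channel) and the nd x nt x ne matrix described for the .mat input can be sketched as follows. This is only an illustration in Python, not part of the MATLAB toolbox: the function name reorder_and_epoch is hypothetical, and the sketch assumes channel names have already been reduced to the plain labels listed above (for example "Fp1" rather than "Fp1-REF").

```python
# Expected channel order, as listed in the input description above.
EXPECTED = ['Fp1', 'Fp2', 'F3', 'F4', 'C3', 'C4', 'P3', 'P4', 'O1', 'O2',
            'F7', 'F8', 'T3', 'T4', 'T5', 'T6', 'Fz', 'Cz', 'Pz']


def reorder_and_epoch(rows, cnames, epoch_size):
    """Turn time-by-channel rows into data[channel][time][epoch].

    rows       : list of samples, each a list with one value per channel
    cnames     : channel name of each column in rows (any order)
    epoch_size : number of instants of time per epoch (nt)
    The result is nd x nt x ne, with channels in the EXPECTED order.
    """
    if len(rows) % epoch_size != 0:
        raise ValueError("number of rows is not a multiple of epoch_size")
    column = {name: idx for idx, name in enumerate(cnames)}
    missing = [name for name in EXPECTED if name not in column]
    if missing:
        raise ValueError("missing channels: " + ", ".join(missing))
    n_epochs = len(rows) // epoch_size
    return [
        [
            [rows[e * epoch_size + t][column[name]] for e in range(n_epochs)]
            for t in range(epoch_size)
        ]
        for name in EXPECTED
    ]


# Hypothetical toy recording: 2 epochs of 4 samples, columns in reversed order.
cnames = list(reversed(EXPECTED))
rows = [[r * 100 + c for c in range(len(cnames))] for r in range(8)]
data = reorder_and_epoch(rows, cnames, epoch_size=4)
```

With the 30 segments of 512 points mentioned above, the same function would return a 19 x 512 x 30 nested list.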

Metadata

  • subjects_metadata: This file is optional when generate_cross_spectra is False (0). When generate_cross_spectra is True (1), it must be a *.csv, *.tsv or *.mat file. The file must contain a list of subjects with the following metadata:
    • 1- data_code: Name of the subject file or subject subfolder listed in the raw_data_path folder. Required metadata
    • 2- reference: A string containing the name of the reference of the data. Required metadata
    • 3- age: Subject's age at recording time. Required metadata
    • 4- sex: Subject's sex. Optional metadata
    • 5- country: Country providing the data. Required metadata
    • 6- eeg_device: EEG hardware where the data was recorded. Required metadata
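
A metadata table with these six fields could be assembled as in the sketch below. It is written in Python for illustration only and is not part of the toolbox. The subject values are hypothetical, and the assumption that the column headers must match the field names above exactly should be checked against the raw_data_table.tsv example file.

```python
import csv
import io

# Field names taken from the metadata list above; "sex" is the only optional one.
COLUMNS = ["data_code", "reference", "age", "sex", "country", "eeg_device"]
REQUIRED = ["data_code", "reference", "age", "country", "eeg_device"]


def build_metadata_tsv(subjects):
    """Return the text of a tab-separated metadata table, one row per subject."""
    for index, subject in enumerate(subjects):
        missing = [key for key in REQUIRED if subject.get(key) in (None, "")]
        if missing:
            raise ValueError(
                "subject %d lacks required metadata: %s" % (index, ", ".join(missing))
            )
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for subject in subjects:
        writer.writerow({key: subject.get(key, "") for key in COLUMNS})
    return buffer.getvalue()


# Hypothetical subjects; every value here is made up for the example.
subjects = [
    {"data_code": "sub-001", "reference": "average", "age": 23.5,
     "sex": "F", "country": "Cuba", "eeg_device": "Medicid-4"},
    {"data_code": "sub-002", "reference": "average", "age": 41,
     "country": "Cuba", "eeg_device": "Medicid-4"},
]
tsv_text = build_metadata_tsv(subjects)
```

Validating the required fields before writing catches an incomplete row early, before the file is handed to the tool.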

Preprocess: Gaussianize Data, Calculate z-scores, and Harmonize

  • typeLog: This parameter is required. Type of Gaussianizing method to apply.
    Options:
    • typeLog(1): log (Boolean). Default is False. Selects the log-spectrum.
    • typeLog(2): riemlogm (Boolean). Default is True. Selects the cross-spectrum in Tangent Space.
  • batch_correction: The batch correction to apply. You must select the one closest study for calculating batch-harmonized z-scores. You can give either the number of the batch in the list below or the batch correction name.
    The name of each existing batch reference is the concatenation of EEG_Device+Country+Study_Year:
    • 1: ANT_Neuro-Malaysia
    • 2: BrainAmp_DC-Chengdu_2014
    • 3: BrainAmp_MR_plus_64C-Chongqing
    • 4: BrainAmp_MR_plus-Germany_2013
    • 5: DEDAAS-Barbados_1978
    • 6: DEDAAS-NewYork_1970s
    • 7: EGI-256_HCGSN_Zurich_2017-Swiss
    • 8: Medicid-3M-Cuba_1990
    • 9: Medicid-4-Cuba_2003
    • 10: Medicid_128Ch-CHBMP
    • 11: NihonKohden-Bern_1980_Swiss
    • 12: actiCHamp_Russia_2013
    • 13: Neuroscan_synamps_2-Colombia
    • 14: nvx136-Russia_2013
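
Because batch_correction accepts either a list number or a batch name, a caller may want to normalize the value before passing it on. The following Python sketch shows one way to do that. The helper resolve_batch is hypothetical and not part of the toolbox; only the batch names are taken from the list above.

```python
# Batch names copied from the list above, in list order (1-based numbering).
BATCHES = [
    "ANT_Neuro-Malaysia",
    "BrainAmp_DC-Chengdu_2014",
    "BrainAmp_MR_plus_64C-Chongqing",
    "BrainAmp_MR_plus-Germany_2013",
    "DEDAAS-Barbados_1978",
    "DEDAAS-NewYork_1970s",
    "EGI-256_HCGSN_Zurich_2017-Swiss",
    "Medicid-3M-Cuba_1990",
    "Medicid-4-Cuba_2003",
    "Medicid_128Ch-CHBMP",
    "NihonKohden-Bern_1980_Swiss",
    "actiCHamp_Russia_2013",
    "Neuroscan_synamps_2-Colombia",
    "nvx136-Russia_2013",
]


def resolve_batch(value):
    """Return (number, name) for a batch given as a number or a name."""
    if isinstance(value, str):
        text = value.strip()
        if text.isdigit():
            value = int(text)          # a number passed as text, e.g. "3"
        elif text in BATCHES:
            return BATCHES.index(text) + 1, text
        else:
            raise ValueError("unknown batch name: %r" % value)
    if isinstance(value, int) and not isinstance(value, bool):
        if 1 <= value <= len(BATCHES):
            return value, BATCHES[value - 1]
        raise ValueError("batch number must be between 1 and %d" % len(BATCHES))
    raise ValueError("batch must be a number or a name")


selected = resolve_batch("Medicid-4-Cuba_2003")
```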

Optional Matrices to save

  • optional_matrix: List of optional matrices that the user can choose to save.
    Options:
    • optional_matrix(1): FFT_coefs (Boolean): Complex matrix of FFT coefficients of size nd x nfreqs x ne (number of epochs)
    • optional_matrix(2): Mean_Age_Cross (Boolean): Age-evaluated mean of the Tangent Space Cross Spectra norm

Auxiliary inputs

  • outputFolder_path: Path of output folder

HarMNqEEG Toolbox Output Description

Folder structure
The tool saves a 'derivatives' subfolder (following the BIDS structure, https://bids.neuroimaging.io/) in the folder given by the outputFolder_path parameter. Inside the derivatives subfolder, the results are saved in one folder per subject.
Type of output files
In each subject folder (data_code), three files are saved: 1- the log_[data_code] file, 2- HarMNqeeg_derivatives_[data_code].json, and 3- HarMNqeeg_derivatives_[data_code].h5. The .h5 file is in HDF5 format. HDF5 is a data model, library, and file format for storing and managing data (More info: https://portal.hdfgroup.org/display/HDF5/HDF5). The .json and .h5 files both store common values such as the name of the tool, its description, and other metadata. They also store the following attributes, along with descriptions of the matrices:
  1. Name_Subject: name of the subject
  2. Country: of the subject
  3. EEGMachine: EEG device with which the study was carried out
  4. Sex: sex of the subject
  5. Age: age of the subject
  6. MinFreq: Minimum spectral frequency (depending on the recording, it may be down-sampled if higher than expected, or kept as the original if lower than expected)
  7. FreqRes: Frequency resolution (may be down-sampled if higher than expected, or kept as the original if lower than expected)
  8. MaxFreq: Maximum spectral frequency (depending on the recording, it may be down-sampled if higher than expected, or kept as the original if lower than expected)
  9. Epoch_Length: Epoch size (# of instants of time in an epoch)
  10. reRefBatch: Present only when batch_correction is not empty. It is the batch correction used for the z-scores
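
The output layout described above can be summarized as a small path-building sketch. It is written in Python for illustration and is not part of the toolbox. The helper name expected_outputs and the example paths are hypothetical, and the log file is left without an extension because none is stated above.

```python
from pathlib import PurePosixPath


def expected_outputs(output_folder, data_code):
    """Return the three per-subject output paths described above."""
    subject_dir = PurePosixPath(output_folder) / "derivatives" / data_code
    return {
        "log": subject_dir / ("log_" + data_code),
        "json": subject_dir / ("HarMNqeeg_derivatives_" + data_code + ".json"),
        "h5": subject_dir / ("HarMNqeeg_derivatives_" + data_code + ".h5"),
    }


# Hypothetical output folder and subject code.
paths = expected_outputs("/results", "sub-001")
```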

The HarMNqeeg_derivatives.h5 file stores the following matrices:

Log Spectra matrices (saved when the typeLog(1) parameter is true or 1)

| Derivative | DataType | Dimensions | Description |
|---|---|---|---|
| Raw Log-Spectra | 2D Spectral Matrix | Nc x Nf | The (i, f) element of this matrix (where i=1:Nc and f=1:Nf) is the real log power spectral density (PSD) of channel i at frequency f. The raw spectra are transformed to the log space to achieve a quasi-Gaussian distribution. |
| Harmonized Raw Log-Spectra | 2D Spectral Matrix | Nc x Nf | Same as the Raw Log-Spectra, after harmonization to account for the batch effect. |
| Z-scores Log-Spectra | 2D Spectral Matrix | Nc x Nf | The z-scores of an individual's raw spectra. The (i, f) element of this matrix represents the deviation from normality of the log power spectral density (PSD) of channel i at frequency f. |
| Harmonized Z-scores Log-Spectra | 2D Spectral Matrix | Nc x Nf | Same as the Z-scores Log-Spectra, computed for the Harmonized Raw Log-Spectra. |

Cross Spectra matrices (saved when the typeLog(2) parameter is true or 1)

| Derivative | DataType | Dimensions | Description |
|---|---|---|---|
| Raw Cross-Spectra in Tangent Space | 3D Hermitian Tensor | Nc x Nc x Nf | The (i, j) element of the f-th matrix (where i,j=1:Nc and f=1:Nf) is the complex cross-spectrum between channels i and j at frequency f, transformed to the tangent space. It is understood as a measurement of coupling between the time series of channels i and j at frequency f. |
| Harmonized Raw Cross-Spectra in Tangent Space | 3D Hermitian Tensor | Nc x Nc x Nf | Same as the Raw Cross-Spectra in Tangent Space, after harmonization to account for the batch effect. |
| Z-scores of Cross-Spectra in Tangent Space | 3D Hermitian Tensor | Nc x Nc x Nf | The z-scores of the Raw Cross-Spectra in Tangent Space. |
| Harmonized Z-scores of Cross-Spectra in Tangent Space | 3D Hermitian Tensor | Nc x Nc x Nf | The z-scores of the Harmonized Raw Cross-Spectra in Tangent Space. |

FFT_coefs and Mean_Age_Cross matrices (saved when the optional_matrix(1) and optional_matrix(2) parameters are true or 1, respectively)

| Derivative | DataType | Dimensions | Description |
|---|---|---|---|
| Age evaluated Norms Mean of Cross-Spectra in Tangent Space | 3D Hermitian Tensor | Nc x Nc x Nf | Obtained by evaluating the mean of the normative regression for the Raw Cross-Spectra in Tangent Space at the subject’s age. |
| FFT_coefs | 3D Complex Matrix | Nc x Nf x Ne | Complex matrix of FFT coefficients of the EEG data (stored in case further processing is needed when calculating the cross-spectral matrix, such as regularization algorithms for ill-conditioned cases). |

Data Format description

| Data Type | Domain | Storage Format |
|---|---|---|
| 3D Hermitian Tensor | Frequency | Each slice of this tensor is a Hermitian matrix (i.e., a conjugate-symmetric matrix of complex numbers whose diagonal entries are real) of size Nc x Nc. To save space, this matrix can be stored in a compressed way, using the upper triangle to store the real parts of the complex numbers and the lower triangle to store the imaginary parts. In a Hermitian matrix, the entries of the lower triangle are the complex conjugates of the entries of the upper triangle. Therefore, when restoring the matrix to its original size and format, the imaginary parts in the upper triangle must take the opposite sign to those in the lower triangle. |
| 2D Spectral Matrix | Frequency | A matrix of real numbers of size Nc x Nf. Each column holds the spectra of all channels at a specific frequency; each row holds the spectra of the corresponding channel at all frequencies. |
| 3D Complex Matrix | Frequency | A tensor of complex numbers of size Nc x Nf x Ne. |

Where:

  1. Nc: Number of channels
  2. Nf: Number of frequencies
  3. Ne: Number of epochs
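
The compressed storage of a Hermitian slice described above can be illustrated with a short round-trip sketch. This is Python written for illustration, not the toolbox's own code in "derivatives_functions". The function names are hypothetical, and the sketch assumes that the lower-triangle cell at position (j, i), with j > i, holds the imaginary part of the element at (j, i). If the stored files use the opposite sign convention, the sign in unpack_hermitian must be flipped.

```python
def pack_hermitian(h):
    """Pack an Nc x Nc Hermitian matrix into one real Nc x Nc matrix.

    Upper triangle and diagonal: real parts.
    Lower triangle: imaginary parts of the lower-triangle elements
    (sign convention assumed for this sketch).
    """
    n = len(h)
    packed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            packed[i][j] = h[i][j].real
            if j > i:
                packed[j][i] = h[j][i].imag
    return packed


def unpack_hermitian(packed):
    """Rebuild the full complex Hermitian matrix from its packed form."""
    n = len(packed)
    h = [[0j] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = complex(packed[i][i], 0.0)      # diagonal entries are real
        for j in range(i + 1, n):
            lower = complex(packed[i][j], packed[j][i])   # element (j, i)
            h[j][i] = lower
            h[i][j] = lower.conjugate()           # upper triangle: opposite sign
    return h


# Hypothetical 3 x 3 Hermitian slice.
h = [
    [2 + 0j, 1 + 2j, 3 - 1j],
    [1 - 2j, 5 + 0j, 0.5j],
    [3 + 1j, -0.5j, 7 + 0j],
]
packed = pack_hermitian(h)
restored = unpack_hermitian(packed)
```

The packed form holds Nc x Nc real numbers instead of Nc x Nc complex numbers, which halves the storage for each frequency slice.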
Implementation
The derivatives are stored in the HDF5 format, which allows all the information to be held efficiently in a single file while saving space through a data compression factor (7/9 in our case, high compression).

Example
