Deep learning for the detection of γ-ray sources:
Bridging the gap between training simulations and real telescope observations using unsupervised domain adaptation

CNRS@CREATE DESCARTES PROGRAM OPEN TALK

Author: Michaël Dell'aiera (LAPP, LISTIC)
Under the supervision of: Thomas Vuillaume (LAPP), Alexandre Benoit (LISTIC)
25th April 2024
dellaiera.michael@gmail.com

Presentation outline


  • Introduction
  • The challenging transition from simulations to real data
  • Data adaptation
  • Multi-modality
  • Domain adaptation
  • Transformers
  • Conclusion, perspectives

Introduction

Contextualisation


**[Cherenkov Telescope Array (CTA)](https://www.cta-observatory.org/)**

* Exploring the Universe at very high energies
* γ-rays, powerful messenger to study the Universe
* Next generation of ground-based observatories
* Large-Sized Telescope-1 (LST-1) operational

**[GammaLearn](https://purl.org/gammalearn)**

* Collaboration between LAPP (CNRS) and LISTIC
* Fosters innovative methods in AI for CTA
* Evaluate the added value of deep learning
* [Open-science](https://gitlab.in2p3.fr/gammalearn/gammalearn)

Fig. Principle of detection


Fig. Summary of the principle of detection.

Particle distribution


**Many particles create atmospheric showers**

Fig. Energy flux of protons, electrons and gammas

GammaLearn workflow


Fig. The detection workflow

Physical attribute reconstruction


**Real labelled data are intrinsically unobtainable**

→ Training relying on simulations (Particle shower + instrument response)

Standard analysis

* Machine learning
* Morphological prior hypothesis: Ellipsoidal integrated signal
* Image cleaning

GammaLearn

* Deep learning (CNN-based)
* No prior hypothesis
* No image cleaning

Fig. Before cleaning, after cleaning, and moments computation
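To make the standard-analysis steps concrete, here is a minimal NumPy sketch of a two-level image cleaning followed by a moments computation. It is not the pipeline used for LST-1: the square pixel grid, the threshold values, the synthetic shower image and the function names `two_level_clean` and `image_moments` are all assumptions made for illustration (the real camera has a hexagonal layout).

```python
import numpy as np

def two_level_clean(image, picture_thresh, boundary_thresh):
    """Keep pixels above picture_thresh, plus pixels above boundary_thresh
    that touch (4-neighbourhood) a pixel above picture_thresh."""
    core = image >= picture_thresh
    padded = np.pad(core, 1, mode="constant", constant_values=False)
    has_core_neighbor = (
        padded[:-2, 1:-1] | padded[2:, 1:-1] | padded[1:-1, :-2] | padded[1:-1, 2:]
    )
    boundary = (image >= boundary_thresh) & has_core_neighbor
    return core | boundary

def image_moments(image, mask):
    """Charge-weighted first and second moments of the surviving pixels."""
    ys, xs = np.nonzero(mask)
    w = image[ys, xs]
    total = w.sum()
    cx = (w * xs).sum() / total
    cy = (w * ys).sum() / total
    dx, dy = xs - cx, ys - cy
    cov = np.array([
        [(w * dx * dx).sum(), (w * dx * dy).sum()],
        [(w * dx * dy).sum(), (w * dy * dy).sum()],
    ]) / total
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    width, length = np.sqrt(np.maximum(eigvals, 0.0))
    major_axis = eigvecs[:, 1]
    return {
        "intensity": float(total),
        "cx": float(cx),
        "cy": float(cy),
        "length": float(length),
        "width": float(width),
        "angle": float(np.arctan2(major_axis[1], major_axis[0])),
    }

# Synthetic elongated blob plus Gaussian noise (illustrative values only).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:40, 0:40]
signal = 50.0 * np.exp(-(((xx - 20) / 6.0) ** 2 + ((yy - 15) / 2.0) ** 2) / 2.0)
image = signal + rng.normal(0.0, 1.0, size=signal.shape)

mask = two_level_clean(image, picture_thresh=8.0, boundary_thresh=4.0)
params = image_moments(image, mask)
```

The ellipse parameters only make sense once the noise-dominated pixels are removed, which is why the cleaning step comes first. The CNN-based approach takes the uncleaned image as input instead.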

The challenging transition from simulations to real data

Simulations and real data discrepancies


**Simulations are approximations of the reality**

Fig. Variation of light pollution
Fig. Stars in the FoV, dysfunctioning pixels

Simulations and real data discrepancies


Fig. Simulated pointing positions
Fig. Count map of gamma-like events around Markarian 501 (Jacquemont et al.)

Data adaptation

Data adaptation


**Modify the simulations to fit the acquisitions**

Fig. γ-PhysNet architecture (Jacquemont et al.)

Setup


Fig. Light pollution distributions (Simulations, Crab, Markarian, and data adaptation)

| | Train | Test |
|---|---|---|
| Labels | Labelled | Labelled |
| Data | MC+P(λ) | MC+P(λ) |
| Ratio | 50%/50% | 50%/50% |

Tab. Dataset composition
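The MC+P(λ) notation can be read as Monte Carlo images to which Poisson noise of rate λ is added, so that the simulated light pollution matches the observed one. The sketch below is one possible reading under stated assumptions: the function name `add_poisson_nsb`, the array shape, the rate value and the mean subtraction (to keep the pedestal-subtracted baseline at zero) are all hypothetical choices, not the talk's implementation.

```python
import numpy as np

def add_poisson_nsb(images, extra_rate, rng, subtract_mean=True):
    """Add Poisson noise of rate `extra_rate` (photoelectrons per pixel)
    to simulated images. With subtract_mean=True the expected value of the
    added noise is removed, so only the fluctuations increase."""
    noise = rng.poisson(lam=extra_rate, size=images.shape).astype(np.float64)
    if subtract_mean:
        noise -= extra_rate
    return images + noise

rng = np.random.default_rng(42)
mc_images = np.zeros((200, 32, 32))          # placeholder for simulated images
tuned_images = add_poisson_nsb(mc_images, extra_rate=3.0, rng=rng)
```

For a Poisson variable the variance equals the rate, so the per-pixel variance of the tuned images grows by roughly `extra_rate` while the mean stays unchanged.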

Results with data adaptation on simulations


Setup


| | Train (source) | Train (target) | Test |
|---|---|---|---|
| Labels | Labelled | Unlabelled | Unlabelled |
| Data | MC+P(λ) | Real data | Real data |
| Ratio | 50%/50% | 1γ for > 1000p | 1γ for > 1000p |

Tab. Dataset composition

Results with data adaptation on Crab (real data)


Multi-modality

Multi-modality


**Modify the model to make it robust to noise**

Fig. γ-PhysNet-CBN architecture
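Assuming CBN stands for conditional batch normalization, the idea is that the scale and shift of a batch-norm layer are predicted from an auxiliary input (here, some description of the noise level) rather than learned as fixed parameters. The forward-only NumPy sketch below illustrates that mechanism; the conditioning vector, the linear maps `w_gamma` and `w_beta`, and all shapes are hypothetical and do not describe the actual γ-PhysNet-CBN layers.

```python
import numpy as np

def conditional_batch_norm(x, cond, w_gamma, w_beta, eps=1e-5):
    """x: (batch, channels, h, w) feature maps.
    cond: (batch, cond_dim) conditioning vectors.
    w_gamma, w_beta: (cond_dim, channels) linear maps predicting the
    per-sample, per-channel scale offset and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    gamma = 1.0 + cond @ w_gamma          # (batch, channels)
    beta = cond @ w_beta                  # (batch, channels)
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(8, 4, 5, 5))
cond = rng.normal(size=(8, 2))            # e.g. a noise-level descriptor (assumed)

# With zero weights the layer reduces to plain batch normalization.
out_plain = conditional_batch_norm(x, cond, np.zeros((2, 4)), np.zeros((2, 4)))
# With non-zero weights each sample is modulated by its own condition.
out_cond = conditional_batch_norm(
    x, cond, 0.1 * rng.normal(size=(2, 4)), 0.1 * rng.normal(size=(2, 4))
)
```

The conditioning gives the network a second modality besides the image, which is how this sketch connects to the slide title.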

Setup


Fig. Light pollution distributions (Simulations, Crab, Markarian, and data adaptation)

| | Train | Test |
|---|---|---|
| Labels | Labelled | Labelled |
| Data | MC+P(λ(t)) | MC+P(λ) |
| Ratio | 50%/50% | 50%/50% |

Tab. Dataset composition

Results with multi-modality on simulations


Setup


| | Train (source) | Train (target) | Test |
|---|---|---|---|
| Labels | Labelled | Unlabelled | Unlabelled |
| Data | MC+P(λ(t)) | Real data | Real data |
| Ratio | 50%/50% | 1γ for > 1000p | 1γ for > 1000p |

Tab. Dataset composition

Results with multi-modality on Crab (real data)


Domain adaptation

Domain adaptation


**[Domain adaptation](https://arxiv.org/abs/2009.00155): Set of algorithms and techniques to reduce domain discrepancies**

* Take into account unknown differences between the source (labelled, simulations) and target (unlabelled, real data) domains
* Include unlabelled real data in the training
* Selection, implementation and validation of [DANN](https://arxiv.org/abs/1505.07818) (focus of this talk), [DeepJDOT](https://arxiv.org/abs/1803.10081), [DeepCORAL](https://arxiv.org/abs/1607.01719)

Fig. Domain confusion in the feature space

Domain adaptation


**Modify the model to make it domain agnostic**

Fig. γ-PhysNet-DANN architecture
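The core mechanism of DANN is a gradient reversal layer placed between the shared feature extractor and a domain classifier: identity in the forward pass, gradient multiplied by a negative factor in the backward pass. The sketch below writes this out by hand in NumPy on a toy logistic domain head. The feature values, the weights and the class name `GradientReversal` are hypothetical; the ramp-up schedule is the one proposed in the DANN paper, and whether γ-PhysNet-DANN uses it is not stated in these slides.

```python
import math
import numpy as np

def dann_lambda(progress, gamma=10.0):
    """Reversal strength as a function of training progress in [0, 1]:
    starts at 0 and saturates close to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

class GradientReversal:
    """Identity in the forward pass, gradient scaled by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, features):
        return features

    def backward(self, grad_from_domain_head):
        return -self.lam * grad_from_domain_head

# Toy domain head: logistic regression on a 3-dimensional feature vector.
features = np.array([0.5, -1.0, 2.0])
domain_weights = np.array([0.1, 0.2, -0.3])   # hypothetical values
domain_label = 1.0                             # 1 = target domain, 0 = source domain

grl = GradientReversal(lam=dann_lambda(progress=0.5))
logit = grl.forward(features) @ domain_weights
prob = 1.0 / (1.0 + np.exp(-logit))

# Gradient of the binary cross-entropy with respect to the features.
grad_features = (prob - domain_label) * domain_weights
# Gradient actually received by the shared feature extractor.
grad_to_encoder = grl.backward(grad_features)
```

The domain head still minimizes its own loss, while the reversed gradient pushes the feature extractor towards features the domain head cannot separate.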

Multi-task balancing


**[Multi-task balancing](https://arxiv.org/abs/1707.08114) (MTB): Simultaneous optimization of multiple tasks**

* In opposition to single-task learning
* Correlated tasks help each other to learn better
* Conflicting gradients (amplitude and/or direction)
* Baseline:
  * Equal Weighting (EW)
* Selection and implementation:
  * [Uncertainty Weighting](https://arxiv.org/abs/1705.07115) (UW)
  * [GradNorm](https://arxiv.org/abs/1711.02257) (GN)

Fig. Multi-task balancing
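As an illustration of the difference between Equal Weighting and Uncertainty Weighting, the sketch below combines per-task losses using one learnable log-variance per task. It uses the regression form of the Uncertainty Weighting loss for every task, which is a simplification; the function names and the loss values are hypothetical.

```python
import math

def equal_weighted_loss(task_losses):
    """Baseline: every task contributes with weight 1."""
    return sum(task_losses)

def uncertainty_weighted_loss(task_losses, log_vars):
    """Each task i has a learnable s_i = log(sigma_i^2).
    Total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i."""
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_vars)
    )

def log_var_gradient(loss, log_var):
    """Derivative of one task's term with respect to its log-variance."""
    return -0.5 * math.exp(-log_var) * loss + 0.5

losses = [2.0, 4.0]                      # hypothetical per-task losses
total_ew = equal_weighted_loss(losses)
total_uw = uncertainty_weighted_loss(losses, [0.0, 0.0])

# The gradient vanishes at s = log(L): tasks with a large loss end up
# with a large variance, hence a small weight exp(-s).
grad_at_optimum = log_var_gradient(4.0, math.log(4.0))
```

The `0.5 * s` term acts as a regularizer: without it, the optimizer could drive every weight to zero.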

Challenging multi-task optimization


Cosine similarity between the gradients at the last shared layer
Gradient norm at the last shared layer
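Both diagnostics can be computed from the per-task gradients taken with respect to the parameters of the last shared layer, flattened into vectors. A minimal sketch, with hypothetical task names and gradient values:

```python
import numpy as np

def gradient_diagnostics(grads):
    """grads: dict mapping a task name to its flattened gradient at the
    last shared layer. Returns per-task norms and pairwise cosine similarities."""
    names = list(grads)
    norms = {name: float(np.linalg.norm(grads[name])) for name in names}
    cosines = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            denom = norms[a] * norms[b]
            cosines[(a, b)] = float(grads[a] @ grads[b] / denom) if denom > 0 else 0.0
    return norms, cosines

grads = {
    "task_a": np.array([1.0, 0.0]),
    "task_b": np.array([0.0, 2.0]),
    "task_c": np.array([-3.0, 0.0]),
}
norms, cosines = gradient_diagnostics(grads)
```

A negative cosine similarity indicates tasks pulling the shared weights in opposing directions, and a large spread in norms indicates one task dominating the update. These are the two kinds of conflict (direction and amplitude) that multi-task balancing tries to handle.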

Setup


Fig. Domain confusion

| | Train (source) | Train (target) | Test |
|---|---|---|---|
| Labels | Labelled | Unlabelled | Unlabelled |
| Data | MC | MC+P(λ) | MC+P(λ) |
| Ratio | 50%/50% | 50%/50% (no label shift) | 50%/50% |

Tab. Dataset composition

Results with domain adaptation on simulations


Setup


Fig. Domain confusion

| | Train (source) | Train (target) | Test |
|---|---|---|---|
| Labels | Labelled | Unlabelled | Unlabelled |
| Data | MC | MC+P(λ) | MC+P(λ) |
| Ratio | 50%/50% | 1γ for > 1000p (label shift) | 50%/50% |

Tab. Dataset composition
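One way to emulate label shift with simulations is to subsample a balanced set until it reaches the desired class ratio. The sketch below does this for exactly 1 γ per 1000 protons. The exact ratio, the label encoding (1 = γ, 0 = proton) and the function name `subsample_to_ratio` are assumptions, and the talk does not state how its target set was actually built.

```python
import numpy as np

def subsample_to_ratio(labels, protons_per_gamma, rng):
    """Return shuffled indices of a subset containing `protons_per_gamma`
    protons (label 0) for each gamma (label 1)."""
    gamma_idx = np.flatnonzero(labels == 1)
    proton_idx = np.flatnonzero(labels == 0)
    n_gamma = min(len(gamma_idx), len(proton_idx) // protons_per_gamma)
    n_proton = n_gamma * protons_per_gamma
    keep = np.concatenate([
        rng.choice(gamma_idx, size=n_gamma, replace=False),
        rng.choice(proton_idx, size=n_proton, replace=False),
    ])
    rng.shuffle(keep)
    return keep

rng = np.random.default_rng(1)
labels = np.array([1] * 5000 + [0] * 5000)     # balanced 50%/50% set
keep = subsample_to_ratio(labels, protons_per_gamma=1000, rng=rng)
shifted_labels = labels[keep]
```

With such a target set, the class proportions differ strongly between the source and target domains, which is the label-shift setting of this slide.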

Results with domain adaptation on simulations


Results with domain adaptation on simulations


Setup


| | Train (source) | Train (target) | Test |
|---|---|---|---|
| Labels | Labelled | Unlabelled | Unlabelled |
| Data | MC+P(λ) | Real data | Real data |
| Ratio | 50%/50% | 1γ for > 1000p | 1γ for > 1000p |

Tab. Dataset composition

Results with domain adaptation on Crab (real data)


Transformers

Masked Auto-Encoder (MAE)


Fig.
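The principle of a masked auto-encoder is to hide a random subset of the input tokens, encode only the visible ones, and train the decoder to reconstruct the hidden ones. The NumPy sketch below shows the masking step and a reconstruction loss restricted to masked tokens. The token count, the token dimension and the 75% mask ratio are assumptions (the ratio is the default of the original MAE paper, not a value given in this talk).

```python
import numpy as np

def random_masking(tokens, mask_ratio, rng):
    """tokens: (num_tokens, dim). Returns the visible tokens, a boolean
    mask (True = masked) and the indices of the visible tokens."""
    num_tokens = tokens.shape[0]
    num_keep = int(round(num_tokens * (1.0 - mask_ratio)))
    perm = rng.permutation(num_tokens)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_tokens, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], mask, keep_idx

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked tokens only."""
    per_token = ((pred - target) ** 2).mean(axis=1)
    return float(per_token[mask].mean())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))       # e.g. 64 image patches of dimension 16
visible, mask, keep_idx = random_masking(tokens, mask_ratio=0.75, rng=rng)

# A perfect reconstruction gives a zero loss; an all-zero one does not.
loss_perfect = masked_mse(tokens, tokens, mask)
loss_zeros = masked_mse(np.zeros_like(tokens), tokens, mask)
```

Only the visible tokens go through the encoder, which keeps pre-training cheap even with a large model.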

MAE applied to LST


Fig.

Event reconstruction example 1


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 2


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 3


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 4


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 5


Fig. Left to right: Initial, masked, reconstructed

Results on simulations


Conclusion and perspectives

Conclusion & Perspectives


  • Novel techniques (Multi-modality, Domain adaptation) to address the discrepancy between simulations and real data
    • Tested on simulations, in different settings (Light pollution and label shift)
    • Tested on real data (Crab), both moonlight and no moonlight conditions
  • Standard analysis and γ-PhysNet strongly affected by moonlight
  • Data adaptation and multi-modality increase the performance in degraded conditions
  • The benefits of domain adaptation are not well established yet
    • Advantage demonstrated on MC data with different levels of NSB
    • Best results obtained on tuned data and on par with γ-PhysNet
    • γ-PhysNet-CDANN makes it possible to recover from label shift
  • γ-PhysNet-CBN with pedestal image conditioning
  • γ-PhysNet-Transformers with domain adaptation
  • Generalization of the methods on other sources

Acknowledgments


- This project is supported by the facilities offered by the Univ. Savoie Mont Blanc - CNRS/IN2P3 MUST computing center
- This project was granted access to the HPC resources of IDRIS under the allocation 2020-AD011011577 made by GENCI
- This project is supported by the computing and data processing resources from the CNRS/IN2P3 Computing Center (Lyon - France)
- We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA P6000 GPU for this research.
- We gratefully acknowledge financial support from the agencies and organizations listed [here](https://www.cta-observatory.org/consortium_acknowledgment).
- This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 653477
- This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824064