The 2021 main dataset consists of 10% of all the events recorded by the Pierre Auger Observatory that pass some high level quality selection checks (explained below). The periods of data recording (from January 2004 to August 2018 for the SD events, from January 2004 to December 2017 for the hybrid events) are the same as used for the physics results presented by the Auger Collaboration at the 36th International Cosmic Ray Conference held in 2019 in Madison, USA. These Open Data have been subjected to the same selection and reconstruction procedures used by the Auger Collaboration in their official software, Offline [Nucl. Instr. Meth. A 580 (2007) 1485–1496 (arXiv)].
Pseudo-raw data for the observed cosmic rays are released in JSON format files, one for each event, named "Auger_yydddsssssxx.json", where "yydddsssssxx" is the "id" number which identifies the event. Files consist of different sections, whose number and type depend on the kind of event. Sections and variables are listed below.
In addition, a summary file (CSV format) contains the high level information for each reconstructed event. More details are also given below. Note that events observed by multiple eyes appear once per eye in the summary file and this has to be taken into account when working with the CSV file.
Download the CSV summary file (3 MB). This file includes all the reconstruction information and should be enough for most physics analyses.
All Auger Open Data have a unique DOI that you are requested to cite in any applications or publications. The DOI of the main dataset is 10.5281/zenodo.4487613. The Auger Collaboration does not endorse any work, scientific or otherwise, produced using these data, even if available on, or linked from, this portal.
These data should be cited as: Pierre Auger Collaboration (2021), Auger Open Data release 1-2021, DOI:10.5281/zenodo.4487613
In addition to data, auxiliary data are available here, namely the list of the positions of the SD detectors and of the FD pixels, as well as the SD exposure and the FD acceptance.
The "sdMap.csv" file contains the position, in the UTM coordinate system, of all stations of the surface detector and their time period of activity, in the following format:
The "fdPixelMap.csv" contains information about the position of a pixel in the FD telescopes and its pointing direction:
The "sdExposure.csv" contains, for each SD event, the value of the exposure accumulated up to the time of its detection. Above 2.5 EeV, the calculation of the exposure is purely geometrical, obtained from the integration of the geometrical aperture over the observation time:
The FD-related "fdXmaxAcceptance.csv" and "fdXmaxResolution.csv" files are CSV versions of the Tables Appendix B.II and Appendix B.III as published in [Phys. Rev. D 90, 122005 (2014) (arXiv)] Appendix A. In these tables energy-dependent properties of the acceptance and resolution of FD-reconstructed Xmax are tabulated:
All Auger Open Data have a unique DOI that you are requested to cite in any applications or publications. These files are part of the main dataset, whose DOI is 10.5281/zenodo.4487613. The Auger Collaboration does not endorse any work, scientific or otherwise, produced using these data, even if available on, or linked from, this portal.
The SD and hybrid Open Data represent 10% of the data set used in the Auger physics analyses presented at the International Cosmic Ray Conference in 2019. They correspond to the events for which the identification number of the event ("sdid") ends with a zero.
The SD Open Data are the result of a set of selection criteria applied to detected events. The first requires that the water-Cherenkov detector with the highest signal is surrounded by a hexagon of six stations that are operational. This requirement ensures adequate sampling of the shower and allows for the evaluation of the aperture of the SD in a purely geometrical manner in the energy regime where the array is fully efficient [Nucl.Instrum.Meth.A 613 (2010) 29-39 (arXiv)]. As the array detection efficiency is greater than 97% for events with energy larger than 2.5 × 10^18 eV arriving from a zenith angle (θ) less than 60 degrees, two other criteria are applied, on the direction (θ < 60 deg) and on the energy (E > 2.5 × 10^18 eV). The SD Open Data have also been subjected to criteria that guarantee good performance of operation and of detectors: for example, time intervals during which the data acquisition was unstable are excluded, as are photomultipliers with an unstable baseline, loss of calibration data, an unstable ratio between high- and low-gain channels, etc.
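As an illustration, the zenith and energy criteria above can be written as a simple filter. This is a minimal sketch only: the function name is hypothetical, the zenith angle is assumed to be in degrees and the energy in EeV (1 EeV = 10^18 eV), and the hexagon and data-quality criteria are not reproduced here.

```python
# Sketch only: thresholds come from the text, names and units are assumptions.
MAX_ZENITH_DEG = 60.0   # theta < 60 deg
MIN_ENERGY_EEV = 2.5    # E > 2.5 x 10^18 eV


def passes_sd_angle_energy_cuts(theta_deg: float, energy_eev: float) -> bool:
    """Return True if an event satisfies the zenith and energy criteria."""
    return theta_deg < MAX_ZENITH_DEG and energy_eev > MIN_ENERGY_EEV
```

Events already present in the released SD sample have passed the full selection, so a filter like this is mainly useful when applying the same cuts to other quantities, for example to hybrid reconstructions.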
The Open Data for the hybrid events are selected by requiring the fulfillment of a set of criteria on hardware status (at the level of the telescope and pixels), reconstruction of shower geometry and profile (including the uncertainties associated with the energy and depth of maximum), and atmospheric quality (including information on the presence of aerosols and clouds, and the vertical optical transparency). Specific fiducial volume cuts are applied for different analyses in order to achieve uniform acceptance and minimize the uncertainties on the corresponding observables. Events passing the selection for the energy spectrum, the calibration, and/or the Depth of Maximum analyses, are flagged accordingly ("hdSpectrum","hdCalib","hdXmax").
To illustrate the reconstruction procedures used for SD and hybrid events (and the related variables), an exemplary golden-hybrid event included in the Open Data (n. 81847956000) is used. The primary energy of the particle that initiated this event is about 56.8 × 10^18 eV, and the zenith and azimuth angles of the event are 54.1 degrees and 53.8 degrees. This event is available on the event display, where further information about it can be found: display.php?evid=81847956000. The figures are extracted from the event display.
Footprint of an extensive air shower hitting SD stations (see text)
In the adjacent figure the ground view of the event is shown. The colored squares indicate the two FDs (CO and LL) that observed the shower. The colored dots correspond to SD stations which were hit at the same time by the shower particles and that have been selected for the reconstruction process ("recstations"). The areas of the dots are proportional to the logarithm of the signal sizes, while the colors represent the time of arrival ("t") at the different stations (green: early stations; red: late stations). The grey dots indicate detectors which have recorded no signal, while the open grey circles represent those in which a signal was recorded that was not part of the shower event ("isSelected=0") but was due to an unassociated cosmic ray (usually a muon). The position of the core ("x", "y", "z"), where the highest signal would be observed, is marked by the head of the blue arrow, which indicates the azimuth angle ("phi") of the shower direction of arrival.
The timing and the size of the signal measured in each selected station, as well as the positions of the stations (link to the file with the stations coordinates), are the inputs for the reconstruction of the SD events [JINST 15 (2020) P10021 (arXiv)].
The signal timing and signal sizes are computed from the output of the flash analogue-to-digital converters (FADCs) associated with each photomultiplier (PMT). Examples of such signals in two stations in the event are displayed in the figure below.
FADC traces of the PMTs signals in two different stations hit by the shower
The FADC traces, shown for each of the 3 PMTs with different colors, are for a station 565 m away from the core (top figure) and one 2602 m away (bottom figure). They are expressed in terms of VEMs (Vertical Equivalent Muons), where one VEM is the signal due to a single muon traversing a detector vertically. The FADCs are digitised so as to give a measurement every 25 ns. The traces from the closer detector are relatively smooth and are compressed into ~1000 ns, while at the greater distance the signal arrives over a period of ~4000 ns. Most of the large spikes seen in the more distant FADC signals are probably due to muons which cross the detector, though high-energy electrons that would penetrate the full depth of the water may be present close to the shower axis and are expected to arrive early in the time window. More typically, however, the mean energy of an electron or photon in a shower at several hundred metres from the shower axis is ~10 MeV, in contrast to typical muon energies of > 500 MeV. The energy loss of a relativistic particle that traverses a tank in a vertical direction is ~250 MeV.
The signal timing, in terms of start- and stop-times (located at "signalStartBin" and "signalStopBin" in the trace, respectively), is determined from a separate analysis of the structures of the FADC traces, after the subtraction of the baselines, in the high-gain channel of each working PMT in a station. By merging the extracted information from the PMTs, the start-time ("t") that is determined represents the best estimate of the beginning of the passing shower front. The procedure applied to determine the stop-time ensures that all particles belonging to the shower are included while excluding as many accidental signals as possible. The signal size ("signal") is obtained by integrating, between the determined start and stop times, the final trace (converted to VEM), which consists of the bin-by-bin average of the traces of the PMTs in the high-gain channel ("sat=0"), or in the low-gain channel if the high-gain channel is saturated ("sat=1", "sat=2").
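The averaging and integration step described above can be sketched as follows. This is a simplified illustration, not the Offline implementation: the function names are hypothetical, the traces are assumed to be already baseline-subtracted and calibrated, the integral is taken as a plain sum over bins, and treating the stop bin as inclusive is an assumption.

```python
from typing import List, Sequence


def average_trace(pmt_traces: Sequence[Sequence[float]]) -> List[float]:
    """Bin-by-bin average of the calibrated traces of the working PMTs."""
    if not pmt_traces:
        raise ValueError("at least one PMT trace is required")
    n_bins = len(pmt_traces[0])
    if any(len(trace) != n_bins for trace in pmt_traces):
        raise ValueError("all traces must have the same number of bins")
    return [sum(bins) / len(pmt_traces) for bins in zip(*pmt_traces)]


def integrate_signal(trace: Sequence[float], start_bin: int, stop_bin: int) -> float:
    """Sum the trace from start_bin to stop_bin (stop bin inclusive: assumption)."""
    if not 0 <= start_bin <= stop_bin < len(trace):
        raise ValueError("start/stop bins must lie inside the trace")
    return float(sum(trace[start_bin:stop_bin + 1]))
```

With 25 ns per bin, a bin index can be converted to a time offset within the trace by multiplying it by 25 ns.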
To initiate the reconstruction of the zenith and azimuth angles of the shower arrival direction ("theta", "phi"), an estimation of the location of the core on the ground is obtained as the signal-weighted center-of-mass of the selected stations in an event. Then the start-times of the signals in each station are fitted to a model that describes the shower particles as moving with the speed of light in a curved shower front. Thus the two directional cosines and the time at which the core strikes the ground are determined. The radius of curvature ("R") is also set as a free parameter when five or more stations are selected for the event reconstruction. The arrival direction is determined to a precision of about 1 degree, a figure that falls as the energy (and hence the multiplicity of stations triggered) rises.
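The first step above, the signal-weighted center-of-mass used as a starting estimate of the core position, can be sketched as below. The function name and the representation of each station as an (x, y, signal) tuple are assumptions made for illustration; the subsequent fit of the curved shower front is not shown.

```python
from typing import Iterable, Tuple


def signal_weighted_barycentre(
    stations: Iterable[Tuple[float, float, float]]
) -> Tuple[float, float]:
    """Return the signal-weighted mean (x, y) of the selected stations.

    Each station is given as (x, y, signal), with positions in metres
    and signals in VEM.
    """
    stations = list(stations)
    total_signal = sum(signal for _, _, signal in stations)
    if total_signal <= 0.0:
        raise ValueError("total signal must be positive")
    x_core = sum(x * signal for x, _, signal in stations) / total_signal
    y_core = sum(y * signal for _, y, signal in stations) / total_signal
    return x_core, y_core
```

For example, two stations at x = 0 m and x = 10 m with signals of 1 and 3 VEM give a seed position at x = 7.5 m, pulled towards the station with the larger signal.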
Fall-off of the signals size as a function of the distance to the shower core (blue dots) fitted with the lateral distribution function (yellow line)
This reconstruction of the arrival direction of the shower is followed by a fit to a lateral distribution function (ldf), from which the energy estimator and core position ("x", "y", "z") are reconstructed. In the adjacent figure the fall-off of the signal sizes (blue dots) with distance ("spDistance"), in a plane perpendicular to the direction of the shower, is shown together with a yellow line that defines the ldf used to fit the event. The signal at 1000 m, S(1000), ("s1000"), which can be found accurately independent of knowledge of the ldf, represents the shower size and acts as a surrogate for the energy of the primary particle which has initiated the shower. The uncertainty in this measurement decreases from 15% at a shower size of 10 VEM (roughly corresponding to E ~ 2.5 × 10^18 eV) to 5% at the highest shower sizes. The uncertainty on the impact point is of order 50 m. S(1000) is influenced by changes in atmospheric conditions that affect shower development [JINST 12 (2017) P02006 (arXiv)] and by the geomagnetic field that impacts on the shower particle-density [JCAP11 (2011) 022 (arXiv)]. Corrections of order 2% and 1% for the atmospheric and geomagnetic effects ("wcorr", "gcorr"), respectively, are made to S(1000).
For a cosmic ray of a given energy, S(1000) depends on the zenith angle because, once it has passed the depth of shower maximum, a shower is attenuated as it traverses the atmosphere. The intensity of cosmic rays, defined as the number of events per steradian above some S(1000) threshold, is thus dependent on zenith angle. Given the highly isotropic flux, the intensity is expected to be independent of the zenith angle after correction for the attenuation. Based on this principle, an empirical procedure, the so-called Constant Intensity method, is used to determine the attenuation curve as a function of the zenith angle and therefore a shower size estimator, S38, independent of the zenith angle. It can be thought of as being the S(1000) that a shower would have produced had it arrived at 38 degrees, the median angle from the zenith. The energy ("energy") associated with the SD event is derived from a calibration between the corrected S38 ("s38") and the energy measured by the FD ("totalEnergy") in golden-hybrid events. The SD energy resolution is about 20% at 2 × 10^18 eV and about 7% above 2 × 10^19 eV. The systematic uncertainty is 14% [Physical Review D 102, 062005 (2020) (arXiv)].
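To make the role of S(1000) and of the slope parameters concrete, the sketch below evaluates a lateral distribution function of the modified Nishimura-Kamata-Greisen type. The functional form, the reference distance of 1000 m and the scale distance of 700 m are assumptions made here for illustration; the exact parametrisation used in the reconstruction should be taken from the cited reconstruction paper. The function name is hypothetical.

```python
R_OPT_M = 1000.0    # reference distance at which S(1000) is defined
R_SCALE_M = 700.0   # scale distance (assumed value)


def ldf_signal(r_m: float, s1000: float, beta: float, gamma: float) -> float:
    """Expected signal (VEM) at a distance r_m (metres) from the shower axis.

    Assumed form: S(r) = S(1000) * (r / r_opt)**beta
                         * ((r + r_s) / (r_opt + r_s))**(beta + gamma)
    """
    if r_m <= 0.0:
        raise ValueError("distance must be positive")
    return (
        s1000
        * (r_m / R_OPT_M) ** beta
        * ((r_m + R_SCALE_M) / (R_OPT_M + R_SCALE_M)) ** (beta + gamma)
    )
```

By construction the function returns S(1000) at r = 1000 m whatever the slope parameters are, which is why the signal at that distance is a robust measure of the shower size.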
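The two conversions described above, from S(1000) to S38 and from S38 to energy, can be sketched as follows. Two assumptions are made and should be checked against the cited publication: the attenuation curve is written as a cubic polynomial in x = cos²θ − cos²38°, and the calibration is written as a power law E = A · S38^B. The coefficients a, b, c, A and B are deliberately left as arguments; no published values are reproduced here, and the function names are hypothetical.

```python
import math

REFERENCE_ZENITH_DEG = 38.0


def attenuation_factor(theta_deg: float, a: float, b: float, c: float) -> float:
    """Assumed attenuation curve: 1 + a*x + b*x**2 + c*x**3."""
    x = (
        math.cos(math.radians(theta_deg)) ** 2
        - math.cos(math.radians(REFERENCE_ZENITH_DEG)) ** 2
    )
    return 1.0 + a * x + b * x ** 2 + c * x ** 3


def s38_from_s1000(s1000: float, theta_deg: float, a: float, b: float, c: float) -> float:
    """Shower size the event would have had at a zenith angle of 38 degrees."""
    return s1000 / attenuation_factor(theta_deg, a, b, c)


def energy_from_s38(s38: float, calib_a: float, calib_b: float) -> float:
    """Assumed power-law calibration: E = calib_a * s38**calib_b."""
    return calib_a * s38 ** calib_b
```

At θ = 38° the attenuation factor is 1 by construction, so S38 equals S(1000) for a shower arriving at the reference angle.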
In the adjacent figure the shower images observed with the LL and CO fluorescence telescopes are displayed. The colors show the time at which the light reaches each pixel ("pixelTime"). The trigger conditions require some pixels to be aligned, but background light can also be recorded (the variable "pixelStatus" indicates up to which stage of the shower reconstruction each pixel is used).
Together with the telescope position, the directions in which the pixels point in the sky (shown as elevation and azimuth angles, from fdPixelMap.csv) determine a plane containing the shower development in the atmosphere ("SDP"). The shower axis within this plane is obtained from the time of arrival of the light at the camera ("TimeFit"), summing the contributions of two distances traveled at the speed of light: the distance crossed by the shower front to a point where light is emitted and the distance this light crosses to the telescope. The time at which the shower front reaches the ground, given by the timing information from the SD station with the highest signal ("hottestStationId"), sets a strong constraint on the hybrid geometrical reconstruction (providing "theta", "phi", "x", "y", "z"). For this event, the hottest SD station is found ("distSdpStation") around 500 m from the shower detector plane defined with Los Leones and around 250 m from the plane defined with Coihueco (at slightly larger distances from the reconstructed shower axis, "distAxisStation").
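The timing argument above, the sum of the shower-front path and the light path, leads to a compact expression for the expected arrival time at a pixel. The sketch below assumes the usual notation, none of which appears in the data files: Rp is the distance of closest approach of the shower axis to the telescope, t0 the time at which the shower front passes that point, χ0 the angle of the axis within the shower detector plane, and χi the viewing angle of pixel i in that plane. The function name is hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_NS = 0.299792458


def expected_pixel_time_ns(
    chi_i_rad: float, t0_ns: float, rp_m: float, chi0_rad: float
) -> float:
    """Expected light arrival time for a pixel viewing the axis at angle chi_i.

    With alpha = chi0 - chi_i, the front reaches the emission point at
    t0 - Rp / (c * tan(alpha)) and the light needs Rp / (c * sin(alpha))
    to reach the telescope; the sum is t0 + (Rp / c) * tan(alpha / 2).
    """
    alpha = chi0_rad - chi_i_rad
    return t0_ns + (rp_m / SPEED_OF_LIGHT_M_PER_NS) * math.tan(alpha / 2.0)
```

Fitting this relation to the measured pixel times gives the axis parameters; the additional SD timing constraint mentioned above is not included in this sketch.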
The next figure shows the energy deposited ("energyDepositProf") in the atmosphere as a function of the slant depth crossed by the cosmic ray ("atmDepthProf"), as seen independently in the two FD stations. LL is shown in blue and CO in green: the density of points and the uncertainty changes with the position from which the shower is seen.
The integral of this curve gives a direct measurement of the calorimetric energy ("calEnergy") of the primary particle, while the depth at which the maximum of the energy deposition occurs ("xmax") is used to infer the primary particle properties. The reconstruction of each point in the profile from the light seen on the camera ("pixelCharge") depends on the distance to the telescope and on the height in the atmosphere at which the energy is deposited ("distXmax" and "heightXmax").
The detected fluorescence light is proportional to the energy deposition and is emitted isotropically. Cherenkov light is emitted in the forward direction and enters the telescope directly when the shower axis is viewed from the telescope at a small angle ("minViewAngle"). It can also be scattered and reach the telescope at later times, which usually accounts for a fraction of the total detected photons ("cherenkovFraction"). For this example, the minimum viewing angles are 18° and 52° at LL and CO, respectively, with corresponding Cherenkov fractions of 17% and 7%. Both fluorescence and Cherenkov light are used in the reconstruction [Nucl.Instrum.Meth.A 798 (2015) 172-213 (arXiv)]. The light is attenuated and scattered when crossing the atmosphere, so both the distance traveled and the atmospheric parameters must be taken into account when estimating the expected number of detected photons that correspond to the emission at each position in the shower development, which is proportional to the deposited energy. The energy deposited per unit depth (dE/dX) in the atmosphere increases, at first, with the multiplication of particles in the shower, and then decreases as the energy loss by ionisation starts to exceed that by Bremsstrahlung. This behavior gives rise to a reasonably universal profile shape, where the position of the maximum, Xmax, depends on the primary particle type (and its energy). The shape of the profile is described by xmax, the corresponding dEdXmax, and two other variables (uspL and uspR) [JCAP 03 (2019) 018 (arXiv)]. The integration of the profile provides a direct calorimetric measurement of the energy of the primary cosmic ray (calEnergy); a correction for the energy carried away by muons (which can be partially detected in the SD) and neutrinos (which go undetected) [Phys. Rev. D 100, 082003 (2019) (arXiv)] is then applied to finally obtain the totalEnergy.
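As an illustration of how the four profile parameters determine the curve and its integral, the sketch below evaluates a Gaisser-Hillas profile written in terms of the shape parameters L and R and integrates it numerically. The functional form is an assumption based on the universal shower profile parametrisation; the function names, the integration range and the parameter values used in the usage note are illustrative and are not taken from the data.

```python
import math


def usp_profile(depth: float, xmax: float, dedx_max: float,
                usp_l: float, usp_r: float) -> float:
    """Energy deposit per unit depth at slant depth `depth` (g/cm^2).

    Assumed form, with x = depth - xmax:
        dE/dX = dedx_max * (1 + R*x/L)**(1/R**2) * exp(-x / (L*R))
    """
    x_rel = depth - xmax
    base = 1.0 + usp_r * x_rel / usp_l
    if base <= 0.0:
        return 0.0  # before the start of the profile
    return dedx_max * base ** (1.0 / usp_r ** 2) * math.exp(-x_rel / (usp_l * usp_r))


def integrate_profile(xmax: float, dedx_max: float, usp_l: float, usp_r: float,
                      x_lo: float = 0.0, x_hi: float = 2000.0,
                      n_steps: int = 2000) -> float:
    """Trapezoidal integral of the profile between x_lo and x_hi."""
    step = (x_hi - x_lo) / n_steps
    values = [usp_profile(x_lo + i * step, xmax, dedx_max, usp_l, usp_r)
              for i in range(n_steps + 1)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))
```

Called, for instance, as `integrate_profile(750.0, 1.0, 225.0, 0.25)`, the result is in units of dedx_max times g/cm². The profile peaks at `depth == xmax`, where it returns `dedx_max`.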
Data are released in JSON format files, one for each event, named "Auger_yydddsssssxx.json", where "yydddsssssxx" is the "id" number which identifies the event. Files consist of different sections, whose number and contents depend on the kind of event.
The datasets consist of an archive of JSON files (one per event). Events (files) in which the JSON data have a section "sdrec" have been reconstructed using data from the SD. Events in which the section "fdrec" is present have been reconstructed using data from the FD (brass-hybrid events). Events in which both the "sdrec" and "fdrec" sections are present have been independently reconstructed using data from the SD and FD (golden-hybrid events). Events which have been observed by more than one FD site are called multi-eye hybrid events.
|Event type||Number of events|
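The classification above can be sketched by checking which sections are present in an event file. This is a minimal example: the function names and the returned labels are hypothetical, and only the section names "sdrec" and "fdrec" are taken from the description above.

```python
import json


def load_event(path: str) -> dict:
    """Read one event file (e.g. an "Auger_<id>.json" file) into a dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def classify_event(event: dict) -> str:
    """Label an event according to the reconstruction sections it contains."""
    has_sd = "sdrec" in event
    has_fd = "fdrec" in event
    if has_sd and has_fd:
        return "golden-hybrid"
    if has_fd:
        return "hybrid (FD reconstruction only)"
    if has_sd:
        return "SD"
    return "unknown"
```

Applied to every file of the archive, this gives a quick count of events per type.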
Section "meta" is included in all files and contains general details about the release and the software used for data reconstruction.
Sections "info" and "flags" are present in all events and give generic information about the event and which reconstruction quality flags are passed.
Section "fdrec" is provided only for hybrid events and is a collection of reconstruction parameters for each telescope that saw the event.
Section "eyes" is a collection of the FD sites which detected the event.
Section "sdrec" is provided only for SD reconstructed events.
Section "stations" is provided for all events. "stations" is an array with size equal to the number of triggered stations.
A summary file (CSV format) contains information for each reconstructed event.
Events in this file are listed according to their 'id', one for each line. In case of multi-eye hybrids, the event is duplicated in additional rows, one for each "eye".
Each line consists of 80 comma-separated numeric fields, a subsample of the variables contained in the JSON files: variables of the 'sdrec' section are characterized by the prefix 'sd_', and variables of the 'fdrec' and 'eyes' sections by the prefix 'fd_'.
The number of fields in each line is the same for all types of events. However if a section is not present in the json file the corresponding fields are empty.
The summary file also contains the value of the SD integrated exposure ("sd_exposure") for those reconstructed SD events with energy greater than 2.5 × 10^18 eV, the energy threshold for full efficiency of the surface detector.
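A minimal way of handling the two points above, the duplicated rows of multi-eye hybrids and the empty fields, is sketched below with the Python standard library. The column names "sd_energy" and "fd_totalEnergy" are assumptions built from the prefix rule described above and should be checked against the header of the actual file; the sample rows are made up for illustration, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

# Made-up sample with assumed column names; the real file has 80 fields.
SAMPLE_CSV = (
    "id,sd_energy,fd_totalEnergy\n"
    "100,10.5,\n"        # SD-only event: FD field left empty
    "200,12.0,11.5\n"    # multi-eye hybrid, first eye
    "200,12.0,11.9\n"    # same event, second eye
)


def read_summary(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of rows keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def unique_events(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first row of each event id (drops extra eyes)."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


def to_float(field: str) -> Optional[float]:
    """Convert a field to float, returning None for an empty field."""
    return float(field) if field.strip() else None
```

Keeping only the first row per "id" is appropriate for SD quantities, which are identical across the duplicated rows; for FD quantities each eye carries its own reconstruction, so the choice of row (or a combination of rows) depends on the analysis.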
|meta||type||name of the release|
|release||version of the release: it defines the event sample|
|format||version of data format|
|reconstruction||software, version||software-framework used for the event reconstruction and its version|
|info||id||event identification number: YYDDDSSSSSXX|
- YY : last 2 digits of year
- DDD : day number between 1 and 366
- SSSSS: second of the current DAY between 0 and 86399
- XX : order of the event at the current second
Time is expressed in UTC+12h., i.e., the day starting at noon
|sdid||event number from data acquisition|
|date||date and time in ISO 8601 format|
|1: event is used in standard SD analysis|
|1: event used for hybrid energy spectrum analysis|
|1: event used for hybrid energy calibration analysis|
|1: event used for hybrid Xmax analysis|
|1: a multi-eye event|
|Indicates the FD site|
'1': Los Leones
'2': Los Morados
'3': Loma Amarilla
|gpsnanotime [ns]||The GPS time of the event within its GPS second|
|1: Eye used for the spectrum analysis|
|1: Eye used for energy calibration analysis|
|1: Eye used for Xmax analysis|
|The zenith and azimuth angles|
|Uncertainties in zenith and azimuth angles|
|Galactic longitude and latitude of the event|
|Right ascension and declination of the event|
|Total energy of the primary particle initiating the event|
|Uncertainty in the total energy of the event|
|Calorimetric energy of the event|
|Uncertainty in the calorimetric energy of the event|
|Position of the maximum of the energy deposition in the atmosphere|
|Uncertainty in the position of the maximum of the shower development in the atmosphere|
|Height of Xmax above the ground|
|Distance of Xmax to FD eye|
|Maximum energy deposit|
|Uncertainty in the maximum energy deposit|
|x, y, z|
|Coordinates of the shower core projected at ground level (site coordinates system)|
|dx, dy, dz|
|Uncertainty in the coordinates of the shower core projected at ground level (site coordinates system)|
|Eastward and Northward coordinate of the shower core projected at ground level (UTM coordinates system)|
|Altitude of the shower core projected at ground level (UTM coordinates system)|
|cherenkovFraction||Fraction of detected light from Cherenkov emission|
|Light emission angle from the shower towards the FD eye|
|Universal shower profile shape parameter L|
|uspR||Universal shower profile shape parameter R|
|Uncertainty in the Universal Shower Profile parameter L|
|duspR||Uncertainty in the Universal Shower Profile parameter R|
|hottestStationId||id of the SD station with the highest recorded signal|
|Distance of the hottest station to the plane that includes the shower axis and the eye position (SDP)|
|Distance of hottest station to the reconstructed shower axis in the shower plane|
|Id of the FD site:|
1: Los Leones
2: Los Morados
3: Loma Amarilla
|name||Name of the FD site|
|Array of slant depth points measured. The array dimension depends on the number of triggered pixels|
|Array of energy deposit at each slant depth, obtained from the shower profile fit. The array dimension depends on the number of triggered pixels|
|Array of the uncertainty in the energy deposit at each slant depth, obtained from the shower profile fit. The array dimension depends on the number of triggered pixels|
|Array of the pixel ids. The array dimension depends on the number of triggered pixels. The camera of each telescope consists of 440 pixels. The whole eye is composed of 440 x 6 pixels|
|Array of the times of the signal centroid in each pixel. The array dimension depends on the number of triggered pixels|
[number of photons at telescope aperture]
|Array of the light detected in each pixel (proportional to the charge). The array dimension depends on the number of triggered pixels|
|Array that indicates the status of the pixel|
3=SDP (shower detector plane)
|Uncertainty in the zenith angle|
|Uncertainty in the azimuth angle|
|Uncertainty in the energy|
|Galactic longitude and latitude|
|Right ascension and declination|
|Coordinate of the shower core (site coordinates system)|
|Uncertainty in the coordinates of the shower core (site coordinates system)|
|Eastward-,northward-coordinate and altitude of the shower core (UTM coordinates system)|
|Radius of curvature of the shower|
|Uncertainty in the radius of curvature of the shower|
|Expected signal at 1000 m from the core, S(1000), used as estimator of the energy|
|Uncertainty in S(1000)|
|Signal produced at 1000 m by a shower with a zenith angle of 38 deg|
|Geomagnetic correction to S(1000)|
|Weather correction to S(1000)|
|beta,gamma||Slope parameters of the fitted LDF|
|chi2||Chi-square value of the LDF fit|
|ndf||Number of degrees of freedom in the LDF fit|
|geochi2||Chi-square value of the geometric fit|
|geondf||Number of degrees of freedom in the geometric fit|
|nbstat||Number of triggered stations used in reconstruction|
|recstations||List of ids of the triggered stations used in reconstruction|
|Start time of the signal|
|Uncertainty in the start time|
|signalStartBin,signalStopBin||FADC trace bins that indicate the start and stop of the signal|
|Integrated signal in the FADC traces|
|Uncertainty in the integrated signal|
|0: high-gain and low-gain channels not saturated|
1: high-gain channel saturated
2: high-gain and low-gain channels saturated
|1: the station is used in the reconstruction|
|Distance of the station to the core in the plane perpendicular to the shower axis (shower plane)|
|Uncertainty in the distance of the station to the core in the plane perpendicular to the shower axis (shower plane)|
|FADC traces from each photomultiplier. The length of each FADC trace is 768 bins. A bin corresponds to 25 ns|
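The YYDDDSSSSSXX layout of the event "id" given in the table above can be unpacked as in the following sketch. The function and field names are hypothetical, and padding with leading zeros to 12 digits is an assumption needed when the id is stored as an integer, as for the example event n. 81847956000.

```python
from typing import NamedTuple, Union


class EventId(NamedTuple):
    year2: int          # YY: last 2 digits of the year
    day_of_year: int    # DDD: day number between 1 and 366
    second_of_day: int  # SSSSS: second of the day, 0 to 86399
    order: int          # XX: order of the event within that second


def parse_event_id(event_id: Union[int, str]) -> EventId:
    """Split an event id into its YY, DDD, SSSSS and XX fields."""
    digits = str(event_id).zfill(12)  # restore leading zeros (assumption)
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError(f"not a valid event id: {event_id!r}")
    return EventId(
        year2=int(digits[0:2]),
        day_of_year=int(digits[2:5]),
        second_of_day=int(digits[5:10]),
        order=int(digits[10:12]),
    )
```

For the example event, `parse_event_id(81847956000)` gives year 08, day 184, second 79560 and order 00. Note that the day and second are expressed in the shifted time convention described in the table, not in plain UTC.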