Breakthrough Listen announces release of 400 terabytes of Green Bank Telescope data from the repeating fast radio burst FRB 121102
Fast radio bursts (FRBs) are one of the most mysterious classes of objects in the Universe. They originate outside our Galaxy, last only about a millisecond, and can release as much energy in a fraction of a second as the Sun does in an entire year. As of the end of 2018, about 65 FRBs had been discovered, of which only one (FRB 121102) was known to show repeated pulses. Last week, the CHIME collaboration announced the discovery of 13 new FRBs, including another “repeater” (FRB 180814). Like FRB 121102, FRB 180814 exhibits remarkable spectral and temporal variability. This variability, together with the ultimate source of FRBs, remains unexplained.
In August 2017, the Breakthrough Listen program conducted a five-hour observation of FRB 121102 using the Green Bank Telescope, as part of its campaign targeting exotic and anomalous astronomical phenomena. The Listen team recorded around 380 TB of baseband voltage data with the Breakthrough Listen digital backend (MacMahon et al. 2018, PASP, 130, 044502), observing across 4 to 8 GHz. These unique data have already led to a number of discoveries. An initial search found 21 pulses, with emission seen up to 8 GHz, the highest frequency at which an FRB has been observed (Gajjar et al. 2018, ApJ, 863, 2). These pulses were used to investigate the polarization properties of the source, and it was found that FRB 121102 is embedded in a dense and highly magnetized environment (Michilli et al. 2018, Nature, 553, 182). Later in 2018, we used novel machine learning tools to search these data and found 72 new bursts (Zhang et al. 2018, ApJ, 866, 149).
Today we are releasing the entire baseband raw-voltage dataset collected during these observations, totaling nearly 400 TB. This data release marks the first time raw voltage data for FRB detections have been released to the public. We hope that the citizen science and engineering community will help us use these data to their maximum potential and extract the additional insights into FRBs that surely lie buried within them.
The data are available for download through the Breakthrough Initiatives Open Data Portal, under the target name FRB121102.
The files are large and in technical formats, as described below.
Good hunting! If you have any questions on these datasets, feel free to contact me at [email protected]
DATA FORMAT:
For these observations, raw voltages were recorded using the Breakthrough Listen digital backend and data recording system, as described in MacMahon et al. (2018).
The Breakthrough Listen team uses the RAW file format (hereafter "raw-file") to store channelized voltages from radio telescopes. This format is based on the GUPPI RAW format, which was originally developed to store pulsar data for the “GUPPI” pulsar processor. Both RAW and GUPPI RAW are loosely related to the FITS file format. The basic structure of a raw-file is a series of "header data units". Each header data unit consists of a header section, which is ASCII text, followed by a data section, which is binary. In some cases, the header section is followed by padding bytes that are part of neither the header nor the data. The header section contains metadata that describes the data section and provides other relevant details (e.g. time, sky position, frequency) that correspond to the voltage samples in the data section. A detailed description of the RAW format can be found here and here.
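The layout described above can be walked with a few lines of Python. The following is a minimal sketch, not the Breakthrough Listen reader. It assumes GUPPI-style headers made of 80-character ASCII records terminated by an END record, a BLOCSIZE keyword giving the size in bytes of the data section, and a DIRECTIO keyword that, when set to 1, means the header is padded to a multiple of 512 bytes. These keyword names and the padding rule are assumptions that should be checked against the format descriptions; the function name is illustrative.

```python
import io

CARD = 80  # assumed length of one ASCII header record


def iter_header_data_units(f):
    """Yield (header, data_offset, data_size) for each header data unit.

    f is a binary file object positioned at the start of a raw file.
    """
    while True:
        header = {}
        while True:
            card = f.read(CARD)
            if len(card) < CARD:
                return  # end of file (or a truncated header)
            text = card.decode("ascii", errors="replace")
            if text.startswith("END"):
                break
            key, _, value = text.partition("=")
            header[key.strip()] = value.strip().strip("'").strip()
        # Assumption: DIRECTIO=1 pads the header to a 512-byte boundary.
        if header.get("DIRECTIO", "0") == "1":
            f.seek((-f.tell()) % 512, io.SEEK_CUR)
        size = int(header["BLOCSIZE"])  # assumed keyword: bytes of data
        offset = f.tell()
        yield header, offset, size
        f.seek(offset + size)  # skip over the binary data section
```

Opening a raw file with open(path, "rb") and looping over iter_header_data_units gives the metadata and the byte range of every data section without loading the voltages into memory.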
FRB 121102 DATA:
These observations were conducted in 10 sessions, each 30 minutes in length, denoted by scan numbers 11 to 20. Scan number 10 was a 1-minute noise diode calibration recording on the source, with the diode switched at 25 Hz (a period of 0.04 seconds). The bursts already found during these observations are listed in Table 2 in this paper, with each arrival time given in seconds from the beginning of the observations. Each scan is precisely 1800 seconds long, so for any given burst, the relevant scan number can be determined from the arrival time. Data were recorded across 32 individual compute nodes spanning the entire 4 to 8 GHz of bandwidth. Each node covered a bandwidth of 187.5 MHz. Node names and the corresponding range of frequencies are listed below.
Node Start Frequency (MHz) End Frequency (MHz)
blc00 9220.21484375 9032.71484375
blc01 9032.71484375 8845.21484375
blc02 8845.21484375 8657.71484375
blc03 8657.71484375 8470.21484375
blc04 8470.21484375 8282.71484375
blc05 8282.71484375 8095.21484375
blc06 8095.21484375 7907.71484375
blc07 7907.71484375 7720.21484375
blc10 7907.71484375 7720.21484375
blc11 7720.21484375 7532.71484375
blc12 7532.71484375 7345.21484375
blc13 7345.21484375 7157.71484375
blc14 7157.71484375 6970.21484375
blc15 6970.21484375 6782.71484375
blc16 6782.71484375 6595.21484375
blc17 6595.21484375 6407.71484375
blc20 6595.21484375 6407.71484375
blc21 6407.71484375 6220.21484375
blc22 6220.21484375 6032.71484375
blc23 6032.71484375 5845.21484375
blc24 5845.21484375 5657.71484375
blc25 5657.71484375 5470.21484375
blc26 5470.21484375 5282.71484375
blc27 5282.71484375 5095.21484375
blc30 5282.71484375 5095.21484375
blc31 5095.21484375 4907.71484375
blc32 4907.71484375 4720.21484375
blc33 4720.21484375 4532.71484375
blc34 4532.71484375 4345.21484375
blc35 4345.21484375 4157.71484375
blc36 4157.71484375 3970.21484375
blc37 3970.21484375 3782.71484375
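Both lookups described in this section, scan number from arrival time and frequency range from node name, are simple arithmetic. The sketch below assumes arrival times are counted from the start of scan 11 with the scans laid end to end, and it reproduces the table above by noting that adjacent banks of eight nodes repeat one band. The function names are illustrative.

```python
import math

FIRST_SCAN = 11                # first on-source scan
LAST_SCAN = 20                 # last on-source scan
SCAN_LENGTH_S = 1800.0         # each scan is 1800 seconds long
TOP_FREQ_MHZ = 9220.21484375   # start frequency of blc00 in the table
NODE_BW_MHZ = 187.5            # bandwidth per compute node


def scan_for_arrival_time(t_seconds):
    """Scan number containing a burst t_seconds after the start of scan 11."""
    scan = FIRST_SCAN + math.floor(t_seconds / SCAN_LENGTH_S)
    if not FIRST_SCAN <= scan <= LAST_SCAN:
        raise ValueError("arrival time falls outside scans 11 to 20")
    return scan


def node_band(node):
    """(start, end) frequency in MHz for a node name such as 'blc13'."""
    bank, slot = int(node[3]), int(node[4])
    # Adjacent banks repeat one band (blc07/blc10, blc17/blc20, blc27/blc30),
    # so each bank advances by 7 bands rather than 8.
    start = TOP_FREQ_MHZ - NODE_BW_MHZ * (7 * bank + slot)
    return start, start - NODE_BW_MHZ


print(scan_for_arrival_time(16.22))
print(node_band("blc13"))
```

Because of the repeated bands, a frequency near a bank boundary is present in the files of two nodes.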
File names of these raw data files contain helpful information. An example of the raw filename format is:
blc00_guppi_57991_49836_DIAG_FRB121102_0010.0000.raw
This is parsed as follows (with _ as the delimiter):
blc00: the server node which recorded this data (blc00-blc37 octal)
guppi: keyword noting this was recorded using guppidaq software
57991: the modified Julian date (MJD) of the start of this observation
49836: the seconds since midnight (UT) of this observation
DIAG_FRB121102: the target name
0010.0000.raw: a sequence and suffix, which is further parsed as (with . as the delimiter):
  0010: is the "sequence number" or scan number of this target during a single observation session, and is an arbitrary increasing integer
  0000: is the data order number. The first data file starts at 0000, and when it "rolls over" the next file is 0001, etc.
  raw: to denote this is a raw data product
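The naming scheme above can be parsed mechanically. The sketch below is one way to do it and is not part of the Breakthrough Listen tools. Since the target name can itself contain the delimiter (DIAG_FRB121102), it takes the first four and the last underscore-separated fields as fixed and joins whatever lies between them. The function name and dictionary keys are illustrative.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00 UT


def parse_raw_name(name):
    """Split a raw file name into its documented fields."""
    stem, order, suffix = name.rsplit(".", 2)
    parts = stem.split("_")
    node, daq, mjd, seconds = parts[:4]
    scan = parts[-1]
    target = "_".join(parts[4:-1])  # the target name may contain "_"
    start = MJD_EPOCH + timedelta(days=int(mjd), seconds=int(seconds))
    return {
        "node": node,
        "daq": daq,
        "mjd": int(mjd),
        "seconds": int(seconds),
        "start_ut": start,
        "target": target,
        "scan": int(scan),
        "order": int(order),
        "suffix": suffix,
    }


info = parse_raw_name("blc00_guppi_57991_49836_DIAG_FRB121102_0010.0000.raw")
print(info["node"], info["target"], info["scan"], info["order"], info["start_ut"])
```

Combining the MJD and seconds-since-midnight fields into a single timestamp makes it easier to line files up against burst arrival times.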
We have also made filterbank format data products available for each scan. These files concatenate all data for a given scan/compute node pair and reduce it to a stream of n-bit numbers corresponding to total intensity data for multiple polarization and/or frequency channels.
The file names for the reduced filterbank products are similar:
blc00_guppi_57991_49836_DIAG_FRB121102_0010.gpuspec.0000.fil
The difference is that part of the sequence and suffix (0000.raw from above) has been replaced by:
gpuspec: keyword noting this was reduced using gpuspec software
0000: code noting frequency/time resolution, where:
  0000: fine frequency (~3 Hz frequency bins, ~18 second time bins)
  0001: fine time (~360 kHz frequency bins, ~350 microsecond time bins)
  0002: mid frequency/time (~3 kHz frequency bins, ~1 second time bins)
fil: to denote this is a filterbank data product
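When scripting downloads, it helps to pick out the filterbank product with the resolution you need. The following sketch maps the resolution code in a file name to the descriptions listed above. The function and dictionary names are illustrative.

```python
RESOLUTION_CODES = {
    "0000": "fine frequency (~3 Hz bins, ~18 s time bins)",
    "0001": "fine time (~360 kHz bins, ~350 microsecond time bins)",
    "0002": "mid frequency/time (~3 kHz bins, ~1 s time bins)",
}


def describe_filterbank(name):
    """Return (stem, resolution description) for a gpuspec filterbank name."""
    stem, tool, code, suffix = name.rsplit(".", 3)
    if tool != "gpuspec" or suffix != "fil":
        raise ValueError(f"not a gpuspec filterbank name: {name}")
    return stem, RESOLUTION_CODES.get(code, "unknown resolution code")


stem, resolution = describe_filterbank(
    "blc00_guppi_57991_49836_DIAG_FRB121102_0010.gpuspec.0001.fil"
)
print(stem, "->", resolution)
```

The returned stem is the same node, date, target and scan prefix used by the raw files, so it can be used to pair a filterbank product with the raw files of the same scan.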
To find the frequency (and other necessary information) for a given raw file, one can use the Linux ‘fold’ command to display its ASCII header as 80-character lines:
fold -w 80 <raw file> | more
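Once the header is visible, the frequency keywords can be turned into band edges. The sketch below assumes the header carries OBSFREQ (band center in MHz) and OBSBW (bandwidth in MHz, negative when the band is stored from high to low frequency), as in GUPPI-style headers. Both keyword names and their meanings are assumptions to verify against a real header, and the example values are illustrative rather than read from the released files.

```python
def band_edges_mhz(header):
    """Return (first_edge, last_edge) in MHz from GUPPI-style header values."""
    centre = float(header["OBSFREQ"])  # assumed: band centre in MHz
    width = float(header["OBSBW"])     # assumed: signed bandwidth in MHz
    return centre - width / 2.0, centre + width / 2.0


# Illustrative values chosen to match the blc00 row of the node table;
# they were not read from the released files.
example = {"OBSFREQ": "9126.46484375", "OBSBW": "-187.5"}
print(band_edges_mhz(example))
```

With a negative bandwidth the first edge is the higher frequency, which matches the descending order of the node table.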
Each scan on each node is further divided into ~23 second segments to keep the file sizes manageable. The Breakthrough Listen team has developed this Python package to display and manipulate raw files. The most important routine is extract_blocks.py, which can be used to extract raw voltages around a given time. For example, burst number 1 occurred 16.22 seconds after the recording started (i.e. in scan 11). To extract a few seconds of data around this burst from a single compute node (for example blc00), we can use extract_blocks.py with the following command line arguments.
python extract_blocks.py <path to all raw files from blc00> blc00_guppi_57991_49905_DIAG_FRB121102_0011 15.7505 17.250 <output path>
This command will examine all the raw files from scan 11 for the blc00 compute node and find the appropriate raw file to extract the requested 1.5 seconds of data. If the given time interval is spread across two raw files, it can combine the data appropriately.
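Since the same extraction has to be repeated for every compute node, it is convenient to generate the command lines. The sketch below only assembles strings in the argument order shown above and does not run anything. The directory layout (one subdirectory per node) is hypothetical, and it assumes every node uses the same MJD and seconds fields in its file names, which should be checked against the actual listing.

```python
import shlex

# Node names run blc00 to blc37 in octal: four banks of eight nodes.
NODES = [f"blc{bank}{slot}" for bank in range(4) for slot in range(8)]


def extract_commands(raw_root, out_root, stem_tail, t_start, t_end):
    """Build one extract_blocks.py command line per compute node."""
    commands = []
    for node in NODES:
        args = [
            "python", "extract_blocks.py",
            f"{raw_root}/{node}",   # hypothetical per-node input directory
            f"{node}_{stem_tail}",  # file stem without sequence and suffix
            str(t_start), str(t_end),
            f"{out_root}/{node}",   # hypothetical per-node output directory
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


cmds = extract_commands("/data/raw", "/data/extracted",
                        "guppi_57991_49905_DIAG_FRB121102_0011",
                        15.7505, 17.25)
for line in cmds:
    print(line)
```

The printed lines can be reviewed and then pasted into a shell or a job script.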
Once individual files are extracted from all compute nodes (32 extracted raw files for a single burst), one can use the splicer_raw.py routine to combine these raw data files into one single contiguous raw data file. This single raw file can then be coherently dedispersed to any desired spectral and temporal resolution. A description of how to perform coherent dedispersion on RAW data, as well as perform other tasks with standard pulsar tools, can be found here.
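Before dedispersing, it can be useful to estimate how far a burst sweeps in time across the band, for instance to check that an extraction window is wide enough. The sketch below uses the standard cold-plasma dispersion relation. The dispersion measure (DM) passed in the example is a placeholder chosen only for illustration; take the actual value from the papers cited below.

```python
K_DM_S = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def dispersion_delay_s(dm, f_low_mhz, f_high_mhz):
    """Delay in seconds of f_low_mhz relative to f_high_mhz for a given DM."""
    return K_DM_S * dm * (f_low_mhz ** -2 - f_high_mhz ** -2)


# Placeholder DM in pc cm^-3, used only to show the call.
print(dispersion_delay_s(560.0, 4000.0, 8000.0))
```

The delay scales linearly with DM, so the result is easy to rescale once the measured value is substituted.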
Data are released under the CC BY 4.0 license. If you make use of these datasets for academic work, please cite the following papers:
Gajjar et al. 2018, ApJ, 863, 2
Zhang et al. 2018, ApJ, 866, 149