ALICE Data Acquisition
The Large Hadron Collider (LHC) will make protons or ions collide not only at a much higher energy but also at a much higher rate than ever before. To digest the resulting wealth of information, the four LHC experiments have to push data-handling technology well beyond the current state of the art, be it in trigger rates, data acquisition bandwidth or data archiving. ALICE, the experiment dedicated to the study of nucleus-nucleus collisions, had to design a data acquisition system that operates efficiently in two widely different running modes: the very frequent but small events, with few produced particles, encountered in pp mode, and the relatively rare but extremely large events, with tens of thousands of new particles produced, in ion operation (L = 10²⁷ cm⁻² s⁻¹ in Pb-Pb with 100 ns bunch crossings and L = 10³⁰ to 10³¹ cm⁻² s⁻¹ in pp with 25 ns bunch crossings).
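The contrast between the two running modes can be made concrete with the standard relation R = L·σ between luminosity, cross-section and interaction rate. The Python sketch below evaluates it; the inelastic cross-sections (about 8 b for Pb-Pb and 70 mb for pp) are rough assumed values that do not appear in this text, and the crossing rate assumes that every 25 ns or 100 ns slot carries a bunch, which is an upper bound.

```python
BARN_CM2 = 1e-24  # 1 barn = 1e-24 cm^2

# Rough, assumed inelastic cross-sections in barn (not taken from this document).
SIGMA_PBPB_BARN = 8.0
SIGMA_PP_BARN = 0.07


def interaction_rate_hz(lumi_cm2_s: float, sigma_barn: float) -> float:
    """Mean interaction rate R = L * sigma, in Hz."""
    return lumi_cm2_s * sigma_barn * BARN_CM2


def interactions_per_crossing(rate_hz: float, bunch_spacing_ns: float) -> float:
    """Mean number of interactions per bunch crossing, assuming every slot holds a bunch."""
    crossing_rate_hz = 1e9 / bunch_spacing_ns
    return rate_hz / crossing_rate_hz


if __name__ == "__main__":
    pbpb = interaction_rate_hz(1e27, SIGMA_PBPB_BARN)
    pp = interaction_rate_hz(1e31, SIGMA_PP_BARN)
    print(f"Pb-Pb: {pbpb:.0f} Hz, {interactions_per_crossing(pbpb, 100):.1e} per crossing")
    print(f"pp:    {pp:.0f} Hz, {interactions_per_crossing(pp, 25):.1e} per crossing")
```

Under these assumptions fewer than one Pb-Pb crossing in a thousand produces a collision, but each one yields a very large event, while pp collisions at the upper luminosity occur nearly two orders of magnitude more often and are small.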
In addition, the ALICE data acquisition system must balance its capacity to record the steady stream of very large events from central collisions against the ability to select and record rare processes with small cross-sections. These requirements result in an aggregate event-building bandwidth of up to 2.5 GByte/s and a storage capability of up to 1.25 GByte/s, giving a total of more than 1 PByte of data every year. As shown in the figure, ALICE needs a data storage capacity that far exceeds that of the current generation of experiments. This data rate is equivalent to six times the contents of the Encyclopædia Britannica every second.
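The yearly volume is the storage bandwidth multiplied by the time spent recording at that rate. The sketch below assumes about 10⁶ s of recording at full bandwidth per year, a hypothetical figure that is not given in this text; with it, the 1.25 GByte/s storage rate yields 1.25 PByte, consistent with the "more than 1 PByte" stated above.

```python
def yearly_volume_pbyte(rate_gbyte_s: float, live_seconds: float) -> float:
    """Volume written in one year, in PByte (decimal units: 1 PByte = 1e6 GByte)."""
    return rate_gbyte_s * live_seconds / 1e6


STORAGE_RATE_GBYTE_S = 1.25  # storage bandwidth quoted in the text
ASSUMED_LIVE_SECONDS = 1e6   # hypothetical: about 1e6 s of recording per year

if __name__ == "__main__":
    volume = yearly_volume_pbyte(STORAGE_RATE_GBYTE_S, ASSUMED_LIVE_SECONDS)
    print(f"{volume:.2f} PByte per year")  # 1.25 PByte under these assumptions
```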
The figure above shows the architecture of the ALICE trigger and data acquisition systems. For every bunch crossing in the LHC machine, the Central Trigger Processor (CTP) decides within less than one microsecond whether to collect the data resulting from that collision. The trigger decision is distributed to the front-end electronics (FEE) of each detector via the corresponding Local Trigger Unit (LTU) and an optical broadcast system, the Trigger, Timing and Control system (TTC). Upon reception of a positive decision, the data are transferred from the detectors over the 400 optical Detector Data Links (DDL), through PCI adapters called Read-Out Receiver Cards (RORC), to a farm of 300 individual computers, the Local Data Concentrators/Front-End Processors (LDC/FEP). The several hundred data fragments belonging to one event are checked for data integrity, processed and assembled into sub-events. These sub-events are then sent over the event-building network to one of the 40 Global Data Collector computers (GDC), so that up to 40 different events can be built in parallel. Twenty Global Data Storage servers (GDS) store the data locally before they are migrated to and archived in the CERN Computing Center, where they become available for offline analysis.
The hardware of the ALICE DAQ system is largely based on commodity components: PCs running Linux and standard Ethernet switches for the event-building network. The required performance is achieved by interconnecting hundreds of these PCs into a large DAQ fabric. The software framework of the ALICE DAQ is called DATE (ALICE Data Acquisition and Test Environment). DATE is already in use today, during the construction and testing phase of the experiment, while evolving gradually towards the final production system.
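The two-stage assembly described above (fragments checked and merged into sub-events on the LDCs, sub-events merged into full events on a GDC) can be sketched as follows. Everything here is hypothetical: the `Fragment` layout, the 16-bit additive checksum standing in for the integrity check, and the `event_id % n_gdc` rule for choosing a GDC are illustrative assumptions, not the DATE data format or its event-building protocol. The sketch does respect one essential property: every LDC must send the sub-events of a given event to the same GDC.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Fragment:
    event_id: int   # event number attached upstream (assumed)
    ddl_id: int     # Detector Data Link that delivered the fragment
    payload: bytes
    checksum: int   # hypothetical integrity word


def simple_checksum(data: bytes) -> int:
    """Hypothetical 16-bit additive checksum standing in for the real integrity check."""
    return sum(data) & 0xFFFF


def build_subevent(event_id: int, fragments: Iterable[Fragment],
                   expected_ddls: Iterable[int]) -> Dict[int, bytes]:
    """LDC stage: check every fragment of one event and merge them into a sub-event."""
    subevent: Dict[int, bytes] = {}
    for frag in fragments:
        if frag.event_id != event_id:
            raise ValueError(f"DDL {frag.ddl_id} sent event {frag.event_id}, expected {event_id}")
        if simple_checksum(frag.payload) != frag.checksum:
            raise ValueError(f"corrupted fragment from DDL {frag.ddl_id}")
        subevent[frag.ddl_id] = frag.payload
    missing = set(expected_ddls) - set(subevent)
    if missing:
        raise ValueError(f"missing fragments from DDLs {sorted(missing)}")
    return subevent


def destination_gdc(event_id: int, n_gdc: int) -> int:
    """Hypothetical static rule: every LDC computes the same GDC for a given event."""
    return event_id % n_gdc


class GDC:
    """GDC stage: collect one sub-event per LDC and release the full event when complete."""

    def __init__(self, ldc_ids: Iterable[str]):
        self.ldc_ids = frozenset(ldc_ids)
        self.pending: Dict[int, Dict[str, Dict[int, bytes]]] = {}

    def add(self, ldc_id: str, event_id: int,
            subevent: Dict[int, bytes]) -> Optional[Dict[str, Dict[int, bytes]]]:
        parts = self.pending.setdefault(event_id, {})
        parts[ldc_id] = subevent
        if set(parts) == self.ldc_ids:
            return self.pending.pop(event_id)  # all LDCs have contributed
        return None
```

A production event builder would also need a timeout for incomplete events, so that one missing sub-event cannot block a GDC indefinitely; that is omitted here for brevity.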
DDL and RORC
The Detector Data Link (DDL) is the common hardware and protocol interface between the front-end electronics and the DAQ system. The DDL is used to transfer the raw physics data from the detectors to the DAQ, and to control the detector front-end electronics or download data blocks to it. The current version of the DDL is based on chips designed for the 1 Gbit/s Fibre Channel physical layer (top picture). The next version is being developed with 2.5 Gbit/s electronics (middle picture). The interface between the DDL and the I/O bus of the Local Data Concentrator (LDC) is the Read-Out Receiver Card (RORC) (right picture). The current RORC is a 32-bit, 33 MHz PCI card. It acts as a PCI bus master and uses direct memory access (DMA) to write into the LDC memory, reaching the maximum physical PCI speed of 132 MByte/s, as shown on the performance plot. The next RORC version will use 64-bit, 66 MHz PCI.
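The 132 MByte/s figure is simply bus width times clock frequency, and the same arithmetic shows how the link and bus speeds compare. In the sketch below, the 0.8 coding efficiency is an assumption: it corresponds to 8b/10b line coding as used by the Fibre Channel physical layer, but this text does not state the DDL's actual coding overhead.

```python
def parallel_bus_mbyte_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a bus moving one full-width word per clock cycle, in MByte/s."""
    return width_bits / 8 * clock_mhz


def serial_link_mbyte_s(line_rate_gbit_s: float, coding_efficiency: float = 1.0) -> float:
    """Payload rate of a serial link in MByte/s; coding_efficiency models line-coding overhead."""
    return line_rate_gbit_s * 1000 / 8 * coding_efficiency


if __name__ == "__main__":
    print(parallel_bus_mbyte_s(32, 33))   # current RORC: 132.0
    print(parallel_bus_mbyte_s(64, 66))   # next RORC: 528.0
    print(serial_link_mbyte_s(1.0, 0.8))  # 1 Gbit/s link, assuming 8b/10b coding
    print(serial_link_mbyte_s(2.5, 0.8))  # 2.5 Gbit/s link, same assumption
```

Under the 8b/10b assumption a 2.5 Gbit/s link carries about 250 MByte/s, more than the 132 MByte/s a 32-bit, 33 MHz PCI bus can absorb but well within the 528 MByte/s peak of 64-bit, 66 MHz PCI.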
The DATE framework is a distributed, process-oriented system. It is designed to run on Unix platforms connected by an IP-capable network and sharing a common file system such as NFS. It uses the standard Unix system tools for process synchronisation and data transmission. The DATE system performs the following functions:
- The Local Data Concentrator (LDC) collects the event fragments transferred by the DDLs into its main memory and reassembles them into sub-events. The LDC can also record data locally when used in standalone mode.
- The Global Data Collector (GDC) puts together all the sub-events belonging to the same physics event, builds the full events and archives them to the mass storage system. The Event Building and Distribution System (EBDS) balances the load amongst the GDCs.
- The DATE run-control controls and synchronises the processes running in the LDCs and the GDCs.
- The monitoring programs receive data from the LDC or GDC data streams. They can be executed on any LDC, GDC or any other machine accessible via the network. DATE also includes interfaces to the Trigger and HLT systems.
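One way to balance the load while guaranteeing that all sub-events of an event meet on the same GDC is to make a single assignment per event, pick the GDC with the fewest events currently being built, and have every LDC look up that decision. The sketch below is a hypothetical illustration of this idea; `GdcBalancer` and its methods are invented names and this is not the EBDS protocol.

```python
from typing import Dict, Iterable


class GdcBalancer:
    """Hypothetical balancer: each new event goes to the GDC with the fewest events in progress."""

    def __init__(self, gdcs: Iterable[str]):
        self.in_progress: Dict[str, int] = {g: 0 for g in gdcs}
        self.built: Dict[str, int] = {g: 0 for g in self.in_progress}
        self.assignment: Dict[int, str] = {}  # event_id -> GDC, shared by all LDCs

    def assign(self, event_id: int) -> str:
        """Return the GDC for this event, choosing it only on the first request."""
        if event_id not in self.assignment:
            # Least-loaded GDC; ties broken by name so the choice is deterministic.
            gdc = min(self.in_progress, key=lambda g: (self.in_progress[g], g))
            self.in_progress[gdc] += 1
            self.assignment[event_id] = gdc
        return self.assignment[event_id]

    def done(self, event_id: int) -> None:
        """Record that the assigned GDC has finished building the event."""
        gdc = self.assignment.pop(event_id)
        self.in_progress[gdc] -= 1
        self.built[gdc] += 1


if __name__ == "__main__":
    balancer = GdcBalancer(["gdc0", "gdc1", "gdc2"])
    print([balancer.assign(e) for e in range(6)])
```

Because a GDC's count only drops when it finishes an event, faster GDCs are chosen more often, which is the behaviour wanted when machines differ in speed.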
AFFAIR (A Flexible Fabric and Application Information Recorder) is the performance monitoring software developed by the ALICE Data Acquisition project. AFFAIR is largely based on open-source code and is composed of the following components: data gathering, inter-node communication using DIM, fast temporary storage in a round robin database, and permanent storage and plot generation using ROOT. Real-time data is monitored via a PHP-generated web interface. AFFAIR has been used successfully during the ALICE Data Challenges, monitoring up to one hundred nodes and generating thousands of plots accessible on the web.
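A round robin database keeps a fixed amount of history per metric, so its storage does not grow with time; older data must be moved elsewhere (here, to ROOT) if it is to be kept. A minimal Python sketch of the idea follows, assuming one fixed-capacity series per monitored quantity; it does not reproduce the file format or API of AFFAIR or of any round robin database tool. `collections.deque` with `maxlen` discards the oldest entry automatically when a new one is appended.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoundRobinSeries:
    """Fixed-capacity time series: once full, each new sample replaces the oldest one."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) drops the oldest element automatically on append
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=capacity)

    def add(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))

    def average(self) -> float:
        if not self.samples:
            raise ValueError("no samples recorded")
        return sum(v for _, v in self.samples) / len(self.samples)

    def consolidate(self, step: int) -> List[float]:
        """Average consecutive groups of `step` samples, e.g. to feed a coarser archive."""
        values = [v for _, v in self.samples]
        groups = [values[i:i + step] for i in range(0, len(values), step)]
        return [sum(g) / len(g) for g in groups]


if __name__ == "__main__":
    series = RoundRobinSeries(capacity=3)
    for t, mbyte_s in enumerate([10.0, 20.0, 30.0, 40.0]):
        series.add(float(t), mbyte_s)
    print(list(series.samples))  # the first sample (10.0) has been overwritten
    print(series.average())      # 30.0
```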
The ALICE Mass Storage System (MSS) will have to combine a very high bandwidth (1.25 GByte/s) with the capacity to store huge amounts of data, more than 1 PByte every year. The mass storage system consists of:
- Global Data Storage (GDS) performing the temporary storage of data at the experimental pit;
- Permanent Data Storage (PDS) for long-term archive of data in the CERN Computing Center;
- The Mass Storage System software, managing the creation of, access to and archiving of data.
Several disk technologies are being tested by the ALICE DAQ for the GDS: standard disk storage, Network Attached Storage (NAS) and Storage Area Network (SAN). The current baseline for the PDS is to use several magnetic tape drives in parallel to reach the desired bandwidth. A tape robot coupled with the tape drives mounts and dismounts the tapes automatically. The MSS software is the CASTOR system, designed and developed by the CERN/IT division.
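Since a single tape drive is much slower than the required bandwidth, the number of drives running in parallel is the target rate divided by the per-drive rate, rounded up. The sketch below uses a per-drive rate of 30 MByte/s purely as a placeholder; neither that figure nor the optional efficiency factor comes from this text.

```python
import math


def drives_needed(target_mbyte_s: float, drive_mbyte_s: float, efficiency: float = 1.0) -> int:
    """Smallest number of parallel tape drives whose combined effective rate meets the target.

    efficiency (0 < efficiency <= 1) is an optional derating for mounts, positioning, etc.
    """
    if drive_mbyte_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("drive rate must be positive and efficiency in (0, 1]")
    return math.ceil(target_mbyte_s / (drive_mbyte_s * efficiency))


ASSUMED_DRIVE_MBYTE_S = 30.0  # placeholder per-drive rate, not taken from the text

if __name__ == "__main__":
    print(drives_needed(1250, ASSUMED_DRIVE_MBYTE_S))  # full 1.25 GByte/s target
    print(drives_needed(200, ASSUMED_DRIVE_MBYTE_S))   # a 200 MByte/s target
```

Rounding up matters: 1250 / 30 is about 41.7, so 42 drives are needed, before any allowance for mounting and positioning time.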
Since 1998, the ALICE experiment and the CERN/IT division have jointly executed several large-scale, high-throughput distributed computing exercises: the ALICE Data Challenges (ADC). The goals of these regular exercises are to test hardware and software components of the data acquisition and computing systems in realistic conditions and to achieve an early integration of the overall ALICE computing infrastructure. The fourth ALICE Data Challenge (ADC IV) was performed at CERN in 2002. DATE demonstrated an aggregate throughput of more than 1 GByte/s (top figure). The data throughput to the disk server reached 350 MByte/s (middle figure), and the goal is to reach 200 MByte/s to tape. The bottom figure shows the effect of the load balancing on the number of events built on the different GDCs.
The goals of simulating the Trigger, DAQ and HLT systems are to verify the overall system design and to evaluate the performance of the experiment for a set of realistic data-taking scenarios.
The ALICE experiment has therefore been decomposed into a set of components whose functionality has been formally specified. The Trigger/DAQ/HLT simulation includes a model of the whole experiment and of its major sub-systems: Trigger, Trigger Detectors, Tracking Detectors, DAQ, HLT and Permanent Data Storage. The full simulation involves thousands of independent units representing the ALICE components, simulated in parallel. The performance of the existing component prototypes has been measured, and the results are used as input parameters for the simulation program. The simulation allows the system behaviour to be tested under different conditions, and thus possible bottlenecks to be found and alternative design solutions to be evaluated.
The simulation has, for example, been used extensively to verify that the Trigger, DAQ and HLT systems are able to preserve the majority of the rare triggers that could be measured by the ALICE experiment. This required adding to the DAQ a mechanism that reserves enough detector livetime to allocate periods of time to rare triggers.
The figures show the simulated evolution of three major parameters (top: LDC buffer occupancy; middle: level-2 trigger rate; bottom: fraction of bandwidth to mass storage) before and after (left and right columns) the addition of this mechanism for rare triggers. The ALICE Trigger and DAQ simulation program is based on the Ptolemy hierarchical environment, an open and free software tool developed at the University of California, Berkeley.
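The effect of reserving buffer space for rare triggers can be illustrated with a much smaller model than the Ptolemy-based simulation. The deterministic Python toy below is not that program: it models a single LDC buffer, and the buffer size, drain rate, trigger sizes and periods, and the 80/50 watermarks are all arbitrary illustrative values. The reservation mechanism is modelled, as an assumption, as hysteresis on buffer occupancy: frequent triggers are blocked once the buffer reaches a high watermark and re-enabled when it drains to a low one, while rare triggers are always offered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    rare_accepted: int
    rare_lost: int
    frequent_accepted: int
    max_occupancy: int


def simulate(steps: int, capacity: int = 100, drain: int = 2,
             frequent_size: int = 3, rare_size: int = 10, rare_period: int = 25,
             high_mark: Optional[int] = None, low_mark: Optional[int] = None) -> Result:
    """Toy time-stepped model of one LDC buffer fed by frequent and rare triggers.

    A frequent trigger arrives every step, a rare one every `rare_period` steps.
    With high_mark/low_mark set, frequent triggers are blocked from high_mark
    down to low_mark (hysteresis), keeping room free for rare triggers.
    """
    occupancy = 0
    blocked = False
    rare_ok = rare_lost = frequent_ok = peak = 0
    for t in range(steps):
        occupancy = max(0, occupancy - drain)  # data shipped out of the buffer
        if high_mark is not None and low_mark is not None:
            if occupancy >= high_mark:
                blocked = True
            elif occupancy <= low_mark:
                blocked = False
        if t % rare_period == 0:  # a rare trigger arrives
            if occupancy + rare_size <= capacity:
                occupancy += rare_size
                rare_ok += 1
            else:
                rare_lost += 1
        # the frequent trigger is accepted only if not blocked and it fits
        if not blocked and occupancy + frequent_size <= capacity:
            occupancy += frequent_size
            frequent_ok += 1
        peak = max(peak, occupancy)
    return Result(rare_ok, rare_lost, frequent_ok, peak)


if __name__ == "__main__":
    print("without reservation:", simulate(1000))
    print("with reservation:   ", simulate(1000, high_mark=80, low_mark=50))
```

With these parameters the unprotected buffer saturates and every rare trigger after the first three is lost, whereas the protected buffer never fills and keeps all 40 rare triggers, at the price of accepting fewer frequent triggers.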