ALICE Offline Analysis FAQs
Analysis (General) (15)
Robert Grajcarek asks:
What is the difference between tracks
1. passing the TPCOnlyTrackCuts and taking the global parameters
and
2. passing the TPCOnlyTrackCuts and taking the TPC only parameters constrained to SPD-Vertex (the new type of AOD-tracks introduced) ?
To my understanding, in both cases you avoid the ITS phi holes in the phi distribution, since you do not cut on any ITS cluster, ITS refit and so on. In case 1. you use the ITS to improve the parameters (Pt measurement, DCA to vertex etc.) ONLY IF it is accessible. So you might end up with tracks with different Pt and spatial resolutions.
In case 2. you do not use the ITS information but take only the parameters as measured by the TPC, so you have only one class of tracks with one class of Pt and spatial resolutions. Can you confirm? Or is what I am writing nonsense?
Christian Klein-Boesing answers:
To my understanding, in both cases you avoid the ITS phi holes in the phi distribution, since you do not cut on any ITS cluster, ITS refit and so on. In case 1. you use the ITS to improve the parameters (Pt measurement, DCA to vertex etc.) ONLY IF it is accessible. So you might end up with tracks with different Pt and spatial resolutions.
Exactly, here you make only a very loose selection, and you will end up with a mixed quality of tracks.
In case 2. you do not use the ITS information but take only the parameters as measured by the TPC, so you have only one class of tracks with one class of Pt and spatial resolutions. Can you confirm? Or is what I am writing nonsense? Thanks!
Correct, in the latter case you have the additional "cut" that the constraint has to work.
In both cases you have a rather loose selection of tracks, with potential for contamination at high p_T. So additional cuts should be applied on AOD level (which we are currently testing).
Hermes asks:
In the macro AddTaskESDFilter.C used for the AOD production, there are these two lines:
trackFilter->AddCuts(esdTrackCutsTPCOnly);
if(enableTPCOnlyAODTracks)esdfilter->SetTPCOnlyFilterMask(128);
If I understand correctly, the last line is producing aod tracks with
TPC only cuts, but then, what is the difference between those tracks
and those that one would get if enableTPCOnlyAODTracks=kFALSE
using only trackFilter->AddCuts(esdTrackCutsTPCOnly)?
To phrase my question differently, what is the purpose of setting enableTPCOnlyAODTracks=kTRUE in the filter?
In the analysis I use AliAODTrack::TestFilterBit(128);
Christian Klein-Boesing answers:
For enableTPCOnlyAODTracks = kFALSE the ESD filter acts as usual, i.e. it flags the tracks which pass the cuts specified for the TPCOnlyTrackCuts (it adds the filter bit 128).
If it is kTRUE, the filter takes the selected ESD tracks (those passing the filter with bit 128) and extracts the TPC-only information from them. It then constrains the TPC-only track parameters to the SPD vertex and adds a new AOD track, which __only__ contains the new, constrained track information.
This is needed since the ESD filter cannot otherwise know which track parameters to take (the default is the global track parameters), and one has to add a new track since an AOD track can only contain one set of track parameters.
So if enableTPCOnlyAODTracks == kTRUE, there will be the normal AOD tracks with global track parameters (and bit 128 NOT set) and the AOD tracks with TPC constrained track parameters and ONLY bit 128 set.
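The selection described above can be sketched with a toy track type. This is only an illustration of the bit test, not AliRoot code: ToyTrack, kTPCOnlyMask and SelectTPCOnly are hypothetical names, and only the mask value 128 and the TestFilterBit name come from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an AOD track: only the filter map is modelled.
struct ToyTrack {
    std::uint32_t filterMap;  // one bit per cut set defined in the filter
    bool TestFilterBit(std::uint32_t mask) const { return (filterMap & mask) != 0; }
};

// Mask quoted in the question (SetTPCOnlyFilterMask(128)).
constexpr std::uint32_t kTPCOnlyMask = 128;
static_assert(kTPCOnlyMask == (1u << 7), "128 corresponds to bit 7");

// Return the indices of the tracks that carry the TPC-only bit.
std::vector<std::size_t> SelectTPCOnly(const std::vector<ToyTrack>& tracks) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        if (tracks[i].TestFilterBit(kTPCOnlyMask)) selected.push_back(i);
    }
    return selected;
}
```

With the TPC-only AOD tracks enabled, the global tracks would carry other bits but not 128, while the added constrained tracks would carry only 128, so a loop like this picks up exactly one class of tracks.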
Alberto Pulvirenti asks:
In ESD events coming from MC productions, I can retrieve the AliGenEventHeader from the AliMCEvent and get the primary vertex position from it. In AOD events, how do I retrieve this information?
The Vertex and other MC Header information is stored in a separate branch "AliAODMCHeader"
Alberto Pulvirenti asks:
In ESD, once I find a TParticle in the stack, I know that GetDaughter(Int_t) gives me the index of its daughters in the same stack. In an AliAODMCParticle I also have a GetDaughter() function which returns integers. How do I have to interpret them? Are they the indices, in the same MC array, of the daughters of this MC particle?
All mother and daughter indices refer to the position (index) in the TClonesArray of AliAODMCParticles.
Alberto Pulvirenti asks:
If I am not wrong, in ESD tracks GetLabel() returns the position of the corresponding particle in the stack, so that one can retrieve this particle and look at its properties (momentum, true PID, etc.).
In AOD tracks there is also a GetLabel(). Does it return the same number? I mean: when an ESD track is translated into its AOD equivalent, will the label remain the same? And in this case, how do I find the corresponding AliAODMCParticle in the MC array? Do they have the same label, or are they connected by something else (pointers, TRefs)?
For AOD tracks GetLabel() returns you the index of the AODMCParticle in the TClonesArray.
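Since labels and mother/daughter indices all point into the same array, navigation reduces to plain index lookups. Below is a minimal sketch with a std::vector standing in for the TClonesArray. ToyMCParticle and both helper functions are hypothetical, and the conventions "mother = -1 means no mother" and "negative label means no match" are assumptions of this toy, not statements about AliRoot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an MC particle stored in the AOD MC array.
struct ToyMCParticle {
    int pdgCode;
    int mother;  // index of the mother in the same array (assumed -1 if none)
};

// Look up the particle a label points to; nullptr if the label is out of range.
const ToyMCParticle* FindMCParticle(const std::vector<ToyMCParticle>& mc, int label) {
    if (label < 0 || static_cast<std::size_t>(label) >= mc.size()) return nullptr;
    return &mc[static_cast<std::size_t>(label)];
}

// Follow the mother indices upwards and return the index of the first ancestor.
// The loop is bounded by the array size so a corrupted chain cannot spin forever.
int FindFirstAncestor(const std::vector<ToyMCParticle>& mc, int label) {
    if (!FindMCParticle(mc, label)) return -1;
    int current = label;
    for (std::size_t step = 0; step < mc.size(); ++step) {
        const int mother = mc[static_cast<std::size_t>(current)].mother;
        if (!FindMCParticle(mc, mother)) break;  // no valid mother: stop here
        current = mother;
    }
    return current;
}
```

In a real task the label would come from the AOD track's GetLabel() and the array from the MC particle branch; the range check is worth keeping in any case.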
1) Do AODs produced from Monte Carlo contain (a) only the events which pass the physics selection, or (b) all MC events, with a flag for those which pass the physics selection?
2) Is there a way to retrieve, for each AOD, the statistics for event selection, triggering etc.?
1. Yes, if these were run using the physics selection.
2. Yes, set in your steering macro:
aodInputHandler->SetCheckStatistics(kTRUE);
then, in your FinishTaskOutput, do the normalization of the histos using:
TH2 *histStat = (TH2*)inputHandler->GetStatistics();
This will get you the histo filled by the physics selection task, merged for all processed events in your input AODs
(Marco van Leeuwen): It's best to always use IsPhysicalPrimary(). The IsPrimary function basically tells you whether the particle was input to the GEANT simulation or generated by GEANT. So, all particles produced by Pythia will be called 'Primary'. IsPhysicalPrimary uses our common definition (all prompt particles, including strong decay products, plus weak decay products from heavy flavour).
In order to optimize the TPC PID cut, I usually adopt two different fiducial zones depending on the total momentum at the TPC inner wall. If this is smaller than a threshold, my cut is 5 sigma, otherwise it is 3 sigma. This helps to clean the signal a little bit in the low momentum region, where the background is larger. Is it possible to retrieve this value from the AODs? Apparently not, but you understand that if I want to check a momentum against a TPC PID signal, I must have it, otherwise I am biased by the energy loss through all the material between the vertex and the TPC inner wall.
(Andreas Morsch):
AliAODPid *pidObj = track->GetDetPid();
Double_t mom = pidObj->GetTPCmomentum();
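Once the momentum at the TPC inner wall is available, the two-zone cut from the question is a one-line decision. The sketch below is a standalone illustration: PassTPCPidCut is a hypothetical helper, and the threshold value is a placeholder, since the question does not quote a number.

```cpp
#include <cassert>
#include <cmath>

// Placeholder threshold in GeV/c; the source does not give a value.
constexpr double kMomentumThreshold = 0.35;

// Two-zone TPC PID cut as described in the question:
// 5 sigma below the momentum threshold, 3 sigma at or above it.
bool PassTPCPidCut(double tpcInnerMomentum, double nSigma,
                   double threshold = kMomentumThreshold) {
    const double maxSigma = (tpcInnerMomentum < threshold) ? 5.0 : 3.0;
    return std::fabs(nSigma) <= maxSigma;
}
```

Here tpcInnerMomentum would be the value returned by GetTPCmomentum() above, and nSigma the number of sigmas from the PID response for the chosen particle hypothesis.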
I saw the AliAODpidUtil class which returns to me the nsigmas for TPC, ITS and TOF. Now, for getting these values using the ESD tracks, for each detector there is a specific machinery, since I have to initialize the corresponding PID response with the correct parameters:
- for the TPC I do this directly, passing the necessary parameters to the Bethe-Bloch function which is then used for computing nsigma, even if I cannot control the reference value of this sigma;
- for the ITS I must just tell the PIDResponse object if I am running on data or on MC, since probably this sets some parameters in a different way;
- for the TOF I must initialize an AliTOFcalib and an AliTOFT0maker which must recompute the T0, and then remake the PID in the whole ESD, after having specified the assumed resolution for the TOF and possibly (for MC) initialized the T0 properly: this computes a different sigma for each TOF-matched track
Now, is all of this done automatically in the AODpidUtil? I mean, is it correctly initialized by default to give me the right number of sigmas, or do I have to do the same pre-processing I usually do on ESD tracks to obtain properly initialized PIDResponse objects before computing these nsigmas?
(Andreas Morsch): The members of the class AliAODpidUtil are Ali*PIDResponse objects, one for every detector. This means that you can get them and set whatever you need (such as the Bethe-Bloch parameters, if you need values different from the standard ones). Concerning TOF, this should be done in the tender, so it should already be recomputed in the AOD itself, which means that the TOF signal should already be the correct one.
I found a method in AOD tracks called DCA(), but I don't really understand which DCA it is. What I need is the transverse DCA (in the rphi plane), on which I want to apply a cut, while I don't cut on the longitudinal DCA. How can I retrieve this quantity? Does AliAODTrack::DCA() return this, or not?
(Andreas Morsch): You should use the method:
PropagateToDCA(const AliVVertex *vtx, Double_t b, Double_t maxd, Double_t dz[2], Double_t covar[3]);
It allows you to calculate the DCA with respect to any vertex. maxd is the maximum expected value of the transverse impact parameter; it should be set to something big.
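To make the two components concrete, here is a geometric toy that computes the transverse and longitudinal distance of closest approach for a straight line (no magnetic field) and cuts on the transverse one only. It does not reproduce PropagateToDCA, which works on helices. ToyDCA, StraightLineDCA and PassTransverseDCACut are hypothetical names and the sign convention is arbitrary. In the real call, the assumption is that dz[0] holds the transverse and dz[1] the longitudinal component.

```cpp
#include <cassert>
#include <cmath>

// Transverse (xy) and longitudinal (z) distance of closest approach.
struct ToyDCA {
    double xy;
    double z;
};

// Straight-line track through (x0, y0, z0) with azimuth phi and dip tan(lambda),
// evaluated with respect to the vertex (xv, yv, zv).
ToyDCA StraightLineDCA(double x0, double y0, double z0, double phi, double tanLambda,
                       double xv, double yv, double zv) {
    const double c = std::cos(phi);
    const double s = std::sin(phi);
    const double dx = xv - x0;
    const double dy = yv - y0;
    const double path = dx * c + dy * s;  // transverse path length to the closest point
    ToyDCA d;
    d.xy = dx * s - dy * c;               // signed transverse distance
    d.z = z0 + path * tanLambda - zv;     // z offset at the point of closest approach
    return d;
}

// Cut on the transverse component only, leaving the longitudinal one free.
bool PassTransverseDCACut(const ToyDCA& d, double maxXY) {
    return std::fabs(d.xy) <= maxXY;
}
```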
I saw that each AOD track has a pointer to a possible vertex from which it was produced. If I want to detect whether a track is a kink daughter, is it enough to retrieve its production vertex with track->GetProdVertex() and, if this is not NULL (which I assume means that the track is considered primary), check if it is of type 'kink' and reject the track in this case? Or is the procedure more complicated in some sense?
(Andreas Morsch): This procedure is correct.
In the ESD tracks, I can retrieve the TPC chi2 and the ITS chi2 separately. Then, for TPC+ITS tracks, I cut on the TPC chi2 / TPC number of clusters, whereas for the ITS standalone tracks I cut on the ITS chi2 / ITS number of clusters. Instead, it seems that in the AOD track there is only one piece of information on chi2 per NDF. How should I interpret it? How can I do this kind of cut for the different kinds of tracks (I mean, TPC and ITS-SA)?
(Andreas Morsch): AOD has only the TPC Chi2 per dof.
Is it correct to use the TESTBIT macro to know if a specific layer in the ITS or TPC was hit by the track (this is useful to check if it has SPD clusters in it, which I would require)?
(Andreas Morsch): Yes.
Is it correct to use GetTPCNcls() and GetITSNcls() to count how many clusters are in the TPC and/or ITS part of a track?
(Andreas Morsch): GetTPCNcls() returns the number of bits set in the fTPCClusterMap, i.e. the number of pad rows crossed, similarly for GetITSNcls().
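Both answers come down to operations on a cluster bit map: testing one bit per layer and counting the bits that are set. The standalone sketch below uses hypothetical helper names and assumes that bits 0 and 1 of the ITS map correspond to the two SPD layers (the two innermost ITS layers).

```cpp
#include <cassert>
#include <cstdint>

// Same idea as ROOT's TESTBIT(n, i): true if bit i of the map is set.
inline bool TestBit(std::uint32_t map, unsigned bit) {
    return ((map >> bit) & 1u) != 0;
}

// Require at least one point in the SPD (assumed to be ITS layers 0 and 1).
inline bool HasAnySPDPoint(std::uint32_t itsMap) {
    return TestBit(itsMap, 0) || TestBit(itsMap, 1);
}

// Count the bits set in a cluster map, i.e. the number of layers
// (or pad rows, for a TPC map) with a cluster attached.
inline int CountBits(std::uint32_t map) {
    int n = 0;
    while (map != 0) {
        map &= map - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}
```

For example, an ITS map of 0x3A has bits 1, 3, 4 and 5 set: four layers in total, one of them in the SPD.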
To check the AOD tracks, do I need the same flags used in AliESDtrack, with the same meaning for each one (kTPCin, kITSrefit, and so on), or is there a different reference for them?
(Andreas Morsch): Yes, you have the same status flags.
GRID certificate (1)
What you need to do in order to obtain a grid certificate is described in User Registration. You have to complete all 5 steps of the registration procedure. Detailed instructions for CERN users can be found here.
You send a job that is going to be split into many subjobs and you request the Outputdir somewhere in your home directory. So I was expecting a structure like the one you get in /proc///....., but in your output directory you just get a copy of the output from the first (or last) subjob that finishes. In /proc you do get the nice structure with the subjobs, but since it is a temporary place and you don't get any mail when the job has finished, you will most probably lose your output (it is removed after a while).
(Yves Schutz): use the alien counter #alien_counter_s# in the definition of the Output directory, e.g. OutputDir="/alice/cern.ch/user/s/schutz/analysis/output/QA/$1/#alien_counter_s#"
(Panos Christakoglou): It is part of the metadata fields at the run level (metadata associated to the file catalog): generator, version, parameters etc.
(Panos Christakoglou): I would suggest that you try to run it on 1M events by splitting your batch job accordingly. In your jdl you should also define the field SplitMaxInputFileNumber in such a way that every sub-job that is created will analyze a small number of files (~100). Another piece of advice: try to define in your executable (stored in your AliEn $HOME/bin) something like export XROOTD_MAXWAIT=10. You do this if you don't want to wait for an infinite amount of time for a requested file to be staged.
(Yves Schutz): Examples on how to implement an analysis task can be found in $ALICE_ROOT/ESDCheck.
(Jan Fiete Grosse-Oetringhaus): The base classes to be used are AliSelector, for analyses that only access the ESD, and AliSelectorRL, which can be used to access the RunLoader, Kinematics, MC Header etc. An empty skeleton that can be adapted for your own analysis can be found in PWG0/AliEmptySelector.h/.cxx.
(Yves Schutz): You can best do that using the ROOT API. Have a look at the macros in AnalysisGoodies.C and an example of TSelector esdAna.C and esdAna.h.
Vertex (6)
In the ESD event I can choose among 3 different primary vertices: tracks, SPD, TPC; in the AOD event, instead, I can only get a default PV with AliAODEvent::GetPrimaryVertex(), and (maybe) the SPD one with AliAODEvent::GetPrimaryVertexSPD(). Does this follow the same philosophy used in AliESDEvent, in the sense that if the PV with tracks is good, then GetPrimaryVertex() returns that, otherwise, if the SPD vertex is good, it returns the SPD PV, otherwise it returns the TPC PV?
(Andreas Morsch): AliAODEvent::GetPrimaryVertex() and AliESDEvent::GetPrimaryVertex() give the same results.
I am not able to find in the ESD one piece of information which I think is extremely important: what and where is the first MEASURED point of a track. I see that parameters are offered at various positions and extrapolations can be made. But it is still essential, sometimes, to know where the first used cluster lies. For example, for a track nicely extrapolated back to the vertex, I would like to know if the first point is on the innermost layer of the ITS or only on the 2nd or the 3rd ..... The same goes for tracks of particles which have been produced inside the TPC.
(Yuri Belikov): Yes, this is useful information. In principle, there is a way to recalculate this information. Or, you can figure it out by looking at the stored "Track Points". This is not very "natural", but the problem is that storing it directly in the ESD would additionally increase the size of the ESD, and we are already above the limit.
AliExternalTrackParam offers the various PropagateTo and PropagateToDCA methods. These then offer more information than the pure impact parameters. Any recommendation here on what is best to use? Are there other methods? Which are the most used?
(Yuri Belikov): PropagateTo propagates this track to a "reference plane" given by the argument. The most typical case when you need it is the tracking. The two PropagateToDCA methods propagate this track to the Distance of Closest Approach either to an arbitrary vertex (given by the argument), or to an arbitrary track (given by the argument). The most typical case when it is used is the secondary vertex reconstruction.
The methods GetImpactParameters return the impact parameters in xy and z (and their errors) referred to what? If RelateToVertex has not been called, is it referred to (0,0,0) and after it has been called, is it to the actual vertex used there?
(Yuri Belikov): Referred to the primary vertex stored in the same ESD. (With the current version, the vertex reconstructed with the SPD). RelateToVertex is anyway called in the reconstruction, with the pointer to reconstructed SPD vertex.
AliESDtrack::RelateToVertex(const AliESDVertex *vtx, Double_t b, Double_t maxd) tells me only whether the track has a distance from the vertex less than maxd or not.
(Yuri Belikov): Not only. The method does much more. It gives you the possibility to "re-attach" an ESD track to an arbitrary vertex. You can create your own vertex, pass the pointer to it to the method, and the method will propagate this track to the DCA to your vertex and try to "constrain" this track to this vertex. (The method is not const! It changes the track parameters.) If, for some reason, this is not possible, the method returns kFALSE.
Which primary? In AliESD I see GetPrimaryVertex (primary from ESD tracks) and GetPrimary (from SPD). Which one should we use nowadays? (PDC06, for example, pp with standard reconstruction). In AliESDVertex how do I get the coordinates X, Y, Z?
(Yuri Belikov): This is exactly like you are saying. There are two "versions" of the primary vertex stored in the ESD. One is reconstructed using the SPD RecPoints (before the tracking), and another one is reconstructed using the ESD tracks (after the tracking). Potentially, the first "version" of the vertex has somewhat worse precision (as compared with the second option), however the reconstruction efficiency of the first method is, again potentially, somewhat higher. The choice is yours.
Kinematics Tree (1)
There has been a production of jet-kinematics in the PDC06, only 2 files were generated: galice.root and Kinematics.root. I would like to extract the header and kinematics information using the TSelectors in Alien.
(Jan Fiete Grosse-Oetringhaus): Look at AliSelectorRL in the directory PWG0 of a HEAD AliRoot. It accesses header as well as the particle stack.
Data Challenge (1)
(Andrei Gheata): The documentation tells you that you have to call PostData in UserExec after filling your histograms. Note that PostData is not like TH1::Fill. It just makes the pointer to your output data (which in fact never changes) available, so that the framework knows to write this data to the output file(s). PostData also sends a notification that makes all the client tasks of a given output container active (so that their Exec gets called). Failing to ever call PostData() will typically end up in a missing output file (or a missing folder in it). Posting data during execution is indeed required only for those events which are interesting for the task (or its possible client tasks).
Now there is a symptom that can appear since we use cuts (namely the physics selection), especially on the grid. A given subjob may be assigned a data sample that contains no interesting events, so the task's UserExec is never called (and neither is PostData). If this task runs alone, the job is not validated since there is no output file. If the task runs in a train, nothing will be visible at run time, because the task just fails to fill its folder in the common output file. Still, there will be jobs that did select events and produced output. In this case the analysis will die during merging, since the file merger fails to match the data from different output files.
To cure this, we had to change the policy so that the output objects DO get written even if no event was selected. That is: better empty histograms than no histograms. The simplest way to achieve this is to call PostData for every output slot of your task at the end of UserCreateOutputObjects (which is called once per session on the worker). All tasks must do this systematically.
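The effect of the policy can be seen in a toy model of the task life cycle. This is not the AliRoot analysis framework: ToyTask and RunSubjob are hypothetical, and "posted" here simply stands for "the framework would write this output". The method names mirror the ones mentioned above only to make the mapping obvious.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy task: its output counts as written only if PostData was called at least once.
class ToyTask {
public:
    explicit ToyTask(bool postAtCreate) : fPostAtCreate(postAtCreate) {}

    // Called once per session: book the (empty) output.
    void UserCreateOutputObjects() {
        fHistogram.assign(10, 0);        // "empty histogram" with 10 bins
        if (fPostAtCreate) PostData();   // the recommended policy
    }

    // Called only for events that pass the selection.
    void UserExec(std::size_t bin) {
        ++fHistogram.at(bin);
        PostData();
    }

    bool OutputPosted() const { return fPosted; }
    long Entries() const {
        return std::accumulate(fHistogram.begin(), fHistogram.end(), 0L);
    }

private:
    void PostData() { fPosted = true; }

    bool fPostAtCreate;
    bool fPosted = false;
    std::vector<long> fHistogram;
};

// Run one "subjob"; selectedEvents lists the bins filled by the selected events.
bool RunSubjob(ToyTask& task, const std::vector<std::size_t>& selectedEvents) {
    task.UserCreateOutputObjects();
    for (std::size_t bin : selectedEvents) task.UserExec(bin);
    return task.OutputPosted();
}
```

A subjob with no selected events produces no output when PostData is called only in UserExec, but produces an empty histogram when it is also called at the end of UserCreateOutputObjects, which is what the merger needs.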
Grid-related (1)
(Latchezar Betev) The long version with all options is available here :
http://alien2.cern.ch/index.php?option=com_content&view=article&id=52&Itemid=100#Splitting_jobs
(documentation section on http://alien.cern.ch)
-> The short version is as follows:
1. The predominantly used option in user analysis is 'splitting by SE', (Split = "SE";) whereby the AliEn Job Optimizer is splitting the master jobs in chunks which match the data distribution at the site(s) which have replicas of the input files.
2. The limiting argument SplitMaxInputFileNumber = "x"; instructs the optimizer to use at most 'x' files for a given sub-job. This argument should be used only if the user knows why the splitting should be limited.
3. For the majority of the current analysis jobs, 'x' can be large or, better yet, the argument should not be used at all.
(Andrei Gheata): The "SE" option is the default for the alien plugin, while the SplitMaxInputFileNumber is set by default to 100. For no limitation of the number of files, one can in principle give a big number as argument to: plugin->SetSplitMaxInputFileNumber(nfiles). Limitations of the number of files to be processed per subjob may be needed if:
- the time to process these files exceeds the TTL (time to live). Setting a TTL bigger than 12*3600 seconds will not help: from experience, the success rate goes down significantly beyond 6 hours.
- the memory leaks accumulate and you start getting watchdog messages. In this case one should really start looking at the code.
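If the limit is driven by processing time, a rough upper bound on the number of files per subjob follows from simple arithmetic. The helper below is a hypothetical sketch: the per-file processing time has to be measured by the user, and taking 6 hours as the time budget is only an interpretation of the remark above.

```cpp
#include <algorithm>
#include <cassert>

// Largest number of input files per subjob that fits in a given time budget.
// Always returns at least 1, since a subjob needs at least one file.
int MaxFilesPerSubjob(double secondsPerFile, double budgetSeconds) {
    assert(secondsPerFile > 0.0 && budgetSeconds > 0.0);
    const int n = static_cast<int>(budgetSeconds / secondsPerFile);
    return std::max(n, 1);
}
```

For example, with 300 s per file and a 6 hour budget (6*3600 s) this gives 72 files, a value that could then be passed to plugin->SetSplitMaxInputFileNumber(nfiles).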
Alien plugin (1)
What is the logic which is behind the usage of the plugin in "terminate" mode? How does the system know which files should be merged?
(Andrei Gheata): The "terminate" mode should be run using exactly the same analysis macro as in "full" mode but using: plugin->SetRunMode("terminate")
Note that when you quit the alien shell opened by the "full" mode, the plugin will try to go through the "terminate" phase anyway. This may lead to problems: if at least one of the subjobs is done, the plugin will merge their outputs and you will get a partial result. Then, when running in "terminate" mode, these files will be skipped. Before running in "terminate" mode, one should make sure that no output file of the analysis is present in the current directory (!). To have this done automatically and avoid surprises, one should configure the plugin using:
plugin->SetOverwriteMode(kTRUE)
Note that this has recently become the default. The plugin knows what output files your analysis should produce (if you used SetDefaultOutputs, it loops over the files pointed to by the output containers connected to the analysis manager) and where the output directory in the grid is, so in the "terminate" phase it can find their location using the 'find' command and merge them using a file merger. It does not matter how many runs you processed or what the splitting parameters were. These only change the number of files to merge, which is sometimes too big to merge locally. For such cases the default behavior of the plugin is to resume merging after a failure, but it is advisable to instruct it to merge via jdl:
plugin->SetMergeViaJDL();
This will run one merging jdl per masterjob, that you can tune via: SetNrunsPerMaster(). The problem with that is currently that there is no "supermerge" that is done at the final stage to merge those. It currently works correctly only if there is only one master job. Still under development.
