Data analysis.

Alibava GUI can store the data in two different formats. The first one is a binary file with a proprietary format, which is kept for historical reasons. The second one uses HDF5, which can easily be read from Python, Matlab or Octave. The following sections describe the two formats.

8.1. The Alibava Data Format

8.1.1. Binary Data format

The data is stored in binary form. However, the format of the data files is quite simple and it is shown in Table 2. For the sizes used in the tables we follow the convention:

uint32

An unsigned 32 bit integer

uint16

An unsigned 16 bit integer

int16

A signed 16 bit integer

int32

A signed 32 bit integer

char

An 8 bit character (1 byte)

Table 2. Data Format
Data size and type Meaning
uint32 Time of start of run
int32 Run type. The run type can have various values:
  1. Calibration run
  2. Laser Sync.
  3. Laser
  4. Rad. source
  5. Pedestal
uint32 Header length (header_length)
header_length * char Header data. The header data contains some information that is useful when analyzing the data. The header is stored as an ASCII string and the format is:
  • In the case of calibration or laser sync:

    • Vn.n|npts;from;to;step
  • In the case of laser or rad. source:

    • Vn.n|num_events;sample_size
256 * double (32 bit) Pedestals (ADC units)
256 * double (32 bit) Noise (ADC units)
Datablock Following the overall header of the file, which describes the parameters of the alibava run, there are a number of DataBlocks, each containing specific information. All the data blocks have the same structure, which is described in Table 3. The possible DataBlocks are:
  • NewFile
  • StartOfRun
  • DataBlock
  • CheckPoint
  • EndOfRun

The data file has an overall header, containing the running parameters of Alibava, followed by a series of data blocks. The data blocks all have the same format, which is described in Table 3. The event data itself is one of those data blocks and is the only one which is always written by alibava-gui. The rest are only written when the user activates a plugin and any of its methods returns a data buffer.

Table 3. Format of a data Block
Data size and type Meaning
uint32: 0xcafennnn Header of the data block. nnnn is the data block type. The different types can be:
  0. NewFile
  1. Start of Run
  2. Data
  3. Check Point
  4. End of Run
uint32 The size in bytes of the block data
size * char The block data.

Only the Data block has a fixed format, given by Alibava. The format of the other blocks depends on the plugin activated by the user. The format of the Data block is shown in Table 4.

Table 4. Format of the Data block
Data size and type Meaning
0xcafe0002 The block data
522 The size of the block data
uint32 Clock counter since the last MB reset. The clock is around 40 MHz but for an accurate value it should be calibrated with a pulse generator used as trigger.
uint32 Time as read in the TDC. With X the stored value, T = 100.0*(ipart + (fpart/65535.)) where
  ipart = (X & 0xFFFF0000)>>16
  fpart = sign(ipart)*(X & 0xFFFF)

uint16 Coded Temperature ( T = 0.12*X-39.8)
256 * uint16 The ADC values of the 256 channels
double (32 bit) An extra value that corresponds to the scanned variable in the predefined scans: Calibration (charge) and Laser synchronization (delay)

An example of how to deal with the data can be found in AsciiRoot.cc in the root_macros folder.
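The conversions listed in Table 3 and Table 4 can be sketched in a few lines of C++. This is only an illustration: the function names are hypothetical and are not taken from AsciiRoot.cc, ipart is assumed to be a signed 16 bit quantity (which is what the sign(ipart) term suggests), and sign(0) is assumed to be +1. Reading the words from the file and any byte-order handling are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the Alibava sources.

// True when a 32 bit word has the 0xcafennnn pattern of a block header.
bool is_block_header(std::uint32_t word)
{
    return (word & 0xFFFF0000u) == 0xCAFE0000u;
}

// Block type nnnn stored in the low 16 bits of the header word.
unsigned block_type(std::uint32_t word)
{
    assert(is_block_header(word));
    return word & 0xFFFFu;
}

// TDC time, T = 100.0*(ipart + fpart/65535.), from the stored value X.
double tdc_time(std::uint32_t x)
{
    // Assumption: the upper 16 bits are a signed (two's complement) value.
    int ipart = static_cast<int>((x & 0xFFFF0000u) >> 16);
    if (ipart >= 0x8000)
        ipart -= 0x10000;
    // Assumption: sign(0) is taken as +1.
    const int sign = (ipart < 0) ? -1 : 1;
    const int fpart = sign * static_cast<int>(x & 0xFFFFu);
    return 100.0 * (ipart + fpart / 65535.0);
}

// Temperature from the coded value, T = 0.12*X - 39.8.
double temperature(std::uint16_t x)
{
    return 0.12 * x - 39.8;
}
```

For instance, block_type(0xcafe0002) gives 2, the type of the Data block.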

8.1.2. The HDF5 data format

In HDF5 the data is structured in groups, each holding different information. There are two main groups. The header group contains general information about the run. It has the setup attribute that specifies the type of run and some other useful information, like the time of the acquisition. It also contains the pedestals and noise of the active channels. The events group has four tables which contain the data collected for each event: the value on each channel in signal, the time given by the TDC (see Figure 6), the temperature measured and a sort of timestamp as a 40 MHz clock counter since the last reset of the mother board. See Figure 25.

Figure 25. HDF5 file data format
In the Calibration or Laser Scan runs the scan group contains the points at which the scanned values change as well as the description of the scan. Have a look at HDFRoot.cc which provides the data class that handles the hdf5 data.

8.2. Analysing the data

By knowing the data format you can write your own program to analyze the data in your preferred language. This is the recommended way since the Alibava examples cannot know the particularities of the end-user sensor and data. However, Alibava provides a collection of ROOT macros (still evolving) to read the data files and produce histograms. The ROOT macros are in the root_macros folder of the alibava distribution. If ROOT was already installed when you installed alibava, you will find the ROOT libraries in INSTALL_DIR/lib/alibava/root at the end of the installation process. INSTALL_DIR is usually /usr/local unless you specify it differently, as explained in Appendix C ― Installing the software.

If you are not planning to modify the source code of the ROOT macros you can use those libraries. To do so, you will need a rootlogon.C file in your working directory that loads them when ROOT is started from within that directory. It could look like the one shown in Example 7.

Example 7. rootlogon.C for using precompiled ROOT libraries

#define DYNPATH "INSTALL_DIR/lib"
#define INCPATH "INSTALL_DIR/include/alibava/root"

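void SLload(const char *lnam)
{
   // TSystem::Load returns a negative value on failure
   // (0: loaded, 1: the library was already loaded)
   if ( gSystem->Load(lnam) < 0 )
      std::cout << ":> " << lnam << " NOT loaded " << std::endl;
   else
      std::cout << ":> " << lnam << " loaded " << std::endl;
}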

void rootlogon()
{
    // Add the library folder in the dynamic path so that ROOT finds
    // the library
    TString ss = gSystem->GetDynamicPath();
    gSystem->SetDynamicPath(ss+":"+DYNPATH);

    // Add the Alibava include path in the ROOT include path so that
    // you can include Alibava header files in your own macros
    gInterpreter->AddIncludePath(INCPATH);
    
    // Load the library
    std::cout << "============================================================" << std::endl;
    SLload("libAlibavaRoot.so");
    std::cout << "============================================================" << std::endl;

    // This is cosmetics
    gROOT->SetStyle("Plain");
    gStyle->SetPalette(1);
    init_landau();
}

If you want to modify the source of the ROOT macros you will need to run make in the root_macros folder and, eventually, make install to install the modified libraries. Alternatively, you can just copy libAlibavaRoot.so (libAlibavaRoot.dylib on Mac OS X) to a place where ROOT can find it.

In either case, the best way to start is to run the sin_preguntas function, which will do almost everything for you.

Example 8. The make-all-for-you function prototype
void sin_preguntas(DataFileRoot *A,
                   const char *data_file = 0,
                   const char *cal_file = 0,
                   const char *ped_file = 0,
                   int polarity = 0,
                   bool dofit = true,
                   int tdc0 = 5,
                   int tdc1 = 15);

where the arguments have the following meaning:

A

A pointer to a user supplied DataFileRoot (or descendant) object. The usual implementations are AsciiRoot, which interprets the binary data described in Section 8.1.1 ― Binary Data format, and HDFRoot, which interprets the HDF5 data described in Section 8.1.2 ― The HDF5 data format. One can also inherit from either of these two to interpret the data produced by a user defined plugin. See Section 8.3 ― The DataFileRoot class. The easiest way to get the pointer is with the static DataFileRoot method OpenFile, which is able to determine the file type and creates an object of the proper class.

DataFileRoot *DataFileRoot::OpenFile(const char *file_path,
                                     const char *pedfile = 0,
                                     const char *gainfile = 0);

data_file

The path of the data file to be analyzed. If NULL, the current file in A will be used.

cal_file

The path of a calibration file. It can be an Alibava data file produced during a calibration run or an ASCII text file with as many lines as channels, with the gain and offset on each line. If you do not have this file, set 0 here. The only difference is that, if a calibration file is given, the histogram units will be electrons. Otherwise they will be ADC units.

ped_file

The path of a pedestal file. It can be an Alibava data file from which to compute pedestals or an ASCII text file with as many lines as channels, with the pedestal and noise of each channel. If no file is given, sin_preguntas will use the data file to compute pedestals.

polarity

The expected polarity of the signal (or the bias voltage): -1 for negative signals and +1 for positive signals.

dofit

A boolean that specifies whether the program should try to fit a Landau to the signal histogram. If true, the fit is done.

tdc0, tdc1

Define a time window around the peak of the pulse shape to produce the signal histogram

In any case, you should have a look at the source of sin_preguntas to see how the data is handled in the usual cases. Have a look also at analysis.cc to see how the DataFileRoot class is used and how data is analyzed in the examples provided.

Do not forget that sin_preguntas is just an example that assumes that you are reading silicon strip sensors. For your particular setup, you may need to handle the data differently.

8.3. The DataFileRoot class

In the root_macros folder you will find a number of example files to analyze the data. They are not intended to be a standard, just examples. At least this is how they were born, though they have been evolving and, as of today, they are too complicated to serve as an example. However, the DataFileRoot class can still serve as a good tool to read the files and to access the current data to make your own analysis.

Most of the methods in DataFileRoot are applied to all the channels in a chip of the DB alike. However, some of them can be applied to just a set of channels. These sets or regions are defined with the ChanList class. This class is described in Example 9.

Example 9. The ChanList class definition
class ChanList
{
public:
    ChanList(const char * list_def = 0);

    void Set(const char * list_def);

    // Channel getter/setter functions
    int Nch() const;
    int Chan(int i) const;
    int operator[](int i) const;

    // Hit getter/setter functions
    void add_hit(const Hit & h);
    bool empty() const;
    int nhits() const;
    void clear_hits();
    Hit & get_hit(int i) const;
    double CommonMode() const;
    double Noise() const;
};

A ChanList is defined by a string which contains channel numbers or channel ranges separated by commas. For instance, "1,2,10-20" creates a channel list containing channels 1, 2 and the channels from 10 to 20, both included. A ChanList may also contain an array of Hit objects that represent the clusters found in this region. These are filled by the clustering methods of DataFileRoot. The Hit class is defined in Example 10. A Hit represents a cluster, with the center strip (the one with the highest amplitude), the channel numbers of the left and right limits and the signal.
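To make the list definition syntax concrete, the following sketch expands a definition string into the individual channel numbers. The function expand_channel_list is a hypothetical helper written for this illustration; the parsing done inside ChanList itself may differ, for example in how it treats malformed input.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expands "1,2,10-20" into 1, 2, 10, 11, ..., 20.
std::vector<int> expand_channel_list(const std::string &list_def)
{
    std::vector<int> channels;
    std::istringstream items(list_def);
    std::string item;

    // Items are separated by commas; each one is "n" or "first-last".
    while (std::getline(items, item, ',')) {
        if (item.empty())
            continue;

        const std::string::size_type dash = item.find('-');
        const int first = std::atoi(item.substr(0, dash).c_str());
        int last = first;
        if (dash != std::string::npos)
            last = std::atoi(item.substr(dash + 1).c_str());

        // Assumption: ranges are always written low-high.
        assert(first <= last);

        // Both ends of a range are included.
        for (int ch = first; ch <= last; ++ch)
            channels.push_back(ch);
    }
    return channels;
}
```

With the definition "1,2,10-20" the result holds 13 channels: 1, 2 and 10 to 20.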

Example 10. The Hit class definition
class Hit
{
public:
    Hit(int center = 0, int left = 0, int right = 0, double signal = 0);
    ~Hit();

    int center() const;
    int left() const;
    int right() const;
    int width() const;
    double signal() const;
};

The DataFileRoot class definition is shown in Example 11. Only a few methods are shown here. For the complete definition of the class, please look in DataFileRoot.h.

Example 11. The DataFileRoot class definition
class DataFileRoot
{
public:
    DataFileRoot(const char * data_file);

    ~DataFileRoot();

    enum BlockType {NewFile=0, StartOfRun, DataBlock, CheckPoint,
            EndOfRun};

    bool valid();
    void open(const char * data_file);
    void close();
    void rewind();
    int read_event();
    void process_event(bool do_common_mode = true);
// Plugin extra data Blocks
    virtual void new_file(int size, const char * data);
    virtual void start_of_run(int size, const char * data);
    virtual void check_point(int size, const char * data);
    virtual void new_data_block(int size, const char * data);
    virtual void end_of_run(int size, const char * data);
    void set_data(int size, const unsigned short * data);
// Analysis methods
    TH2 * compute_pedestals(int mxevts = -1, bool do_cmmd = true);
    void compute_pedestals_fast(int mxevts = -1, double ped_weight = 0.01, double noise_weight = 0.001);
    void load_pedestals(const char * file_name);
    void save_pedestals(const char * file_name);
    void load_gain(const char * file_name);
    void load_masking(const char * file_name);
// Analysis in strip regions
    int n_channel_list();
    void add_channel_list(const ChanList & C);
    void clear_channel_lists();
    ChanList get_channel_list(int i);
    void find_clusters(const ChanList & C);
    void common_mode(const ChanList & C, bool correct = false);
// Debugging methods
    void spy_data(bool with_signal = false, double t0 = 0, double t1 = 0, int nevt = 1);
    TH1 * show_pedestals();
    TH1 * show_noise();
};

By default, DataFileRoot only reads the DataBlock, which is the only one that has a more or less defined format. If the user has created other data blocks with a user-defined plugin, then he/she will have to define a class which derives from AsciiRoot and implements the methods that receive the data from those extra blocks. Those methods are explained below.

public:
    DataFileRoot(const char * data_file);

The constructor. data_file is the path of the data file.

public:
    int read_event();

Call this method to read the next event in the file. It will return 0 in case of success and non zero otherwise. The usual way to use it is by looping while read_event returns 0.

public:
    void process_event(bool do_common_mode = true);

By calling this method, DataFileRoot will remove pedestals from the raw data and if specified in the input argument it will correct for common mode. The usual procedure is shown in the example below.

DataFileRoot *data;

...

while ( data->read_event() == 0 )
{
   // Remove pedestals and common mode
   data->process_event();

   // Find clusters, analyze the data, etc.

}
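As a rough picture of what removing pedestals and common mode amounts to, the sketch below works on plain vectors. The function subtract_pedestals is hypothetical, and the common mode is estimated here as the plain mean of the pedestal-subtracted signals over all channels. That estimate is an assumption made for the illustration; process_event may well compute it differently, for instance per chip or excluding channels with signal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the DataFileRoot implementation.
// Returns the ADC values with the pedestal removed and, optionally,
// with the common mode (assumed: mean over all channels) removed too.
std::vector<double> subtract_pedestals(const std::vector<unsigned short> &adc,
                                       const std::vector<double> &pedestal,
                                       bool do_common_mode = true)
{
    assert(adc.size() == pedestal.size());

    std::vector<double> signal(adc.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < adc.size(); ++i) {
        signal[i] = adc[i] - pedestal[i];
        sum += signal[i];
    }

    if (do_common_mode && !signal.empty()) {
        // Shift common to all channels in this event.
        const double common_mode = sum / signal.size();
        for (std::size_t i = 0; i < signal.size(); ++i)
            signal[i] -= common_mode;
    }
    return signal;
}
```

With ADC values 510, 512, 514, 512 and a pedestal of 500 on every channel, the pedestal-subtracted signals are 10, 12, 14, 12, the common mode is 12 and the corrected signals are -2, 0, 2, 0.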

public:
    void new_file(int size, const char * data);

This method is called whenever a NewFile block is found on the file. The arguments are the size of the block data and the data itself (see Table 3).

public:
    void start_of_run(int size, const char * data);

This method is called when a StartOfRun block is found in the data file. The arguments are the size of the block data and the data itself (see Table 3).

public:
    void check_point(int size, const char * data);

This method is called when a CheckPoint block is found in the data file. The arguments are the size of the block data and the data itself (see Table 3).

public:
    void new_data_block(int size, const char * data);

This method is called when a DataBlock is found in the data file. The main use of this method is to decode the event data when a Plugin::filter_event method (see Example 1) has modified the default data format during the acquisition. The arguments are the size of the block data and the data itself (see Table 4). This method should call set_data in order to set the active channels and their ADC values.

Note that when you change the default format in the DataBlock, the pedestal and noise values stored in the file lose their meaning and you will have to recompute them with compute_pedestals or compute_pedestals_fast.

public:
    void end_of_run(int size, const char * data);

This method is called when an EndOfRun block is found in the data file. The arguments are the size of the block data and the data itself (see Table 3).

public:
    void set_data(int size, const unsigned short * data);

This method should be used when the user has modified the DataBlock format. You should provide the number of channels (size) and an array with the ADC values (data).

public:
    void load_pedestals(const char * file_name);
    void save_pedestals(const char * file_name);

Load or save the pedestals from or to a file. The file is a simple ASCII file, each line containing the pedestal and noise values of a channel. Line i corresponds to channel i.
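A file with this layout can be read with a few lines of standard C++. The sketch below is not the load_pedestals implementation: the ChannelPedestal struct and the read_pedestals function are hypothetical names, and the two values on each line are assumed to be separated by whitespace. Reading stops at the first line that cannot be parsed so that the position in the result keeps matching the channel number.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two values stored per channel.
struct ChannelPedestal {
    double pedestal;
    double noise;
};

// Hypothetical reader: element i of the result corresponds to line i,
// that is, to channel i.
std::vector<ChannelPedestal> read_pedestals(std::istream &in)
{
    std::vector<ChannelPedestal> channels;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        ChannelPedestal p;
        // Assumption: "pedestal noise", separated by whitespace.
        if (!(fields >> p.pedestal >> p.noise))
            break;  // stop here so that indices stay aligned with channels
        assert(p.noise >= 0.0);  // the noise is a width, so never negative
        channels.push_back(p);
    }
    return channels;
}
```

The function takes any std::istream, so it can be used with a std::ifstream opened on the pedestal file or with a std::istringstream for testing.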

public:
    void load_gain(const char * file_name);

Load the gain factors (ADC counts to electrons) of the channels. The input file is an ASCII file, each line containing the channel number followed by the gain value.

public:
    TH2 * compute_pedestals(int mxevts = -1, bool do_cmmd = true);

This method computes the pedestals in the usual way. For each channel it fills a histogram with all the ADC values and fits a Gaussian to the peak with the lowest mean. The pedestal and noise of that channel are the mean and the sigma of the Gaussian fit. It returns a 2D histogram showing the distribution of all the channels. The method parameters are:

  • mxevts: number of events to use in the pedestal calculation. If negative, then all the events in the file will be used.
  • do_cmmd: if set to true, the algorithm will perform common mode subtraction on an event-by-event basis.

public:
    void compute_pedestals_fast(int mxevts = -1, double ped_weight = 0.01, double noise_weight = 0.001);

This method computes the pedestals with a somewhat different algorithm than compute_pedestals. It tries to follow any change of the pedestal and the noise of the channels and updates their values. It is the method that alibava-gui uses to monitor the data during the acquisition. For analysis one should use compute_pedestals.
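The text above does not spell out the update rule, so the following is only one plausible way of following slow changes in the pedestal and noise: an exponentially weighted running average, with the two weights playing the role of ped_weight and noise_weight. The RunningPedestal struct is hypothetical and the formulas are an assumption made for this illustration; the rule used by compute_pedestals_fast may be different.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-channel tracker. The update rule is an assumption:
// an exponentially weighted running mean and variance.
struct RunningPedestal {
    double pedestal;
    double noise;

    void update(double adc, double ped_weight = 0.01, double noise_weight = 0.001)
    {
        assert(ped_weight >= 0.0 && ped_weight <= 1.0);
        assert(noise_weight >= 0.0 && noise_weight <= 1.0);

        // Deviation of this event from the current pedestal estimate.
        const double diff = adc - pedestal;

        // Move the pedestal a small step towards the new value.
        pedestal += ped_weight * diff;

        // Blend the squared deviation into the running variance.
        const double variance = (1.0 - noise_weight) * noise * noise
                              + noise_weight * diff * diff;
        noise = std::sqrt(variance);
    }
};
```

Small weights make the estimates stable against single events while still letting them drift with the data, which is what a monitoring display needs. A tracker like this would be updated once per channel and per event.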

public:
    void find_clusters(const ChanList & C);

This method finds the clusters in a given event. The method will store the clusters as Hit objects in the ChanList given as input. Note that the hits found will be appended to the array so that you might need to clear the hit list before calling find_clusters.

The algorithm to find the clusters is very simple. It starts by searching for the channel with the highest amplitude. If the signal-to-noise ratio of that channel is higher than a given value, the seed cut, then the neighbours to the right and left whose signal-to-noise ratio is above a certain value, the neighbour cut, are appended to the cluster. The procedure is repeated until no channel is found above the seed cut.
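The seed and neighbour procedure described above can be sketched as follows. This is not the find_clusters code: the Cluster struct is a hypothetical stand-in for Hit, the function and parameter names are invented for the illustration, and signals are assumed to be positive (already multiplied by the polarity) and pedestal subtracted. Channels with non-positive noise are skipped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Hit class.
struct Cluster {
    int center;     // seed channel (highest amplitude)
    int left;       // leftmost channel in the cluster
    int right;      // rightmost channel in the cluster
    double signal;  // summed signal of all channels in the cluster
};

std::vector<Cluster> find_clusters_sketch(const std::vector<double> &signal,
                                          const std::vector<double> &noise,
                                          double seed_cut, double neigh_cut)
{
    assert(signal.size() == noise.size());
    const int n = static_cast<int>(signal.size());
    std::vector<bool> used(n, false);
    std::vector<Cluster> clusters;

    while (true) {
        // Seed: unused channel with the highest amplitude above the seed cut.
        int seed = -1;
        for (int i = 0; i < n; ++i) {
            if (used[i] || noise[i] <= 0.0 || signal[i] / noise[i] <= seed_cut)
                continue;
            if (seed < 0 || signal[i] > signal[seed])
                seed = i;
        }
        if (seed < 0)
            break;  // no channel left above the seed cut

        Cluster c = {seed, seed, seed, signal[seed]};
        used[seed] = true;

        // Append neighbours to the left while they pass the neighbour cut.
        for (int i = seed - 1; i >= 0; --i) {
            if (used[i] || noise[i] <= 0.0 || signal[i] / noise[i] <= neigh_cut)
                break;
            used[i] = true;
            c.left = i;
            c.signal += signal[i];
        }
        // Same to the right.
        for (int i = seed + 1; i < n; ++i) {
            if (used[i] || noise[i] <= 0.0 || signal[i] / noise[i] <= neigh_cut)
                break;
            used[i] = true;
            c.right = i;
            c.signal += signal[i];
        }
        clusters.push_back(c);
    }
    return clusters;
}
```

Marking channels as used guarantees that a channel belongs to one cluster at most and that the loop terminates.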

public:
    void spy_data(bool with_signal = false, double t0 = 0, double t1 = 0, int nevt = 1);

This method is very useful to debug the data. It shows a panel of histograms for a single event: the raw data, processed data, common mode noise, found clusters, etc. If the first argument is true it will only show events with signal, skipping the events where no clusters have been found. The last argument, nevt, is the number of events you want to see. The default is to show only one event at a time, but you can see as many as the number indicated.

For more information take a look at DataFileRoot.h and the source code in DataFileRoot.cc. In the test folder of the distribution bundle you will also find some examples.