
Extending DAPHNE with More Scheduling Knobs

This document describes how a DAPHNE developer can extend DAPHNE by adding new scheduling techniques.

Guidelines

To add a new scheduling technique, the DAPHNE developer has to modify the following two files:

  1. src/runtime/local/vectorized/LoadPartitioning.h
  2. src/api/cli/daphne.cpp

Adding the actual code of the technique:

The first file, LoadPartitioning.h, contains the implementations of the currently supported scheduling techniques: the current version of DAPHNE uses self-scheduling techniques to partition tasks, and it applies the self-scheduling principle when executing them. For more details, please visit Scheduler design for tasks and pipelines.

In this file, the developer should change two things:

  1. The enumeration SelfSchedulingScheme. The developer has to add a name for the new technique, e.g., MYTECH.

    enum SelfSchedulingScheme { STATIC=0, SS, GSS, TSS, FAC2, TFSS, FISS, VISS, PLS, MSTATIC, MFSC, PSS, MYTECH };
    
  2. The function getNextChunk(). This function contains a switch-case statement that selects the mathematical formula corresponding to the chosen scheduling method. The developer has to add a new case that handles the new technique.

    uint64_t getNextChunk(){
       // ...
       switch (schedulingMethod){
           // ...
           // Only the following case is what the developer has to add; the rest remains the same.
           case MYTECH: { // the new technique
               chunkSize = FORMULA; // some formula that calculates the chunk size (partition size)
               break;
           }
           // ...
       }
       // ...
       return chunkSize;
    }
    

Enabling the selection of the newly added technique:

The second file, daphne.cpp, contains the code that parses the command-line arguments and passes them to the DAPHNE compiler and runtime. The developer has to register the new technique as a valid option; otherwise, it cannot be selected on the command line. The variable taskPartitioningScheme, of type opt<SelfSchedulingScheme>, holds the chosen scheme. The developer should extend its declaration as follows:

opt<SelfSchedulingScheme> taskPartitioningScheme(
        cat(daphneOptions), desc("Choose task partitioning scheme:"),
        values(
            clEnumVal(STATIC, "Static (default)"),
            clEnumVal(SS, "Self-scheduling"),
            clEnumVal(GSS, "Guided self-scheduling"),
            clEnumVal(TSS, "Trapezoid self-scheduling"),
            clEnumVal(FAC2, "Factoring self-scheduling"),
            clEnumVal(TFSS, "Trapezoid Factoring self-scheduling"),
            clEnumVal(FISS, "Fixed-increase self-scheduling"),
            clEnumVal(VISS, "Variable-increase self-scheduling"),
            clEnumVal(PLS, "Performance loop-based self-scheduling"),
            clEnumVal(MSTATIC, "Modified version of Static, i.e., instead of n/p, it uses n/(4*p) where n is number of tasks and p is number of threads"),
            clEnumVal(MFSC, "Modified version of fixed-size chunk self-scheduling, i.e., MFSC does not require profiling information as FSC does"),
            clEnumVal(PSS, "Probabilistic self-scheduling"),
            clEnumVal(MYTECH, "some meaningful description of the abbreviation of the new technique")
        )
); 

Usage of the new technique:

DAPHNE developers may now pass the new technique as an option when executing a DaphneDSL script.

daphne --vec --MYTECH --grain-size 10 --num-threads 4 --PERCPU --SEQPRI --pre-partition --pin-workers --hyperthreading --debug-mt my_script.daphne

In this example, DAPHNE will execute my_script.daphne with the following configuration:

  1. the vectorized engine is enabled due to --vec
  2. the DAPHNE runtime will use MYTECH for task partitioning due to --MYTECH
  3. the minimum partition size will be 10 due to --grain-size 10
  4. the vectorized engine will use 4 threads due to --num-threads 4
  5. work stealing will be used with a separate queue for each CPU due to --PERCPU
  6. the work stealing victim selection will be sequential prioritized due to --SEQPRI
  7. the rows will be evenly distributed before the scheduling technique is applied due to --pre-partition
  8. the CPU workers will be pinned to CPU cores due to --pin-workers
  9. if the number of threads had not been specified, the number of logical CPU cores would have been used (instead of the number of physical CPU cores) due to --hyperthreading
  10. debugging information related to the multithreading of vectorizable operations will be printed due to --debug-mt