Skip to main content

    Why isn’t Semgrep reporting all my tainted data flows?

    One of the reasons behind seeing fewer than expected tainted data flows could be the principle of reporting on shortest paths only.

    By default, Semgrep reports a tainted source-sink permutation only once and reports the data flow that traverses the shortest path. Any longer paths with the same source-sink combination are not shown.

    Analysis of two tainted data flows

    Take a look at these two examples:

    Call stack 1:

    File2 function2(sourceA) 
    >> File1 function1(sourceA/sinkB)

    Call stack 2:

    File 1 function4(sourceA) 
    >> function3(sourceA)
    >> function2(sourceA)
    >> function1(sourceA/sinkB)

    Interfile analysis

    If both tainted data flows are identified in the same scan, and the scan has interfile analysis enabled (--pro, or Pro Engine enabled in the Cloud Platform), only Call stack 1 is reported as a finding. It has a shorter path, and has the same sourceA -> sinkB taint.

    This speeds up triage by ensuring you are only reviewing unique findings. It's especially useful for languages with polymorphic classes that can add noise for a singleton taint.

    Intrafile analysis

    If only intrafile / interprocedural analysis is performed (--pro-intrafile), Semgrep only reports a finding for call stack 2. Call stack 1 would not be identified, because it crosses file boundaries.

    Best practices for testing tainted data flows

    To understand in greater detail how Semgrep detects tainted data flows, you can use your own test cases to review different paths.

    Dry runs

    To avoid sending test data to Semgrep AppSec Platform and potentially confounding existing findings, use semgrep scan or semgrep ci --dry-run.

    When testing locally, adding --dataflow-traces allows you to see the taint traces as you would in the Semgrep AppSec Platform UI.

    Sample taint data-flow reporting

    The following is an example that shows dataflow traces traversing multiple files, demonstrating interfile taint tracking:
         Passing user input to context.buildConstraintViolationWithTemplate() function may lead to
         execution of arbitary commands.

          8┆ context.buildConstraintViolationWithTemplate(template).addConstraintViolation();

          Taint comes from:
          128┆  @Override
          130┆   public boolean isValid1234(MessageParticipantsDto messageParticipantsDto,         
    ConstraintValidatorContext context) {

          Taint flows through these intermediate variables:
          130┆  public boolean isValid1234(MessageParticipantsDto messageParticipantsDto,
     ConstraintValidatorContext context) {

          This is how taint reaches the sink:
          156┆      return ValidationUtil.buildTemplate(context, templateList);
          Taint flows through these intermediate variables:
           5┆ public boolean buildTemplate(ConstraintValidatorContext context, List<String> templateList) {
          then reaches:
           8┆ context.buildConstraintViolationWithTemplate(template).addConstraintViolation();

    Changing Pro analysis options

    You can change whether you are using the --pro or --pro-intrafile option depending on the exact flow you're testing, as described in the preceding section, Analysis of two tainted data flows.

    Altering the paths

    To ensure you see all the flow options, start by testing the shortest path and then change the code so that the shortest path is no longer tainted. Repeat this until you have achieved the longest taint flow path you want to test.

    Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.