Why isn’t Semgrep reporting all my tainted data flows?

One of the reasons behind seeing fewer than expected tainted data flows could be the principle of reporting on shortest paths only.

By default, Semgrep reports a tainted source-sink permutation only once and reports the data flow that traverses the shortest path. Any longer paths with the same source-sink combination are not shown.

Analysis of two tainted data flows

Take a look at these two examples:

Call stack 1:

File2 function2(sourceA) 
      >> File1 function1(sourceA/sinkB)

Call stack 2:

File 1 function4(sourceA) 
       >> function3(sourceA) 
          >> function2(sourceA) 
             >> function1(sourceA/sinkB)

Interfile analysis

If both tainted data flows are identified in the same scan, and the scan has interfile analysis enabled (--pro, or Pro Engine enabled in the Cloud Platform), only Call stack 1 is reported as a finding. It has a shorter path, and has the same sourceA -> sinkB taint.

This speeds up triage by ensuring you are only reviewing unique findings. It's especially useful for languages with polymorphic classes that can add noise for a singleton taint.

Intrafile analysis

If only intrafile / interprocedural analysis is performed (--pro-intrafile), Semgrep only reports a finding for call stack 2. Call stack 1 would not be identified, because it crosses file boundaries.

Best practices for testing tainted data flows

To understand in greater detail how Semgrep detects tainted data flows, you can use your own test cases to review different paths.

Dry runs

To avoid sending test data to Semgrep AppSec Platform and potentially confounding existing findings, use semgrep scan or semgrep ci --dry-run.

When testing locally, adding --dataflow-traces allows you to see the taint traces as you would in the Semgrep AppSec Platform UI.

Sample taint data-flow reporting

The following is an example that shows dataflow traces traversing multiple files, demonstrating interfile taint tracking:

test2.java
    test-spring-insecure-bean-validation
     Passing user input to context.buildConstraintViolationWithTemplate() function may lead to
     execution of arbitary commands.

      8┆ context.buildConstraintViolationWithTemplate(template).addConstraintViolation();

      Taint comes from:
       test1.java
      128┆  @Override
      129┆
      130┆   public boolean isValid1234(MessageParticipantsDto messageParticipantsDto,         
ConstraintValidatorContext context) {

      Taint flows through these intermediate variables:
       test1.java
      130┆  public boolean isValid1234(MessageParticipantsDto messageParticipantsDto,
 ConstraintValidatorContext context) {

      This is how taint reaches the sink:
       test1.java
      156┆      return ValidationUtil.buildTemplate(context, templateList);
      Taint flows through these intermediate variables:
       5┆ public boolean buildTemplate(ConstraintValidatorContext context, List<String> templateList) {
      then reaches:
       8┆ context.buildConstraintViolationWithTemplate(template).addConstraintViolation();

Changing Pro analysis options

You can change whether you are using the --pro or --pro-intrafile option depending on the exact flow you're testing, as described in the preceding section, Analysis of two tainted data flows.

Altering the paths

To ensure you see all the flow options, start by testing the shortest path and then change the code so that the shortest path is no longer tainted. Repeat this until you have achieved the longest taint flow path you want to test.

Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.

Analysis of two tainted data flows​

Interfile analysis​

Intrafile analysis​

Best practices for testing tainted data flows​

Dry runs​

Sample taint data-flow reporting​

Changing Pro analysis options​

Altering the paths​