
Perform cross-file analysis

Use Semgrep Code's cross-file (interfile) analysis to detect vulnerabilities across files and folders.

By design, Semgrep open-source software (OSS) can only analyze interactions within a single function, also known as intraprocedural analysis. This limited scope makes Semgrep OSS fast and easy to integrate into developer workflows.

Semgrep Code runs cross-function (interprocedural) analysis by default, and gives security teams the option to trade speed for deeper analysis and better results with cross-file analysis. By analyzing interactions across files and functions, Semgrep Code can reduce noise, uncover new vulnerabilities, and make results easier to understand.

Language support

Refer to Supported languages to see languages supported by Semgrep Code.

Run cross-file analysis

This section guides you through installing the proprietary cross-file (interfile) analysis binary and scanning your projects, both in the CLI and with Semgrep Cloud Platform (SCP).

Run cross-file analysis with Semgrep Cloud Platform

Prerequisite

You have completed a Semgrep core deployment.

This is the preferred method to run cross-file analysis. It enables you to view and triage your findings from a centralized location. Your source code is not uploaded.

  1. Sign in to Semgrep Cloud Platform.
  2. Click Settings.
  3. In the Deployment tab, click the Cross-file analysis toggle.
  4. Ensure that the default ruleset is added in your Policies page. If it is not, go to the Semgrep Registry - Default ruleset page, then click Add to Policy. For best results, set this ruleset to the Monitor rule mode.

Full scans now include cross-file analysis. You can trigger a full scan through your CI provider. Note that cross-file analysis does not currently run on diff-aware (pull or merge request) scans.

Run cross-file analysis in the CLI

Prerequisite
  1. Sign up or sign in to Semgrep Cloud Platform.
  2. For first-time users, click Create an organization. Note that you can further integrate organizations (orgs) with GitLab accounts and GitHub accounts, including personal and org accounts, after you complete this procedure.
  3. Click Settings.
  4. In the Deployment tab, click the Cross-file analysis toggle.
  5. Ensure that you are in the root directory of the repository you want to scan.
  6. In your CLI, log in to your Semgrep Cloud Platform account and run a scan:

    semgrep login && semgrep ci

Update cross-file analysis in the CLI

Cross-file analysis uses a separate semgrep binary. To update to the latest version, follow these steps:

  1. Update the Semgrep OSS engine with the following command:

    brew upgrade semgrep

    Alternatively:

    python3 -m pip install --upgrade semgrep
  2. Log in to Semgrep Cloud Platform:

    semgrep login
  3. Update the Semgrep cross-file binary:

    semgrep install-semgrep-pro

Write rules that analyze across files and functions

To create rules that analyze across files and functions, add interfile: true under the options key when defining a rule. This key tells Semgrep to use the rule for both cross-function and cross-file analysis.

Cross-function example

The following example shows how to define the interfile key (see the Rule pane) and the resulting cross-function analysis in the Test code pane.


Click Run to see the true positive in lines 27-30.

Semgrep Code performed cross-function analysis here because the userInput() source is called in main(), while the exec() sink is called in the DockerCompose class.
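To make the source-to-sink flow described above concrete, here is a minimal JavaScript sketch of the same shape. It is not the test code from the rule widget. The names userInput, DockerCompose, and exec mirror the ones mentioned above, while run, executed, and the string contents are assumptions made for illustration. The exec stand-in only records its argument, so the snippet is safe to run.

```javascript
// Hypothetical stand-ins: none of these names come from a real library.
const executed = [];

// Source: pretend this returns attacker-controlled data.
function userInput() {
  return "up; echo injected";
}

class DockerCompose {
  // Sink stand-in: records the command instead of running it.
  exec(command) {
    executed.push(command);
  }

  run(args) {
    // The tainted value reaches the sink here, in a different
    // function from the one where the source was called.
    this.exec("docker-compose " + args);
  }
}

function main() {
  const args = userInput();      // source is called in main()
  new DockerCompose().run(args); // taint crosses a function boundary
}

main();
```

An analysis that looks at each function in isolation sees a source in main() and a sink in DockerCompose, but no path linking the two. Connecting them requires following the value through the call to run().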

Interact with the rule widget to compare Semgrep OSS and Semgrep Code. In the Rule pane, you can remove the lines:

options:
  interfile: true

Without these lines, Semgrep does not perform cross-function analysis and fails to detect the true positive.

Known limitations of cross-file analysis

CommonJS

Currently, Semgrep's cross-file analysis does not handle specific cases of CommonJS where you define a function and assign it to an export later. Cross-file analysis does not track the code below:

function get_user() {
  return get_user_input("example")
}

module.exports = get_user

Regressions in cross-file analysis

Cross-file analysis resolves names differently than Semgrep OSS does. Consequently, rules with interfile: true may produce different results than the same rules run with Semgrep OSS. Some of these differences could be regarded as regressions. If you encounter one, report the bug through Semgrep Support. You can also contact us through the Semgrep Community Slack group.

Appendix

Types of Semgrep Code analysis

Cross-file (interfile) analysis
  • Cross-file analysis finds patterns spanning multiple files to help security engineers deeply understand their organization's security issues. This analysis reduces noise and detects issues that Semgrep OSS can't find.
  • Cross-file analysis runs on full scans. These scans may take longer to complete and can use more memory than Semgrep OSS scans. See the available languages for cross-file analysis in Supported languages.
  • In Semgrep Code, cross-file analysis includes cross-function analysis as well.
Cross-function (interprocedural) analysis
  • Cross-function analysis finds patterns within a single file spanning code blocks and functions.
  • Semgrep Code scans run cross-function analysis by default.
  • See an example of cross-function analysis in Semgrep Code cross-function example.
  • See the available languages for cross-function analysis in Supported languages.

Semgrep Code cross-file CI scan issues

To make sure scans complete reliably, Semgrep Code can fall back to the Semgrep OSS Engine. This ensures that in the vast majority of cases, scans run successfully.

By default, if a scan uses more than 5 GB of memory during cross-file pre-processing, the scan uses single-function analysis to ensure lower memory consumption. Similarly, if a cross-file scan doesn't complete after 3 hours, the analysis times out and Semgrep re-scans the repository using single-function analysis. Typically, this happens because the repository is very large.
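The default fallback behavior described above can be modeled as a small decision function. This is only a sketch of the documented thresholds, not Semgrep's implementation. The function name chooseAnalysis and its parameters preprocessingMemoryGb and elapsedHours are assumptions made for illustration.

```javascript
// Thresholds taken from the defaults described above.
const MEMORY_LIMIT_GB = 5;
const TIMEOUT_HOURS = 3;

// Hypothetical helper: returns which analysis a scan ends up using.
// elapsedHours is the time a cross-file scan has run without completing.
function chooseAnalysis({ preprocessingMemoryGb, elapsedHours }) {
  if (preprocessingMemoryGb > MEMORY_LIMIT_GB) return "single-function";
  if (elapsedHours >= TIMEOUT_HOURS) return "single-function";
  return "cross-file";
}

console.log(chooseAnalysis({ preprocessingMemoryGb: 2, elapsedHours: 0.5 })); // cross-file
console.log(chooseAnalysis({ preprocessingMemoryGb: 6, elapsedHours: 0.5 })); // single-function
console.log(chooseAnalysis({ preprocessingMemoryGb: 2, elapsedHours: 3 }));   // single-function
```

Either condition on its own is enough to trigger the fallback, which is why a very large repository can end up scanned with single-function analysis even when cross-file analysis is enabled.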

If only one or two repositories cause CI scan issues and scanning them with interfile analysis is not critical, modify your configuration file to use semgrep ci --oss-only. This overrides the Semgrep Cloud Platform setting for these repositories, and always runs these scans with single-function analysis.

If many repositories cause scan issues, or you have critical repositories you are unable to scan with Semgrep's interfile analysis:

  1. Disable the Cross-file analysis toggle in the Settings page of your organization.
  2. Review scan troubleshooting guides such as A Semgrep scan is having a problem - what next? or Troubleshooting "You are seeing this because the engine was killed."
  3. If you need additional guidance, contact Semgrep Support, or reach out to the Semgrep team in the Semgrep Community Slack so we can help you resolve the issue and create a plan for your organization.

Difference between cross-file analysis and join mode

Cross-file analysis is different from join mode, which also lets you perform cross-file analyses by joining on the metavariable matches in separate rules. Join mode is an experimental feature that is not actively developed or maintained. You may encounter many issues while using join mode.

Feedback for Semgrep Code's advanced analyses

The team at Semgrep is excited to hear what's on your mind. As you explore these features, we want to know what you'd like to be able to capture with them. We believe that this deeper analysis helps users find more vulnerabilities, build trust with developers, and enforce code standards quickly. Let us know what you think about the results in the Semgrep Community Slack.

