Perform cross-file analysis
Use Semgrep Code's cross-file (interfile) analysis to detect vulnerabilities across files and folders.
By design, Semgrep's open source software, Semgrep Community Edition (CE), can only analyze interactions within a single function, which is known as intraprocedural analysis. This limited scope makes Semgrep CE fast and easy to integrate into developer workflows.
Semgrep Code runs cross-function (interprocedural) analysis by default, and gives security teams the option to trade some speed for deeper analysis and better results by enabling cross-file analysis. By analyzing interactions across files and functions, Semgrep Code can reduce noise, uncover new vulnerabilities, and make results easier to understand.
Refer to Supported languages to see languages supported by Semgrep Code.
Run cross-file analysis
This section guides you through installing the proprietary cross-file (interfile) analysis binary and scanning your projects, both in the CLI and with Semgrep AppSec Platform.
Run cross-file analysis with Semgrep AppSec Platform
Prerequisite: you have completed a Semgrep core deployment.
This is the preferred method to run cross-file analysis. It enables you to view and triage your findings from a centralized location. Your source code is not uploaded.
- Sign in to Semgrep AppSec Platform.
- Click Settings.
- In the Deployment tab, click the Cross-file analysis toggle.
- Ensure that you have the default ruleset added in your Policies page. If this ruleset is not added, go to Semgrep Registry - Default ruleset page, then click Add to Policy. For best results, set this ruleset to the Monitor rule mode.
Full scans now include cross-file analysis. You can trigger a full scan through your CI provider. Note that cross-file analysis does not currently run on diff-aware (pull or merge request) scans.
Run cross-file analysis in the CLI
Prerequisite: a local installation of Semgrep CLI. See Getting started with Semgrep to install Semgrep CLI.
- Sign up or sign in to Semgrep AppSec Platform.
- For first-time users, click Create an organization. Note that you can further integrate organizations (orgs) with GitLab accounts and GitHub accounts, including personal and org accounts, after you complete this procedure.
- Click Settings.
- In the Deployment tab, click the Cross-file analysis toggle.
- Ensure that you are in the root directory of the repository you want to scan.
- In your CLI, log in to your Semgrep AppSec Platform account and run a scan:
semgrep login && semgrep ci
Update cross-file analysis in the CLI
Cross-file analysis uses a separate semgrep binary. To update to the latest version, follow these steps:
- Update your Semgrep CLI tool with the command for your platform:
  - macOS:
    brew upgrade semgrep
    Alternatively:
    python3 -m pip install --upgrade semgrep
  - Linux:
    python3 -m pip install --upgrade semgrep
  - Windows Subsystem for Linux (WSL):
    # ensure that you have Python 3.9 or later installed
    # on WSL before proceeding
    python3 -m pip install --upgrade semgrep
  - Docker:
    docker pull semgrep/semgrep:latest
- Log in to Semgrep AppSec Platform:
  semgrep login
- Update the Semgrep cross-file binary:
  semgrep install-semgrep-pro
Write rules that analyze across files and functions
To create rules that analyze across files and functions, add interfile: true under the options key when defining a rule. This key tells Semgrep to use the rule for both cross-function and cross-file analysis.
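As a minimal sketch of where the key sits, the rule below is a hypothetical taint rule, not the rule shown in the widget on this page. The rule id, message, target language, and the source and sink patterns are all assumptions chosen for illustration; only the options and interfile keys come from the text above.

```yaml
rules:
  - id: hypothetical-user-input-to-exec   # illustrative id, not a Registry rule
    mode: taint
    options:
      interfile: true                     # enables cross-function and cross-file analysis
    languages:
      - java                              # assumption: adjust to the language you scan
    severity: ERROR
    message: User-controlled input reaches exec()
    pattern-sources:
      - pattern: userInput()              # assumed source function
    pattern-sinks:
      - pattern: exec(...)                # assumed sink function
```

Without the options block, the same rule would only report a finding when the source and the sink appear within a single function.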
Cross-function example
The following example shows how to define the interfile key (see the Rule pane) and the resulting cross-function analysis in the Test code pane.
Click Run to see the true positive in lines 27-30.
Semgrep Code performed cross-function analysis as the userInput() source was called in main() while the exec() sink was called in the DockerCompose class.
Interact with the rule widget to compare Semgrep Community Edition (CE) and Semgrep Code. In the Rule pane, you can remove the lines:
options:
  interfile: true
This results in a failure to detect the true positive, because Semgrep did not perform cross-function analysis.
Known limitations of cross-file analysis
CommonJS
Currently, Semgrep's cross-file analysis does not handle specific cases of CommonJS where you define a function and assign it to an export later. Cross-file analysis does not track the code below:
function get_user() {
  return get_user_input("example")
}
module.exports = get_user
Regressions in cross-file analysis
Cross-file analysis resolves names differently than Semgrep CE's analysis. Consequently, rules with interfile: true may produce different results than Semgrep CE. Some of these differences could be regarded as regressions. If you encounter one, report it as a bug through Semgrep Support. You can also contact us through the Semgrep Community Slack group.
Appendix
Types of Semgrep Code analysis
- Cross-file (interfile) analysis
- Cross-file analysis finds patterns spanning multiple files to help security engineers deeply understand their organization's security issues. This analysis reduces noise and detects issues that Semgrep CE can't find.
- Cross-file analysis runs on full scans. These scans may take longer to complete and can use more memory than Semgrep CE scans. See the available languages for cross-file analysis in Supported languages.
- In Semgrep Code, cross-file analysis includes cross-function analysis as well.
- Cross-function (interprocedural) analysis
- Cross-function analysis finds patterns within a single file spanning code blocks and functions.
- Semgrep Code scans run cross-function analysis by default.
- See an example of cross-function analysis in Semgrep Code cross-function example.
- See the available languages for cross-function analysis in Supported languages.
Semgrep Code cross-file CI scan issues
To provide reliably completed scans, Semgrep Code can fall back to the use of Semgrep CE. This ensures that in the vast majority of cases, scans run successfully.
By default, if a scan uses more than 5 GB of memory during cross-file pre-processing, the scan uses single-function analysis to ensure lower memory consumption. Similarly, if a cross-file scan doesn't complete after 3 hours, the analysis times out and Semgrep re-scans the repository using single-function analysis. Typically, this happens because the repository is very large.
If only one or two repositories cause CI scan issues and scanning these repositories with interfile analysis is not critical, modify your configuration file to use semgrep ci --oss-only. This overrides the Semgrep AppSec Platform setting for these repositories, and always runs these scans with single-function analysis.
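As an illustration of such a configuration change, the sketch below assumes the repository is scanned with a GitHub Actions workflow that runs in the semgrep/semgrep Docker image. The workflow name, trigger branch, job name, and secret name are assumptions; adapt the same command to whichever CI provider you use.

```yaml
# Hypothetical workflow file, for example .github/workflows/semgrep.yml
name: Semgrep
on:
  push:
    branches:
      - main                      # assumed default branch
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      # --oss-only forces single-function analysis for this repository only
      - run: semgrep ci --oss-only
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}   # assumed secret name
```

Other repositories in the organization keep using the Cross-file analysis toggle as set in Semgrep AppSec Platform.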
If many repositories cause scan issues, or you have critical repositories you are unable to scan with Semgrep's interfile analysis:
- Disable the Cross-file analysis toggle in the Settings page of your organization.
- Review scan troubleshooting guides such as A Semgrep scan is having a problem - what next? or Troubleshooting "You are seeing this because the engine was killed."
- If you need additional guidance, contact Semgrep Support, or reach out to the Semgrep team in the Semgrep Community Slack so we can help you resolve the issue and create a plan for your organization.
Difference between cross-file analysis and join mode
Cross-file analysis is different from join mode, which also allows you to perform cross-file analyses by letting you join on the metavariable matches in separate rules. Join mode is an experimental feature which is not actively developed or maintained. You may encounter many issues while using join mode.
Feedback for Semgrep Code's advanced analyses
The team at Semgrep is excited to hear what’s on your mind. As you explore these features, we want to know what you'd like to be able to capture with them. We believe that this deeper analysis helps users find more vulnerabilities, build trust with developers, and enforce code standards quickly. Let us know what you think about the results in the Semgrep Community Slack.