Semgrep: Find Real Bugs with Pattern-Based Static Analysis
Most linters catch style problems. Semgrep catches real bugs: SQL injection, hardcoded secrets, insecure deserialization, SSRF vulnerabilities, and misuse of cryptographic APIs. It works on source code using pattern matching that understands syntax — not just text — and supports 30+ languages out of the box.
The core idea: write patterns in YAML that look like the code you want to find. Semgrep handles the AST parsing and matching. You describe what you're looking for; Semgrep finds it across your entire codebase.
Why Semgrep vs Other Tools
vs grep/regex: Semgrep understands code structure. The pattern re.compile($VAR) matches any call to re.compile with a single argument, whatever that argument is and however the call is laid out across whitespace and line breaks. An equivalent regex would miss many of those cases and would also match text inside comments and string literals.
vs ESLint/Pylint: Language-specific linters are great for style. Semgrep uses one rule syntax across all the languages it supports and focuses on security-relevant patterns that lint rules rarely address.
vs SonarQube: SonarQube is heavyweight and license-constrained. Semgrep is lightweight, open source, and runs in CI without a server.
vs Snyk/Dependabot: Those tools scan dependencies. Semgrep analyzes your actual code.
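To make the grep comparison concrete, here is a small sketch that uses Python's built-in ast module as a stand-in for syntax-aware matching. It is not how Semgrep is implemented; the sample source and the helper name find_re_compile_calls are made up for illustration.

```python
import ast
import re

# Sample source to search; it is only parsed, never executed.
SOURCE = '''
import re

a = re.compile(pattern)
b = re.compile(
    build_pattern(user_input)
)
c = "re.compile(x) inside a string literal"
'''

# Text search: a single-line regex for re.compile(<identifier>).
TEXT_HITS = re.findall(r"re\.compile\(\w+\)", SOURCE)


def find_re_compile_calls(source: str) -> list[int]:
    """Return line numbers of calls to re.compile with exactly one argument."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "compile"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and len(node.args) == 1
            and not node.keywords
        ):
            lines.append(node.lineno)
    return sorted(lines)


print(TEXT_HITS)                      # ['re.compile(pattern)', 're.compile(x)']
print(find_re_compile_calls(SOURCE))  # [4, 5]
```

The regex misses the multi-line call and reports the one inside the string literal; the syntax-tree search finds both real calls and nothing else.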
Installation
# pip
pip install semgrep
# Homebrew (macOS)
brew install semgrep
# Docker
docker pull semgrep/semgrep
First Run: Using Community Rules
Semgrep Registry has thousands of community-contributed rules. Start with the security audit rules for your language:
# Python security audit
semgrep --config p/python
# JavaScript/TypeScript
semgrep --config p/javascript
semgrep --config p/typescript
# Go
semgrep --config p/go
# Java
semgrep --config p/java
# Run on current directory
semgrep --config p/security-audit .
The p/ prefix fetches from Semgrep Registry. The output shows matching files, line numbers, rule IDs, and explanations.
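For scripting, the same findings can be emitted as JSON with the --json flag. The sketch below summarizes such a report, assuming a top-level results list whose entries carry check_id, path, and start.line. The sample report is hand-written for illustration and is not real scan output; the helper names are hypothetical.

```python
import json
from collections import Counter

# Hand-written sample shaped like `semgrep --json` output; not real scan results.
SAMPLE_REPORT = json.dumps({
    "results": [
        {"check_id": "unsafe-pickle-loads", "path": "app/cache.py",
         "start": {"line": 12}, "extra": {"severity": "ERROR"}},
        {"check_id": "unsafe-pickle-loads", "path": "app/jobs.py",
         "start": {"line": 40}, "extra": {"severity": "ERROR"}},
        {"check_id": "ssrf-user-controlled-url", "path": "app/views.py",
         "start": {"line": 7}, "extra": {"severity": "WARNING"}},
    ]
})


def findings_per_rule(report_text: str) -> Counter:
    """Count findings by rule ID."""
    results = json.loads(report_text).get("results", [])
    return Counter(r["check_id"] for r in results)


def finding_locations(report_text: str) -> list[str]:
    """Return 'path:line' for each finding, in report order."""
    results = json.loads(report_text).get("results", [])
    return [f'{r["path"]}:{r["start"]["line"]}' for r in results]


print(findings_per_rule(SAMPLE_REPORT))
print(finding_locations(SAMPLE_REPORT))
```

In practice the report text would come from a run such as semgrep --config p/python --json > report.json.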
Writing Custom Rules
Custom rules are YAML files that describe patterns. Here's a rule that detects hardcoded AWS credentials:
# rules/no-hardcoded-aws-keys.yaml
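rules:
  - id: hardcoded-aws-access-key
    patterns:
      # Match any string assignment, then keep only values shaped like an access key ID
      - pattern: $VAR = "$KEY"
      - metavariable-regex:
          metavariable: $KEY
          regex: '^AKIA[0-9A-Z]{16}$'
    message: "Potential hardcoded AWS access key: $VAR"
    languages: [python, javascript, typescript, java, go]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798: Use of Hard-coded Credentials"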
Run it:
semgrep --config rules/no-hardcoded-aws-keys.yaml ./src
Pattern Syntax
Semgrep's pattern language is powerful but readable:
Metavariables ($VAR): Match any expression and capture it:
pattern: requests.get($URL, verify=False)
# Matches: requests.get(url, verify=False)
# Also: requests.get(user_input, verify=False)
Ellipsis (...): Match any sequence of statements or arguments:
pattern: |
  cursor.execute($QUERY, ...)
# Matches: cursor.execute(query)
# Also: cursor.execute(query, params)
Pattern-not: Exclude patterns that are false positives:
patterns:
  - pattern: cursor.execute($QUERY)
  - pattern-not: cursor.execute("...")  # literal strings are fine
Pattern-either: Match any of several patterns:
pattern-either:
  - pattern: eval($X)
  - pattern: exec($X)
  - pattern: __import__($X)
Real-World Rule Examples
SQL Injection Detection
rules:
  - id: sql-injection-string-format
    languages: [python]
    # Flags % formatting inside execute(), whether the template is a literal or a variable
    pattern: $DB.execute($QUERY % $ARGS, ...)
    message: "SQL query built with string formatting: use parameterized queries"
    severity: ERROR
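The call shape this kind of rule is aimed at, and the parameterized form its message recommends, can be compared with a runnable sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table, the function names, and the payload are invented for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # String formatting inside execute(): the input becomes part of the SQL text.
    return conn.execute(
        "SELECT name FROM users WHERE name = '%s'" % name
    ).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the input is bound as a value, never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the injected OR clause matches every row
print(find_user_safe(conn, payload))         # []: no user has that literal name
```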
SSRF via User-Controlled URL
rules:
  - id: ssrf-user-controlled-url
    languages: [python]
    patterns:
      - pattern-either:
          - pattern: requests.get($URL, ...)
          - pattern: requests.post($URL, ...)
          - pattern: urllib.request.urlopen($URL, ...)
      - pattern-inside: |
          @app.route(...)
          def $FUNC(...):
            ...
      # Only match requests made after $URL was read from the incoming request
      - pattern-inside: |
          $URL = request.$ATTR
          ...
    message: "Potential SSRF: HTTP request to user-controlled URL"
    severity: WARNING
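One common way to resolve a finding from a rule like this is to check the URL against an allowlist before making the request. The following is a minimal sketch of that idea, not something the rule itself enforces; ALLOWED_HOSTS and is_allowed_url are hypothetical names and the hosts are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your service actually needs.
ALLOWED_HOSTS = {"api.example.com", "images.example.com"}


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    # hostname is lowercased and excludes any userinfo or port component.
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_allowed_url("https://api.example.com/v1/items"))          # True
print(is_allowed_url("http://169.254.169.254/latest/meta-data/"))  # False
print(is_allowed_url("https://api.example.com@evil.test/"))        # False
```

Comparing the parsed hostname, rather than searching the raw string, is what rejects the third URL, where the allowed name appears only in the userinfo part.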
Insecure Deserialization
rules:
  - id: unsafe-pickle-loads
    languages: [python]
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
    message: "Unsafe deserialization with pickle: never deserialize untrusted data"
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
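Why the rule treats every pickle.loads call as suspect can be shown in a few lines: a pickle can name a callable to run during loading. In this sketch the Payload class is a made-up stand-in for attacker-controlled bytes, and the evaluated expression is harmless arithmetic.

```python
import json
import pickle


class Payload:
    """Stand-in for attacker-controlled data."""

    def __reduce__(self):
        # Tells pickle to call eval("6 * 7") when the bytes are loaded.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload object; it runs the callable instead.
result = pickle.loads(malicious_bytes)
print(result)  # 42

# For plain data, a format such as JSON parses values without running code.
safe = json.loads('{"user": "alice", "admin": false}')
print(safe["user"])  # alice
```

A real attack would substitute something like a shell command for the arithmetic, which is why the rule's message says never to unpickle untrusted data.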
Taint Mode: Data Flow Analysis
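Semgrep's taint mode tracks data from sources (user input) to sinks (dangerous operations) through assignments and intermediate variables. Semgrep OSS follows these flows mainly within a single function; Semgrep Pro extends the analysis across functions and files: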
rules:
  - id: taint-sql-injection
    mode: taint
    languages: [python]
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
    pattern-sinks:
      # Only the query string is a sink, so bound parameters are not flagged
      - patterns:
          - pattern: $DB.execute($QUERY, ...)
          - focus-metavariable: $QUERY
    message: "User input flows to SQL execution: use parameterized queries"
    severity: ERROR
Taint mode is more powerful but slower. Use it for high-value security rules where data flow matters.
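The kind of flow taint mode is meant for looks like the sketch below, where the source and the sink sit on different lines and are linked only through intermediate variables. To keep it runnable without Flask, request is a SimpleNamespace stand-in; the table and function names are invented for illustration.

```python
import sqlite3
from types import SimpleNamespace

# Stand-in for Flask's `request` so the sketch runs without Flask installed.
request = SimpleNamespace(args={"name": "alice"})


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two users."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, role TEXT)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return db


def role_tainted(db: sqlite3.Connection) -> list:
    name = request.args.get("name")                    # source: user input
    clause = "name = '" + name + "'"                   # taint carries into clause
    query = "SELECT role FROM users WHERE " + clause   # and into query
    return db.execute(query).fetchall()                # sink: tainted SQL text


def role_parameterized(db: sqlite3.Connection) -> list:
    name = request.args.get("name")
    # The SQL text is constant; user input travels only as a bound parameter.
    return db.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()


db = make_db()
print(role_tainted(db))        # [('admin',)]
print(role_parameterized(db))  # [('admin',)]
```

No single-line pattern sees request.args.get and execute together in role_tainted, which is the gap that tracking data flow closes.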
CI/CD Integration
GitHub Actions
# .github/workflows/semgrep.yml
name: Semgrep
on:
  push:
    branches: [main]
  pull_request: {}
jobs:
  semgrep:
    name: semgrep/ci
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      # Run the Semgrep CLI from the official image; --error fails the job on findings
      - run: >-
          semgrep scan --error
          --config p/security-audit
          --config p/secrets
          --config rules/
This runs Semgrep on every pull request and on pushes to main, and fails the check when Semgrep reports findings.
Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ["--config", "p/security-audit", "--error"]
Ignoring False Positives
Add inline comments to suppress specific findings:
# nosemgrep: hardcoded-aws-access-key
TEST_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE" # This is the example from AWS docs
Or add file-level ignores in .semgrepignore:
tests/
fixtures/
*.test.ts
vendor/
Autofix
Some rules can automatically fix what they find:
rules:
  - id: use-secrets-manager
    pattern: os.environ["$SECRET_NAME"]
    fix: get_secret("$SECRET_NAME")
    message: "Use get_secret() instead of directly reading environment variables"
    languages: [python]
    severity: WARNING
Apply fixes:
semgrep --config rules/ --autofix ./src
Preview changes before applying: semgrep --config rules/ --autofix --dryrun ./src
Semgrep OSS vs Semgrep Pro
| Feature | Semgrep OSS | Semgrep Pro |
|---|---|---|
| Price | Free | Paid (team tier) |
| Inter-file analysis | No | Yes |
| Cross-function taint | Limited | Full |
| Managed CI integration | Self-configured | Managed |
| SARIF output | Yes | Yes |
| Custom rules | Unlimited | Unlimited |
Semgrep OSS covers the most common use cases: custom rules, CI integration, and the community rule registry. The Pro tier adds inter-file data flow analysis, which is valuable for large codebases with complex call graphs.
Building a Rule Library
Start with these steps:
- Run the language security pack (p/python, p/javascript, etc.) and fix real findings
- Add secrets detection (p/secrets) to catch hardcoded credentials
- Write custom rules for your specific frameworks and patterns (internal APIs, deprecated functions)
- Add to pre-commit and CI so rules run on every change
A well-curated rule library catches issues that code review misses: the SQL query built from user input three function calls deep, the API call that skips certificate verification in a utility function used everywhere, the dependency method that returns a coroutine that callers forget to await.
Semgrep doesn't replace code review — it complements it by automating the pattern-matching part of security review, so humans can focus on logic and design.