Semgrep: Find Real Bugs with Pattern-Based Static Analysis

Security 2026-03-04 · 4 min read semgrep static-analysis security code-quality ci-cd devsecops
By DevTools Guide Editorial Team — Software engineers and developer advocates covering tools, workflows, and productivity for modern development teams.

Most linters catch style problems. Semgrep catches real bugs: SQL injection, hardcoded secrets, insecure deserialization, SSRF vulnerabilities, and misuse of cryptographic APIs. It works on source code using pattern matching that understands syntax — not just text — and supports 30+ languages out of the box.

The core idea: write patterns in YAML that look like the code you want to find. Semgrep handles the AST parsing and matching. You describe what you're looking for; Semgrep finds it across your entire codebase.

Why Semgrep vs Other Tools

vs grep/regex: Semgrep understands code structure. The pattern re.compile($VAR) matches every single-argument call to re.compile, whatever expression is passed and however the call is spaced or wrapped across lines. A text regex written for the same job misses calls that are formatted differently from what its author expected.
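To make the difference concrete, here is a minimal sketch that uses Python's own ast module as a stand-in for syntax-aware matching. It is not how Semgrep is implemented, and the sample source and helper name are made up for illustration. The line-based regex finds nothing, while the syntax-aware walk finds both calls.

```python
import ast
import re

# Two calls to re.compile, neither formatted the way a quick regex expects.
SOURCE = '''
import re

pattern = re.compile(
    user_supplied
)
other = re . compile(fixed)
'''

# Text search: looks for "re.compile(<name>)" on a single line.
text_hits = [
    line for line in SOURCE.splitlines()
    if re.search(r"re\.compile\(\w+\)", line)
]

def find_compile_calls(source: str) -> list[int]:
    """Return the line numbers of single-argument re.compile(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "compile"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and len(node.args) == 1
        ):
            hits.append(node.lineno)
    return sorted(hits)

ast_hits = find_compile_calls(SOURCE)
print("text search:", text_hits)
print("syntax-aware search, line numbers:", ast_hits)
```

The syntax-aware version never looks at whitespace or line breaks, which is the property the Semgrep pattern relies on.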

vs ESLint/Pylint: Language-specific linters are great for style. Semgrep uses one rule format for every supported language, and its rules focus on security-relevant patterns that lint rules rarely address.

vs SonarQube: SonarQube needs a running server and reserves some features for its commercial editions. Semgrep is lightweight, open source, and runs in CI without a server.

vs Snyk/Dependabot: Those tools scan dependencies. Semgrep analyzes your actual code.

Installation

# pip
pip install semgrep

# Homebrew (macOS)
brew install semgrep

# Docker
docker pull semgrep/semgrep

First Run: Using Community Rules

Semgrep Registry has thousands of community-contributed rules. Start with the security audit rules for your language:

# Python security audit
semgrep --config p/python

# JavaScript/TypeScript
semgrep --config p/javascript
semgrep --config p/typescript

# Go
semgrep --config p/go

# Java
semgrep --config p/java

# Cross-language security audit, with an explicit target path
semgrep --config p/security-audit .

The p/ prefix fetches from Semgrep Registry. The output shows matching files, line numbers, rule IDs, and explanations.

Writing Custom Rules

Custom rules are YAML files that describe patterns. Here's a rule that detects hardcoded AWS credentials:

# rules/no-hardcoded-aws-keys.yaml
rules:
  - id: hardcoded-aws-access-key
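    patterns:
      - pattern: $VAR = $KEY
      # A string pattern like "AKIA..." matches only that literal text,
      # so the key shape is checked with a regex on the assigned value.
      - metavariable-regex:
          metavariable: $KEY
          regex: ^["']?AKIA[0-9A-Z]{16}["']?$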
    message: "Potential hardcoded AWS access key: $VAR"
    languages: [python, javascript, typescript, java, go]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798: Use of Hard-coded Credentials"

Run it:

semgrep --config rules/no-hardcoded-aws-keys.yaml ./src
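The rule is after values shaped like AWS access key IDs. As a rough sketch of that shape, assuming long-term key IDs are AKIA followed by 16 uppercase letters or digits (the helper name is hypothetical):

```python
import re

# Assumption: long-term IAM access key IDs are "AKIA" plus 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")

def looks_like_aws_key_id(value: str) -> bool:
    """True when the whole string has the shape of an AWS access key ID."""
    return AWS_KEY_ID.fullmatch(value) is not None

print(looks_like_aws_key_id("AKIAIOSFODNN7EXAMPLE"))
print(looks_like_aws_key_id("not-a-key"))
```

A check like this is handy for sanity-testing a rule against a few known-good and known-bad strings before running it across a codebase.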

Pattern Syntax

Semgrep's pattern language is powerful but readable:

Metavariables ($VAR): Match any expression and capture it:

pattern: requests.get($URL, verify=False)
# Matches: requests.get(url, verify=False)
# Also: requests.get(user_input, verify=False)

Ellipsis (...): Match any sequence of statements or arguments:

pattern: |
  cursor.execute($QUERY, ...)
# Matches: cursor.execute(query)
# Also: cursor.execute(query, params)

Pattern-not: Exclude patterns that are false positives:

patterns:
  - pattern: cursor.execute($QUERY)
  - pattern-not: cursor.execute("...")  # literal strings are fine

Pattern-either: Match any of several patterns:

pattern-either:
  - pattern: eval($X)
  - pattern: exec($X)
  - pattern: __import__($X)

Real-World Rule Examples

SQL Injection Detection

rules:
  - id: sql-injection-string-format
    languages: [python]
    # Match any query assembled with % formatting. String literals are included:
    # execute("... %s" % value) is exactly the injectable case.
    pattern: $DB.execute($QUERY % $ARGS)
    message: "SQL query built with string formatting — use parameterized queries"
    severity: ERROR
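For readers who want to see why this pattern is worth flagging, here is a small sketch using the standard library's sqlite3 module. The table, function names, and payload are invented for illustration. The first function builds the query with % formatting, the second uses a parameterized query.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, role TEXT)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return db

def find_user_unsafe(db, name):
    # The kind of call the rule is meant to catch: the SQL text is
    # assembled with % before execute() ever sees it.
    return db.execute(
        "SELECT name FROM users WHERE name = '%s'" % name
    ).fetchall()

def find_user_safe(db, name):
    # Parameterized: the driver binds the value, so it is never parsed as SQL.
    return db.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "nobody' OR '1'='1"
db = make_db()
leaked = find_user_unsafe(db, payload)
safe = find_user_safe(db, payload)
print("unsafe query returned", len(leaked), "rows")
print("parameterized query returned", len(safe), "rows")
```

With the payload above, the formatted query's WHERE clause becomes always true and returns every row, while the parameterized query treats the payload as a plain name and finds nothing.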

SSRF via User-Controlled URL

rules:
  - id: ssrf-user-controlled-url
    languages: [python]
    patterns:
      - pattern-inside: |
          @app.route(...)
          def $FUNC(...):
              ...
      # $URL must have been assigned from the request earlier in the same block.
      - pattern-inside: |
          $URL = <... request.$ATTR ...>
          ...
      - pattern-either:
          - pattern: requests.get($URL, ...)
          - pattern: requests.post($URL, ...)
          - pattern: urllib.request.urlopen($URL, ...)
    message: "Potential SSRF: HTTP request to user-controlled URL"
    severity: WARNING
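When this rule fires, one common remediation is to validate the URL against an allowlist before making the request. The sketch below is one possible shape for that check, not a complete SSRF defense. The host names and function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts your service is allowed to call.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}

def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

for candidate in (
    "https://api.example.com/v1/users",
    "https://169.254.169.254/latest/meta-data/",
    "https://api.example.com@evil.test/",
):
    print(candidate, "->", is_allowed_url(candidate))
```

Comparing the parsed hostname, rather than searching the raw string for an allowed name, is what rejects the third URL, where the allowed host appears only in the userinfo part.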

Insecure Deserialization

rules:
  - id: unsafe-pickle-loads
    languages: [python]
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
    message: "Unsafe deserialization with pickle — never deserialize untrusted data"
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
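The reason pickle deserves an ERROR severity is that loading a pickle can call any importable function with attacker-chosen arguments. Here is a harmless sketch of that mechanism, with eval on an arithmetic string standing in for something destructive. The class name is invented for illustration.

```python
import json
import pickle

class Payload:
    # pickle records what __reduce__ returns; on load it calls the callable
    # with the given arguments. A real attack would use something like os.system.
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs eval("6 * 7") and returns its result

# json.loads only ever builds dicts, lists, strings, numbers, booleans, and None.
parsed = json.loads('{"user": "alice", "admin": false}')

print("pickle.loads returned:", loaded)
print("json.loads returned:", parsed)
```

The value that comes back from pickle.loads is the result of the call, not a Payload instance, which shows that the bytes decided what code ran. For data that crosses a trust boundary, a data-only format such as JSON avoids the problem.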

Taint Mode: Data Flow Analysis

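Semgrep's taint mode tracks data from sources (user input) to sinks (dangerous operations) through assignments and intermediate variables. In Semgrep OSS the analysis is largely limited to a single function; following data across function and file boundaries is a Pro feature.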

rules:
  - id: taint-sql-injection
    mode: taint
    languages: [python]
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
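    pattern-sinks:
      # Only the query string is a sink, so bound parameters do not trigger the rule.
      - patterns:
          - pattern: $DB.execute($QUERY, ...)
          - focus-metavariable: $QUERY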
    message: "User input flows to SQL execution — use parameterized queries"
    severity: ERROR

Taint mode is more powerful but slower. Use it for high-value security rules where data flow matters.
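Here is a sketch of the kind of code where data flow matters: the user-controlled value passes through two intermediate variables before it reaches execute(), so no single-line pattern describes the bug. The request object is a stand-in built with SimpleNamespace so the example runs without Flask; that stand-in, the table, and the function name are assumptions for illustration.

```python
import sqlite3
from types import SimpleNamespace

# Stand-in for Flask's request object so the sketch runs without Flask (assumption).
request = SimpleNamespace(args={"name": "x' OR '1'='1"})

def search_users(db):
    name = request.args.get("name")          # source: user-controlled value
    clause = "name = '" + name + "'"         # taint carried through concatenation
    query = "SELECT name FROM users WHERE " + clause
    return db.execute(query).fetchall()      # sink: tainted string reaches execute()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

rows = search_users(db)
print("rows returned for the injected name:", len(rows))
```

A purely syntactic rule that looks for string formatting inside the execute() call has nothing to match here, because the call only receives a variable. Tracking the value from request.args.get(...) to the sink is what connects the two ends.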

CI/CD Integration

GitHub Actions

# .github/workflows/semgrep.yml
name: Semgrep
on:
  push:
    branches: [main]
  pull_request: {}

jobs:
  semgrep:
    name: semgrep/ci
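    runs-on: ubuntu-latest
    # Run the CLI from the official container image
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: >-
          semgrep scan --error
          --config p/security-audit
          --config p/secrets
          --config rules/

This runs Semgrep on every PR and on pushes to main. The --error flag makes the job fail when any rule reports a finding.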
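If you would rather fail the build only on certain severities, one option is to have Semgrep write a JSON report (the --json flag) and gate on it in a small script. The sketch below assumes the report has a top-level results list whose entries carry check_id, path, start.line, and extra.severity. The sample report is trimmed and invented, and the function name is hypothetical.

```python
import json

# Trimmed, made-up sample of a Semgrep JSON report; real reports carry more fields.
SAMPLE_REPORT = json.dumps({
    "results": [
        {"check_id": "rules.unsafe-pickle-loads", "path": "app/cache.py",
         "start": {"line": 12},
         "extra": {"severity": "ERROR", "message": "Unsafe deserialization"}},
        {"check_id": "rules.ssrf-user-controlled-url", "path": "app/views.py",
         "start": {"line": 40},
         "extra": {"severity": "WARNING", "message": "Potential SSRF"}},
    ]
})

def blocking_findings(report_json: str, blocking=("ERROR",)) -> list[str]:
    """Return 'path:line rule-id' for each finding whose severity should block."""
    report = json.loads(report_json)
    lines = []
    for result in report.get("results", []):
        if result["extra"]["severity"] in blocking:
            lines.append(
                f'{result["path"]}:{result["start"]["line"]} {result["check_id"]}'
            )
    return lines

found = blocking_findings(SAMPLE_REPORT)
for line in found:
    print("blocking:", line)
# In CI you would finish with: sys.exit(1 if found else 0)
```

Keeping the gate in a script makes it easy to start strict on ERROR findings and tighten the policy to include WARNING later.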

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ["--config", "p/security-audit", "--error"]

Ignoring False Positives

Add inline comments to suppress specific findings:

# nosemgrep: hardcoded-aws-access-key
TEST_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"  # This is the example from AWS docs

Or add file-level ignores in .semgrepignore:

tests/
fixtures/
*.test.ts
vendor/

Autofix

Some rules can automatically fix what they find:

rules:
  - id: use-secrets-manager
    pattern: os.environ["$SECRET_NAME"]
    fix: get_secret("$SECRET_NAME")
    message: "Use get_secret() instead of directly reading environment variables"
    languages: [python]
    severity: WARNING

Apply fixes:

semgrep --config rules/ --autofix ./src

Preview changes before applying: semgrep --config rules/ --autofix --dryrun ./src
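The autofix above rewrites call sites to use get_secret(), so that helper has to exist somewhere in your codebase. The article does not define it. The following is one hypothetical shape for it, where backend is assumed to be a mapping-like client for whatever secrets manager you use.

```python
import os

class MissingSecretError(KeyError):
    """Raised when a secret is not configured anywhere."""

def get_secret(name: str, backend=None) -> str:
    # `backend` is a hypothetical mapping-like secrets-manager client.
    if backend is not None and name in backend:
        return backend[name]
    # Fallback for local development: read the process environment.
    value = os.environ.get(name)
    if value is None:
        raise MissingSecretError(name)
    return value

fake_backend = {"DB_PASSWORD": "s3cret"}
print(get_secret("DB_PASSWORD", backend=fake_backend))
```

Centralizing secret access in one function gives you a single place to add caching, auditing, or rotation later, which is the usual motivation for a rule like this.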

Semgrep OSS vs Semgrep Pro

Feature	Semgrep OSS	Semgrep Pro
Price	Free	Paid (team tier)
Inter-file analysis	No	Yes
Cross-function taint	Limited	Full
Managed CI integration	Self-configured	Managed
SARIF output	Yes	Yes
Custom rules	Unlimited	Unlimited

Semgrep OSS covers the most common use cases: custom rules, CI integration, and the community rule registry. The Pro tier adds inter-file data flow analysis that's valuable for large codebases with complex call graphs.

Building a Rule Library

Start with these steps:

1. Run the language security pack (p/python, p/javascript, etc.) and fix real findings
2. Add secrets detection (p/secrets) to catch hardcoded credentials
3. Write custom rules for your specific frameworks and patterns (internal APIs, deprecated functions)
4. Add to pre-commit and CI so rules run on every change

A well-curated rule library catches issues that code review misses: the SQL query built from user input three function calls deep, the API call that skips certificate verification in a utility function used everywhere, the dependency method that returns a coroutine that callers forget to await.

Semgrep doesn't replace code review — it complements it by automating the pattern-matching part of security review, so humans can focus on logic and design.