
Add SARIF parser #3

Merged 24 commits on Oct 10, 2023
Changes from 22 commits

Commits (24)
fe6def4
Add SARIF parser
tushar-deepsource Oct 5, 2023
505b62b
style: format code with isort and Black
enterprise-deepsource-icu[bot] Oct 5, 2023
5212304
style: format code with Black and isort
deepsource-autofix[bot] Oct 5, 2023
5d9f071
bugfix
tushar-deepsource Oct 5, 2023
3b81fbe
add checkout action
tushar-deepsource Oct 5, 2023
23b8e9b
Add run_community_analyzer.py and tests
tushar-deepsource Oct 6, 2023
9e9e2d0
Move tox and mypy scripts out
tushar-deepsource Oct 6, 2023
362ed39
Update README and fix tox
tushar-deepsource Oct 6, 2023
e0e0f78
Update tests glob
tushar-deepsource Oct 6, 2023
50c0b55
Fix DeepSource issues
tushar-deepsource Oct 6, 2023
e0e6f11
Add kubelinter test
tushar-deepsource Oct 9, 2023
1aa5fcc
Update tox.yml
tushar-deepsource Oct 10, 2023
453ae64
Remove duplicate setup python step
tushar-deepsource Oct 10, 2023
2c37bb1
Update tox.ini
tushar-deepsource Oct 10, 2023
8458fff
Add more tests
tushar-deepsource Oct 10, 2023
1520982
Docstrings
tushar-deepsource Oct 10, 2023
14cce7c
style: format code with Black and isort
deepsource-autofix[bot] Oct 10, 2023
0e3fad3
Add CLI test and better error type
tushar-deepsource Oct 10, 2023
1940221
no-cache-dir
tushar-deepsource Oct 10, 2023
cce5fe2
remove unused imports
tushar-deepsource Oct 10, 2023
9755dec
add pragma to exclude_lines
tushar-deepsource Oct 10, 2023
050f72a
newline at the end
tushar-deepsource Oct 10, 2023
3f8201b
fix rcfile
tushar-deepsource Oct 10, 2023
7ca9433
Use action
tushar-deepsource Oct 10, 2023
4 changes: 4 additions & 0 deletions .coveragerc
@@ -0,0 +1,4 @@
[report]
exclude_lines =
raise AssertionError
# pragma: no cover
12 changes: 11 additions & 1 deletion .deepsource.toml
@@ -1,12 +1,22 @@
version = 1

test_patterns = ["tests/**"]
test_patterns = ["**/tests/**"]

[[analyzers]]
name = "python"

[analyzers.meta]
runtime_version = "3.x.x"
type_checker = "mypy"

[[analyzers]]
name = "test-coverage"

[[analyzers]]
name = "secrets"

[[analyzers]]
name = "docker"

[[transformers]]
name = "black"
36 changes: 36 additions & 0 deletions .github/workflows/tox.yml
@@ -0,0 +1,36 @@
name: Tox

on:
push:
branches: [master]
pull_request:
branches: [master]
env:
DEEPSOURCE_DSN: ${{ secrets.DEEPSOURCE_DSN }}

jobs:
build:
runs-on: ubuntu-latest
name: Python tests
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 1

- name: Setup python
uses: actions/setup-python@v3
with:
python-version: 3.11
architecture: x64

- name: Install dependencies
run: pip install tox

- name: Run tests and type checking
run: tox

- name: Report test coverage to DeepSource
run: |
curl https://deepsource.io/cli | sh
./bin/deepsource report --analyzer test-coverage --key python --value-file ./coverage.xml
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ __pycache__/
venv*
.tox
.coverage
**/*.egg-info
**/build
**/dist
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM python:3.11-alpine

RUN mkdir -p /home/runner /app /artifacts /toolbox \
&& chown -R 1000:3000 /home/runner /app /artifacts /toolbox \
&& chmod -R o-rwx /app /artifacts /toolbox \
&& adduser -D -u 1000 runner

RUN apk add --no-cache git grep

COPY ./sarif-parser /toolbox/sarif-parser

RUN pip install --no-cache-dir /toolbox/sarif-parser

USER runner
35 changes: 29 additions & 6 deletions README.md
@@ -4,15 +4,38 @@ Hub of all open-sourced third-party static analyzers supported by DeepSource.

## Supported Analyzers

|Analyzer name|Latest version|Language / Technology|
|:-----------|:------------|:-------------------|
|[facebook/infer](https://github.com/facebook/infer)|v1.1.0|Java, C++, Objective-C|
|[Azure/bicep](https://github.com/Azure/bicep)|v0.20.4|Azure Resource Manager|
|[stackrox/kube-linter](https://github.com/stackrox/kube-linter)|0.6.4|Kubernetes, Helm|
| Analyzer name | Latest version | Language / Technology |
| :-------------------------------------------------------------- | :------------- | :--------------------- |
| [facebook/infer](https://github.com/facebook/infer) | v1.1.0 | Java, C++, Objective-C |
| [Azure/bicep](https://github.com/Azure/bicep) | v0.20.4 | Azure Resource Manager |
| [stackrox/kube-linter](https://github.com/stackrox/kube-linter) | 0.6.4 | Kubernetes, Helm |

---

## Development Guide
...

### Running tests

- Create and activate a virtual environment
- Run `pip install -r requirements-dev.txt` to do an editable install
- Run `pytest` to run tests, and ensure that the coverage report has no missing
lines.

### The test suite

There are minimal tests for the `run_community_analyzer.py` wrapper in
`tests/community_analyzer_test.py` — sanity checks that ensure, for example,
that the issue map is respected.

For the SARIF parser itself, the test suite expects two files in
`sarif-parser/tests/sarif_files`: a SARIF input file with the `.sarif`
extension, and the expected DeepSource output file with the same name but a
`.sarif.json` extension.
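A fixture pair can be sketched as follows (shown as Python dicts; the file names and rule contents are illustrative, but the output shape follows the `Issue` TypedDict defined in `sarif_parser/__init__.py`):

```python
import json

# Minimal SARIF input — would be saved as e.g. kube_linter.sarif
# (the file name and rule contents here are illustrative).
sarif_input = {
    "runs": [
        {
            "results": [
                {
                    "ruleId": "dangling-service",
                    "message": {"text": "no pods found matching service labels"},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": "deploy.yaml"},
                                "region": {"startLine": 4, "startColumn": 3},
                            }
                        }
                    ],
                }
            ]
        }
    ]
}

# Expected DeepSource issue — would be saved alongside as
# kube_linter.sarif.json. Note `end` falls back to `begin` when the
# region carries no endLine/endColumn.
expected_issue = {
    "issue_code": "dangling-service",
    "issue_text": "no pods found matching service labels",
    "location": {
        "path": "deploy.yaml",
        "position": {
            "begin": {"line": 4, "column": 3},
            "end": {"line": 4, "column": 3},
        },
    },
}

print(json.dumps(expected_issue, indent=2))
```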

### Type Checking

Run `mypy .`

## Maintenance Guide

...
3 changes: 3 additions & 0 deletions mypy.ini
@@ -0,0 +1,3 @@
[mypy]
strict = True
exclude = setup.py|utils|venv*|build|dist
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -0,0 +1 @@
-e sarif-parser[dev]
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
sarif-parser
38 changes: 38 additions & 0 deletions run_community_analyzer.py
@@ -0,0 +1,38 @@
import argparse
import os
import os.path

from sarif_parser import run_sarif_parser


class CommunityAnalyzerArgs:
analyzer: str


def get_issue_map(analyzer_name: str) -> str:
"""Returns the appropriate issue map filepath for the given analyzer."""
analyzers_dir = os.path.join(os.path.dirname(__file__), "analyzers")
return os.path.join(analyzers_dir, analyzer_name, "utils", "issue_map.json")


def main(argv: list[str] | None = None) -> None:
"""Runs the CLI."""
toolbox_path = os.getenv("TOOLBOX_PATH", "/toolbox")
output_path = os.path.join(toolbox_path, "analysis_results.json")
artifacts_path = os.getenv("ARTIFACTS_PATH", "/artifacts")

parser = argparse.ArgumentParser("sarif_parser")
parser.add_argument(
"--analyzer",
help="Which community analyzer to run. Example: 'kube-linter'",
required=True,
)
args = parser.parse_args(argv, namespace=CommunityAnalyzerArgs)

analyzer_name = args.analyzer
issue_map_path = get_issue_map(analyzer_name)
run_sarif_parser(artifacts_path, output_path, issue_map_path)


if __name__ == "__main__":
main() # pragma: no cover
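The `CommunityAnalyzerArgs` class above is used as a typed argparse namespace: passing a class to `parse_args(namespace=...)` makes argparse set the parsed values as attributes on it, so `args.analyzer` carries a `str` annotation that mypy can check. A minimal, self-contained sketch of the pattern (names here are illustrative):

```python
import argparse


class Args:
    # Annotation only — argparse assigns the actual value at parse time,
    # but mypy now knows `Args.analyzer` is a str.
    analyzer: str


parser = argparse.ArgumentParser("sarif_parser")
parser.add_argument("--analyzer", required=True)

# parse_args stores parsed values on the object passed as `namespace`
# and returns it, giving typed attribute access instead of an untyped
# argparse.Namespace.
args = parser.parse_args(["--analyzer", "kube-linter"], namespace=Args)
print(args.analyzer)  # kube-linter
```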
17 changes: 17 additions & 0 deletions sarif-parser/README.md
@@ -0,0 +1,17 @@
# sarif-parser

Parse SARIF reports and convert them to DeepSource issues.

## Usage

```bash
git clone https://github.com/DeepSourceCorp/community-analyzers
cd community-analyzers/sarif-parser
# to install the package
pip install .
# to convert a single sarif file to DeepSource JSON, and output to terminal
sarif-parser path/to/sarif-file.json --output /dev/stdout
# to convert a folder containing ONLY sarif files, to DeepSource JSON.
# output defaults to <TOOLBOX_PATH>/analysis_results.json
sarif-parser path/to/folder
```
49 changes: 49 additions & 0 deletions sarif-parser/setup.cfg
@@ -0,0 +1,49 @@
[metadata]
name = sarif-parser
version = 0.1.0
description = Parse SARIF reports and convert them to DeepSource issues.
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/DeepSourceCorp/community-analyzers
author = Tushar Sadhwani
author_email = [email protected]
license = MIT
license_file = LICENSE
classifiers =
License :: OSI Approved :: MIT License
Operating System :: OS Independent
Programming Language :: Python :: 3
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: Implementation :: CPython
Typing :: Typed

[options]
packages = find:

python_requires = >=3.8
package_dir = =src

[options.packages.find]
where = ./src

[options.entry_points]
console_scripts =
sarif-parser=sarif_parser.cli:cli

[options.extras_require]
dev =
black
mypy
pytest
pytest-cov

[options.package_data]
sarif-parser =
py.typed

[tool:pytest]
addopts = --cov --cov-report=term-missing
3 changes: 3 additions & 0 deletions sarif-parser/setup.py
@@ -0,0 +1,3 @@
from setuptools import setup

setup()
132 changes: 132 additions & 0 deletions sarif-parser/src/sarif_parser/__init__.py
@@ -0,0 +1,132 @@
"""sarif-parser - Parse SARIF reports and convert them to DeepSource issues."""
from __future__ import annotations

import json
import os.path
from typing import Any, TypedDict


class Issue(TypedDict):
issue_code: str
issue_text: str
location: IssueLocation


class IssueLocation(TypedDict):
path: str
position: IssuePosition


class IssuePosition(TypedDict):
begin: LineColumn
end: LineColumn


class LineColumn(TypedDict):
line: int
column: int


def parse(
sarif_data: dict[str, Any],
work_dir: str = "",
issue_map: dict[str, Any] | None = None,
) -> list[Issue]:
"""Parses a SARIF file and returns a list of DeepSource issues."""
if issue_map is None:
issue_map = {}

deepsource_issues: list[Issue] = []
for run in sarif_data["runs"]:
for issue in run["results"]:
assert len(issue["locations"]) == 1
location = issue["locations"][0]["physicalLocation"]
issue_path = location["artifactLocation"]["uri"]
# remove file:// prefix if present
issue_path = issue_path.removeprefix("file://")
# remove work_dir prefix, if present
issue_path = issue_path.removeprefix(work_dir)
# remove leading "/" if any
issue_path = issue_path.removeprefix("/")

start_line = location.get("contextRegion", {}).get(
"startLine"
) or location.get("region", {}).get("startLine")
start_column = (
location.get("contextRegion", {}).get("startColumn")
or location.get("region", {}).get("startColumn")
or 1 # columns are 1 indexed by default
)
end_line = (
location.get("contextRegion", {}).get("endLine")
or location.get("region", {}).get("endLine")
or start_line
)
end_column = (
location.get("contextRegion", {}).get("endColumn")
or location.get("region", {}).get("endColumn")
or start_column
)

issue_code = issue["ruleId"]
if issue_code in issue_map:
issue_code = issue_map[issue_code]["issue_code"]

deepsource_issue = Issue(
issue_code=issue_code,
issue_text=issue["message"]["text"],
location=IssueLocation(
path=issue_path,
position=IssuePosition(
begin=LineColumn(line=start_line, column=start_column),
end=LineColumn(line=end_line, column=end_column),
),
),
)
deepsource_issues.append(deepsource_issue)

return deepsource_issues


def run_sarif_parser(
filepath: str,
output_path: str,
issue_map_path: str | None,
) -> None:
"""Parse SARIF files from given filepath, and save JSON output in output path."""
# Get list of sarif files
if not os.path.exists(filepath):
raise FileNotFoundError(f"{filepath} does not exist.")

if os.path.isdir(filepath):
artifacts = [os.path.join(filepath, file) for file in os.listdir(filepath)]
else:
artifacts = [filepath]

# Prepare mapping from SARIF rule IDs to DeepSource issue codes
if issue_map_path is not None:
with open(issue_map_path) as file:
issue_map = json.load(file)
else:
issue_map = None

# Run parser
deepsource_issues = []
for artifact_path in artifacts:
with open(artifact_path) as file: # skipcq: PTC-W6004 -- nothing sensitive here
artifact = json.load(file)

sarif_data = json.loads(artifact["data"])
work_dir = artifact["metadata"]["work_dir"]
issues = parse(sarif_data, work_dir, issue_map)
deepsource_issues.extend(issues)

issues_dict = {
"issues": deepsource_issues,
"metrics": [],
"errors": [],
"is_passed": len(deepsource_issues) == 0,
"extra_data": {},
}
with open(output_path, "w") as file:
json.dump(issues_dict, file)
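Two details of `parse()` above are worth calling out: the chained `str.removeprefix` path cleanup, and the `contextRegion`-then-`region` fallback for positions. A minimal sketch of both (Python 3.9+ for `removeprefix`; the helper name is illustrative, not part of the package):

```python
def normalize_path(uri: str, work_dir: str = "") -> str:
    """Mirror parse()'s path cleanup: strip the 'file://' scheme,
    then the work_dir prefix, then any leading slash."""
    path = uri.removeprefix("file://")
    path = path.removeprefix(work_dir)
    return path.removeprefix("/")


print(normalize_path("file:///code/src/app.py", work_dir="/code"))  # src/app.py

# Position lookup prefers contextRegion and falls back to region;
# here contextRegion is absent, so region supplies the line.
location = {"region": {"startLine": 12}}
start_line = (
    location.get("contextRegion", {}).get("startLine")
    or location.get("region", {}).get("startLine")
)
print(start_line)  # 12
```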
5 changes: 5 additions & 0 deletions sarif-parser/src/sarif_parser/__main__.py
@@ -0,0 +1,5 @@
"""Support executing the CLI by doing `python -m sarif_parser`."""
from sarif_parser.cli import cli

if __name__ == "__main__":
cli()