diff --git a/Makefile b/Makefile index 119ce72..31a9d35 100644 --- a/Makefile +++ b/Makefile @@ -16,4 +16,8 @@ release_deploy_jar: .PHONY: format format: - bazelisk run //cli/format \ No newline at end of file + bazelisk run //cli/format + +.PHONY: generate-readme +generate-readme: + bazelisk run //tools:generate-readme \ No newline at end of file diff --git a/README.md b/README.md index 7130730..66537ad 100644 --- a/README.md +++ b/README.md @@ -106,6 +106,7 @@ This will produce an impacted targets json list with target label, target distan ] ``` + ## CLI Interface `bazel-diff` Command @@ -126,53 +127,32 @@ Commands: ### `generate-hashes` command ```terminal -Usage: bazel-diff generate-hashes [-hkvV] [--[no-]useCquery] [-b=] - [--[no-]excludeExternalTargets] - [--[no-]includeTargetType] +Usage: bazel-diff generate-hashes [-hkvV] [--[no-]excludeExternalTargets] [-- + [no-]includeTargetType] [--[no-]useCquery] + [-b=] [--contentHashPath=] - [-s=] -w= + [--cqueryExpression=] + [-d=] + [--fineGrainedHashExternalReposFile=] + [-m=] [-s=] + -w= [-co=]... [--cqueryCommandOptions= ]... [--fineGrainedHashExternalRepos=]... - [-so=]... + [--ignoredRuleHashingAttributes=]... + [-so=]... + [-tt=[,...]]... + Writes to a file the SHA256 hashes for each Bazel Target in the provided workspace. - The filepath to write the resulting JSON to. - If not specified, the JSON will be written to STDOUT. - - By default the JSON schema is a dictionary of target => SHA-256 values. - Example: - { - "//cli:bazel-diff_deploy.jar": "4ae310f8ad2bc728934e3509b6102ca658e828b9cd668f79990e95c6663f9633" - ... 
- } - - If --includeTargetType is specified, the JSON schema will include the target type (SourceFile/Rule/GeneratedFile) - Example: - { - "//cli:src/test/resources/fixture/integration-test-1.zip": "SourceFile#c259eba8539f4c14e4536c61975457c2990e090067893f4a2981e7bb5f4ef4cf", - "//external:android_gmaven_r8": "Rule#795f583449a40814c05e1cc5d833002afed8d12bce5b835579c7f139c2462d61", - "//cli:bazel-diff_deploy.jar": "GeneratedFile#4ae310f8ad2bc728934e3509b6102ca658e828b9cd668f79990e95c6663f9633", - ... - } - --[no-]excludeExternalTargets - If true, exclude external targets (do not query - //external:all-targets). When Bzlmod is enabled - (detected via bazel mod graph), external targets - are excluded automatically. Set this when using - Bazel with --enable_workspace=false in other - configurations. Defaults to false. - --[no-]includeTargetType - Whether include target type in the generated JSON or not. - If false, the generate JSON schema is: {"": ""} - If true, the generate JSON schema is: {"": "#" - -tt, --targetType= - The type of targets to filter, available options are SourceFile/Rule/GeneratedFile - Only works if the JSON was generated with `--includeTargetType` enabled. - If not specified, all types of impacted targets will be returned. - -b, --bazelPath= + The filepath to write the resulting JSON of + dictionary target => SHA-256 values. If not + specified, the JSON will be written to STDOUT. + -b, --bazelPath= Path to Bazel binary. If not specified, the Bazel binary available in PATH will be used. -co, --bazelCommandOptions= @@ -188,36 +168,65 @@ workspace. when invoking `bazel cquery`. This flag is has no effect if `--useCquery`is false. --cqueryExpression= - Custom cquery expression to use instead of the default - 'deps(//...:all-targets)'. This allows you to exclude - problematic targets (e.g., analysis_test targets that - are designed to fail). Example: 'deps(//:target1) + - deps(//:target2)'. This flag has no effect if - `--useCquery` is false. 
+ Custom cquery expression to use instead of the + default 'deps(//...:all-targets)'. This allows you + to exclude problematic targets (e.g., analysis_test + targets that are designed to fail). Example: 'deps + (//...:all-targets) except //path/to/failing: + target'. This flag has no effect if `--useCquery` + is false. + -d, --depEdgesFile= + Path to the file where dependency edges are written + to. If not specified, the dependency edges will not + be written to a file. Needed for computing build + graph distance metrics. See bazel-diff docs for + more details about build graph distance metrics. + --[no-]excludeExternalTargets + If true, exclude external targets (do not query + //external:all-targets). When Bzlmod is enabled + (detected via bazel mod graph), external targets + are excluded automatically. Set this when using + Bazel with --enable_workspace=false in other + configurations. Defaults to false. --fineGrainedHashExternalRepos= Comma separate list of external repos in which fine-grained hashes are computed for the targets. By default, external repos are treated as an opaque blob. If an external repo is specified here, bazel-diff instead computes the hash for individual - targets. For example, one wants to specify `@maven` - here if they use rules_jvm_external so that + targets. For example, one wants to specify `maven` + here if they user rules_jvm_external so that individual third party dependency change won't - invalidate all targets in the mono repo. Note that - if `--useCquery` is true; or `--useCquery` is false but - `--bazelCommandOptions=--consistent_labels` is provided, - the canonical repo name must be provided, - e.g. `@@maven` or `@@rules_jvm_external~~maven~maven` (bzlmod) - instead of apparent name `@maven` + invalidate all targets in the mono repo. + --fineGrainedHashExternalReposFile= + A text file containing a newline separated list of + external repos. 
Similar to + --fineGrainedHashExternalRepos but helps you avoid + exceeding max arg length. Mutually exclusive with + --fineGrainedHashExternalRepos. -h, --help Show this help message and exit. --ignoredRuleHashingAttributes= Attributes that should be ignored when hashing rule targets. + --[no-]includeTargetType + Whether include target type in the generated JSON or + not. + If false, the generate JSON schema is: {"": + ""} + If true, the generate JSON schema is: {"": + "#" } -k, --[no-]keep_going This flag controls if `bazel query` will be executed with the `--keep_going` flag or not. Disabling this flag allows you to catch configuration issues in your Bazel graph, but may not work for some Bazel setups. Defaults to `true` + -m, --modified-filepaths= + Experimental: A text file containing a newline + separated list of filepaths (relative to the + workspace) these filepaths should represent the + modified files between the specified revisions and + will be used to scope what files are hashed during + hash generation. -s, --seed-filepaths= A text file containing a newline separated list of filepaths. Each file in this list will be read and @@ -228,6 +237,10 @@ workspace. -so, --bazelStartupOptions= Additional space separated Bazel client startup options used when invoking Bazel + -tt, --targetType=[,...] + The types of targets to filter. Use comma (,) to + separate multiple values, e.g. + '--targetType=SourceFile,Rule,GeneratedFile'. --[no-]useCquery If true, use cquery instead of query when generating dependency graphs. Using cquery would yield more accurate build graph at the cost of slower query @@ -242,95 +255,61 @@ workspace. Path to Bazel workspace directory. ``` -**Note**: `--useCquery` flag may not work with very large repos due to limitation -of Bazel. You may want to fallback to use normal query mode in that case. -See for more details. 
- -#### Handling Failing Analysis Targets with `--cqueryExpression` - -When using `--useCquery`, Bazel's `cquery` command analyzes all targets (executes their implementation functions). This can cause issues with targets that are intentionally designed to fail during analysis, such as: - -- `analysis_test` targets from the Bazel `rules_testing` library -- Other validation targets that verify build failures - -With regular `bazel query`, these targets don't cause problems because `query` doesn't execute implementation functions. However, `cquery` will fail when it encounters these targets. - -**Solution**: Use the `--cqueryExpression` flag to specify a custom query expression that excludes the problematic targets: - -```bash -bazel-diff generate-hashes \ - --useCquery \ - --cqueryExpression "deps(//:target1) + deps(//:target2)" \ - output.json -``` - -**Important**: When crafting custom cquery expressions: - -- ❌ **Don't use**: `deps(//...:all-targets) except //:failing_target` - - This still analyzes the failing target during pattern expansion - -- ✅ **Do use**: Explicitly specify which targets or packages to include: - ```bash - --cqueryExpression "deps(//:target1) + deps(//:target2)" - --cqueryExpression "deps(//src/...:*) + deps(//lib/...:*)" - ``` - -See [GitHub Issue #301](https://github.com/Tinder/bazel-diff/issues/301) for more details. - -### What does the SHA256 value of `generate-hashes` represent? - -`generate-hashes` is a canonical SHA256 value representing all attributes and inputs into a target. These inputs -are the summation of the rule implementation hash, the SHA256 value -for every attribute of the rule and then the summation of the SHA256 value for -all `rule_inputs` using the same exact algorithm. For source_file inputs the -content of the file are converted into a SHA256 value. 
- ### `get-impacted-targets` command ```terminal -Usage: bazel-diff get-impacted-targets [-v] -w= - -b= +Missing required options: '--startingHashes=', '--finalHashes=', '--workspacePath=' +Usage: bazel-diff get-impacted-targets [-v] [--[no-]noBazelrc] [-b=] + [-d=] -fh= - -sh= [-o=] - [-d=] - [-tt=] - [-so=] - [--noBazelrc] + -sh= + -w= + [-so=]... + [-tt=[,...]]... Command-line utility to analyze the state of the bazel build graph - -w, --workspacePath= - Path to Bazel workspace directory. Required for module - change detection. - -b, --bazelPath= - Path to Bazel binary. If not specified, the Bazel binary - available in PATH will be used. + -b, --bazelPath= + Path to Bazel binary. If not specified, the Bazel + binary available in PATH will be used. + -d, --depEdgesFile= + Path to the file where dependency edges are. If + specified, build graph distance metrics will be + computed from the given hash data. -fh, --finalHashes= - The path to the JSON file of target hashes for the final - revision. Run 'generate-hashes' to get this value. + The path to the JSON file of target hashes for the + final revision. Run 'generate-hashes' to get this + value. + --[no-]noBazelrc Don't use .bazelrc + -o, --output= + Filepath to write the impacted Bazel targets to. If + using depEdgesFile: formatted in json, otherwise: + newline separated. If not specified, the output will + be written to STDOUT. -sh, --startingHashes= - The path to the JSON file of target hashes for the initial - revision. Run 'generate-hashes' to get this value. - -o, --output= - Filepath to write the impacted Bazel targets to. If using - depEdgesFile: formatted in json, otherwise: newline - separated. If not specified, the output will be written - to STDOUT. - -d, --depEdgesFile= - Path to the file where dependency edges are. If specified, - build graph distance metrics will be computed from the - given hash data. - -tt, --targetType= - The types of targets to filter. 
Use comma (,) to separate - multiple values, e.g. '--targetType=SourceFile,Rule,GeneratedFile'. - Only works if the JSON was generated with `--includeTargetType` enabled. - If not specified, all types of impacted targets will be returned. + The path to the JSON file of target hashes for the + initial revision. Run 'generate-hashes' to get this + value. -so, --bazelStartupOptions= - Additional space separated Bazel client startup options - used when invoking Bazel - --noBazelrc Don't use .bazelrc - -v, --verbose - Display query string, missing files and elapsed time + Additional space separated Bazel client startup + options used when invoking Bazel + -tt, --targetType=[,...] + The types of targets to filter. Use comma (,) to + separate multiple values, e.g. + '--targetType=SourceFile,Rule,GeneratedFile'. + -v, --verbose Display query string, missing files and elapsed time + -w, --workspacePath= + Path to Bazel workspace directory. Required for module + change detection. ``` + + +### What does the SHA256 value of `generate-hashes` represent? + +`generate-hashes` is a canonical SHA256 value representing all attributes and inputs into a target. These inputs +are the summation of the rule implementation hash, the SHA256 value +for every attribute of the rule and then the summation of the SHA256 value for +all `rule_inputs` using the same exact algorithm. For source_file inputs the +content of the file are converted into a SHA256 value. ## Installing @@ -454,6 +433,60 @@ Now you can simply run `bazel-diff` from your project: bazel run @bazel-diff//cli:bazel-diff -- bazel-diff -h ``` +## Contributors + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Maxwell Elliott
Maxwell Elliott
Honnix
Honnix
eric wang
eric wang
Eric Wang
Eric Wang
Tianyu Geng
Tianyu Geng
Patrick Balestra
Patrick Balestra
Daniel P. Purkhus
Daniel P. Purkhus
Alex Eagle
Alex Eagle
Anton Malinskiy
Anton Malinskiy
Sharmila
Sharmila
Dmitrii Kostyrev
Dmitrii Kostyrev
Jérémy Mathevet
Jérémy Mathevet
Nikhil Birmiwal
Nikhil Birmiwal
Sergei Morozov
Sergei Morozov
Fahrzin Hemmati
Fahrzin Hemmati
Jaime Lennox
Jaime Lennox
Lucas Teixeira
Lucas Teixeira
Guillaume Van Wassenhove
Guillaume Van Wassenhove
Fabian Meumertzheim
Fabian Meumertzheim
Jonathan Block
Jonathan Block
Alex Torok
Alex Torok
Naveen Narayanan
Naveen Narayanan
Mathieu Sabourin
Mathieu Sabourin
André
André
Boris
Boris
Rui Chen
Rui Chen
Sanju Naik
Sanju Naik
Ted Kaplan
Ted Kaplan
Laurenz
Laurenz
mla
mla
tinder-yukisawa
tinder-yukisawa
Kevin Jiao
Kevin Jiao
Vincent Case
Vincent Case
Walt Panfil
Walt Panfil
Mehran Poursadeghi
Mehran Poursadeghi
+ + ## Learn More Take a look at the following bazelcon talks to learn more about `bazel-diff`: @@ -461,6 +494,16 @@ Take a look at the following bazelcon talks to learn more about `bazel-diff`: * [BazelCon 2023: Improving CI efficiency with Bazel querying and bazel-diff](https://www.youtube.com/watch?v=QYAbmE_1fSo) * [BazelCon 2024: Not Going the Distance: Filtering Tests by Build Graph Distance](https://youtu.be/Or0o0Q7Zc1w?si=nIIkTH6TP-pcPoRx) +## Star History + + + + + + Star History Chart + + + ## Running the tests To run the tests simply run diff --git a/tools/BUILD b/tools/BUILD new file mode 100644 index 0000000..00fd024 --- /dev/null +++ b/tools/BUILD @@ -0,0 +1,26 @@ +genrule( + name = "cli_help_output", + outs = [ + "help_root.txt", + "help_generate_hashes.txt", + "help_get_impacted_targets.txt", + ], + cmd = """ + BINARY=$(location //cli:bazel-diff) + $$BINARY --help > $(location help_root.txt) 2>&1 || true + $$BINARY generate-hashes --help > $(location help_generate_hashes.txt) 2>&1 || true + $$BINARY get-impacted-targets --help > $(location help_get_impacted_targets.txt) 2>&1 || true + """, + tools = ["//cli:bazel-diff"], +) + +py_binary( + name = "generate-readme", + srcs = ["generate_readme.py"], + main = "generate_readme.py", + data = [ + ":cli_help_output", + "readme_template.md", + ], + python_version = "PY3", +) diff --git a/tools/generate_readme.py b/tools/generate_readme.py new file mode 100644 index 0000000..064f218 --- /dev/null +++ b/tools/generate_readme.py @@ -0,0 +1,260 @@ +"""Generates README.md by injecting live CLI help and a contributors table.""" + +import json +import os +import re +import subprocess +import sys +import time +import urllib.request +from collections import defaultdict +from pathlib import Path + + +REPO = "Tinder/bazel-diff" + + +# --------------------------------------------------------------------------- +# Runfiles helpers +# --------------------------------------------------------------------------- + +def 
runfile(rel_path: str) -> Path: + """Resolve a path relative to the Bazel runfiles tree. + + Data files declared in the py_binary's `data` attribute are placed + alongside the script under `_main//` in the runfiles tree. + Since __file__ is already at `…/_main/tools/generate_readme.py`, all + sibling data files are in the same directory — we just strip the leading + package prefix from rel_path. + """ + # Preferred: RUNFILES_DIR is set when run as a data dep of another target. + runfiles_dir = os.environ.get("RUNFILES_DIR") + if runfiles_dir: + return Path(runfiles_dir) / "_main" / rel_path + + # Standard: __file__ is the runfiles copy; data files are siblings. + script_dir = Path(__file__).parent # …/_main/tools/ + parts = rel_path.split("/", 1) + if len(parts) == 2 and parts[0] == "tools": + return script_dir / parts[1] + return script_dir / rel_path + + +# --------------------------------------------------------------------------- +# Sentinel injection +# --------------------------------------------------------------------------- + +def inject_section(text: str, marker: str, content: str) -> str: + """Replace the content between BEGIN/END sentinel comments for *marker*.""" + begin = f"" + end = f"" + pattern = re.compile( + rf"({re.escape(begin)}\n).*?({re.escape(end)})", + re.DOTALL, + ) + replacement = rf"\g<1>{content}\n\g<2>" + result, count = pattern.subn(replacement, text) + if count == 0: + raise ValueError(f"Sentinel markers for '{marker}' not found in template") + return result + + +# --------------------------------------------------------------------------- +# CLI help section +# --------------------------------------------------------------------------- + +def build_cli_help_section(help_root: str, help_gen: str, help_get: str) -> str: + lines = [ + "## CLI Interface", + "", + "`bazel-diff` Command", + "", + "```terminal", + help_root.rstrip(), + "```", + "", + "### `generate-hashes` command", + "", + "```terminal", + help_gen.rstrip(), + "```", + 
"", + "### `get-impacted-targets` command", + "", + "```terminal", + help_get.rstrip(), + "```", + ] + return "\n".join(lines) + + +# --------------------------------------------------------------------------- +# GitHub user resolution +# --------------------------------------------------------------------------- + +def github_headers() -> dict: + headers = { + "Accept": "application/vnd.github+json", + "X-GitHub-Api-Version": "2022-11-28", + "User-Agent": "bazel-diff-readme-gen", + } + token = os.environ.get("GITHUB_TOKEN") + if token: + headers["Authorization"] = f"Bearer {token}" + return headers + + +def fetch_github_email_map(repo: str) -> dict[str, dict]: + """Page through the GitHub commits API and return email -> {login, avatar_url}.""" + email_map: dict[str, dict] = {} + page = 1 + while True: + url = f"https://api.github.com/repos/{repo}/commits?per_page=100&page={page}" + req = urllib.request.Request(url, headers=github_headers()) + try: + with urllib.request.urlopen(req, timeout=15) as resp: + commits = json.load(resp) + except Exception as exc: + print(f"Warning: GitHub API error on page {page}: {exc}", file=sys.stderr) + break + + for commit in commits: + gh_author = commit.get("author") + email = (commit.get("commit") or {}).get("author", {}).get("email", "") + if gh_author and email and email not in email_map: + email_map[email] = { + "login": gh_author["login"], + "avatar_url": gh_author["avatar_url"], + } + + if len(commits) < 100: + break + page += 1 + time.sleep(0.1) + + return email_map + + +def resolve_noreply_username(email: str) -> str | None: + """Extract a GitHub username from a noreply address as a last-resort fallback.""" + if not email.endswith("@users.noreply.github.com"): + return None + local = email.split("@")[0] + # Strip leading numeric id: "12345+username" -> "username" + return local.split("+")[-1] + + +# --------------------------------------------------------------------------- +# Contributors section +# 
--------------------------------------------------------------------------- + +def build_contributors_section(workspace_dir: Path, email_map: dict[str, dict]) -> str: + # Collect (name, email) pairs with counts from git log. + result = subprocess.run( + ["git", "-C", str(workspace_dir), "log", "--format=%aN\t%aE"], + capture_output=True, + text=True, + check=True, + ) + + # Aggregate: per (name, email) pair, then roll up by name keeping + # the email that has the most commits for that name. + pair_counts: dict[tuple[str, str], int] = defaultdict(int) + for line in result.stdout.splitlines(): + if "\t" not in line: + continue + name, email = line.split("\t", 1) + name, email = name.strip(), email.strip() + if name: + pair_counts[(name, email)] += 1 + + # Roll up by name: sum counts, pick the email with the highest count. + name_totals: dict[str, int] = defaultdict(int) + name_best_email: dict[str, str] = {} + name_best_count: dict[str, int] = defaultdict(int) + + for (name, email), count in pair_counts.items(): + name_totals[name] += count + if count > name_best_count[name]: + name_best_count[name] = count + name_best_email[name] = email + + # Sort by total commit count descending. + sorted_authors = sorted(name_totals.items(), key=lambda x: x[1], reverse=True) + + COLUMNS = 6 + + cells = [] + for name, _total in sorted_authors: + email = name_best_email[name] + user = email_map.get(email) + + # Fallback: try to extract username from noreply address. + if not user: + login = resolve_noreply_username(email) + if login: + user = { + "login": login, + "avatar_url": f"https://avatars.githubusercontent.com/{login}", + } + + if user: + login = user["login"] + base_avatar = user["avatar_url"].split("?")[0] + avatar = base_avatar + "?s=64" + profile = f"https://github.com/{login}" + cells.append( + f'' + f'' + f'{name}
' + f'{name}' + f'
' + ) + else: + cells.append( + f'' + f'{name}' + f'' + ) + + rows = [""] + for i in range(0, len(cells), COLUMNS): + rows.append(" ") + for cell in cells[i : i + COLUMNS]: + rows.append(f" {cell}") + rows.append(" ") + rows.append("
") + + return "\n".join(rows) + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- + +def main() -> None: + workspace_dir = Path(os.environ["BUILD_WORKSPACE_DIRECTORY"]) + output_path = workspace_dir / "README.md" + + template = runfile("tools/readme_template.md").read_text() + help_root = runfile("tools/help_root.txt").read_text() + help_gen = runfile("tools/help_generate_hashes.txt").read_text() + help_get = runfile("tools/help_get_impacted_targets.txt").read_text() + + print("Fetching GitHub user data...") + email_map = fetch_github_email_map(REPO) + + print("Building contributors table...") + contributors_section = build_contributors_section(workspace_dir, email_map) + + cli_help_section = build_cli_help_section(help_root, help_gen, help_get) + + readme = inject_section(template, "cli-help", cli_help_section) + readme = inject_section(readme, "contributors", contributors_section) + + output_path.write_text(readme) + print(f"README.md written to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/tools/readme_template.md b/tools/readme_template.md new file mode 100644 index 0000000..8323d36 --- /dev/null +++ b/tools/readme_template.md @@ -0,0 +1,306 @@ +# bazel-diff + +[![Build status](https://github.com/Tinder/bazel-diff/actions/workflows/ci.yaml/badge.svg?branch=master)](https://github.com/Tinder/bazel-diff/actions/workflows/ci.yaml) + +`bazel-diff` is a command line tool for Bazel projects that allows users to determine the exact affected set of impacted targets between two Git revisions. Using this set, users can test or build the exact modified set of targets. + +`bazel-diff` offers several key advantages over rolling your own target diffing solution + +1. `bazel-diff` is designed for very large Bazel projects. We use Java Protobuf's `parseDelimitedFrom` method alongside Bazel Query's `streamed_proto` output option. 
These two together allow you to parse Gigabyte or larger protobuf messages. We have tested it with projects containing tens of thousands of targets. +2. We avoid usage of large command line query lists when interacting with Bazel, [issue here](https://github.com/bazelbuild/bazel/issues/8609). When you interact with Bazel with thousands of query parameters you can reach an upper maximum limit, seeing this error `bash: /usr/local/bin/bazel: Argument list too long`. `bazel-diff` is smart enough to avoid these errors. +3. `bazel-diff` has been tested with file renames, deletions, and modifications. Works on `bzl` files, `WORKSPACE` files, `BUILD` files and regular files + +Track the feature request for target diffing in Bazel [here](https://github.com/bazelbuild/bazel/issues/7962) + +This approach was inspired by the [following BazelConf talk](https://www.youtube.com/watch?v=9Dk7mtIm7_A) by Benjamin Peterson. + +> There are simpler and faster ways to approximate the affected set of targets. +> However an incorrect solution can result in a system you can't trust, +> because tests could be broken at a commit where you didn't select to run them. +> Then you can't rely on green-to-red (or red-to-green) transitions and +> lose much of the value from your CI system as breakages can be discovered +> later on unrelated commits. + +## Prerequisites + +* Git +* Bazel 3.3.0 or higher +* Java 8 JDK or higher (Bazel requires this) + +## Getting Started + +To start using `bazel-diff` immediately, simply clone down the repo and then run the example shell script: + +```terminal +git clone https://github.com/Tinder/bazel-diff.git +cd bazel-diff +./bazel-diff-example.sh WORKSPACE_PATH BAZEL_PATH START_GIT_REVISION END_GIT_REVISION +``` + +Here is a breakdown of those arguments: + +* `WORKSPACE_PATH`: Path to directory containing your `WORKSPACE` file in your Bazel project. 
+* `BAZEL_PATH`: Path to your Bazel executable +* `START_GIT_REVISION`: Starting Git Branch or SHA for your desired commit range +* `END_GIT_REVISION`: Final Git Branch or SHA for your desired commit range + +You can see the example shell script in action below: + +![Demo](demo.gif) + +Open `bazel-diff-example.sh` to see how this is implemented. This is purely an example use-case, but it is a great starting point for using `bazel-diff`. + +## With Aspect CLI + +Aspect's Extension Language (AXL) allows the shell script above to be expressed in Starlark, and exposed as an `impacted` command on your terminal. + +See https://github.com/aspect-extensions/impacted + +## How it works + +`bazel-diff` works as follows + +* The previous revision is checked out, then we run `generate-hashes`. This gives us the hashmap representation for the entire Bazel graph, then we write this JSON to a file. + +* Next we check out the final revision, then we run `generate-hashes` and write that JSON to a file. Now we have our final hashmap representation for the Bazel graph. + +* We run `bazel-diff` on the starting and final JSON hash filepaths to get our impacted set of targets. This impacted set of targets is written to a file. + +## Build Graph Distance Metrics + +`bazel-diff` can optionally compute build graph distance metrics between two revisions. This is +useful for understanding the impact of a change on the build graph. Directly impacted targets are +targets that have had their rule attributes or source file dependencies changed. Indirectly impacted +targets are those that are impacted only due to a change in one of their target dependencies. + +For each target, the following metrics are computed: + +* `target_distance`: The number of dependency hops that it takes to get from an impacted target to a directly impacted target. +* `package_distance`: The number of dependency hops that cross a package boundary to get from an impacted target to a directly impacted target.
+ +Build graph distance metrics can be used by downstream tools to power features such as: + +* Only running sanitizers on impacted tests that are in the same package as a directly impacted target. +* Only running large-sized tests that are within a few package hops of a directly impacted target. +* Only running computationally expensive jobs when an impacted target is within a certain distance of a directly impacted target. + +To enable this feature, you must generate a dependency mapping on your final revision when computing hashes, then pass it into the `get-impacted-targets` command. + +```bash +git checkout BASE_REV +bazel-diff generate-hashes -w /path/to/workspace -b bazel starting_hashes.json + +git checkout FINAL_REV +bazel-diff generate-hashes -w /path/to/workspace -b bazel --depEdgesFile deps.json final_hashes.json + +bazel-diff get-impacted-targets -w /path/to/workspace -b bazel -sh starting_hashes.json -fh final_hashes.json --depEdgesFile deps.json -o impacted_targets.json +``` + +This will produce an impacted targets json list with target label, target distance, and package distance: + +```text +[ + {"label": "//foo:bar", "targetDistance": 0, "packageDistance": 0}, + {"label": "//foo:baz", "targetDistance": 1, "packageDistance": 0}, + {"label": "//bar:qux", "targetDistance": 1, "packageDistance": 1} +] +``` + + + + +### What does the SHA256 value of `generate-hashes` represent? + +`generate-hashes` is a canonical SHA256 value representing all attributes and inputs into a target. These inputs +are the summation of the rule implementation hash, the SHA256 value +for every attribute of the rule and then the summation of the SHA256 value for +all `rule_inputs` using the same exact algorithm. For source_file inputs the +content of the file are converted into a SHA256 value. 
+ +## Installing + +### Integrate into your project (recommended) + +First, add the following snippet to your project: + +#### Bzlmod snippet + +```bazel +bazel_dep(name = "bazel-diff", version = "17.1.0") +``` + +You can now run the tool with: + +```terminal +bazel run @bazel-diff//cli:bazel-diff +``` + +#### WORKSPACE snippet + +```bazel +http_jar = use_repo_rule("@bazel_tools//tools/build_defs/repo:http.bzl", "http_jar") +http_jar( + name = "bazel-diff", + urls = [ + "https://github.com/Tinder/bazel-diff/releases/download/7.0.0/bazel-diff_deploy.jar" + ], + sha256 = "0b9e32f9c20e570846b083743fe967ae54d13e2a1f7364983e0a7792979442be", +) +``` + +Second, add in your root `BUILD.bazel` file: + +```bazel +load("@rules_java//java:defs.bzl", "java_binary") + +java_binary( + name = "bazel-diff", + main_class = "com.bazel_diff.Main", + runtime_deps = ["@bazel-diff//jar"], +) +``` + +That's it! You can now run the tool with: + +```terminal +bazel run //:bazel-diff +``` + +> Note, in releases prior to 2.0.0 the value for the `main_class` attribute is just `BazelDiff` + +### Run Via JAR Release + +```terminal +curl -Lo bazel-diff.jar https://github.com/Tinder/bazel-diff/releases/latest/download/bazel-diff_deploy.jar +java -jar bazel-diff.jar -h +``` + +### Build from Source + +After cloning down the repo, you are good to go, Bazel will handle the rest + +To run the project + +```terminal +bazel run :bazel-diff -- bazel-diff -h +``` + +#### Debugging (when running from source) + +To run `bazel-diff` with debug logging, run your commands with the `verbose` config like so: + +```terminal +bazel run :bazel-diff --config=verbose -- bazel-diff -h +``` + +### Build your own deployable JAR + +```terminal +bazel build //cli:bazel-diff_deploy.jar +java -jar bazel-bin/cli/bazel-diff_deploy.jar # This JAR can be run anywhere +``` + +### Build from source in your Bazel Project + +Add the following to your `WORKSPACE` file to add the external repositories, replacing the 
`RELEASE_ARCHIVE_URL` with the archive url of the bazel-diff release you wish to depend on: + +```bazel +load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive") + +http_archive( + name = "bazel-diff", + urls = [ + "RELEASE_ARCHIVE_URL", + ], + sha256 = "UPDATE_ME", + strip_prefix = "UPDATE_ME" +) + +load("@bazel-diff//:repositories.bzl", "bazel_diff_dependencies") + +bazel_diff_dependencies() + +load("@rules_jvm_external//:defs.bzl", "maven_install") +load("@bazel-diff//:artifacts.bzl", "BAZEL_DIFF_MAVEN_ARTIFACTS") + +maven_install( + name = "bazel_diff_maven", + artifacts = BAZEL_DIFF_MAVEN_ARTIFACTS, + repositories = [ + "http://uk.maven.org/maven2", + "https://jcenter.bintray.com/", + ], +) +``` + +Now you can simply run `bazel-diff` from your project: + +```terminal +bazel run @bazel-diff//cli:bazel-diff -- bazel-diff -h +``` + +## Contributors + + + + +## Learn More + +Take a look at the following bazelcon talks to learn more about `bazel-diff`: + +* [BazelCon 2023: Improving CI efficiency with Bazel querying and bazel-diff](https://www.youtube.com/watch?v=QYAbmE_1fSo) +* [BazelCon 2024: Not Going the Distance: Filtering Tests by Build Graph Distance](https://youtu.be/Or0o0Q7Zc1w?si=nIIkTH6TP-pcPoRx) + +## Star History + + + + + + Star History Chart + + + +## Running the tests + +To run the tests simply run + +```terminal +bazel test //... +``` + +## Versioning + +We use [SemVer](http://semver.org/) for versioning. For the versions available, +see the [tags on this repository](https://github.com/Tinder/bazel-diff/tags). + +## License + +--- + +```text +Copyright (c) 2020, Match Group, LLC +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + * Neither the name of Match Group, LLC nor the names of its contributors + may be used to endorse or promote products derived from this software + without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL MATCH GROUP, LLC BE LIABLE FOR ANY +DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +```
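
Note on the sentinel replacement in `tools/generate_readme.py`: `inject_section` passes the generated content to `re.subn` as a replacement string, and `re` interprets backslash escapes in replacement strings. CLI help text that happens to contain a backslash could therefore raise an error or be altered. The sketch below is one possible variation, not the code in this change: it uses a callable replacement, which `re` inserts literally. The `<!-- BEGIN:... -->` and `<!-- END:... -->` comment spelling is an assumption made for illustration, since the actual sentinel text is not visible in this diff.

```python
import re


def inject_section(text: str, marker: str, content: str) -> str:
    """Replace whatever sits between the BEGIN/END sentinels for `marker`."""
    # Assumed sentinel spelling; the real template may use different comments.
    begin = f"<!-- BEGIN:{marker} -->"
    end = f"<!-- END:{marker} -->"
    pattern = re.compile(
        rf"({re.escape(begin)}\n).*?({re.escape(end)})",
        re.DOTALL,
    )
    # A callable replacement is inserted literally, so backslashes in
    # `content` are not parsed as regex escapes or group references.
    result, count = pattern.subn(
        lambda m: f"{m.group(1)}{content}\n{m.group(2)}", text
    )
    if count == 0:
        raise ValueError(f"Sentinel markers for '{marker}' not found in template")
    return result


# Hypothetical template and content, chosen to include a backslash.
template = "intro\n<!-- BEGIN:cli-help -->\nstale\n<!-- END:cli-help -->\noutro\n"
rendered = inject_section(template, "cli-help", r"Usage: tool \d")
print(rendered)
```

The stale text between the sentinels is replaced while the sentinels themselves are kept, so the rendered file can be used as a template again on the next run.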