diff --git a/README.md b/README.md index 6891a425..67c7a6d5 100644 --- a/README.md +++ b/README.md @@ -135,7 +135,9 @@ jsdiff's diff functions all take an old text and a new text and perform three st * `formatPatch(patch[, headerOptions])` - creates a unified diff patch. - `patch` may be either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`). The optional `headerOptions` argument behaves the same as the `headerOptions` option of `createTwoFilesPatch`. + `patch` may be either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`). The optional `headerOptions` argument behaves the same as the `headerOptions` option of `createTwoFilesPatch`, except that it is ignored for Git patches (i.e. patches where `isGit` is `true`). + + When a patch has `isGit: true`, `formatPatch` output is changed to more closely match Git's output: it emits a `diff --git` header, emits Git extended headers as appropriate based on properties like `isRename`, `isCreate`, `newMode`, etc, and always emits `---`/`+++` file headers when hunks are present but omits them when there are no hunks (e.g. renames without content changes). The `headerOptions` parameter has no effect on Git patches since the header format is fully determined by the Git extended header properties. * `structuredPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]])` - returns an object with an array of hunk objects. @@ -184,14 +186,26 @@ jsdiff's diff functions all take an old text and a new text and perform three st Once all patches have been applied or an error occurs, the `options.complete(err)` callback is made. -* `parsePatch(diffStr)` - Parses a patch into structured data +* `parsePatch(diffStr)` - Parses a unified diff format patch into a structured patch object. 
+ + Returns a JSON object representation of the patch, suitable for use with the `applyPatch` method. This parses to the same structure returned by `structuredPatch`, except that `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough information to determine them (e.g. a hunk-only patch with no file headers). - Return a JSON object representation of the patch, suitable for use with the `applyPatch` method. This parses to the same structure returned by `structuredPatch`. + `parsePatch` has some understanding of [Git's particular dialect of unified diff format](https://git-scm.com/docs/git-diff#generate_patch_text_with_p). When parsing a Git patch, each index in the result may contain the following additional fields not included in the data structure returned by `structuredPatch`: + - `isGit` - set to `true` when parsing from a Git-style patch. + - `isRename` - set to `true` when parsing a Git diff that includes `rename from`/`rename to` extended headers, indicating the file was renamed (and the old file no longer exists). Consumers applying the patch should delete the old file. + - `isCopy` - set to `true` when parsing a Git diff that includes `copy from`/`copy to` extended headers, indicating the file was copied (and the old file still exists). Consumers applying the patch should NOT delete the old file. + - `isCreate` - set to `true` when parsing a Git diff that includes a `new file mode` extended header, indicating the file was newly created. + - `isDelete` - set to `true` when parsing a Git diff that includes a `deleted file mode` extended header, indicating the file was deleted. + - `oldMode` - the file mode (e.g. `'100644'`, `'100755'`) of the old file, parsed from Git extended headers (`old mode` or `deleted file mode`). + - `newMode` - the file mode (e.g. `'100644'`, `'100755'`) of the new file, parsed from Git extended headers (`new mode` or `new file mode`). 
+ - `isBinary` - set to `true` when parsing a Git diff that includes a `Binary files ... differ` line, indicating a binary file change. Binary patches have no hunks, so the patch content alone is not sufficient to apply the change; consumers should handle this case specially (e.g. by warning the user or fetching the binary content separately). * `reversePatch(patch)` - Returns a new structured patch which when applied will undo the original `patch`. `patch` may be either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`). + When `patch` is a Git-style patch, `reversePatch` handles extended header information (relating to renames, file modes, etc.) to the extent that doing so is possible, but note one fundamental limitation: the correct inverse of a patch featuring `copy from`/`copy to` headers cannot, in general, be determined based on the information contained in the patch alone, and so `reversePatch`'s output when passed such a patch will usually be rejected by `git apply`. (The correct inverse would be a patch that deletes the newly-created file, but for Git to apply such a patch, the patch must explicitly delete every line of content in the file too, and that content cannot be determined from the original patch on its own. `reversePatch` therefore does the only vaguely reasonable thing it can do in this scenario: it outputs a patch with a `deleted file mode` header - indicating that the file should be deleted - but no hunks.) 
+ * `convertChangesToXML(changes)` - converts a list of change objects to a serialized XML format * `convertChangesToDMP(changes)` - converts a list of change objects to the format returned by Google's [diff-match-patch](https://github.com/google/diff-match-patch) library @@ -360,6 +374,70 @@ applyPatches(patch, { }); ``` +##### Applying a multi-file Git patch that may include renames and mode changes + +[Git patches](https://git-scm.com/docs/git-diff#generate_patch_text_with_p) can include file renames and copies (with or without content changes), which need to be handled in the callbacks you provide to `applyPatches`. `parsePatch` sets `isRename` or `isCopy` on the structured patch object so you can distinguish these cases. Patches can also potentially include file *swaps* (renaming `a → b` and `b → a`), in which case it is incorrect to simply apply each change atomically in sequence. The pattern with the `pendingWrites` Map below handles all of these nuances: + +``` +const {applyPatches} = require('diff'); +const patch = fs.readFileSync("git-diff.patch").toString(); +const DELETE = Symbol('delete'); +const pendingWrites = new Map(); // filePath → {content, mode} or DELETE sentinel +applyPatches(patch, { + loadFile: (patch, callback) => { + if (patch.isCreate) { + // Newly created file — no old content to load + callback(undefined, ''); + return; + } + try { + // Git diffs use a/ and b/ prefixes; strip them to get the real path + const filePath = patch.oldFileName.replace(/^a\//, ''); + callback(undefined, fs.readFileSync(filePath).toString()); + } catch (e) { + callback(`No such file: ${patch.oldFileName}`); + } + }, + patched: (patch, patchedContent, callback) => { + if (patchedContent === false) { + callback(`Failed to apply patch to ${patch.oldFileName}`); + return; + } + const oldPath = patch.oldFileName.replace(/^a\//, ''); + const newPath = patch.newFileName.replace(/^b\//, ''); + if (patch.isDelete) { + if (!pendingWrites.has(oldPath)) { + 
pendingWrites.set(oldPath, DELETE); + } + } else { + pendingWrites.set(newPath, {content: patchedContent, mode: patch.newMode}); + // For renames, delete the old file (but not for copies, + // where the old file should be kept) + if (patch.isRename && !pendingWrites.has(oldPath)) { + pendingWrites.set(oldPath, DELETE); + } + } + callback(); + }, + complete: (err) => { + if (err) { + console.log("Failed with error:", err); + return; + } + for (const [filePath, entry] of pendingWrites) { + if (entry === DELETE) { + fs.unlinkSync(filePath); + } else { + fs.writeFileSync(filePath, entry.content); + if (entry.mode) { + fs.chmodSync(filePath, entry.mode.slice(-3)); + } + } + } + } +}); +``` + ## Compatibility jsdiff should support all ES5 environments. If you find one that it doesn't support, please [open an issue](https://github.com/kpdecker/jsdiff/issues). diff --git a/karma.conf.js b/karma.conf.js index 82c09b40..1f28cf87 100644 --- a/karma.conf.js +++ b/karma.conf.js @@ -7,6 +7,11 @@ export default function(config) { files: [ 'test/**/*.js' ], + exclude: [ + // The code being tested by this suite heavily involves Node.js + // filesystem operations, so doesn't make sense to run in a browser: + 'test/patch/readme-rename-example.js' + ], preprocessors: { 'test/**/*.js': ['webpack', 'sourcemap'] }, diff --git a/release-notes.md b/release-notes.md index 00a27708..881e0557 100644 --- a/release-notes.md +++ b/release-notes.md @@ -1,5 +1,45 @@ # Release Notes +## 9.0.0 (prerelease) + +(All changes part of PR [#672](https://github.com/kpdecker/jsdiff/pull/672).) + +- **C-style quoted strings in filename headers are now properly supported**. + + When the name of either the old or new file in a patch contains "special characters", both GNU `diff` and Git quote the filename in the patch's headers and escape special characters using the same escape sequences that are used in string literals in C, including octal escapes for all non-ASCII characters. 
Previously, jsdiff had very little support for this; `parsePatch` would remove the quotes, and unescape any escaped backslashes, but would not unescape other escape sequences. `formatPatch`, meanwhile, did not quote or escape special characters at all. + + Now, `parsePatch` parses all the possible escape sequences that GNU diff (or Git) ever output, and `formatPatch` quotes and escapes filenames containing special characters in the same way GNU diff does. + +- **`formatPatch` now omits file headers when `oldFileName` or `newFileName` in the provided patch object are `undefined`**, regardless of the `headerOptions` parameter. (Previously, it would treat the absence of `oldFileName` or `newFileName` as indicating the filename was the word "undefined" and emit headers `--- undefined` / `+++ undefined`.) + +- **`formatPatch` no longer outputs trailing tab characters at the end of `---`/`+++` headers.** + + Previously, if `formatPatch` was passed a patch object to serialize that had empty strings for the `oldHeader` or `newHeader` property, it would include a trailing tab character after the filename in the `---` and/or `+++` file header. Now, this scenario is treated the same as when `oldHeader`/`newHeader` is `undefined` - i.e. the trailing tab is omitted. + +- **`formatPatch` no longer mutates its input** when serializing a patch containing a hunk where either the old or new content contained zero lines. (Such a hunk occurs only when the hunk has no context lines and represents a pure insertion or pure deletion, which for instance will occur whenever one of the two files being diffed is completely empty.) Previously `formatPatch` would provide the correct output but also mutate the `oldLines` or `newLines` property on the hunk, changing the meaning of the underlying patch. + +- **Git-style patches are now supported by `parsePatch`, `formatPatch`, and `reversePatch`**. 
+ + Patches output by `git diff` can include some features that are unlike those output by GNU `diff`, and therefore not handled by an ordinary unified diff format parser. An ordinary diff simply describes the differences between the *content* of two files, but Git diffs can also indicate, via "extended headers", the creation or deletion of (potentially empty) files, indicate that a file was renamed, and contain information about file mode changes. Furthermore, when these changes appear in a diff in the absence of a content change (e.g. when an empty file is created, or a file is renamed without content changes), the patch will contain no associated `---`/`+++` file headers nor any hunks. + + jsdiff previously did not support parsing Git's extended headers, nor hunkless patches. Now `parsePatch` parses some of the extended headers, parses hunkless Git patches, and can determine filenames (e.g. from the extended headers) when parsing a patch that includes no `---` or `+++` file headers. The additional information conveyed by the extended headers we support is recorded on new fields on the result object returned by `parsePatch`. See `isGit` and subsequent properties in the docs in the README.md file. + + `formatPatch` now outputs extended headers based on these new Git-specific properties, and `reversePatch` respects them as far as possible (with one unavoidable caveat noted in the README.md file). + +- **Unpaired file headers now cause `parsePatch` to throw**. + + It remains acceptable to have a patch with no file headers whatsoever (e.g. one that begins with a `@@` hunk header on the very first line), but a patch with *only* a `---` header or only a `+++` header is now considered an error. + +- **`parsePatch` is now more tolerant of "trailing garbage"** + + That is: after a patch, or between files/indexes in a patch, it is now acceptable to have arbitrary lines of "garbage" (so long as they unambiguously have no syntactic meaning - e.g. 
trailing garbage that leads with a `+`, `-`, or ` ` and thus is interpretable as part of a hunk still triggers a throw). + + This means we no longer reject patches output by tools that include extra data in "garbage" lines not understood by generic unified diff parsers. (For example, SVN patches can include "Property changes on:" lines that generic unified diff parsers should discard as garbage; jsdiff previously threw errors when encountering them.) + + This change brings jsdiff's behaviour more in line with GNU `patch`, which is highly permissive of "garbage". + +- **The `oldFileName` and `newFileName` fields of `StructuredPatch` are now typed as `string | undefined` instead of `string`**. This type change reflects the (pre-existing) reality that `parsePatch` can produce patches without filenames (e.g. when parsing a patch that simply contains hunks with no file headers). + ## 8.0.4 - [#667](https://github.com/kpdecker/jsdiff/pull/667) - **fix another bug in `diffWords` when used with an `Intl.Segmenter`**. If the text to be diffed included a combining mark after a whitespace character (i.e. roughly speaking, an accented space), `diffWords` would previously crash. Now this case is handled correctly. diff --git a/src/patch/create.ts b/src/patch/create.ts index 13c2a2b1..14da369c 100644 --- a/src/patch/create.ts +++ b/src/patch/create.ts @@ -1,6 +1,76 @@ import {diffLines} from '../diff/line.js'; import type { StructuredPatch, DiffLinesOptionsAbortable, DiffLinesOptionsNonabortable, AbortableDiffOptions, ChangeObject } from '../types.js'; +/** + * Returns true if the filename contains characters that require C-style + * quoting (as used by Git and GNU diffutils in diff output). 
+ */ +function needsQuoting(s: string): boolean { + for (let i = 0; i < s.length; i++) { + if (s[i] < '\x20' || s[i] > '\x7e' || s[i] === '"' || s[i] === '\\') { + return true; + } + } + return false; +} + +/** + * C-style quotes a filename, encoding special characters as escape sequences + * and non-ASCII bytes as octal escapes. This is the inverse of + * `parseQuotedFileName` in parse.ts. + * + * Non-ASCII bytes are encoded as UTF-8 before being emitted as octal escapes. + * This matches the behaviour of both Git and GNU diffutils, which always emit + * UTF-8 octal escapes regardless of the underlying filesystem encoding (e.g. + * Git for Windows converts from NTFS's UTF-16 to UTF-8 internally). + * + * If the filename doesn't need quoting, returns it as-is. + */ +function quoteFileNameIfNeeded(s: string): string { + if (!needsQuoting(s)) { + return s; + } + + let result = '"'; + const bytes = new TextEncoder().encode(s); + let i = 0; + while (i < bytes.length) { + const b = bytes[i]; + + // See https://en.wikipedia.org/wiki/Escape_sequences_in_C#Escape_sequences + if (b === 0x07) { + result += '\\a'; + } else if (b === 0x08) { + result += '\\b'; + } else if (b === 0x09) { + result += '\\t'; + } else if (b === 0x0a) { + result += '\\n'; + } else if (b === 0x0b) { + result += '\\v'; + } else if (b === 0x0c) { + result += '\\f'; + } else if (b === 0x0d) { + result += '\\r'; + } else if (b === 0x22) { + result += '\\"'; + } else if (b === 0x5c) { + result += '\\\\'; + } else if (b >= 0x20 && b <= 0x7e) { + // Just a printable ASCII character that is neither a double quote nor a + // backslash; no need to escape it. 
+ result += String.fromCharCode(b); + } else { + // Either part of a non-ASCII character or a control character without a + // special escape sequence; needs escaping as a 3-digit octal escape + result += '\\' + b.toString(8).padStart(3, '0'); + } + i++; + } + result += '"'; + return result; +} + type StructuredPatchCallbackAbortable = (patch: StructuredPatch | undefined) => void; type StructuredPatchCallbackNonabortable = (patch: StructuredPatch) => void; @@ -274,14 +344,23 @@ export function structuredPatch( /** * creates a unified diff patch. - * @param patch either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`) + * + * @param patch either a single structured patch object (as returned by `structuredPatch`) or an + * array of them (as returned by `parsePatch`). + * @param headerOptions behaves the same as the `headerOptions` option of `createTwoFilesPatch`. + * Ignored for patches where `isGit` is `true`. + * + * When a patch has `isGit: true`, `formatPatch` output is changed to more closely match Git's + * output: it emits a `diff --git` header, emits Git extended headers as appropriate based on + * properties like `isRename`, `isCreate`, `newMode`, etc, and will omit `---`/`+++` file + * headers for patches with no hunks (e.g. renames without content changes). */ export function formatPatch(patch: StructuredPatch | StructuredPatch[], headerOptions?: HeaderOptions): string { if (!headerOptions) { headerOptions = INCLUDE_HEADERS; } if (Array.isArray(patch)) { - if (patch.length > 1 && !headerOptions.includeFileHeaders) { + if (patch.length > 1 && !headerOptions.includeFileHeaders && !patch.every(p => p.isGit)) { throw new Error( 'Cannot omit file headers on a multi-file patch. 
' + '(The result would be unparseable; how would a tool trying to apply ' @@ -292,15 +371,56 @@ export function formatPatch(patch: StructuredPatch | StructuredPatch[], headerOp } const ret = []; - if (headerOptions.includeIndex && patch.oldFileName == patch.newFileName) { - ret.push('Index: ' + patch.oldFileName); - } - if (headerOptions.includeUnderline) { - ret.push('==================================================================='); + + // Git patches have a fixed header format (diff --git, extended headers, + // and ---/+++ when hunks are present), so headerOptions is ignored. + if (patch.isGit) { + headerOptions = INCLUDE_HEADERS; + // Emit Git-style diff --git header and extended headers. + // Git never puts /dev/null in the "diff --git" line; for file + // creations/deletions it uses the real filename on both sides. + let gitOldName = patch.oldFileName ?? ''; + let gitNewName = patch.newFileName ?? ''; + if (patch.isCreate && gitOldName === '/dev/null') { + gitOldName = gitNewName.replace(/^b\//, 'a/'); + } else if (patch.isDelete && gitNewName === '/dev/null') { + gitNewName = gitOldName.replace(/^a\//, 'b/'); + } + ret.push('diff --git ' + quoteFileNameIfNeeded(gitOldName) + ' ' + quoteFileNameIfNeeded(gitNewName)); + if (patch.isDelete) { + ret.push('deleted file mode ' + (patch.oldMode ?? '100644')); + } + if (patch.isCreate) { + ret.push('new file mode ' + (patch.newMode ?? '100644')); + } + if (patch.oldMode && patch.newMode && !patch.isDelete && !patch.isCreate) { + ret.push('old mode ' + patch.oldMode); + ret.push('new mode ' + patch.newMode); + } + if (patch.isRename) { + ret.push('rename from ' + quoteFileNameIfNeeded((patch.oldFileName ?? '').replace(/^a\//, ''))); + ret.push('rename to ' + quoteFileNameIfNeeded((patch.newFileName ?? '').replace(/^b\//, ''))); + } + if (patch.isCopy) { + ret.push('copy from ' + quoteFileNameIfNeeded((patch.oldFileName ?? 
'').replace(/^a\//, ''))); + ret.push('copy to ' + quoteFileNameIfNeeded((patch.newFileName ?? '').replace(/^b\//, ''))); + } + } else { + if (headerOptions.includeIndex && patch.oldFileName == patch.newFileName && patch.oldFileName !== undefined) { + ret.push('Index: ' + patch.oldFileName); + } + if (headerOptions.includeUnderline) { + ret.push('==================================================================='); + } } - if (headerOptions.includeFileHeaders) { - ret.push('--- ' + patch.oldFileName + (typeof patch.oldHeader === 'undefined' ? '' : '\t' + patch.oldHeader)); - ret.push('+++ ' + patch.newFileName + (typeof patch.newHeader === 'undefined' ? '' : '\t' + patch.newHeader)); + + // Emit --- / +++ file headers. For Git patches with no hunks (e.g. + // pure renames, mode-only changes), Git omits these, so we do too. + const hasHunks = patch.hunks.length > 0; + if (headerOptions.includeFileHeaders && patch.oldFileName !== undefined && patch.newFileName !== undefined + && (!patch.isGit || hasHunks)) { + ret.push('--- ' + quoteFileNameIfNeeded(patch.oldFileName) + (patch.oldHeader ? '\t' + patch.oldHeader : '')); + ret.push('+++ ' + quoteFileNameIfNeeded(patch.newFileName) + (patch.newHeader ? '\t' + patch.newHeader : '')); } for (let i = 0; i < patch.hunks.length; i++) { @@ -308,15 +428,11 @@ export function formatPatch(patch: StructuredPatch | StructuredPatch[], headerOp // Unified Diff Format quirk: If the chunk size is 0, // the first number is one lower than one would expect. // https://www.artima.com/weblogs/viewpost.jsp?thread=164293 - if (hunk.oldLines === 0) { - hunk.oldStart -= 1; - } - if (hunk.newLines === 0) { - hunk.newStart -= 1; - } + const oldStart = hunk.oldLines === 0 ? hunk.oldStart - 1 : hunk.oldStart; + const newStart = hunk.newLines === 0 ? 
hunk.newStart - 1 : hunk.newStart; ret.push( - '@@ -' + hunk.oldStart + ',' + hunk.oldLines - + ' +' + hunk.newStart + ',' + hunk.newLines + '@@ -' + oldStart + ',' + hunk.oldLines + + ' +' + newStart + ',' + hunk.newLines + ' @@' ); for (const line of hunk.lines) { diff --git a/src/patch/parse.ts b/src/patch/parse.ts index ac6bb2b5..d894f302 100755 --- a/src/patch/parse.ts +++ b/src/patch/parse.ts @@ -1,49 +1,194 @@ import type { StructuredPatch } from '../types.js'; /** - * Parses a patch into structured data, in the same structure returned by `structuredPatch`. + * Parses a unified diff format patch into a structured patch object. * - * @return a JSON object representation of the patch, suitable for use with the `applyPatch` method. + * `parsePatch` has some understanding of Git's particular dialect of unified diff format. + * When parsing a Git patch, each index in the result may contain additional + * fields (`isRename`, `isBinary`, etc) not included in the data structure returned by + * `structuredPatch`; see the `StructuredPatch` interface for a full list. + * + * @return a JSON object representation of the patch, suitable for use with the `applyPatch` + * method. This parses to the same structure returned by `structuredPatch`, except that + * `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough + * information to determine them (e.g. a hunk-only patch with no file headers). */ export function parsePatch(uniDiff: string): StructuredPatch[] { const diffstr = uniDiff.split(/\n/), list: Partial<StructuredPatch>[] = []; let i = 0; + // These helper functions identify line types that can appear between files + // in a multi-file patch. Keeping them in one place avoids subtle + // inconsistencies from having the same regexes duplicated in multiple places. + + // Matches `diff --git ...` lines specifically. 
+ function isGitDiffHeader(line: string): boolean { + return (/^diff --git /).test(line); + } + + // Matches lines that denote the start of a new diff's section in a + // multi-file patch: `diff --git ...`, `Index: ...`, or `diff -r ...`. + function isDiffHeader(line: string): boolean { + return isGitDiffHeader(line) + || (/^Index:\s/).test(line) + || (/^diff(?: -r \w+)+\s/).test(line); + } + + // Matches `--- ...` and `+++ ...` file header lines. + function isFileHeader(line: string): boolean { + return (/^(---|\+\+\+)\s/).test(line); + } + + // Matches `@@ ...` hunk header lines. + function isHunkHeader(line: string): boolean { + return (/^@@\s/).test(line); + } + function parseIndex() { const index: Partial<StructuredPatch> = {}; + index.hunks = []; list.push(index); // Parse diff metadata + let seenDiffHeader = false; while (i < diffstr.length) { const line = diffstr[i]; - // File header found, end parsing diff metadata - if ((/^(---|\+\+\+|@@)\s/).test(line)) { + // File header (---, +++) or hunk header (@@) found; end parsing diff metadata + if (isFileHeader(line) || isHunkHeader(line)) { break; } - // Try to parse the line as a diff header, like - // Index: README.md - // or - // diff -r 9117c6561b0b -r 273ce12ad8f1 .hgignore - // or - // Index: something with multiple words - // and extract the filename (or whatever else is used as an index name) - // from the end (i.e. 'README.md', '.hgignore', or - // 'something with multiple words' in the examples above). + // The next two branches handle recognized diff headers. Note that + // isDiffHeader deliberately does NOT match arbitrary `diff` + // commands like `diff -u -p -r1.1 -r1.2`, because in some + // formats (e.g. CVS diffs) such lines appear as metadata within + // a single file's header section, after an `Index:` line. See the + // diffx documentation (https://diffx.org) for examples. // - // TODO: It seems awkward that we indiscriminately trim off trailing - // whitespace here. 
Theoretically, couldn't that be meaningful - - // e.g. if the patch represents a diff of a file whose name ends - // with a space? Seems wrong to nuke it. - // But this behaviour has been around since v2.2.1 in 2015, so if - // it's going to change, it should be done cautiously and in a new - // major release, for backwards-compat reasons. - // -- ExplodingCabbage - const headerMatch = (/^(?:Index:|diff(?: -r \w+)+)\s+/).exec(line); - if (headerMatch) { - index.index = line.substring(headerMatch[0].length).trim(); + // In both branches: if we've already seen a diff header for *this* + // file and now we encounter another one, it must belong to the + // next file, so break. + + if (isGitDiffHeader(line)) { + if (seenDiffHeader) { + return; + } + seenDiffHeader = true; + index.isGit = true; + + // Parse the old and new filenames from the `diff --git` header and + // tentatively set oldFileName and newFileName from them. These may + // be overridden below by `rename from` / `rename to` or `copy from` / + // `copy to` extended headers, or by --- and +++ lines. But for Git + // diffs that lack all of those (e.g. mode-only changes, binary + // file changes without rename), these are the only filenames we + // get. + // parseGitDiffHeader returns null if the header can't be parsed + // (e.g. unterminated quoted filename, or unexpected format). In + // that case we skip setting filenames here; they may still be + // set from --- / +++ or rename from / rename to lines below. + const paths = parseGitDiffHeader(line); + if (paths) { + index.oldFileName = paths.oldFileName; + index.newFileName = paths.newFileName; + } + + // Consume Git extended headers (`old mode`, `new mode`, `rename from`, + // `rename to`, `similarity index`, `index`, `Binary files ... differ`, + // etc.) + i++; + while (i < diffstr.length) { + const extLine = diffstr[i]; + + // Stop consuming extended headers if we hit a file header, + // hunk header, or another diff header. 
+ if (isFileHeader(extLine) || isHunkHeader(extLine) || isDiffHeader(extLine)) { + break; + } + + // Parse `rename from` / `rename to` lines - these give us + // unambiguous filenames. These lines don't include the + // a/ and b/ prefixes that appear in the `diff --git` header + // and --- / +++ lines, so we add them for consistency. + // Git C-style quotes filenames containing special characters + // (tabs, newlines, backslashes, double quotes), so we must + // unquote them when present. + const renameFromMatch = (/^rename from (.*)/).exec(extLine); + if (renameFromMatch) { + index.oldFileName = 'a/' + unquoteIfQuoted(renameFromMatch[1]); + index.isRename = true; + } + const renameToMatch = (/^rename to (.*)/).exec(extLine); + if (renameToMatch) { + index.newFileName = 'b/' + unquoteIfQuoted(renameToMatch[1]); + index.isRename = true; + } + + // Parse copy from / copy to lines similarly + const copyFromMatch = (/^copy from (.*)/).exec(extLine); + if (copyFromMatch) { + index.oldFileName = 'a/' + unquoteIfQuoted(copyFromMatch[1]); + index.isCopy = true; + } + const copyToMatch = (/^copy to (.*)/).exec(extLine); + if (copyToMatch) { + index.newFileName = 'b/' + unquoteIfQuoted(copyToMatch[1]); + index.isCopy = true; + } + + const newFileModeMatch = (/^new file mode (\d+)/).exec(extLine); + if (newFileModeMatch) { + index.isCreate = true; + index.newMode = newFileModeMatch[1]; + } + const deletedFileModeMatch = (/^deleted file mode (\d+)/).exec(extLine); + if (deletedFileModeMatch) { + index.isDelete = true; + index.oldMode = deletedFileModeMatch[1]; + } + const oldModeMatch = (/^old mode (\d+)/).exec(extLine); + if (oldModeMatch) { + index.oldMode = oldModeMatch[1]; + } + const newModeMatch = (/^new mode (\d+)/).exec(extLine); + if (newModeMatch) { + index.newMode = newModeMatch[1]; + } + + if ((/^Binary files /).test(extLine)) { + index.isBinary = true; + } + + i++; + } + continue; + } else if (isDiffHeader(line)) { + if (seenDiffHeader) { + return; + } + 
seenDiffHeader = true; + + // For Mercurial-style headers like + // diff -r 9117c6561b0b -r 273ce12ad8f1 .hgignore + // or Index: headers like + // Index: something with multiple words + // we extract the trailing filename as the index. + // + // TODO: It seems awkward that we indiscriminately trim off + // trailing whitespace here. Theoretically, couldn't that + // be meaningful - e.g. if the patch represents a diff of a + // file whose name ends with a space? Seems wrong to nuke + // it. But this behaviour has been around since v2.2.1 in + // 2015, so if it's going to change, it should be done + // cautiously and in a new major release, for + // backwards-compat reasons. + // -- ExplodingCabbage + const headerMatch = (/^(?:Index:|diff(?: -r \w+)+)\s+/).exec(line); + if (headerMatch) { + index.index = line.substring(headerMatch[0].length).trim(); + } } i++; @@ -54,23 +199,216 @@ export function parsePatch(uniDiff: string): StructuredPatch[] { parseFileHeader(index); parseFileHeader(index); - // Parse hunks - index.hunks = []; + // If we got one file header but not the other, that's a malformed patch. + if ((index.oldFileName === undefined) !== (index.newFileName === undefined)) { + throw new Error( + 'Missing ' + (index.oldFileName !== undefined ? '"+++ ..."' : '"--- ..."') + + ' file header for ' + (index.oldFileName ?? index.newFileName) + ); + } while (i < diffstr.length) { const line = diffstr[i]; - if ((/^(Index:\s|diff\s|---\s|\+\+\+\s|===================================================================)/).test(line)) { + if (isDiffHeader(line) || isFileHeader(line) || (/^===================================================================/).test(line)) { break; - } else if ((/^@@/).test(line)) { + } else if (isHunkHeader(line)) { index.hunks.push(parseHunk()); - } else if (line) { - throw new Error('Unknown line ' + (i + 1) + ' ' + JSON.stringify(line)); } else { + // Skip blank lines and any other unrecognized content between + // or after hunks. 
Real-world examples of such content include: + // - `Only in : ` from GNU `diff -r` + // - `Property changes on:` sections from `svn diff` + // - Trailing prose or commentary in email patches + // GNU `patch` tolerates all of these, and so do we. i++; } } } + /** + * Parses the old and new filenames from a `diff --git` header line. + * + * The format is: + * diff --git a/ b/ + * + * When filenames contain special characters (including newlines, tabs, + * backslashes, or double quotes), Git quotes them with C-style escaping: + * diff --git "a/file\twith\ttabs.txt" "b/file\twith\ttabs.txt" + * + * When filenames don't contain special characters and the old and new names + * are the same, we can unambiguously split on ` b/` by finding where the + * two halves (including their a/ and b/ prefixes) yield matching bare names. + * + * A pathological case exists in which we cannot reliably determine the paths + * from the `diff --git` header. This case is when the following are true: + * - the old and new file paths differ + * - they are both unquoted (i.e. contain no special characters) + * - at least one of the underlying file paths includes the substring ` b/` + * In this scenario, we do not know which occurrence of ` b/` indicates the + * start of the new file path, so the header is inherently ambiguous. We thus + * select a possible interpretation arbitrarily and return that. + * + * Fortunately, this ambiguity should never matter, because in any patch + * genuinely output by Git in which this pathological scenario occurs, there + * must also be `rename from`/`rename to` or `copy from`/`copy to` extended + * headers present below the `diff --git` header. `parseIndex` will parse + * THOSE headers, from which we CAN unambiguously determine the filenames, + * and will discard the result returned by this function. + * + * Returns null if the header can't be parsed at all — e.g. 
a quoted filename + * has an unterminated quote, or if the unquoted header doesn't match the + * expected `a/... b/...` format. In that case, the caller (parseIndex) + * skips setting oldFileName/newFileName from this header, but they may + * still be set later from `---`/`+++` lines or `rename from`/`rename to` + * extended headers; if none of those are present either, they'll remain + * undefined in the output. + */ + function parseGitDiffHeader(line: string): { oldFileName: string, newFileName: string } | null { + // Strip the "diff --git " prefix + const rest = line.substring('diff --git '.length); + + // Handle quoted paths: "a/path" "b/path" + // Git quotes paths when they contain characters like newlines, tabs, + // backslashes, or double quotes (but notably not spaces). + if (rest.startsWith('"')) { + const oldPath = parseQuotedFileName(rest); + if (oldPath === null) { return null; } + const afterOld = rest.substring(oldPath.rawLength + 1); // +1 for space + let newFileName: string; + if (afterOld.startsWith('"')) { + const newPath = parseQuotedFileName(afterOld); + if (newPath === null) { return null; } + newFileName = newPath.fileName; + } else { + newFileName = afterOld; + } + return { + oldFileName: oldPath.fileName, + newFileName + }; + } + + // Check if the second path is quoted + // e.g. diff --git a/simple "b/renamed\nnewline.txt" + const quoteIdx = rest.indexOf('"'); + if (quoteIdx > 0) { + const oldFileName = rest.substring(0, quoteIdx - 1); + const newPath = parseQuotedFileName(rest.substring(quoteIdx)); + if (newPath === null) { return null; } + return { + oldFileName, + newFileName: newPath.fileName + }; + } + + // Unquoted paths. Try to find the split point. + // The format is: a/<old-path> b/<new-path> + // + // Note the potential ambiguity caused by the possibility of the file paths + // themselves containing the substring ` b/`, plus the pathological case + // described in the comment above.
+ // + // Strategy: find all occurrences of " b/" and split on the middle + // one. When old and new names are the same (which is the only case where + // we can't rely on extended headers later in the patch so HAVE to get + // this right), this will always be the correct split. + if (rest.startsWith('a/')) { + const splits = []; + let idx = 0; + while (true) { + idx = rest.indexOf(' b/', idx + 1); + if (idx === -1) { break; } + splits.push(idx); + } + if (splits.length > 0) { + const mid = splits[Math.floor(splits.length / 2)]; + return { + oldFileName: rest.substring(0, mid), + newFileName: rest.substring(mid + 1) + }; + } + } + + // Fallback: can't parse, return null + return null; + } + + /** + * If `s` starts with a double quote, unquotes it using C-style escape + * rules (as used by Git). Otherwise returns `s` as-is. + */ + function unquoteIfQuoted(s: string): string { + if (s.startsWith('"')) { + const parsed = parseQuotedFileName(s); + if (parsed) { + return parsed.fileName; + } + } + return s; + } + + /** + * Parses a C-style quoted filename as used by Git or GNU `diff -u`. + * Returns the unescaped filename and the raw length consumed (including quotes). + */ + function parseQuotedFileName(s: string): { fileName: string, rawLength: number } | null { + if (!s.startsWith('"')) { return null; } + let result = ''; + let j = 1; // skip opening quote + while (j < s.length) { + if (s[j] === '"') { + return { fileName: result, rawLength: j + 1 }; + } + if (s[j] === '\\' && j + 1 < s.length) { + j++; + switch (s[j]) { + case 'a': result += '\x07'; break; + case 'b': result += '\b'; break; + case 'f': result += '\f'; break; + case 'n': result += '\n'; break; + case 'r': result += '\r'; break; + case 't': result += '\t'; break; + case 'v': result += '\v'; break; + case '\\': result += '\\'; break; + case '"': result += '"'; break; + case '0': case '1': case '2': case '3': + case '4': case '5': case '6': case '7': { + // C-style octal escapes represent raw bytes. 
Collect + // consecutive octal-escaped bytes and decode as UTF-8. + // Validate that we have a full 3-digit octal escape + if (j + 2 >= s.length || s[j + 1] < '0' || s[j + 1] > '7' || s[j + 2] < '0' || s[j + 2] > '7') { + return null; + } + const bytes = [parseInt(s.substring(j, j + 3), 8)]; + j += 3; + while (s[j] === '\\' && s[j + 1] >= '0' && s[j + 1] <= '7') { + if (j + 3 >= s.length || s[j + 2] < '0' || s[j + 2] > '7' || s[j + 3] < '0' || s[j + 3] > '7') { + return null; + } + bytes.push(parseInt(s.substring(j + 1, j + 4), 8)); + j += 4; + } + result += new TextDecoder('utf-8').decode(new Uint8Array(bytes)); + continue; // j already points at the next character + } + // Note that in C, there are also three kinds of hex escape sequences: + // - \xhh + // - \uhhhh + // - \Uhhhhhhhh + // We do not bother to parse them here because, so far as we know, + // they are never emitted by any tools that generate unified diff + // format diffs, and so for now jsdiff does not consider them legal. + default: return null; + } + } else { + result += s[j]; + } + j++; + } + // Unterminated quote + return null; + } + // Parses the --- and +++ headers, if none are found, no lines // are consumed. 
function parseFileHeader(index: Partial<StructuredPatch>) { @@ -79,9 +417,11 @@ export function parsePatch(uniDiff: string): StructuredPatch[] { const prefix = fileHeaderMatch[1], data = diffstr[i].substring(3).trim().split('\t', 2), header = (data[1] || '').trim(); - let fileName = data[0].replace(/\\\\/g, '\\'); - if (fileName.startsWith('"') && fileName.endsWith('"')) { - fileName = fileName.substr(1, fileName.length - 2); + let fileName = data[0]; + if (fileName.startsWith('"')) { + fileName = unquoteIfQuoted(fileName); + } else { + fileName = fileName.replace(/\\\\/g, '\\'); } if (prefix === '---') { index.oldFileName = fileName; @@ -160,6 +500,20 @@ export function parsePatch(uniDiff: string): StructuredPatch[] { throw new Error('Removed line count did not match for hunk at line ' + (chunkHeaderIndex + 1)); } + // Check for extra hunk-body-like lines after the declared line counts + // were exhausted. If the very next line starts with ' ', '+', or '-', + // the hunk's line counts were probably wrong, unless it's a file + // header (--- or +++), which legitimately appears immediately after a + // hunk in multi-file diffs without Index lines.
+ if (i < diffstr.length && diffstr[i] && (/^[+ -]/).test(diffstr[i]) + && !isFileHeader(diffstr[i])) { + throw new Error( + 'Hunk at line ' + (chunkHeaderIndex + 1) + + ' has more lines than expected (expected ' + + hunk.oldLines + ' old lines and ' + hunk.newLines + ' new lines)' + ); + } + return hunk; } diff --git a/src/patch/reverse.ts b/src/patch/reverse.ts index 65a5abc3..f21a841c 100644 --- a/src/patch/reverse.ts +++ b/src/patch/reverse.ts @@ -1,8 +1,34 @@ import type { StructuredPatch } from '../types.js'; +function swapPrefix(fileName: string | undefined): string | undefined { + if (fileName === undefined || fileName === '/dev/null') { + return fileName; + } + if (fileName.startsWith('a/')) { + return 'b/' + fileName.slice(2); + } + if (fileName.startsWith('b/')) { + return 'a/' + fileName.slice(2); + } + return fileName; +} + /** - * @param patch either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`). - * @returns a new structured patch which when applied will undo the original `patch`. + * Returns a new structured patch which when applied will undo the original `patch`. + * + * When `patch` is a Git-style patch, `reversePatch` handles extended header information (relating + * to renames, file modes, etc.) to the extent that doing so is possible, but note one fundamental + * limitation: the correct inverse of a patch featuring `copy from`/`copy to` headers cannot, in + * general, be determined based on the information contained in the patch alone, and so + * `reversePatch`'s output when passed such a patch will usually be rejected by `git apply`. (The + * correct inverse would be a patch that deletes the newly-created file, but for Git to apply such + * a patch, the patch must explicitly delete every line of content in the file too, and that + * content cannot be determined from the original patch on its own. 
`reversePatch` therefore does + * the only vaguely reasonable thing it can do in this scenario: it outputs a patch with a + * `deleted file mode` header - indicating that the file should be deleted - but no hunks.) + * + * @param patch either a single structured patch object (as returned by `structuredPatch`) or an + * array of them (as returned by `parsePatch`). */ export function reversePatch(structuredPatch: StructuredPatch): StructuredPatch; export function reversePatch(structuredPatch: StructuredPatch[]): StructuredPatch[]; @@ -13,12 +39,16 @@ export function reversePatch(structuredPatch: StructuredPatch | StructuredPatch[ return structuredPatch.map(patch => reversePatch(patch)).reverse(); } - return { + const reversed: StructuredPatch = { ...structuredPatch, - oldFileName: structuredPatch.newFileName, + oldFileName: structuredPatch.isGit ? swapPrefix(structuredPatch.newFileName) : structuredPatch.newFileName, oldHeader: structuredPatch.newHeader, - newFileName: structuredPatch.oldFileName, + newFileName: structuredPatch.isGit ? swapPrefix(structuredPatch.oldFileName) : structuredPatch.oldFileName, newHeader: structuredPatch.oldHeader, + oldMode: structuredPatch.newMode, + newMode: structuredPatch.oldMode, + isCreate: structuredPatch.isDelete, + isDelete: structuredPatch.isCreate, hunks: structuredPatch.hunks.map(hunk => { return { oldLines: hunk.newLines, @@ -33,4 +63,31 @@ export function reversePatch(structuredPatch: StructuredPatch | StructuredPatch[ }; }) }; + + if (structuredPatch.isCopy) { + // Reversing a copy means deleting the file that was created by the copy. + // The "old" file in the reversed patch is the copy destination (which + // exists and should be removed), and the "new" file is /dev/null. + // + // Note: we clear the hunks because the original copy's hunks describe + // the diff between the source and destination, not the full content of + // the destination file, so they can't be meaningfully reversed into a + // deletion hunk. 
This means the resulting patch is not something + // `git apply` will accept (it requires deletion patches to include a + // hunk removing every line). Producing a correct deletion hunk would + // require knowing the full content of the copy destination, which we + // don't have. Consumers that need a `git apply`-compatible patch will + // need to resolve the full file content themselves. + reversed.newFileName = '/dev/null'; + reversed.newHeader = undefined; + reversed.isDelete = true; + delete reversed.isCreate; + delete reversed.isCopy; + delete reversed.isRename; + reversed.hunks = []; + } + // Reversing a rename is just a rename in the opposite direction; + // isRename stays set and the filenames are already swapped above. + + return reversed; } diff --git a/src/types.ts b/src/types.ts index 1ae11370..b5a843d1 100644 --- a/src/types.ts +++ b/src/types.ts @@ -225,12 +225,59 @@ export type AllDiffOptions = DiffJsonOptions; export interface StructuredPatch { - oldFileName: string, - newFileName: string, + oldFileName: string | undefined, + newFileName: string | undefined, oldHeader: string | undefined, newHeader: string | undefined, hunks: StructuredPatchHunk[], index?: string, + /** + * Set to true when the patch was parsed from a Git-style diff (one with a + * `diff --git` header). Controls whether `formatPatch` emits a `diff --git` + * header (instead of `Index:` / underline headers) when formatting the patch. + */ + isGit?: boolean, + /** + * Set to true when parsing a Git diff that includes `rename from`/`rename to` + * extended headers, indicating the file was renamed (and the old file no + * longer exists). Consumers applying the patch should delete the old file. + */ + isRename?: boolean, + /** + * Set to true when parsing a Git diff that includes `copy from`/`copy to` + * extended headers, indicating the file was copied (and the old file still + * exists). Consumers applying the patch should NOT delete the old file. 
+ */ + isCopy?: boolean, + /** + * Set to true when parsing a Git diff that includes a `new file mode` extended + * header, indicating the file was newly created. + */ + isCreate?: boolean, + /** + * Set to true when parsing a Git diff that includes a `deleted file mode` + * extended header, indicating the file was deleted. + */ + isDelete?: boolean, + /** + * The file mode (e.g. `'100644'`, `'100755'`) of the old file, parsed from + * Git extended headers (`old mode` or `deleted file mode`). + */ + oldMode?: string, + /** + * The file mode (e.g. `'100644'`, `'100755'`) of the new file, parsed from + * Git extended headers (`new mode` or `new file mode`). + */ + newMode?: string, + /** + * Set to true when parsing a Git diff that includes a + * `Binary files ... differ` line, indicating a binary file change. + * Binary patches have no hunks, so the patch content alone is not + * sufficient to apply the change; consumers should handle this case + * specially (e.g. by warning the user or fetching the binary content + * separately). + */ + isBinary?: boolean, } export interface StructuredPatchHunk { diff --git a/test/patch/create.js b/test/patch/create.js index 204818d8..d6b12c18 100644 --- a/test/patch/create.js +++ b/test/patch/create.js @@ -1175,6 +1175,252 @@ describe('patch/create', function() { // eslint-disable-next-line dot-notation expect(() => formatPatch(patchArray, OMIT_HEADERS)).to.throw(); }); + + it('should silently skip headers when filenames are undefined', function() { + const patchWithNoFilenames = { + oldFileName: undefined, + newFileName: undefined, + oldHeader: undefined, + newHeader: undefined, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + }] + }; + // All header options should silently skip headers when filenames + // are undefined, rather than emitting "--- undefined" etc. 
+ const expectedOutput = + '@@ -1,1 +1,1 @@\n' + + '-old\n' + + '+new\n'; + const expectedWithUnderline = + '===================================================================\n' + + '@@ -1,1 +1,1 @@\n' + + '-old\n' + + '+new\n'; + expect(formatPatch(patchWithNoFilenames, OMIT_HEADERS)).to.equal(expectedOutput); + expect(formatPatch(patchWithNoFilenames, FILE_HEADERS_ONLY)).to.equal(expectedOutput); + expect(formatPatch(patchWithNoFilenames, INCLUDE_HEADERS)).to.equal(expectedWithUnderline); + expect(formatPatch(patchWithNoFilenames)).to.equal(expectedWithUnderline); + }); + + it('should emit diff --git header for patches with isGit flag', function() { + const patch = { + oldFileName: 'a/file.txt', + newFileName: 'b/file.txt', + oldHeader: '', + newHeader: '', + isGit: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + }] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/file.txt b/file.txt\n' + + '--- a/file.txt\n' + + '+++ b/file.txt\n' + + '@@ -1,1 +1,1 @@\n' + + '-old\n' + + '+new\n' + ); + }); + + it('should not mutate hunk objects', function() { + const patch = { + oldFileName: 'a/file.txt', + newFileName: 'b/file.txt', + oldHeader: '', + newHeader: '', + hunks: [{ + oldStart: 1, oldLines: 0, + newStart: 1, newLines: 0, + lines: [] + }] + }; + formatPatch(patch); + expect(patch.hunks[0].oldStart).to.equal(1); + expect(patch.hunks[0].newStart).to.equal(1); + }); + + it('should ignore headerOptions for multi-file patches with isGit flag', function() { + const patches = [ + { + oldFileName: 'a/one.txt', + newFileName: 'b/one.txt', + oldHeader: '', + newHeader: '', + isGit: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-a', '+b'] + }] + }, + { + oldFileName: 'a/two.txt', + newFileName: 'b/two.txt', + oldHeader: '', + newHeader: '', + isGit: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-c', '+d'] + }] + } + ]; + const 
expected = + 'diff --git a/one.txt b/one.txt\n' + + '--- a/one.txt\n' + + '+++ b/one.txt\n' + + '@@ -1,1 +1,1 @@\n' + + '-a\n' + + '+b\n' + + '\n' + + 'diff --git a/two.txt b/two.txt\n' + + '--- a/two.txt\n' + + '+++ b/two.txt\n' + + '@@ -1,1 +1,1 @@\n' + + '-c\n' + + '+d\n'; + // All three headerOptions values should produce identical output; + // Git patches are self-delimiting via diff --git headers, so + // OMIT_HEADERS should also not throw for multi-file patches: + expect(formatPatch(patches, INCLUDE_HEADERS)).to.equal(expected); + expect(formatPatch(patches, FILE_HEADERS_ONLY)).to.equal(expected); + expect(formatPatch(patches, OMIT_HEADERS)).to.equal(expected); + }); + + it('should emit rename headers for patches with isGit and isRename', function() { + const patch = { + oldFileName: 'a/old.txt', + newFileName: 'b/new.txt', + oldHeader: undefined, + newHeader: undefined, + isGit: true, + isRename: true, + hunks: [] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/old.txt b/new.txt\n' + + 'rename from old.txt\n' + + 'rename to new.txt\n' + ); + }); + + it('should emit copy headers for patches with isGit and isCopy', function() { + const patch = { + oldFileName: 'a/original.txt', + newFileName: 'b/copy.txt', + oldHeader: undefined, + newHeader: undefined, + isGit: true, + isCopy: true, + hunks: [] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/original.txt b/copy.txt\n' + + 'copy from original.txt\n' + + 'copy to copy.txt\n' + ); + }); + + it('should emit new file mode header for patches with isGit and isCreate', function() { + const patch = { + oldFileName: '/dev/null', + newFileName: 'b/newfile.txt', + oldHeader: '', + newHeader: '', + isGit: true, + isCreate: true, + hunks: [{ + oldStart: 1, oldLines: 0, + newStart: 1, newLines: 1, + lines: ['+hello'] + }] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/newfile.txt b/newfile.txt\n' + + 'new file mode 100644\n' + + '--- /dev/null\n' + + '+++ b/newfile.txt\n' + + '@@ -0,0 
+1,1 @@\n' + + '+hello\n' + ); + }); + + it('should emit deleted file mode header for patches with isGit and isDelete', function() { + const patch = { + oldFileName: 'a/doomed.txt', + newFileName: '/dev/null', + oldHeader: '', + newHeader: '', + isGit: true, + isDelete: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 0, + lines: ['-goodbye'] + }] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/doomed.txt b/doomed.txt\n' + + 'deleted file mode 100644\n' + + '--- a/doomed.txt\n' + + '+++ /dev/null\n' + + '@@ -1,1 +0,0 @@\n' + + '-goodbye\n' + ); + }); + + it('should still emit rename headers with file headers if hunks are present', function() { + const patch = { + oldFileName: 'a/old.txt', + newFileName: 'b/new.txt', + oldHeader: '', + newHeader: '', + isGit: true, + isRename: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-aaa', '+bbb'] + }] + }; + expect(formatPatch(patch)).to.equal( + 'diff --git a/old.txt b/new.txt\n' + + 'rename from old.txt\n' + + 'rename to new.txt\n' + + '--- a/old.txt\n' + + '+++ b/new.txt\n' + + '@@ -1,1 +1,1 @@\n' + + '-aaa\n' + + '+bbb\n' + ); + }); + + it('should round-trip a Git rename patch through formatPatch and parsePatch', function() { + const original = { + oldFileName: 'a/old.txt', + newFileName: 'b/new.txt', + oldHeader: '', + newHeader: '', + isGit: true, + isRename: true, + hunks: [{ + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-aaa', '+bbb'] + }] + }; + const formatted = formatPatch(original); + const parsed = parsePatch(formatted); + expect(parsed).to.have.length(1); + expect(parsed[0]).to.deep.equal(original); + }); }); }); }); diff --git a/test/patch/parse.js b/test/patch/parse.js index ed604ab1..3fa88985 100644 --- a/test/patch/parse.js +++ b/test/patch/parse.js @@ -299,6 +299,217 @@ diff -r 9117c6561b0b -r 273ce12ad8f1 README ]); }); + it('should parse multi-file output from GNU `diff -u -r`', function() { + // GNU `diff -u -r` 
(recursive unified diff) produces a `diff -u -r ...` + // header before each file. + expect(parsePatch( +`diff -u -r old/file1.txt new/file1.txt +--- old/file1.txt\t2026-01-01 00:00:00.000000000 +0000 ++++ new/file1.txt\t2026-01-01 00:00:00.000000000 +0000 +@@ -1,3 +1,4 @@ + alpha ++beta + gamma + delta +diff -u -r old/file2.txt new/file2.txt +--- old/file2.txt\t2026-01-01 00:00:00.000000000 +0000 ++++ new/file2.txt\t2026-01-01 00:00:00.000000000 +0000 +@@ -1,3 +1,3 @@ + one +-two ++TWO + three`)) + .to.eql([{ + oldFileName: 'old/file1.txt', + oldHeader: '2026-01-01 00:00:00.000000000 +0000', + newFileName: 'new/file1.txt', + newHeader: '2026-01-01 00:00:00.000000000 +0000', + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' alpha', + '+beta', + ' gamma', + ' delta' + ] + } + ] + }, { + oldFileName: 'old/file2.txt', + oldHeader: '2026-01-01 00:00:00.000000000 +0000', + newFileName: 'new/file2.txt', + newHeader: '2026-01-01 00:00:00.000000000 +0000', + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 3, + lines: [ + ' one', + '-two', + '+TWO', + ' three' + ] + } + ] + }]); + }); + + it('should tolerate `Only in` lines from GNU `diff -r`', function() { + // When comparing directories, `diff -r` emits `Only in <dir>: <file>` + // for files that exist on only one side. These appear between file diffs. + expect(parsePatch( +`diff -u -r old/file1.txt new/file1.txt +--- old/file1.txt\t2026-01-01 ++++ new/file1.txt\t2026-01-01 +@@ -1,2 +1,3 @@ + alpha + beta ++gamma +Only in old: removed.txt +Only in new: added.txt +diff -u -r old/file2.txt new/file2.txt +--- old/file2.txt\t2026-01-01 ++++ new/file2.txt\t2026-01-01 +@@ -1,3 +1,3 @@ + one +-two ++TWO + three`)) + .to.have.length(2); + }); + + it('should tolerate SVN property change sections between file diffs', function() { + // `svn diff` can emit `Property changes on:` sections after a file's + // hunks.
These are not part of the unified diff format but appear + // interspersed with it in SVN output. + expect(parsePatch( +`Index: file.txt +=================================================================== +--- file.txt\t(revision 1) ++++ file.txt\t(working copy) +@@ -1,3 +1,4 @@ + alpha ++beta + gamma + delta + +Property changes on: file.txt +___________________________________________________________________ +Added: svn:eol-style +## -0,0 +1 ## ++native +Index: other.txt +=================================================================== +--- other.txt\t(revision 1) ++++ other.txt\t(working copy) +@@ -1 +1 @@ +-old ++new`)) + .to.eql([{ + index: 'file.txt', + oldFileName: 'file.txt', + oldHeader: '(revision 1)', + newFileName: 'file.txt', + newHeader: '(working copy)', + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' alpha', + '+beta', + ' gamma', + ' delta' + ] + } + ] + }, { + index: 'other.txt', + oldFileName: 'other.txt', + oldHeader: '(revision 1)', + newFileName: 'other.txt', + newHeader: '(working copy)', + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: [ + '-old', + '+new' + ] + } + ] + }]); + }); + + it('should tolerate trailing garbage after the last hunk', function() { + // GNU `patch` ignores trailing content that doesn't look like part of + // a patch, and so should we. + expect(parsePatch( +`--- file.txt ++++ file.txt +@@ -1,3 +1,4 @@ + alpha ++beta + gamma + delta +This is trailing garbage +More garbage here`)) + .to.eql([{ + oldFileName: 'file.txt', + oldHeader: '', + newFileName: 'file.txt', + newHeader: '', + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' alpha', + '+beta', + ' gamma', + ' delta' + ] + } + ] + }]); + }); + + it('should tolerate garbage between file headers and hunks', function() { + // GNU `patch` ignores unrecognized lines between the --- / +++ headers + // and the first @@ hunk header, and so do we. 
+ expect(parsePatch( +`--- file.txt ++++ file.txt +some garbage here +more garbage +@@ -1,3 +1,4 @@ + alpha ++beta + gamma + delta`)) + .to.eql([{ + oldFileName: 'file.txt', + oldHeader: '', + newFileName: 'file.txt', + newHeader: '', + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' alpha', + '+beta', + ' gamma', + ' delta' + ] + } + ] + }]); + }); + it('should parse multiple files without the Index line', function() { expect(parsePatch( `--- from\theader1 @@ -441,6 +652,15 @@ diff -r 9117c6561b0b -r 273ce12ad8f1 README }]); }); + it('should throw if --- and +++ file headers are not paired', function() { + expect(function() { + parsePatch('Index: foo\n+++ bar\nblah'); + }).to['throw']('Missing "--- ..." file header for bar'); + expect(function() { + parsePatch('--- bar\n@@ -1 +1 @@\n-old\n+new'); + }).to['throw']('Missing "+++ ..." file header for bar'); + }); + it('should perform sanity checks on line numbers', function() { parsePatch('@@ -1 +1 @@'); @@ -452,18 +672,7 @@ diff -r 9117c6561b0b -r 273ce12ad8f1 README }).to['throw']('Removed line count did not match for hunk at line 1'); }); - it('should not throw on invalid input', function() { - expect(parsePatch('blit\nblat\nIndex: foo\nfoo')) - .to.eql([{ - hunks: [], - index: 'foo' - }]); - }); - it('should throw on invalid input', function() { - expect(function() { - parsePatch('Index: foo\n+++ bar\nblah'); - }).to['throw'](/Unknown line 3 "blah"/); - }); + it('should handle OOM case', function() { parsePatch('Index: \n===================================================================\n--- \n+++ \n@@ -1,1 +1,2 @@\n-1\n\\ No newline at end of file\n+1\n+2\n'); @@ -691,6 +900,25 @@ diff -r 9117c6561b0b -r 273ce12ad8f1 README }]); }); + it('should emit an error if a hunk has wrong line counts causing extra hunk-body lines to spill out', () => { + // The hunk declares oldLines=2, newLines=2, so the parser consumes + // ' a', '-b', '+B' and then stops. 
The next line ' c' starts with + // a space (context line) immediately after the hunk ended, which + // indicates the line counts were wrong. + const patchStr = `--- file.txt ++++ file.txt +@@ -1,2 +1,2 @@ + a +-b ++B + c + d + e`; + + // eslint-disable-next-line dot-notation + expect(() => {parsePatch(patchStr);}).to.throw(/Hunk at line 3 has more lines than expected/); + }); + it('should emit an error if a hunk contains an invalid line', () => { // Within a hunk, every line must either start with '+' (insertion), '-' (deletion), // ' ' (context line, i.e. not deleted nor inserted) or a backslash (for @@ -710,5 +938,619 @@ line3 // eslint-disable-next-line dot-notation expect(() => {parsePatch(patchStr);}).to.throw('Hunk at line 5 contained invalid line line3'); }); + + it('should parse a single-file `diff --git` patch', function() { + expect(parsePatch( +`diff --git a/file.txt b/file.txt +index abc1234..def5678 100644 +--- a/file.txt ++++ b/file.txt +@@ -1,3 +1,4 @@ + line1 + line2 ++line3 + line4`)) + .to.eql([{ + oldFileName: 'a/file.txt', + oldHeader: '', + newFileName: 'b/file.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' line1', + ' line2', + '+line3', + ' line4' + ] + } + ] + }]); + }); + + it('should parse a multi-file `diff --git` patch', function() { + expect(parsePatch( +`diff --git a/file1.txt b/file1.txt +index abc1234..def5678 100644 +--- a/file1.txt ++++ b/file1.txt +@@ -1,3 +1,4 @@ + line1 + line2 ++line3 + line4 +diff --git a/file2.txt b/file2.txt +index 1234567..abcdef0 100644 +--- a/file2.txt ++++ b/file2.txt +@@ -1,3 +1,4 @@ + lineA + lineB ++lineC + lineD`)) + .to.eql([{ + oldFileName: 'a/file1.txt', + oldHeader: '', + newFileName: 'b/file1.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' line1', + ' line2', + '+line3', + ' line4' + ] + } + ] + }, { + oldFileName: 'a/file2.txt', + oldHeader: 
'', + newFileName: 'b/file2.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' lineA', + ' lineB', + '+lineC', + ' lineD' + ] + } + ] + }]); + }); + + it('should parse a `diff --git` rename with no content change', function() { + expect(parsePatch( +`diff --git a/README.md b/README-2.md +similarity index 100% +rename from README.md +rename to README-2.md`)) + .to.eql([{ + oldFileName: 'a/README.md', + newFileName: 'b/README-2.md', + isGit: true, + hunks: [], + isRename: true + }]); + }); + + it('should parse a `diff --git` rename with content change', function() { + expect(parsePatch( +`diff --git a/old-name.txt b/new-name.txt +similarity index 85% +rename from old-name.txt +rename to new-name.txt +index abc1234..def5678 100644 +--- a/old-name.txt ++++ b/new-name.txt +@@ -1,3 +1,4 @@ + line1 + line2 ++line3 + line4`)) + .to.eql([{ + oldFileName: 'a/old-name.txt', + oldHeader: '', + newFileName: 'b/new-name.txt', + newHeader: '', + isGit: true, + isRename: true, + hunks: [ + { + oldStart: 1, oldLines: 3, + newStart: 1, newLines: 4, + lines: [ + ' line1', + ' line2', + '+line3', + ' line4' + ] + } + ] + }]); + }); + + it('should parse a `diff --git` mode-only change', function() { + expect(parsePatch( +`diff --git a/script.sh b/script.sh +old mode 100644 +new mode 100755`)) + .to.eql([{ + oldFileName: 'a/script.sh', + newFileName: 'b/script.sh', + isGit: true, + oldMode: '100644', + newMode: '100755', + hunks: [] + }]); + }); + + it('should parse a `diff --git` binary file change', function() { + expect(parsePatch( +`diff --git a/image.png b/image.png +index abc1234..def5678 100644 +Binary files a/image.png and b/image.png differ`)) + .to.eql([{ + oldFileName: 'a/image.png', + newFileName: 'b/image.png', + isGit: true, + isBinary: true, + hunks: [] + }]); + }); + + it('should not lose files when a hunkless `diff --git` file is followed by one with hunks', function() { + expect(parsePatch( +`diff 
--git a/file1.txt b/file1.txt +--- a/file1.txt ++++ b/file1.txt +@@ -1 +1 @@ +-old ++new +diff --git a/image.png b/image.png +Binary files a/image.png and b/image.png differ +diff --git a/file3.txt b/file3.txt +--- a/file3.txt ++++ b/file3.txt +@@ -1 +1 @@ +-foo ++bar`)) + .to.eql([{ + oldFileName: 'a/file1.txt', + oldHeader: '', + newFileName: 'b/file1.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + } + ] + }, { + oldFileName: 'a/image.png', + newFileName: 'b/image.png', + isGit: true, + isBinary: true, + hunks: [] + }, { + oldFileName: 'a/file3.txt', + oldHeader: '', + newFileName: 'b/file3.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-foo', '+bar'] + } + ] + }]); + }); + + it('should parse a `diff --git` copy', function() { + expect(parsePatch( +`diff --git a/original.txt b/copy.txt +similarity index 100% +copy from original.txt +copy to copy.txt`)) + .to.eql([{ + oldFileName: 'a/original.txt', + newFileName: 'b/copy.txt', + isGit: true, + hunks: [], + isCopy: true + }]); + }); + + it('should parse a `diff --git` new file', function() { + expect(parsePatch( +`diff --git a/newfile.txt b/newfile.txt +new file mode 100644 +index 0000000..abc1234 +--- /dev/null ++++ b/newfile.txt +@@ -0,0 +1,2 @@ ++hello ++world`)) + .to.eql([{ + oldFileName: '/dev/null', + oldHeader: '', + newFileName: 'b/newfile.txt', + newHeader: '', + isGit: true, + isCreate: true, + newMode: '100644', + hunks: [ + { + oldStart: 1, oldLines: 0, + newStart: 1, newLines: 2, + lines: ['+hello', '+world'] + } + ] + }]); + }); + + it('should parse a `diff --git` deleted file', function() { + expect(parsePatch( +`diff --git a/old.txt b/old.txt +deleted file mode 100644 +index ce01362..0000000 +--- a/old.txt ++++ /dev/null +@@ -1 +0,0 @@ +-goodbye`)) + .to.eql([{ + oldFileName: 'a/old.txt', + oldHeader: '', + newFileName: '/dev/null', + 
newHeader: '', + isGit: true, + isDelete: true, + oldMode: '100644', + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 0, + lines: ['-goodbye'] + } + ] + }]); + }); + + it('should parse a `diff --git` empty file creation (no --- / +++ or hunks)', function() { + expect(parsePatch( +`diff --git a/empty.txt b/empty.txt +new file mode 100644 +index 0000000..e69de29`)) + .to.eql([{ + oldFileName: 'a/empty.txt', + newFileName: 'b/empty.txt', + isGit: true, + isCreate: true, + newMode: '100644', + hunks: [] + }]); + }); + + it('should parse a `diff --git` empty file deletion (no --- / +++ or hunks)', function() { + expect(parsePatch( +`diff --git a/empty.txt b/empty.txt +deleted file mode 100644 +index e69de29..0000000`)) + .to.eql([{ + oldFileName: 'a/empty.txt', + newFileName: 'b/empty.txt', + isGit: true, + isDelete: true, + oldMode: '100644', + hunks: [] + }]); + }); + + it('should unquote C-style quoted filenames in rename from/to', function() { + expect(parsePatch( +`diff --git "a/file\\twith\\ttabs.txt" b/normal.txt +similarity index 100% +rename from "file\\twith\\ttabs.txt" +rename to normal.txt`)) + .to.eql([{ + oldFileName: 'a/file\twith\ttabs.txt', + newFileName: 'b/normal.txt', + isGit: true, + hunks: [], + isRename: true + }]); + }); + + it('should handle all Git C-style escape sequences in quoted filenames', function() { + expect(parsePatch( +`diff --git "a/\\a\\b\\f\\r\\v\\001file.txt" "b/\\a\\b\\f\\r\\v\\001file.txt" +old mode 100644 +new mode 100755`)) + .to.eql([{ + oldFileName: 'a/\x07\b\f\r\v\x01file.txt', + newFileName: 'b/\x07\b\f\r\v\x01file.txt', + isGit: true, + oldMode: '100644', + newMode: '100755', + hunks: [] + }]); + }); + + it('should handle multi-byte UTF-8 octal escapes in quoted filenames', function() { + // 🎉 is U+1F389, UTF-8 bytes F0 9F 8E 89 = octal 360 237 216 211 + expect(parsePatch( +`diff --git "a/caf\\303\\251-file\\360\\237\\216\\211.txt" "b/caf\\303\\251-file\\360\\237\\216\\211.txt" +new file mode 100644 
+index 0000000..ce01362 +--- /dev/null ++++ "b/caf\\303\\251-file\\360\\237\\216\\211.txt" +@@ -0,0 +1 @@ ++hello`)) + .to.eql([{ + oldFileName: '/dev/null', + oldHeader: '', + newFileName: 'b/café-file🎉.txt', + newHeader: '', + isGit: true, + isCreate: true, + newMode: '100644', + hunks: [ + { + oldStart: 1, oldLines: 0, + newStart: 1, newLines: 1, + lines: ['+hello'] + } + ] + }]); + }); + + it('should parse `diff --git` with unquoted filenames containing spaces (same old and new)', function() { + expect(parsePatch( +`diff --git a/file with spaces.txt b/file with spaces.txt +old mode 100644 +new mode 100755`)) + .to.eql([{ + oldFileName: 'a/file with spaces.txt', + newFileName: 'b/file with spaces.txt', + isGit: true, + oldMode: '100644', + newMode: '100755', + hunks: [] + }]); + }); + + it('should parse `diff --git` rename with unquoted filenames containing spaces', function() { + // Typical, easy case where the `diff --git` line is unambiguous. + // See a later test for the pathological case. + expect(parsePatch( +`diff --git a/file with spaces.txt b/another file with spaces.txt +similarity index 100% +rename from file with spaces.txt +rename to another file with spaces.txt`)) + .to.eql([{ + oldFileName: 'a/file with spaces.txt', + newFileName: 'b/another file with spaces.txt', + isGit: true, + hunks: [], + isRename: true + }]); + }); + + it('should handle `diff --git` with a filename containing " b/"', function() { + // The filename literally contains " b/" which is also the separator + // between the old and new paths. Since old === new, the parser can + // find the unique split where both halves match. 
+ expect(parsePatch( +`diff --git a/x b/y.txt b/x b/y.txt +old mode 100644 +new mode 100755`)) + .to.eql([{ + oldFileName: 'a/x b/y.txt', + newFileName: 'b/x b/y.txt', + isGit: true, + oldMode: '100644', + newMode: '100755', + hunks: [] + }]); + }); + + it('should handle `diff --git` rename where filenames contain " b/"', function() { + // The diff --git line "diff --git a/x b/y b/z" is ambiguous: it + // could be split as old="a/x" new="b/y b/z" or old="a/x b/y" + // new="b/z". We parse two patches with the SAME diff --git line + // but different rename from/rename to headers to prove the + // extended headers win and correctly disambiguate the split. + + // Split interpretation 1: old="a/x", new="b/y b/z" + expect(parsePatch( +`diff --git a/x b/y b/z +similarity index 100% +rename from x +rename to y b/z`)) + .to.eql([{ + oldFileName: 'a/x', + newFileName: 'b/y b/z', + isGit: true, + hunks: [], + isRename: true + }]); + + // Split interpretation 2: old="a/x b/y", new="b/z" + expect(parsePatch( +`diff --git a/x b/y b/z +similarity index 100% +rename from x b/y +rename to z`)) + .to.eql([{ + oldFileName: 'a/x b/y', + newFileName: 'b/z', + isGit: true, + hunks: [], + isRename: true + }]); + }); + + it('should handle `diff --git` rename where filenames contain " b/", without rename from/to', function() { + // Same ambiguous `diff --git` line as previous test, but here + // disambiguated by the ---/+++ headers. 
+ + // Split interpretation 1: old="a/x", new="b/y b/z" + expect(parsePatch( +`diff --git a/x b/y b/z +--- a/x ++++ b/y b/z +@@ -1 +1 @@ +-hello ++world`)) + .to.eql([{ + oldFileName: 'a/x', + oldHeader: '', + newFileName: 'b/y b/z', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-hello', '+world'] + } + ] + }]); + + // Split interpretation 2: old="a/x b/y", new="b/z" + expect(parsePatch( +`diff --git a/x b/y b/z +--- a/x b/y ++++ b/z +@@ -1 +1 @@ +-hello ++world`)) + .to.eql([{ + oldFileName: 'a/x b/y', + oldHeader: '', + newFileName: 'b/z', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-hello', '+world'] + } + ] + }]); + }); + + describe('unparseable `diff --git` headers', function() { + // So far as we know, Git never actually produces diff --git headers that + // can't be parsed (e.g. with unterminated quotes or missing a/b prefixes). + // But we test these cases to confirm parsePatch doesn't crash and instead + // gracefully falls back to getting filenames from --- / +++ lines. 
+ it('should handle an unparseable `diff --git` header with unterminated quote', function() { + expect(parsePatch( + `diff --git "a/unterminated +--- a/file.txt ++++ b/file.txt +@@ -1 +1 @@ +-old ++new`)) + .to.eql([{ + oldFileName: 'a/file.txt', + oldHeader: '', + newFileName: 'b/file.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + } + ] + }]); + }); + + it('should handle an unparseable `diff --git` header with no a/b prefixes', function() { + expect(parsePatch( + `diff --git file.txt file.txt +--- a/file.txt ++++ b/file.txt +@@ -1 +1 @@ +-old ++new`)) + .to.eql([{ + oldFileName: 'a/file.txt', + oldHeader: '', + newFileName: 'b/file.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + } + ] + }]); + }); + + + it('should handle an incomplete octal escape in a quoted `diff --git` filename', function() { + // The quoted filename has a truncated octal escape (\36 instead of \360). + // parseQuotedFileName should return null, so parseGitDiffHeader returns + // null and we fall back to --- / +++ lines for filenames. + expect(parsePatch( + `diff --git "a/file\\36" "b/file\\36" +--- a/file.txt ++++ b/file.txt +@@ -1 +1 @@ +-old ++new`)) + .to.eql([{ + oldFileName: 'a/file.txt', + oldHeader: '', + newFileName: 'b/file.txt', + newHeader: '', + isGit: true, + hunks: [ + { + oldStart: 1, oldLines: 1, + newStart: 1, newLines: 1, + lines: ['-old', '+new'] + } + ] + }]); + }); + + it('should handle an unparseable `diff --git` header with no --- or +++ fallback', function() { + // When both the `diff --git` header is unparseable AND there are no + // --- / +++ lines, filenames remain undefined. 
+ expect(parsePatch( + `diff --git file.txt file.txt +old mode 100644 +new mode 100755`)) + .to.eql([{ + isGit: true, + oldMode: '100644', + newMode: '100755', + hunks: [] + }]); + }); + }); }); }); diff --git a/test/patch/readme-rename-example.js b/test/patch/readme-rename-example.js new file mode 100644 index 00000000..3945accd --- /dev/null +++ b/test/patch/readme-rename-example.js @@ -0,0 +1,240 @@ +import {applyPatches} from '../../libesm/patch/apply.js'; + +import {expect} from 'chai'; +import fs from 'fs'; +import os from 'os'; +import path from 'path'; + +describe('README Git rename example', function() { + let tmpDir; + let originalCwd; + + beforeEach(function() { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'jsdiff-readme-test-')); + originalCwd = process.cwd(); + process.chdir(tmpDir); + }); + + afterEach(function() { + process.chdir(originalCwd); + fs.rmSync(tmpDir, {recursive: true, force: true}); + }); + + /** + * Extract the Git rename example code from the README and return it as a + * function that takes (applyPatches, patch, fs) and runs the example. 
+ */ + const applyGitPatch = (function() { + const readme = fs.readFileSync( + path.join(__dirname, '../../README.md'), + 'utf-8' + ); + + // Find the heading + const headingIndex = readme.indexOf('##### Applying a multi-file Git patch'); + if (headingIndex === -1) { + throw new Error('Could not find the Git rename example heading in README.md'); + } + + // Find the code block after the heading + const afterHeading = readme.substring(headingIndex); + const codeBlockStart = afterHeading.indexOf('\n```\n'); + if (codeBlockStart === -1) { + throw new Error('Could not find the code block in the Git rename example'); + } + const codeStart = codeBlockStart + 4; // skip the \n``` of the opening fence (its trailing \n is kept) + const codeBlockEnd = afterHeading.indexOf('\n```\n', codeStart); + if (codeBlockEnd === -1) { + throw new Error('Could not find the end of the code block in the Git rename example'); + } + + let code = afterHeading.substring(codeStart, codeBlockEnd); + + // Strip the require line; we'll provide applyPatches as an argument. + // Strip the fs.readFileSync for the patch; we'll provide patch as an argument. 
+ code = code + .replace(/const \{applyPatches\}.*\n/, '') + .replace(/const patch = .*\n/, ''); + + // eslint-disable-next-line no-new-func + return new Function('applyPatches', 'patch', 'fs', code); + }()); + + it('should handle a simple rename with content change', function() { + fs.writeFileSync('old.txt', 'line1\nline2\nline3\n'); + + const patch = +`diff --git a/old.txt b/new.txt +similarity index 80% +rename from old.txt +rename to new.txt +--- a/old.txt ++++ b/new.txt +@@ -1,3 +1,3 @@ + line1 +-line2 ++line2modified + line3 +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.existsSync('old.txt')).to.equal(false); + expect(fs.readFileSync('new.txt', 'utf-8')) + .to.equal('line1\nline2modified\nline3\n'); + }); + + it('should handle a swap rename (a→b, b→a)', function() { + fs.writeFileSync('a.txt', 'content of a\n'); + fs.writeFileSync('b.txt', 'content of b\n'); + + const patch = +`diff --git a/a.txt b/b.txt +similarity index 100% +rename from a.txt +rename to b.txt +diff --git a/b.txt b/a.txt +similarity index 100% +rename from b.txt +rename to a.txt +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('a.txt', 'utf-8')).to.equal('content of b\n'); + expect(fs.readFileSync('b.txt', 'utf-8')).to.equal('content of a\n'); + }); + + it('should handle a swap rename with content changes', function() { + fs.writeFileSync('a.txt', 'aaa\n'); + fs.writeFileSync('b.txt', 'bbb\n'); + + const patch = +`diff --git a/a.txt b/b.txt +similarity index 50% +rename from a.txt +rename to b.txt +--- a/a.txt ++++ b/b.txt +@@ -1 +1 @@ +-aaa ++aaa-modified +diff --git a/b.txt b/a.txt +similarity index 50% +rename from b.txt +rename to a.txt +--- a/b.txt ++++ b/a.txt +@@ -1 +1 @@ +-bbb ++bbb-modified +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('a.txt', 'utf-8')).to.equal('bbb-modified\n'); + expect(fs.readFileSync('b.txt', 'utf-8')).to.equal('aaa-modified\n'); + }); + + it('should handle a three-way rotation (a→b, b→c, 
c→a)', function() { + fs.writeFileSync('a.txt', 'content of a\n'); + fs.writeFileSync('b.txt', 'content of b\n'); + fs.writeFileSync('c.txt', 'content of c\n'); + + const patch = +`diff --git a/a.txt b/b.txt +similarity index 100% +rename from a.txt +rename to b.txt +diff --git a/b.txt b/c.txt +similarity index 100% +rename from b.txt +rename to c.txt +diff --git a/c.txt b/a.txt +similarity index 100% +rename from c.txt +rename to a.txt +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('a.txt', 'utf-8')).to.equal('content of c\n'); + expect(fs.readFileSync('b.txt', 'utf-8')).to.equal('content of a\n'); + expect(fs.readFileSync('c.txt', 'utf-8')).to.equal('content of b\n'); + }); + + it('should handle a file deletion', function() { + fs.writeFileSync('doomed.txt', 'goodbye\n'); + + const patch = +`diff --git a/doomed.txt b/doomed.txt +deleted file mode 100644 +index 2b31011..0000000 +--- a/doomed.txt ++++ /dev/null +@@ -1 +0,0 @@ +-goodbye +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.existsSync('doomed.txt')).to.equal(false); + }); + + it('should handle a file creation', function() { + const patch = +`diff --git a/brand-new.txt b/brand-new.txt +new file mode 100644 +index 0000000..fa49b07 +--- /dev/null ++++ b/brand-new.txt +@@ -0,0 +1 @@ ++hello world +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('brand-new.txt', 'utf-8')).to.equal('hello world\n'); + }); + + it('should create a new executable file with correct mode', function() { + const patch = +`diff --git a/run.sh b/run.sh +new file mode 100755 +index 0000000..abc1234 +--- /dev/null ++++ b/run.sh +@@ -0,0 +1,2 @@ ++#!/bin/bash ++echo hello +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('run.sh', 'utf-8')).to.equal('#!/bin/bash\necho hello\n'); + const mode = fs.statSync('run.sh').mode & 0o777; + expect(mode).to.equal(0o755); + }); + + it('should set the mode when a file is modified with a mode change', function() 
{ + fs.writeFileSync('script.sh', 'echo old\n'); + fs.chmodSync('script.sh', 0o644); + + const patch = +`diff --git a/script.sh b/script.sh +old mode 100644 +new mode 100755 +--- a/script.sh ++++ b/script.sh +@@ -1 +1 @@ +-echo old ++echo new +`; + + applyGitPatch(applyPatches, patch, fs); + + expect(fs.readFileSync('script.sh', 'utf-8')).to.equal('echo new\n'); + const mode = fs.statSync('script.sh').mode & 0o777; + expect(mode).to.equal(0o755); + }); +}); diff --git a/test/patch/reverse.js b/test/patch/reverse.js index d151b439..bff011d4 100644 --- a/test/patch/reverse.js +++ b/test/patch/reverse.js @@ -28,69 +28,173 @@ describe('patch/reverse', function() { it('should support taking an array of structured patches, as output by parsePatch', function() { const patch = parsePatch( - 'diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md\n' + - 'index 20b807a..4a96aff 100644\n' + - '--- a/CONTRIBUTING.md\n' + - '+++ b/CONTRIBUTING.md\n' + - '@@ -2,6 +2,8 @@\n' + - ' \n' + - ' ## Pull Requests\n' + - ' \n' + - '+bla bla bla\n' + - '+\n' + - ' We also accept [pull requests][pull-request]!\n' + - ' \n' + - ' Generally we like to see pull requests that\n' + - 'diff --git a/README.md b/README.md\n' + - 'index 06eebfa..40919a6 100644\n' + - '--- a/README.md\n' + - '+++ b/README.md\n' + - '@@ -1,5 +1,7 @@\n' + - ' # jsdiff\n' + - ' \n' + - '+foo\n' + - '+\n' + - ' [![Build Status](https://secure.travis-ci.org/kpdecker/jsdiff.svg)](http://travis-ci.org/kpdecker/jsdiff)\n' + - ' [![Sauce Test Status](https://saucelabs.com/buildstatus/jsdiff)](https://saucelabs.com/u/jsdiff)\n' + - ' \n' + - "@@ -225,3 +227,5 @@ jsdiff deviates from the published algorithm in a couple of ways that don't affe\n" + - ' \n' + - " * jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.\n" + - ' * jsdiff skips considering diagonals 
where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.\n' + - '+\n' + - '+bar\n' + 'Index: file1.txt\n' + + '===================================================================\n' + + '--- file1.txt\n' + + '+++ file1.txt\n' + + '@@ -1,4 +1,5 @@\n' + + ' alpha\n' + + '+beta\n' + + ' gamma\n' + + ' delta\n' + + ' epsilon\n' + + 'Index: file2.txt\n' + + '===================================================================\n' + + '--- file2.txt\n' + + '+++ file2.txt\n' + + '@@ -2,3 +2,3 @@\n' + + ' two\n' + + '-three\n' + + '+THREE\n' + + ' four\n' ); expect(formatPatch(reversePatch(patch))).to.equal( + 'Index: file2.txt\n' + '===================================================================\n' + - '--- b/README.md\t\n' + - '+++ a/README.md\t\n' + - '@@ -1,7 +1,5 @@\n' + - ' # jsdiff\n' + - ' \n' + - '-foo\n' + - '-\n' + - ' [![Build Status](https://secure.travis-ci.org/kpdecker/jsdiff.svg)](http://travis-ci.org/kpdecker/jsdiff)\n' + - ' [![Sauce Test Status](https://saucelabs.com/buildstatus/jsdiff)](https://saucelabs.com/u/jsdiff)\n' + - ' \n' + - '@@ -227,5 +225,3 @@\n' + - ' \n' + - " * jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.\n" + - ' * jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. 
This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.\n' + - '-\n' + - '-bar\n' + + '--- file2.txt\n' + + '+++ file2.txt\n' + + '@@ -2,3 +2,3 @@\n' + + ' two\n' + + '+three\n' + + '-THREE\n' + + ' four\n' + '\n' + + 'Index: file1.txt\n' + '===================================================================\n' + - '--- b/CONTRIBUTING.md\t\n' + - '+++ a/CONTRIBUTING.md\t\n' + - '@@ -2,8 +2,6 @@\n' + - ' \n' + - ' ## Pull Requests\n' + - ' \n' + - '-bla bla bla\n' + - '-\n' + - ' We also accept [pull requests][pull-request]!\n' + - ' \n' + - ' Generally we like to see pull requests that\n' + '--- file1.txt\n' + + '+++ file1.txt\n' + + '@@ -1,5 +1,4 @@\n' + + ' alpha\n' + + '-beta\n' + + ' gamma\n' + + ' delta\n' + + ' epsilon\n' + ); + }); + + it('should reverse a rename patch into a rename in the opposite direction', function() { + const patch = parsePatch( + 'diff --git a/old.txt b/new.txt\n' + + 'similarity index 85%\n' + + 'rename from old.txt\n' + + 'rename to new.txt\n' + + '--- a/old.txt\n' + + '+++ b/new.txt\n' + + '@@ -1,3 +1,3 @@\n' + + ' line1\n' + + '-line2\n' + + '+line2modified\n' + + ' line3\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/new.txt b/old.txt\n' + + 'rename from new.txt\n' + + 'rename to old.txt\n' + + '--- a/new.txt\n' + + '+++ b/old.txt\n' + + '@@ -1,3 +1,3 @@\n' + + ' line1\n' + + '+line2\n' + + '-line2modified\n' + + ' line3\n' + ); + }); + + it('should reverse a copy patch into a deletion', function() { + const patch = parsePatch( + 'diff --git a/original.txt b/copy.txt\n' + + 'similarity index 85%\n' + + 'copy from original.txt\n' + + 'copy to copy.txt\n' + + '--- a/original.txt\n' + + '+++ b/copy.txt\n' + + '@@ -1,3 +1,3 @@\n' + + ' line1\n' + + '-line2\n' + + '+line2modified\n' + + ' line3\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/copy.txt b/copy.txt\n' + + 'deleted 
file mode 100644\n' + ); + }); + + it('should reverse a hunk-less copy into a deletion', function() { + const patch = parsePatch( + 'diff --git a/original.txt b/copy.txt\n' + + 'similarity index 100%\n' + + 'copy from original.txt\n' + + 'copy to copy.txt\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/copy.txt b/copy.txt\n' + + 'deleted file mode 100644\n' + ); + }); + + it('should reverse a hunk-less rename', function() { + const patch = parsePatch( + 'diff --git a/old.txt b/new.txt\n' + + 'similarity index 100%\n' + + 'rename from old.txt\n' + + 'rename to new.txt\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/new.txt b/old.txt\n' + + 'rename from new.txt\n' + + 'rename to old.txt\n' + ); + }); + + it('should reverse a creation into a deletion, swapping isCreate/isDelete and oldMode/newMode', function() { + const patch = parsePatch( + 'diff --git a/newfile.txt b/newfile.txt\n' + + 'new file mode 100755\n' + + '--- /dev/null\n' + + '+++ b/newfile.txt\n' + + '@@ -0,0 +1 @@\n' + + '+hello\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/newfile.txt b/newfile.txt\n' + + 'deleted file mode 100755\n' + + '--- a/newfile.txt\n' + + '+++ /dev/null\n' + + '@@ -1,1 +0,0 @@\n' + + '-hello\n' + ); + }); + + it('should reverse a deletion into a creation, swapping isCreate/isDelete and oldMode/newMode', function() { + const patch = parsePatch( + 'diff --git a/oldfile.txt b/oldfile.txt\n' + + 'deleted file mode 100644\n' + + '--- a/oldfile.txt\n' + + '+++ /dev/null\n' + + '@@ -1 +0,0 @@\n' + + '-goodbye\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/oldfile.txt b/oldfile.txt\n' + + 'new file mode 100644\n' + + '--- /dev/null\n' + + '+++ b/oldfile.txt\n' + + '@@ -0,0 +1,1 @@\n' + + '+goodbye\n' + ); + }); + + it('should swap oldMode and newMode when reversing a mode change', function() { + const patch = parsePatch( + 'diff --git a/script.sh b/script.sh\n' + + 'old 
mode 100644\n' + + 'new mode 100755\n' + ); + expect(formatPatch(reversePatch(patch))).to.equal( + 'diff --git a/script.sh b/script.sh\n' + + 'old mode 100755\n' + + 'new mode 100644\n' ); }); });