Skip to content

Create the side-by-side option (-y) feature for the diff command (Incomplete) #117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 3, 2025

Conversation

sami-daniel
Copy link
Contributor

@sami-daniel sami-daniel commented Apr 22, 2025

  • Create the function, in the utils package, limited_string that allows you to truncate a string based on a delimiter (May break the encoding of the character where it was cut)

  • Create tests for limited_string function

  • Add support for -y and --side-by-side flags that enables diff output for side-by-side mode

  • Create implementation of the diff -y (SideBySide) command, base command for sdiff, using the crate diff as engine. Currently it does not fully represent GNU diff -y, some flags (|, (, ), , /) could not be developed due to the limitation of the engine we currently use (crate diff), which did not allow perform logic around it. Only the use of '<' and '>' were enabled.

TL;DR

A new limited_string function was added to the utils package, allowing strings to be truncated based on a delimiter (note: this may break character encoding at the cut point). Unit tests were created to ensure correct behavior of the function.

Support for the -y and --side-by-side flags was added to enable side-by-side diff output. The SideBySide implementation, serving as the base for the sdiff command, was developed using the diff crate as the comparison engine. Due to limitations of the crate, only the < and > markers are currently supported—other markers like |, (, ),\ and / could not be implemented.

Clarification
The main goal of the limited_string function is to simplify and standardize the process of truncating strings for everyone working on the project. The idea was to provide a utility that anyone could use without having to reimplement or worry too much about string manipulation logic. However, a known limitation of the current implementation is that it can break character encoding, especially when the cut happens mid-way through a multi-byte character (e.g., in UTF-8).

I am open to suggestions on how to improve or safely handle encoding, whether by detecting boundaries or using a safer string-splitting strategy that respects character integrity.

Regarding the SideBySide implementation: I recognize that the output and edit script provided by the diff crate is limited. These limitations currently prevent us from supporting additional characters such as |, (, ), , or / that are used in GNU diff -y to more accurately reflect changes in lines.

For this initial version, I think the use of < and >, are sufficient for a first pass of side-by-side comparison. Moving forward, I did like to open a follow-up issue where we can evaluate better strategies (possibly even considering another diff engine) to handle richer diff representations and support for the missing symbols.

Suggestions and ideas are very welcome!

Copy link

codecov bot commented Apr 22, 2025

Codecov Report

Attention: Patch coverage is 96.34146% with 33 lines in your changes missing coverage. Please review.

Project coverage is 86.19%. Comparing base (5b791e8) to head (fce0881).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
src/params.rs 33.33% 21 Missing and 1 partial ⚠️
src/side_diff.rs 98.96% 8 Missing and 1 partial ⚠️
src/diff.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #117      +/-   ##
==========================================
+ Coverage   84.50%   86.19%   +1.69%     
==========================================
  Files          12       13       +1     
  Lines        5318     6216     +898     
  Branches      476      509      +33     
==========================================
+ Hits         4494     5358     +864     
- Misses        809      839      +30     
- Partials       15       19       +4     
Flag Coverage Δ
macos_latest 86.12% <96.34%> (+1.74%) ⬆️
ubuntu_latest 86.21% <96.34%> (+1.72%) ⬆️
windows_latest 19.49% <0.55%> (-3.31%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sami-daniel
Copy link
Contributor Author

I'll fix the alerts before reopening the PR, 😁

@sami-daniel sami-daniel reopened this Apr 22, 2025
@sami-daniel
Copy link
Contributor Author

Is done : )

@oSoMoN
Copy link
Collaborator

oSoMoN commented May 6, 2025

Hi Sami. Thanks for your work on this feature!
I don't have time to spare right now to review it, but please be assured that I will as soon as I can.

@sylvestre sylvestre requested a review from Copilot May 6, 2025 09:17
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new string truncation utility (limited_string) with unit tests, and implements a side-by-side diff mode using the diff crate for output.

  • Added limited_string utility in the utils package with tests
  • Implemented the -y/--side-by-side flags and corresponding diff output in side_diff
  • Updated module re-exports and parameter parsing to support the new feature

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/utils.rs Added limited_string function and tests
src/side_diff.rs Implemented side-by-side diff output using diff crate
src/params.rs Added parameter handling for -y/--side-by-side flag
src/main.rs Integrated side_diff module into main diff workflow
src/lib.rs Re-exported side_diff diff function with an alias
src/diff.rs Updated to include the new side_diff option
Comments suppressed due to low confidence (2)

src/side_diff.rs:22

  • There are multiple spelling issues: 'nescessary' should be 'necessary' and 'lenght' should be 'length'; please correct these in the comment.
    // recalculate how many spaces are nescessary, cause we need to take into consideration the lenght of the word before print it.

src/lib.rs:15

  • [nitpick] The re-export alias 'side_by_syde_diff' appears to have a typo; consider renaming it to 'side_by_side_diff' for clarity.
pub use side_diff::diff as side_by_syde_diff;

@sami-daniel sami-daniel reopened this May 26, 2025
@sami-daniel
Copy link
Contributor Author

@sylvestre I think there is now a cool version of the side. Can you take a look? : )

@sylvestre sylvestre requested a review from Copilot May 27, 2025 15:44
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a side-by-side diff mode by adding a new "SideBySide" flag and related functionality, including a new utility function (limited_string), corresponding tests, and integration into the diff command using the diff crate.

  • Added SideBySide flag support and width parameter in params.
  • Created and exposed the side_diff module with diff implementation for side-by-side output.
  • Introduced a new fuzz target and updated CI configuration for fuzz testing the side-by-side feature.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/utils.rs Reordered import statements.
src/params.rs Added SideBySide flag, width parameter, and updated default values and tests.
src/main.rs Imported the new side_diff module.
src/lib.rs Exposed side_diff module and re-exported its diff function as side_by_side_diff.
src/diff.rs Added side-by-side diff branch handling in the main diff command function.
fuzz/fuzz_targets/fuzz_side.rs Created a new fuzz target to test side-by-side diff behavior.
fuzz/Cargo.toml Defined a new binary for the fuzz_side target.
.github/workflows/fuzzing.yml Updated workflow to include the fuzz_side job (with a minor typo in the key).

Comment on lines +26 to +41
File::create("target/fuzz.file.original")
.unwrap()
.write_all(&original)
.unwrap();
File::create("target/fuzz.file.new")
.unwrap()
.write_all(&new)
.unwrap();
File::create("target/fuzz.file")
.unwrap()
.write_all(&original)
.unwrap();
File::create("target/fuzz.diff")
.unwrap()
.write_all(&output_buf)
.unwrap();
Copy link
Preview

Copilot AI May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider removing disk writes from the fuzz target to avoid unnecessary I/O overhead and side effects during fuzzing. Using in-memory buffers for debugging outputs can improve iteration speed.

Suggested change
File::create("target/fuzz.file.original")
.unwrap()
.write_all(&original)
.unwrap();
File::create("target/fuzz.file.new")
.unwrap()
.write_all(&new)
.unwrap();
File::create("target/fuzz.file")
.unwrap()
.write_all(&original)
.unwrap();
File::create("target/fuzz.diff")
.unwrap()
.write_all(&output_buf)
.unwrap();
let mut debug_original = Vec::new();
debug_original.extend_from_slice(&original);
let mut debug_new = Vec::new();
debug_new.extend_from_slice(&new);
let mut debug_diff = Vec::new();
debug_diff.extend_from_slice(&output_buf);

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind changing it, but it would be out of standard compared to other files related to fuzz.

sami-daniel and others added 5 commits June 2, 2025 22:32
…omplete).

- Create the function, in the utils package, limited_string that allows you to truncate a string based on a
delimiter (May break the encoding of the character where it was cut)

- Create tests for limited_string function

- Add support for -y and --side-by-side flags that enables diff output for side-by-side mode

- Create implementation of the diff -y (SideBySide) command, base command for sdiff, using the crate
diff as engine. Currently it does not fully represent GNU diff -y, some flags (|, (, ), , /) could
not be developed due to the limitation of the engine we currently use (crate diff), which did not
allow perform logic around it. Only the use of '<' and '>' were enabled.

- Create tests for SideBySide implementation
    Create the diff -y utility, this time introducing tests and changes focused
    mainly on the construction of the utility and issues related to alignment
    and response tabulation. New parameters were introduced such as the size
    of the total width of the output in the parameters. A new calculation was
    introduced to determine the size of the output columns and the maximum
    total column size. The tab and spacing mechanism has the same behavior
     as the original diff, with tabs and spaces formatted in the same way.

    - Introducing tests for the diff 'main' function
    - Introducing fuzzing for side diff utility
    - Introducing tests for internal mechanisms
    - Modular functions that allow consistent changes across the entire project
Configuring CI to run fuzz from fuzz_side

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@sylvestre sylvestre merged commit dee3bc1 into uutils:main Jun 3, 2025
28 checks passed
@sylvestre
Copy link
Collaborator

excellent!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants