Skip to content

Conversation

@rotu
Copy link
Contributor

@rotu rotu commented Apr 3, 2024

Where the schema expects a declared character encoding, validate that it is one of the character encodings recognized by the compiler. When using a smart editor, this makes it easier to author a .ksy file.

image

I took this list from the recognized encoding list in Kaitai Struct compiler.

"windows-1258",
"IBM437",
"IBM866"
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to keep this list in sync with https://github.com/kaitai-io/kaitai_struct_compiler/blob/master/shared/src/main/scala/io/kaitai/struct/EncodingList.scala

I wonder if there is any easy way out here? I can imagine:

  • some tool which will copy ksc's EncodingList.scala list to here automatically
  • some tool which will copy this file to EncodingList.scala
  • getting ksc to read ksy_schema.json as source of truth (but then we'll need to keep a list of aliases somewhere?)

Copy link
Member

@generalmimon generalmimon Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some tool which will copy ksc's EncodingList.scala list to here automatically

I think this is the way to go (but I don't think it has to be "automatically"). Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala with the latest ksy_schema.json. If it finds any inconsistencies, it can do one of two things:

  1. just fail the CI job and thus prompt a human operator to look into the CI log, which would contain the details about the failed comparison. The human would then go here to https://github.com/kaitai-io/ksy_schema and update the list manually.

    This should be simple to implement and would probably do the job good enough. Yes, the updating would still need to be done manually, but I don't think the encoding list in KSC will be changing so rapidly that this becomes annoying.

  2. have @kaitai-bot open a PR with the update in https://github.com/kaitai-io/ksy_schema/pulls and wait for the human operator to merge it. This would perhaps be more convenient if we were changing the encoding list in KSC often, but as I already mentioned, I don't think we will, so this is probably an overkill. It adds a bunch of implementation challenges on top of the variant 1 - for example, editing the JSON file so that the overall formatting stays intact, opening a PR via some API, etc.

    To be honest, I don't think we need this, it seems overengineered. I suppose that there will be like less than 10 updates of the encoding list in the next 5 years, and we can certainly handle that much manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to keep this list in sync

The same could be said for the list of attributes (e.g. the valid attribute, implemented by the compiler but missing from the spec kaitai-io/kaitai_struct#944).

Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala with the latest ksy_schema.json

I think it would be ideal to add a CI job in https://github.com/kaitai-io/kaitai_struct_formats which checks that *.ksy files validate against the schema.

Copy link
Member

@generalmimon generalmimon Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed CI job is now fully implemented in this repo in .github/workflows/check.yml. It passes if the encoding lists in KSC and in ksy_schema are identical, otherwise it fails and shows the diff. It runs when new commits are pushed to master and on pull requests. It is also triggered whenever the EncodingList.scala file changes, thanks to .github/workflows/encoding-list-updated.yml in the KSC repo.

A test change to EncodingList was made in kaitai-io/kaitai_struct_compiler@2308a19, which triggered the encoding-list-updated.yml workflow, which caused the check.yml workflow here in ksy_schema to run, proving that it works.

@GreyCat GreyCat self-requested a review April 4, 2024 11:50
Copy link
Member

@GreyCat GreyCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, if anything, that's a great first step towards user-friendly canonicalization.

@generalmimon
Copy link
Member

generalmimon commented Apr 4, 2024

@rotu On the one hand, naming Git branches after animals is kind of funny (I wonder what tool does that?), but on the other hand, sensible branch names that at least slightly indicate the content are better. It's not that different from the default naming scheme that GitHub uses for edits created via the Web GUI (patch-1, patch-2 etc.), but we can probably agree that this is not exactly best practice either.

Speaking of which, please try to write at least a brief explanation in the pull request description. In general, every change deserves it.

@rotu
Copy link
Contributor Author

rotu commented Apr 4, 2024

@rotu On the one hand, naming Git branches after animals is kind of funny (I wonder what tool does that?), but on the other hand, sensible branch names that at least slightly indicate the content are better. It's not that different from the default naming scheme that GitHub uses for edits created via the Web GUI (patch-1, patch-2 etc.), but we can probably agree that this is not exactly best practice either.

This is the default behavior of Visual Studio Code. The branch names should not be conveying meaning - please see the PR for canonical description.

Speaking of which, please try to write at least a brief explanation in the pull request description. In general, every change deserves it.

I thought the title of the PR was sufficient but I'll elaborate.

@generalmimon
Copy link
Member

generalmimon commented Apr 4, 2024

The branch names should not be conveying meaning - please see the PR for canonical description.

Why shouldn't they?

I thought the title of the PR was sufficient but I'll elaborate.

It's not just about what you did (commit / PR titles and code are probably indeed sufficient for that), but why and how you did it are often important questions. For example, here we had to guess that you took the list from EncodingList.scala, which you didn't mention. In kaitai-io/kaitai_struct#1101, @GreyCat had to ask you why you're making that change.

And I'm not talking just about this PR, but in general - most of the time, leaving PR descriptions empty is a bad habit.

@rotu
Copy link
Contributor Author

rotu commented Apr 4, 2024

The branch names should not be conveying meaning - please see the PR for canonical description.

Why shouldn't they?

Here's why I don't like naming my branches: (1) branch names are immutable, even if the scope of a PR changes or you named the branch poorly to start (2) it creates a conflicting source of truth for the intent of a branch.

I thought the title of the PR was sufficient but I'll elaborate.

It's not just about what you did (commit / PR titles and code are probably indeed sufficient for that), but why and how you did it are often important questions. For example, here we had to guess that you took the list from EncodingList.scala, which you didn't mention). In kaitai-io/kaitai_struct#1101, @GreyCat had to ask you why you're making that change.

And I'm not talking just about this PR, but in general - most of the time, leaving PR descriptions empty is a bad habit.

Good points!

generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this pull request Aug 5, 2025
This is a prerequisite (at least for how I want to implement it) for the
CI job checking whether `EncodingList` and the list of valid encodings
in `ksy_schema.json` are in sync, as described in
kaitai-io/ksy_schema#18 (comment)
rotu and others added 3 commits August 5, 2025 21:35
@generalmimon
Copy link
Member

@rotu Thanks!

@generalmimon generalmimon merged commit 977f3d4 into kaitai-io:master Aug 5, 2025
generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this pull request Aug 6, 2025
@rotu rotu deleted the mid-raccoon branch August 6, 2025 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants