Skip to content

Extend Support for Vobsub subtitles to mp4 files #2531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 20, 2025

Conversation

bennettpeter
Copy link
Contributor

Refer to Issue #2510 . Vobsub subtitles work in mkv files but not in mp4 files.

I cannot find an official document describing how vobsub is implemented in mp4. It appears to be an extension invented by Nero Inc. / Nero AG for their "Nero Digital format" in 2003. It is supported by ffmpeg and I have traced through ffmpeg to find the details.

There is a new subtitle box type 'subp' (subpicture) that occurs under the handler alongside the existing types "text", "subt" ,etc. There is a new box type "mp4s" that occurs inside the "stsd" box and contains an esds box with initialization data for vobsub.

Box hierarchy:

moov
 trak
  mdia
   hdlr
    subp
     minf
      stbl
       stsd
        mp4s
         esds
          vobsub initialization data

The vobsub initialization data here is 64 bytes: 16 x 4-byte YUV palette entries. These must be converted to RGB for the vobsub parser. Also the vobsub parser needs the video width and height. It needs to be formatted as a vobsub idx file string, as follows (example):

size: 720x480\n
palette: feff00, feff00, feff00, 1f1f1f, 7f7f00, bebebe, 00fefc, fd00fe, feff00, 00007e, 008000, 7e0000, 00807d, 7f007f, 7f7f00, fefefe\n

In order to supply the size values, the values from the video track are needed. These are not available at the time when the mp4s box in being parsed. I have used a nasty hack to get the width and height from the video track into the vobsub subtitle tracks. I have added static variables to BoxParser to store the width and height from the video track so they can be accessed when parsing the vobsub. It would be preferable to apply the size to the vobsub track initialization data in parseTraks after all tracks have been parsed, but all of the track variables involved are final so cannot be modified. If the vobsub track occurs before the video track, default values will be used for width and height. This is unlikely.

I have added a test file libraries/test_data/src/test/assets/media/mp4/sample_with_vobsub.mp4 which is the test mkv file converted to mp4.

* If the vobsub occurs before the video track, the default values here will be used.
*/
private static int width = 720;
private static int height = 576;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid we can't accept this PR with this static mutable state. It's quite possible that there could be two ExoPlayer instances in the same JVM, both handling VobSub subtitles in MP4 files, and this static state would result in very confusing "cross talk" between them.

I think you will need to re-visit this plumbing to remove the static mutability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is bad. Thanks for the feedback. I will try to work it in a different way.

@bennettpeter
Copy link
Contributor Author

I have redone it in a way that is better and gets rid of the static variables. Also does not depend on the video track being before the subtitle tracks.

In a couple of places I have used "instanceOf" to check the type of object before casting the object and calling a new method. Another, and possibly better, way would be to add the new method to the interface used by the object, with a default empty implementation, and avoid the "instanceOf". Let me know if you would like me to change it that way.

@icbaker
Copy link
Collaborator

icbaker commented Jun 13, 2025

I should give you a general warning about the likelihood of this PR being accepted: Since there's no spec for this, we have to weigh up the cost/complexity of the change, and the potential maintenance burden going forward. You can see a bit more of my general reasoning on this topic here: #2491 (comment)

I'm not saying we won't ever accept VobSub-in-MP4 support - but the bar for simplicity of implementation is higher than when adding support for a better specified extension.

The mutable static state initially suggested was clearly beyond what we'd accept (tbh even for a spec-supported change).

The latest version that requires adding VobSub-in-MP4-specific methods in the otherwise very general SubtitleTranscodingExtractorOutput is also likely too complex given the lack of spec I'm afraid. It also assumes that if an MP4 container has multiple video tracks, they all have the same size (which I'm not sure can be assumed in general).

Would it be possible to extract the width and height you need from the tkhd box of the subtitle track? That would hopefully reduce the blast radius of the change to make it more acceptable. Analysing your test file with https://gpac.github.io/mp4box.js/test/filereader.html I see this for the subtitle track (these are encoded in 16.16 format, so they decode to 1280x720):

width	83886080
height	47185920

This is slightly different to the values in the video track tkhd (which decode to 1080x720):

width	70778880
height	47185920

ExoPlayer doesn't currently parse the width and height from tkhd boxes at all (we read the width & height for video tracks from the VideoSampleEntry box instead), but we could - by extending the code here (beware that atm we stop reading halfway through the matrix, so you'll need to skip the rest of it before reading the width & height):

int alternateGroup = tkhd.readUnsignedShort();
tkhd.skipBytes(4);
int a00 = tkhd.readInt();
int a01 = tkhd.readInt();
tkhd.skipBytes(4);
int a10 = tkhd.readInt();
int a11 = tkhd.readInt();

So I think this would require:

  1. Parse width & height from tkhd into new fields on TkhdData class
  2. Pass this width & height from parseTrak(...) into parseStsd(...) and then into parseTextSampleEntry(...)
    • It might be easiest to replace the rotationDegrees param with the TkhdData object.

@bennettpeter
Copy link
Contributor Author

bennettpeter commented Jun 13, 2025

All done as suggested. It works perfectly. A much better solution. All changes are in BoxParser.java, apart from one new constant that is defined in MP4Box.java.

The difference between the size in the video track tkhd and the subtitle tkhd seem to be that when the source is anamorphic, the subtitle track tkhd and the video track VideoSampleEntry have the source width and height, while the video track tkhd has the size adjusted to the rendering aspect ratio. For the subtitles it is the source width and height that is needed, so this solution is good.

@icbaker
Copy link
Collaborator

icbaker commented Jun 16, 2025

Thanks for refactoring to keep everything within the parsing of the single subtitle track.

YUV palette entries. These must be converted to RGB for the vobsub parser.

I think this is the next contentious/awkward part of this PR. Do you have any context for why the Nero Digital Format uses YUV? As far as I can see, the 'original' VobSub IDX files use an RGB palette, so converting to YUV (and presumably forcing any parser to convert it back again) seems quite awkward!

I think (?) your conversion algorithm is based on BT.601 (i.e. SD) coefficients. Are you sure this is correct, and it shouldn't be BT.709 (i.e. HD)? The example in this PR only has white and black, so the difference might not be that noticeable.

@bennettpeter
Copy link
Contributor Author

In ffmpeg V6.0 the code to create the idx for vobsub in mp4 files is here:
https://github.com/FFmpeg/FFmpeg/blob/d388c347d41e4eb516dec05910551c5461e65615/libavformat/mov.c#L2356

It has a call to convert to RGB a few lines down, here:
https://github.com/FFmpeg/FFmpeg/blob/d388c347d41e4eb516dec05910551c5461e65615/libavformat/mov.c#L2372

Since ffmpeg is used for handbrake and vlc that seems a good place to find the correct implementation. I agree that the whole YUV palette seems odd. It makes sense for the vob to use RGB but does not make much sense that ithe palette is stored in YUV in the mp4 file.

I found it difficult to find a YUV to RGB formula that gave the correct results. Wikipedia was not helpful. The formula I eventually found was for the older SD conversion. I figured that any difference would not be noticable. Also I assume that since vobsub subtitles were created for DVDs the SD formula would be correct.

I could not figure out the formula that ffmpeg uses for the YUV to RGB conversion. It uses bit manipulation and goes into low level code.

If you want me to change to the HD conversion coefficients can you point me to where they can be found. I could not make sense of the wikipedia article, which has a formula that converts to R', G', B' and these do not seem to be the same as R G and B. I could not find any good information on it.

@icbaker
Copy link
Collaborator

icbaker commented Jun 16, 2025

Also I assume that since vobsub subtitles were created for DVDs the SD formula would be correct.

This reasoning makes sense to me - thanks.

I think we'll keep the logic in this class, and when I import this PR for internal review I might rename (and document) the yuvToRgb method to be VobSub specific, so it doesn't accidentally get re-used in a more general ExoPlayer/media3 context in future where it might not have the right coefficients.

@icbaker
Copy link
Collaborator

icbaker commented Jun 16, 2025

I'm going to send this for internal review now. You may see some more commits being added as I make changes in response to review feedback. Please refrain from pushing any more substantive changes as it will complicate the internal review - thanks!

@copybara-service copybara-service bot merged commit 0b2f694 into androidx:main Jun 20, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants