parse: add support for non-seekable streams #132

Open
sbraz opened this issue Nov 26, 2023 · 2 comments

Comments

@sbraz
Owner

sbraz commented Nov 26, 2023

Currently, the parse() method supports file-like objects but they have to be seekable:

seek = lib.MediaInfo_Open_Buffer_Continue_GoTo_Get(handle)
# https://github.com/MediaArea/MediaInfoLib/blob/v20.09/Source/MediaInfoDLL/MediaInfoJNI.cpp#L127
if seek != ctypes.c_uint64(-1).value:
    filename.seek(seek)

The mediainfo binary is able to parse piped input so I assume the library supports non-seekable streams.
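For illustration, telling seekable and non-seekable inputs apart could look roughly like the sketch below. `is_seekable` is a hypothetical helper, not something pymediainfo provides; it relies only on the standard `seekable()` method of Python I/O objects and falls back to checking for a `seek` attribute on objects that do not implement it.

```python
import io


def is_seekable(fileobj) -> bool:
    """Best-effort check of whether a file-like object supports seek().

    Hypothetical helper for illustration only, not part of pymediainfo.
    """
    seekable = getattr(fileobj, "seekable", None)
    if callable(seekable):
        # Standard io objects (files, pipes, sockets wrappers) implement this
        return bool(seekable())
    # Objects without seekable(): fall back to the presence of seek()
    return hasattr(fileobj, "seek")
```

A pipe such as `sys.stdin.buffer` fed by another process would typically report `False` here, which is the case this issue is about.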

@JeromeMartinez, how would I go about implementing this? Do I naively ignore MediaInfo_Open_Buffer_Continue_GoTo_Get and pass an arbitrary size to MediaInfo_Open_Buffer_Continue like this? It works with a test stream but I'm not sure it's the right way :D

diff --git a/pymediainfo/__init__.py b/pymediainfo/__init__.py
index 5cce5fa..852c2b3 100644
--- a/pymediainfo/__init__.py
+++ b/pymediainfo/__init__.py
@@ -457,13 +457,8 @@ class MediaInfo:
                 )
             for option_name, option_value in mediainfo_options.items():
                 lib.MediaInfo_Option(handle, option_name, option_value)
-        try:
-            filename.seek(0, 2)
-            file_size = filename.tell()
-            filename.seek(0)
-        except AttributeError:  # filename is not a file-like object
-            file_size = None
 
+        file_size = 2**64 - 1
         if file_size is not None:  # We have a file-like object, use the buffer protocol:
             # Some file-like objects do not have a mode
             if "b" not in getattr(filename, "mode", "b"):
@@ -476,13 +471,6 @@ class MediaInfo:
                     # 4th bit = finished
                     if lib.MediaInfo_Open_Buffer_Continue(handle, buffer, len(buffer)) & 0x08:
                         break
-                    # Ask MediaInfo if we need to seek
-                    seek = lib.MediaInfo_Open_Buffer_Continue_GoTo_Get(handle)
-                    # https://github.com/MediaArea/MediaInfoLib/blob/v20.09/Source/MediaInfoDLL/MediaInfoJNI.cpp#L127
-                    if seek != ctypes.c_uint64(-1).value:
-                        filename.seek(seek)
-                        # Inform MediaInfo we have sought
-                        lib.MediaInfo_Open_Buffer_Init(handle, file_size, filename.tell())
                 else:
                     break
             lib.MediaInfo_Open_Buffer_Finalize(handle)
@sbraz
Owner Author

sbraz commented Feb 14, 2024

Hi @JeromeMartinez, could you please help me implement this?

@JeromeMartinez

Oops, I missed the first ping last year.

> Do I naively ignore MediaInfo_Open_Buffer_Continue_GoTo_Get and pass an arbitrary size to MediaInfo_Open_Buffer_Continue like this?

Our usage is to stop the parsing at that point (i.e. exit the loop, then call MediaInfo_Open_Buffer_Finalize). The behavior is not so bad there: usually you miss only the duration from MPEG-TS files, or info only from stream content with MP4 with header at the end.

Ignoring MediaInfo_Open_Buffer_Continue_GoTo_Get will work in a lot of cases, e.g. Matroska, MPEG-TS, or MP4 with the header at the beginning of the file. It will work partially for MP4 with the header at the end: it will skip all the data part and then read the header (i.e. you have to feed the whole file before there is anything interesting), then it will try to go back to the beginning for the stream content but will fail, so you get no info from the stream content itself, only from the header. It does not hurt; it may just be super slow, because in several cases (e.g. MPEG-TS) you provide the full stream, so it is like full parsing.

I would recommend stopping the parsing after a seek request if parse_speed is < 1 or if the requested offset is < the current offset, and otherwise ignoring the seek request.
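
The recommendation above can be sketched as a small decision helper. This is only an illustration under stated assumptions: `should_stop_parsing`, `NO_SEEK` and the parameter names are hypothetical and not part of pymediainfo or MediaInfoLib. It assumes the value returned by MediaInfo_Open_Buffer_Continue_GoTo_Get is the requested offset, with the all-ones 64-bit value meaning "no seek requested", as in the snippet from the first comment.

```python
import ctypes

# Value meaning "no seek requested", as compared against in the first comment
NO_SEEK = ctypes.c_uint64(-1).value


def should_stop_parsing(seek_request: int, current_offset: int, parse_speed: float) -> bool:
    """Decide how to handle a seek request on a non-seekable stream.

    Returns True if the read loop should stop (and then finalize), False if
    the request can be ignored and the stream fed sequentially.
    Hypothetical helper for illustration only.
    """
    if seek_request == NO_SEEK:
        return False  # nothing requested, keep feeding
    if parse_speed < 1:
        return True  # partial parsing: avoid feeding the whole stream
    if seek_request < current_offset:
        return True  # backward seek: that data can no longer be provided
    return False  # forward seek with full parsing: ignore and keep feeding
```

In the read loop, a `True` result would mean breaking out and calling MediaInfo_Open_Buffer_Finalize. The caller would probably have to track `current_offset` itself by summing the buffer lengths already passed in, since `tell()` may not be usable on a pipe.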
