Not much help, but a quick search revealed this: https://github.com/nschlia/ffmpegfs
This seemed to be read-only tho, so not sure if it covers the use case you described. If you can program a little (AI help?) find a simple fuse filesystem in a language you know, fiddle with it and call ffmpeg or similar on receiving files.
You need more than a llm to do that. You need a Cognitive Architecture around the model that include RAG to store/retrieve the data. I would start with an agent network (CA) that already includes the workflow you ask for. Unfortunately I don’t have a name ready for you, but take a look here: https://github.com/slavakurilyak/awesome-ai-agents