Mpeg2 Parser
Date: 19/10/2009
Last night I got the latest refactor of my mpeg2 parser working on non-trivial amounts of data. I'm working on it with a view towards replacing PVAStrumento in my private stock video re-encoding app, which compresses a lot of DTV files down to H264/AAC for semi-long-term storage. The issue with most demuxers is that they are not good at recovering from errors or at keeping good sync between the video and audio. PVAStrumento is the best demuxer for damaged and chopped up mpeg2, as it has a raft of error recovery logic. However, it's painful to use and it's not perfect either: sometimes there are files that even it can't handle. So for most of this year I've been working on and off on a replacement parser.

As of yesterday I have a basic parser working that allows access to the raw streams. No error correction yet, but it's a great foundation. The trick is mostly in chopping up the audio packets to match the valid video frames using the PTS (timestamp) values in the program stream packets. At this point I have access to all the PS packets, their PTS values and all the frames of video and audio. So it's just a matter of working out which to keep and which to chuck. I can use PVAStrumento as a reference for what I should be doing (it works great 98% of the time), then move on to the "problem" files, see where PVAStrumento is falling over, and make my parser work for those edge cases.

Well, that's the theory anyway.
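
To make the "which to keep and which to chuck" idea concrete, here is a rough sketch in Python. It is not the parser's actual code: the function names, the frame durations and the rule "keep an audio frame if its PTS lands inside a run of contiguous video frames" are all assumptions made for illustration. The only hard fact used is that MPEG-2 PTS values tick at 90 kHz.

```python
from bisect import bisect_right

PTS_HZ = 90_000  # MPEG-2 PTS values are counted on a 90 kHz clock


def video_spans(video_pts, frame_ticks, tolerance=1):
    """Group video frame PTS values into contiguous (start, end) spans.

    frame_ticks is the duration of one video frame in PTS ticks
    (hypothetical parameter; 3600 for 25 fps material).
    """
    spans = []
    for pts in sorted(video_pts):
        if spans and pts - spans[-1][1] <= tolerance:
            # This frame starts where the previous span ends: extend it.
            spans[-1][1] = pts + frame_ticks
        else:
            # Gap in the video (damaged or missing frames): start a new span.
            spans.append([pts, pts + frame_ticks])
    return [tuple(s) for s in spans]


def trim_audio(audio_pts, spans):
    """Keep only the audio frame PTS values that fall inside a video span."""
    starts = [start for start, _ in spans]
    kept = []
    for pts in audio_pts:
        i = bisect_right(starts, pts) - 1  # last span starting at or before pts
        if i >= 0 and pts < spans[i][1]:
            kept.append(pts)
    return kept


if __name__ == "__main__":
    frame_ticks = PTS_HZ // 25                    # 25 fps video
    video = [0, 3600, 7200, 36000, 39600]         # a hole after the third frame
    audio = list(range(0, 45000, 2160))           # 24 ms audio frames
    spans = video_spans(video, frame_ticks)
    print(spans)
    print(trim_audio(audio, spans))
```

A real implementation would also have to deal with PTS wrap-around and with audio frames that straddle a span boundary, neither of which this sketch attempts.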

Strangely, I was starting with a 2MB memory buffer for the top level program stream. I was getting a 15MB/s parse speed, which is considerably lower than the HD's maximum transfer rate, and I confirmed that the algorithm was CPU bound by checking the CPU usage: one core was maxed. So I experimented with different buffer sizes. 4MB? Speed dropped to 13MB/s. 1MB? Speed up to 20MB/s. Hmmmm, that's odd. 512KB? 40MB/s. Wow, that's nice. 256KB? 54MB/s. I ended up settling on 32KB, which tops out at about 59MB/s. I think that's largely because the more buffer I keep, the more PS packets I have to keep track of, and I think somewhere I have a non-optimal algorithm working on the list of packets. Anyway, I think it's mostly HD bound now, which is how it should be.
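
For anyone wanting to repeat the buffer size experiment, something like the following Python sketch would do as a starting point. It is a stand-in, not my parser: `count_packs` and `throughput_mb_s` are made-up names, and the workload is just a scan for the program stream pack start code (0x000001BA) rather than real packet parsing, so it would not reproduce the slowdown from any packet list bookkeeping.

```python
import os
import time

PACK_START = b"\x00\x00\x01\xba"  # MPEG-2 program stream pack start code


def count_packs(path, buffer_size):
    """Count pack start codes, reading the file buffer_size bytes at a time."""
    count = 0
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            # Prepend the last 3 bytes of the previous read so that a start
            # code split across two reads is still found (and counted once).
            data = tail + chunk
            count += data.count(PACK_START)
            tail = data[-(len(PACK_START) - 1):]
    return count


def throughput_mb_s(path, buffer_size):
    """Return the scan speed in MB/s for one pass over the file."""
    start = time.perf_counter()
    count_packs(path, buffer_size)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return os.path.getsize(path) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    import sys
    for kb in (4096, 2048, 1024, 512, 256, 32):
        speed = throughput_mb_s(sys.argv[1], kb * 1024)
        print(f"{kb:5d}KB buffer: {speed:.1f}MB/s")
```

Bear in mind that after the first pass the file will probably be sitting in the OS disk cache, so later passes measure CPU cost rather than the drive. That is actually handy for spotting a CPU bound algorithm, but the numbers won't be comparable to the HD's transfer rate.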