Blog
Temp files vs in memory.
Date: 28/11/2008
Most application programmers eventually reach a point where they need to choose whether to keep some data in memory or write it to a temporary file. In these cases you generally don't know how much data you will have to handle, so the safe thing to do is to write it to disk. However, that has its downsides: it's slower, you have to make sure you clean up the file afterwards, and the file can get intercepted by all sorts of other processes, like anti-virus.

I'm currently working on the code that handles email download in Scribe v2, and basically there needs to be a container that stores the email as it streams in over the socket. Initially I just used a standard in-memory container (the Lgi object GStringPipe), which is a FIFO sort of thing that puts its data in fixed-size blocks. That's kinda cool because it doesn't have to re-allocate a single block of memory to "grow" it when it runs out of space. It just allocates another block and writes to that. Anyway, that's fine for about 99.8% of email.

Then there is the edge case. The really insanely large email... 111 MB? 456 MB? I've seen some whoppers. You usually don't see those on the internet, but in a LAN situation it's not unreasonable to email a large file to someone, since they are not particularly bandwidth limited. I've always made it a point to handle that gracefully, both for sending and receiving. In the receiving case I stream into memory until I hit 4 MB, then I create a temp file, write all the data in memory to disk, and continue streaming the remaining data straight to disk. At the end of the email download I parse it in the worker thread. The cool thing about the MIME parser is that it accepts a generic stream class as input, so I can give it a memory stream or a disk stream without the MIME parser caring where the data is stored. The graceful part of that system is that if you have a large email, it's not hanging around in memory, or worse, virtual memory, making your system slow down.
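To make the cut-over idea concrete, here is a minimal sketch in Python rather than the actual Scribe/Lgi C++ code. The names `SpillBuffer` and `count_lines` are made up for illustration, and `count_lines` is only a stand-in for a parser that accepts any readable stream. One simplification: this version switches to disk just before the write that would cross the limit, rather than exactly at it.

```python
import io
import tempfile


class SpillBuffer:
    """Hypothetical container: holds data in memory, moves to a temp file past a limit."""

    def __init__(self, limit=4 * 1024 * 1024):
        self.limit = limit          # cut-over size in bytes (4 MB, as in the post)
        self.stream = io.BytesIO()  # start out in memory
        self.on_disk = False

    def write(self, data):
        if not self.on_disk and self.stream.tell() + len(data) > self.limit:
            # About to cross the limit: copy what is buffered so far into a
            # temp file, then keep writing to that file from here on.
            disk = tempfile.TemporaryFile()
            disk.write(self.stream.getvalue())
            self.stream = disk
            self.on_disk = True
        return self.stream.write(data)

    def rewind(self):
        """Return the underlying stream positioned at the start, ready to parse."""
        self.stream.seek(0)
        return self.stream


def count_lines(stream):
    """Stand-in for the parser: it neither knows nor cares where the bytes live."""
    return sum(1 for _ in stream)


# A small message stays in memory; a larger one spills to disk.
small = SpillBuffer(limit=64)
small.write(b"Subject: hi\r\n\r\nbody\r\n")

big = SpillBuffer(limit=64)
for _ in range(10):
    big.write(b"x" * 20 + b"\r\n")

print(small.on_disk, count_lines(small.rewind()))  # False 3
print(big.on_disk, count_lines(big.rewind()))      # True 10
```

Python's standard library has a ready-made version of the same pattern in `tempfile.SpooledTemporaryFile(max_size=...)`, so a hand-rolled class like this is mainly useful for seeing the mechanics.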

I have had bugs before where I wrote an email to disk, closed the file for a moment, then a second later attempted to open it and it was gone. Deleted. This happens when some overzealous anti-virus process decides that the email is infected. So I do take writing email to disk seriously, and in general I choose to keep the file handle open to avoid losing the data. I'm not particularly worried about infected email, considering Scribe is fairly good at detecting executable attachments and blocking or deleting them.
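The difference between the two patterns can be sketched roughly like this, again in Python and not the Scribe code. The function names are hypothetical, and the `interfere` callback just simulates another process removing the file in the gap between close and reopen. Whether an open handle really stops a given scanner depends on the platform and on how the file was opened, so treat this as an illustration of the idea only.

```python
import os
import tempfile


def reopen_after_close(payload, interfere):
    """Fragile pattern: write, close, then reopen by path a moment later."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    interfere(path)  # stand-in for another process touching the closed file
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None  # the file vanished while it was closed
    finally:
        if os.path.exists(path):
            os.remove(path)


def keep_handle_open(payload):
    """Sturdier pattern: never close; seek back and read through the same handle."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.seek(0)
        return f.read()


print(reopen_after_close(b"mail", os.remove))  # None
print(keep_handle_open(b"mail"))               # b'mail'
```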

The 4 MB cut-over limit is somewhat arbitrary, but I think it covers the normal case nicely.
 