Date: 3/2/2006||Recently I've ported some B+Tree code to Windows to use in Scribe for storing the spam database word indexes. The current system takes a lot of time to load into memory and uses too much memory. I spent some time optimizing the existing code, but fundamentally its design is flawed. Really, the data needs to stay on disk and be arranged for random access (by key) without loading much of the file into memory. There is an existing data structure for exactly that, called the B+Tree. It's now finding its way into common file systems (like ReiserFS), and was the basis of the BeOS File System.
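To make "random access by key without loading much of the file" concrete, here is a minimal sketch of a B+Tree lookup over fixed-size blocks. It is not the Scribe code: the `BlockFile` class just stands in for a file of blocks and counts reads, and the node layout, names and word-count values are hypothetical. The point it shows is that a lookup reads one block per tree level, so only the root-to-leaf path ever comes off disk.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Leaf:
    keys: List[str]                  # sorted keys stored in this block
    values: List[int]                # value per key (hypothetical word counts)
    next_leaf: Optional[int] = None  # right sibling, used for range scans (not needed here)

@dataclass
class Internal:
    keys: List[str]      # separators: keys[i] is the smallest key under children[i + 1]
    children: List[int]  # block numbers; len(children) == len(keys) + 1

Block = Union[Leaf, Internal]

class BlockFile:
    """Stand-in for an on-disk file of fixed-size blocks; counts block reads."""
    def __init__(self, blocks: List[Block]):
        self._blocks = blocks
        self.reads = 0

    def read(self, block_no: int) -> Block:
        self.reads += 1
        return self._blocks[block_no]

def lookup(f: BlockFile, root: int, key: str) -> Optional[int]:
    node = f.read(root)
    # Descend one block per level, picking the child whose key range holds `key`.
    while isinstance(node, Internal):
        node = f.read(node.children[bisect_right(node.keys, key)])
    # At the leaf, scan the keys in order.
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None

example_blocks: List[Block] = [
    Internal(keys=["m"], children=[1, 2]),                      # block 0: root
    Leaf(keys=["apple", "cash"], values=[3, 40], next_leaf=2),  # block 1
    Leaf(keys=["money", "viagra"], values=[55, 99]),            # block 2
]
```

With `f = BlockFile(example_blocks)`, `lookup(f, 0, "cash")` returns 40 after reading just two blocks (root and one leaf); a real word database would have many more leaves, but each lookup still reads only one block per level.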
Anyway, my point is that Scribe is probably going to use B+Trees for its spam word DB. To test the code I've been reading in my current spamwords.wdb file and exporting it to the new B+Tree format. The .wdb file is fairly compact; mine is 762 KB, so that's our baseline. The B+Tree files are less efficient at storing raw data because of their structural overhead, but they have other very desirable qualities.
Once I got the B+Tree code actually working, I began looking for the best parameters; my initial attempt created a 4+ MB file, eek! B+Trees are parameterised by block size and order. The order (n) indicates the desired number of keys per block: the B+Tree tries to keep the number of keys in each block between n and 2*n. The block size and order are interrelated: if the order is too small for the block size, the blocks have lots of wasted space; if the order is too large, the blocks fill up and have to be split into multiple blocks too soon. So I worked out where the sweet spot was for my typical data (using 256-byte blocks):
On top of that, the order affects the speed of the data structure: the larger the order, the more keys in each block and thus the more linear searching of keys at the leaf level.
For this reason I found the sweet spot for 512-byte blocks was quite a bit slower than the sweet spot for 256-byte blocks.
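Both trade-offs can be estimated on the back of an envelope. The sketch below assumes a fixed 16-byte block header and a 12-byte average entry (both numbers are hypothetical, and real spam words vary in length, which is exactly why the sweet spot has to be found empirically). `fill_range` shows how full a block is at n and at 2*n keys; an upper value above 1.0 means full blocks overflow and split early. The probe counters compare the front-to-back leaf scan described above with a binary search, which is a swapped-in alternative and not what the post describes.

```python
def keys_per_block(block_size, header_bytes, avg_entry_bytes):
    """How many average-sized entries fit in one block after its header."""
    return (block_size - header_bytes) // avg_entry_bytes

def fill_range(order, block_size, header_bytes, avg_entry_bytes):
    """Fraction of usable space used when a block holds n and 2*n entries."""
    usable = block_size - header_bytes
    return (order * avg_entry_bytes / usable, 2 * order * avg_entry_bytes / usable)

def linear_probes(keys, key):
    """Comparisons a front-to-back scan of a sorted leaf makes before stopping."""
    probes = 0
    for k in keys:
        probes += 1
        if k >= key:  # sorted, so the key can't appear further on
            break
    return probes

def binary_probes(keys, key):
    """Comparisons a binary search (bisect-left style) makes on the same leaf."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return probes

HEADER, ENTRY = 16, 12  # hypothetical sizes in bytes

for block_size in (256, 512):
    # Largest order whose maximum of 2*n entries still fits in one block.
    order = keys_per_block(block_size, HEADER, ENTRY) // 2
    leaf = [f"w{i:03d}" for i in range(2 * order)]  # a full leaf of sorted keys
    worst = leaf[-1]
    print(block_size, order,
          fill_range(order, block_size, HEADER, ENTRY),
          linear_probes(leaf, worst), binary_probes(leaf, worst))
```

Under these assumptions doubling the block size doubles the order, so a full leaf's worst-case scan goes from 20 to 40 comparisons, while a binary search over the same leaf only needs one or two more probes.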
I'll be putting this into the Bayesian implementation in Scribe v1.89 test1... it'll bring down the memory footprint and speed up the first mail receive a fair bit, especially if you have large word databases.
|XP Screen Madness|
Date: 31/1/2006||I've found that when you hook an XP machine up to a KVM and boot it while the KVM is displaying the other machine, XP sees that there is no monitor attached and ignores the refresh rate set in the display settings. In my case I want 1280x1024@32bpp at 75Hz. That's fine if I boot the machine with the KVM switched to the XP box; otherwise the machine boots into 1280x1024@32bpp at 85Hz, and when I do switch the KVM to the XP box the LCD says "Input Out Of Range". So I wrote a command line program to tell me the current screen mode, ssh'd in from the Mac mini while XP was in this "Out Of Range" mode, and ran it. That's how I actually know what the out-of-range mode is.
The next step, of course, is to stop XP from setting the refresh to 85Hz. I've googled around, and some people seem to think there might be an Nvidia registry setting that forces the refresh rate, but so far I've had no luck getting that to work.
I also added the ability to set the video mode (including refresh) to my cmd line app and called it from a remote ssh shell, but it complains that it can't set the video mode. Not surprising, I guess. (It does work fine from the local console.)
Now I'm thinking that maybe I need to install a Windows service that sits there polling the video mode and, if it sees an out-of-range mode, tries to reset the screen to some default. I haven't written a service before, and I have my doubts as to whether a service can change the screen resolution/refresh. But it's worth a try.
The reason it would have to be a service is that when the machine boots, it goes to the account selection screen, where no user is actually logged in. At that point only services are running. Otherwise I have to log in blind, which is actually how I fix the problem at the moment: log in as one user and then as another user, all blind, using the keyboard only. XP resets the screen mode when logging in to the second account and the LCD shows the desktop.
I'm not even sure whether the video card, XP or the KVM is at fault, but it's really, really annoying.
Update: Last night I installed the resolution check cmd line app in the "All Users" startup folder so that it checks the resolution and refresh while logging in to each account and changes them to something acceptable. This way all I have to do is hit Enter when the machine has finished booting and it'll switch to a valid screen mode. So while it's not "ideal", it's now a fairly benign problem.
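The decision the startup-folder app makes can be sketched on its own. This is a hypothetical reconstruction, not the packaged program: the names are made up, and it assumes 75Hz is the highest refresh the LCD accepts at 1280x1024. On Windows the current mode would come from `EnumDisplaySettings` with `ENUM_CURRENT_SETTINGS`, and the new mode would be applied with `ChangeDisplaySettings` using a `DEVMODE` whose `dmPelsWidth`, `dmPelsHeight`, `dmBitsPerPel` and `dmDisplayFrequency` are filled in; those calls are left out so the logic runs anywhere.

```python
from typing import NamedTuple, Optional

class DisplayMode(NamedTuple):
    width: int
    height: int
    bpp: int
    refresh_hz: int

# The mode wanted in the post; the LCD shows "Input Out Of Range" at 85Hz.
PREFERRED = DisplayMode(1280, 1024, 32, 75)
# Assumption: 75Hz is the highest refresh the LCD accepts at this resolution.
MAX_REFRESH_HZ = 75

def mode_to_apply(current: DisplayMode,
                  preferred: DisplayMode = PREFERRED,
                  max_refresh_hz: int = MAX_REFRESH_HZ) -> Optional[DisplayMode]:
    """Return the mode to switch to, or None if the current mode is acceptable."""
    if current.refresh_hz > max_refresh_hz:
        return preferred
    return None
```

Run at each login, the app would switch only when this returns a mode, so booting with the KVM on the XP box (already at 75Hz) leaves the display alone.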
I've packaged the res/refresh change program (with source) and made it available here.
Date: 29/1/2006||After years of tinkering with making my PC quiet, I think I'm finally on the verge of achieving an almost silent PC. Recently I installed a Zalman fanless northbridge chipset heatsink because the little 40mm chipset fan was making disturbing noises, as if its bearings were failing, and the PC was still far too noisy. Over the last few days I've also been getting a number of load-related system freezes, so I thought I'd pull the cover off and troubleshoot.
Over the years I've been picking up super quiet fans like the Pabst 8412NGL (12dB), which I'm using on the CPU, and the SilenX 120mm (14dB) on the case. But I never really saw the benefit; the system wasn't much quieter. So finally I figured I could test the system components in isolation to see how noisy each one was. I pulled all the fan (bar the CPU) and HD power cables and powered up. Silence. The system booted... wow! OK... so I worked my way around the system plugging each device back in one at a time, and lo and behold, one of the hard disks was making 90% of the noise. So it's out and the system is finally almost silent. The fans I bought are actually very good.
My remaining issue is that under load the Pabst fan doesn't push enough air to cool the CPU, and it overheats and hangs. So I'm scheming up plans to beef up the CPU cooling with either a temperature-sensitive fan or maybe some ducting to adapt one of the 120mm SilenX fans to the 70mm copper heatsink. The SilenX fan pushes twice the air the Pabst fan does.
If only I'd been lateral-minded enough to use my code optimisation skillz sooner *sigh*
|Mpeg2 Non-Destructive Editing Workflow|
Date: 27/1/2006||I am in the process of editing out duplicate scenes in 6+ hours of DVD format MPEG2 files. Joy!
And at the moment the workflow consists of:
This seems to provide excellent results, but it is somewhat time consuming. Is there a better way to delete scenes in MPEG2 without re-encoding any video? (No shareware or warez please.)
Date: 21/1/2006||Let it be known henceforth that the memory bitmap code in Lgi doth draw upon thine Quartz 2D contexts in wondrous colour.
And late on the Saturday evening, fret saw it was good and went henceforth unto his rest.
|DirectShow Filter Graphs|
Date: 12/1/2006||I'm trying to write a DirectShow filter graph that will convert a dvr-ms file into a normal AVI with standard compression filters (like XviD and MP3).
I seem to get all the filters hooked up right, but when I "Run" the graph nothing happens... it just sits there in the running state, not doing anything. The CPU is idle and the output file is either 0 KB or a few hundred KB. The same thing happens in the DX8.1 GraphEdit application.
As a side issue I can't figure out how to bring up the codec filter's settings dialog.