I have been plugging on with the database side of the project. Not surprisingly, the WBGT calculation is once again the bottleneck, and it limits what we can achieve with SQL. I have some interesting workarounds (SQL functions and/or C# functions that can be called via the SQL Server CLR), but these are still inherently procedural, and it would be brilliant if we could make this truly set-based (and therefore far more scalable).
The problem with the current WBGT function is simply the huge number of iterations it has to perform (27 million in total, with as many as 3 million for the worst inputs). So once again, I have tackled this part of the code to see if we can short-circuit the iteration cycle. Turns out we can, and we can do it accurately!
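To illustrate the shape of the bottleneck (this is not the project's actual WBGT code), here is a minimal Python sketch of a brute-force wet-bulb iteration. The helper names, the Magnus saturation vapour pressure formula, the psychrometric coefficient, and the 0.001 °C step size are all my own illustrative stand-ins:

```python
import math

def saturation_vp(t_c):
    """Saturation vapour pressure in hPa (Magnus formula, illustrative)."""
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def residual(tw, ta, e_actual, pressure=1013.25, coeff=6.6e-4):
    """Psychrometric balance; crosses zero when tw is the wet-bulb temp."""
    return saturation_vp(tw) - coeff * pressure * (ta - tw) - e_actual

def tw_brute_force(ta, rh, step=0.001):
    """Step Tw down from the air temperature until the balance flips sign.
    With a fine step this takes thousands of iterations per input, which
    is the pattern behind iteration counts in the millions."""
    e_actual = rh / 100.0 * saturation_vp(ta)
    tw, iterations = ta, 0
    while residual(tw, ta, e_actual) > 0:
        tw -= step
        iterations += 1
    return tw, iterations
```

For 30 °C air at 50% RH, this has to descend roughly 8 °C in 0.001 °C steps, so several thousand iterations for a single input.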
I re-tested this with the new formula for RH which Bruno supplied. It is fast! Absolutely amazing. Testing the accuracy, we can see that Tw is accurate to:
- 95th percentile: ±2 °C
- 99th percentile: ±3–6 °C
- Maximum delta: 25–29 °C
(where maximum delta is the difference between the Tw calculated by iteration and by approximation)
My gut feeling is that this isn’t close enough.
In this method, I am using the original code, but trying to jump way ahead in the iteration of Tw. It isn't a new and exciting model, simply a way of short-circuiting the existing method. Taking advantage of the fact that the iteration progresses (roughly) linearly, I decided we could probably do the first two calculations, obtain the slope, and then jump straight to the final value. This worked, but regularly over-estimated Tw.
So I tweaked this by estimating the final Tw (as explained above) and then backing up a short distance to get an intermediate Tw that is very close to the end of the iteration cycle. I then begin iterating from the intermediate Tw. This massively reduces the total number of iterations to ~3 million, with a typical (modal) count of 4 iterations per input. When stepping backwards by 15%, the accuracy is high:
- 95th percentile: ±0.175 °C
- 99th percentile: ±0.744 °C
If you back up a little further (20%), then the accuracy is:
- 95th percentile: ±0.156 °C
- 99th percentile: ±0.183 °C
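The jump-ahead-then-back-off idea can be sketched in Python. To be clear, the helpers below (Magnus saturation vapour pressure, a simple psychrometric balance, the step size) are illustrative stand-ins of my own, not the project's WBGT code; only the two-point slope estimate and the 20% back-off correspond to the method described above:

```python
import math

def saturation_vp(t_c):
    """Saturation vapour pressure in hPa (Magnus formula, illustrative)."""
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def residual(tw, ta, e_actual, pressure=1013.25, coeff=6.6e-4):
    """Psychrometric balance; crosses zero when tw is the wet-bulb temp."""
    return saturation_vp(tw) - coeff * pressure * (ta - tw) - e_actual

def tw_short_circuit(ta, rh, step=0.001, backoff=0.20):
    """Do two iterations, use the slope to estimate the final Tw,
    back up 20% of the jump, then resume the fine iteration from there."""
    e_actual = rh / 100.0 * saturation_vp(ta)
    f0 = residual(ta, ta, e_actual)
    f1 = residual(ta - step, ta, e_actual)
    slope = (f0 - f1) / step           # residual drop per degree of descent
    jump = f0 / slope                  # linear estimate of the total descent
    tw = ta - (1.0 - backoff) * jump   # stop 20% short of the estimate
    iterations = 2
    while residual(tw, ta, e_actual) > 0:
        tw -= step
        iterations += 1
    return tw, iterations
```

The back-off is what makes this safe: starting the resumed iteration on the warm side of the true root means the loop always converges from the same direction as the original method, so the answer lands on (almost) the same value, just after far fewer steps.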
Now this isn’t an exciting new piece of maths – but it does improve the scalability of the existing code. The effect on running time is respectable (it runs in roughly 30% of the time of the original), but the really exciting thing is that perhaps we can now use the database to do this in one ridiculously fast set-based operation. We couldn’t do this before, because the sheer number of iterations forced a huge number of disk writes, which made the set-based approach no faster, and probably less efficient, than in-memory iteration. Testing this out is my next step (and probably my final step, as the project deadline is fast approaching).
The upside of all this is that if we can get out of procedural code and leverage the database, the calculations can be run incredibly quickly. So if you ever wanted to increase the sample rate, you could: monthly or weekly would certainly be achievable, and hourly wouldn’t be out of the question.