Did you just nerd-snipe yourself into porting this to NEON? No. That turned out to be a lie.
Despite knowing almost nothing about NEON, I couldn’t resist and blew my weekend trying to parse ARM documentation.
Here’s a NEON follow-up to “Parsing numbers into base-10 decimals with SIMD”.
[Read More]
Parsing numbers into base-10 decimals with SIMD
Using SIMD to accelerate decimal parsing
When parsing number-dense JSON,
much of the overhead is parsing numbers
instead of parsing the JSON itself.
In the typical case, most of the overhead within number parsing is converting the text into the form
mantissa * 10^-exponent, not transforming that form into a proper floating-point value.
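To make the mantissa * 10^-exponent form concrete, here is a minimal scalar sketch. It is not the SIMD approach from the post, and the function name `parse_decimal` and its return type are assumptions chosen for illustration. It only handles unsigned input with an optional decimal point.

```rust
/// Hypothetical helper (not from the original post): parses an unsigned
/// decimal string such as "123.45" into (mantissa, exponent), where
/// value = mantissa * 10^-exponent.
/// Returns None on invalid input or if the mantissa overflows a u64.
fn parse_decimal(text: &str) -> Option<(u64, u32)> {
    let mut mantissa: u64 = 0;
    let mut exponent: u32 = 0;
    let mut seen_point = false;
    let mut seen_digit = false;

    for byte in text.bytes() {
        match byte {
            b'0'..=b'9' => {
                // Append the digit to the mantissa, failing on overflow.
                mantissa = mantissa
                    .checked_mul(10)?
                    .checked_add(u64::from(byte - b'0'))?;
                // Every digit after the point scales the value by 10^-1.
                if seen_point {
                    exponent += 1;
                }
                seen_digit = true;
            }
            // Accept a single decimal point.
            b'.' if !seen_point => seen_point = true,
            // Anything else (signs, a second point, letters) is rejected.
            _ => return None,
        }
    }

    if seen_digit {
        Some((mantissa, exponent))
    } else {
        None
    }
}
```

With this sketch, "123.45" becomes the pair (12345, 2). Turning that pair into a floating-point value is a separate step.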
[Read More]
An experiment with type-erased datastructures
Making a compiler microbenchmark 14 times faster
The Rust compiler is known to trade compile time for performance aggressively. I look at how type erasure can
significantly reduce compile times by only generating one version of a datastructure and its methods.
[Read More]
Battling the Branch Predictor, Part 1
Waiting on an infrequent signal
Even the best branch predictor is only as good as the data it is given.
Naively spinning on a signal variable, a common technique in lock-free programming,
will lead the branch predictor to pessimize your software.
[Read More]
Rapidly Distinguishing Variable Layout Messages
Some years ago, at a now-defunct company, I wrote a nifty SIMD scanner to quickly discover “interesting” fields in a complex message format.
After getting the parse time around a microsecond, I considered my work finished and moved on.
[Read More]
Optimizing Cache Usage With Nontemporal Accesses
Nontemporal stores
Have you ever looked at code reading or writing a large or infrequently used datastructure and thought “What a waste of the cache”?
Look no further than nontemporal memory operations for all your cache-bypassing needs.
[Read More]