I wrote an n-dimensional vector class using SSE/SSE2, but I gained no real speedup in my application, because it was limited by memory transfer. I don't know how much SSE helps otherwise. 3-dimensional vectors are a somewhat bad fit, since their dimensionality isn't a multiple of 2 or 4.
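To illustrate the 3D problem, here is a rough sketch (not my actual class) of a 3-component vector padded to four floats so it fits one SSE register. `Vec3Padded` and `add` are made-up names for this example, and it assumes an x86/x86-64 compiler with SSE enabled and C++11 for `alignas`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical 3D vector padded to four floats so it fills one
// 128-bit SSE register. v[3] is padding and is kept at 0.
struct alignas(16) Vec3Padded {
    float v[4];
};

// Adds all four lanes with a single SSE instruction.
// The fourth lane is wasted work and wasted memory bandwidth.
inline Vec3Padded add(const Vec3Padded& a, const Vec3Padded& b) {
    Vec3Padded r;
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
    return r;
}
```

The padding costs a quarter of the storage and memory traffic, so in a memory-bound loop like mine it probably would not help either.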
For bot development purposes the HL vector class should be sufficient. I don't know if it still has this little disadvantage in the / operator: since multiplication is faster than division, calculating 1.f/float once and then multiplying is faster (although it may not be optimal regarding precision). And all those functions are inlined, so the compiler has to take care of that. I don't know how chained expressions like + ... + ... + are handled by the compiler ... pmb ?!