Kirk Haines just stated:
If superior execution time is only achieved by offloading extra work
to an idle core, then that really isn’t a gain.
Agreed. Just because we have multiple cores now doesn’t mean we have to spawn threads for everything. In my opinion, achieving great single-threaded performance with good algorithms and clever optimization is still the best way to write fast applications.