How many workers do you choose when running a parallel job in Julia? The answer is easy right? The number of physical cores. We always default to that number. For my Core i7 4770K, that means it's 4, not 8 since that would include the hyperthreads. On my FX8350, there are 8 cores, but only 4 floating-point units (FPUs) which do the math, so in mathematical projects, I should use 4, right? I want to demonstrate that it's not that simple.

## Where the Intuition Comes From

Most of the time when doing scientific computing you are doing parallel programming without even knowing it. This is because a lot of vectorized operations are "implicitly paralleled", meaning that they are multi-threaded behind the scenes to make everything faster. In other languages like Python, MATLAB, and R, this is also the case. Fire up MATLAB ... READ MORE

For many of us mathematicians, we were taught to use MATLAB, and we were taught to vectorize everything. I mean obviously if we have matrices $A$, $B$, and $C$ and want to multiply element-wise (say to solve a reaction-equation at each point in space), then the optimal code is

A.*B.*C

No questions to ask, right? Actually, this code isn't as optimized as you'd think. Lets dig deeper.

## BLAS, Linpack, and MKL

The reason you are always told by "the lords of numerical math" to vectorize your code is because very smart programmers worked really hard on making basic ... READ MORE