%matplotlib inline
%pylab inline
d = np.random.randint(0, 100, size=(10,10))
np.sort(d)
np.argsort(d)
d[np.arange(d.shape[0])[:,np.newaxis], np.argsort(d)]
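A quick sanity check (with hypothetical helper names `rows` and `order`) that the fancy-indexing expression above reproduces a full row-wise sort:

```python
import numpy as np

d = np.random.randint(0, 100, size=(10, 10))
order = np.argsort(d)                        # per-row sort order (last axis)
rows = np.arange(d.shape[0])[:, np.newaxis]  # column of row indices, broadcasts against order
# indexing each row by its own sort order reproduces np.sort(d)
assert np.array_equal(d[rows, order], np.sort(d))
```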
Lexicographical sorting
x = np.ones((5,2))
x[1:3,0] = 0
x[:,1] = np.arange(5)
x
# create a view as a structured array (requires x to be contiguous)
v = x.ravel().view(dtype=[('a', x.dtype), ('b', x.dtype)])
# inplace sort will change x
v.sort(order=('a', 'b'))
x
# alternative, lexsort
x = np.ones((5,2)); x[1:3,0] = 0; x[:,1] = np.arange(5)
x[np.lexsort((x[:,1], x[:,0]))]
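One gotcha worth spelling out: np.lexsort treats its *last* key as the primary sort key, which is why the columns above are passed in reverse order. A minimal sketch:

```python
import numpy as np

x = np.ones((5, 2))
x[1:3, 0] = 0
x[:, 1] = np.arange(5)
# the LAST key passed to lexsort is the primary key:
# rows are sorted by column 0 first, ties broken by column 1
order = np.lexsort((x[:, 1], x[:, 0]))
print(order)  # [1 2 0 3 4]
```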
With NumPy >= 1.8, order statistics can be retrieved with np.partition, which is faster than applying a full sort
np.random.shuffle(d.ravel())  # d is contiguous, so ravel() is a view and d is shuffled in place
np.partition(d, (2, 8))
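np.partition only guarantees that the requested elements land in their final sorted positions, with smaller values before them and larger values after; a minimal check of that guarantee:

```python
import numpy as np

d = np.random.permutation(20)
p = np.partition(d, 4)
# element 4 is in its sorted position; the rest is only loosely ordered
assert p[4] == np.sort(d)[4]
assert (p[:4] <= p[4]).all() and (p[5:] >= p[4]).all()
```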
data = np.genfromtxt('stockholm_td_adj.dat', names=True)
temp = data['temperature']
year = data['year']
np.mean(temp) # or data['temperature'].mean()
data['temperature'].mean()
np.std(temp), np.var(temp)
temp.min(), temp.max()
temp.argmin(), temp.argmax()
year[temp.argmin()], year[temp.argmax()]
np.median(temp)
np.percentile(temp, [5, 95])
Most reduction functions have an axis
keyword to perform the reduction along a specific axis only
d = np.arange(12).reshape((4,3))
print(d)
print('axis=1:', d.sum(axis=1))
print('axis=0:', d.sum(axis=0))
# numpy >= 1.7: reductions also work over multiple axes:
d = np.arange(5*4*3).reshape((5,4,3))
d.sum(axis=(-2, -1))
#numpy < 1.7 way for trailing axes:
nshape = d.shape[:-2] + (-1,)
view = d.reshape(nshape)
print('new shape', view.shape)
view.sum(axis=1)
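The two approaches agree; a small check comparing the axis-tuple reduction against the reshape fallback:

```python
import numpy as np

d = np.arange(5 * 4 * 3).reshape(5, 4, 3)
new = d.sum(axis=(-2, -1))                         # NumPy >= 1.7
old = d.reshape(d.shape[:-2] + (-1,)).sum(axis=1)  # pre-1.7 fallback
assert np.array_equal(new, old)
print(new.shape)  # (5,)
```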
A universal function (ufunc) is a function operating on arrays element by element. Ufuncs allow implementing a large set of operations while getting features like broadcasting, type casting and reductions "for free".
E.g. all basic arithmetic operations like addition and multiplication, as well as basic math functions like np.sin
and np.cos,
are ufuncs and thus all support the same set of features.
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
# binary ufunc
np.add(a, b)
# unary ufunc
np.sin(a)
r = np.empty_like(b)
np.add(a, b, out=r), r
np.add(a, b, dtype=np.float64)
np.logaddexp(a, b)
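Ufuncs also accept a where= mask restricting which elements are computed; untouched slots keep whatever value is already in out, so out should be initialized first:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
out = np.zeros_like(a)
# only compute where the condition holds; index 0 keeps its initial 0.0
np.add(a, b, out=out, where=a > 1)
print(out)  # [ 0. 22. 33.]
```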
Binary ufuncs support reductions of arrays
np.add.reduce(a)
Reductions support the additional keepdims
argument, which retains the same dimensionality as the input by keeping the reduced dimensions with size one. This allows applying the result to the operands via broadcasting.
a = np.arange(3*4).reshape(3,4)
r = np.add.reduce(a, axis=1, keepdims=True)
r
a - r
Binary ufuncs also provide:
accumulate, apply the operation successively onto elements (cumulative sum or product)
reduceat, reduction along slices
at, unbuffered in-place operation: a[indices] += b, which allows repeated indices
np.add.reduceat(np.arange(8), [0, 4, 6])
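Small sketches of the three methods side by side:

```python
import numpy as np

a = np.arange(8)
print(np.add.accumulate(a))           # cumulative sum: [ 0  1  3  6 10 15 21 28]
print(np.add.reduceat(a, [0, 4, 6]))  # sums of a[0:4], a[4:6], a[6:]: [ 6  9 13]
x = np.zeros(3)
np.add.at(x, [0, 0, 1], 1)            # index 0 appears twice and is counted twice
print(x)                              # [2. 1. 0.]
```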
We can compute with subsets of the data in an array using indexing, fancy indexing, and the other methods of extracting data from an array (described above).
For example, let's go back to the temperature dataset:
!head -n 3 stockholm_td_adj.dat
The data format is: year, month, day, daily average temperature, low, high, location.
If we are interested in the average temperature only in a particular month, say February, then we can create an index mask and use it to select out only the data for that month:
np.unique(data['month']) # the month column takes values from 1 to 12
mask_feb = data['month'] == 2.
np.mean(data['temperature'][mask_feb])
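The same masking pattern on toy arrays (hypothetical values standing in for the dataset), to make the mechanics explicit:

```python
import numpy as np

month = np.array([1.0, 2.0, 2.0, 3.0, 2.0])
temp = np.array([-5.0, -2.0, 0.0, 4.0, 1.0])
mask = month == 2.0      # boolean array, one entry per record
print(temp[mask])         # [-2.  0.  1.]
print(temp[mask].mean())  # about -0.333
```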
With these tools we have very powerful data processing capabilities at our disposal. For example, to extract the average monthly average temperatures for each month of the year only takes a few lines of code:
months = np.arange(1, 13)
monthly_mean = [np.mean(data['temperature'][data['month'] == month]) for month in months]
fig, ax = plt.subplots()
ax.bar(months, monthly_mean)
ax.set_xlabel("Month")
ax.set_ylabel("Monthly avg. temp.");
Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. That means that as much of a program as possible should be formulated in terms of matrix and vector operations.
We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays element-wise
v1 = np.arange(0, 5)
img = np.arange(25).reshape((5, 5))
v1 * 2
img + img
What about matrix multiplication? There are two ways. We can either use the dot
function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:
np.dot(img, img)
# matrix * vector
np.dot(img, v1)
# vector * vector, scalar product
np.dot(v1, v1)
A dot product applies along the last axis of the first operand and the second to last of the second operand:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
The shapes of the operands must be compatible: the last dimension of the first operand must match the second-to-last dimension of the second:
m = np.arange(2*3).reshape(2,3)
v = np.arange(3 * 2).reshape(3, 2)
m
v
m.shape, v.shape
np.dot(m, v)
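If the inner dimensions do not match, np.dot raises a ValueError rather than broadcasting; a quick sketch:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(np.dot(m, m.T).shape)  # (2, 2): inner dimensions 3 and 3 match
try:
    np.dot(m, m)             # (2, 3) . (2, 3): inner dimensions 3 != 2
except ValueError:
    print('shapes not aligned')
```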
NumPy includes the standard linear algebra functions for working with matrices, e.g. np.linalg.det (determinant), np.linalg.pinv (pseudo-inverse), etc. Most of them live in numpy.linalg (eigenvalues, linear solvers, ...)
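A short sketch exercising a few numpy.linalg routines on a small symmetric matrix:

```python
import numpy as np

a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(np.linalg.det(a))       # 3*2 - 1*1 = 5
x = np.linalg.solve(a, b)     # solve the linear system a . x = b
assert np.allclose(a.dot(x), b)
print(np.linalg.eigvalsh(a))  # eigenvalues of the symmetric matrix a
```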
Operations can often be written compactly in Einstein index notation, e.g. the cross product
\((u \times v)^i = u^j v^k \epsilon^i{}_{jk}\)
np.einsum evaluates such index expressions directly:
a = np.arange(9.).reshape(3,3)
# identity
np.einsum('ij', a) == a
# output indices in alphabetical order -> transpose:
np.einsum('ji', a) == a.T
\(A^i{}_i\)
#trace
a = np.arange(49.).reshape(7,7)
np.einsum('ii', a), a.trace()
np.einsum('ii->i', a), np.diag(a)
\(C_{ik} = \sum_j A_{ij}B_{jk}\)
a = np.arange(12.).reshape(3,4)
b = np.arange(12.).reshape(4,3)
np.einsum('ij,jk->ik', a, b), a.dot(b)
# tensor dot product
a = np.arange(60.).reshape(3,4,5)
b = np.arange(24.).reshape(4,3,2)
np.einsum('ijk,jil->kl', a, b)
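The same contraction can be cross-checked against np.tensordot by pairing a's axes (1, 0) with b's axes (0, 1):

```python
import numpy as np

a = np.arange(60.).reshape(3, 4, 5)
b = np.arange(24.).reshape(4, 3, 2)
r = np.einsum('ijk,jil->kl', a, b)
# contract j (a axis 1 with b axis 0) and i (a axis 0 with b axis 1)
assert np.array_equal(r, np.tensordot(a, b, axes=([1, 0], [0, 1])))
print(r.shape)  # (5, 2)
```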