Scientific Programming: Analytics

Christian Elsasser (che@physik.uzh.ch)

The content of this lecture may be reused, in whole or in part, under the CC BY-SA 4.0 licence

In [1]:
import scipy
print("%s -- Version: %s"%(scipy.__name__,scipy.__version__))
import numpy as np
print("%s -- Version: %s"%(np.__name__,np.__version__))
scipy -- Version: 1.4.1
numpy -- Version: 1.17.0
In [2]:
import matplotlib.pyplot as plt

Use case 1 - Root-finding in non-linear functions

In [3]:
from scipy import optimize

Let's define the non-linear function $f(x) = x^3+8$, which has its root at $-2$.
Let's also define its first and second derivatives

In [4]:
def f_1_0(x):
    return x**3+8
def f_1_0_p(x):
    return 3*x**2
def f_1_0_pp(x):
    return 6*x

First approach: Bisection method

In [5]:
optimize.bisect(f_1_0,a=-10,b=10) # a,b are the interval boundaries for the bisection method
Out[5]:
-1.9999999999993179

Now we want to get more information about the result

In [6]:
optimize.bisect(f_1_0,a=-10,b=10,full_output=True) # full_output=True additionally returns a RootResults object
Out[6]:
(-1.9999999999993179,
       converged: True
            flag: 'converged'
  function_calls: 46
      iterations: 44
            root: -1.9999999999993179)

The first item is the root as a float, the second item a RootResults object
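For instance, the tuple can be unpacked directly and the RootResults attributes inspected (a minimal sketch reusing the call from above):

root, info = optimize.bisect(f_1_0, a=-10, b=10, full_output=True)
print(info.converged, info.iterations, info.function_calls)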

Second approach: Newton method

In [7]:
optimize.newton(f_1_0,x0=1) # Unfortunately newton in Scipy 1.1.0 does not allow a full output, but versions 1.2.1+ do
Out[7]:
-2.0000000000000004
In [8]:
optimize.newton(f_1_0,
                x0=1,
                full_output=True)
Out[8]:
(-2.0000000000000004,
       converged: True
            flag: 'converged'
  function_calls: 6
      iterations: 5
            root: -2.0000000000000004)
In [9]:
type(_[1])
Out[9]:
scipy.optimize.zeros.RootResults

If no fprime argument is specified, the secant method is invoked; with fprime (the first derivative) the Newton-Raphson method is used; and with fprime2 (the second derivative) given in addition, the Halley method is used.
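To make the distinction concrete, here is a minimal hand-rolled sketch of the Newton-Raphson iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ (not Scipy's implementation, just the core update):

def newton_raphson(f, fp, x0, tol=1e-12, maxiter=50):
    x = x0
    for _ in range(maxiter):
        step = f(x)/fp(x)  # Newton-Raphson update
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Failed to converge")

newton_raphson(f_1_0, f_1_0_p, x0=100)  # approximately -2.0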

In [10]:
print(optimize.newton(f_1_0,x0=100,fprime=None,   fprime2=None,   ))
print(optimize.newton(f_1_0,x0=100,fprime=f_1_0_p,fprime2=None,   ))
print(optimize.newton(f_1_0,x0=100,fprime=f_1_0_p,fprime2=f_1_0_pp))
-1.9999999999999085
-2.0
-2.0
In [11]:
%timeit optimize.newton(f_1_0,x0=100,fprime=None,   fprime2=None,   )
%timeit optimize.newton(f_1_0,x0=100,fprime=f_1_0_p,fprime2=None,   )
%timeit optimize.newton(f_1_0,x0=100,fprime=f_1_0_p,fprime2=f_1_0_pp)
1.11 ms ± 29.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
467 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
364 µs ± 6.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But this result can vary depending on the initial guess: starting closer to the solution would improve the relative performance of the secant method (first case)

Let's be naughty to the algorithm ...

In [12]:
optimize.newton(f_1_0,x0=0,full_output=True)
Out[12]:
(9.999999999997499e-05,
       converged: True
            flag: 'converged'
  function_calls: 4
      iterations: 3
            root: 9.999999999997499e-05)

... which is obviously a wrong result despite the reported convergence (in Scipy 1.2.1+ the newton method also returns a RootResults object when full output is specified).
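A simple sanity check is to plug the reported root back into the function; here the residual is far from zero:

root = optimize.newton(f_1_0, x0=0)
print(f_1_0(root))  # approximately 8, i.e. nowhere near a root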

What happens when specifying the first derivative?

In [13]:
optimize.newton(f_1_0,x0=0,fprime=f_1_0_p)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-51e9e30ed6d1> in <module>
----> 1 optimize.newton(f_1_0,x0=0,fprime=f_1_0_p)

/usr/local/lib/python3.7/site-packages/scipy/optimize/zeros.py in newton(func, x0, fprime, args, tol, maxiter, fprime2, x1, rtol, full_output, disp)
    294                         " Failed to converge after %d iterations, value is %s."
    295                         % (itr + 1, p0))
--> 296                     raise RuntimeError(msg)
    297                 warnings.warn(msg, RuntimeWarning)
    298                 return _results_select(

RuntimeError: Derivative was zero. Failed to converge after 1 iterations, value is 0.0.

Third approach: Brent's method

In [14]:
optimize.brentq(f_1_0,a=-100,b=100,full_output=True) # brenth also exists as an alternative implementation
Out[14]:
(-1.9999999999996092,
       converged: True
            flag: 'converged'
  function_calls: 21
      iterations: 20
            root: -1.9999999999996092)

We can also specify parameters of the function in the root-finding.
The search is always performed in the function's first argument.

In [15]:
def f_1_1(x,a):
    return x**3+a
In [16]:
optimize.brentq(f_1_1,-100,100,full_output=True,args=(27,))
Out[16]:
(-2.9999999999995026,
       converged: True
            flag: 'converged'
  function_calls: 20
      iterations: 19
            root: -2.9999999999995026)

Use case 2 - Maximum-Likelihood estimation (Optimisation and Distributions)

In [17]:
from scipy import stats

Let's load some data (imagine a sample of measurements whose underlying distribution you want to estimate)

In [18]:
data = np.loadtxt("data_ml.csv")
data
Out[18]:
array([-4.13213207e+00, -2.08913041e+00,  1.11461807e+00,  1.59224122e-01,
       -1.76221163e+00,  2.04277819e+00, -6.65511125e-01, -2.04995514e+00,
        2.13371573e+00, -2.74897103e+00,  4.21481079e+00, -8.50988747e-01,
       -9.71802478e-01,  3.25676655e-01, -1.43108364e+00,  1.08889251e+00,
       -6.38033426e-02,  2.12857629e+00,  1.01102696e+00, -3.61478343e-01,
        4.82079393e+00, -2.91616743e-01,  1.17974363e+00,  1.90949855e-01,
        2.49867012e+00,  6.11863034e+00, -1.70223622e+00,  3.00195470e-01,
       -3.48167371e+00,  1.42249755e+00, -5.56201449e+00, -2.54442132e+00,
        2.78043815e+00,  1.79828387e-01, -3.41758553e-01, -1.16098923e+00,
        2.78321643e+00,  9.90712604e-01, -7.40908124e-01,  2.90223321e+00,
       -2.38957523e+00, -2.64295515e+00, -2.08422264e+00, -2.10920762e+00,
        1.26366130e+00, -2.74692097e+00, -3.48783000e-01,  5.10459078e-01,
        1.73718178e+00, -3.99600803e-01, -3.01240374e+00, -1.56806200e+00,
        5.40788769e-01, -2.24102496e-01, -1.62726898e+00,  2.93926249e+00,
        2.03476960e+00, -2.83039575e+00,  2.09118597e+00, -2.21087920e-01,
        2.76815588e+00, -7.28330743e-01,  2.79188370e-01, -8.48817442e-02,
        5.35144208e-01, -1.57825758e+00,  2.12887888e+00, -1.31421199e+00,
        3.78384503e-01,  1.43337494e+00, -9.52741142e-01, -1.93663546e+00,
        2.18146132e+00,  9.09884431e-01, -1.00018301e+00,  1.60699366e+00,
        1.14545493e+00,  2.68873648e+00, -8.78081644e-01, -7.94530997e-02,
       -5.75026488e-01, -7.28829869e-01, -6.44569436e-01, -1.85066160e+00,
       -1.92352130e+00, -9.58198157e-01, -2.92188681e+00,  1.27493904e+00,
       -4.69658733e+00, -1.88619259e+00, -2.33967395e+00,  2.12947728e+00,
        4.77954787e-01, -2.70828742e+00, -1.41644500e+00,  2.17250251e-02,
       -6.48481323e-01,  6.40702430e-01, -3.58702521e+00, -6.82820855e-01,
       -2.76623924e+00,  4.48175144e+00,  8.11316385e-01,  2.43318803e+00,
        2.80220355e+00,  1.39375344e+00, -2.12539597e+00,  8.90845696e-01,
        2.87295162e-01,  4.70009520e-02,  1.94702603e+00,  3.74183240e+00,
       -6.78777802e-01, -1.42666140e-01, -1.67189160e+00, -7.35632853e-01,
        6.97800430e-01,  1.87548457e+00, -1.02990882e-01, -1.09683045e+00,
       -1.07383671e+00, -1.73351686e+00,  1.15943379e+00, -1.92038624e+00,
        3.41217611e-01, -1.00893787e+00,  1.80222820e+00, -1.25086680e+00,
        5.23223450e-01,  1.37242908e+00, -2.05956698e+00,  2.38768669e+00,
       -7.83045360e-01, -1.32895668e+00, -4.33940894e+00,  4.16900866e-01,
        3.12719500e+00,  1.39886871e+00, -4.06999763e-01, -7.31304350e-01,
       -5.87289057e+00, -2.55334816e+00, -4.13094907e+00, -4.20225346e+00,
       -4.74007789e-01, -2.33053412e+00,  1.91486081e+00, -1.05498895e+00,
        2.19912353e+00,  1.84627193e+00,  1.07388363e+00, -4.29910659e+00,
        2.33132727e+00, -3.27689417e+00, -1.20181225e+00,  1.40841742e+00,
       -2.63494870e+00, -1.05022448e+00,  4.07906902e-01, -2.37717837e+00,
       -1.06228106e+00,  6.29656649e-01, -2.01168831e+00, -7.54924571e-01,
        3.27299235e+00,  1.48421843e+00, -2.77594294e+00, -2.18493551e+00,
        1.75328742e+00,  2.25761250e-01, -1.01772878e+00, -3.80857961e+00,
       -1.40709682e+00,  4.98356930e-01, -1.32071695e+00,  4.54211566e-01,
        1.44803127e+00, -2.84412638e+00,  3.31619726e-01, -1.08294781e+00,
        7.22970931e-02,  9.40823919e-02, -6.74537478e-01, -1.65548442e+00,
        1.91342272e+00, -9.46280011e-01, -1.07293012e+00,  5.39147818e-01,
        1.12664808e+00, -4.71499395e-01, -3.07695519e+00,  6.66250231e-01,
       -2.16099078e-01,  1.95031527e+00, -1.22675007e+00, -2.49510494e+00,
       -1.42948279e-01, -8.88347028e-01,  1.11364617e+00,  4.05204994e+00,
        1.36271242e+00, -5.60869579e-02, -9.32466666e-01,  2.48439062e-01,
        1.10327048e+00, -8.77179059e-01, -2.13296751e+00,  1.35611332e-01,
       -6.06988840e-01,  3.15518557e+00, -1.98234240e+00, -1.63586758e+00,
       -9.75744751e-01,  2.30739818e-01, -2.26972382e+00,  6.54074033e-01,
       -3.89926863e+00,  5.80677219e-01, -2.18791235e+00, -2.57464752e+00,
        2.65658627e-01,  1.08315228e+00, -7.02527359e-01, -2.67474069e+00,
       -8.86844119e-01, -1.73841279e+00, -4.25096377e-01, -3.26147459e-02,
       -2.93255063e-02,  3.26684649e-02,  2.46961636e+00, -2.39117685e+00,
        2.65629868e+00, -6.78341919e-01,  5.05632614e-01, -1.88797396e+00,
       -3.95361810e+00,  9.57619257e-01,  4.52791099e-01,  2.09003814e-01,
       -2.32644867e-01, -7.14224815e-01, -8.77621087e-01,  5.18874511e+00,
       -3.63204217e-01,  1.26686661e+00,  3.35401168e-01,  6.48896394e-01,
       -2.46127330e+00, -4.39484550e+00, -3.53306325e+00,  1.82729642e+00,
       -2.01126771e-01,  1.65548934e-01,  6.24343264e-01, -1.93409810e+00,
       -1.59662879e+00, -5.79268419e-01,  3.95841208e-01, -2.31433929e+00,
       -1.22688169e+00, -2.16571140e-02,  2.72107465e+00, -1.12254242e+00,
        1.56436399e-01, -1.05999951e+00, -3.67531089e+00,  1.14420653e+00,
        1.23775615e+00,  9.43256317e-01,  2.19621410e+00, -1.77192715e+00,
        2.41842329e-01, -2.73143690e+00,  1.07013700e+00,  1.34068706e+00,
       -1.58360611e-01,  3.14122054e+00,  5.30812495e-01, -7.25669395e-01,
       -8.66908143e-01, -1.29888936e+00, -2.17701300e+00,  3.18421623e+00,
       -5.75057103e-01, -3.93315452e+00,  6.18928946e-01,  4.83708179e-01,
       -4.69287231e-01, -7.14957970e-01, -3.48487500e+00, -1.04254101e+00,
        8.68067208e-01,  1.61304792e+00, -2.18052257e-01, -3.86025829e+00,
       -1.16702123e+00, -1.19572073e+00,  2.87042773e+00,  2.33652136e+00,
        2.14976941e+00, -5.47069983e-02, -3.17225276e+00, -1.62259153e+00,
       -7.25875127e-01, -2.55553780e+00,  3.65929840e+00,  3.15057456e-01,
       -2.31655839e+00,  4.52286054e-01, -1.72805335e+00, -1.18642118e+00,
       -6.84914030e-01, -4.57601610e-01,  1.64374790e+00, -7.79602847e-01,
       -4.49057905e-01, -1.50779079e-01,  1.24569471e+00,  7.06540033e-01,
       -1.58204842e+00, -1.70004116e+00, -2.39133896e+00, -1.06164307e+00,
        3.30659346e+00, -3.05392263e+00,  2.52321204e-01, -2.50037840e+00,
       -3.03095577e-01, -3.27400525e+00, -2.17590802e+00, -1.18576485e-01,
        5.41634237e+00, -9.51379036e-01, -4.06572426e+00, -1.84219077e-01,
       -1.74273358e+00, -9.66977407e-01, -1.36834322e+00, -3.67642960e+00,
        2.52672207e+00, -2.02033687e+00,  2.17077326e-01,  7.47682043e-01,
        2.35420159e+00,  2.67762848e+00, -1.42080358e+00,  2.75138528e+00,
       -1.60043683e+00,  8.78479485e-01, -1.99325550e+00, -1.78250427e+00,
       -1.26815423e+00, -4.88216593e-01,  2.94104275e+00,  2.79372481e+00,
        1.43266194e+00,  5.52150386e-01, -5.52913274e-01, -2.44771576e+00,
        3.54276542e+00, -2.79018686e+00,  2.20445790e+00,  1.70425549e+00,
        9.34544872e-01,  1.03300686e+00, -7.39362791e-01, -2.12576677e+00,
       -2.39306174e+00,  3.51916362e+00,  9.15439302e-02,  2.51170320e+00,
        7.77807922e-01, -4.47402895e-01, -1.73095122e+00, -1.59262228e+00,
       -1.73036397e+00,  2.46961226e+00,  2.41387555e+00, -8.93164763e-01,
        1.40276571e+00,  6.76442625e-01, -9.38254529e-01, -7.63660516e-01,
        4.89431449e-02, -1.69330206e-01, -6.14778320e-01, -2.12171402e-01,
        1.73640928e+00,  7.77206927e-01, -2.53870458e-01,  7.81905794e-01,
        4.63430489e-01,  1.20681064e-01,  1.22174910e+00, -1.96836788e+00,
       -2.03335308e+00,  1.38983935e+00, -5.55564366e+00,  1.59824888e+00,
       -2.11376643e+00,  5.40922073e-01,  2.28524600e-01, -2.27406863e+00,
        1.26773917e+00, -7.97888840e-01, -2.77442712e+00, -9.91415301e-01,
        1.62164236e-01, -3.92846203e+00,  1.62824364e+00, -2.60483577e+00,
        2.15585610e+00, -1.01349791e-01, -9.13252502e-01, -2.65369244e-01,
       -5.05568353e+00,  3.75984688e-01, -1.26804400e+00, -1.76116343e+00,
        1.92354415e+00,  7.04701040e-02,  3.08221425e+00, -4.82861018e+00,
       -1.94433245e+00,  1.28683334e+00,  1.03615145e+00, -1.29060416e+00,
       -8.89703204e-01, -2.49052877e+00, -4.90142188e-01,  2.15967969e+00,
        2.57010989e-01,  1.27005210e+00,  5.17455712e-01, -1.05577127e+00,
       -1.48798507e+00,  4.44023964e-01,  1.14437173e+00, -5.25823450e-01,
       -4.99641728e-01, -1.80756570e+00, -1.64689149e-01,  1.63242999e+00,
        1.32718451e+00,  6.00757493e-01, -2.45933430e+00, -3.09142906e+00,
        1.02117032e+00, -2.37325729e+00, -5.30385196e-01, -6.81206141e-01,
       -1.39431731e-01,  1.05794370e+00, -2.34712607e+00,  1.33806111e+00,
       -7.45630387e-01, -1.32109697e-02,  2.21113823e+00, -4.31850946e-01,
       -7.13042121e-01, -9.28084393e-01, -4.26154340e+00, -1.99235485e-01,
       -2.21997421e+00,  2.48096492e+00,  1.76108031e+00,  2.41710833e+00,
        8.58313163e-01,  1.25073184e-01,  1.55557926e+00, -3.82878244e+00,
        2.06392321e+00,  4.86256345e-01, -7.44241859e-01,  4.32264371e+00,
        1.76517975e+00, -4.88650984e-01,  1.42418660e+00, -1.66956370e+00,
       -7.02120987e-01, -1.87972845e+00, -7.77205810e-01, -4.01690766e+00,
        2.26043860e+00, -4.50391997e-01, -4.02216523e+00, -1.51658196e+00,
        4.90770756e-01,  1.78383941e+00, -2.18347159e+00,  2.23733721e+00,
        9.51071380e-01,  1.30774479e+00,  1.02637695e+00, -4.42366354e-01,
       -2.42566259e-02,  1.17120600e+00, -1.28894053e+00, -1.24041530e+00,
       -1.98588809e+00,  1.54078474e+00, -2.91637880e+01, -5.61537586e+00,
       -1.95959440e+01,  7.92083288e+00,  1.33913715e+00,  1.60793535e+01,
       -1.12211211e+01,  2.79473207e+00, -3.28738025e+00,  6.27426471e-01,
       -7.34566387e-02, -8.20950900e+00, -2.26566689e+01,  1.67904483e+00,
       -1.11515361e+00,  7.29163333e+00,  1.49404404e+01, -6.53294020e+00,
       -1.16916228e+01, -6.93870646e-01,  2.74967451e+01, -2.24124809e+00,
        9.19692881e+00,  2.70172706e+01,  3.53444929e+00, -1.03096919e+00,
        1.62867226e+01, -1.75666381e+00, -4.01876663e+00, -1.48126758e+00,
       -2.61478672e+01,  6.93166661e+00, -4.05316946e-01,  8.44564251e+00,
       -3.20288266e+00,  6.65925671e+00, -7.50701907e-01,  5.09010629e+00,
       -1.01731666e+01,  1.38355710e+01,  2.87436080e+00, -4.52408092e-01,
        1.49782520e+01,  9.58039994e-01, -1.16290262e+01,  5.66818703e+00,
       -1.74371699e+01,  7.08932077e+00, -6.97272909e-02, -1.49242251e+01,
       -5.50865781e-01, -9.84775521e+00,  2.48849878e+00, -5.80476879e-01,
        2.80555620e+00, -8.32935606e+00,  3.17395959e+00,  7.20585505e+00,
       -6.74830845e+00,  1.38496313e+01, -1.76312088e+01,  6.01508591e+00,
       -2.38603206e+00, -2.23038867e+01,  6.02748783e+00,  3.98033353e-01,
        1.39733020e+01,  1.61723649e+00,  4.68194906e-01,  2.38724908e+01,
       -1.81784942e+01,  1.40139603e+01, -1.44941239e+01,  6.76269898e+00,
       -6.24627321e+00,  7.88161325e+00, -1.51759911e+01,  1.16381111e+01,
        2.86319338e+01, -7.27001950e-01,  2.29605000e+01, -5.10816319e+00,
        8.26439939e+00,  3.19342105e+00, -2.39657533e+01, -7.01316056e+00,
        1.35798437e+01,  4.24537197e+00,  1.77945185e+01, -2.04825195e+01,
       -4.35263152e+00, -3.83496197e+00,  1.90676918e+01,  9.97890868e+00,
       -1.70791536e+01,  7.40232114e+00,  1.67596473e+01, -1.91453643e+00,
        1.00291762e+01,  5.71500683e+00, -3.07926987e+00, -1.57683273e+01,
       -6.91918252e+00,  8.07654101e+00, -1.22707170e+01,  2.91237123e-01,
       -1.14733692e-01, -1.43950095e+01, -5.37499265e+00, -1.35492114e+01,
       -4.60273397e+00, -1.27092383e+00, -4.78901210e+00, -2.52192459e+00,
       -1.15764787e+01,  3.72334346e+00,  3.05644296e+00,  4.47621922e+00,
       -1.88376849e+01,  5.94700980e+00, -4.37102655e+00,  1.49204637e+01,
       -5.75278381e+00,  4.25168421e+00, -1.25426305e+01, -3.61568648e+00,
        1.24695840e+01, -2.51536588e+00, -1.52278855e+01,  8.33917318e+00,
       -1.75316559e+00,  1.36840679e+01, -4.13532819e+00,  1.48423396e+01,
        2.06671655e+01,  5.39663416e-01, -4.49805506e+00,  2.11967936e+01,
       -2.57516331e+01,  1.37464299e+00,  6.17562793e+00, -9.78255959e+00,
        1.94477092e+00, -2.04413636e+01,  1.08805415e+01,  9.14142179e+00,
       -3.49281723e+00, -1.21211503e+00, -9.62205477e+00,  5.15731938e+00,
       -1.03519030e+01, -2.47768132e+00, -2.31752353e+00,  1.87601666e+01,
        7.47181398e-01, -1.36432734e+01,  7.97864198e+00,  8.44267017e+00,
       -1.03672922e+01, -3.28013355e+00,  1.52444558e+01, -1.23851940e+01,
       -1.08952095e+01, -1.89442486e+01, -7.00305480e-02,  7.82369729e+00,
        6.56159825e+00,  1.96170712e+01, -9.60115780e+00, -1.97783873e+01,
        4.79934103e+00, -7.87219401e+00,  9.11986826e-01,  7.01840375e+00,
       -1.67486632e-01,  1.67137316e+00,  8.13301991e+00,  1.25257128e+01,
       -1.21846874e+01,  2.10608599e+01, -3.39222552e+00,  1.20489163e+01,
       -1.30076977e+01, -1.43686165e+01,  8.93679342e+00,  1.39232254e+01,
       -2.25427581e+01,  6.39129061e+00,  1.33965387e+01, -1.52334101e+01,
        1.38267494e+00,  1.04738787e+01, -5.97205720e+00, -4.08133801e+00,
       -4.57270609e+00,  4.30711216e+00,  1.61869636e+01,  1.04568435e+01])

Let's quickly inspect the data as a histogram

In [19]:
bins = np.linspace(-10,10,51)
h,bins = np.histogram(data,bins)
In [20]:
fig = plt.figure("data",(6,4))
ax = plt.subplot()
ax.bar((bins[:-1]+bins[1:])*0.5,h,width=bins[1:]-bins[:-1],color="grey")
ax.set_xlabel("Measurement variable")
ax.set_ylabel("Number of observations")
ax.set_xlim([-10,10])
plt.show()
In [21]:
print(stats.distributions.__all__)
['entropy', 'rv_discrete', 'rv_continuous', 'rv_histogram', 'ksone', 'kstwobign', 'norm', 'alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'burr', 'burr12', 'fisk', 'cauchy', 'chi', 'chi2', 'cosine', 'dgamma', 'dweibull', 'expon', 'exponnorm', 'exponweib', 'exponpow', 'fatiguelife', 'foldcauchy', 'f', 'foldnorm', 'weibull_min', 'weibull_max', 'frechet_r', 'frechet_l', 'genlogistic', 'genpareto', 'genexpon', 'genextreme', 'gamma', 'erlang', 'gengamma', 'genhalflogistic', 'gompertz', 'gumbel_r', 'gumbel_l', 'halfcauchy', 'halflogistic', 'halfnorm', 'hypsecant', 'gausshyper', 'invgamma', 'invgauss', 'geninvgauss', 'norminvgauss', 'invweibull', 'johnsonsb', 'johnsonsu', 'laplace', 'levy', 'levy_l', 'levy_stable', 'logistic', 'loggamma', 'loglaplace', 'lognorm', 'gilbrat', 'maxwell', 'mielke', 'kappa4', 'kappa3', 'moyal', 'nakagami', 'ncx2', 'ncf', 't', 'nct', 'pareto', 'lomax', 'pearson3', 'powerlaw', 'powerlognorm', 'powernorm', 'rdist', 'rayleigh', 'loguniform', 'reciprocal', 'rice', 'recipinvgauss', 'semicircular', 'skewnorm', 'trapz', 'triang', 'truncexpon', 'truncnorm', 'tukeylambda', 'uniform', 'vonmises', 'vonmises_line', 'wald', 'wrapcauchy', 'gennorm', 'halfgennorm', 'crystalball', 'argus', 'binom', 'bernoulli', 'betabinom', 'nbinom', 'geom', 'hypergeom', 'logser', 'poisson', 'planck', 'boltzmann', 'randint', 'zipf', 'dlaplace', 'skellam', 'yulesimon']

First approach: Gaussian distribution - we define the function describing the probability density with p[0] as $\mu$ and p[1] as $\sigma$.

In [22]:
f_gauss = lambda x,p: stats.norm(p[0],p[1]).pdf(x)
# Non-lambda definition:
# def f_gauss(x,p):
#     return stats.norm(p[0],p[1]).pdf(x)
In [23]:
stats.norm?

!!! The order of the parameters in scipy is not always obvious and still not well documented !!!
For many distributions it follows a location-scale approach (1st parameter: location, 2nd: scale), even for distributions like the exponential distribution that do not naturally belong to this family
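For example, the exponential distribution in scipy is parametrised by loc and scale rather than by the rate $\lambda$, with scale $=1/\lambda$ (a quick check):

# scale = 1/lambda; for lambda = 2 the pdf at 0 equals lambda and the mean equals 1/lambda
print(stats.expon(loc=0, scale=0.5).pdf(0))   # 2.0
print(stats.expon(loc=0, scale=0.5).mean())   # 0.5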

Let's define, in a general fashion, the negative log-likelihood function that we want to minimise

In [24]:
def log_likelihood(p,data,f): # 'f' describes the probability density function we use
    return -np.log(f(data,p)).sum()
# here the data & the distribution act as parameters

Let's minimise the negative log-likelihood function with the data and the starting parameters $\mu=0,\ \sigma=2.5$.
We use the Conjugate Gradient method as the first minimisation method

In [25]:
res_g = optimize.minimize(log_likelihood,[0.0,2.5],args=(data,f_gauss),method="CG")
print(res_g)
     fun: 2281.287183701291
     jac: array([0., 0.])
 message: 'Optimization terminated successfully.'
    nfev: 80
     nit: 9
    njev: 20
  status: 0
 success: True
       x: array([-0.14330203,  6.29681032])

For completeness, let's check the analytical results, which are the mean and the (uncorrected, ddof=0) standard deviation of the sample

In [26]:
print("mu    = {0:10.7f}".format(np.mean(data)))
print("sigma = {0:10.7f}".format(np.std(data)))
mu    = -0.1433018
sigma =  6.2968100
In [27]:
res_g = optimize.minimize(log_likelihood,[0.0,2.5],   args=(data,f_gauss),method="TNC")
print(res_g)
     fun: 2281.287183962775
     jac: array([0.00263753, 0.00213731])
 message: 'Converged (|f_n-f_(n-1)| ~= 0)'
    nfev: 51
     nit: 13
  status: 1
 success: True
       x: array([-0.14315271,  6.29687087])
In [28]:
res_g["x"]
Out[28]:
array([-0.14315271,  6.29687087])

We create the fitted distribution

In [29]:
f_gauss_fit = stats.norm(*(res_g.x))

Let's try a t-distribution to accommodate the longer tails: p[0] as the number of degrees of freedom, p[1] as the centre, p[2] as the width parameter.

In [30]:
f_t = lambda x,p: stats.t(p[0],p[1],p[2]).pdf(x)
In [31]:
res_t = optimize.minimize(log_likelihood,[11,0.0,2.6],   args=(data,f_t),method="Nelder-Mead")
print(res_t)
 final_simplex: (array([[ 1.31372267, -0.25351935,  1.91997389],
       [ 1.3136852 , -0.25356857,  1.91997866],
       [ 1.31362542, -0.25354848,  1.91992113],
       [ 1.31367436, -0.25358207,  1.91990748]]), array([2036.97856709, 2036.97856712, 2036.97856727, 2036.97856729]))
           fun: 2036.9785670880706
       message: 'Optimization terminated successfully.'
          nfev: 302
           nit: 167
        status: 0
       success: True
             x: array([ 1.31372267, -0.25351935,  1.91997389])

Let's do some testing for model selection with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):

  • $AIC = 2k - 2\log\mathcal{L}$
  • $BIC = \log(n)k - 2\log\mathcal{L}$
where $\mathcal{L}$ is the maximised likelihood, $k$ is the number of estimated parameters and $n$ the number of data points. A large difference between two models corresponds to a significant difference in performance in favour of the model with the lower value.

In [32]:
AIC_g = 2*len(res_g.x)+2*res_g.fun
AIC_t = 2*len(res_t.x)+2*res_t.fun
BIC_g = np.log(len(data))*len(res_g.x)+2*res_g.fun
BIC_t = np.log(len(data))*len(res_t.x)+2*res_t.fun
In [33]:
print("AIC Gauss:  {0:8.1f}".format(AIC_g))
print("AIC t-dist: {0:8.1f}".format(AIC_t))
print("BIC Gauss:  {0:8.1f}".format(BIC_g))
print("BIC t-dist: {0:8.1f}".format(BIC_t))
AIC Gauss:    4566.6
AIC t-dist:   4080.0
BIC Gauss:    4575.7
BIC t-dist:   4093.6

Let's create the fitted t-distribution

In [34]:
f_t_fit = stats.t(*(res_t.x))
In [35]:
x_vec = np.linspace(-10,10,201)
fig = plt.figure("data",(6,4))
ax = plt.subplot()
ax.bar((bins[:-1]+bins[1:])*0.5,h,width=bins[1:]-bins[:-1],color="grey")
ax.plot(x_vec,f_t_fit.pdf(x_vec)*data.size*(bins[1:]-bins[:-1]).mean(),color="#a00014")
ax.plot(x_vec,f_gauss_fit.pdf(x_vec)*data.size*(bins[1:]-bins[:-1]).mean(),color="#0014a0")
ax.set_xlabel("Measurement variable")
ax.set_ylabel("Number of observations")
ax.legend(["Data","t-dist","Gauss"])
ax.set_xlim([-10,10])
plt.show()

Sampling 100'000 (= 500 × 200) random measurements according to the estimated t-distribution

In [36]:
mc_data = f_t_fit.rvs((500,200))
print("Shape: "+str(mc_data.shape))
print(mc_data)
Shape: (500, 200)
[[-1.63444046e-02  7.32869666e-01 -2.27090279e-02 ...  2.34427230e+01
  -7.22302458e-01 -1.41662481e+01]
 [ 1.52763125e+01 -6.26366321e-01  5.51999087e-01 ... -7.92246352e-01
   6.78748861e+00 -2.46178582e+00]
 [-3.06915818e+00 -1.08193395e-01  8.34663280e+00 ... -9.84926896e-01
   2.59960644e-01  7.71110431e-01]
 ...
 [-3.79318094e+00  1.32214523e-01 -8.84114508e-01 ...  4.00268513e+00
   1.04495959e+00 -4.20744436e-01]
 [ 2.37304758e+00 -6.98082650e+00  1.96912749e+00 ... -2.33994718e+00
  -5.97791575e+00 -1.83579660e+00]
 [-5.11542654e-03  4.28830108e+00  7.93389208e+00 ...  2.46714054e-01
   5.10203658e+00 -1.00414530e+01]]

Histogram the measurements and plot them

In [37]:
bins = np.linspace(-10,10,51)
h,bins = np.histogram(mc_data,bins)
In [38]:
fig = plt.figure("mc_data",(6,4))
ax = plt.subplot()
ax.bar((bins[:-1]+bins[1:])*0.5,h,width=bins[1:]-bins[:-1],color="grey")
ax.plot(x_vec,f_t_fit.pdf(x_vec)*mc_data.size*(bins[1:]-bins[:-1]).mean(),color="#a00014")
ax.set_xlabel("Measurement variable")
ax.set_ylabel("Number of observations")
ax.set_xlim([-10,10])
plt.show()

Get the 5% and the 95% quantiles with the ppf method of the distribution (percent point function = inverse of the cumulative distribution function)

In [39]:
print(" 5% quantile: {0:6.3f}".format(f_t_fit.ppf(0.05)))
print("95% quantile: {0:6.3f}".format(f_t_fit.ppf(0.95)))
 5% quantile: -8.470
95% quantile:  7.963
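As a cross-check, the empirical quantiles of the sampled measurements should be close to these values:

print(np.quantile(mc_data, 0.05))  # close to -8.470
print(np.quantile(mc_data, 0.95))  # close to  7.963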

Certain minimisation methods require you to provide the first derivative (Jacobian) or even the second derivative (Hessian) as an analytical function.
Let's define a simple 2D-function: $f(x_0,x_1) = (x_0-5)^2+(x_1-6)^4$

In [40]:
def f_2_0(x):
    return (x[0]-5)**2+(x[1]-6)**4
def f_2_0_p(x):
    return np.array([
        2*(x[0]-5),
        4*(x[1]-6)**3,
        ])
def f_2_0_pp(x):
    return np.array([
        [
            2,
            0
        ],
        [
            0,
            4*3*(x[1]-6)**2
        ]
        ])

Setting the initial guess

In [41]:
p_init = np.array([5,2]) 
# Please note that this is a special case where one of the coordinates is already at the optimal value

The usual Conjugate Gradient method will fail ...

In [42]:
optimize.minimize(f_2_0,p_init,method="CG")
Out[42]:
     fun: 15.369536160000013
     jac: array([  0.       , -31.0495677])
 message: 'Desired error not necessarily achieved due to precision loss.'
    nfev: 67
     nit: 1
    njev: 16
  status: 2
 success: False
       x: array([5.  , 4.02])

But switching to the Newton-CG method (with the Jacobian provided) succeeds ...

In [43]:
optimize.minimize(f_2_0,p_init,jac=f_2_0_p,method="Newton-CG")
Out[43]:
     fun: 1.3715480756696346e-09
     jac: array([ 0.00000000e+00, -9.01505266e-07])
 message: 'Optimization terminated successfully.'
    nfev: 18
    nhev: 0
     nit: 17
    njev: 68
  status: 0
 success: True
       x: array([5.        , 5.99391441])
In [44]:
optimize.minimize(f_2_0,p_init,jac=f_2_0_p,hess=f_2_0_pp,method="Newton-CG")
Out[44]:
     fun: 1.3753061522324479e-09
     jac: array([ 0.00000000e+00, -9.03357242e-07])
 message: 'Optimization terminated successfully.'
    nfev: 18
    nhev: 17
     nit: 17
    njev: 34
  status: 0
 success: True
       x: array([5.        , 5.99391024])

Let's see what adding the hessian can do in terms of performance

In [45]:
%timeit optimize.minimize(f_2_0,np.array([100,100]),jac=f_2_0_p,method="Newton-CG")
%timeit optimize.minimize(f_2_0,np.array([100,100]),jac=f_2_0_p,hess=f_2_0_pp,method="Newton-CG")
2.39 ms ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.89 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Depending on the situation (number of dimensions, starting position) the speed-up can be as large as a factor of two!

Use case 3 - Linear equation solving (Matrices & Linear Algebra)

Previously Numpy provided the matrix class, which is now deprecated in favour of the ndarray class, i.e. the standard numpy array. Matrix multiplication works via the @ operator; matrix inversion and calculation of the Hermitian go via the corresponding functions.
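A minimal sketch of these operations on plain ndarrays (using a small complex matrix for illustration; the Hermitian is the conjugate transpose, .conj().T):

a = np.array([[1, 2j], [0, 1]])
print(a @ a)             # matrix multiplication
print(np.linalg.inv(a))  # matrix inversion
print(a.conj().T)        # Hermitian (conjugate transpose)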

In [46]:
import warnings
warnings.filterwarnings("default")

Eigenvalues and Eigenvectors

Let's create a diagonal matrix

In [47]:
d = np.matrix(np.diag([1,2,3,4]))
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:1: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  """Entry point for launching an IPython kernel.

OK, so let's use ndarrays instead of np.matrices

In [48]:
d = np.diag([1,2,3,4])
print(d)
[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
In [49]:
# Create a 4x4 matrix with random entries and its inverse
s = np.array(np.random.rand(4,4))
s_inv = np.linalg.inv(s)

Let's de-diagonalise the diagonal matrix by $M = S^{-1}DS$

In [50]:
m = s_inv@d@s # matrix multiplication
print(m)
[[  8.90454696   2.85858449   6.12132239   5.4431712 ]
 [-45.41404731 -19.89079796 -33.17509451 -42.83171715]
 [ -1.56277176  -0.61116644  -0.26649042  -1.18022755]
 [ 19.4323554    9.75036996  14.72627001  21.25274141]]

The * operator would simply perform an element-wise multiplication.

In [51]:
s_inv*d*s
Out[51]:
array([[ 0.07775095, -0.        , -0.        ,  0.        ],
       [-0.        , 23.03148643,  0.        , -0.        ],
       [ 0.        ,  0.        ,  0.21760286, -0.        ],
       [ 0.        , -0.        , -0.        , 42.55889406]])

Let's calculate the eigenvalues and eigenvectors

In [52]:
values,transf = np.linalg.eig(m)
print(values)
print(transf)
[4. 1. 2. 3.]
[[ 0.13165807 -0.02399464 -0.08640801 -0.27024927]
 [-0.91171571  0.92581611  0.91730314  0.9302555 ]
 [-0.02505076 -0.07212693  0.01449287  0.04352048]
 [ 0.38834659 -0.37025193 -0.3884309  -0.24432767]]
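The columns of transf are the eigenvectors, so column-wise $Mv_i = \lambda_i v_i$ should hold (a quick check):

print(np.allclose(m @ transf, transf * values))  # True: each column v_i satisfies M v_i = lambda_i v_i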

Solving linear equations

For solving linear equations there are also dedicated algorithms available in the Scipy package: \begin{align} 5 & = 6x_0+4x_1-2x_2 \\ 3 & = 3x_0+8x_1+2x_2 \\ 7 & = 8x_0+3x_1+x_2 \end{align}

In [53]:
A = np.array([[6,4,-2],[3,8,2],[8,3,1]])
b = np.array([5,3,7])

Let's do it "by hand": $x = A^{-1}b$

In [54]:
scipy.linalg.inv(A).dot(b)
Out[54]:
array([0.85057471, 0.02873563, 0.1091954 ])

Let's use the solving algorithm

In [55]:
scipy.linalg.solve(A,b)
Out[55]:
array([0.85057471, 0.02873563, 0.1091954 ])

Let's take some larger matrices to see what the difference in speed can be

In [56]:
N = 1000
A = np.random.rand(N,N)
b = np.random.rand(N,)
In [57]:
%timeit scipy.linalg.inv(A).dot(b)
%timeit scipy.linalg.solve(A,b)
40 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
19.3 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The solve method is approximately twice as fast as the evaluation "by hand" ($x=A^{-1}b$).
So using built-in methods can speed things up!

Use case 4 - Signal processing (Integration & Differentiation)

Differentiation

In [58]:
from scipy.misc import derivative
In [59]:
def f_4_1(x):
    return np.sin(x)

The derivative calculation is based on the central finite difference method
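For n=1 and order=3 this amounts to $f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}$; a minimal sketch of the idea (x0 and h chosen just for illustration):

h = 1e-6
x0 = 0.5
print((f_4_1(x0 + h) - f_4_1(x0 - h)) / (2*h))  # central difference
print(np.cos(x0))                               # analytical derivative of sin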

In [60]:
derivative(f_4_1,
           0,
           dx=1e-1,
           n=4,
           order=15
          ) 
# `order' is a confusing name since it is not the order of the derivative:
# it is the number of points used in the calculation,
# and it has to be an odd number larger than n
Out[60]:
-3.011425744187112e-11
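The near-zero result is consistent: the fourth derivative of $\sin$ at $0$ is $\sin(0)=0$. With n=1 (the default) we get the first derivative, which at $0$ should be $\cos(0)=1$:

derivative(f_4_1, 0, dx=1e-6)  # close to 1.0 (= cos(0))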

Integration

In [61]:
from scipy import integrate

Let's integrate $f(x) = 1/x^2$ from $10^{-2}$ to $1$ (analytical value: 99)

In [62]:
def f_4_2(x):
    return 1/x**2

Scipy has methods to calculate the integral from a sample of $x_i$ and corresponding $y_i = f(x_i)$ based on specific rules (Newton-Cotes methods)

In [63]:
x = np.linspace(0.01,1.00,100)
y = f_4_2(x)

Let's first go for the Trapezoidal rule

In [64]:
integrate.trapz(y,x)
Out[64]:
113.49339001848928

The integral shows a significant error due to the diverging behaviour towards 0.
Let's try Simpson's rule.

In [65]:
integrate.simps(y,x)
Out[65]:
107.24340693853145

Better due to the higher order, but still not good. So let's go for an adaptive algorithm.

In [66]:
integrate.quad(f_4_2,a=0.01,b=1) # First return value is the integral, second the estimated error
Out[66]:
(99.0, 3.008094132594605e-07)

Let's compare the different algorithms in terms of speed.

In [67]:
%timeit integrate.trapz(y,x)
%timeit integrate.simps(y,x)
%timeit integrate.quad(f_4_2,a=0.01,b=1)
13.3 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
68.4 µs ± 3.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
72.6 µs ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Let's see what happens when we have a function with a discontinuity, $f(x) = \mathbb{1}\{x<0\}$ (1 for $x$ smaller than zero, 0 for $x$ larger than zero)

In [68]:
def f_4_3(x):
    return (x<0)*1.0

We integrate from $-1$ to some $b>0$, so the integral should always be one!

In [69]:
print(integrate.quad(f_4_3,a=-1,b=1))
print(integrate.quad(f_4_3,a=-1,b=10))
print(integrate.quad(f_4_3,a=-1,b=1000))
(1.0, 1.1102230246251565e-14)
(0.9999999999999989, 2.9976021664879227e-15)
(0.0, 0.0)

If the integration range is much larger than the resolution required to catch the discontinuity, we end up with wrong results (an issue not even captured by the error estimate)

The points argument allows us to specify particular points (e.g. discontinuities) that need to be taken into account in the subdivision of the integration interval.

In [70]:
print(integrate.quad(f_4_3,a=-1,b=1000,points=[0]))
(1.0, 1.1102230246251565e-14)

Scipy also provides nquad for multidimensional integration

In [71]:
def f_4_4(x0,x1,x2):
    return 1.0/(x0**2*x1**2*x2**2)
In [72]:
integrate.nquad(f_4_4,ranges=((0.1,1),(0.1,1),(0.1,1))) # It should be 9^3 = 729
Out[72]:
(728.9999999999995, 2.0810636134419025e-06)
In [73]:
%timeit integrate.nquad(f_4_4,ranges=((0.1,1),(0.1,1),(0.1,1)))
610 ms ± 5.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Application of integration: Fourier Transform

Scipy also includes a sub-module for fast Fourier transforms (scipy.fft), which will be covered in a walk-through tutorial

In [74]:
from scipy.integrate import quad
from scipy import stats

We are generating two signals in the form of functions

  • $f(t) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(t-t_0)^2/2\sigma^2}$ (Gauss curve $\rightarrow$ bang)
  • $f(t) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(t-t_0)^2/2\sigma^2}\cos((t-t_0)\omega)$ (Gauss with cosine wave)
Here with $t_0=0$, $\sigma=0.5$ and $\omega=2\pi$

In [75]:
# Generating two signals
f_gauss = lambda x: stats.norm(0,0.5).pdf(x) # Gaussian signal
f_g_osc = lambda x: f_gauss(x)*np.cos(2*np.pi*x) # ... with oscillation
In [76]:
# time range considered
t = np.linspace(-5,5,501)

Let's plot the two functions

In [77]:
plt.figure("signals",[8,12])
ax = plt.subplot()
ax.plot(t, f_gauss(t),'b')
ax.plot(t, -f_gauss(t),'b--')
ax.plot(t, f_g_osc(t),'r')
ax.set_xlim([t.min(),t.max()])
ax.set_ylim([-1.0,1.0])
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$A(t)$")
plt.show()

Let's define a function that returns the integrand of the Fourier transformation for an arbitrary function $f:\mathbb{R}\rightarrow\mathbb{R}$ ($f(t)\rightarrow f(t)\cos(\omega t)$).
We only consider the real part, hence using $\cos(\omega t)$ instead of $\exp(i\omega t)$

In [78]:
def get_fourier_integrand(f):
    return lambda t,w: np.cos(w*t)*f(t)

Let's define the functions describing the Fourier Transformation $\omega \rightarrow A(\omega)$ for our two signals as $\omega \rightarrow \int_{-t'}^{t'}\mathrm{d}t f(t)\cos(\omega t)$ with $t'=10$

In [79]:
# Define Fourier integrands for a given omega
f_i_gauss = lambda w : quad(get_fourier_integrand(f_gauss),-10,10,args=(w,))
f_i_g_osc = lambda w : quad(get_fourier_integrand(f_g_osc),-10,10,args=(w,))
In [80]:
f_i_gauss(1.0) # first entry: integration result, second entry: upper-bound on the error
Out[80]:
(0.8824969025845957, 6.463572515755245e-11)

But we cannot pass a vector of $\omega$ values to this function

In [81]:
f_i_gauss(np.array([0,0.5,1.0]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-81-806ea94977c1> in <module>
----> 1 f_i_gauss(np.array([0,0.5,1.0]))

<ipython-input-79-972e2d237755> in <lambda>(w)
      1 # Define Fourier integrands for a given omega
----> 2 f_i_gauss = lambda w : quad(get_fourier_integrand(f_gauss),-10,10,args=(w,))
      3 f_i_g_osc = lambda w : quad(get_fourier_integrand(f_g_osc),-10,10,args=(w,))

/usr/local/lib/python3.7/site-packages/scipy/integrate/quadpack.py in quad(func, a, b, args, full_output, epsabs, epsrel, limit, points, weight, wvar, wopts, maxp1, limlst)
    340     if weight is None:
    341         retval = _quad(func, a, b, args, full_output, epsabs, epsrel, limit,
--> 342                        points)
    343     else:
    344         if points is not None:

/usr/local/lib/python3.7/site-packages/scipy/integrate/quadpack.py in _quad(func, a, b, args, full_output, epsabs, epsrel, limit, points)
    451     if points is None:
    452         if infbounds == 0:
--> 453             return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
    454         else:
    455             return _quadpack._qagie(func,bound,infbounds,args,full_output,epsabs,epsrel,limit)

TypeError: only size-1 arrays can be converted to Python scalars

Luckily, numpy provides functionality to vectorise functions

In [82]:
# Vectorise the integrands (allows to calculate the integration for different omega in parallel)
int_f_gauss = np.vectorize(f_i_gauss)
int_f_g_osc = np.vectorize(f_i_g_osc)
In [83]:
# Spectrum we are considering
w = np.linspace(0,10,51)
In [84]:
# Actual performance of the integration
yw_gauss = int_f_gauss(w)
yw_g_osc = int_f_g_osc(w)
In [85]:
yw_gauss
Out[85]:
(array([1.00000000e+00, 9.95012479e-01, 9.80198673e-01, 9.55997482e-01,
        9.23116346e-01, 8.82496903e-01, 8.35270211e-01, 7.82704538e-01,
        7.26149037e-01, 6.66976811e-01, 6.06530660e-01, 5.46074427e-01,
        4.86752256e-01, 4.29557358e-01, 3.75311099e-01, 3.24652467e-01,
        2.78037300e-01, 2.35746077e-01, 1.97898699e-01, 1.64474457e-01,
        1.35335283e-01, 1.10250525e-01, 8.89216175e-02, 7.10053537e-02,
        5.61347628e-02, 4.39369336e-02, 3.40474547e-02, 2.61214099e-02,
        1.98410947e-02, 1.49207861e-02, 1.11089965e-02, 8.18870101e-03,
        5.97602290e-03, 4.31784001e-03, 3.08871541e-03, 2.18749112e-03,
        1.53381068e-03, 1.06476624e-03, 7.31802419e-04, 4.97955422e-04,
        3.35462628e-04, 2.23745794e-04, 1.47748360e-04, 9.65934137e-05,
        6.25215038e-05, 4.00652974e-05, 2.54193465e-05, 1.59667839e-05,
        9.92950431e-06, 6.11356797e-06, 3.72665317e-06]),
 array([8.67103044e-10, 8.71759742e-10, 8.65823089e-10, 7.86216492e-10,
        5.27645140e-10, 6.46357252e-11, 8.16586275e-10, 4.80060338e-09,
        7.97553740e-09, 8.25985364e-12, 5.33560001e-12, 4.00755154e-12,
        2.55833889e-12, 5.48197209e-13, 2.19692646e-12, 2.03017172e-11,
        3.15272127e-11, 3.68896589e-11, 4.21640461e-11, 4.12359313e-11,
        2.85877887e-11, 2.09011249e-12, 5.64819439e-11, 1.31280532e-10,
        2.23145719e-10, 3.11740273e-10, 3.71356595e-10, 3.53420517e-10,
        5.43545490e-11, 2.05276070e-10, 6.77224376e-10, 1.29296123e-09,
        1.98762025e-09, 2.58833702e-09, 1.59243956e-09, 1.00383517e-10,
        1.49000244e-09, 4.41577452e-09, 8.63796704e-09, 1.36829487e-08,
        1.46585692e-08, 9.88017601e-09, 1.37864404e-08, 1.00040538e-08,
        1.18979086e-08, 2.04162842e-10, 6.88497936e-11, 9.28605735e-09,
        4.70400366e-10, 1.21317811e-08, 3.06428681e-09]))
In [86]:
plt.figure("Signal_spectra",[8,12])
ax = plt.subplot()
ax.plot(w, yw_gauss[0],'b')
ax.plot(w, yw_g_osc[0],'r')
ax.set_xlim([w.min(),w.max()])
ax.set_ylim([0,1.5])
ax.set_xlabel(r"Spectrum $\omega$")
plt.show()

But quad actually already allows us to perform a Fourier transform via its weighting option.
The integrand can be weighted by a number of commonly used functions with corresponding parameters, among them sin and cos.
More details can be found in the documentation.

In [87]:
f_i_gauss_alt = lambda w : quad(f_gauss,-10,10,weight="cos",wvar=w)
int_f_gauss_alt = np.vectorize(f_i_gauss_alt)
yw_gauss_alt = int_f_gauss_alt(w)

Comparing the original calculation with the weighting option calculation shows good agreement

In [88]:
plt.figure("Signal_spectrum_comparison",[8,12])
ax = plt.subplot()
ax.plot(w, yw_gauss[0],'b')
ax.plot(w, yw_gauss_alt[0],'r')
ax.set_xlim([w.min(),w.max()])
ax.set_ylim([0,1.5])
ax.set_xlabel(r"Spectrum $\omega$")
plt.show()