Boldon: Why is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not

Monday, 9 September 2013

Why is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not

Why is univariate Horner in Fortran faster than NumPy counterpart while
bivariate Horner is not

I want to perform polynomial calculus in Python. The polynomial package in
numpy is not fast enough for me. Therefore I decided to rewrite several
functions in Fortran and use f2py to create shared libraries which are
easily imported into Python. Currently I am benchmarking my routines for
univariate and bivariate polynomial evaluation against their numpy
counterparts.
In the univariate routine I use Horner's method as does
numpy.polynomial.polynomial.polyval. I have observed that the factor by
which the Fortran routine is faster than the numpy counterpart increases
as the order of the polynomial increases.
In the bivariate routine I use Horner's method twice. First in y and then
in x. Unfortunately I have observed that for increasing polynomial order,
the numpy counterpart catches up and eventually surpasses my Fortran
routine. As numpy.polynomial.polynomial.polyval2d uses an approach similar
to mine, I consider this second observation to be strange.
I am hoping that this result stems forth from my inexperience with Fortran
and f2py. Might someone have any clue why the univariate routine always
appears superior, while the bivariate routine is only superior for low
order polynomials?
Here is my code, a script for automated benchmarking and 2 plots:
polynomial.f95
subroutine polyval(p, x, pval, nx)
implicit none
real(8), dimension(nx), intent(in) :: p
real(8), intent(in) :: x
real(8), intent(out) :: pval
integer, intent(in) :: nx
integer :: i
do i = 1, nx
pval = pval*x + p(nx-i+1)
end do
end subroutine polyval
subroutine polyval2(p, x, y, pval, nx, ny)
implicit none
real(8), dimension(nx, ny), intent(in) :: p
real(8), intent(in) :: x, y
real(8), intent(out) :: pval
integer, intent(in) :: nx, ny
real(8) :: tmp
integer :: i
do i = 1, ny
call polyval(p(:, ny-i+1), x, tmp, nx)
pval = pval*y + tmp
end do
end subroutine polyval2
benchmark.py (use this script to produce plots)
import time
import numpy as np
from numpy import f2py
import matplotlib.pyplot as plt
# Compile and import Fortran module
fid = open('polynomial.f95')
source = fid.read()
fid.close()
f2py.compile(source, modulename='polynomial')
import polynomial
# Create random x and y value
x = np.random.rand()
y = np.random.rand()
#==============================================================================
# Array containing the polynomial order + 1 for several univariate
polynomials
n_uni = np.array([2**i for i in xrange(1, 21)])
# Initialise arrays for storing timing results
time_numpy = np.zeros(n_uni.size)
time_fortran = np.zeros(n_uni.size)
for i in xrange(len(n_uni)):
# Create random univariate polynomial of order n - 1
p = np.random.rand(n_uni[i])
# Time evaluation of polynomial using NumPy
t1 = time.time()
np.polynomial.polynomial.polyval(x, p)
t2 = time.time()
time_numpy[i] = t2 - t1
# Time evaluation of polynomial using Fortran
t1 = time.time()
polynomial.polyval(p, x)
t2 = time.time()
time_fortran[i] = t2 - t1
# Speed-up factor
factor_uni = time_numpy / time_fortran
plt.figure()
plt.plot(n_uni, factor_uni)
plt.title('Univariate comparison')
plt.xlabel('# coefficients')
plt.ylabel('Speed-up factor')
plt.xlim(n_uni[0], n_uni[-1])
plt.ylim(0, max(factor_uni))
plt.xscale('log')
#==============================================================================
# Array containing the polynomial order + 1 for several bivariate polynomials
n_bi = np.array([2**i for i in xrange(1, 11)])
# Initialise arrays for storing timing results
time_numpy = np.zeros(n_bi.size)
time_fortran = np.zeros(n_bi.size)
for i in xrange(len(n_bi)):
# Create random bivariate polynomial of order n - 1 in x and in y
p = np.random.rand(n_bi[i], n_bi[i])
# Time evaluation of polynomial using NumPy
t1 = time.time()
np.polynomial.polynomial.polyval2d(x, y, p)
t2 = time.time()
time_numpy[i] = t2 - t1
# Time evaluation of polynomial using Fortran
t1 = time.time()
polynomial.polyval2(p, x, y)
t2 = time.time()
time_fortran[i] = t2 - t1
# Speed-up factor
factor_bi = time_numpy / time_fortran
plt.figure()
plt.plot(n_bi, factor_bi)
plt.title('Bivariate comparison')
plt.xlabel('# coefficients')
plt.ylabel('Speed-up factor')
plt.xlim(n_bi[0], n_bi[-1])
plt.ylim(0, max(factor_bi))
plt.xscale('log')
plt.show()

Boldon

Monday, 9 September 2013

Why is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not

No comments:

Post a Comment