From: corbett@lupa.eng.sun.com (Robert Corbett)
Subject: Re: usage of assumed shape and performance on SGI,f90
Date: 05 Aug 1998
Newsgroups: comp.lang.fortran,comp.sys.sgi.admin
In article <35C6A2A6.A696C8DD@guitar.rockefeller.edu>,
Azat Badretdinov <azat@guitar.rockefeller.edu> wrote:
>We were experiencing a severe drop in the performance of our software after
>we switched from f77 to f90.
>
>The analysis of the origin of this drop led us to the painful discovery
>that replacement of fixed-shape (with constant bounds) and explicit-shape
>(with sizes defined by explicitly passed variables) formal array
>arguments by assumed-shape formal array arguments led to a 20-fold
>increase in the number of instructions per function in a typical
>bottleneck case, as detected by the pixie-prof suite of programs.
>
>Please help us.
>
>Sincerely,
>
>Azat Badretdinov.
>
> -disclaimer-
> unless stated otherwise, everything in the above message is personal opinion
> and nothing in it is an official statement of molecular simulations inc.
Assumed-shape arrays are inherently slower than explicit-shape or
assumed-size arrays. Twenty-fold increases in the number of
instructions should be unusual, but such increases are by no means
impossible.
Passing an explicit-shape or assumed-size array typically consists of
passing a pointer. An assumed-shape array requires passing the extent
and stride for each dimension of the array in addition to the pointer.
Explicit-shape and assumed-size arrays must be contiguous. Compilers
can exploit that requirement with optimizations such as flattening
loop nests and address calculations. Assumed-shape arrays are almost
always contiguous, but the compiler cannot rely on them being
contiguous.
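As a minimal sketch (the routine names are illustrative, not from the
post), here are the three kinds of dummy declarations being compared;
only the assumed-shape version obliges the caller to build and pass a
descriptor:

! Sketch: what the caller typically has to pass for each dummy style.
subroutine expl(a, n)        ! explicit shape: base address (and n) passed
   integer, intent(in) :: n
   real :: a(n)
end subroutine expl

subroutine asize(a)          ! assumed size: base address only
   real :: a(*)
end subroutine asize

subroutine ashape(a)         ! assumed shape: descriptor holding the base
   real :: a(:)              ! address plus an extent and a stride for each
end subroutine ashape        ! dimension; needs an explicit interface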
Sincerely,
Bob Corbett
From: Richard Maine <maine@altair.dfrc.nasa.gov>
Newsgroups: comp.lang.fortran
Subject: Re: efficiency of explicit vs. assumed shape arrays
Date: 13 Jul 2000 07:41:46 -0700
"Lonnie Hamm" <hamm@okstate.edu> writes:
> When passing an array to a subroutine, are there any standard assumptions
> that can be made about how the array is passed?  Is the address of the first
> element in the array passed, or is the array copied in and out, maybe using
> temporary arrays?  I suppose this may vary depending upon the compiler and
> whether the array is assumed or explicit shape.
Tim Prince gave what I think to be a pretty good answer here.
Basically, yes, it may vary depending on several things. There are
some things that are common to most compilers, but these kinds of
things are not specified by the standard and do vary. It is not
safe to count on them.
One thing I'll add that Tim didn't say. Any program that can tell the
difference is not standard-conforming.....well, unless the program can
tell the difference by noticing the performance or something of that
sort. Performance can be majorly affected, and for this reason, you may
want to be aware of what to expect in certain situations. Your code might
not care in terms of getting correct answers, but it might care a lot
in terms of getting decent performance.
The situation where you are most likely to encounter copy-in/copy-out
is when a non-contiguous actual argument (e.g. an array slice) is
passed to a dummy argument that is not assumed shape. For assumed shape
dummies, an explicit interface is needed, and this typically includes
(behind your back) the stride information necessary to properly access
the elements of the slice. If the dummy is not assumed shape, then
the subroutine might not, in general, be able to deal with non-contiguous
arrays (indeed, it might have been compiled with an f77 compiler - some
of the funny criteria in this area are to make sure that it is possible
to call f77-compiled subroutines with an f90-compiled main). In this case,
the calling routine is likely to make a contiguous copy.
You also occasionally run into this even when the actual argument is
contiguous, if that contiguity was not trivially obvious at compile
time. For example, if you are passing a pointer, which *could*
point to an array slice, even though it never actually does. There is
an obvious run-time optimization to make here. Many compilers do the
optimization, but some don't.
These are just some fairly common cases. There are some others, and they
do vary from one compiler to another.
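As a sketch of the most common case above (the names and the loop body
are illustrative, not from the post): passing a strided section to an
explicit-shape dummy typically triggers a contiguous copy on the way in
and back out, while an assumed-shape dummy simply receives the stride in
its descriptor.

module demo
   implicit none
contains
   subroutine old_style(v, n)     ! explicit shape: expects contiguous data,
      integer, intent(in) :: n    ! so a strided actual is usually copied in
      real, intent(inout) :: v(n) ! on entry and copied back on return
      v = 2.0*v
   end subroutine old_style

   subroutine new_style(v)        ! assumed shape: the descriptor carries the
      real, intent(inout) :: v(:) ! stride, so no copy is needed
      v = 2.0*v
   end subroutine new_style
end module demo

program slices
   use demo
   implicit none
   real :: a(10)
   a = 1.0
   call old_style(a(1:9:2), 5)    ! non-contiguous actual: likely copy-in/out
   call new_style(a(1:9:2))       ! same section, passed by descriptor
end program slices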
--
Richard Maine
maine@altair.dfrc.nasa.gov
From: "James Van Buskirk" <torsop@ix.netcom.com>
Newsgroups: comp.lang.fortran
Subject: Re: efficiency of explicit vs. assumed shape arrays
Date: Thu, 13 Jul 2000 17:47:36 -0600
Richard Maine wrote in message ...
>One thing I'll add that Tim didn't say. Any program that can tell the
>difference is not standard-conforming.....well, unless the program can
>tell the difference by noticing the performance or something of that
>sort.
I'm not so sure about that. Consider:
! CopyTest3.f90 -- attempts to check for copy-in/copy-out
module test3
   implicit none
contains
   subroutine test1(x, y)
      integer, target :: x(:)
      integer, target :: y(size(x))
      integer, pointer :: a, b

      a => x(size(x))
      b => y(1)
      if (associated(a, b)) then
         write(*,'(a)') 'No copies through test 1.'
      else
         write(*,'(a)') 'Copies made by test 1.'
      end if
      call test2(x, y, size(x))
   end subroutine test1

   subroutine test2(x, y, n)
      integer, intent(in) :: n
      integer, target :: x(n)
      integer, target :: y(n)
      integer, pointer :: a, b

      a => x(n)
      b => y(1)
      if (associated(a, b)) then
         write(*,'(a)') 'No copies through test 2.'
      else
         write(*,'(a)') 'Copies made by test 2.'
      end if
   end subroutine test2
end module test3

program copytest3
   use test3
   implicit none
   integer, target :: test(19)

   call test1(test(1:10), test(10:19))
end program copytest3
Output with CVF 6.1A:
No copies through test 1.
Copies made by test 2.
Seems to be a detectable difference. Where did this code fail
to conform?
From: Richard Maine <maine@altair.dfrc.nasa.gov>
Newsgroups: comp.lang.fortran
Subject: Re: efficiency of explicit vs. assumed shape arrays
Date: 14 Jul 2000 08:08:30 -0700
"James Van Buskirk" <torsop@ix.netcom.com> writes:
> Richard Maine wrote in message ...
>
> >One thing I'll add that Tim didn't say. Any program that can tell the
> >difference is not standard-conforming.....well, unless the program can
> >tell the difference by noticing the performance or something of that
> >sort.
>
> I'm not so sure about that. Consider:
...[elided]
> Seems to be a detectable difference. Where did this code fail
> to conform?
1. Association is a concept defined by the language. Copy-in/copy-out is
an implementation mechanism for certain features. The two are *NOT*
necessarily tied quite as tightly as you seem to assume. It is possible,
for example, for a processor to do a copy-in, but keep track of the fact
that the copy is associated with the original. (Seems like a complicated
thing to do, but it's possible). Your test would fail to notice that
distinction.
So by testing association, you are not strictly testing whether or
not a copy occurred. Perhaps a bit pedantic here, as in practice I
don't think any compilers are quite as strange as the standard allows
here. But some compilers for distributed environments just might be. I'm not
really an expert in those kinds of environments, but in principle you
can get cases where multiple data copies on different processors
are associated in some sense.
2. You are depending on a feature specified by the standard to be
processor-dependent. Section 12.4.1.1 of f95
"If the dummy argument has the TARGET attribute and is an explicit-shape
array or is an assumed-size array, and the corresponding actual argument
has the TARGET attribute but is not an array section with a vector
subscript
(1) On invocation of the procedure, whether any pointers associated
with the actual argument become associated with the corresponding
dummy argument is processor-dependent ..."
If you study this enough (and if it doesn't damage your brain first),
you can conclude that it is in essence saying that it is
processor-dependent whether or not copy-in/copy-out occurs in this
case, but it says it in terms of association, just in case association
does something "strange" (see point 1 for picky distinctions).
3. I abhor that particular part of the standard (the infamous interp 125).
It's all but impossible to remember all the complicated conditions.
I'd have preferred a far simpler set of rules that one could actually
remember. This would have made the rules more conservative, in that
they would disallow some particular cases that ought to be workable.
But I lost out to the crowd that wanted to precisely specify the
exact cases that turn out to be naturally implementable with maximum
performance; this makes the specification pretty messy. My personal
rule is much simpler - don't do anything particularly close to that
and you won't have to worry about all the quirks.
I assure you that *I* didn't remember all the conditions mentioned in
the above quote (much less all the other combinations of conditions
mentioned in the surrounding text). In fact, my first reaction was
that it was likely a compiler bug and that the TARGET attribute
should have essentially forced the compiler to not do the copy
(it just wouldn't surprise me to find compilers that didn't get all
these complicated conditions right). But on about my 3rd careful
reading, I concluded that this case falls into the above particular
conditions where it's stated to be processor-dependent.
--
Richard Maine
maine@altair.dfrc.nasa.gov
From: corbett@lupa.Sun.COM (Robert Corbett)
Newsgroups: comp.lang.fortran
Subject: Re: Efficient way of passing arrays
Date: 17 Sep 2001 23:52:15 GMT
In article <369aqt815f3crssecreh1fota4loqs3ev9@4ax.com>,
Ken Plotkin <kplotkin@nospam.com> wrote:
>On 16 Sep 2001 21:20:31 GMT, dantex1@aol.com (Dan Tex1) wrote:
>
>[snip]
>>more recent compiler releases still act this way. My guess is... that they
>>do. I am however curious as to why the slowdown when sizes are not passed
>>directly. Is this because the compiler writers haven't gotten around to
>>optimizing here yet ( for some unknown reason ) or are there technical
>>difficulties involved?
>
>I'm curious as to why it matters at all for a one-dimensional array.
>
>Unless maybe run-time array bounds checking was enabled, so that with
>(:) notation every reference to the array was accompanied by a
>debugging trace back to the calling routine?
>Ken Plotkin
Explicit-shape arrays, assumed-size arrays, and allocatable arrays
are known to be contiguous. Assumed-shape arrays and deferred-shape
arrays might or might not be contiguous. When computing the address
of an array element in the case where the array is contiguous, the
first subscript is multiplied by the size of the element. For many
data types, the size is a power of two and so the multiplication can
be done using a shift instruction. In the case where an array is not
contiguous, the stride is typically not known at compile time, and so
a multiplication must be used instead of a shift. Rank one arrays
are the arrays most affected by this difference.
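As a small illustrative sketch (not from the post), here is the address
arithmetic in question for 8-byte (double precision) elements:

! Contiguous rank-one dummy, e.g.   double precision :: x(n)
!     addr(x(i)) = base + (i-1)*8          the *8 compiles to a shift by 3
!
! Assumed-shape dummy, e.g.         double precision :: x(:)
!     addr(x(i)) = base + (i-1)*stride*8   the stride is read from the
!                                          descriptor at run time, so a
!                                          genuine multiply is generated
subroutine scale(x)
   double precision, intent(inout) :: x(:)
   integer :: i
   do i = 1, size(x)
      x(i) = 2.0d0*x(i)   ! each element access pays for the stride multiply
   end do
end subroutine scale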
Sincerely,
Bob Corbett
From: robert.corbett@sun.com (Robert Corbett)
Newsgroups: comp.lang.fortran
Subject: Re: Sun f90 bad optimization for assumed-shape array
Date: 23 Jul 2004 17:01:37 -0700
Message-ID: <cb977dbc.0407231601.2f5ad8c9@posting.google.com>
braams@courant.nyu.edu (Bastiaan Braams) wrote in message news:<9b35cf8.0407230644.56e410c8@posting.google.com>...
> In profiling my code I noticed strange behavior on a Sun system.
> I am on an Ultra-III machine and this is the Fortran compiler:
>
> f90: Forte Developer 7 Fortran 95 7.0 Patch 111714-09 2003/10/15
>
> I compile with use of the "-fast" option.
>
> A critical routine in my code has a heading similar to this:
>
> | subroutine foo (n, x, w)
> | integer, intent (in) :: n
> | real (kind=dp), intent (in) :: x(0:)
> | real (kind=dp), intent (out) :: w(0:)
>
> The expected size of w is n, and at the top of my code I test
>
> | if (size(w).ne.n) then
> | stop '...'
> | endif
>
> Profiling tells me that this subroutine executes three times faster
> if the declaration of w is replaced by
>
> | real (kind=dp), intent (out) :: w(0:n-1)
>
> Is this kind of behavior a known feature? I'm quite annoyed with
> it, because the replacement declaration is much more sensitive to
> errors. (A routine that calls foo could pass to w an array of any
> size at least n and of any dimension, and all is legal.)
>
> Bas Braams
One reason code for explicit-shape arrays can run faster than code for
assumed-shape arrays is that explicit-shape arrays are known to be
contiguous. While the optimizer can guess that it is worthwhile to
test for contiguity at the top and clone different versions of the
code for the contiguous and discontiguous cases, it can't always
guess correctly.
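A hand-written sketch of that test-and-clone idea (illustrative only:
the body is a placeholder, and IS_CONTIGUOUS is a Fortran 2008 intrinsic,
so the compilers discussed here would have to generate an equivalent test
internally rather than find it in the source):

subroutine foo(n, x, w)
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: n
   real(kind=dp), intent(in)  :: x(0:)
   real(kind=dp), intent(out) :: w(0:)

   if (is_contiguous(x) .and. is_contiguous(w)) then
      call foo_contig(n, x, w)       ! fast clone: unit stride assumed
   else
      w(0:n-1) = 2.0_dp*x(0:n-1)     ! general clone: strides taken from
   end if                            ! the array descriptors
contains
   subroutine foo_contig(n, x, w)
      integer, intent(in) :: n
      real(kind=dp), intent(in)  :: x(0:n-1)  ! explicit shape: known to
      real(kind=dp), intent(out) :: w(0:n-1)  ! be contiguous
      w = 2.0_dp*x
   end subroutine foo_contig
end subroutine foo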
Sincerely,
Bob Corbett
From: robert.corbett@sun.com (Robert Corbett)
Newsgroups: comp.lang.fortran
Subject: Re: Sun f90 bad optimization for assumed-shape array
Date: 30 Jul 2004 22:23:27 -0700
Message-ID: <cb977dbc.0407302123.1cb4ba91@posting.google.com>
Paul Van Delst <paul.vandelst@noaa.gov> wrote in message news:<ceb76v$b6p$1@news.nems.noaa.gov>...
> > There is one
> > (I think) obscure case that effectively disallows passing
> > a copy when enough pointers are used. But the general rule
> > applies: if your program can detect whether or not copies
> > are being passed, you need to fix your code ;).
>
> Well, even in the original case it wasn't the code that detected that a copy was being
> made - it was the OP! He noticed a large difference in execution speed between using
> assumed- or explicit-shape dummy arguments. Doesn't that imply the compiler should be
> "fixed" (to better handle the assumed-shape dummy argument case) ?
In the original case, where the dummy argument is an assumed-shape array,
no copy is made. My guess is that the actual argument passed is
contiguous, and so no copy is made in the second case either, where the
dummy argument is an explicit-shape array. The reason the code runs slower
when the dummy argument is an assumed-shape array than when the dummy
argument is an explicit-shape array is that the code that accesses the
assumed-shape array must assume that the array might be discontiguous
while the code that accesses the explicit-shape array can assume that the
array is contiguous.
As I said earlier in this thread, the routine with the assumed-shape
dummy argument could test to see if the array is contiguous at the top of
the routine and branch to a clone of the body of the routine that takes
advantage of contiguity in that case. Given the increase in code size,
an optimizer is likely to be reluctant to do that optimization.
> It's all very well to say that you shouldn't need to worry about what a compiler does
> under the hood -- I say it to people all the time -- but when you notice a large
> performance hit like the OP did, you'd be nuts *not* to modify the code to eliminate the
> cause of the performance degradation.
Fortran 90/95 made the situation much worse than it was in FORTRAN 77.
For FORTRAN 77, pretty much all of the major vendors implemented the
same set of optimizations. Fortran 90/95's array expressions create far
more problems for optimizers. The number of possible optimizations is
staggering, but there are no known algorithms that implement more than a
tiny portion of the possible optimizations. As a result, the amount of
code in Fortran 90/95 compilers devoted to optimizing array expressions
is huge, but the chance that a compiler implements a given optimization
that someone might think it should is small. Furthermore, each compiler
implements a different set of optimizations.
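For instance (a sketch, not from the post), overlapping array assignments
are one place where the possibilities multiply: each of these is correct
if the compiler builds a temporary for the right-hand side, but a good
optimizer may be able to prove the temporary unnecessary:

program array_exprs
   implicit none
   real :: a(1000), b(1000)
   call random_number(a)
   call random_number(b)
   a(2:1000) = a(1:999) + b(2:1000)  ! overlapping shift: a naive compiler
                                     ! makes a temporary; running the loop
                                     ! backward avoids it entirely
   a = a(1000:1:-1)                  ! reversal: needs a temporary or an
                                     ! in-place swap over half the array
   print *, sum(a)
end program array_exprs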
Sincerely,
Bob Corbett
From: robert.corbett@sun.com (Robert Corbett)
Newsgroups: comp.lang.fortran
Subject: Re: Sun f90 bad optimization for assumed-shape array
Date: 29 Jul 2004 18:04:05 -0700
Message-ID: <cb977dbc.0407291704.51bec0db@posting.google.com>
Paul Van Delst <paul.vandelst@noaa.gov> wrote in message news:<ceb0l0$3m7$1@news.nems.noaa.gov>...
> So use of an explicit shape array guarantees that a copy of the argument will be made (at
> least on the compilers I tested, pgf90 5.2, ifort 8.0, and lf95 v6.2 on linux). Is that a
> Fortran Standard detail or an implementation one?
The Fortran 95 standard does not require that behavior, but the
standard and the existing code base strongly favor it. I believe
every existing Fortran 90/95 compiler works that way.
An implementation could use array descriptors to pass actual arguments
in all cases where the corresponding dummy argument is not known to be
a scalar. Such an implementation would have a performance advantage if
discontiguous arrays were routinely passed to dummy arguments that are
explicit-shape or assumed-size (not assumed-shape) arrays. Since such
codes are rare, the cost of using descriptors for explicit-shape and
assumed-size arrays is rarely justified. Since the cost of copying arrays is
high, users avoid writing codes that pass discontiguous arrays to
explicit-shape or assumed-size arrays. The circle is complete.
> I probably should have known about this already, but this little exercise means I won't
> forget it now. Cool. Thanks.
You're welcome.
Sincerely,
Bob Corbett