[metrics-dev] What went on with GridFTP yesterday?
liming at mcs.anl.gov
Wed Oct 18 12:53:12 CDT 2006
Ok, this was an interesting test case of what we can (and can't)
learn using our usage stats data and the current report generators.
Here's what I can tell from the reports that we currently generate.
SUMMARY: The most likely theory, I think, is that someone is using a
post-GT4 development version of GridFTP for stress testing purposes
on a LAN using third-party transfers between a large number of
servers (somewhere around 80, I suspect).
The resulting data points were gleaned by comparing the usage profile
on the 16th (normal usage level) with the 17th (usage spike). This
comparison is not exact, of course, because the "control" usage (the
baseline, ordinary usage) likely varied a bit between the two days as
well.
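The day-to-day comparison described above can be sketched roughly as
follows. This is only an illustration, assuming each day's reports
have already been reduced to a count per category (per GridFTP
version, per file size range, and so on). The function name and the
sample counts are made up and do not come from our database.

```python
from collections import Counter

def usage_delta(baseline, spike):
    """Return the categories that grew from the baseline day to the spike day.

    Both arguments map a category label to a report count. A category
    missing on one day is treated as zero on that day.
    """
    delta = Counter(spike)
    delta.subtract(baseline)
    # Keep only categories that grew, largest growth first.
    grown = {label: count for label, count in delta.items() if count > 0}
    return sorted(grown.items(), key=lambda item: item[1], reverse=True)

# Made-up counts, for illustration only.
day16 = {"version A": 1000, "version B": 50}
day17 = {"version A": 1100, "version B": 40, "version C": 5000}
print(usage_delta(day16, day17))
```

Small differences should not be read too closely, since the baseline
usage drifts from day to day. Only the categories with large growth
are likely to belong to the spike.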
-- Most relevant, I think: The GridFTP version used was version 3.2.
This version is a development version, post GT-4.0. So someone is
using what is probably the latest development version for testing.
-- Something we need to follow up on: The data in the database
suggests that there were actually ~3.3M transfers reported yesterday,
not the 1.5M that the daily summary report showed. (The daily summary
report shows packets received, not number of transfers, but I would
have thought they'd be the same.) I'm not sure why these numbers
disagree but it's sure to be interesting. A possible clue is that in
a third-party transfer, both the sending server and the receiving
server are likely to send reports. But one would think that would be
counted both in the packets received and in the transfers reported,
so it's not clear why one is still twice the other.
-- The transfers were all single stream, single stripe (not striped).
-- The file sizes were mostly in the 0 - 100kb range, though it
appears that a smaller set of them (10% or less) were in the 100kb -
-- A very large number of the transfers (~1.5M) appear to have been
handled at transfer rates of 100mb/s or higher. A smaller number were
in the lower bandwidth ranges (10mb/s - 50 mb/s, 500kb/s - 1 mb/s,
NOTE: Given that these were all single stripe, single stream
transfers, the high transfer rates suggest to me that these were
transfers on a local area network (LAN).
-- With regard to the servers that were used, the main difference
between the 17th and the 16th is that on the 17th, there were quite a
number of transfers between hosts with unregistered IP addresses (no
reverse DNS entry). On the 16th, there were ~20 servers in this
category. On the 17th, there were slightly over 100 servers in that
category.
NOTE: This might further support the theory that the "extra"
transfers were done between hosts on a LAN.
-- There was a 50/50 breakdown between STOR and RETR requests.
NOTE: I think it's safe to assume from this that what we're seeing
are third-party transfers between GridFTP servers, and we're seeing
the reports from both the sending hosts and the receiving hosts. (So
the true number of transfers is probably 50% of the total reported,
because each transfer is reported twice, once by sending host and
once by receiving.)
-- All of the transfers that were reported had return code 226 (which
I assume is success?), but based on the fact that I've never seen a
different response code in any of our reports, I suspect that we only
report successful transfers.
-- The TCP buffer sizes used were mostly in the 10kb - 100kb range,
but some--slightly over 500k--were in the 100kb - 1mb range. (Just a
guess, but I'd bet those were the files that were >100kb in size.)
The block sizes used were all 100kb - 1mb.
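To make the double-reporting arithmetic from the STOR/RETR note
concrete, here is a small sketch. It assumes each usage report can be
treated as a record with an operation type, and that a third-party
transfer shows up once as a RETR (from the sending server) and once
as a STOR (from the receiving server). The record layout and the
function name are hypothetical, not our actual schema.

```python
from collections import Counter

def estimate_transfers(reports):
    """Estimate the number of distinct transfers from per-server reports.

    Each RETR report is paired with one STOR report; any reports left
    over without a counterpart are counted as transfers of their own.
    """
    ops = Counter(r["op"] for r in reports if r["op"] in ("RETR", "STOR"))
    paired = min(ops["RETR"], ops["STOR"])     # transfers reported by both ends
    unpaired = abs(ops["RETR"] - ops["STOR"])  # transfers reported by one end only
    return {
        "reports": ops["RETR"] + ops["STOR"],
        "estimated_transfers": paired + unpaired,
    }

# Made-up sample: three third-party transfers, each reported by both ends.
sample = [{"op": "RETR"}, {"op": "STOR"}] * 3
print(estimate_transfers(sample))
```

With an exact 50/50 split this reduces to halving the report count,
so ~3.3M reports would correspond to roughly 1.65M transfers.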