[metrics-dev] What went on with GridFTP yesterday?

Ian Foster foster at mcs.anl.gov
Wed Oct 18 15:51:51 CDT 2006


Shouldn't the answer be to filter those out? My view is that we shouldn't be 
including Argonne traffic in any case.
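
To make the filtering idea concrete, here is a minimal sketch. It assumes each usage report is available as a dict with a "hostname" field, and that Argonne hosts resolve under anl.gov; the field name, the function names, and the domain list are illustrative assumptions, not the actual usage-stats schema.

```python
# Hypothetical record layout: each usage report is a dict with a "hostname"
# key, which is None when the IP address has no reverse DNS entry.
EXCLUDED_DOMAINS = ("anl.gov",)  # assumption: Argonne hosts resolve under anl.gov


def is_excluded(hostname, excluded=EXCLUDED_DOMAINS):
    """Return True if hostname equals, or is a subdomain of, an excluded domain."""
    if not hostname:
        return False  # unresolved hosts cannot be matched by name
    name = hostname.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in excluded)


def filter_reports(reports, excluded=EXCLUDED_DOMAINS):
    """Keep only the reports whose hostname is not in an excluded domain."""
    return [r for r in reports if not is_excluded(r.get("hostname"), excluded)]
```

Note that a name-based filter like this would not catch hosts with no reverse DNS entry, which is the category that grew in the thread below; those would need to be matched some other way, for example by IP range.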

At 02:35 PM 10/18/2006 -0500, Ravi Madduri wrote:
>I think we should turn off usage stats when we are doing internal tests. A 
>large quantity of useless data will skew the inferences we wish to draw. I 
>know how to turn it off for the Java components, but I am not sure about the 
>C side of things.
>My 2c.
>
>Lee Liming wrote:
>>Ravi:
>>Yes, that is what it looks like.  Turns out I was wrong on at least one 
>>point.  John was testing both UC -> UC transfers (LAN) and UC -> SDSC 
>>transfers (WAN).  I'm guessing that the lower transfer rates (10mb/s - 50 
>>mb/s, 500kb/s - 1 mb/s) were on the WAN because all of the transfers were 
>>single TCP streams.
>>Also, although there were an additional ~80 servers in the "unregistered 
>>IP address" category yesterday vs. the day before, John says that he used 
>>only a few interactive nodes at SDSC (allocated to him by the local 
>>scheduler) and a few nodes at U-C (same). It shouldn't have added up to 
>>80 nodes total.  So it might actually be that the additional 80 servers 
>>were not related to John's testing. (I think it more likely that they were, 
>>and we just haven't figured out how it happened.)
>>FYI, John is apparently running these tests to generate some interesting 
>>data on the performance difference resulting from changes made in the 
>>development version of the GridFTP code.
>>I'd be interested in your thoughts regarding whether people doing testing 
>>like this should be turning usage reporting off or not. Anyone have an opinion?
>>         -- Lee
>>
>>On Oct 18, 2006, at 12:05 PM, Ravi Madduri wrote:
>>
>>>Lee,
>>>John was testing some control channel command pipelining yesterday, to be 
>>>shown as part of an SC demo. I am cc'ing John.
>>>
>>>Lee Liming wrote:
>>>>Ok, this was an interesting test case of what we can (and can't) learn 
>>>>using our usage stats data and the current report generators.
>>>>Here's what I can tell from the reports that we currently generate for 
>>>>GridFTP.
>>>>SUMMARY: The most likely theory, I think, is that someone is using a 
>>>>post-GT4 development version of GridFTP for stress testing purposes on 
>>>>a LAN using third-party transfers between a large number of servers 
>>>>(somewhere around 80, I suspect).
>>>>           -- Lee
>>>>Details:
>>>>The resulting data points were gleaned by comparing the usage profile 
>>>>on the 16th (normal usage level) with the 17th (usage spike).  This 
>>>>comparison is not exact, of course, because the "control" usage (the 
>>>>baseline, ordinary usage) likely varied a bit between the two days as well.
>>>>-- Most relevant, I think: The GridFTP version used was version 3.2.
>>>>This version is a development version, post GT-4.0.  So someone is 
>>>>using what is probably the latest development version for testing purposes.
>>>>--  Something we need to follow up on: The data in the database 
>>>>suggests that there were actually ~3.3M transfers reported yesterday, 
>>>>not the 1.5M that the daily summary report showed. (The daily summary 
>>>>report shows packets received, not number of transfers, but I would 
>>>>have thought they'd be the same.)  I'm not sure why these numbers 
>>>>disagree but it's sure to be interesting.  A possible clue is that in a 
>>>>third-party transfer, both the sending server and the receiving server 
>>>>are likely to send reports. But one would think that would be counted 
>>>>both in the packets received and in the transfers reported, so it's not 
>>>>clear why one is still twice the other.
>>>>--  The transfers were all single stream, single stripe (not striped).
>>>>--  The file sizes were mostly in the 0 - 100kb range, though it 
>>>>appears that a smaller set of them (10% or less) were in the 100kb - 1mb range.
>>>>-- A very large number of the transfers (~1.5M) appear to have been 
>>>>handled at transfer rates of 100mb/s or higher. A smaller number were 
>>>>in the lower bandwidth ranges (10mb/s - 50 mb/s, 500kb/s - 1 mb/s, etc.).
>>>>NOTE: Given that these were all single stripe, single stream transfers, 
>>>>it suggests to me that these were transfers on a local area network.
>>>>-- With regard to the servers that were used, the main difference 
>>>>between the 17th and the 16th is that on the 17th, there were quite a 
>>>>number of transfers between hosts with unregistered IP addresses (no 
>>>>reverse DNS entry).  On the 16th, there were ~20 servers in this 
>>>>category.  On the 17th, there were slightly over 100 servers in that category.
>>>>NOTE: This might further support the theory that the "extra" transfers 
>>>>were done between hosts on a LAN.
>>>>-- There was a 50/50 breakdown between STOR and RETR requests.
>>>>NOTE: I think it's safe to assume from this that what we're seeing are 
>>>>third-party transfers between GridFTP servers, and we're seeing the 
>>>>reports from both the sending hosts and the receiving hosts. (So the 
>>>>true number of transfers is probably 50% of the total reported, because 
>>>>each transfer is reported twice, once by sending host and once by receiving.)
>>>>-- All of the transfers that were reported had return code 226 (which I 
>>>>assume is success?), but based on the fact that I've never seen a 
>>>>different response code in any of our reports, I suspect that we only 
>>>>report successful transfers.
>>>>-- The TCP buffer sizes used were mostly in the 10kb - 100kb range, but 
>>>>some--slightly over 500k--were in the 100kb - 1mb range.  (Just a 
>>>>guess, but I'd bet those were the files that were >100kb in size.)
>>>>The block sizes used were all 100kb - 1mb.
>>>
>>>--Ravi K Madduri
>>>The Globus Alliance | Argonne National Laboratory
>>>http://www-unix.mcs.anl.gov/~madduri
>
>--
>Ravi K Madduri
>The Globus Alliance | Argonne National Laboratory
>http://www-unix.mcs.anl.gov/~madduri
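
On the double-counting point quoted above (a 50/50 STOR/RETR split, with each third-party transfer reported once by the sender and once by the receiver), a rough sketch of the arithmetic follows. It assumes each report carries a "command" field and that every matched STOR/RETR pair is one third-party transfer; both are assumptions for illustration, not how the current report generators work.

```python
from collections import Counter


def estimate_transfers(reports):
    """Estimate distinct transfers when third-party transfers are reported twice."""
    counts = Counter(r["command"] for r in reports)
    stor = counts.get("STOR", 0)
    retr = counts.get("RETR", 0)
    paired = min(stor, retr)     # assumed third-party: one STOR plus one RETR each
    unpaired = abs(stor - retr)  # assumed client-server transfers, reported once
    return paired + unpaired
```

Under these assumptions, ~3.3M reports split evenly between STOR and RETR would correspond to roughly 1.65M distinct transfers.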

_______________________________________________________________
Ian Foster -- Weblog: http://ianfoster.typepad.com
Computation Institute: www.ci.uchicago.edu & www.ci.anl.gov
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619 --- Globus Alliance: www.globus.org

