From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: HP loses text-seg protection [really: Big Data]
Date: 9 Aug 1996 00:43:37 GMT

In article <gscottDvsMGH.Mxt@netcom.com>, gscott@netcom.com (Gavin
Scott) writes:

|> John R. Mashey (mash@mash.engr.sgi.com) wrote:

|> : As I recall, they had 200TB for digitized audio alone. 
|> 
|> Yeesh!  200TB of digitized audio works out to something like 36 *years*
|> of continuous stereo audio at CD quality (assuming no compression either).
|> 
|> Where do you *get* that much data?

I went back to my notes, rather than relying on memory:

1) 170TB (not 200TB, sorry) of audio.

2) "300K hours of sound":
	300000/(24*365) ==> 34 years, so Gavin's 36-year estimate is good.
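
Making that arithmetic concrete (a quick Python sketch; the uncompressed
44.1 kHz, 16-bit stereo rate is my assumption for "CD quality"):

# Back-of-the-envelope check on "300K hours of sound".
# Assumption: uncompressed CD-quality audio, 44.1 kHz * 16 bits * 2 channels.
hours = 300_000
bytes_per_sec = 44_100 * 2 * 2                    # ~176 KB/sec
years = hours / (24 * 365)                        # ~34 years of continuous audio
terabytes = hours * 3600 * bytes_per_sec / 1e12   # ~190 TB uncompressed
print(f"{years:.0f} years, ~{terabytes:.0f} TB")
# same ballpark as the 170TB figure, allowing for mild compression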

3) From a collection of records/tapes going back to 1896.

4) This is just one of several such companies, of course, and is just the
audio.  From my notes, I also found 60TB of video, and 200TB allocated for
art+pictures of various kinds. (I know they have more video, but I don't
know where that's going).

Note that, strangely enough:
	a) Media companies think that their archives are important assets.
	b) Physical media degrade, can sometimes be hard to find when
	   needed, and take a lot of space in any case.
	c) They usually have money to spend on preserving their assets,
	   and they want *infinite* file life :-)  

5) If a radio station does stereo recording of everything it broadcasts,
there are plenty of them that could have accumulated data on the
same order of magnitude, even with some compression.  Of course, then
there are TV stations, and then there are satellite image folks, and....

6) If people can afford the storage, they will fill it up.
These days, both DRAM and (lately) disk capacity are going up at
4X/3 years, or 16X/6 years, and if this continues for a while, one
can expect 100GB individual 3.5" disks in 5-6 years ... and people will
still want more disk space; even worse, they want it in networks,
and they want the data to slosh around fast.

7) CPUs get faster, disks & DRAM get bigger ... as usual, what lags is
I/O in general ... but that's hardly new, although failing to do anything
about it is going to cause a lot of heartburn, I suspect.

8) This customer is not only not unique, but not even odd; I've been in
plenty of meetings with people who were wishing/planning for similar
problem sizes. 



-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:    mash@sgi.com 
DDD:    415-933-3090	FAX: 415-967-8496
USPS:   Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: Why IA64 Now? (was: IA64 references and questions)
Date: 24 Dec 1997 01:02:38 GMT

In article <rbarrisELB4v5.Itz@netcom.com>, rbarris@netcom.com (Robert
Barris) writes:

|> If the networking hardware has indeed gotten so fast that nobody in the
|> office can use up all of the available bandwidth no matter how hard they
|> try, I'm not sure whether to cheer or groan.  Maybe I just don't
|> understand where gigabit ethernet hardware is really being deployed and
|> what the desires for it are.

Note that Gigabit Ethernet is just really coming in, so there aren't
a huge number of installations yet.

I do a seminar called "Big Data & the Next Wave of InfraStress",
which is about interactions of various technologies in
data growth (DRAM & especially disk), stress on computing infrastructure over
the next 4-5 years, and lots of bandwidths.

(1) A moderately current disk (like an IBM Scorpion) can be made to
sustain ~10MB/sec for large blocks.  100BT is already a bottleneck to back
up one such disk, and the disks are getting faster, and of course,
many customers use striped disks.  A very useful chart is to
show bandwidth (vertical) versus year (horizontal),
and plot "high-performance disks" versus:
	networks
	peripheral connections (SCSI, FC, etc)
	I/O busses
Given expected technologies over the next few years, the disks rapidly
choke the slower networks, connections, busses.  Already, with 40MB/sec
UltraSCSI, we've seen that we get 30MB/sec read with 3 (10 MB/s) disks,
and only 32 MB/sec with 4 disks. Consider what's likely to happen with
the 13-15 MB/sec disks becoming more widely available in 1998.
We do get 95 MB/sec from 100 MB/s FC, 10 disks, so there's a little room there.
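
A rough model of that saturation effect (a Python sketch; the channel-efficiency
figures are my assumptions, picked to roughly match the measurements above):

# Striped-disk read rate, limited by the peripheral channel.
# channel_eff (fraction of rated channel bandwidth actually sustained) is an
# assumption, chosen to roughly match the quoted numbers.
def sustained_read(n_disks, disk_mb_s, channel_peak_mb_s, channel_eff):
    return min(n_disks * disk_mb_s, channel_eff * channel_peak_mb_s)

print(sustained_read(3, 10, 40, 0.8))     # 30 MB/s: the disks are the limit
print(sustained_read(4, 10, 40, 0.8))     # 32 MB/s: UltraSCSI is now the limit
print(sustained_read(10, 10, 100, 0.95))  # 95 MB/s: FC still has a little headroom
# For comparison, 100BT peaks at ~12.5 MB/sec, so even one current disk
# can saturate it during a backup.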

If you want to get scared, consider the implications of disk capacity
reaching 200-500 GB in 1" X 3.5" disks within 5 years.

(2) Supercomputer customers, for years, have used HIPPI-800, which used to be
really expensive; people would be happier with GigaBit Ethernet or similar
technologies, which have approximately similar bandwidths.
Of course, they *really* want GSN (GigaByte System Network), i.e., like
HIPPI-6400, but they can't get that yet.

(3)  Here are some of the kinds of customers who already think 100BT isn't
enough, and some of whom think HIPPI-800 and GigaBit Ethernet or OC12 aren't
enough:
	(1) Some MCAD people, at least on the simulation & analysis side.
	(2) Intelligence agencies worldwide, some of whom would buy multiple
	GSN's per system today if they could.
	(3) Satellite image folks, whether in intelligence or elsewhere.
	(4) Special-effects houses.
	(5) Oil-and-gas-and-mining folks.  I saw some 2 weeks ago, asked them
	what they really wanted:
		"We want to be able to handle 10TB datasets and roam
		around in them."  Takes some I/O :-)
	(6) Medical-imaging folks: people would like to do 3D CAT-scans in
	real-time, and pipe the data to the doctors at the other end of the
	hospital; I've heard 500 MB/sec quoted for what they'd like.

[This is not theoretical, this comes from frequent discussions with
senior people in such places; in the last month's trip through
NZ/OZ/Singapore/Malaysia, I talked with people from all these categories.]

Finally, while 100BT may be appropriate to the desktop for a while,
since it's on many motherboards, and the cabling is often easy, one
would expect from past history that backbones have to be faster
(meaning OC12 or 1000BT) and computer-room clustering even faster
(GSN or proprietary clustering).



--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-932-3090
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389



From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE:  What It Will Be Like !!! [disks, Big Data]
Date: 11 Sep 1998 19:04:50 GMT

In article <35f91df8.148751386@news.wam.umd.edu>, rsrodger@wam.umd.edu
(Rob Rodgers) writes:

|> 100GB at, say, 20MB/s... Hmm.
|>
|> I wonder if for very, very high capacity drives it's going to start
|> making sense to have multiple simultaneous transfers from single
|> platters..

(Note that Terastor's products as announced are 5.25" removables ...
but the same laser-enhanced magnetic technology does seem applicable to
HD's as well.)

I often do a talk "Big Data & Next Wave of InfraStress"
that looks at the various trendlines,
with a lot of log scale charts.
One of the truly scary ones is the one Rob alludes to:
	elapsed time to read an entire disk.

Since density is going up faster than bandwidth, and since the techniques
that Terastor uses drastically increase the track density, rather than the
linear bit density, the backup time continues to increase.  The plausible
evolutionary tracks over the next few years would seem to include:

1.  1" * 3.5" drives: you get what you get in this form factor;
	good cost/bit.

2.  1.6" * 3.5": even bigger, great cost/bit, long backup time,
	worst cost/seek.

3. smaller * 3.5": single platter, better cost/seek, not as good cost/bit.

3a. smaller * smaller: single platter, best cost/seek, worst cost/bit,
	higher bandwidth (since smaller, can spin faster at given technology).

4. whatever size, with multiple head parallel transfer;
	optimizes bandwidth, but at higher cost.  Has been done in the past;
	of course, it has to cost less than twice as much as a pair of smaller
	disks with the same total storage.

===
Trendlines: when you do log-scale graphics, inflection points in growth
rates show up very obviously.  In disks, there was a crucial inflection
around ~1989/1990 where the growth rate accelerated from 2X/3 years, to
4X/3 years, and it looks like there's a temporary inflection starting up,
with IBM's GMR & the Terastor or Quinta laser-enhanced techniques pushing
things up faster, whichever actually ends up working.  Depending on
who you believe and how much, it seems plausible to see, by end of 2003,
3.5" X 1" drives in the 200GB-500GB range, and one would expect to buy them
in Fry's by 2004, and I expect to see home PCs with 500GB or more of disk
by that year [I know my wife will have one like that :-)], for digital
photography, if nothing else.  Selecting among the various trendline
combinations that I've plotted, if you have a 500GB 1" drive, and it has
a 50MB/sec transfer rate [straight-line extrapolation], it takes
a minimum of 10,000 seconds to read the entire disk. A fatter 1.6" drive
could take 20,000 seconds.
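
In code, the scary trendline (a Python sketch; the capacities and transfer
rates are the straight-line extrapolations above, and the ~1TB figure for the
1.6" drive is my inference from the 20,000-second number, not a spec):

# Minimum time to read a whole disk end-to-end at its full media rate.
def read_time_sec(capacity_gb, bandwidth_mb_s):
    return capacity_gb * 1000 / bandwidth_mb_s

print(read_time_sec(500, 50))    # 10,000 sec (~2.8 hours) for a 500GB 1" drive
print(read_time_sec(1000, 50))   # 20,000 sec (~5.6 hours) for a fatter 1.6" drive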

Quite often, technology ratios are more important than absolute levels,
which is why log-scale charts are nice: if you plot two technologies,
and they're going up in parallel, the ratios are staying the same.
If they are diverging or converging, at some point the tradeoffs change
strongly.  For example, we've done a 1 TB backup to tape in one hour of
elapsed time ... but it took ~38 tape drives. If you plot, for example,
GB of disk per $
	versus
bandwidth off disks
	or even worse, versus
effective bandwidth to tape

you have to think there will be some wrenching changes in the ways people
use disks: whether they actually back them up to tape at all, how often,
or whether to other disks or disk cartridges instead.
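
What the tape example implies per drive (a Python sketch; the per-drive rate
just falls out of the arithmetic, it is not a quoted drive spec):

# "1 TB to tape in one hour, ~38 drives": what rates does that imply?
total_bytes = 1e12
aggregate_mb_s = total_bytes / 3600 / 1e6   # ~278 MB/sec sustained, in aggregate
per_drive_mb_s = aggregate_mb_s / 38        # ~7.3 MB/sec per tape drive
print(f"{aggregate_mb_s:.0f} MB/sec total, ~{per_drive_mb_s:.1f} MB/sec per drive")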

Another interesting comparison is disk bandwidth versus network bandwidth,
with the sad conclusion that disks we were shipping 1-2 years ago already
cannot be backed up at full speed across 100BT (i.e., we could achieve 10+
MB/sec per disk a while ago, but 100BT is a bit slower than that in practice).

On the other hand, how can anyone dislike more disk space?  (I was lucky enough
to hold one of those 340MB IBM microdrives in my hand last year.  Fun.)
As for the bigger disks, I love customers like, for example, the oil&gas folks:

me: "So if you could have what you want, what would you like?"
them: (quite seriously): "Well, we'd like to have a 10 TB dataset of
seismic info, that we could visually roam through."
me: "Are you willing to buy lots of disks to get enough bandwidth?"
them: "sure, when can we get it?"
me: "Well, not quite yet, but the pieces ought to come together in just
a few years."  [We've done tests with configurations with ~1000 disks, ~ 10TB,
and we've gotten sustained 7 GB/sec off disk, with disks that do 10 MB/sec.
If, by 2003, we have even 200 GB/disk, and 50 MB/sec/disk, one rack =
~100 disks, or theoretically, 5 GB/sec.  One would probably want at least
20 GB/sec, meaning 4 disk racks, or 80 TB.  peripheral connection is a bit of
a problem, for bandwidth applications, even  2Gb FibreChannel will be getting
stressed.  Anyway an imaginable configuration would be:
	400 X 200GB disks = 80 TB = 8 datasets
	400 * 50 MB/sec = 20 GB/sec I/O bandwidth
		= 200 1Gb current FC-ALs
		= 100 2Gb next-gen FC's
[If we had the disks, we could actually build these now, I think, i.e.,
Onyx2/Origin2000s can be configured with more than that I/O bandwidth].
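
The same configuration as arithmetic (a Python sketch; all the numbers are the
projections listed above, nothing new):

# An imaginable ~2003 configuration for 10TB seismic datasets.
disks         = 400
gb_per_disk   = 200      # projected 2003 capacity
mb_s_per_disk = 50       # projected 2003 media rate

total_tb   = disks * gb_per_disk / 1000      # 80 TB = 8 x 10TB datasets
total_gb_s = disks * mb_s_per_disk / 1000    # 20 GB/sec aggregate off disk
fc1_needed = total_gb_s * 1000 / 100         # ~200 current 1Gb FC-ALs (~100 MB/s each)
fc2_needed = total_gb_s * 1000 / 200         # ~100 next-gen 2Gb FCs (~200 MB/s each)
print(total_tb, total_gb_s, fc1_needed, fc2_needed)
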
As usual, the real issue is software: although the visual-simulation
people have a long history of optimizing for roaming [i.e., lots of real-time
support, and use of application-controlled pre-paging of data from the
direction it looks like you're heading], this doesn't seem to be in production
yet in the critical geosciences applications, although people are at least
talking about it.

One would probably want .5 - 1 TB of main memory [that's OK, we're
already seeing requests for TB memories from customers for delivery a few
years out.]

While some of this might seem like fantasy, it's not.  Of course,
this stuff terrifies the system planners who actually have to worry about
backup/restore ... although fortunately for the oil&gas folks, the datasets
usually existed on tape anyway, and are essentially read-only, thank goodness.
--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE: What It Will Be Like !!! [disks, Big Data]
Date: 11 Sep 1998 23:20:04 GMT

In article <6tc03i$ajo$1@server05.icaen.uiowa.edu>,
dsiebert@icaen.uiowa.edu (Doug Siebert) writes:

|> Organization: Iowa Computer Aided Engineering Network, University of Iowa
|>
|> mash@mash.engr.sgi.com (John R. Mashey) writes:
|>
|> >things up faster, whichever actually ends up working.  Depending on
|> >who you believe and how much, it seems plausible to see, by end of 2003,
|> >3.5" X 1" drives in the 200GB-500GB range, and one would expect to buy them
|> >in Fry's by 2004, and I expect to see home PCs with 500GB or more of disk
|> >by that year [I know my wife will have one like that :-)], for digital
|> >photography, if nothing else.  Selecting among the various trendline

|> want this for /tmp :) )  Mirror a couple 500GB drives and then it would
|> no longer be a problem that the average person doesn't do backups of their
|> home PC (and time taken to backup is not an issue if you don't do it!)

Yes, that's one of the plausible approaches I often talk about.

|> Digital photographs won't be enough to use up that much space unless you
|> don't believe in JPEG and you REALLY take a lot of pictures!  Now if you
|> want to copy a hundred full length DVD movies to your hard drive, then
|> that space might start looking a little small, but that's not going to be
|> a typical consumer's usage model.

Sorry, instead of digital photography, I should have been more explicit
and said digital (still & video) photography; I agree that it is a lot harder
for a home user to chew up 100s of GBs with stills, than with motion.

Of course, the line seems to be blurring:
consider a Nikon Coolpix 900, which already shoots 2 frame/second in VGA mode.

1,280 x 960 pixels (1.3 Mpixel) or 640 x 480

In JPEG fine mode (1:4 compression), it takes ~0.7 MB/image for 1280x960.

If you matched the LCD monitor's rate of 30 frames/sec, call that 20 MB/sec,
or a GB in 50 seconds (call it a minute), so 500 GB = 500 minutes of "video",
which seems not unreasonable to generate over a few years of home use.
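
The arithmetic, as a Python sketch (the ~0.7 MB/frame figure is the JPEG-fine
number above; the camera running at full LCD rate is hypothetical):

# Home "video" arithmetic for a 1280x960 still camera run at 30 frames/sec.
mb_per_frame = 0.7
fps = 30
mb_per_sec = mb_per_frame * fps               # ~21 MB/sec, call it 20
sec_per_gb = 1000 / mb_per_sec                # ~48 sec per GB, call it a minute
minutes_for_500gb = 500 * sec_per_gb / 60     # ~400 minutes (~500 at "a GB a minute")
print(mb_per_sec, sec_per_gb, minutes_for_500gb)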

Of course, none of these is even remotely close to 35mm film resolution
(~100 Mpixel, or ~80X bigger than the 1280x960 above), i.e., nobody thinks
the current digital cameras replace film ... but people would like to.

(see http://www.usnews.com/usnews/issue/970512/12digi.htm, for example).

I would assume, based on looking at the recent past history of this stuff:

	(a) Input resolutions will rise as CCDs, and then maybe CMOS-image
	cameras keep improving.
	(b) Stored resolutions will rise along with DRAM size increases,
	and disk capacities.
	(c) Viewing resolutions will improve along with bigger monitors &
	these big LCDs that are coming in.
	(d) Printing resolutions will increase as per vendor desires to
	sell higher-resolution printers to consumers.
	(e) Frames/second will go up.

I assume the bottlenecks are more likely in (a) and (c) than in disks,
although I do observe that disks of the order of 10GB (of which my household
has a few) seem insufficient for keeping the current digital photography around,
whereas disks on the order of 100GB would be more useful...
and besides, in 2003, one will need some space for Office2003 :-)





--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE: What It Will Be Like !!! [disks, Big Data]
Date: 12 Sep 1998 03:15:56 GMT

In article
<76A12B881E2E3A93.3C73FD0DC95587DA.EA15A9205DE0C048@library-proxy.airnews.net>,
malc@mci2000.com (Malcolm Weir) writes:


|> >|> want this for /tmp :) ) Mirror a couple 500GB drives and then it
|> >|> would no longer be a problem that the average person doesn't do
|> >|> backups of their home PC (and time taken to backup is not an issue
|> >|> if you don't do it!)
|>
|> >Yes, that's one of the plausible approaches I often talk about.
|>
|> Not very plausible!  Backups are done for more reasons than just to protect
|> against hardware failure!

It's quite plausible in the context of typical home systems, some of which
never get backed up, and some of which get backed up onto tapes that are stored
right next to the PC. For backup versions of software & such, big disks
certainly let you have multiple partitions. Another way to do this,
that more follows traditional backups, would be to split each disk in
half, and back up the "faster half" of each disk onto the slower half of
the other disk.
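
Purely as an illustration of that mapping (a Python sketch; the device names
and the fast/slow split are hypothetical, not any particular OS's scheme):

# Illustrative layout only: each disk's "fast" (live) half gets backed up
# onto the *other* disk's slow half.
def backup_plan(disks):
    return {f"{d}:fast": f"{disks[(i + 1) % len(disks)]}:slow"
            for i, d in enumerate(disks)}

print(backup_plan(["disk0", "disk1"]))
# {'disk0:fast': 'disk1:slow', 'disk1:fast': 'disk0:slow'}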


It's hardly a solution in corporate data centers that require
offsite storage of tapes.  It may well be part of the solution if
there are live mirror sites.

--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: lm@bitmover.com (Larry McVoy)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE: What It Will Be Like !!! [disks, Big Data]
Date: 12 Sep 1998 05:02:27 GMT

: Another way to do this,
: that more follows traditional backups, would be to split each disk in
: half, and back up the "faster half" of each disk onto the slower half of
: the other disk.

That's a bad idea - a head crash will take out the whole disk.  Disks are
cheap enough I'd do this:

	[ fast ]        [ fast ]
	        \      /
	          \  /
	           /\
	          /  \
	        /      \
	[ slow ]        [ slow ]

Disks are small & fast enough that they ought to be able to make two
HDA's in one case that look like a single disk (i.e., have one SCSI or
IDE attachment).  I suspect you could build a 100% reliable drive for
about 1.8x the cost of a regular drive.
--
---
Larry McVoy            	   lm@bitmover.com           http://www.bitmover.com/lm


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE:  What It Will Be Like !!! [disks, Big Data]
Date: 14 Sep 1998 06:10:23 GMT

In article <35FC63EF.2635287D@Japan.NCR.COM>, Eric Hildum
<Eric.Hildum@Japan.NCR.COM> writes:

|> I don't understand the difficulty with the system below. You seem to
|> imply that it cannot be done yet; however, NCR demonstrated an 11 Tera
|> system almost two years ago here in Japan. It should be able to easily
|> do what you are describing below....

Perhaps I failed to explain this well enough, or perhaps I do not
understand NCR MPP systems well enough ... but it seems rather unlikely
that an NCR MPP would be very useful for this application:
the problem is *not* just being able to hook up a lot of disks,
nor is it to do typical commercial data mining [both NCR & SGI do that,
and amusingly, "data mining" and "visual exploration of mining data"
are different, despite both using the word "mining" :-)]:

1) "visually roam through": this is a real-time visual-simulation-like
application.  People heading in this direction tend to use
visualization walls, or at least reality-center setups [160-degree screen,
small auditorium, at least 3-pipe InfiniteReality graphics units tightly
integrated with the computation & I/O]. Some oil&gas folks have already
built such things ... none of which are quite yet up to the problem described.

=> I don't think NCR WorldMarks come with IR-class graphics... :-)

2) These problems involve big floating-point computations on large
arrays of technical data [hence the guess at .5 TB+ of main memory],
and a 32-bit OS would be a noticeable hindrance ...

=> WorldMarks are 32-bit systems, and it is not obvious that they are
tuned for large floating-point codes.

3) A WorldMark 5150 is a 2-to-128-node MPP, where each node has 4 200MHz
Pentium Pros, a 533 MB/sec (peak) system bus & memory bandwidth, and up to
4GB total memory.  I/O: 6 32-bit PCI slots + 4 EISA, with 4 40MB/sec
UltraSCSIs.  I think the memory bandwidth translates to about 400 MB/sec
sustained, and presumably the total sustainable I/O is about 200 MB/sec
(although this is hard to tell; I don't know offhand what the chipsets look
like, but assuming 2 PCI32/33MHz, that looks about right).

Ignoring some other issues for the time being, let's work backwards from
a sustained 20 GB/sec off disk, back-of-the-envelope style:

20 GB/sec, for 128 nodes = 156 MB/sec per node sustained
	That looks to fit, although not just in the SCSIs, since you can
	actually sustain about 32-35 MB/sec through a 40MB/sec UltraSCSI.
	In a max configuration, if each node only has to deal with its
	own disks [see below], then the potential 200 MB/sec of I/O / node
	is mostly consumed to get 156 MB/sec sustained from disk
	[since some of the remaining 44 MB/sec gets chewed up with
	overhead].
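
The per-node budget, as a Python sketch (the 32 MB/sec-per-UltraSCSI and
~200 MB/sec-per-node sustainable I/O figures are the estimates above, not
vendor numbers):

# Working backwards from 20 GB/sec sustained off disk on a 128-node MPP.
target_mb_s   = 20 * 1000
nodes         = 128
per_node_mb_s = target_mb_s / nodes                 # ~156 MB/sec per node

ultrascsi_sustained = 32                            # MB/sec through 40MB/sec UltraSCSI
scsis_needed = per_node_mb_s / ultrascsi_sustained  # ~4.9, vs. 4 SCSIs per node

node_io_budget = 200                                # MB/sec sustainable I/O per node
headroom = node_io_budget - per_node_mb_s           # ~44 MB/sec left for everything else
print(per_node_mb_s, scsis_needed, headroom)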

4) Maybe somebody will correct me if I'm wrong, but for interconnect, the latest
version is a BYNET rated at 500 Mbits/sec, bi-directional, which I think is a
peak number, for a peak of 62.5 MB/sec bidirectional per node,
or 125 MB/sec/node total, and a total aggregate interconnect bandwidth
of 16 GB/sec.  [Without knowing more details of the actual interconnect
topologies, and peak-vs-sustained, it's hard to know what the sustained bisection
bandwidth actually is, but a best case of 16 GB/sec is a warning flag for this
kind of application, where data tends to be spread around ... i.e.,
the reference patterns are fairly different from a TPC-D, for example].

In a typical MPP system, I/O goes from disk -> node to which it is attached,
and then gets copied (via DMA operations) to the node that needs the data.

per node		Best case		Worst case
			perfect placement	saturated BYNET
Total memory B/W	 400 MB/sec		 400 MB/sec
I/O B/W used		- 156 MB/sec		- 156 MB/sec
Interconnect used	- 0			- 125 MB/sec
B/W left for 4 CPUs	= 244 MB/sec		= 119 MB/sec*

How much would it take to saturate the BYNET?  Note that each byte of
data that needs to get to another node effectively burns 3X memory
bandwidth: 1X to read from disk, 1X to later copy out, 1X to read in.
Consider the simple case of 2 nodes, together reading 312 MB/sec of data.
If 40% (125/312) of their data needs to be moved, they will saturate
the interconnect, even assuming no switching congestion or other overheads.
I don't recall if there are any "remote-DMA" tricks in the WorldMarks to
do the I/O directly off disk across the BYNET into a different node's memory.
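
The saturation point, as a Python sketch (numbers straight from the example
above, assuming the 125 MB/sec peak per-node link):

# BYNET saturation, following the 2-node example above.
per_node_read = 156.0      # MB/sec off disk per node (20 GB/sec / 128 nodes)
link_total    = 125.0      # MB/sec per node over BYNET (62.5 each way, peak)

pair_read = 2 * per_node_read                  # ~312 MB/sec read by a pair of nodes
saturating_fraction = link_total / pair_read   # ~0.40

# Each remote byte costs roughly 3X memory bandwidth: 1X read from disk,
# 1X DMA copy out, 1X DMA copy in on the node that needs it.
print(f"{saturating_fraction:.0%} of the data moving off-node saturates the link")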

5) My conclusion:
	Ignoring the graphics & FP issue, from back-of-the-envelope,
	a 128-node, 512P WorldMark 5150:
	a) Has enough theoretical I/O bandwidth to go for 20 GB/sec,
	but it may be touch-and-go, depending on the file system overhead.
	b) Might work if you could somehow keep most of the I/O & CPU work
	inside each node ... but this data often isn't laid out that way, in
	which case the interconnect bandwidth may be a little light; and if
	the interconnect is being run near saturation, then there is
	not much bandwidth left for the CPUs [119 MB/sec for 4 CPUs].
	c) Reality is almost always a bunch worse than the above;
	usage seldom balances perfectly, there is overhead, etc. In practice,
	the high I/O rates demanded by such folks really want direct I/O
	right into the user address space, and the data are often *not*
	kept in the kinds of DBMS that people use to do commercial data
	mining & warehousing.
	d) Anyway, I wouldn't bid a 5150 on a contract that demanded 20 GB/sec
	sustained from disk into an application of this type.  It might
	barely be possible, but I doubt that it would be easy!


--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage,comp.lsi
Subject: Re: THE FUTURE:  What It Will Be Like !!! [disks, Big Data]
Date: 14 Sep 1998 19:21:16 GMT

In article <6tjl6k$i8m$1@pollux.dnai.com>, lm@bitmover.com (Larry McVoy)
writes:

|> John, John, John - they'll never get it this way.  You need sound bites
|> to prove your point (which is a good point, none the less).  Try this one
|> (it's what I used to use when I was at SGI):
|>
|>     I'll bet you the cost of the disk drives, which you have to provide
|>     in advance, that an SGI Origin can deliver any I/O rate you want
|>     through the file system.  Any I/O rate.  A gigabyte, 10, 100, a
|>     Terabyte, 10TB, whatever.
|>
|> The only limit on the I/O bandwidth is the cable lengths of the CrayLink
|> interconnect and/or the Fibre Channel (or whatever).  And I/O is easy on
|> that box, you get at least 90% of theoretical max so it's safe for quite
|> some time.

Hmmm.  I recall being in the office of an SGI engineer who once reminded me
that there were all sorts of inefficiencies between theoretical peaks and
what people could actually get, and who complained about sales/marketing people
promising things closer to peaks without talking to engineers about what
the real numbers were .... I think I recall that engineer's initials being L.M. :-)

I talk to too many customers; those who are told they can get "any rate"
will then demand exactly that!


--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.arch.storage
Subject: Re: THE FUTURE:  What It Will Be Like !!! [disks, Big Data]
Date: 14 Sep 1998 19:47:33 GMT

By happy coincidence, here's a description of a current system of the sort
discussed earlier (a RealityCenter @ Statoil in Norway), although clearly not
as large: 16P, 8 GB memory, 4 InfiniteReality graphics systems, and they
expect to handle 100GB datasets [still a factor of 100 smaller than the 10TB
datasets the other folks wanted, but certainly going in the right direction].

http://biz.yahoo.com/prnews/980914/ca_silicon_1.html


--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: Memory Wall -- Comments?
Date: 7 May 1999 01:12:14 GMT

In article <7gclsk$mhh$1@news.iastate.edu>, john@iastate.edu (John
Hascall) writes:

|>    How big will an N GB disk (for modest N) be in 2005?
|>    My guess, it will fit along with everything else on
|>    a thick credit-card-sized unit.

Well, actually:
1) There are already 340MB drives of this size (IBM), with actual platter
the size of a quarter, and the drive is actually 1.7" x 1.4" x .19",
compared to a credit card (3.25" X 2" X (.03?)").
See http://www.ibm.com/storage/microdrive, amazing tech.

2) Disk density improved about 30%/year through ~1989,
has gone up at 60%/year since then, and appears to be accelerating to
~100%/year over the next couple of years, which says there should be a
1+ GB microdrive sometime in 2001.
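
As a quick Python sketch (assuming density simply doubles each year from the
340MB 1999 part; obviously just an extrapolation):

# Project the microdrive forward at ~100%/year.
capacity_mb = 340      # the 1999 part
for year in (2000, 2001, 2002):
    capacity_mb *= 2
    print(year, f"~{capacity_mb / 1000:.1f} GB")
# 2000 ~0.7 GB, 2001 ~1.4 GB  ->  a 1+ GB microdrive sometime in 2001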

3) Just for fun, at the IDEMA (disk industry) conference yesterday,
Dataquest/Gartner's John Monroe predicted the # of Terabytes of disk shipped
(with rounding by me):

Year	# TBs
1997	   344,000
1998	   704,000
1999	   892,000
2000	 1,923,000
2001	 3,009,000
2002	 5,645,000

If these are anywhere near close, there are some amusing conclusions:
a) This is 1.75X/year, less than the 2X/year density idea (i.e., more,
smaller drives).  If it gets to 2X/year, that means that every
year we will install *more* disk space than the entire accumulated disk
space in the history of computing (i.e., 1, 2, 4 (>1+2), 8 (>1+2+4), ...);
we're not quite there yet: you have to replace "year" by 15 months...
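
Checking that in Python (the 1.75X/year ratio is implied by the table above;
the "15 months" is just its doubling period):

import math

# At 2X/year, each year's shipments exceed all prior shipments combined
# (2**n > 2**n - 1).  At the ~1.75X/year implied by the table, how long
# does it take shipments to double?
growth = (5_645_000 / 344_000) ** (1 / 5)          # 1997 -> 2002, ~1.75X/year
doubling_months = 12 * math.log(2) / math.log(growth)
print(f"{growth:.2f}X/year, doubling every ~{doubling_months:.0f} months")
# -> 1.75X/year, doubling every ~15 months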

b) Each year, the price of a disk of a given form factor drops a little,
and the capacity goes up.  If you buy the same number of disks each year
going forward, and they double every year for a while, then:

Year	Relative Size	% of space in year N
N	16		~52%
N-1	8		~26%
N-2	4		~13%
N-3	2		~06%
N-4	1		~03%

Meaning, at least in terms of capacity provided (as opposed to seeks/sec),
any disk more than 2-3 years old is almost useless...
[I'm reminded of still having, from years ago, a 20MB Mac disk that cost
$1000 at the time ... a fine deal.]

c) Of course, bandwidth doesn't go up as fast as capacity,
and latency ... sigh....  I figure, within 2-3 years, comparing
average random access times (average seek plus 1/2 rotation),
and peak execution rates (issue rate * MHz), a random disk access =
20-30M (peak) instruction issues, meaning that you can execute a lot of
code to avoid a disk access.
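
One way that 20-30M figure can come out (a Python sketch; the drive and CPU
numbers below are illustrative guesses for 2001-2002 parts, not quoted specs):

# Peak instruction issues per random disk access.
seek_ms     = 5.5                       # average seek (assumed)
rpm         = 10_000
half_rot_ms = 0.5 * 60_000 / rpm        # 3 ms for half a rotation
access_ms   = seek_ms + half_rot_ms     # ~8.5 ms per random access

issue_width = 4                         # instructions/cycle, peak (assumed)
clock_mhz   = 750                       # (assumed)
issues = access_ms / 1000 * issue_width * clock_mhz * 1e6
print(f"~{issues / 1e6:.0f}M peak instruction issues per disk access")  # ~26M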



"Money can buy bandwidth, but latency is forever".
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-933-4392
USPS:   Silicon Graphics/Cray Research 40U-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389
