Open Grid Forum

OGF Public Comments

Public comments are a very important part of the OGF document approval process. Through public comments, documents are given scrutiny by people with a wide range of expertise and interests. Ideally, an OGF document will be self-contained, relying only on the other documents and standards it cites to be clear and useful. Public comments of any type are welcomed, from small editorial comments to broader comments about the scope or merit of the proposed document. The simple act of reading a document and providing a public comment stating that you read it and found it suitable for publication is very useful, and provides valuable feedback to the document authors.

Thank you for making public comments on this document!


Comments for Document: Using Clouds to Provide Grids Higher-Levels of Abstraction and Explicit Support for Usage Modes
Author(s): S. Jha, A. Merzky, G. Fox
Type: INFO
Area: -
Group: -
Public Comment End: 1 Sep, 2008

To make anonymous comments, please use 'anonymous' as the username and 'guest' as the password.


Comments:


Posted by: lee 2008-09-03 11:21:56
Achievement of Viable Simplicity?
This is a very interesting paper. I find that I don't have specific
comments about the structure of the paper, or about individual concepts in the paper,
but rather I have a bunch of meta-questions or observations.

How do "simplicity" and "affinity" relate?

I think some grids exhibit affinity in the types of applications they support
(data grids and compute grids) but the simplicity of an API that presents a
useful model of resources that are for all intents and purposes virtualized on
the back-end seems to presuppose (or demand) an affinity in that useful model.

Clearly an API represents some sort of abstraction of the underlying semantics
of the model "the system" is supposed to implement.

Is the real simplification that has been achieved here that clouds+virtualization
allows the issue of "job scheduling" on specific hardware platforms to be cast
into an issue of "resource provisioning"? By statically scheduling virtual nodes
or a cluster, the end-user can run jobs as usual. Virtualization (and perhaps
more homogeneity on the cloud back-end) means that reliability and fail-over
can be better.

Does virtualization enable simple, minimal, but useful APIs? Said in another
way, does virtualization allow grungy detail to be successfully hidden in the
back-end and not exposed to the end-user?

Another perspective on this is perhaps the notion of "the achievement of viable simplicity".
Not only are the APIs simple, they are useful for a critical mass in the marketplace.

Where/what are the sustainable markets for grids and clouds? Clearly clouds
have gained market share as a provisioning mechanism. But something like
"frameworks" is still needed to make distributed systems easier to build and manage,
even if provisioned by clouds. Could the traditional grid concept be viewed
as essentially a cloud framework? Is this essentially one of the Usage Models
or Modes?

Very minor comment:

It seems like there are figure labels missing from Figure 1 ("Compute Resource" perhaps)
and Figure 2 ("Network Resource" perhaps).


Posted by: anonymous 2008-09-10 09:22:49
Comments by Ian Foster
The authors of this document make several assertions with which I take exception:

1) "There is a level of agreement that computational Grids have not been able to deliver on the promise of better applications and usage scenarios."

It is fascinating to watch the Gartner hype curve in action, if sad to see people stuck in the trough of despondency. But the fact is, fortunately, that there are substantial grid projects and applications that are having substantial success. Ones that come immediately to mind are the Earth System Grid, cancer Biomedical Informatics Grid, and the LIGO Scientific Collaboratory, but as it is today that the LHC was switched on, we should also recall the remarkable successes of the LHC Computing Grid. At a different level, Globus people will be happy to talk about the millions of files moved via GridFTP every day, and Miron Livny will be happy to talk at length about how many millions of CPU hours are delivered every day via Condor.

2) To address this purported lack of success, "there is a need to expose less detail and provide functionality in a simplified way. If there is a lesson to be learned from Grids it is that the abstractions that Grids expose – to the end-user, to the deployers and to application developers – are inappropriate and they need to be higher level."

No evidence is provided for this assertion that complex interfaces are the reason for the difficulties people have with grids. I argue that the issues are more complex.

First, the interfaces themselves are not, in my view, a significant issue. We can argue whether we prefer REST or Web Services, or say Nimbus (a grid virtualization interface) or EC2 (a cloud virtualization interface), but the differences among these alternatives are not great.

On the other hand, the economic systems that apply in the two cases are extremely different:

* Amazon services are designed to support the masses, they have no political constraints on who they can provide service to, and their charging model provides strong return to scale; thus, Amazon can focus on, and succeed in providing, modest-scale, reliable, on-demand service to many.

* TeraGrid (to use a US example) is designed to support a small number of extreme computing users, with a negative return to scale (the more users, the more work for fixed budget); thus, they are not motivated to provide virtualization solutions or to operate highly reliable remote access interfaces.

The implications of these different foci for users are tremendous. On EC2, I give my credit card and start a VM--a few seconds. On TeraGrid, I request an allocation (which may not be granted!), get an account, submit a request to run a job (they won't allow me to start a VM), wait in the queue--a many-week process. Furthermore, I sometimes find that the remote access interfaces fail because keeping them running is not a high priority.

This alternative perspective is, I think, more revealing about the sources of the differences and the ways we might address them. If we want on-demand, high-quality compute and storage services, then we need either to create an economic system in which academic providers are motivated to provide such services, or decide to outsource to industry.

The importance of higher-level interfaces is a separate issue. Yes, tools like Hadoop and Swift for data analysis, Introduce for service authoring, Taverna for service composition are important and necessary. Yes, we should be hoping to leverage and influence work done in the far larger corporate market to our advantage. (A focus of the upcoming CCA workshop: www.cca08.org.)

3) "Grids as currently designed and implemented are difficult to interoperate." The authors make a big deal of this point, but it is not clear to what purpose.

It is true that interoperation is not automatic. [If only everyone used Globus software, then all would be well :) --although of course the policy issues would remain]. But I am not sure that this is a significant problem for users, or hard to achieve when it is needed. E.g., the caBIG team recently demonstrated a gateway to TeraGrid. The LHC Computing Grid integrates resources worldwide. Etc. Most users never ask about interoperability, in my experience.


Posted by: gbnewby 2008-09-11 13:43:38
Comment received via email
Via email from
Dave Berry
Technology Lead, Grid Computing Now!
National e-Science Centre
15 South College Street, Edinburgh, EH8 9AA
+44 131 651 4039 www.gridcomputingnow.org

The OGF should not publish this document in its current form.

The authors set out to investigate the usability differences between existing Grid and Cloud systems from the point of view of an application developer. This is a commendable goal. The authors introduce the notions of Usage Mode and Affinity, which seem to be useful. However, their presentation is profoundly flawed by their generalisations about grids and clouds.

The root of the problem is that there is no generally agreed definition of grid or cloud. When the term "grid" was the marketing buzzword du jour, all sorts of systems were called grid. Even if we exclude clusters, there is clearly room for desktop grids (e.g. Condor, Digipede), enterprise grids (e.g. eBay) and inter-enterprise or scientific grids (e.g. EGEE, TeraGrid). Each of these would generate a different comparison with a Cloud system such as Amazon EC2. For example, an enterprise grid might underpin a cloud offering, whereas an inter-enterprise grid might link one or more "Cloud" offerings, possibly combined with other resources. The discussion of the "fit" between clouds and grids in this paper seems to jump between the two. On the one hand, the grid it mentions most often is the TeraGrid, which operates on an inter-enterprise model. On the other, it suggests that grids might underpin Cloud offerings, whereas current commercial cloud offerings exist primarily on enterprise grids.

The paper further confuses implementation technologies with fundamental concepts. It seems to suggest that grids are necessarily built on the SOAP/WS-* stack. Certainly this is the approach taken by OGSA and (partially) by TeraGrid and EGEE. On the other hand, Condor could not be further from this model.

I do believe the paper is right to consider these differences from the point of view of the application developer. It may be that an inter-enterprise grid could be considerably improved by offering a restricted API for each usage mode. I share the authors' preference for simple, task-directed interfaces.

In my opinion, the authors would do better to drop the generalisations about grids and clouds. Instead, they should pick a certain few systems - e.g. Condor, EGEE, TeraGrid, Amazon EC2 and Flexiscale. They should contrast the developer APIs of these systems, in order to identify their usage modes and affinities. This may well lead to a conclusion that the cloud offerings are better designed and easier to use, but I hope that this conclusion will be based on a rigorous examination of actual systems rather than the frankly dubious generalisations of the current draft.

It is perhaps worth mentioning that there is no single definition of Cloud. It seems that several cloud offerings are based on the notion of submitting a virtual machine to the system, as opposed to the grid notion of submitting an application. If this is the case, then it seems worth highlighting. Particularly in the commercial field, an infrastructure has to support transaction processing rather than just batch applications; this seems to be a major distinction between current grid and cloud systems. Perhaps a cloud is analogous to a VM version of a cluster job submission system, rather than anything more complex?




Posted by: merzky 2008-09-19 14:39:28
Answer to Comments by Craig Lee
Dear Craig,

Below are some answers to the questions you raised, along with the
actions taken to address these points in the document.


> How do "simplicity" and "affinity" relate?

Simplicity and affinity don't relate directly per se. It is more that
the _focus_ is on a _limited set_ of usage modes, while exposing
affinities allows a certain simplicity of user interfaces. Were Clouds
to expose a gazillion usage modes at the same time, with a concomitant
increase in the complexity of the interface, simplicity of use would
lose out again.

==> ACTION: We added some clarifying statements.


> I think some grids exhibit affinity in the types of applications
> they support (data grids and compute grids) but the simplicity of an
> API that presents a useful model of resources that are for all
> intents and purposes virtualized on the back-end seems to presuppose
> (or demand) an affinity in that useful model.

Yes, exactly: narrow Grids do indeed focus on affinities, but what
they are usually missing are the application level abstractions (API,
or application framework etc.) to map the respective usage modes to
these affinities.


> Clearly an API represents some sort of abstraction of the underlying
> semantics of the model "the system" is supposed to implement.
>
> Is the real simplification that has been achieved here that
> clouds+virtualization allows the issue of "job scheduling" on
> specific hardware platforms to be cast into an issue of "resource
> provisioning"? By statically scheduling virtual nodes or a cluster,
> the end-user can run jobs as usual. Virtualization (and perhaps more
> homogeneity on the cloud back-end) means that reliability and
> fail-over can be better.
>
> Does virtualization enable simple, minimal, but useful APIs? Said in
> another way, does virtualization allow grungy detail to be
> successfully hidden in the back-end and not exposed to the end-user?

It is not obvious that virtualization is a Cloud-only property.
Keahey and Foster showed the usefulness and feasibility of
provisioning VMs on Globus-based Grids. Also, we could envision a
Grid which uses Condor and allows job execution w/o virtualization
(that assumes a certain amount of homogeneity of Cloud resources
though). E.g., is the Amazon DB message queuing Cloud really
virtualized, or is it just providing some abstraction? We may be
splitting hairs here though ;-)

==> ACTION: we made minor changes in the document, to clarify that
we do not discuss virtualization as a Cloud property.


> Another perspective on this is perhaps the notion of "the
> achievement of viable simplicity". Not only are the APIs simple,
> they are useful for a critical mass in the marketplace.
>
> Where/what are the sustainable markets for grids and clouds? Clearly
> clouds have gained market share as a provisioning mechanism. But
> something like "frameworks" is still needed to make distributed
> systems easier to build and manage, even if provisioned by clouds.
> Could the traditional grid concept be viewed as essentially a cloud
> framework? Is this essentially one of the Usage Models or Modes?

We are not sure we are fit to answer the market question. Yes, we
believe that traditional Grids can be seen as _a_ Cloud
framework. What is mostly missing:
- higher level abstractions, for specific usage modes
- SLAs
- Cloud business model

==> ACTION: We changed the text to clarify the limited scope of our
Cloud definition.


> Very minor comment:
>
> It seems like there are figure labels missing from Figure 1
> ("Compute Resource" perhaps) and Figure 2 ("Network Resource"
> perhaps).


==> ACTION: That has been fixed.

Craig, many thanks for your comment, and for our discussions during
the last week!


Posted by: merzky 2008-09-19 14:56:56
Answer to Comments by Ian Foster (posted by anonymous)
> Posted by: anonymous 2008-09-10 09:22:49
> Comments by Ian Foster
>
> The authors of this document make several assertions with
> which I take exception:
>
> 1) "There is a level of agreement that computational Grids have not
> been able to deliver on the promise of better applications and usage
> scenarios."
>
> It is fascinating to watch the Gartner hype curve in action, if sad
> to see people stuck in the trough of despondency. But the fact is,
> fortunately, that there are substantial grid projects and
> applications that are having substantial success. Ones that come
> immediately to mind are the Earth System Grid, cancer Biomedical
> Informatics Grid, and the LIGO Scientific Collaboratory, but as it
> is today that the LHC was switched on, we should also recall the
> remarkable successes of the LHC Computing Grid. At a different
> level, Globus people will be happy to talk about the millions of
> files moved via GridFTP every day, and Miron Livny will be happy to
> talk at length about how many millions of CPU hours are delivered
> every day via Condor.

Yes, there are wonderful examples of successful Grids.

It is critical to note that all the successful Grid examples provided
are, by our definition, *narrow* Grids -- which, as we state several
times, are arguably the only Grids fit for purpose. Thanks for helping
make our point!

But a few positive counter examples do not change the overall state of
despair, dysfunction and disrepute.

We are not arguing that Grids as a concept have failed. We are saying
that at some point along the evolution, Grids have become difficult to
use as distributed systems. We agree the reasons are complex and we
are the first to warn against over-simplified reasons. We will add a
more nuanced discussion.

==> Action: we will clarify that we don't argue that Grids are not
useful, but that they have fallen short of their pervasive/ubiquitous
vision.


> 2) To address this purported lack of success, "there is a need to
> expose less detail and provide functionality in a simplified way. If
> there is a lesson to be learned from Grids it is that the
> abstractions that Grids expose – to the end-user, to the deployers
> and to application developers – are inappropriate and they need to
> be higher level."
>
> No evidence is provided for this assertion that complex interfaces
> are the reason for the difficulties people have with grids. I argue
> that the issues are more complex.

True, we don't provide much evidence, other than our documented
experiences talking to the community about what they feel are some of
the challenges. But yes, we should address this, and be more careful
to state that it is only one of the reasons. Thanks for these comments. It is
however not the topic of the paper to evaluate the performance of
Grids -- it is merely part of the motivation for our approach, and is
an observation we make.

==> Action: we will provide some evidence for the need for simpler
interfaces, to better motivate our discussion.


> First, the interfaces themselves are not, in my view, a significant
> issue. We can argue whether we prefer REST or Web Services, or say
> Nimbus (a grid virtualization interface) or EC2 (a cloud
> virtualization interface), but the differences among these
> alternatives are not great.

It is not about the technology (REST vs. WS vs. Nimbus etc.), it is
about the level of detail being exposed. Example: Globus (especially
chosen for you :-)

- go to the Globus 4.0.2 WS API documentation, Java version.
- pick 4 sections out of 52(!) (e.g. those ending in '_client_java')

That yields a total of 30 classes, with a total of ~170 methods
(not counting constructors, inherited methods etc.). Assuming that
the pick is representative, for 52 sections one would see >2,000
(!) calls.
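
The extrapolation above can be reproduced as a rough sketch. It uses only the figures quoted in this comment (4 sampled sections, 52 sections in total, roughly 170 methods in the sample) and assumes, as the comment does, that the sample is representative. The function and constant names are illustrative only.

```python
# Figures quoted in the comment above; nothing here is measured independently.
SAMPLED_SECTIONS = 4
TOTAL_SECTIONS = 52
METHODS_IN_SAMPLE = 170  # approximate count, per the comment


def extrapolate_methods(methods_in_sample, sampled_sections, total_sections):
    """Linear extrapolation, assuming the sampled sections are representative."""
    per_section = methods_in_sample / sampled_sections
    return per_section * total_sections


estimate = extrapolate_methods(METHODS_IN_SAMPLE, SAMPLED_SECTIONS, TOTAL_SECTIONS)
print(f"about {estimate:.0f} methods")  # prints "about 2210 methods"
```

The result is only as good as the representativeness assumption; it is an order-of-magnitude estimate, not a count.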

I know _I_ would have trouble remembering after one day, that

org.globus.exec.client.GlobusRun.kill()

takes a string as argument, and

org.globus.exec.client.GlobusRun.terminateJob()

takes a GramJob instance. Or to remember the name of this
function:

GramJob.populateStagingDescriptionEndpoints()

Do I need to call that method? When? Why? Does that call work
against Globus-2.x? Globus-4.0? Globus-4.2? (GRAM versions
changed w/o being backward compatible).

And, just for fun, from another package:

setManagedJobPortTypePortWSDDServiceName (java.lang.String)

:-))


Let's compare that to Amazon's cloud API: it has 34 API calls
(there are no inherited calls). About half of the calls are for
setting descriptions, security settings, etc. So, there remain
fewer than 20 calls for running and managing a VM instance.

Note that we are not arguing that Globus is not fit for its purpose -
it probably is. We argue that if you take a Globus-based Grid, and
implement something like Amazon's EC2 API on top of it, along with
the SLAs, usage policies, and business model, you turn that Grid into
a Cloud, as the _exposed_ semantics would be limited, and it would
focus on a much smaller set of usage modes (i.e. for specific
application classes), which would then be very easy (i.e. trivial) to
use.
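
The idea of layering a narrow interface over a wide one can be sketched as a facade. Everything in this sketch is hypothetical: `WideGridClient` is an in-memory stand-in, not the Globus API, and `NarrowComputeCloud` is not Amazon's EC2 API. It only illustrates how limiting the _exposed_ semantics to one usage mode leaves a handful of calls.

```python
class WideGridClient:
    """Hypothetical stand-in for a Grid client that exposes many low-level knobs."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 0

    def submit(self, executable, queue, staging_endpoints, credential, options):
        # A real system would stage files, authenticate, and schedule here.
        job_id = f"job-{self._next_id}"
        self._next_id += 1
        self._jobs[job_id] = {"executable": executable, "queue": queue,
                              "state": "running"}
        return job_id

    def cancel(self, job_id, signal):
        self._jobs[job_id]["state"] = "terminated"

    def state(self, job_id):
        return self._jobs[job_id]["state"]


class NarrowComputeCloud:
    """Facade exposing three calls for a single usage mode: run an image."""

    def __init__(self, grid):
        self._grid = grid

    def run_instance(self, image):
        # Provider-chosen defaults hide the detail the wide API exposes.
        return self._grid.submit(executable=image, queue="default",
                                 staging_endpoints=[],
                                 credential="provider-managed", options={})

    def terminate_instance(self, instance_id):
        self._grid.cancel(instance_id, signal="TERM")

    def describe_instance(self, instance_id):
        return self._grid.state(instance_id)


cloud = NarrowComputeCloud(WideGridClient())
vm = cloud.run_instance("my-image")
print(cloud.describe_instance(vm))  # running
cloud.terminate_instance(vm)
print(cloud.describe_instance(vm))  # terminated
```

The trade-off is the one discussed in this thread: the facade is easy to use for its one usage mode, and anything outside that mode is simply not expressible through it.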

Rinse and repeat for other application classes, i.e. for clouds
with other usage modes (Amazon's Storage, Queuing, DB cloud
etc.)

BTW: We really value the work that Kate Keahey, you and others
have been doing in that direction, on Nimbus!

> [...] interoperate." The authors make a big deal of this point, but it is
> not clear to what purpose.
>
> It is true that interoperation is not automatic. [If only everyone
> used Globus software, then all would be well :)

:-D

> -- although of course the policy issues would remain]. But
> I am not sure that this is a significant problem for users, or
> hard to achieve when it is needed. E.g., the caBIG team
> recently demonstrated a gateway to TeraGrid. The LHC
> Computing Grid integrates resources worldwide. Etc. Most users
> never ask about interoperability, in my experience.

Interop has two dimensions in our opinion: system interoperability (a
Grid can utilize resources of another grid), and application
interoperability (an application written for Grid A can also run on
grid B, w/o major changes). We are mostly concerned with the latter
("As an application developer/user, what do I care if it is a Grid, a
Cloud, or a Grid-of-Clouds, Clouds-of-Grids...."), and although we do
mention "application-level interoperability", we should make that
clearer in the paper. Thanks for pointing that out.

==> ACTION: we will make clear that we talk about app interop, mostly,
and that GIN experiences show that system level interop is possible,
although it is not simple.
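
Application-level interoperability, as distinguished above from system-level interoperability, can be illustrated with a minimal sketch. All names here (`JobBackend`, `GridA`, `GridB`, `application`) are hypothetical and do not correspond to any real middleware; the point is only that the application is written once against a common interface and runs unchanged on either backend.

```python
from abc import ABC, abstractmethod


class JobBackend(ABC):
    """Hypothetical common interface the application is written against."""

    @abstractmethod
    def run(self, command: str) -> str:
        ...


class GridA(JobBackend):
    def run(self, command: str) -> str:
        # A real adaptor would translate to Grid A's native submission call.
        return f"GridA ran: {command}"


class GridB(JobBackend):
    def run(self, command: str) -> str:
        # A real adaptor would translate to Grid B's native submission call.
        return f"GridB ran: {command}"


def application(backend: JobBackend) -> str:
    """The application never names a specific Grid or Cloud."""
    return backend.run("simulate --steps 10")


for backend in (GridA(), GridB()):
    print(application(backend))
```

System-level interoperability would be a separate concern: whether `GridA` can itself use resources belonging to `GridB`.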


Ian, many thanks for your comments! We sincerely appreciate that you
found the time to both read the document, and to provide your feedback!


Posted by: merzky 2008-09-19 15:06:55
Answer to Comments by Dave Berry

> The OGF should not publish this document in its current form.
>
> The authors set out to investigate the usability differences between
> existing Grid and Cloud systems from the point of view of an
> application developer. This is a commendable goal. The authors
> introduce the notions of Usage Mode and Affinity, which seem to be
> useful. However, their presentation is profoundly flawed by their
> generalisations about grids and clouds.
>
> The root of the problem is that there is no generally agreed
> definition of grid or cloud.

We have tried to stay away from presenting our take on Clouds as
a crisp definition. That may indeed be a mistake (we got other
feedback in that respect, too). We will add a definition for
Grids and Clouds which (at the least) will hold valid within the
terminology framework of this paper, and potentially beyond --
we are not assuming though that this definition will be the one
and only Cloud definition. The fact that the latter one does
not exist at the moment should, however, not stop us from
discussing the concepts inherent to Clouds.

==> Action: we will explicitly add our cloud definition


> When the term "grid" was the marketing
> buzzword du jour, all sorts of systems were called grid. Even if we
> exclude clusters, there is clearly room for desktop grids (e.g.
> Condor, Digipede), enterprise grids (e.g. eBay) and inter-enterprise
> or scientific grids (e.g. EGEE, TeraGrid). Each of these would
> generate a different comparison with a Cloud system such as Amazon

Actually, in our framework they would not, or at least not to
the extent one would think (although we certainly agree with
your buzzword observations! :-). For us, a Grid and a Cloud
simply differ in that a Grid tends to expose a maximal set of
the available system semantics, whereas a Cloud tends to expose
a minimal set of the available system semantics, just enough to
support the usage mode specific to that cloud. That definition
seems to be valid for a very large and diverse set of Grid (and
Grid-like) systems.

==> Action: we will try to make that point more clear - it is central
to the document. The introduction of the definitions, see above, will
certainly help to that end.


> EC2. For example, an enterprise grid might underpin a cloud
> offering, whereas an inter-enterprise grid might link one or more
> "Cloud" offerings, possibly combined with other resources. The
> discussion of the "fit" between clouds and grid in this paper seems
> to jump between the two. On the one hand, the grid it mentions most
> often is the TeraGrid, which operates on an inter-enterprise model.
> On the other, it suggests that grids might underpin Cloud offerings,
> whereas current commercial cloud offerings exist primarily on
> enterprise grids.

I had trouble parsing the last part:

'it suggests that grids [...] underpin Cloud offerings,
whereas [...] cloud[s] exist [...] on [...] grids.'

But I think you are saying that the layered relation between
grids and Clouds is not set in stone. We certainly agree from
an implementation perspective.


> The paper further confuses implementation technologies with
> fundamental concepts. It seems to suggest that grids are
> necessarily built on the SOAP/WS-* stack. Certainly this is the
> approach taken by OGSA and (partially) by TeraGrid and EGEE. On the
> other hand, Condor could not be further from this model.

Again, we absolutely agree, and we could not be further from the
standpoint that Grids are SOAP/WS bound! It seems we are
too fuzzy in distinguishing between what we present as concepts and
what we present as examples: our concepts (see definitions in
Section II) do not mention _any_ specific technology. We will
try to make that distinction clearer.

==> ACTION: we will state explicitly that Globus/WS Grid thingies are
_examples_ only.


> I do believe the paper is right to consider these differences from
> the point of view of the application developer. It may be that an
> inter-enterprise grid could be considerably improved by offering a
> restricted API for each usage mode. I share the authors' preference
> for simple, task-directed interfaces.

We are happy to hear that.


> In my opinion, the authors would do better to drop the
> generalisations about grids and clouds. Instead, they should pick a
> certain few systems - e.g. Condor, EGEE, TeraGrid, Amazon EC2 and
> Flexiscale. They should contrast the developer APIs of these
> systems, in order to identify their usage modes and affinities. This
> may well lead to a conclusion that the cloud offerings are better
> designed and easier to use, but I hope that this conclusion will be
> based on a rigorous examination of actual systems rather than the
> frankly dubious generalisations of the current draft.

That sounds like a valid approach, which we did, on purpose, not
take, for the following reasons:

- as you stated before, it is disputable which systems actually
qualify as a cloud. The various circulating definitions
are rather contradictory in that respect, and for some, S3
would for example not be a cloud.

- it is actually the intent of the paper to provide a certain
formalism, and thus an abstraction from a certain
viewpoint, in order to come to a better defined terminology
when discussing Grid and Cloud concepts. A case based
discussion as you proposed would not further that goal.
OTOH, given your comments, we seem to have missed that
anyway :-/

==> ACTION: clarify why we follow an abstract, not an
empirical/phenomenological approach.


> It is perhaps worth mentioning that there is no single definition of
> Cloud. It seems that several cloud offerings are based on the notion
> of submitting a virtual machine to the system, as opposed to the
> grid notion of submitting an application. If this is the case, then
> it seems worth highlighting.

We don't share that notion. In particular, that definition
would immediately break on Storage clouds, e.g. S3, see above.


> Particularly in the commercial field, an infrastructure has to
> support transaction processing rather than just batch
> applications; this seems to be a major distinction between
> current grid and cloud systems.

It is not clear to us if this is a distinction between Grid and
Cloud systems, or between commercial and academic systems.


> Perhaps a cloud is analogous to a VM version of a cluster job
> submission system, rather than anything more complex?

Again, we think that a Cloud definition solely focusing on the
job management and virtualization properties of Cloud systems is
bound to fail for Clouds that are not job-oriented. We certainly agree
that for compute clouds (or, in our language, compute affine
clouds) virtualization seems to be the dominant enabling
technology, but for other cloud affinities, e.g. for storage
affine clouds, there seems to be no clear technology winner at the
moment. That observation again is part of our motivation to
attempt a system based description/definition of the Cloud
phenomenon.


Dave, many thanks for your valuable comments. We will modify the
paper accordingly, and, given your initial proposal (not
to publish the document as is), would be very happy to get your
feedback again on the modified version! We would of course also be
delighted to discuss the concepts we describe therein, in this
comment thread, or by mail/phone etc.
