Public comments are a very important part of the OGF document approval process. Through public comments, documents are given scrutiny by people with a wide range of expertise and interests. Ideally, an OGF document will be self-contained, relying only on the other documents and standards it cites to be clear and useful. Public comments of any type are welcomed, from small editorial comments to broader comments about the scope or merit of the proposed document. The simple act of reading a document and providing a public comment that you read it and found it suitable for publication is very useful, and provides valuable feedback to the document authors.
Thank you for making public comments on this document!
Comments for Document: GLUE Specification v. 2.0
Author(s): S. Andreozzi, S. Burke, F. Ehm, L. Field, G. Galang, B. Konya, M. Litmaath, P. Millar, J. P. Navarro
Public Comment End: 13 Aug, 2008
(Distinguished Names) with different delimiters are specified.
Section 16.3.8 defines as DNs: "X509 uses a X500 namespace represented
as several Relative Domain-Names (RDNs)
concatenated by forward-slashes". A slash-separated DN notation is also
used in the examples throughout the document.
I was not able to find such a definition in the X509 spec. As X509 stays
rather general, are you sure it implements a forward-slash
Section 17.4., in contrast, defines a DataType DN_T as a RFC 4515
RFC 4515 says "There is zero or more relative distinguished names,
I propose to either
- specify both delimiters, fix the X509 citation and state clearly in
which cases which notation is to be used, or
- decide for the RFC4515 notation (comma separated), which seems to be
(better) standardized and rewrite the examples.
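To illustrate the difference between the two notations, here is a minimal sketch of a conversion from the slash-separated form to the comma-separated LDAP string form. It assumes that the slash form lists the most significant RDN first, that the comma-separated form lists it last (as in the LDAP DN string representation of RFC 4514), and that no RDN value contains a slash or a character that needs escaping. The function name and the example DN are hypothetical.

```python
def slash_dn_to_comma_dn(dn: str) -> str:
    """Convert '/C=IT/O=Example/CN=host' to 'CN=host,O=Example,C=IT'.

    Naive sketch: values containing '/' or characters that require
    escaping in the comma-separated form are not handled.
    """
    if not dn.startswith("/"):
        raise ValueError("expected a DN starting with '/'")
    # Drop the empty string produced by the leading slash.
    rdns = [part for part in dn.split("/") if part]
    # Assumption: slash form is most-significant-first, comma form is reversed.
    return ",".join(reversed(rdns))


if __name__ == "__main__":
    print(slash_dn_to_comma_dn("/C=IT/O=Example/CN=host.example.org"))
```

Even this small example shows that the two forms differ in more than the delimiter, which is why the specification should state which one applies where.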
Also at the beginning of section 16.3.8, the sentence "It must start
[...]" (state ?) should be improved.
I think the first sentence referring to "specializations" could be reworded - it is ambiguous to me what the word means in this context.
Section 17.24 AppEnvState_t
For the pending removal description, isn't "is due to be removed" more appropriate than "as soon as possible"? I think perhaps giving indications of time is out of context here.
I would suggest to include in all Entities the field
OtherInfo - as it already exists in some of them - as a placeholder to publish info that does not fit in any other attribute. Free-form strings, comma-separated tags, and (name, value) pairs are all examples of valid syntax.
Our EDGeS project will have to publish information on entities of 'Desktop Grids' (Grids of Computer Scavenging), where resources are volatile, and GLUE schema 2.0 seems flexible enough and seems to have adequate placeholders for unknown data.
Chapter 5.3 Contact
I suggest to add a 'Name' (String, 0..1) property :
Even if this property will not always contain the name of a real person, it can contain the detailed name of a responsibility.
Chapter 5.6 Endpoint
Attribute 'TrustedCA' :
Typo in the description : 'issues' --> 'issued'
Chapter 5.11 Policy
- Attribute 'Rule' :
Typo in the description : 'is provide' --> 'is provided'
- Last paragraph : 'then these policy instances SHOULD be expected to be consumed independently' : Could you be more explicit ?
Chapter 6.1 ComputingService
End of the second paragraph : 'as part of the computing service' :
I suggest to add 'same' before 'computing service'.
Chapter 6.6 ExecutionEnvironment
Entity 'ExecutionEnvironment' :
Typo in the description : 'envonrment' --> 'environment'
Chapter 6.10 ToStorageService
Entity 'ToStorageService' :
This entity is not at all symmetrical to the 'ToComputingService' entity.
In order to avoid confusion, I therefore suggest to rename 'ToStorageService' as 'ToPosixStorageService'.
Chapter 7. Conceptual Model of the Storage Service
Throughout this chapter, you use both words 'capacity' and 'extent' as if they are synonyms.
- If yes, please state it clearly at the beginning of the chapter.
- If not, please explain the difference.
Chapter 8. Relationship to OGF Reference Model
- Can you provide more explanations :
For example, what is the meaning of the arrow between 'Entity' and 'GridComponent' ?
- Is it possible for you to show examples ?
Chapter 9. Security Considerations
I suggest to write, at least, that concrete data models must ensure availability and reliability of the published data. Therefore :
- Resiliency to DoS attacks is mandatory,
- Resiliency to intrusion and counterfeits is mandatory,
- Dynamic redundancy can help.
Chapter 17. Appendix B: Data Types
Is it possible to sort the enumeration types alphabetically ?
That would permit a reader in a hurry to find an enumeration type quicker.
Chapter 17.5 Capability_t
Value 'executionmanagement.candidatesetgenerator' :
Typos in the description : 'a nit of workcan' --> 'a unit of work can'
Chapter 17.9 EndpointHealthState_t
I would suggest to add the following value :
'compromised' : It was possible to check that there are security issues
Chapter 17.19 Platform_t
I suggest to be consistent with 'draft-ggf-jsdl-spec-28.doc' describing JSDL, and to use the values listed in Table 5-2 'Processor Architectures' of Chapter 5.2.1 of the JSDL document.
Chapter 17.21 OSFamily_t
I suggest to be consistent with 'draft-ggf-jsdl-spec-28.doc' describing JSDL, and to use the values listed in Table 5-4 'Operating System Types' of Chapter 5.2.3 of the JSDL document.
In order to avoid ambiguity, I suggest to rename them :
- One as 'ReferredActivity.Id'
- One as 'ReferredByActivity.Id'
RFC 2307 (http://www.ietf.org/rfc/rfc2307.txt) gives a standard mapping for entities related to TCP/IP networking and services (e.g. hostname/port/service combinations) to LDAP. Perhaps a similar model could be used in GLUE 2.0?
This mapping is already used by some common services for dynamic lookup, e.g. Apache ActiveMQ :
Sec. 6.3, p21, MinWallTime entry: "than" -> "then".
Sec. 6.4, p24, CacheTotal and CacheFree entries: "consequent" -> "subsequent".
In Sec. 6.4 for the ComputingManager entity, I have a couple of comments related to the WorkingArea* attributes:
1) On my site, we typically give normal and MPI jobs different current working directories (WorkingAreas). For the normal jobs, we create a temporary area in /var for the job. This area is removed at the end of the job. For MPI jobs, we set the working directory to the shared home directory so that all of the processes in the MPI job see the same files. For this case, are we expected to publish multiple ComputingManager entities? Probably this is really a question on whether this information is placed correctly in the overall model.
2) Actually for all jobs both the temporary area in /var and the shared home are always visible. We set the current working directory as appropriate; however, a job technically has access to both of these areas. I imagine that there may be jobs (esp. parallel ones) that would like to take advantage of both. Perhaps you should consider being able to publish multiple "WorkingAreas" for a ComputingManager with a mechanism to identify the default for a particular job.
The "Cache*" attributes imply that some caching mechanism will be made available to users. However, this brings lots of questions about who the cache is open to, who manages contention for the cache between multiple jobs, how the existence of files in the cache is published to users, etc. These two attributes provide too little information for end-users to know what to do and I question whether including them is useful.
For the "ScratchDir" attribute, the description says this is a shared area. However, the extent of that sharing isn't clear. Is it shared between all grid jobs (from the same user) or all processes in a job? For a normal job, would a dedicated area on the local disk count as a "ScratchDir"? It is important to clearly distinguish the "TmpDir", "ScratchDir", and "ApplicationDir" properties so that system administrators publish something consistent and these values are meaningful to users.
For the three attributes "TmpDir", "ScratchDir", and "ApplicationDir", the values are described as absolute paths or paths. However, all of these are very likely to be different depending on the user running a job. Are environmental variables allowed to be published for these values? If not, in cases where a unique value cannot be given, what should be published?
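As a sketch of what publishing environment variables could look like, the snippet below expands a published path against a job's environment. Whether GLUE permits such values is exactly the open question above; the path, the variable name and the function name are hypothetical.

```python
from string import Template
from typing import Mapping


def resolve_published_path(published: str, job_env: Mapping[str, str]) -> str:
    """Expand $VAR / ${VAR} references in a published path.

    Raises KeyError if the path refers to a variable that the job
    environment does not define.
    """
    return Template(published).substitute(job_env)


if __name__ == "__main__":
    # Hypothetical published value and job environment.
    print(resolve_published_path("/scratch/$USER/tmp", {"USER": "alice"}))
```

A consumer of the information system would need to know that such expansion is expected, which is one more reason for the specification to say explicitly whether it is allowed.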
The "*Dir" attributes indicate that at least three different types of storage can potentially be made available to users. There are certainly applications that will want to take advantage of the differences between those storage resources. However, this also means that users will want to specify how much space they need in those various areas. Currently, there are no attributes to actually indicate how much space is available (to a job) in those areas. Having attributes that indicate the existence of these areas without giving parameters such as the size is probably not terribly helpful to users.
The last comment raises a similar problem with Sec. 6.3 "Computing Share". There is only one attribute ("MaxDiskSpace") to specify the maximum disk space policy. If multiple types of storage areas are advertised (as in the ComputingManager), then the policy should also contain attributes corresponding to each type.
In addition, the description of MaxDiskSpace implies that the value corresponds to the "WorkingArea" of the ComputingManager. If that is the case, then explicitly saying "WorkingArea" would make it clear what the limit applies to.
Many batch systems enforce overall wall times and total CPU times; hence, those are the values that a system administrator will set when configuring her batch system. These will correspond to the "MaxTotal*Time" attributes (Sec. 6.3 Computing Share). However, what will she publish for the "Max*Time" attributes? If only normal jobs are accepted there is no problem; they are the same as "MaxTotal*Time". But if the site accepts parallel jobs with up to 100 slots, what is the correct value to publish for "Max*Time"? The actual limit depends on the number of slots requested; a single correct number cannot be published.
No doubt one can configure the batch system for per slot limits, but this is certainly not the usual case nor the most straightforward. I suspect that there will be many sites that publish incorrect values for the "per slot" attributes, diminishing the utility of these values for scheduling.
The "per slot" values are also likely to be much less interesting for users. The typical case is that one has a parallelized application and one increases the number of CPUs to find the most efficient scale of the application. With the "per slot" values, the user must recalculate the CPU and Wall limits every time she resubmits the job with a different number of slots. However, the total CPU consumed is approximately the same and the wall clock time diminishes. This means that it would be much more convenient to specify the wall and total CPU limits once and only have to change one parameter in the job description.
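A rough illustration of the slot dependence: assuming the batch system enforces only a total CPU time limit and the work is spread evenly over the slots, the effective per-slot CPU limit is the total divided by the number of slots requested. The limit value and the function name below are hypothetical.

```python
def per_slot_cpu_limit(total_cpu_limit_s: int, slots: int) -> float:
    """Effective per-slot CPU limit implied by a total CPU limit.

    Assumes the CPU time is consumed evenly across all slots.
    """
    if slots < 1:
        raise ValueError("a job needs at least one slot")
    return total_cpu_limit_s / slots


if __name__ == "__main__":
    TOTAL_CPU_LIMIT_S = 360_000  # hypothetical site-wide total CPU limit
    for slots in (1, 10, 100):
        print(slots, per_slot_cpu_limit(TOTAL_CPU_LIMIT_S, slots))
```

The per-slot figure changes with every slot count, so no single number can be published for it, whereas the total stays fixed.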
Overall, I would suggest revisiting the decision to use "per slot" values. For users and for system administrators the "total" values are likely to be more consistent and useful.
There should be a clearly defined distinction in that type tree between middleware and grid organisations to avoid the usage of different strings for the same service by different grids.
For Globus, e.g., org.globus.ws-gram could possibly be better than org.teragrid.ws-gram, because the former notation would be consistent with the definition of org.glite.wms.
It could also be an improvement to define the second element as the name of the service implementation provider (like globus-alliance, EGEE, etc.)
There are quite a lot of typos to eliminate - and even words that a spell checker should pick up.
In the introduction you refer to a conceptual model but then in section 3 you refer to an information provider - this does not sound conceptual.
The last paragraph of section does not add anything. Are you allowed to add and delete from GLUE and still call it GLUE?
In chapter 5 the terminology gets confused. You use the term entity (as in ER I presume) where you have previously used class and you also have an entity called "entity". I would change, for example, at the top of page 7 to read "This entity is the root class from which all the GLUE classes inherit ...". Your UML diagram then has classes and one class is called Entity.
Entity - creation time - is this the creation time of this description of the entity or of its real world counterpart? In either case it cannot be optional - if we ignore theological debates - everything was created at some time. This is supposed to be a conceptual model as it says in the introduction. I see, reading ahead a bit, in Appendix A that you explain about place holders for unknown data. Again this is not conceptual unless some information is inherently unknowable. All of appendix A should be in the particular bindings. For example if it is relational and you don't know the value then you can use NULL. It might be the responsibility of a specific profile to define what is actually meant by missing information - e.g. "Don't know", "not applicable", "I am not going to tell you" etc.
Location - latitude and longitude - this cannot be optional for a geographical location. On the contrary there are many geographical locations that have no human readable name.
AdminDomain - Distributed - I don't like the definition very much. More of a problem is implementing the optional status. Many information systems only have 2 values for booleans.
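The problem with optional booleans can be sketched as follows, assuming a binding that can only store true or false. The function name and the choice of default are hypothetical.

```python
from typing import Optional


def to_two_valued(distributed: Optional[bool], default: bool = False) -> bool:
    """Collapse an optional boolean into a two-valued one.

    The 'not published' case (None) is replaced by the default, so it
    can no longer be told apart from an explicitly published value.
    """
    return default if distributed is None else distributed


if __name__ == "__main__":
    # 'Not published' and 'published as False' become indistinguishable.
    print(to_two_valued(None) == to_two_valued(False))
```

A binding of this kind needs an explicit rule for the unpublished case, otherwise consumers will interpret it inconsistently.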
I suggest that you are very clear about what this document is, how it relates to the bindings and whether or not you are going to use profiles to define what information should actually be published and how to interpret missing/null information.
As I watch ATLAS reco jobs being butchered by various Batch Systems, this is the
quantity I'm most interested in.