Last modified: Jan 24, 2012 by Calvat

HPSS: starting to use HPSS


 1   Introduction: what is HPSS ?
 2   What types of file can be stored in HPSS ?
 3   Can HPSS be used for archiving files ?
 4   Why must a user be defined to HPSS ?
 5   How can a user be defined to HPSS ?
 6   How can I access HPSS ?
 7   How and why should I declare my HPSS jobs to the SGE batch system ?
 8   Where can I create files in HPSS ?
 9   How are HPSS files named ?
 10   What server name should I use ?
 11   What is the use of a COS (Class Of Service) ?
 12   How do I set a COS ?
    12.1   Creation by rfcp
    12.2   Creation by API
    12.3   Creation by bbftp
 13   Where is a file in HPSS really located ?
 14   How are authorizations managed ?
 15   How do I specify authorizations on a directory ?
 16   How do I specify authorizations on a file ?
 17   Can I access HPSS from another computing center ?
 18   Can I access data in HPSS directly by using UNIX commands ?
 19   How can I use a symbolic link in order to access a file in HPSS ?
 20   Does the command rfcd exist ?
 21   How do I find a file in HPSS ?
 22   What should I do if I have a problem ?

1. Introduction: what is HPSS ?

HPSS (High Performance Storage System) is just what its name implies: a system capable of storing very large quantities of data with good performance for both storage and recovery (write and read). However, if certain conditions of use are not respected, its performance is seriously degraded.

2. What types of file can be stored in HPSS ?

The performance of HPSS is seriously degraded by the storage of small files, where "small" generally means a file whose size is smaller than 100 MB. Nevertheless, the storage of smaller files is allowed in certain cases. If you need to store smaller files, contact us. In particular, files smaller than 20 MB must never be stored without a preliminary study carried out by the experiment and the Computing Center.
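
The two thresholds above can be checked before a transfer. The following is a minimal sketch, not a Computing Center tool: the function name classify_hpss_size is hypothetical, and counting 1 MB as 1000*1000 bytes is an assumption.

```shell
# Thresholds from the text above; 1 MB = 1000*1000 bytes is an assumption.
HPSS_SMALL_BYTES=$((100 * 1000 * 1000))
HPSS_FORBIDDEN_BYTES=$((20 * 1000 * 1000))

# classify_hpss_size SIZE_IN_BYTES
# Prints "too_small" (below 20 MB, needs a preliminary study),
# "small" (below 100 MB, contact the Computing Center) or "ok".
classify_hpss_size() {
    size=$1
    if [ "$size" -lt "$HPSS_FORBIDDEN_BYTES" ]; then
        echo too_small
    elif [ "$size" -lt "$HPSS_SMALL_BYTES" ]; then
        echo small
    else
        echo ok
    fi
}

# Example with a 50 MB file size
classify_hpss_size 50000000
```

Grouping many small files into one larger file before storing it is the usual way to end up in the "ok" case.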

3. Can HPSS be used for archiving files ?

HPSS is not an archival system. In particular, it stores only a single copy of a file, whereas the Computing Center's archival system, elliot, stores two.

4. Why must a user be defined to HPSS ?

If a user is not defined in HPSS, the behavior of RFIO commands is undefined.

5. How can a user be defined to HPSS ?

The user must ask the "czar" of their experiment or laboratory to request it. To find out the username of your czar, use the command:
> anq czar

6. How can I access HPSS ?

By RFIO, i.e., an API and a set of commands (such as rfcp, rfdir, rfmkdir and rfchmod, which are used in this document). For more details, see the RFIO client documentation or the command man pages.

7. How and why should I declare my HPSS jobs to the SGE batch system ?

Batch jobs which use HPSS must be submitted with an option which declares this use to the SGE batch system monitor. In order to do this, just add the following option to the qsub command:
-l hpss=1 
which gives, for instance:
> qsub -l hpss=1 job_file_name 
During maintenance of the HPSS cell or servers, this option allows us to selectively hold back the startup of these jobs, which would in any case crash without HPSS, and so avoids having to restart them later on. Clearly, this option is in everyone's interest, the users' and the Computing Center's.

8. Where can I create files in HPSS ?

The HPSS file space has a tree structure with a root independent from that of the local file system or AFS. Normally, you can access a "HOME" directory and a "GROUP" directory. The access paths are:
/hpss/in2p3.fr/home/x/your_name
/hpss/in2p3.fr/group/your_group
where x is the first letter of your_name. For example, for the user titi of the neutrino experiment:
/hpss/in2p3.fr/home/t/titi
/hpss/in2p3.fr/group/neutrino
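
The naming rule above can be written as two small shell helpers. This is only a sketch: the function names hpss_home_path and hpss_group_path are hypothetical, and the root /hpss/in2p3.fr is the production root quoted above.

```shell
# Production root of the HPSS tree, as given in the text above.
HPSS_ROOT=/hpss/in2p3.fr

# hpss_home_path USER -> prints the HOME directory of USER in HPSS
hpss_home_path() {
    user=$1
    # x is the first letter of the user name
    first=$(printf '%s' "$user" | cut -c1)
    printf '%s/home/%s/%s\n' "$HPSS_ROOT" "$first" "$user"
}

# hpss_group_path GROUP -> prints the GROUP directory in HPSS
hpss_group_path() {
    printf '%s/group/%s\n' "$HPSS_ROOT" "$1"
}

# Example with the user and experiment used above
hpss_home_path titi
hpss_group_path neutrino
```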

9. How are HPSS files named ?

The name of an HPSS file accessed by RFIO is composed of two parts, the name of the RFIO server and the path of the file in the HPSS tree, separated by a colon. Which gives:
rfio_server:/hpss/in2p3.fr/home...
The tree structure starts with /hpss for production and with /hpsstest for the HPSS test server.

10. What server name should I use ?

You should not use the real name of a server. Instead, we provide an alias for each experiment, of the form:
cchpssmanip
where manip is usually the name of the experiment. Examples are:
cchpssd0: server for D0 experiment
cchpss0:  server for COS 0
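
Putting the alias and the path together gives the complete RFIO name of a file. The sketch below assumes that the alias of an experiment follows the cchpssmanip pattern exactly (the text only says this is usually the case); the function names and the directory /hpss/in2p3.fr/group/d0 are illustrative.

```shell
# hpss_alias EXPERIMENT -> prints the server alias built on the cchpss<manip> pattern
# (assumption: the experiment follows the usual pattern)
hpss_alias() {
    printf 'cchpss%s\n' "$1"
}

# rfio_name SERVER PATH -> prints the RFIO name SERVER:PATH
rfio_name() {
    printf '%s:%s\n' "$1" "$2"
}

# Example with the D0 alias given above; the directory is illustrative
rfio_name "$(hpss_alias d0)" /hpss/in2p3.fr/group/d0
```

Keeping the alias in one helper means that scripts never contain the real name of a server.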

11. What is the use of a COS (Class Of Service) ?

A COS allows each type of file to be managed appropriately. This optimization takes into account the file size, the type of storage medium and hierarchy (usually disk + tape), and the amount of disk space. The COS also gives access to special services such as a double copy on tape. A COS is attached to each regular file (not to directories or links). A directory may contain files belonging to different COS. An experiment should use different COS values depending on the file type (size, management). A COS can be shared by several experiments.

12. How do I set a COS ?

The COS must be defined at the moment a normal file is created. There are several ways of doing this:

12.1. Creation by rfcp

You must set the variable RFCP_HPSSCOS to the desired COS value before using rfcp.
> export RFCP_HPSSCOS=3
> rfcp titi ccmcrs10:/hpss/in2p3.fr/home/t/toto
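
The variable can also be set for a single rfcp call only, so that a COS chosen for one file does not leak into later copies. The wrapper below is a sketch; the name rfcp_with_cos is hypothetical, and the check that the COS is a plain number is an assumption about valid COS values.

```shell
# rfcp_with_cos COS SOURCE DESTINATION
# Runs rfcp with RFCP_HPSSCOS set for this one command only.
rfcp_with_cos() {
    case $1 in
        ''|*[!0-9]*)
            # Assumption: a COS identifier is a non-negative integer
            echo "rfcp_with_cos: COS must be a number, got '$1'" >&2
            return 2
            ;;
    esac
    RFCP_HPSSCOS=$1 rfcp "$2" "$3"
}

# Usage, equivalent to the two commands above:
# rfcp_with_cos 3 titi ccmcrs10:/hpss/in2p3.fr/home/t/toto
```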

12.2. Creation by API

The instruction following the open (or rfio_open) must be rfio_setcos().
int rfio_setcos(int rfd, int filesize, int cosid);
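Where rfd is the descriptor returned by the open, filesize the file size, and cosid the identifier of the desired COS. Example:
#include <fcntl.h>   /* O_CREAT, O_WRONLY */
/* also include the RFIO client header provided by your installation */

int main() {
    int cosid    = 3;
    int filesize = 0;
    /* create the file in HPSS and keep its RFIO descriptor */
    int rfd = rfio_open("ccmcrs10:/hpss/in2p3.fr/home/t/toto/test",
                        O_CREAT|O_WRONLY, 0777);
    if (rfd < 0) { rfio_perror("rfio_open"); return 1; }
    /* the COS must be set immediately after the open */
    if (rfio_setcos(rfd, filesize, cosid)) {
        rfio_perror("rfio_setcos");
    }
    return 0;
}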

12.3. Creation by bbftp

The COS can be specified by a specific sub-command.

13. Where is a file in HPSS really located ?

The actual location of a file is independent of its place in the HPSS directory tree. It depends on the COS (class of service). Two files in the same directory may be stored independently. If a COS corresponds to several levels in a hierarchy, the file may be in any level of the hierarchy. When a file is read, it is always copied ("cached") to the lowest level of the hierarchy (usually to disk). Currently, there is no way of knowing on what level of a hierarchy a file resides.

14. How are authorizations managed ?

Authorizations in HPSS are managed on a per-file basis. Through the RFIO interfaces they appear similar to UNIX file authorizations (ugo<->rwx).

15. How do I specify authorizations on a directory ?

The command rfmkdir includes an option -m, by means of which one can specify authorizations in octal. See the man page for rfmkdir.

16. How do I specify authorizations on a file ?

The command rfcp copies the authorizations of the original file. However, you must always take into account the value of umask, which is generally 022. This value forbids write access to the group. Therefore, you may want to set 'umask 002' and do a local chmod before copying the file. You can also use the rfchmod command (at the CCIN2P3 only) after a file has been created. If you use the RFIO API, the usual rules apply, so again look out for the value of umask.
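
The effect of umask can be worked out with a little octal arithmetic: the resulting mode is the requested mode with the umask bits cleared. The sketch below assumes the usual UNIX rule (mode AND NOT umask) applies, as the text suggests; the function name effective_mode is hypothetical.

```shell
# effective_mode REQUESTED_MODE UMASK
# Both arguments are octal digits without a leading 0, e.g. 666 and 022.
# Prints the octal mode left after the umask bits are cleared.
effective_mode() {
    printf '%03o\n' $(( 0$1 & ~0$2 & 0777 ))
}

# With the default umask 022, group write access is removed
effective_mode 666 022
# With umask 002, group write access is kept
effective_mode 666 002
```

The first call prints 644 and the second prints 664, which is why 'umask 002' is suggested when the group needs write access.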

17. Can I access HPSS from another computing center ?

Yes, for instance from the laboratories of the IN2P3, the DAPNIA, CERN, SLAC or FermiLab. Files can be transferred to HPSS from a remote site by using bbftp. This procedure requires the installation of a server on the remote site. For the use of bbftp, see the bbftp documentation. There is no specific read-only access mode for remote sites.

18. Can I access data in HPSS directly by using UNIX commands ?

Contrary to XTAGE practice, it is not possible to mount HPSS files via NFS or other distributed file systems. This is logical, given the volume of data that HPSS handles. Commands such as 'find' can never be used in HPSS. HPSS is only accessible via RFIO (API, specific commands, bbftp).

19. How can I use a symbolic link in order to access a file in HPSS ?

It is possible to access an HPSS file via a symbolic link, provided that the symbolic link points directly to the file. The link cannot be concatenated with the name of a file or directory. Example:
> ln -s ccmdrs10:/hpss/in2p3.fr/home/t/thibault  monrep
> rfdir monrep
drwxr-xr-x   4 thibault   ccin2p3     512 Nov 29 18:09 .
drwxr-xr-x   5 root       system      512 Nov 23 11:24 ..
drwxr-xr-x   2 thibault   ccin2p3     512 Sep 06 11:54 rep1
> rfdir monrep/rep1
monrep/rep1:  No such file or directory
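
One possible way around this limitation, which is not part of the RFIO commands, is to read the target of the link with readlink and to append the sub-path yourself. The sketch below assumes that readlink is available on your system; the function name hpss_resolve is hypothetical.

```shell
# hpss_resolve LINK [SUBPATH]
# Prints the RFIO name stored in the symbolic link LINK,
# with SUBPATH appended when it is given.
hpss_resolve() {
    target=$(readlink "$1") || return 1
    if [ -n "${2:-}" ]; then
        printf '%s/%s\n' "$target" "$2"
    else
        printf '%s\n' "$target"
    fi
}

# Usage with the link created above:
# rfdir "$(hpss_resolve monrep rep1)"
```

The RFIO command then receives a complete server:path name instead of a link followed by a directory name.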

20. Does the command rfcd exist ?

No.

21. How do I find a file in HPSS ?

Commands that search systematically for a file in the HPSS file tree are excessively slow. We therefore strongly recommend that each experiment maintain a database outside of HPSS. If, in addition, one wants to store and retrieve information on file contents or other parameters (date of run, etc.), the use of a database becomes indispensable. Reading a set of HPSS files in order to determine whether one of them contains the desired information is out of the question.
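
As an illustration of the idea only, a very small catalogue can be kept in a tab-separated text file outside HPSS. This sketch stands in for a real database: the file name hpss_catalog.tsv, the function names and the choice of columns (RFIO name, run date, description) are all assumptions.

```shell
# Location of the catalogue file, outside HPSS (hypothetical default).
CATALOG=${CATALOG:-./hpss_catalog.tsv}

# catalog_add RFIO_NAME RUN_DATE DESCRIPTION
# Appends one line per file stored in HPSS.
catalog_add() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$CATALOG"
}

# catalog_find TEXT
# Prints the RFIO names of all entries whose line contains TEXT,
# without touching HPSS at all.
catalog_find() {
    grep -F -- "$1" "$CATALOG" | cut -f1
}

# Usage:
# catalog_add cchpssd0:/hpss/in2p3.fr/group/d0/run1.dat 2011-05-02 "calibration run"
# catalog_find calibration
```

An experiment with many files would use a real database, but the principle is the same: record the name and the parameters of each file when it is written, and search the record rather than HPSS.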

22. What should I do if I have a problem ?

You may contact the Computing Center user support.