The KiloKluster: Where is Everything?

Hiram Clawson � 05 November 2004, Updated 2006-03-03 hiram@soe. ucsc. edu

Introduction

This document describes the filesystems and machine organization of the UCSC KiloKluster. There are some details that may be missing here. Corrections are certainly welcome. This document is intended for users new to the KiloKluster that are trying to find where everything is.

Some of this information is documented elsewhere. See also:

Update to this note 2006-03-03

Browser project, Getting Started

The Parasol Parallel Batch System

http://www.cbse.ucsc.edu/research/cbsecomputing

UCSC KiloKluster Filesystem Organization

The idea of the KiloKluster filesystem organization is to optimize numerous data pathways into and out of the KK.

There are four data pathways for each cluster node:

1. Local disk filesystem: /scratch

2. Individual rack NFS filesystem: /iscratch/i

3. NFS filesystems outside the KK filesystems from other fileservers

4. Local disk filesystem: /tmp

/scratch

/scratch is a dedicated local disk on each cluster node. It is 100 Gb of space. It contains data that will be commonly used by numerous processes over a period of time on the order of weeks. Typical contents include current genomes being processed, their nib files, their repeatmasker outputs for use in blastz runs, etc. /scratch is loaded with data by users by placing the data in /cluster/bluearc/scratch and then emailing a request to cluster-admin to do an rsync of /scratch. The rsync copies the contents of /cluster/bluearc/scratch to the local /scratch in each of the 512 nodes in the KK. Since this is an expensive data transfer operation, care should be taken for each request.

/scratch/i

Each of the eight cluster racks has a special root node that serves up a 500 Gb filesystem via NFS to its rack set of nodes. To the root nodes, this is a local filesystem. To the other nodes in that rack, it appears as an NFS mount as /iscratch/i. To load data into the /iscratch/i system, users must set up their data on the machine kkr1u00 in /iscratch/i. When the data is correct there, users can operate the rsync themselves by running /cluster/bin/iSync. This copies the data from kkr1u00 to kkr[2-8]u00, which is only seven copies, thus not as expensive a data transfer operation as in the case of /scratch. The data here can therefore be more volatile than /scratch. Typical contents are a second copy of genomes being processed, often the blastz query sources to target genomes on /scratch.

External filesystems

Outside the KK are numerous filesystems available via NFS from various fileservers. Currently named:

/cluster/store[1-7]
/cluster/bluearc
/cluster/home

These are mostly archives of genome sources and data processing results from genome processing. Because they are automounted, they will not appear with a df command unless they are in use. To make the automounter access them, do an ls command on them.

You can see which fileserver they come from via the df command. These sources can change over time as machines are rearranged and upgraded to balance I/O loads.

Currently the NFS servers are:

kkhome:� /cluster/home - you can not log on to kkhome
eieio: store1, store4, store5
kkstore: store2, store3
kksilo: store6, store7
/cluster/bluearc is a dedicated NFS server - you can not log on to it

/tmp

Sometimes a cluster job will have a heavy temporary file I/O requirement that does not need to come from a dedicated fileserver. In this case, the cluster job is typically a script that starts up and creates temporary directories in /tmp. It may even copy some of its source data from the other data sources described above if it is going to be reading them often during the processing. Results produced here must be copied back by the controlling script to one of the outside NFS resources, not /scratch or /iscratch/i. Beware that temporary names here must be unique for each cluster job since you can have two cluster jobs running on the same node (2 CPUs per node). Also, beware that the controlling script must clean up its tracks in /tmp. Otherwise, that directory on each machine fills with garbage.

KiloKluster machines and parasol clusters

Large cluster

441 nodes * 2X CPUs = 882 CPUs
Parasol Hub machine:� kk
Log on to kk to operate parasol and para commands
Cluster nodes are: kkr[2-8]u[0-5][1-9] and kkr[2-8]u6[0-3]
Rack 1 kkr1 machines have been removed from service

Small cluster

8 nodes * 2X CPUs = 16 CPUs
Parasol Hub machine:� kki
Log on to kki to operate parasol and para commands
Cluster nodes are: kkr[1-8]u00

Rack Nine cluster CURRENTLY OUT OF SERVICE 2005-06-01

50 nodes * 2X CPUs = 100 CPUs
Parasol Hub machine:� kk9
Log on to kk9 to operate parasol and para commands
Cluster nodes are: kkr9u01 - kkr9u50

Update 2006-03-03

We currently have three clusters, their hub control machines are:

kk  pk  kki

kk - our older cluster, approximately 500 CPUs, Pentium III 860 Mhz, 1Gb memory

pk - our newest cluster, 390 CPUs AMD Opteron's 2.0 Ghz 64 bit machines, 4 Gb mem
the preferred cluster these days since it is so much faster than kk Beware, MACHTYPE here is x86_64 and it is nice to have 64 bit binaries for all programs used although it is not necessary. These Opteron's will run existing i386 binaries just fine, although with 32 bit memory limitations.

kki - file servers to the kk kluster, 14 CPUs, AMD Opteron's 2.2 Ghz 64 bit machines,
8 Gb of memory, used for special big jobs

The /cluster/storeN/ filesystems are ordinary NFS filesystems served up by by ordinary Linux boxes which you can login to and work on the files directly without NFS in the way. Useful for big data moving operations. In fact, anytime you want to move data off or onto a /cluster/storeN/ filesystem to somewhere else, you want to be logged into the file server of the source or destination for the copy so the data on that CPU only has to go from the disk to the CPU to be processed by the command and thus is a one-way trip via NFS to its destination (or from its source).

There are special NFS-only servers that do not allow logins. There are three different kinds, three different manufacturers:

/cluster/bluearc/ - 2Tb of space, mostly consumed since it is older quite reliable and works well with all three clusters
/san/sanvol1/scratch/ - 13Tb total space, 9.8T free space, relatively new, mostly works OK with everything, occasionally fails, is supposed to be best for the pk cluster although it is the same bandwidth distance from all three clusters
/panasas/store/ - 2.4Tb total space, 1.2Tb free space, I often forget this one, had bad experience with it in its earlier days, supposedly it is better these days. Appears to be available only to the kk and kki clusters.