iRODS Summary

A brief report on our experience with iRODS, as well as recommendations for PDS and possible futures with the system.

This document describes our experience to date with iRODS.

Overall Observations

The "i" Rule Oriented Data System, or iRODS, promises to be a next-generation middleware for large-scale, real-time digital archives. Developed by Data Intensive Cyber Environments Research and various collaborators, its rule-oriented approach promises to manage workflow, resources, services, and archive policies.

Although the number of grid-based, data-centric systems is small, and the majority are still academic projects, iRODS does distinguish itself as a viable contender for the Planetary Data System (PDS). In particular:

  • Compilation and installation are dead simple. Other grid systems come with hefty setup guides and require an almost arcane level of operating system knowledge.
  • Configuration is surprisingly easy. A single five-line configuration file is all it takes. Contending systems give system administrators migraines as they puzzle out dozens of settings in multiple configuration files.
  • Running iRODS is immediate and obvious. iRODS comes with an entire array of executable commands (called "i-commands") that parallel the standard set of Unix commands. Some other grid systems fail to provide even such executables, instead giving developers only a framework with which to create such programs later.

iRODS also stands out by being based on a series of micro-services that are orchestrated by an adaptive rules engine. Rather than creating a high-level service such as "upload a file", it provides a series of fine-grained services that together achieve the goal of uploading a file. Such a goal may include authentication, selection of a target repository, identification of replica repositories, transfer of blocks in a multithreaded fashion, computation of a verification message digest, metadata population, post-processing for derived products, and so forth. The set of rules and services is flexible and may be updated for domain-specific needs.
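The idea of composing fine-grained services under a rule can be sketched in a few lines of Python. This is only an illustration of the concept, not the iRODS rule language or API; every name here (`authenticate`, `select_resource`, `checksum`, `UPLOAD_RULE`, `run_rule`) is hypothetical, as is the size-based placement policy.

```python
import hashlib
from typing import Callable, Dict, List

# A micro-service takes the shared context, updates it, and returns it.
MicroService = Callable[[Dict], Dict]

def authenticate(ctx: Dict) -> Dict:
    # Hypothetical check: a real system would verify credentials.
    if not ctx.get("user"):
        raise PermissionError("no user supplied")
    ctx["authenticated"] = True
    return ctx

def select_resource(ctx: Dict) -> Dict:
    # Hypothetical policy: payloads over 1 KiB go to the "archive" resource.
    ctx["resource"] = "archive" if len(ctx["data"]) > 1024 else "disk"
    return ctx

def checksum(ctx: Dict) -> Dict:
    # Record a verification digest of the payload.
    ctx["md5"] = hashlib.md5(ctx["data"]).hexdigest()
    return ctx

# A "rule" is an ordered list of micro-services; changing policy means editing the list.
UPLOAD_RULE: List[MicroService] = [authenticate, select_resource, checksum]

def run_rule(rule: List[MicroService], ctx: Dict) -> Dict:
    for service in rule:
        ctx = service(ctx)
    return ctx

result = run_rule(UPLOAD_RULE, {"user": "pds", "data": b"hello"})
print(result["resource"], result["md5"])
```

Adding a replication or metadata step in this sketch would mean appending another function to `UPLOAD_RULE`, which is the flexibility the rule-oriented approach is meant to offer.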

Performance of iRODS is in line with that of contending systems. As gigabyte-scale data sets become the norm for scientific research, grid system developers have realized the inherent shortcomings of TCP/IP and have developed adaptations. The most prevalent adaptations are:

  • Using a customized UDP/IP approach where datagrams are scattered/gathered at high speed.
  • Using multi-threaded, concurrent, parallel TCP/IP connections.

iRODS uses the latter adaptation.
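To illustrate the second adaptation, the sketch below splits a payload into contiguous byte ranges, one per stream, and copies each range in its own thread. It is a local simulation under stated assumptions: it copies between in-memory buffers rather than opening sockets, and `split_ranges` and `parallel_copy` are hypothetical names, not part of iRODS.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Divide [0, size) into `streams` contiguous (start, end) ranges."""
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        # The first `extra` ranges absorb the remainder, one byte each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def parallel_copy(source: bytes, streams: int) -> bytes:
    """Copy `source` into a new buffer, one thread per byte range."""
    target = bytearray(len(source))

    def copy_range(bounds: Tuple[int, int]) -> None:
        start, end = bounds
        # In a real transfer each range would travel over its own TCP connection.
        target[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, split_ranges(len(source), streams)))
    return bytes(target)

data = bytes(range(256)) * 40  # 10,240 bytes of sample data
copied = parallel_copy(data, 4)
print(copied == data)          # True
print(split_ranges(10, 4))     # [(0, 3), (3, 6), (6, 8), (8, 10)]
```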

Reliability and Performance

Of paramount concern to PDS is the rate at which data can be moved between iRODS systems and the integrity of that data. The inventory of data sets within PDS ranges from the minuscule to the gargantuan, and our tests of iRODS have exercised a number of sizes.

In general, we've found that collections containing fewer large files transfer more quickly than those that contain many small files. The reason for this is simple: iRODS establishes a fresh transfer for each file sent. With many small files, the overhead of each transfer (file name, destination directory, computation of parallel transfer parameters, etc.) is large relative to the data moved. Further, the benefits of parallel, multi-threaded TCP/IP connections are lost with small files. However, a single large file has low overhead compared to the file size, and parallel TCP/IP connections do yield a boost in performance.
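A rough cost model makes the effect concrete. Assume each file pays a fixed setup cost and the payload then moves at a steady rate. The 0.5 s overhead and 10 MiB/s rate below are illustrative assumptions, not measured values, and `transfer_time` is a hypothetical helper.

```python
def transfer_time(n_files: int, total_mib: float,
                  overhead_s: float, rate_mib_s: float) -> float:
    """Estimated seconds: per-file setup cost plus payload time."""
    return n_files * overhead_s + total_mib / rate_mib_s

TOTAL_MIB = 1024.0   # the same 1 GiB of data in both cases
OVERHEAD_S = 0.5     # assumed setup cost per file
RATE_MIB_S = 10.0    # assumed sustained rate

many_small = transfer_time(10_000, TOTAL_MIB, OVERHEAD_S, RATE_MIB_S)
one_large = transfer_time(1, TOTAL_MIB, OVERHEAD_S, RATE_MIB_S)

print(f"10,000 files: {many_small:.1f} s")  # 5102.4 s
print(f"1 file:       {one_large:.1f} s")   # 102.9 s
```

Under these assumptions the setup cost dominates the many-file case almost entirely, which matches the qualitative behaviour described above.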

This leads us to make the following recommendation: Where possible, always send or receive an archive of data. It doesn't matter whether the archive is of the cpio, tar, or zip formats, and it doesn't matter whether the archive is compressed. Transfer a single file wherever possible.
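As one way to follow this recommendation, the sketch below bundles a directory tree into a single TAR file with Python's standard `tarfile` module before transfer. The function name `bundle` and the directory and file names are hypothetical; the resulting archive would then be sent with whatever transfer tool is in use.

```python
import os
import tarfile
import tempfile

def bundle(directory: str, archive_path: str) -> int:
    """Write `directory` into an uncompressed TAR archive; return the member count."""
    with tarfile.open(archive_path, "w") as archive:
        # arcname keeps paths in the archive relative to the dataset's top directory.
        archive.add(directory, arcname=os.path.basename(directory))
    with tarfile.open(archive_path, "r") as archive:
        return len(archive.getmembers())

# Demonstration with a small throwaway dataset.
with tempfile.TemporaryDirectory() as work:
    dataset = os.path.join(work, "dataset")
    os.makedirs(os.path.join(dataset, "data"))
    for i in range(3):
        with open(os.path.join(dataset, "data", f"file{i}.dat"), "wb") as f:
            f.write(os.urandom(128))
    members = bundle(dataset, os.path.join(work, "dataset.tar"))
    print(members)  # 5: two directories plus three files
```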

Performance versus Conventional TCP/IP

In order to measure the performance of iRODS against conventional TCP/IP file transfer such as FTP or HTTP, we created a large randomized data file and transferred it in both directions between various locations in the US and a single iRODS server located at the San Diego Supercomputer Center (SDSC). Using a single large file enables iRODS to leverage its multithreaded transfer capability to greatly accelerate the movement of data. (All tests were conducted with iRODS 1.1.)

This diagram shows the systems tested:

[Figure: Speed Tests]

Overall Throughput

The following table (and gratuitous graph) summarizes the results:


Throughput (MiB/s)

System                      iRODS   Conventional
Commercial data center       6.92           1.47
Engineering/Imaging node     9.74           7.58
Geo node                     9.65           0.74
PPI node                    11.16           9.24
Small Bodies node            8.00           0.68

iRODS clearly provides enormous benefit to systems where conventional file transfer performs poorly, and less benefit to systems closer to SDSC. (In the case of the Geo node, we believe a packet-shaping firewall causes conventional transfers to suffer high latency. For the Small Bodies node, the number of network hops to SDSC may be quite large.)
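The size of the benefit is easier to compare as a ratio of iRODS throughput to conventional throughput. The short calculation below uses only the figures from the throughput table above; the variable names are ours.

```python
# Throughput in MiB/s, copied from the table above: (iRODS, conventional).
THROUGHPUT = {
    "Commercial data center": (6.92, 1.47),
    "Engineering/Imaging node": (9.74, 7.58),
    "Geo node": (9.65, 0.74),
    "PPI node": (11.16, 9.24),
    "Small Bodies node": (8.00, 0.68),
}

# Speedup factor of iRODS over conventional transfer for each system.
speedup = {name: irods / conventional
           for name, (irods, conventional) in THROUGHPUT.items()}

# Print the systems from largest to smallest speedup.
for name, factor in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {factor:5.1f}x")
```

This gives roughly 13.0x for the Geo node, 11.8x for Small Bodies, 4.7x for the commercial data center, and only 1.3x and 1.2x for the Engineering/Imaging and PPI nodes.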

Bear in mind that these tests are for a single large file. PDS data sets, unless archived together into a large file, won't feel such transfer benefits.

Transferring PDS Data Sets

Sending an unbundled collection (directory hierarchy) of many differently sized files will be slower than sending a single large archive. However, for the sake of completeness, we ran tests transferring collections from a variety of PDS nodes using iRODS. These tests ran iRODS at 1, 4, and 8 parallel threads (with 1 thread being equivalent to doing a recursive FTP). The following map shows the nodes involved in the test:

[Figure: Dataset Tests]

Note: Because these tests take quite a bit of time to run, the results presented below are not yet complete.

Unbundled Datasets

The following table (and chart) summarizes the transfers of unbundled datasets from various PDS nodes to SDSC:


Transfer Rate (MiB/s)

Transfer            1 Thread   4 Threads   8 Threads
Atmos 1G               0.074        0.12       0.053
Atmos 10G
Atmos 20G
Atmos 1T
Eng 1G                  1.14        1.96        1.76
Eng 10G
Eng 20G
Eng 1T
PPI 1G                  2.16        1.91        1.91
PPI 10G                 1.74        1.77        1.35
PPI 20G
PPI 1T
Rings 1G                0.76        0.59        0.29
Rings 10G
Rings 20G
Rings 1T
Small Bodies 1G         0.46        0.45        0.56
Small Bodies 10G
Small Bodies 20G
Small Bodies 1T

As can be seen, multithreading has no consistent benefit for an unbundled dataset. In fact, we're questioning why we must even continue running such tests. Both conceptually and from the above evidence, the transfer of unbundled datasets is utterly pointless. Many small files will receive no benefit from multithreaded parallel transfers, and much of the communication overhead is spent setting up and tearing down each file's individual transfer. It is far superior to bundle a dataset into an archive and send that instead. However, we will continue to gather this data as we have been instructed.

Bundled v Unbundled Transfers

When all of the files and directories that comprise a PDS dataset are packed into a single archive file and that archive is sent instead, iRODS performance improves enormously. The following table (and gratuitous graph) summarizes the results.

                                  Time (s)                Rate (MiB/s)
Node           Size (bytes)       Unbundled    Bundled    Unbundled   Bundled   Improvement
PPI               962,007,040        385.14      88.41         2.38     10.38       335.63%
PPI            10,581,995,520       4297.46     941.19         2.35     10.72       356.60%
Small Bodies    1,008,431,104       1929.54     233.91         0.50      4.11       724.91%
Rings          11,543,992,320      23127.24    6608.52         0.48      1.67       249.96%
Atmospheres     9,619,989,504      84164.22    8327.10         0.11      1.10       910.73%

On each node we sent an unbundled (many files in a directory tree) dataset to SDSC. We then created a TAR-format file archive of the dataset and sent it to SDSC. In both cases we let iRODS choose a number of concurrent threads for data transfer. Because of the boost in transfer performance, we recommend bundled transfers wherever possible. Note that the iRODS server does not yet support automatic unbundling upon receipt of an archive. That feature is expected in iRODS 1.2.
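The Rate and Improvement columns in the table above can be reproduced from the sizes and times. The check below recomputes the first PPI row, assuming that rate is bytes / 2^20 / seconds and that improvement is (unbundled time / bundled time - 1), expressed as a percentage. Those formulas are our inference from the numbers, and the function names are ours.

```python
MIB = 1024 * 1024

def rate_mib_s(size_bytes: int, seconds: float) -> float:
    """Average transfer rate in MiB/s."""
    return size_bytes / MIB / seconds

def improvement_pct(unbundled_s: float, bundled_s: float) -> float:
    """Relative gain of the bundled transfer over the unbundled one, in percent."""
    return (unbundled_s / bundled_s - 1.0) * 100.0

# First PPI row of the table: size in bytes, unbundled and bundled times in seconds.
size, t_unbundled, t_bundled = 962_007_040, 385.14, 88.41

rate_unbundled = rate_mib_s(size, t_unbundled)
rate_bundled = rate_mib_s(size, t_bundled)
gain = improvement_pct(t_unbundled, t_bundled)

print(f"{rate_unbundled:.2f}")  # 2.38
print(f"{rate_bundled:.2f}")    # 10.38
print(f"{gain:.2f}%")           # 335.63%
```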

Reliability

The reliability of iRODS is contingent upon its ability to:

  • Transfer data correctly: no incorrect bytes are saved or retrieved
  • Transfer data successfully: iRODS must recover from network outages and hardware failures

iRODS includes built-in integrity checking for data transfers. iRODS can compute a message digest (the output of a hash function) of data after it is ingested, and this digest can be compared to the digest of the source data as well as between replicas of the data. iRODS uses the Message-Digest Algorithm 5 (MD5) cryptographic hash function.
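The same kind of check can be performed independently of iRODS. The sketch below computes MD5 digests of a source file and a copy with Python's standard `hashlib` module and compares them; the helper name `md5_of_file` and the file names are hypothetical.

```python
import hashlib
import os
import tempfile

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration: write the same random payload twice and compare digests.
with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "source.dat")
    replica = os.path.join(work, "replica.dat")
    payload = os.urandom(3 * 1024 * 1024 + 17)
    for path in (source, replica):
        with open(path, "wb") as f:
            f.write(payload)
    digests_match = md5_of_file(source) == md5_of_file(replica)
    print("intact" if digests_match else "corrupted")
```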

In addition, iRODS includes a recovery mechanism for interrupted data transfers. During the transfer of data, iRODS client systems may save an auxiliary file containing the current state of the data transfer. Should the network connection be interrupted or a power failure occur, data transfers may be resumed using this auxiliary file.
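The general technique can be sketched as follows. The format of the iRODS auxiliary file is not described here, so this example assumes a small JSON file that records the byte offset reached so far; `resumable_copy`, its `fail_after` parameter (which simulates an outage), and the file names are all hypothetical.

```python
import json
import os
import tempfile

def resumable_copy(src: str, dst: str, state_path: str,
                   chunk_size: int = 4096, fail_after: int = -1) -> bool:
    """Copy src to dst in chunks, recording progress in state_path.

    fail_after simulates an outage after that many chunks (-1 means never).
    Returns True when the copy completed, False when it was interrupted.
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f)["offset"]    # resume where the last attempt stopped
    mode = "r+b" if os.path.exists(dst) else "wb"
    with open(src, "rb") as fin, open(dst, mode) as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.truncate(offset)                  # discard anything past the recorded offset
        chunks = 0
        while True:
            if chunks == fail_after:
                return False                   # simulated network outage
            block = fin.read(chunk_size)
            if not block:
                break
            fout.write(block)
            fout.flush()
            offset += len(block)
            with open(state_path, "w") as f:
                json.dump({"offset": offset}, f)
            chunks += 1
    if os.path.exists(state_path):
        os.remove(state_path)                  # transfer complete; state no longer needed
    return True

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "src.dat")
    dst = os.path.join(work, "dst.dat")
    state = os.path.join(work, "transfer.state")
    with open(src, "wb") as f:
        f.write(os.urandom(20_000))
    first = resumable_copy(src, dst, state, fail_after=2)   # interrupted after 8192 bytes
    second = resumable_copy(src, dst, state)                # resumes from the saved offset
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()
    print(first, second, identical)  # False True True
```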

All of our tests have made use of the auxiliary file in order to recover from failures. However, during our tests, none of the sites experienced failures except for the Geo node. We suspect that a traffic-management firewall at the Geo node is aggressively closing connections it believes to be either idle or using too much bandwidth. iRODS transfer logs showed periodic interruptions at the Geo node. However, the recovery mechanism of iRODS worked around these issues easily and without operator intervention.

As for data transfer integrity, the following table summarizes the results of our findings:


Transfer Integrity

Location                    iRODS   Conventional
Atmospheres node             100%           100%
Commercial data center       100%           100%
Engineering/Imaging node      95%            95%
Geo node                     100%           100%
PPI node                     100%           100%
Small Bodies node            100%           100%

Neither iRODS nor conventional file transfer suffered from any data corruption, except for systems at the Jet Propulsion Laboratory (the Engineering/Imaging node in the table above). There, plain FTP and HTTP as well as iRODS occasionally lose data integrity. We believe JPL's border connection to the public internet to be at fault.

Futures

Transferring a bundled archive of a collection is far more efficient than transferring the unbundled collection. However, researchers often need only a subset of the collection, which makes retrieving a large archive for a few small files a waste of bandwidth. The developers of iRODS have foreseen this, and are currently developing a bundling and unbundling capability.

This capability enables a curating PDS node to efficiently send a single large archive file. Once it arrives at SDSC, the file may be unbundled automatically into its collective directory hierarchy. Other consumers of the data may then download subsets of the files they require. Furthermore, such subsets may be converted into bundled archives at SDSC and then transferred to a consumer more efficiently as a single large file. These capabilities will be available in iRODS version 1.2.

iRODS also supports multiple replicas of data, enabling data to be automatically mirrored. This improves resilience in the face of failure of a particular node. It may also prove beneficial for data transfers by enabling a client to communicate with a closer replica rather than a more distant one. PDS may wish to investigate implementing multiple replicas.