The first Genome cluster was a compute farm in the Beowulf tradition (2001), i.e., built with commodity hardware and freely available software. Growing to include more than 60 CPUs, Genome served both as a testbed for CGRB developers and system administrators and as an increasingly powerful tool for OSU faculty, staff, and students. Learning from setbacks and capitalizing on successes, Genome has steadily grown to include more than 3,400 processors; none of the original commodity hardware remains. Space and cooling (BTU) constraints motivated the migration to rack-mount multi-processor nodes, and the need to address large amounts of RAM instigated the move to 64-bit hardware. Approximately 90% of Genome now consists of rack-mount multi-processor nodes with 2, 4, 8, 16, 24, 32, 40, 48, 64, 80, or 120 cores each, based on AMD Opteron and Intel Xeon multi-core processors.
The Center for Genome Research and Biocomputing (CGRB) maintains an extensive and well-managed infrastructure consisting of a distributed service architecture, a computer cluster of more than 4,000 processors, and a secure private 1G/10G/40G network (see Figure 1). Each machine has internal disk space and is also connected to 3.5 PB of NFS-shared disk space. The CGRB encourages high-volume users to contribute to the computational infrastructure: users are charged $52, $96, and $141 per month for the maintenance of each processing machine, web/database server, and file server, respectively. Contributed nodes are provided at the highest priority to the contributing project and, upon request, at a lower priority to other CGRB researchers. Thus, subsets of cluster nodes are dedicated to specific research projects but function as part of a unified cluster when needed for intensive jobs. This priority-based scheduling has proven successful, both in terms of end-user satisfaction and in the execution of systems-administration activities. Computational requirements are constantly re-evaluated, and new hardware is integrated as needed.
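For illustration only, the priority scheme described above can be sketched as a Slurm-style scheduler configuration (the CGRB's actual batch scheduler, queue names, and node names are not specified here; "rna" and the hardware figures below are hypothetical):

```
# Hypothetical slurm.conf excerpt illustrating priority-based node sharing.
# Nodes contributed by a project are defined once ...
NodeName=rna[01-08] CPUs=64 RealMemory=256000

# ... and exposed through two partitions over the SAME nodes:
# the contributing project gets a high-priority partition,
PartitionName=rna Nodes=rna[01-08] PriorityTier=10 AllowGroups=rna_lab

# while all other researchers submit to a lower-priority, preemptible
# partition, so idle project nodes still serve the whole cluster.
PartitionName=general Nodes=rna[01-08] PriorityTier=1 PreemptMode=REQUEUE Default=YES
```

Under this kind of arrangement, general-queue jobs run on the contributed nodes whenever they are idle and are requeued when the owning project needs its hardware back.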
CGRB Biocomputing Group:
The CGRB maintains an active biocomputing group consisting of full-time staff. Chris Sullivan serves as the CGRB systems administrator and has extensive expertise in all aspects of hardware and infrastructure design and maintenance, along with experience in software development and bioinformatics applications. Matthew Peterson maintains CGRB data flow from the Core Lab. Dr. Shawn O’Neill, in addition to participating in research projects, serves as a bioinformatics trainer: he offers training courses in computer programming, runs topical workshops, facilitates peer-to-peer training, and runs the OSU bioinformatics users group. Dr. Kronmiller serves as a bioinformatics consultant. The efforts of the CGRB biocomputing group are directed toward facilitating biological research through the use of computational tools. Projects range from ordering and data-management software for the CGRB Core Labs to tools and procedures for the analysis and distribution of high-throughput DNA sequencer data. The group continually works to lower the activation energy required for researchers to take advantage of the computing resources and creates new tools to support this process. Finally, the group actively fosters collaborative projects with University faculty associated with the Center.
CGRB Network Infrastructure:
Dell Force10 S6000 with 32 ports of 40 Gbps network connections.
Multiple Dell Force10 S4810 switches with 48 ports of 10 Gbps and 4 ports of 40 Gbps.
Multiple Dell Force10 S55 switches with 48 ports of 1 Gbps and 4 ports of 10 Gbps.
CGRB Server Room
Secure and climate-controlled
Liebert Challenger 3000 5-ton HVAC
Liebert DS 12-ton HVAC
Supported by the building's emergency generator power supply
PowerWare 80,000 VA (80 kVA) Uninterruptible Power Supply (UPS)
Figure 1. Overview of CGRB computational infrastructure. The campus network is public, whereas the CGRB maintains a completely secure private network. Individual machines support specific duties within the infrastructure; for example, dedicated MySQL servers and a dedicated web server. Specific subsets of nodes within the Genome cluster are dedicated to specific projects; for example, the RNA and Fungi sub-clusters. Note that the Private Network Switch represents multiple physical units.