Architecture of High Performance Computing Server at BIT Mesra

The High Performance Computing (HPC) server was installed a few years back as a replacement for PARAM 10000, the supercomputer that is no longer in use. Initially, the HPC was managed by the Department of Computer Science. The Departments of Chemical Engineering and Biotechnology were its primary users (mostly for simulation purposes), so the administration moved it under the Central Instrumentation Facility (CIF). You need permission from CIF to access the HPC. It is available only for research purposes, and you need to provide a good reason along with a proper recommendation from a professor to get access.

The HPC is at least 20 times more powerful than the most powerful PC anyone has on campus. I also recently checked the usage and realized that not even 10% of its capacity is being used. I hope this blog post helps you understand the core architecture of the HPC.

Architecture


Architecture of High Performance Computing Server

The Computers

The HPC is a clustered computer network. There is a master node that is connected to the college LAN and is accessible from any computer on the LAN or WiFi. The master node is accessible only via SSH (Secure Shell); all other remote-access services (such as Telnet) are closed. It is reachable at 172.16.23.1. There are 17 compute nodes connected to the master node in a star network topology: the first 16 are CPU-based compute nodes and the 17th is a GPU-based compute node. All compute nodes have the same configuration, which is why the HPC behaves like a homogeneous cluster rather than a loose collection of machines. The compute nodes are where all the code execution takes place, and the master node is the one that distributes tasks among them. Apart from these, there is a single storage node. The advantage of this single storage node is that a file stored on the master node is available on all the compute nodes.
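If you just want to confirm that the master node is reachable from the campus network before requesting an account, a quick check of its SSH port is enough. Here is a minimal Python sketch; only the IP 172.16.23.1 and port 22 come from this post, everything else is generic:

import socket

MASTER = "172.16.23.1"   # master node IP from this post
SSH_PORT = 22            # SSH is the only open remote-access service

def ssh_port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if ssh_port_open(MASTER, SSH_PORT):
        print("Master node is reachable over SSH from this network.")
    else:
        print("Cannot reach the master node; check that you are on the college LAN/WiFi.")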

The Master Node

CPU - 2 x Intel Xeon E5-2630
Cores - 8 per CPU
Hyperthreading - Disabled (I don't know if it can be enabled through software)
Virtualization - Available
Total CPUs - 16
Clock speed - 2.4 GHz
Memory - 64 GB
Internal HDD - Total size - 2 TB
  • Partition 1 - /dev/sde3 ~1 TB mounted at /
  • Partition 2 - /dev/sdf1 ~500 GB mounted at /apps
  • Partition 3 - /dev/sdf2 ~450 GB mounted at /scratch
  • Partition 4 - /dev/sde1 ~500 MB mounted at /boot
Operating System - CentOS 6.5
Domain Name - csehpc.bitmesra.ac.in
IP - External - 172.16.23.1 | Internal Primary - 192.168.10.1 | Internal Secondary - 10.10.1.1

The operating system is an old version of CentOS (CentOS is a Linux-based OS closely related to Red Hat Enterprise Linux, so if you are familiar with RHEL, CentOS should be easy). The master node is quite powerful in itself. It is the only node that has access to the Internet (without a Cyberoam login), with a downlink speed of approximately 10 MB/s. It is also the only node reachable from the external network (the college LAN); all the compute nodes are reached through the master node.
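Once you have a shell on the master node, you can verify the spec sheet above yourself using only the Python standard library. A small sketch, assuming a Python 3 interpreter is available (on a CentOS 6.5 box you may have to install or load one yourself); the mount points /apps and /scratch are taken from the partition table above:

import os
import shutil

# Logical CPUs visible to the OS (16 on the master node, since
# hyperthreading is disabled on the 2 x 8-core Xeons).
print("Logical CPUs:", os.cpu_count())

# Total RAM, read from /proc/meminfo (Linux only).
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            print("Memory: %.1f GB" % (kb / 1024 / 1024))
            break

# Free space on the partitions listed above.
for mount in ("/", "/apps", "/scratch", "/boot"):
    total, used, free = shutil.disk_usage(mount)
    print("%-8s total %6.1f GB, free %6.1f GB" % (mount, total / 1e9, free / 1e9))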

Master Node Configuration


The Compute Nodes

CPU - 2 x Intel processor (exact model unknown; the data on the college's website is incorrect - the compute nodes have a slower processor than the master node)
Cores - 8 per CPU
Hyperthreading - Disabled (I don't know if it can be enabled through software)
Virtualization - Available
Total CPUs - 16
Clock speed - 1.2 GHz
GPU - Nvidia Tesla K20m (only on the GPU node)
Memory - 64 GB
Internal HDD - Total size - 500 GB
  • Partition 1 - /dev/sda3 ~415 GB mounted at /
  • Partition 2 - /dev/sda1 ~500 MB mounted at /boot
Operating System - CentOS 6.5
Domain Name - csehpc-n[x].bitmesra.ac.in, where [x] is the node number from 2 to 18 (17 nodes in total)
IP - External - Unreachable | Internal Primary - 192.168.10.[x] | Internal Secondary - 10.10.1.[x]

All compute nodes have the same architecture and can be used for parallel processing. The compute nodes are unreachable from the external network; you have to go through the master node to access them. Also, none of the compute nodes has access to the Internet (they cannot even resolve a domain name).
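Because the compute nodes can only be reached through the master node, a common pattern is to log in to the master and fan a command out from there. Below is a rough sketch (Python 3.7+) that runs the same command on every compute node over the primary internal network; it assumes passwordless SSH from the master to the compute nodes is already configured, which is typical for clusters like this but worth confirming with the admins.

import subprocess

# Compute node addresses on the primary internal network: 192.168.10.2 - 192.168.10.18.
NODES = ["192.168.10.%d" % i for i in range(2, 19)]

def run_on_node(node, command):
    """Run a shell command on one compute node via SSH and return (exit code, output)."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", node, command],
            capture_output=True, text=True, timeout=30,
        )
        return result.returncode, result.stdout.strip()
    except subprocess.TimeoutExpired:
        return 1, ""

if __name__ == "__main__":
    # Example: report the load average on every compute node.
    for node in NODES:
        code, out = run_on_node(node, "uptime")
        print("%-15s %s" % (node, out if code == 0 else "unreachable"))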

Compute Node Configuration


The Storage Node

The storage node has a total capacity of 48 TB. Of this, 21 TB is unavailable for general use because it is reserved for cloud storage (I think for professors). The remaining storage is mounted at /home, which is available across all the nodes (master + compute). This means that any file under /home on one node is accessible at the same location on every other node. How is this helpful? Since the compute nodes don't have Internet access, you can download a file on the master node and it will be available on all the compute nodes.
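An easy way to convince yourself that /home really is shared is to drop a file into it from the master node and look for it on a compute node. A rough sketch of that check, run from the master node (it assumes passwordless SSH to node 2, as in the previous sketch):

import os
import subprocess

marker = os.path.expanduser("~/shared_home_check.txt")

# Step 1: create a marker file under /home on the master node.
with open(marker, "w") as f:
    f.write("written on the master node\n")

# Step 2: ask a compute node (node 2 on the primary network) whether it sees
# the same file. If /home is shared, the file is visible there immediately.
result = subprocess.run(
    ["ssh", "-o", "BatchMode=yes", "192.168.10.2", "cat", marker],
    capture_output=True, text=True,
)
print("Seen on compute node:", result.returncode == 0)

os.remove(marker)  # clean up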

The Network

All the nodes are connected via an InfiniBand switch. InfiniBand switches are designed specifically for HPC clusters, offering low latency and high throughput. There are two internal networks in the HPC (a small reachability check follows the list below):

1. Primary network: 
  • Network address: 192.168.10.0
  • Nodes: 192.168.10.1 - 192.168.10.18
  • Domain: csehpc.bitmesra.ac.in
2. Secondary (Backup) network:
  • Network address: 10.10.1.0
  • Nodes: 10.10.1.1 - 10.10.1.18
  • Domain: icsehpc
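To see which addresses actually answer, you can ping every node on both ranges from the master node. A minimal sketch, assuming ICMP ping is permitted on the internal networks:

import subprocess

def reachable(ip):
    """Send a single ping and report whether the host answered."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ) == 0

if __name__ == "__main__":
    for x in range(1, 19):  # node numbers 1-18 (master + 17 compute nodes)
        primary = "192.168.10.%d" % x
        secondary = "10.10.1.%d" % x
        print("node %2d  primary: %-4s  secondary: %s"
              % (x, "up" if reachable(primary) else "down",
                 "up" if reachable(secondary) else "down"))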

Network Configuration

Conclusion

That's all for the architecture of the HPC. I hope it fascinated you. The HPC is very powerful if you know how to make use of it. In the next article, I will help you set up a machine learning environment on a single compute node.


Comments

  1. Going by this https://serverfault.com/questions/913545/enabling-hyperthreading-on-a-centos-7-3-server-using-intel-xeon-e5620 , I think you have to enable HTT in the BIOS. Since rebooting and entering the BIOS require sudo privileges, you would have to approach the admins. Great article BTW!! Try hosting aurora on the HPC sometime!!
