Google Cloud Platform vs HPC
Recently, I have enjoyed using Google Cloud Platform (GCP) quite a bit. I have also been doing high throughput computing for the last 5 years. Here I summarize the major difference between the Google Cloud Platform and a typical HPC environment. In my opinion, Google Cloud Platform is much better and more flexible to use regardless of cost. I don’t have a concrete idea on how much HPC setup and maintainence costs, but GCP to me is
1;95;0cnot cheap.
GCP | In-house HPC | |
---|---|---|
Storage | A varietyof kinds for different needs, e.g. Google Cloud Storage, Google Cloud SQL, Google datastore, Google Block storage, etc. | GPFS |
Node/Instance | A variety of CPU, memory sizes, or even customized | Fixed |
Network | Not sure, ~180ms latency based on a 2013 test | InfiniBand |
Queuing time | No | Could be long |
Operating system | A variety of OS, including standard linux distros, so software installation is relatively standard and easy | Not necessarily standard linux, could be specialized, no root access, software installation can be tricky |
Docker Support | ✔ | uncommon |
Horizontal scale | You can secure thousands of CPUs easily | Depending on the actual usage policy, usually it’s not as flexible |
On-demand | Yes, a cluster can be torn down easily after the analysis is done | No, the cluster is always there, busy or idle |
Cluster type | Besides a normal setup of a collection of nodes, you could also set up Hadoop, Spark type of cluster, Google Cloud Dataproc, which will be useful for using frameworks like ADAM in the future | Hadoop or Spark cluster setup is still not very common |