I was having a terrible problem that made my home server unusable. I was starting to regret many things: buying a refurbished workstation, buying refurbished memory DIMMs, and choosing CentOS 7 over Ubuntu, which I use for development.
A careful and determined Google Searchathon revealed the issue. I kept at it because, apart from disk I/O causing freezes, there was nothing wrong with the system. I needed to rsync all my backups from the servers to local storage. Backups cost me over US$500 per year, and I wanted to save some of that money by at least getting rid of the server snapshots, which cost US$240 per year.
OK, back to the problem:
“CentOS or any distro appears to freeze and become completely unrecoverable, except by a hard reset, when a heavy disk I/O task is performed, for example copying files over CIFS, rsync, or any heavy reads and writes.”
I don’t even think it took massive I/O, as many blog posts suggest; just about anything would trigger it. Pretty stupid, if you ask me, that after 6 years the problem is still packaged and shipped to everyone.
The search led to a solution from 2010! It was a blog post, plus a Stack Overflow post that linked to it.
The gist: it’s your I/O scheduler. The default one is [cfq]; more on this below.
I don’t modify my server just because of one ServerFault.com answer or blog entry, so I checked the official documentation from Red Hat (https://access.redhat.com/solutions/5427). It does appear to be the case, and my system was using [cfq] as well.
I switched to deadline, because noop does essentially no scheduling and is useful mainly inside VMs and containers.
Here is the gist of the blog post, for reference:
Change your I/O scheduler from CFQ to something else.
Check which scheduler each disk uses. Whether or not you use LVM is irrelevant here; use pvscan to find your disk names.
$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
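The check above only looks at sda. To see every disk at once, a small bash sketch like the one below could print the active scheduler (the one in [brackets]) for each block device. It assumes the sysfs layout /sys/block/<device>/queue/scheduler shown above; active_scheduler and list_schedulers are hypothetical helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: print "<device> <active scheduler>" for every block device.
# Assumes the sysfs layout /sys/block/<device>/queue/scheduler.

# active_scheduler: read a line such as "noop deadline [cfq]" on stdin
# and print the bracketed (active) entry, e.g. "cfq".
active_scheduler() {
  sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# list_schedulers [ROOT]: walk ROOT (default /sys/block) and print one
# line per device that exposes a readable scheduler file.
list_schedulers() {
  local root="${1:-/sys/block}"
  local f dev
  for f in "$root"/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skips an unmatched glob too
    dev="${f#"$root"/}"              # e.g. "sda/queue/scheduler"
    dev="${dev%%/*}"                 # e.g. "sda"
    printf '%s %s\n' "$dev" "$(active_scheduler < "$f")"
  done
}

# Only run when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  list_schedulers "$@"
fi
```

Reading sysfs does not need root, so this is safe to run before changing anything.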
$ echo deadline > /sys/block/sda/queue/scheduler
The command above must be run as root (sudo in front of echo does not cover the redirection). It takes effect immediately, but the setting is lost on reboot.
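Repeat this for whichever drives need it. To make it persist across reboots, follow the Red Hat documentation linked above. You can also put the echo line in /etc/rc.local; on CentOS 7 that is a symlink to /etc/rc.d/rc.local, which must be made executable (chmod +x /etc/rc.d/rc.local) before it runs at boot.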
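To do this for all drives in one go, something like the bash sketch below could work. It assumes your disks show up as sd* under /sys/block (SATA/SCSI naming; virtio or NVMe disks use other names), and set_scheduler and offers_scheduler are hypothetical helper names. It skips any disk whose scheduler list does not include the requested name, because the kernel rejects unknown names written to that file.

```shell
#!/usr/bin/env bash
# Sketch: switch every sd* disk to a given I/O scheduler via sysfs.
# Must run as root; the change only lasts until the next reboot.

# offers_scheduler FILE SCHED
# Succeeds if SCHED is listed in FILE, with or without the [brackets]
# that mark the currently active scheduler.
offers_scheduler() {
  local tok
  local -a toks
  read -r -a toks < "$1"
  for tok in "${toks[@]}"; do
    tok="${tok#\[}"   # strip a leading [
    tok="${tok%\]}"   # strip a trailing ]
    [ "$tok" = "$2" ] && return 0
  done
  return 1
}

# set_scheduler SCHED [ROOT]
# Writes SCHED to ROOT/sd*/queue/scheduler (ROOT defaults to /sys/block),
# skipping devices that do not offer SCHED.
set_scheduler() {
  local sched="$1" root="${2:-/sys/block}"
  local f
  for f in "$root"/sd*/queue/scheduler; do
    [ -w "$f" ] || continue          # unmatched glob or not writable
    if offers_scheduler "$f" "$sched"; then
      echo "$sched" > "$f"
    else
      echo "skipping $f: '$sched' not offered" >&2
    fi
  done
}
```

Calling `set_scheduler deadline` as root applies the change right away. To keep it across reboots, one option is to append both functions and that call to /etc/rc.d/rc.local. Afterwards, `cat /sys/block/sda/queue/scheduler` should show [deadline] as the active entry.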