Bitbucket will be unavailable for up to one hour starting Sunday, August 21st, 18:00 GMT. During this maintenance window we will:
- Replace our core switches and upgrade our uplink Internet connections from dual 100 Mbps to dual 1 Gbps.
- Upgrade kernels and apply performance improvements to our database servers.
The kernel upgrades and reboots will address two issues:
- One of our servers rebooted unexpectedly recently. After working with Red Hat to analyze a core dump, we concluded it was the result of a divide-by-zero bug in the kernel.
- While performing routine security audits and stress tests of our infrastructure, we found network performance degradation on servers with Broadcom network drivers and Intel c-states enabled.
For more information on the c-state problem, we recommend the posts from Citrix and on Dell’s Linux mailing list.
We found that disabling c-states in the BIOS wasn’t enough. As discussed on the Dell mailing list, we first tried intel_idle.max_cstate=2 as a kernel boot option, but this did not resolve our performance issues. After further investigation, we concluded that passing intel_idle.max_cstate=0 to the kernel fixes the problem.
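If you want to confirm that a box actually booted with this option, one approach is to inspect the kernel command line, which Linux exposes at /proc/cmdline. The following is a minimal sketch, not the tooling we used: the function names are hypothetical, and it assumes that when the option appears more than once, the last occurrence is the one that takes effect.

```python
from typing import Optional

# Kernel boot option discussed above.
PARAM = "intel_idle.max_cstate"


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value found on a kernel command line.

    Returns None when the option is absent or has a non-numeric value.
    Assumption: if the option is repeated, the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == PARAM and sep and raw.isdigit():
            value = int(raw)
    return value


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux host, max_cstate_from_cmdline(read_running_cmdline()) returning 0 would indicate the box was booted with the setting described above; None would mean the option was not passed at all.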
We’ve already applied that change and upgraded kernels on all of our boxes except our database servers. A relatively quick reboot this weekend will put these issues behind us once and for all.