Debugging a WordPress Site (500 error)
Issue Summary
On October 2, 2020, from approximately 03:27 to 05:32 Pacific Standard Time (PST), an outage occurred on an isolated Ubuntu 14.04.5 LTS container running an Apache web server. GET requests to the server returned a 500 Internal Server Error when the expected response was an HTML file defining a Holberton WordPress site.
All users were affected by this outage for approximately two hours. The cause was a typographical error in the /var/www/html/wp-settings.php file, shown in the following line, where the extension phpp should have been php:
require_once(ABSPATH . WPINC . '/class-wp-locale.phpp' );
Timeline (PST)
- 03:27 Apache began returning a 500 Internal Server Error, detected by a staff engineer when he tried to curl 127.0.0.1.
- 03:30 The engineer reported the issue and began the debugging process.
- 03:40 The engineer began searching the history and found nothing unusual.
- 03:47 He then checked the error.log and access.log files in /var/log/apache2/; unfortunately, no errors were reported there.
- 04:03 The engineer restarted the server to see whether it would respond correctly, and everything seemed fine.
- 04:05 Next, he checked the permissions on the configuration file /etc/apache2/apache2.conf, and they seemed normal.
- 04:13 Looked in the sites-available folder of the /etc/apache2/ directory. Determined that the web server was serving content located in /var/www/html/.
- 04:20 The engineer checked running processes using ps auxf. Two apache2 processes, root and www-data, were running properly.
- 04:30 The engineer decided to use strace. In one terminal, he ran strace on the PID of the root Apache process. In another, he curled the server. But strace did not give any useful information.
- 04:40 He ran strace again, this time on a child process of apache2, using the PID of the www-data process, and checked the result.
- 05:05 Finally, a useful result: strace revealed a -1 ENOENT (No such file or directory) error occurring upon an attempt to access the file /var/www/html/wp-includes/class-wp-locale.phpp:
lstat("/var/www/html/wp-includes/class-wp-locale.phpp", 0x7ffc9cd01270) = -1 ENOENT (No such file or directory)
- 05:10 He searched for the file containing the typo by running grep -r phpp /var/www/html/. Two files contained the string phpp, but only one held the line he was looking for, wp-settings.php:
wp-settings.php:require_once( ABSPATH . WPINC . '/class-wp-locale.phpp' );
- 05:26 The engineer fixed the error by deleting the extra p from that line in wp-settings.php.
- 05:32 Ran another curl against the server and got a 200 response.
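The diagnostic steps in the timeline can be sketched as a few small shell helpers. This is only an illustration: the function names (http_status, missing_paths, find_token) are hypothetical and not part of the original investigation, while the underlying commands (curl, strace, grep) are the ones the engineer ran.

```shell
#!/usr/bin/env bash
# Sketch of the timeline's diagnostic steps. Function names are hypothetical.

# Print only the HTTP status code returned by a URL (default: the local server).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "${1:-http://127.0.0.1/}"
}

# Read strace output on stdin and print each quoted path whose syscall
# failed with ENOENT (No such file or directory).
missing_paths() {
  grep -F 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Print every line under a directory (default: /var/www/html/) that contains
# a fixed string, as in the 05:10 grep step.
find_token() {
  grep -rn -F -- "$1" "${2:-/var/www/html/}"
}

# Example session (PID is a placeholder for a www-data apache2 process;
# strace writes its trace to stderr, hence the 2>&1):
#   terminal 1:  sudo strace -p PID 2>&1 | missing_paths
#   terminal 2:  http_status
#   afterwards:  find_token phpp
```

Filtering the trace for ENOENT is a shortcut, not something the original investigation did; reading the full strace output, as the engineer did, shows the same lstat call with more context.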
Root Cause
The root cause was a typographical error: the WordPress app hit a critical error in wp-settings.php when trying to load the file class-wp-locale.phpp. The correct file name, located in the wp-includes directory of the application folder, is class-wp-locale.php. Because of the wrong extension, the requested file did not exist, and this led the server to send a 500 error.
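This failure mode, a require pointing at a file that does not exist, can be checked mechanically. The sketch below is a hypothetical helper, not part of the original fix. It assumes that WPINC resolves to wp-includes (as the strace path suggests), that the paths are plain string literals without whitespace, and that the statements follow the ABSPATH . WPINC . '/file' pattern shown above.

```shell
#!/usr/bin/env bash
# Hypothetical check: for every literal path that wp-settings.php loads via
# ABSPATH . WPINC . '/...', confirm the file exists under wp-includes.
# Assumes WPINC is "wp-includes" and that paths contain no whitespace.
check_wpinc_requires() {
  local root="${1:-/var/www/html}"
  local rel paths
  local status=0
  # Extract the quoted path from each matching require/require_once line.
  paths=$(sed -n "s/.*ABSPATH *\. *WPINC *\. *'\([^']*\)'.*/\1/p" "$root/wp-settings.php")
  # Word splitting on $paths is intentional: one path per word.
  for rel in $paths; do
    if [ ! -f "$root/wp-includes$rel" ]; then
      echo "missing: wp-includes$rel"
      status=1
    fi
  done
  return "$status"
}
```

Run against the broken site root, this would print missing: wp-includes/class-wp-locale.phpp and return a non-zero status. A syntax check alone would not have flagged the line, since the PHP statement itself is valid.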
Resolution
The patch was a simple fix of the typo: removing the trailing p.
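The engineer edited the file by hand; the same one-character fix could also be scripted, which helps if the typo has to be patched on several servers. The function name below is hypothetical, and the default path is the one named in the issue summary.

```shell
#!/usr/bin/env bash
# Hypothetical scripted version of the manual fix: replace the misspelled
# extension in place and keep a .bak copy of the original file.
fix_phpp_typo() {
  local file="${1:-/var/www/html/wp-settings.php}"
  sed -i.bak 's/class-wp-locale\.phpp/class-wp-locale.php/' "$file"
}
```

The pattern is deliberately narrow (it names the exact file) so that no other occurrence of phpp in the file is touched.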
Prevention
This outage was not a web server error; it was an application error. To prevent such outages going forward, keep the following in mind:
- Test the application before deploying. This error would have surfaced and could have been addressed earlier if the app had been tested before deployment.
- Uptime monitoring. A monitoring tool such as Uptime Robot or Sumo Logic should be attached to each web server to alert immediately when the website goes down. Uptime monitoring software sends requests to different endpoints on the system and verifies that they all return a 200 status code.
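Both points come down to requesting a page and checking for a 200. A minimal sketch of such a check follows, usable as a pre-deploy smoke test or from a scheduled job. It is not a substitute for a hosted monitoring service, and the function names and the 10-second timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: request each URL and fail if any does not
# answer with HTTP 200.

# Succeeds only for the status code 200.
status_ok() {
  [ "$1" = "200" ]
}

# Probe every URL given as an argument; return non-zero if any check fails.
probe() {
  local url code
  local failed=0
  for url in "$@"; do
    # curl prints 000 when no HTTP response was received at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
    if status_ok "$code"; then
      echo "OK   $code $url"
    else
      echo "FAIL $code $url"
      failed=1
    fi
  done
  return "$failed"
}

# Example: probe http://127.0.0.1/
```

During the outage described here, probe http://127.0.0.1/ would have reported FAIL 500 and returned a non-zero status, which a deploy script or scheduler could turn into an alert.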