A PHP and SQL Server project is migrated from Windows to Linux (tested on Debian and Ubuntu). Records that used to show on the Windows server are now invisible. Mysteriously, the "count" function still returns the correct count.
Here is the minimum setup to reproduce the issue. First, the data:
rec_id | rec_name |
35 | Hello |
62 | Straße |
70 | Õlu |
On Windows, we connect to SQL Server through sqlsrv_connect:
$config = array(
'UID'=>'SA',
'PWD'=>'ABC123'
);
$db = sqlsrv_connect('localhost', $config);
Then we count the records:
$query = "select rec_id from recs";
$rs = sqlsrv_query($db, $query, array());
$count = sqlsrv_num_rows($rs);
The $count is 3 in both Windows and Linux.
Next we list out the records:
while ($myrow = sqlsrv_fetch_array($rs, SQLSRV_FETCH_ASSOC)){
echo $myrow['rec_id']."\r\n";
}
Run the code in the command line. Not surprisingly, both Windows and Linux prints out the following:
35
62
70
What happens next, however, is baffling. Let's select both the ID and Name fields:
$query = "select rec_id, rec_name from recs";
In Windows:
35 | Hello |
62 | Straße |
70 | Õlu |
In Linux:
35 | Hello |
Although it is easy to spot out the special characters in the two missing rows and suspect UTF-8 encoding to be part of the issue, it's difficult to pinpoint *where* the data got "lost". Take a look at the ß character - if this were a storage issue, the PHP side should still get either an over encoded "ß", or an under encoded "?". The records should not simply disappear.
Running sqlcmd we see that the records are not missing. This has to be on the PHP end. Let's revisit the connection setup, and add a default CharacterSet setting:
$config = array(
'UID'=>'SA',
'PWD'=>'ABC123',
'CharacterSet'=>'SQLSRV_ENC_CHAR'
);
$db = sqlsrv_connect('localhost', $config);
The above code behaves the same. Though we can now discuss the default value of SQLSRV_ENC_CHAR. PHP's official documentation says that the code page of the Windows locale is used, and any multi-byte characters are substituted with a single-byte question mark (?) character.
Well, this is simply not true. The "code page of the Windows locale" is not defined in Linux, and when SQL Server gets the directives of "send data however you want" from SQLSRV_ENC_CHAR, it returned an encoding that caused severe allergic reaction on PHP's side.
The one line solution (highlighted in red):
$config = array(
'UID'=>'SA',
'PWD'=>'ABC123',
'CharacterSet'=>'UTF8'
);
$db = sqlsrv_connect('localhost', $config);
It's worth pondering why this issue is not talked about elsewhere, given its simple fix. If the web app runs solely in Linux, whichever code page php-sqlsrv and SQL Server ends up exchanging data with would be functional. In fact, running SQL Server on Windows or Linux would not make a difference. The difference is in interpretation, i.e. the client side. If a project is deployed on a Windows stack (SQL Server, IIS, Windows), it's likely that every other component is also immersed in the Windows ecosystem. Switching the web layer from Windows to Linux is an uncommon maneuver, and therefore the issue with the default code page setting went undetected.
In light of this discover, we have patched the sqlsrv connector in Gyroscope's connector collection. The same connector in SQL Dash is also patched.