Some classes interact with a database via a connection object. How to test? You could mock the connection and set expectations something like this:
<?php
class MyTestCase extends UnitTestCase {
function testSomething() {
...
...
$connection->expectOnce(
'query',
array('SELECT foo FROM bar'));
...
...
}
}
However, this makes tests very sensitive. The same query in lower case: "select foo from bar" would fail despite being valid sql with exactly the same meaning. In a more complex query, different orders of "AND" clauses would cause similar problems. And so on. Simple string comparisons over-constrain possible solutions and hence don't work very well as a means of specifying sql queries.
What we really need is a new expectation class. A nice feature of SimpleTest is the ability to throw in an expectation object in place of a parameter:
<?php
class MyTestCase extends UnitTestCase {
function testSomething() {
...
...
$connection->expectOnce(
'query',
array(new SqlExpectation('select foo from bar')));
...
...
}
}
However, SqlExpectation would need to be clever enough to know when different strings are the same sql query and, frankly, that's not something I would ever want to write, or ever expect to be wholly bug free.
A better option is to forget about mocks. Instead, use a real connection object and a real database. The dbms has a query parser all ready to go.
Databases, tables and data can easily be set up and torn down for each test. If you keep "show create table.." dumps in version control and use these to build fixtures, you've got the added advantage of verifying that the sql statements defined by the tested class are valid in the context of the target database. Although "select foo from bar" might be valid sql it will fail if there is no "bar" table. Or perhaps the real database has a tinytext field which is just too tiny and data is being lost. Mocked connections and simple string expectations cannot detect these kinds of problems.
Note that code written for database fixture set up is often useful in installer scripts.
Watch out for connection state: affected rows, current error, currently used database etc. If the connection used for fixture set up is also passed to the tested class, connection state left over from fixture queries could contaminate the test. Similarly, a series of tests (ie new test methods) which all use the same connection object can contaminate each other. In practice, the risk is low but anything to do with data storage probably ought to go the extra mile.
Doing it this way means that you could end up with hundreds of connections on the go in a large test suite and these should always be closed when you're finished with them. If you don't, you might crash the db server.
What sort of data should you test with? A unit test ought to make a specific point and usually you only need some minimal data to do that. For example, you could try saving an "O'Reilly" value to see if sql strings are being escaped. A micro fixture might create just one table out of dozens in the production database and just a couple of rows of data rather than tens of thousands. All you need to make a point.
Real data is more complex and sometimes this can reveal requirements which you have failed to assert. However it's unpredictably complex — it might not contain an "O'Reilly" at all. Therefore I think it's best to use your own sample data in unit tests ie small, simple fixtures finely tuned to express a specific requirement. If, occasionally, you do run some real data past the tests, and find something you've missed, nail it down with a new, minimal unit test.
Testing with a real database involves some furious server activity. Databases, tables and data are created and destroyed in setUp() and tearDown() hundreds of times per second. The word "destroyed" should get your careful attention. If there are other projects on the server a simple mistake with a database name could lead to the wrong item being dropped at tear down. That may not be so bad on a local development box (you do backup regularly..?) but tests may also be run on a production server where real, live data is at risk. And your job.
Since the tested code could be malfunctioning in almost any way imaginable, ideally some kind of sandbox should be set up on the db server to protect non-fixture data. One way to do this is to create a new connection class which decorates the standard one with some authorisation checks. There is one simple rule: deny any query which attempts to manipulate something which was not created by the test case. Read-only access doesn't need to be restricted. This is one of the ideas behind DataAccessTestCase.
Once you have some authorisation code set up to monitor creates and drops, it's fairly easy to add an automatic fixture clear up feature — a nice bonus which almost comes for free.