David Négrier CTO

Guys,

I write this blog article because I want you to know fully who I am. There is a big part of me I kept mostly hidden and ashamed of.

...

I'm attracted to active records :) .

I know that might sound like a shock to you, the straight PHP developers. But not everybody is attracted to data mappers and everybody has a kink :)

Does it mean I'm shying away from the data mapper design pattern? No of course. Data-mappers are very appealing and I love using Doctrine too. I'm just bi-curious.

How long have I known? For me, it started in 2006, with my first relationship (database). Most of the code I wrote was boilerplate code to access each database column so I developed my first ORM. At that time, there was few ORMs in PHP, no packages, no Composer, no PSR-4. The best you could do was to ship a huge PHP file with a bunch of classes and copy/paste it from project to project. This is how TDBM was born.

Working on a bunch of projects, TDBM grew better and better. And when solid ORM frameworks were released (like Propel or Doctrine), instead of switching to those, I kept improving TDBM with my pals at TheCodingMachine. We tested, refactored, tested again, added unit tests, integration tests...

Internally, we have been using it on hundreds of projects. It went through several complete rewrites, and 5 major revisions... and still, it lived in the shadows.

But no more! TDBM is wonderful and we are going to share it.

Updated on May 14th: replaced "pivot table" wording with "association table".

TDBM 5, the basics

So you read countless articles on best practices and you have been told that your domain model does not match your database schema.

😮😮😮 But what if it did? 😮😮😮

Or what if we could write a tool that was clever enough to "understand" simple to moderately complex database models and that could map them to PHP objects.

This is the challenge we fixed to ourselves when we started TDBM... and we went quite far.

A database first ORM

TDBM understands your database model... so you naturally start with your database.

Does it work?

Of course it works! But it is making a strong assumption: your database model and your object model are closely related. Depending on your project, this can be true or false. Here, at TheCodingMachine, we have found this assertion to be mostly true for small to medium projects and often false for very large projects. So we use TDBM on most of our projects, and for the big ones, we use Doctrine ORM.

So TDBM is generating 2 kinds of classes:

  • Models (we call them "beans") that are objects mapping a column in your database (a model is like an entity in Doctrine)
  • DAOs that are objects containing methods to access a given table (a DAO is like a repository in Doctrine)
// DAOs are used to query the database
$user = $userDao->getById(42);

// Models have getters and setters for each column
$login = $user->getLogin();

Because the PHP classes are generated (we don't use magic methods), your favorite IDE (PHPStorm, Eclipse PDT, Netbeans...) can provide autocompletion for you. This is a super improvement over other ORMs like Eloquent.

Generated code uses the PHP 7.1 type-hinting system to provide the maximum level of type safety possible in PHP.

Furthermore, all non-null columns in your database MUST be passed in the model's constructor. So you cannot create an object that is impossible to save in database.

// If columns username and email are not nullable,
// the constructor will require those.
$user = new User('foo', 'foo@bar.com');

See? Just by analysing the data model, we can already know what deserves to be in the constructor or not.

But this goes way further.

Navigating the database model

TDBM analyses the foreign keys of your database model.

From those keys, TDBM can sort out the relationship between your objects and offer powerful getters / setter.

// Let's get the name of the country of the current user
$user->getCountry()->getLabel();

// Let's get all users in a given country
$users = $country->getUsers();
This encourages TDBM users to build foreign keys in their database model, so this is actually promoting good database design practices.

But it is going even further. TDBM can detect your association tables.

Internally, TDBM thinks:

Boy, the users_roles table is having only 2 columns, with 2 foreign keys... this must be an association table!

When TDBM detects an association table, it will not generate any model or DAO for that table. Instead, you get a direct relationship between the 2 models:

// Let's directly access the list of roles from a user
$roles = $user->getRoles();

// Let's set all the roles at once
$user->setRoles($roles);

// Or one by one
$user->addRole($role);

$user->removeRole($role);

If you are used to Doctrine, you will notice there is no need to define an "owning" side. Relationships work both ways!

Giving meaning to indexes

TDBM does not only look at foreign keys. It also closely analyses your indexes. Indexes are very valuable because they tell TDBM which columns are likely to be used for searching.

Let's imagine your users table has an index on the login column:

CREATE UNIQUE INDEX users_login_idx ON users (login); 

TDBM looks at this index and it thinks:

Man, the login column is indexed. I'm pretty sure the user will try to find a user by his/her login! Also, I'm smart enough to notice this is a "unique" index, so the return of this method will probably be a User model.

And so, in the UserDao, TDBM will automatically add a nice findOneByLogin method:

$user = $userDao->findOneByLogin('foobar');
Once again, this encourages good database design practices. Creating an index means having to write less code for the developer. So this promotes better performances while saving developer's time. It's a win-win!

Also, remember the findOneByLogin method is actually generated. It does not come from magic methods (like __call that is extensively used in Doctrine ORM repository or in Eloquent). Therefore, you get the benefit of auto-completion in your IDE.

Writing queries without all the hassle

Of course, there are times where you have to write complex queries.

Existing ORMs have 2 existing strategies here.

  • Either you use a query builder (as in Eloquent)
    DB::table('users')
      ->leftJoin('posts', 'users.id', '=', 'posts.user_id')
      ->get();
  • or you use a query language (like HQL with Hibernate in Java or DQL with Doctrine ORM).
    SELECT u, a FROM User u JOIN u.address a WHERE a.city = 'Berlin'

    (actually, Doctrine also comes with a query builder for DQL)

TDBM comes with none of those.

Personally, I do understand that custom query languages (like DQL) are very powerful, because you query the object model instead of querying the database model. Yet, this means learning yet another language, with all its quirks and complexities. If one day, you could understand DQL without knowing SQL, I would probably look forward to those languages. But in my experience, you have to be an expert in both SQL and DQL to understand how to use DQL properly (especially when you start working with custom functions).

Ok, so we implemented neither query builders nor DQL... what are we doing?

We do SQL! Or actually, we simply require the "WHERE" clause of your SQL query.

First of all, it's important to understand that TDBM encourages you to put all the query related code in DAOs. So the SQL is clearly scoped in the DAOs and should be nowhere else in your code. And since all the DAOs are generated for you, you have no reason not to use them.

class UserDao extends AbstractUserDao {
    /**
     * Returns the list of users starting with $firstLetters
     *
     * @return User[]
     */
    public function findUsersByLetter(string $firstLetters) {
        // The find can be used to retrieve a list of User
        // It takes in parameter a SQL filter string and a list of parameters.
        return $this->find("name LIKE :name", [ "name" => $firstLetter.'%' ]);
    }
}

We are in the UserDao. Therefore, we require objects from the "users" table. Did you notice the filter?

The query above will generate this SQL query:

SELECT users.* FROM users WHERE users.name LIKE :name
TDBM will always select all the fields from your object. In the example above, TDBM will retrieve all fields from the User model (SELECT users.*). There is no way to tell TDBM: "hey, give me only the name and the login". There is a good reason for that. TDBM is an ORM (Object Relational Mapper). It maps tables to objects. An object is supposed to be self contained and to have a valid state at any point in time. If TDBM could build objects with half the rows, the state of your objects could be invalid.

In practice, we have never felt a need for fetching only a part of the columns (more later about the optimizations we have). If you have a website with very very high load and you find those partial queries to be important, you might want to have a look at Eloquent instead (or do pure SQL).
Also, now is a good time to speak about the way code is generated. Did you notice that UserDao extends from AbstractUserDao? When you generate code in TDBM, it will split the code in 2 classes.
  • UserDao is your class. You are free to do whatever you want in your class. TDBM will generate it only once.
  • AbstractUserDao is owned by TDBM. You are not allowed to touch it as it will be regenerated each time you decide to change you schema.

So you write SQL to query the database. Since TDBM is a "database first" ORM, there is nothing shocking here.

But wait, this is getting more interesting...

Discarding unused parameters with Magic-Parameters

... or how we replaced the query builder by something totally different.

So traditional ORMs use query builders to dynamically add search filters in queries.

Let's say you want to run this query, with a bunch of filters that can be added, or not:

SELECT * FROM users 
WHERE (name LIKE :search OR login LIKE :search) 
    AND country_id = :country_id 
    AND status = :status

In Doctrine:

In Doctrine, you would typically build the query like this:

$qb = $em->createQueryBuilder();
$qb->select('u')
   ->from('User', 'u');
if ($search !== null) {
    $qb->andWhere('u.name LIKE :search OR login LIKE :search')
       ->setParameter('search', '%'.$search.'%');
}
if ($countryId !== null) {
  $qb->andWhere('u.country_id = :countryId')
     ->setParameter('countryId', $countryId);
}
if ($status !== null) {
  $qb->andWhere('u.status = :status')
     ->setParameter('status', $status);
}

In Eloquent:

In Eloquent, you would typically build the query like this:

$users = DB::table('users')
    ->when($search !== null, function ($query) use ($search) {
        return $query->where('name', 'like', '%'.$search.'%')
            ->orWhere('login', '%'.$search.'%')
    })
    ->when($countryId !== null, function ($query) use ($countryId) {
        return $query->where('country_id', $countryId);
    })->when($status !== null, function ($query) use ($status) {
        return $query->where('status', $status);
    })->get();

In TDBM:

// The find method can be used to retrieve a list of User (because it is part of the UserDao)
// It takes in parameter a SQL filter string and a list of parameters.
$users = $this->find("(name LIKE :search OR login LIKE :search) 
        AND country_id = :country_id 
        AND status = :status", 
    [ 
        'search' => $search ? '%'.$search.'%' : null,
        'country_id' => $country_id,
        'status' => $status  
    ]);

See? Pure SQL. No query builder, no conditions.

No conditions??? TDBM has this absolute kick-ass feature called "MagicParameters". It will automatically strip from the WHERE clause any parameter that is set to null.

Behind the scene, TDBM:

  • parses the whole SQL query,
  • builds an abstract syntax tree (AST) out of it,
  • analyzes the NULL parameters,
  • strips the parameters from the tree
  • and finally rebuilds the modified SQL query from the AST.



And that, my friends, will change the way you write filters for a data-grid forever.

🤘 BOOM! 🤘
If using TDBM is not an option for you but if you are still interested into this magic-parameters feature and all the SQL to AST voodoo, you will be pleased to know that we extracted this feature in a third party library named Magic Query. And it actually does much more! Check it out!

Managing JOIN clauses

Of course, most of the time, you want to perform queries on several tables and you have to write the JOIN clauses.

Writing JOINs is tedious. And most of the time, the JOINs we are following are obvious (you simply need to follow the foreign keys). So what if your ORM could simply understand the model and propose the right joins?

Let's see this through an example:

This model is quite common (a user has several roles, a role has several rights).

I want to fetch all the rights for a given user. TDBM can automatically find the path between the users and the rights table.

So your query will look like this in the RightDao:

class RightDao extends AbstractRightDao {
    /**
     * Returns the list of rights for a given user
     *
     * @param User $user
     * @return Right[]
     */
    public function getRightsForUser(User $user) {
        // Behold the magic!
        return $this->find($user);
    }
}

Behind the scene, TDBM analyzes your DB model and finds the shortest path between all linked tables.

🤘 Re-BOOM! 🤘
And now is the right time for a quick reminder: "With Great Power Comes Great Responsibility". TDBM finds the shortest path but along the life of your application, your model may evolve and when the model evolves, the shortest path can evolve too! In the many years I've used TDBM, I've been bitten by this problem 3 times. You have been warned, if you use this feature, it is better to shield your code with integration tests. Also, TDBM deals poorly with models that have "loops". For these models, you should go back writing the JOINs.

... more about JOINs in the TDBM documentation

Integrations

Since version 5, TDBM is framework-agnostic. You can use it in the framework of your liking (or even without a framework).

Out of the box, we provide integrations with:

So far, we have been reluctant to work on a Symfony integration, because the Symfony community is generally very fond of the data-mapper pattern and we were kind of afraid of the welcoming TDBM might get there. But we are done with the impostor syndrome so we will be building a Symfony bundle for TDBM 5.1.

Avoiding the active record pitfalls

TDBM has been developed for more than 10 years. In this time lapse, the ORM landscape has particularly changed. We are well aware of the pitfalls of the "active record" model. We understand why Doctrine is advocating the use of the data-mapper design pattern, but this is not the path we chose.

But most of the pitfalls attributed to active record ORMs don't apply to TDBM.

Here is a list of what we did to avoid those pitfalls:

Active records are not testable

TDBM models can be created using the new keyword. So you can definitely use those in unit tests without hitting the database.

The active record pattern violates the single responsibility principle (objects used to query the database are the same that hold data)

Not in TDBM. TDBM does a strict distinction between Repositories (also called DAOs) that are used to perform the query and Models (also called beans or entities) that map rows. In that regard, TDBM is not an active record ORM.

Active records leads to anemic models

It is true that TDBM generates getters and setters for all columns. But:

  • all non-nullable columns are part of the object's constructor so you cannot create an object with missing non-nullable columns.
  • the model is split in 2 classes. For instance: User and AbstractUser. You can safely modify the User class to add or extend any method in it. It is therefore easy (and recommended) to provide a rich model.
We are working on adding more tools to help you avoid the anemic model. The next version of TDBM will contain a feature to avoid generating getters and setters on some columns.

One pitfall will however be true and remain true in the foreseeable future: TDBM uses your database model to generate your model! That means your model is tied to your ORM (unless you design a bridge between the beans and your true model).

If you are looking for a framework-agnostic model, data-mappers are the way to go. Look at Doctrine (with XML mapping files for a truly framework-agnostic model).

Performances

A lot has been done to have great performance. TDBM is caching most of heavy-lifting done on queries (AST parsing, shortest path finding, etc... are cached).

But the most significant feature regarding performance is TDBM's support of weak references (with the help of the "weakref" PHP extension).

Weak what???

Weak references. Let me explain.

Have you ever tried to use an ORM in a batch that manages a million objects? Did it go well? I bet it didn't!

When an ORM is fetching an object from the database (for instance a "User" with ID 42), it keeps internally a reference to this object. This way, if the program queries the same object again, the ORM will not query the database again but it will return the reference it already has.

In Doctrine, this is the role of the EntityManager class.

Now, if you start fetching tens of thousands of objects from the database, the ORM will keep a reference on all these objects (even if you don't need those!). Your memory will slowly fill up until no more memory is available.

With ORMs like Doctrine, the only workaround I'm aware of is to manually "detach" the object from the EntityManager. But this must be done manually by the developer and this ties a bit more your application to the ORM.

TDBM, on the other hand, has a support for weak references. Using the "weakref" extension, it can automatically free the memory of your objects as soon as your application don't need them anymore!

This means you can actually use TDBM in scripts to deal with large quantity of data without sacrificing performance or memory.

Other features

This article is already quite long and there are still some features we haven't talked about.

You might be interested to know that TDBM has native support for:

  • object inheritance: you can have models/beans that inherit from one another
  • pagination using beberlei/porpaginas: allows to paginate a TDBM ResultSet with any compatible paginator
  • UUIDs: TDBM can generate UUID for you
  • customisable naming strategies: If you want to fine-tune the naming of your classes and methods
  • JSON serialization: TDBM does its best to offer meaningful default serialization for models (but you should still consider writing your own)

Also worth to mention, TDBM has no support for:

  • aggregating data (SUM, AVG, HAVING, ...): TDBM is an ORM and aggregated data does not belong to the model objects. You can still use the underlying Magic Query library to get most of the amazing features of TDBM with your regular SQL queries.

Conclusion

So this was TDBM's coming out :)

Remember... there is no silver-bullet. Depending on your application, TDBM might be a great fit, or not so great. For small to moderately complex projects, we really enjoy using this tool at TheCodingMachine. It helps us being productive.

And I hope this article will give you the desire to give it a try.

Also, now is a good time to thank all the maintainers of the packages we are using in TDBM, and in particular the maintainers of the Doctrine DBAL library (the low level database abstraction library TDBM is built upon).

And as any other open-source project, we definitely welcome contributions!

Want to stay tuned on the latest TDBM releases or PHP related news? Follow me on Twitter!

About the author

David is CTO and co-founder of TheCodingMachine and WorkAdventure. He is the co-editor of PSR-11, the standard that provides interoperability between dependency injection containers. He is also the lead developper of GraphQLite, a framework-agnostic PHP library to implement a GraphQL API easily.