Friday, May 13, 2022

[FIXED] DDD: choose relationship or only id reference with JPA/Hibernate

May 13, 2022 domain-driven-design, hibernate, java, jpa

Issue

Here is a situation that confuses me.

I have two tables: users and articles. One user can write multiple articles, and one article can have only one author. Based on this, I have two entities:

class User {
  long id;
  String username;
}

class Article {
  long id;
  String title;
  String content;
}

If I follow the JPA style, the Article should be like this:

class Article {
  long id;
  String title;
  String content;
  @ManyToOne
  User author;
}

This makes the query service quite easy. For example, I may have a query service method such as fetchNewestArticlesWithUserInfo(Page page). The @ManyToOne is quite useful for mapping the Article into an ArticleDTO that includes a UserDTO.

interface ArticleDTO {
  long getId();
  String getTitle();
  UserDTO getAuthor();
}

interface UserDTO {
  long getId();
  String getUsername();
}

interface ArticleRepository {
  @Query("select a from Article a left join fetch a.author")
  Page<ArticleDTO> fetchNewestArticlesWithUserInfo(PageRequest page);
}

But if the User entity gets more and more complex in the future, fetching every Article together with its author (@ManyToOne is eagerly fetched by default) seems quite unnecessary.

If I follow the DDD constraint of no object references between aggregates, it should look like this:

class Article {
  long id;
  String title;
  String content;
  // Plain id reference: @ManyToOne is only valid on an entity-typed field.
  long authorId;
}

This makes the Article look clean (in my opinion) and easy to build even if the User becomes more complex. But it makes the query service much harder to implement: you lose the benefit of the relationship in JPQL and have to write the DTO-assembling code yourself.

class ArticleQueryService {
  private ArticleRepository articleRepository;
  private UserRepository userRepository;

  Page<ArticleDTO> fetchNewestArticlesWithUserInfo(PageRequest page) {
    Page<Article> articles = articleRepository.fetchArticles(page);
    // Load all authors of this page in one query and key them by id.
    Map<Long, UserDTO> users = userRepository.findByIds(articles.stream().map(Article::getAuthorId).distinct().collect(toList()))
      .stream().collect(toMap(UserDTO::getId, u -> u));
    // Page.map keeps the paging metadata; ArticleDTO is assumed to be a concrete class here.
    return articles.map(a -> new ArticleDTO(a.getId(), a.getTitle(), users.get(a.getAuthorId())));
  }
}

So which one should be used? Or is there any better idea?

Solution

The problem

The write/command and read/query needs are orthogonal and pull the model in opposite directions. That creates tension in a unified model and can (and often does) lead to a huge mess.

Commands

On the one hand, you want aggregate roots (ARs) to be very behavior-focused and only own the minimal amount of data necessary to enforce invariants in a strongly-consistent way. That makes for a model that is easy to test, that's scalable, concurrency-friendly and lets you immediately identify which data is part of the transactional boundary. When ARs are properly modeled the commands will generally involve a single AR per transaction.
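
As a rough illustration of such a behavior-focused aggregate, here is a minimal sketch in plain Java (no JPA annotations). The rules it enforces (only the author may rename, an empty article cannot be published) and the method names are assumptions made up for the example; they are not part of the original question.

```java
import java.util.Objects;

// Hypothetical command-side aggregate root: it owns only the data it needs
// to enforce its own rules and refers to the author by id, not by object.
final class Article {
    private final long id;
    private final long authorId; // id reference to the User aggregate
    private String title;
    private String content;
    private boolean published;

    Article(long id, long authorId, String title, String content) {
        this.id = id;
        this.authorId = authorId;
        this.title = requireNonBlank(title);
        this.content = Objects.requireNonNull(content, "content");
    }

    // Assumed rule: only the author may rename, and the title must not be blank.
    void rename(long requestingUserId, String newTitle) {
        if (requestingUserId != authorId) {
            throw new IllegalStateException("only the author can rename the article");
        }
        this.title = requireNonBlank(newTitle);
    }

    // Assumed rule: an article without content cannot be published.
    void publish() {
        if (content.isBlank()) {
            throw new IllegalStateException("cannot publish an empty article");
        }
        this.published = true;
    }

    long id() { return id; }
    long authorId() { return authorId; }
    String title() { return title; }
    boolean isPublished() { return published; }

    private static String requireNonBlank(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("title must not be blank");
        }
        return value;
    }
}
```

Everything a test needs is in the constructor arguments, and nothing about User has to be loaded or mocked to exercise the rules.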

Queries

On the other hand, queries tend to need to pull data across multiple ARs, which encourages defining each and every relationship as an object reference in the domain model. That totally works against our command-side goal. We are then left with a model where we need to optimize with lazy loading, a model that's quite opaque regarding which data is getting persisted when saving an object (you have to check cascade configs), that's harder to set up for tests, that introduces direct coupling between ARs, etc.

The solution? CQRS!

The solution is actually very simple and akin to why we have bounded contexts. Rather than attempting to make a single model fulfill different goals we can have two models: the command model & the query model.

That's generally referred to as Command Query Responsibility Segregation (CQRS). In its most complex and optimized form, CQRS could mean having an entirely different database (even a different kind of database) to process reads, which allows you to optimize indexes for reads rather than writes, de-normalize data to avoid joins, etc.
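
To make "de-normalize data to avoid joins" a bit more concrete, here is a small in-memory sketch of a read model whose rows already carry the author's name. All names (ArticleListItem, ArticleListProjection, the onXxx methods) are hypothetical, and a map stands in for a separate read database; how the projection gets notified of changes is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical de-normalized row: the author's name is copied into it,
// so listing articles needs no join at read time.
record ArticleListItem(long articleId, String title, long authorId, String authorName) {}

final class ArticleListProjection {
    private final Map<Long, ArticleListItem> rows = new LinkedHashMap<>();

    // Called after the write side has stored a new article.
    void onArticleCreated(long articleId, String title, long authorId, String authorName) {
        rows.put(articleId, new ArticleListItem(articleId, title, authorId, authorName));
    }

    // Called after a user changes their name: the copy in each of their rows is refreshed.
    void onUserRenamed(long userId, String newName) {
        rows.replaceAll((id, row) -> row.authorId() == userId
                ? new ArticleListItem(row.articleId(), row.title(), userId, newName)
                : row);
    }

    // Read side: returns the rows as they are, in insertion order.
    List<ArticleListItem> all() {
        return new ArrayList<>(rows.values());
    }
}
```

The price of the cheap read is visible in onUserRenamed: every copy of the duplicated value has to be kept in sync by the write path.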

Fortunately, most systems do not need that level of scalability (and complexity), and you can implement CQRS using a much simpler approach: a logical read/write segregation. In practical terms that generally just means having two sets of services or handlers: command services/handlers and query services/handlers.

For example, you may have a CommandOrderService and a QueryOrderService to process commands and queries respectively.

While the command services would usually load ARs from repositories, execute commands on those and save them back, the query services would be free to use any practical means to gather the data. Sometimes that means leveraging repositories and aggregating data at the application level, sometimes it means executing raw SQL, leveraging database-specific features and bypassing the domain model entirely.
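
The split described above might look roughly like the following sketch. It is plain Java with in-memory maps standing in for tables, and every name in it (ArticleAggregate, InMemoryStore, ArticleSummary, the two services) is an assumption for illustration rather than code from the question or from any framework.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical aggregate: behavior plus an id reference to its author.
final class ArticleAggregate {
    final long id;
    final long authorId;
    private String title;

    ArticleAggregate(long id, long authorId, String title) {
        this.id = id;
        this.authorId = authorId;
        this.title = title;
    }

    String title() { return title; }

    void rename(String newTitle) {
        if (newTitle == null || newTitle.isBlank()) {
            throw new IllegalArgumentException("title must not be blank");
        }
        this.title = newTitle;
    }
}

// Stand-in for the database: one "table" of articles, one of usernames.
final class InMemoryStore {
    final Map<Long, ArticleAggregate> articles = new HashMap<>();
    final Map<Long, String> usernames = new HashMap<>();
}

// Command side: load one aggregate, run behavior on it, save it back.
final class ArticleCommandService {
    private final InMemoryStore store;
    ArticleCommandService(InMemoryStore store) { this.store = store; }

    void renameArticle(long articleId, String newTitle) {
        ArticleAggregate article = store.articles.get(articleId);
        if (article == null) {
            throw new IllegalArgumentException("unknown article " + articleId);
        }
        article.rename(newTitle);
        store.articles.put(articleId, article); // a real repository would persist here
    }
}

// Read-only view shaped for the screen, not for enforcing rules.
record ArticleSummary(long id, String title, String authorName) {}

// Query side: reads the stored data directly and joins it with usernames.
final class ArticleQueryService {
    private final InMemoryStore store;
    ArticleQueryService(InMemoryStore store) { this.store = store; }

    List<ArticleSummary> newestArticles(int limit) {
        return store.articles.values().stream()
                .sorted(Comparator.comparingLong((ArticleAggregate a) -> a.id).reversed())
                .limit(limit)
                .map(a -> new ArticleSummary(a.id, a.title(), store.usernames.get(a.authorId)))
                .collect(Collectors.toList());
    }
}
```

In a real application the query service could just as well run a hand-written SQL join; the point of the sketch is only that the two services share a store but not a model.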

The point is, with that very simple command/query service split you can focus on optimizing the domain model for writes/commands and then resort to any data strategy you like to fulfill query needs without polluting your command processing flows. Query services tend to require a range of different dependencies and will often be much more coupled to the infrastructure, which isn't something you'd want for commands, but is a perfectly fine trade-off for queries.

There are many examples of such lightweight CQRS implementations in practice; for one, you can have a look at the application layer of the Collaboration bounded context (BC) in the Implementing Domain-Driven Design (IDDD) sample code on GitHub.

Challenges

Even though I made it sound so simple, you are still likely to face challenges. For instance, a different model for commands and queries means you can't easily re-use object specifications for both. If you used to model authorization rules as AR specifications, you now might have to duplicate those rules on the query side or write custom translators (e.g. spec to SQL).
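
One possible shape for such a translator is a specification that can both be evaluated in memory and be rendered as a SQL fragment. The sketch below is only an illustration: the rule itself (a user sees an article if it is published or if they wrote it), the type names and the column names published and author_id are all assumptions.

```java
import java.util.List;

// Minimal facts the rule needs; a hypothetical stand-in for a loaded aggregate.
record ArticleFacts(long authorId, boolean published) {}

// SQL text plus bind parameters, so no value is concatenated into the SQL.
record SqlFragment(String sql, List<Object> params) {}

// Assumed rule: a user may see an article if it is published or if they wrote it.
final class VisibleToUser {
    private final long userId;

    VisibleToUser(long userId) { this.userId = userId; }

    // Command side: evaluate the rule against an object already in memory.
    boolean isSatisfiedBy(ArticleFacts article) {
        return article.published() || article.authorId() == userId;
    }

    // Query side: the same rule as a WHERE-clause fragment (column names are assumptions).
    SqlFragment toSql() {
        return new SqlFragment("(published = TRUE OR author_id = ?)", List.of((Object) userId));
    }
}
```

Keeping both forms in one class at least puts the duplicated rule in a single place, although nothing here checks that the two forms stay equivalent.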

Another common challenge is mapping complex specialized hierarchies. For instance, you might have a case management system with hundreds of different case specializations, each with its own schema. Manually crafting queries to load the data and then mapping those graphs could be tedious. For that reason I sometimes use dedicated query entities (not the domain model) where I map object relationships and let the ORM do the work.

Sometimes, you may even store JSON in the DB and leverage JSON-indexing features of your DB to process queries, etc.

In the context of Spring specifically, you may need additional boilerplate to integrate Pageable with hand-made queries or even a JPAQuery written with Querydsl.

As you can see, there is no one-size-fits-all strategy to process queries, and that's fine because it's carefully abstracted away in a different logical model where you can do whatever works.

Conclusion

You can't imagine how often I could have written a query in 2 minutes and mapped it manually to a DTO (and sometimes have), but instead got drawn deep into forcing it to work through Spring Data with awful annotations, ending up with a sub-optimal and overly complex solution.

Queries are also so much easier to process when looking at a homogeneous data model. Ever tried to query specialized types where the root of the hierarchy didn't own the data you need? It's very impractical with ORMs.

Anyway, in my experience lightweight CQRS has always been better than running queries through the domain model, despite the new challenges that can come with it.

Answered By - plalx
Answer Checked By - Pedro (JavaFixing Volunteer)

This Answer collected from stackoverflow and tested by JavaFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0