Devlane Blog
Data Engineering

Data Modeling: FIFA World Cup 2022

Want to build a data model with the World Cup teams and players? Learn how to do it on this article.

Matias Mendoza
Matias Mendoza
January 2023

It’s World Cup time and we’re so excited about it!

So it’s a good time to see how we could model on a database the event, think about the entities and how they relate to each other.

Before starting, I want to introduce two abbreviations we’re going to use on each table to make it more clear:

  • PK - Primary Key: This is to identify the field that makes each row on the table unique. In this case, it’s always the id row.

  • FK - Foreign Key: This is to identify the field that relates a table to another. In practice, this is the id of another entity.

When starting to think about a model, a good approach is to first tackle the smallest pieces of data, tables that don't make much sense alone but will serve to other ones.


First, we have to start with the participants of the event, the countries.

We will include the name as string and also a string for the flag, which represents the image url.


The stadium will only have its name since it is just a fun fact for reference that does not give too much value in this case.

Match type

This entity is needed to identify whether a match is part of the group stage or knockouts.

That is why it does not have more than a name to represent it.

After defining the smaller entities, we can continue with the second level which includes the first ones.


This entity not only contains information about the player like name, the shirt number and the birth date but it also links each player with their corresponding country.


As expected, this is one of the most important entities, this is where all happens.

The match has a date and it is linked with the two participant countries and the stadium.

It does not contain a result or some other information because we will calculate all of this dynamically.


Finally, the entity that does everything to make sense is the goal.

It contains the date with the minutes and seconds when it was done, the player (from where we can infer the country) and the match.

Something to take in consideration is that in this example, we are not taking care about performance, since there is no context on which application this could be used. That’s why, all the values that can be calculated dynamically, were not represented on the tables.

Finally, if we take a look at the entire database model, it would be something like this:

We can notice that countries and matches have a many to many relationship but in this case we don’t need a table for the relation because the match has a specific number of countries on it.

Just to finish, I would like to comment that there is more than one right way to data model anything, so in this article I present, based on my experience and with no application context the best approach I consider, which could not be the best for other developers and that’s fine.

Thanks for reading and keep enjoying the World Cup 2022!