Data Viz Competition 2024

Competition: 2024 Love Data Week BTAA Data Viz Competition

The LEGO database was originally compiled to help people who already owned some LEGO sets figure out which other sets they could build with the pieces they had. My research question examines set compatibility over the years by finding the LEGO sets that maximize the number of other sets that can be built from their parts. For a set to be buildable, its parts must be a subset of the owned set's parts, with each part needed in no greater quantity than the owned set provides. I treat the same part in different colors as distinct parts. Additionally, only non-spare parts are taken into account, since spare parts are not needed to build a set. This research can be useful in cases such as when a child is limited to asking for a single LEGO set for Christmas and wants to pick the one that gives them multiple sets in one.
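
The buildability rule above amounts to a multiset-containment check. The sketch below is a minimal illustration, assuming each inventory row carries a part number, a color ID, a quantity, and a spare flag, loosely modeled on the columns of the database's inventory_parts.csv (the exact schema is an assumption). The function names `part_counts` and `can_build` are hypothetical, and the data is toy data.

```python
from collections import Counter

# Assumed row layout: (part_num, color_id, quantity, is_spare), loosely modeled
# on the inventory_parts.csv table; the exact schema is an assumption.
Row = tuple[str, int, int, bool]


def part_counts(rows: list[Row]) -> Counter:
    """Total non-spare quantity per (part_num, color_id), so colors stay distinct."""
    counts: Counter = Counter()
    for part_num, color_id, quantity, is_spare in rows:
        if not is_spare:  # spare parts are not needed to build a set
            counts[(part_num, color_id)] += quantity
    return counts


def can_build(owned: Counter, target: Counter) -> bool:
    """True if the owned set has every part of the target in at least the needed quantity."""
    # Counter returns 0 for missing keys, so a part the owner lacks fails the check.
    return all(owned[key] >= qty for key, qty in target.items())


# Toy data with hypothetical quantities.
owned = part_counts([("3001", 4, 6, False), ("3003", 1, 4, False), ("3001", 4, 1, True)])
smaller = part_counts([("3001", 4, 2, False), ("3003", 1, 4, False)])
other_color = part_counts([("3001", 1, 1, False)])

print(can_build(owned, smaller))      # True
print(can_build(owned, other_color))  # False: same part, different color
```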

Finding the optimal set is a complex process due to the large volume of data and its inconsistencies. For each set, we would have to check every other set for buildability in order to find the maximum. The problem becomes even harder if we increase the number of owned sets, because we would need to go through every combination of sets. Hence, I opted for a different strategy that yields an approximation: an algorithm that scores each set using weighted factors such as the number of parts it contains and the diversity of those parts. Assuming Santa can build any set, that is, sets do not have to be accessible in an inventory (set_num does not have to appear in inventory_sets.csv), the output shows that LEGO set '9609-1' (Technology Resource Set), released in 1995, is the optimal one to buy. With this set, we would be able to build 71 other existing sets. The graph below shows these buildable sets distributed by year of creation.
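
The exact weights are not given here, so the following is only a hypothetical sketch of the approximation strategy: rank every set by a weighted score of its total part count and its number of distinct (part, color) keys, then run the exact buildability count only for the top few candidates. The weights `w_total` and `w_distinct`, the cutoff `top_k`, and all function names are illustrative assumptions, not the values or code behind the '9609-1' result.

```python
from collections import Counter


def score(parts: Counter, w_total: float = 1.0, w_distinct: float = 2.0) -> float:
    """Hypothetical heuristic: reward many pieces and many distinct (part, color) keys."""
    return w_total * sum(parts.values()) + w_distinct * len(parts)


def can_build(owned: Counter, target: Counter) -> bool:
    """True if every part of the target is available in the owned parts."""
    return all(owned[key] >= qty for key, qty in target.items())


def count_buildable(candidate: str, inventories: dict[str, Counter]) -> int:
    """Exact count of other sets whose parts fit inside the candidate's parts."""
    owned = inventories[candidate]
    return sum(
        1
        for other, parts in inventories.items()
        if other != candidate and can_build(owned, parts)
    )


def best_set(inventories: dict[str, Counter], top_k: int = 3) -> tuple[str, int]:
    """Score every set, then check only the top_k highest-scoring ones exactly."""
    ranked = sorted(inventories, key=lambda s: score(inventories[s]), reverse=True)
    return max(
        ((s, count_buildable(s, inventories)) for s in ranked[:top_k]),
        key=lambda pair: pair[1],
    )


# Toy inventories keyed by hypothetical set numbers.
inventories = {
    "A-1": Counter({("3001", 4): 10, ("3003", 1): 8, ("3004", 15): 4}),
    "B-1": Counter({("3001", 4): 2, ("3003", 1): 2}),
    "C-1": Counter({("3004", 15): 4}),
    "D-1": Counter({("3001", 1): 3}),
}
print(best_set(inventories))  # ('A-1', 2)
```

Because the exact count is computed only for `top_k` candidates, the work drops from comparing every pair of sets to comparing a handful of sets against all others, at the risk of missing the true optimum if the score ranks it low.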

As expected, the sets closest in time to the optimal set are the most compatible with it. The earliest sets aren't buildable because they are quite old compared to the optimal set, and few recent sets are buildable because sets have become increasingly complex. The graph below shows that every year LEGO releases sets containing more parts never seen before. Consequently, set compatibility is decreasing.
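
Counting parts "never seen before" reduces to finding the first year in which each part appears. A minimal sketch, assuming a flat list of (year, part) pairs already joined from the set and inventory tables; the function name `new_per_year` is hypothetical, and whether a part is keyed by part_num alone or by (part_num, color_id) is left to the caller. Passing color IDs instead of part numbers gives the yearly count of new colors.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def new_per_year(appearances: Iterable[tuple[int, Hashable]]) -> dict[int, int]:
    """Number of items (parts, colors, ...) whose first appearance falls in each year."""
    first_seen: dict[Hashable, int] = {}
    for year, item in appearances:
        # Keep the earliest year in which each item shows up.
        if item not in first_seen or year < first_seen[item]:
            first_seen[item] = year
    return dict(sorted(Counter(first_seen.values()).items()))


# Toy (year, part_num) pairs.
appearances = [
    (1995, "3001"), (1995, "3003"),
    (2000, "3001"), (2000, "6541"),
    (2014, "6541"), (2014, "99780"), (2014, "87580"),
]
print(new_per_year(appearances))  # {1995: 2, 2000: 1, 2014: 2}
```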

Now, envision a scenario where a child is aiming to choose the optimal LEGO set for his birthday, and the selected set must be available in inventory so that his parents can purchase it. Assuming it's the year 2017, most available sets are recent: 72.3% of them are newer than the year 2000. When the algorithm is restricted to sets accessible in inventory, the approximate optimal choice is '60052-1' (Cargo Train), released in 2014. However, this set can only be used to build five other sets, all of them train-related.
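
Restricting the search to purchasable sets is a filter applied before scoring. Following the document's own criterion, "available" is taken here to mean that the set_num appears in inventory_sets.csv; loading that file is omitted, and the helper names, set numbers, and years below are hypothetical.

```python
def available_sets(set_years: dict[str, int], in_inventory: set[str]) -> dict[str, int]:
    """Keep only sets whose set_num appears in the inventory listing."""
    return {s: y for s, y in set_years.items() if s in in_inventory}


def percent_newer_than(set_years: dict[str, int], year: int) -> float:
    """Percentage of sets released strictly after the given year."""
    if not set_years:
        return 0.0
    newer = sum(1 for y in set_years.values() if y > year)
    return 100 * newer / len(set_years)


# Hypothetical set numbers and release years, for illustration only.
set_years = {"A-1": 1978, "B-1": 1995, "C-1": 2009, "D-1": 2014, "E-1": 2016}
in_inventory = {"B-1", "C-1", "D-1", "E-1"}

candidates = available_sets(set_years, in_inventory)
print(sorted(candidates))                    # ['B-1', 'C-1', 'D-1', 'E-1']
print(percent_newer_than(candidates, 2000))  # 75.0
```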

As an experiment, I re-ran the algorithm, this time treating the same part in different colors as identical, which yielded surprising results. With the updated approximate optimal set, '9474-1' (The Battle of Helm's Deep), it was possible to build 54 sets. The graph below shows these buildable sets organized by year of creation.
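
Treating the same part in different colors as identical can be sketched as merging quantities across colors before the buildability check. This assumes parts are counted per (part_num, color_id), as in the color-aware version, and that pieces of different colors are fully interchangeable; the helper names are hypothetical and the data is toy data.

```python
from collections import Counter


def ignore_colors(parts: Counter) -> Counter:
    """Merge (part_num, color_id) counts into per-part_num counts, discarding color."""
    merged: Counter = Counter()
    for (part_num, _color_id), quantity in parts.items():
        merged[part_num] += quantity
    return merged


def can_build(owned: Counter, target: Counter) -> bool:
    """True if every part of the target is available in the owned parts."""
    return all(owned[key] >= qty for key, qty in target.items())


owned = Counter({("3001", 4): 3, ("3001", 1): 3})  # 3 pieces in each of two colors
target = Counter({("3001", 15): 5})                  # 5 pieces in a third color

print(can_build(owned, target))                                # False
print(can_build(ignore_colors(owned), ignore_colors(target)))  # True
```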

The substantial difference in compatibility observed when color variation is disregarded implies that part color significantly influences whether two sets are compatible. The graph below supports the hypothesis that set compatibility has decreased over time: as more colors are introduced, each part comes in more variations. The plot shows the number of new colors introduced each year, which has grown substantially in recent years.
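
One way to put a number on "more colors, more part variation" is to divide, for each year, the number of distinct (part_num, color_id) combinations by the number of distinct part_nums; a ratio of 1.0 means every part appears in a single color. This metric is an illustrative addition, not the one plotted here, and the function name and rows below are hypothetical toy data.

```python
from collections import defaultdict


def colors_per_part(rows: list[tuple[int, str, int]]) -> dict[int, float]:
    """Average number of colors per distinct part_num among each year's sets."""
    molds: dict[int, set[str]] = defaultdict(set)
    variants: dict[int, set[tuple[str, int]]] = defaultdict(set)
    for year, part_num, color_id in rows:
        molds[year].add(part_num)
        variants[year].add((part_num, color_id))
    return {year: len(variants[year]) / len(molds[year]) for year in sorted(molds)}


rows = [  # (year, part_num, color_id)
    (1995, "3001", 4), (1995, "3003", 4),
    (2014, "3001", 4), (2014, "3001", 1), (2014, "3001", 15), (2014, "3003", 1),
]
print(colors_per_part(rows))  # {1995: 1.0, 2014: 2.0}
```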

In conclusion, the examination of the LEGO database revealed a trend of decreasing set compatibility over time. This shift is related to the introduction of more complex sets and a growing spectrum of colors, which together result in greater part variation. One contributing factor is LEGO's growing popularity over the years, which has prompted the creation of more intricate sets tailored to specific themes to appeal to a broader audience. Another potential factor could be a strategic decision to design sets with reduced compatibility to boost sales and increase profit.

Author: Beñat Froemming - University of Minnesota TC