When do neural nets outperform boosted trees on tabular data?

Otherwise, tree ensembles continue to outperform neural networks. The decision tree in the figure shows the winner among the top five methods.

Now, the background:

I explored the why of this question before, but didn’t get very far. This may be expected, given the black-box and data-driven nature of these methods.

This is another study, this time testing larger tabular datasets. By comparing 19 methods on 176 datasets, this paper shows that 𝗳𝗼𝗿 𝗮 𝗹𝗮𝗿𝗴𝗲 𝗻𝘂𝗺𝗯𝗲𝗿 𝗼𝗳 𝗱𝗮𝘁𝗮𝘀𝗲𝘁𝘀, 𝗲𝗶𝘁𝗵𝗲𝗿 𝗮 𝘀𝗶𝗺𝗽𝗹𝗲 𝗯𝗮𝘀𝗲𝗹𝗶𝗻𝗲 𝗺𝗲𝘁𝗵𝗼𝗱 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝘀 𝗮𝘀 𝘄𝗲𝗹𝗹 𝗮𝘀 𝗮𝗻𝘆 𝗼𝘁𝗵𝗲𝗿 𝗺𝗲𝘁𝗵𝗼𝗱, 𝗼𝗿 𝗯𝗮𝘀𝗶𝗰 𝗵𝘆𝗽𝗲𝗿𝗽𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿 𝘁𝘂𝗻𝗶𝗻𝗴 𝗼𝗻 𝗮 𝘁𝗿𝗲𝗲-𝗯𝗮𝘀𝗲𝗱 𝗲𝗻𝘀𝗲𝗺𝗯𝗹𝗲 𝗺𝗲𝘁𝗵𝗼𝗱 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝘀 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗺𝗼𝗿𝗲 𝘁𝗵𝗮𝗻 𝗰𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗯𝗲𝘀𝘁 𝗮𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺.

This project also comes with a great resource. This time it comes with a ready-to-use codebase and testbed along with the paper.

Source