An analysis of the leaked personal data of 1.5 million @SNCBEurope customers

What interesting data can we extract from such a large dump of one of the largest Belgian public companies? And what's the danger of such a leak? Those are the two questions that I'll try to answer through this data analysis. #SNCBGATE #NMBSGATE


  1. Last week, someone found out that the Belgian railway company SNCB/NMBS had mistakenly posted on its public web server a file containing the personal data of 1.5M of its customers (see my previous story about it). That is a significant amount of data, especially for a public company in Belgium (10M inhabitants).
  2. "The database containing 1,460,734 customers was freely accessible via a trivial query on a search engine. This management of personal data is shockingly irresponsible. The SNCB made no effort whatsoever to ensure that these data are inaccessible to the public and failed in its duty to protect its customers' personal data," says André Loconte, spokesman of NURPA.
  3. For obvious reasons I can't publish the leaked data. I wouldn't want people who are in that listing to be victims twice. However, we can mine some interesting information. Read on.
  4. Top email providers

    I've computed the top 1000 email providers. The top 5 are:
    1. HOTMAIL.COM (363,202 email addresses) 
    2. GMAIL.COM (187,090) 
    3. SKYNET.BE (99,806) (very first Internet provider in Belgium) 
    4. TELENET.BE (60,011) 
    5. YAHOO.COM (59,293)
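
A ranking like this one boils down to counting the part after the "@" in each address. Here is a minimal sketch of that idea, not the script actually used for this analysis; the function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def top_email_domains(emails, n=5):
    """Return the n most common email domains, upper-cased."""
    counts = Counter()
    for email in emails:
        # Split on the last "@" and skip anything without a domain part.
        _, sep, domain = email.strip().rpartition("@")
        if sep and domain:
            counts[domain.upper()] += 1
    return counts.most_common(n)


# Made-up sample addresses, NOT taken from the leaked file.
sample = [
    "alice@hotmail.com",
    "bob@HOTMAIL.COM",
    "carol@gmail.com",
    "not-an-email",
    "dave@skynet.be ",
]
print(top_email_domains(sample, n=2))
```

Upper-casing the domain before counting means that "hotmail.com" and "HOTMAIL.COM" land in the same bucket.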
  5. Birth years distribution

    The listing includes the birthdate for 281,276 people. Here is the distribution (I had to remove 1980 and 2003, which were, for unknown reasons, disproportionately represented. Maybe a default value in their registration form?).
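
For the curious, the distribution can be computed roughly as follows. This is a sketch under assumptions: I'm pretending the birthdates are strings starting with a four-digit year (such as "1975-03-12"), which may not match the real file, and the function name and sample values are hypothetical.

```python
from collections import Counter

# The two over-represented years that were left out of the chart.
SUSPECT_YEARS = {1980, 2003}


def birth_year_distribution(birthdates, excluded=SUSPECT_YEARS):
    """Count records per birth year, skipping empty values and excluded years."""
    counts = Counter()
    for value in birthdates:
        if not value:
            continue  # no birthdate recorded for this customer
        try:
            year = int(value[:4])  # assumed format: year comes first
        except ValueError:
            continue  # unparseable value, ignore it
        if year not in excluded:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up sample values, NOT taken from the leaked file.
sample_dates = ["1975-03-12", "1980-01-01", "", "1975-11-30",
                "2003-01-01", "1990-07-04", "n/a"]
print(birth_year_distribution(sample_dates))
```

Dropping the two suspect years outright is a blunt choice: it also throws away the people genuinely born in those years, but it keeps two spikes from flattening the rest of the chart.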
  6. Language distribution

    Guess what the 3 official languages of Belgium are?
  7. It's actually Dutch (NL), French (FR) and German (DE). Don't ask. 
    Note: That's why I think that we should at least add English as an official language. As the capital of Europe, I think it's the least we should do (if only politics were data driven...).
  8. Geographic distribution [new]

    @tgouverneur mapped the data on Google Maps to have a better view of the geographic distribution of all those customers.
  9. Most customers are based in Belgium as you could imagine, but there are also many records from people abroad (probably tourists or expatriates).
  10. Governments and embassies 

    There are 5,682 emails from EC.EUROPA.EU (European Commission), 1,668 from people from the European Parliament, 222 people from the UK Foreign Office, 163 people from the US Department of State (@StateDept), and many more from other embassies (India, Qatar, Georgia, Afghanistan, Iran, Myanmar, ... to name only a few)... Remember that in many cases the record also includes a phone number, birthdate, home address and so on. Not sure I would be thrilled to hear about this leak if I were working for one of those organizations.


    There are 42 email addresses from the political party @ECOLO (including Jean-Michel Javaux @jmjavaux, Isabelle Durant @Isabelle_Durant, Evelyne Huytebroeck, ...), 13 from @GROEN but only 11 from LECDH.BE, 10 from N-VA.BE, 5 from MR.BE (including @FlorenceReuter), 3 from OPENVLD.BE and none from PS.BE.
    Note: We can tell who is used to taking the train and who isn't.

    This listing also includes personal data of Paul Magnette (@PaulMagnette, Belgian Minister of Public Companies, including... SNCB!), Joëlle Milquet (Vice Prime Minister of Belgium) and Vincent Van Quickenborne (@VincentVQ, former minister of economy).


    - 394 people from the public radio television company @RTBF
    - 127 from RTL (@RTLTVI)
    - 65 from @LeSoir
    - 34 from Sud Presse (@SudPresseOnline)
    - 22 from @Tijd
    - 20 from @DeStandaard
    - 17 from @LaLibre
    - ...

    Who has this listing?

    Too many. It's impossible to tell how many people have downloaded this file. When such a document is freely available on the Internet, even for just a few hours, you can't control its distribution (and in this case it has been available for a few weeks!).

    Why is that a big deal?

    Since we can't track who downloaded this file, we don't know who has access to it. This file contains personal private information from people working in embassies, governmental organizations, and other prominent people. I'm a cool guy, I won't share or publish the list (in fact I'll delete it after posting this analysis), but some people may have other motivations. Some may even sell it. 
    (If you have access to this file, please note that it is illegal to publish it. I would highly recommend that you just delete it.)

    Don't panic

    At the end of the day, for most of you, this listing is little more than the white pages of the old days. Unless you are a public person, that shouldn't change your life. There is no credit card or bank account information in there. That doesn't mean that you shouldn't care. It's generally not a good idea to have your home address publicly available on the Internet. People could cross-reference that information with your social media accounts, find out when you are not at home and take the opportunity to pay you a visit.

    What can we do about it?

    Not much. It's a bit too late to be honest. The best we can do is to make clear to the people who do have a copy that publishing it or selling it is illegal. But we will never be able to guarantee that this file has been entirely removed from the Internet. That's life. Move on.

    But it's never too late to learn from our mistakes. We need to make sure that it won't happen again. That's why it's important that we understand what happened and what the possible consequences are. Our government needs to take that very seriously. I'm baffled by their lack of reaction regarding this case so far. I'm also disappointed by the lack of analysis in traditional Belgian media. But since in this new connected world we are the media and we are the watchdog of our governments and democracies, it's our duty to do our part and to wake them up when they need it. So please write blog posts about it and try to educate the people around you about the issue.
  11. File a claim

    If you are Belgian and if you take the train, you are very likely impacted by this leak. I highly recommend that you file a claim with the Commission for the Protection of Privacy. The more people do so, the more likely they are to take this seriously and take the necessary measures so that it won't happen again. You could also technically ask for reparation (free rides?).

    A more sustainable solution for the future

    Let's face the truth. The real problem here is that our public institutions don't understand technology. They are not prepared to embrace this new connected world. There are many more vulnerabilities in all those systems. How can we fix that? They need to be more open and work more closely with hackers --people who do understand technology-- instead of working against them. The OpenData initiative exists so that hackers can access the data collected by those institutions in a safe way (without confidential information). Keep in mind that the primary motivation for hackers is to use technology to improve people's lives. Not to do harm.
  12. Hackers, be responsible

    Whenever you get access to sensitive information, please be responsible. The right thing to do is to first report it to the owner of the site so that they can fix the breach. It's only if they don't react in a reasonable timeframe that you can start telling the world about it (and be smart about that too, it's better to just leak a sample of the data after making sure you've obfuscated sensitive information, rather than telling everyone how to take advantage of the security breach).
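
To make "obfuscated" a bit more concrete, here is a rough sketch of how a sample record could be masked before being shown to anyone. The field names ("email", "name", "phone", "address", "birthdate") are hypothetical and do not describe the layout of the leaked file, and the record is made up.

```python
def mask_email(email):
    """Keep the first character of the local part and the domain; star the rest."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return "***"  # not a usable address, hide it completely
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def redact_record(record, sensitive=("name", "phone", "address", "birthdate")):
    """Return a copy of the record with sensitive fields hidden."""
    redacted = {}
    for key, value in record.items():
        if key == "email":
            redacted[key] = mask_email(value)
        elif key in sensitive:
            redacted[key] = "[redacted]"
        else:
            redacted[key] = value
    return redacted


# Made-up record, NOT taken from the leaked file.
record = {
    "email": "jane.doe@example.org",
    "phone": "+32 000 00 00 00",
    "language": "FR",
}
print(redact_record(record))
```

Keep in mind that masking like this is only a starting point: a rare domain name combined with a first letter can still be enough to identify someone, so when in doubt, hide more.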