How Bad Are Geolocation Tools? Really, Really Bad

from the what-a-mess dept

Geolocation is one of those tools that the less technically minded like to use to feel smart. At its core it’s a database, showing locations for IP addresses, but like most database-based tools, the old maxim of GIGO [Garbage In, Garbage Out] applies. Over the weekend Fusion’s Kashmir Hill wrote a great story about how one geolocation company has sent hundreds of people to one farm in Kansas for no reason other than laziness. And yes, it’s exactly as bad as it sounds.

Most people often aren’t the most technically minded, give them a tool, tell them it CAN produce an output, and they’ll assume that any output that looks like the best quality possible, IS the best one available. It’s extremely common with ‘forensic evidence’ and jurors in court cases, where it’s given weight well beyond its actual evidentiary value (to the point that they now distrust cases without it). There’s even a name for it, “the CSI effect”, named after one of the TV shows that uses it as a cornerstone.

One of the latest tools to get the blind trust of morons is IP Geolocation. At its basic level, it’s a database of IP addresses with latitude and longitude listed, so when you look up an IP address, you get a pair of coordinates you can associate as an ‘origin’ for that.
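To make the "it's just a database" point concrete, here is a minimal sketch of what such a lookup boils down to. The table contents are entirely made up (the blocks are reserved documentation ranges, and `TOY_DB` and `lookup` are hypothetical names for illustration), so treat it as a toy and not as how any real provider stores its data.

```python
import ipaddress

# Hypothetical toy database: network block -> (latitude, longitude).
# The blocks are reserved documentation ranges; the coordinates are invented.
TOY_DB = {
    ipaddress.ip_network("192.0.2.0/24"): (33.75, -84.39),
    ipaddress.ip_network("198.51.100.0/24"): (45.42, -75.70),
}


def lookup(ip_text):
    """Return the (lat, lon) stored for the block containing ip_text, or None."""
    ip = ipaddress.ip_address(ip_text)
    for network, coords in TOY_DB.items():
        if ip in network:
            return coords
    return None  # no entry: the service has to decide what to report


print(lookup("192.0.2.17"))   # coordinates recorded for the whole block
print(lookup("203.0.113.5"))  # address with no entry at all
```

Note that the answer belongs to the block, not to the device: every address in the same range gets the same pair of coordinates, however far apart the actual users are.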

However, there are a number of problems with that:

  • First, what about those that don’t have a lat/long listed?
  • Secondly, how often are they updated?
  • Third, how do they deal with cellular or ‘mobile’ devices?

So let’s quickly address them.

Those that don’t have a lat/long listed.

Well, there’s a few ways to do it, but the way some chose to do it is just to guess. In the article that started me on this, it points out that the company MaxMind decided to guess at the average closest place it could: the geographical center of the US. Except 39°50’N 98°35’W is a messy decimal (39.8333333 N, 98.585522 W), so it rounded them to 38N, 97W. It’s the front yard of a farm in Kansas.
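For a sense of how much that rounding moves the point, here is a rough sketch that converts the degrees-and-minutes figure to decimal degrees and measures the great-circle distance to 38N, 97W. The haversine formula and the 6371 km mean Earth radius are my own choices for the estimate, and the function names are just for illustration.

```python
import math


def dms_to_decimal(degrees, minutes, seconds=0.0):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


center_lat = dms_to_decimal(39, 50)    # 39 degrees 50 minutes north
center_lon = -dms_to_decimal(98, 35)   # 98 degrees 35 minutes west (negative)

shift_km = haversine_km(center_lat, center_lon, 38.0, -97.0)
print(f"{center_lat:.4f}, {center_lon:.4f} -> 38, -97 is about {shift_km:.0f} km away")
```

Under those assumptions the rounded default lands roughly 250 km (around 150 miles) from the point it was rounded from, so even the "center of the US" placeholder isn't at the center of the US.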

Other times they just guess and get a town and put it somewhere there, although even that can be off a bit. It can be a lot off, as you’ll see shortly.

How often are they updated?

There’s no telling. With the great shortage of IPv4 addresses now, but with an ever-expanding list of devices, from cell phones to thermostats and even fridges, IP addresses are shifting around everywhere. There are also mergers and splits of companies, bankruptcies and so on. So unless the database is frequently updated, there’s no chance that anything it has to say will be accurate; again, we’ll see that directly.

Finally, how does it deal with cellular devices?

Simply put, it doesn’t. The handoff mechanism means that you’ll often carry one IP address from one tower to the next (otherwise you’d have to terminate and restart any data transfer as you shifted between towers). In addition, most cellular providers hide their cell customers behind NAT, precisely because of the lack of discrete IPv4 addresses to give out (and their slowness in migrating to IPv6).

Odds are you’re going to get a local network control center, or regional corporate office instead, which means it’s practically no use at all.

Oh dear….

This all assumes as well that entries are made in good faith. One of the more common uses of geolocation is for targeted adverts, especially with ‘adult websites’, where they promise there’s a horny woman (or man, if your browsing is detected as such, or the ‘content’ suggests you may be female) close by. Or you may have seen it in the scam adverts on news sites that should know better than to accept low-rate advertising based on scams (with easy to tell, clickbait headlines about insurance ‘tricks’ or similar).

This means that if you can ‘rig’ the database, you can expose the stupidity in parts of it, as was best demonstrated by Randall Munroe in his XKCD comic series.

So just how inaccurate are these systems? The easiest way to tell by far is to run some IP addresses where you know the location through these systems and see how far off they can be. So I did.

The most obvious one to start with is my own home connection’s IP address. So I tried the link in the story, and boy was it off! Just for the record, I live on the south side of Atlanta’s metro area, near Macon (Walking Dead country, in fact).

That’s right, it put me in Ottawa, capital of Canada, roughly 1900km (1180 miles) and 1 whole country off. Part of that comes from the second question, how current the data is. It’s listing my IP as belonging to Nortel Networks. Problem is, I’m not a subscriber to Nortel. No-one is; the company was wound down years ago. Yet some databases still have them listed.

Cellphones don’t fare much better either. I used the same service on a 4G Verizon phone sitting at my computer. Its location? San Diego. That’s 1900 miles (3050km) off. Other services gave locations of New York, Atlanta, and Macon.

Wondering if it’s just my semi-rural system that’s messed up, I called a couple of friends who live in the Atlanta suburbs (a few streets from each other) and asked for their IP addresses; one used Comcast, and the other AT&T. Maybe things will be better and more accurate in a big-city environment?

I ran a number of different GeoIP services, and it was a very mixed bag of results. One thing’s certain though: none of the four sets of coordinates gave an accurate location for the person (for obvious reasons I’m not going to give you their address, or mine for that matter).

Of them all, only one service, IPCIM.com, gave an error circle with a location (twenty-five mile radius), but it didn’t do it for all. To me that indicates knowledge of its inaccuracy, but its absence at other times seems to show it just doesn’t care.

The second and third locations are the same coordinates, but they’re less certain of the third than the second, despite both being off.

There’s also something specific to note. There are four providers covered here. Two were done from the exact same location, yet their locations came nowhere near matching. Two more were IP addresses just streets away, but they also didn’t match that well, although many went to the same default locations, including two which went to the ‘lazy US Center’ investigated in the Fusion piece.

More importantly, of the 30+ geolocating attempts made here, not a single one managed to be within a mile of the actual location (although one location was within a mile and a half, while another was within 3 miles; again, I’m not going to give out specifics). So for those who want to rely on them as being a source of where something is, the simple answer is “don’t”. This applies as much to those tracking down people who are leaving spammy comments, as it does to police officers and lawyers seeking to use them for court actions, criminal or civil.

In fact lawyers and the police have absolutely NO excuse to use these kinds of databases in litigation at all, as there are better, more accurate tools at their disposal: the courts themselves. In criminal cases a warrant is the preferred method, obtaining subscriber information from the ISP (fixed or cellular), which is far more accurate than any geolocation service because it’s data coming from the entity actually providing the connection. In a civil trial you have a discovery subpoena to do pretty much the same thing and for the same reasons.

If you’re doing it ‘on your own’, remember that these tools are as accurate as taking a dart and throwing it not at a map on the wall, but at a Google map display on your computer screen. Sure you’ll be out a display, but you won’t be potentially facing criminal charges when you go to act on what is basically bullshit data. At the very best, it can be used to advise, but it can be INCREDIBLY off, sometimes thousands of miles.

Data

The following services were used

There were 4 IP addresses used, three residential and one cellular, comprising four of the biggest ISPs in the US.

IP addresses

  • 32.99.122 (Charter fixed line cable internet connection, K`Tetch)
  • 193.166.88 (Verizon 4G cellular connection, K`Tetch)
  • 137.147.28 (Comcast fixed line cable internet connection, James)
  • 172.126.144.9 (AT&T gigapower fixed line internet connection, less than 6 months old, David)

The first two were located in south metro Atlanta, near Macon. David and James are located approximately half a mile apart in north Cobb county, Georgia.

Raw coordinates

Service | Charter | Verizon | Comcast | AT&T
checkIP.org | 45.4167, -84.3246 | 32.7977, -117.1322 | NOT TESTED | BLANK RESULT
IP2Location | 33.95621, -83.98796 | 32.55376, -83.88741 | 34.02342, -84.61549 | 34.02342, -84.61549
IPinfo.io | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38.0000, -97.0000
EurekAPI | 32.8685, -84.3246 | 33.7981, -84.3877 | 34.1015, -84.5194 | 34.0247, -84.5033
DB-IP | 33.9562, -83.988 | 40.7128, -74.0059 | 33.9413, -84.5177 (“Marietta (bedroom)”) | 33.8545, -84.2171
IPCIM.com | 32.8685, -84.3246 (± 25 mile) | NOT TESTED | 34.0247, -84.5033 | 34.0247, -84.5033 (± 25 mile)
MaxMind (geoLiteCity) | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38, -97
MaxMind (GeoIP2) | 32.8685, -84.3246 | 33.7844, -84.2135 | 34.0247, -84.5033 | 34.0247, -84.5033
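If you want to put a number on how much the services disagree with each other, here is a rough sketch using the AT&T column from the table above. It deliberately doesn't need the real location: it just finds the two estimates that are furthest apart. The haversine formula, the 6371 km Earth radius, and the function names are my own assumptions for the purposes of the estimate.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# AT&T column of the raw coordinates table, one estimate per service.
att_estimates = {
    "IP2Location": (34.02342, -84.61549),
    "IPinfo.io": (38.0, -97.0),
    "EurekAPI": (34.0247, -84.5033),
    "DB-IP": (33.8545, -84.2171),
    "IPCIM.com": (34.0247, -84.5033),
    "MaxMind (geoLiteCity)": (38.0, -97.0),
    "MaxMind (GeoIP2)": (34.0247, -84.5033),
}


def widest_disagreement(estimates):
    """Return (distance_km, service_a, service_b) for the farthest-apart pair."""
    return max(
        (haversine_km(*estimates[a], *estimates[b]), a, b)
        for a, b in combinations(estimates, 2)
    )


spread_km, first, second = widest_disagreement(att_estimates)
print(f"{first} vs {second}: about {spread_km:.0f} km apart")
```

For this one address the widest gap comes out at well over a thousand kilometers, and that's between services that are all supposedly describing the same connection.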

If you’d rather see them on a map, they’re here. (Legend: Charter in green, Verizon in red, Comcast in blue, AT&T in yellow)

NOTE: One data source was extremely interesting in its provision of 11+ decimal places in its results. While this might seem to imply accuracy, it actually underscores how inaccurate it is. Eight decimal places gives a resolution of 1.1 millimeters, roughly the thickness of a CD/DVD. 11 decimal places, as given in all their results, is going to extremes, with locations given to less than a hair’s thickness. It has been rounded down.
The “Marietta (bedroom)” label was actually on the output from their database.
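The decimal-place arithmetic is easy to check for yourself. This is a back-of-the-envelope sketch that assumes one degree of latitude is roughly 111 km (a common approximation, not a surveyed figure); `resolution_m` is just a name I picked.

```python
# Assumption: one degree of latitude is roughly 111 km, i.e. 111,000 meters.
METERS_PER_DEGREE = 111_000


def resolution_m(decimal_places):
    """Smallest north-south step, in meters, that a coordinate with this many decimals can express."""
    return METERS_PER_DEGREE * 10 ** -decimal_places


for places in (4, 8, 11):
    print(f"{places} decimal places: {resolution_m(places) * 1000:.6f} mm")
```

Four decimal places already gets you down to around 11 meters, which is far finer than anything in the table can justify. Everything past that is decoration.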

I would like to thank David and James for their help with this. And for obvious reasons, we have forced changes in IP addresses for all our connections (and the release of this article was delayed to ensure that).

This is a repost from Andrew Norton’s Politics & P2P blog


Comments on “How Bad Are Geolocation Tools? Really, Really Bad”

33 Comments
Andrew (profile) says:

Re: Re:

how’d you guess? 🙂

My wife was actually a bit nervous about this piece, because we’ve had issues in the past (Anonymous members have tried to Doxx me, and so did Jeremy Hammond in an attempt to dissuade me from reporting a parole violation in 09). None of them even came close to where I live though, so it’s not that big of a risk.

But yes, I should maybe do a followup using some of the VPNs and proxies I’ve got access to, see where they come out. Probably something I should get to after I come down from the high of BattleBots filming this next week, if anyone’s interested?

Whatever (profile) says:

accuracy... ph yeah

First off, let me say that the databases used for these things are often out of date or not updated recently. With the shortage of IPV4 addresses, there have been a lot of blocks re-assigned, “sold”, transferred, and given new uses without all of the various scraped databases getting updated.

Second, GEOIP generally isn’t a tool of absolute precision. Rather, it’s a good indicator. If your ISP has properly filled out its information, if they are consistent about using the same IP blocks in one area (rather than over a wide network area), then you are very likely to see reasonable results.

Also, geo tools like these are NOT the best tools to use, because they do a lot of guess work on the way. Better tools are those that are based on user input. As an example, many dating sites use geo specific advertising to try to entice people into their sites. They use actual user signup data to generate their own geo lists, honing the results based on that information and other geo tools (such as maxmind, which is usually not that far off the map).

So the end result, as an example, is that your 47.32 address returns reasonable information if you look more closely. It is assigned to Charter Communications (used to belong to an ISP in Canada, from what I could find), and they have “located” it to Lawrenceville, GA, which is likely where Charter has a regional hub, office, or perhaps a data switch point.

Now, is that accurate enough to say, land an ICBM in your driveway? Nope. Is it close enough to target a political ad to you? Yes. Is it close enough to target regional marketing? Absolutely. Local marketing? Maybe not quite as much.

It’s still a pretty good tool overall, nowhere near perfect, but maintained and updated databases plus user provided data usually leads to pretty good guesses, more than enough to target advertising and get those ads in front of the right people most of the time.

Jamie (profile) says:

Re: Re: Re: accuracy... ph yeah

Putting error circles on the map will make it look like there’s some degree of certainty in the location. More often than not, the database has no idea.

With any internet connection, your modem will connect to some point of presence (PoP) for your ISP. The PoP is effectively the bridge between customer connections and the ISP’s backbone. Each PoP will have a range of addresses that it can hand out to connecting customers. Unless you’ve got a static IP allocation, you could be handed out any address in that pool.

A large city may contain dozens of PoPs, due to the sheer number of customers each one needs to support. The distance from the PoP to where the customer is will probably be quite short, maybe a few miles at most. The location will be fairly accurate.

In rural areas, there will be fewer PoPs (due to their cost) covering much larger areas. You may have customers connecting in from 50+ miles away. Assuming that there’s a PoP in Macon, you could have an IP address handed out to someone in Forsyth one week and Cochran (40mi away) the next. The location will be fairly inaccurate.

In order for a GeoIP database to give any sort of accuracy, it needs to know: (a) where the PoPs are; (b) the size of the area served, and (c) the range of addresses it can hand out. No public GeoIP system knows this information. The only public information is which ISPs own which blocks of IP addresses. Once an ISP owns a block of addresses, it can reallocate them wherever it likes, whenever it likes. It never needs to tell anybody else about these changes.

GeoIP databases work with whatever information they can get their hands on. All they know is that at some point in time, some IP address was being used by some reported location. The rest is guesswork.

Andrew (profile) says:

Re: accuracy... ph yeah

I quite like Lawrenceville, had a friend that lived there. And I like Sugarloaf Mills there too (which has a Dave+busters, a lego shop, 36 hole blacklight minigolf, an indoor airsoft arena AND a medieval times). It is NOT, however, even on the right side of Atlanta (I’m on the south) or that close.

Whatever (profile) says:

Re: Re: accuracy... ph yeah

Yes, but the problem here is that you are trying to get mile accurate for a tool that never will be – because the ISPs don’t give out that kind of information.

Now, if you were using a more commercial based system (say in part seeded with IP / city pairs from a credit card processor) you would see a lot more accuracy. Generally ISPs use IP blocks in one area (easier routing), and thus your IP block might turn up within a few miles, close enough for a Russian nuke.

Use the right tools, you get better answers. Use the wrong tools, and you are just a tool 🙂

K`Tetch (profile) says:

Re: Re: Re: accuracy... ph yeah

no, I’m not trying to, others are. I’ve seen court cases where plaintiffs have used the tools to try and get that. I even specifically say that at the start of the second paragraph.

Most people often aren’t the most technically minded, give them a tool, tell them it CAN produce an output, and they’ll assume that any output that looks like the best quality possible, IS the best one available.

And the first paragraph refers to a story where LOTS of people have assumed just that.

Me, I’m just quantifying how bad they are… as I say right at the start.

Anonymous Coward says:

Re: accuracy... ph yeah

Unless people use the bane of modern law enforcement: routing tools, such as anyone using TOR, hideyourass, any of a number of paid VPNs, a work VPN or any number of manual tricks. Geo-locating data is pretty good at finding nodes used in rerouting schemes, but that is pointless. I use rerouting and encryption if what I do is considered confidential to me. I could use it all the time, but I have a policy regarding ads: If a site I trust needs ads as funding, I will let it past my filters, but I want to control where ads are allowed. I have had ads run trojans on earlier computers. So if the ads hit close enough most of the time is probably true, but hopefully they become worse as time goes on and people learn to deal with bad actors including law enforcement.

Ninja (profile) says:

Re: accuracy... ph yeah

Unless there is info paired with that IP (IE: geo location collected from gps, towers on a mobile phone that uses such ip, internal nat address) then the accuracy is as good as me guessing the next Powerball numbers. And I’m not even taking into account vpns, proxies, spoofing, dynamic ips, open networks etc etc etc.

I think the conclusion here is that an IP as an identification tool (including everything besides Geo) sucks. And yet it is being used widely as if it’s infallible evidence of whatever. So, yes, me receiving Romanian advertising because my browser spoofs my IP is ok but a Romanian being prosecuted and arrested because of something wrong I did using his IP is quite problematic.

Whatever (profile) says:

Re: Re: accuracy... ph yeah

“IP as an identification tool (including everything besides Geo) sucks.”

Not at all. For the most part, an IP generally does a very good job (because of network design) is being able to track down to a single end connection point. In fact, it would be almost unreasonable to think that an ISP doesn’t know where a connection is.

However, with the proliferation of wi fi networks, open wifi connections, TOR exit nodes, VPNs, and the like, it makes it harder to be truly accurate. However, if those services were required to maintain logs, the accuracy rate would be very high!

Anonymous Coward says:

Re: Re: Re: accuracy... ph yeah

Not at all. For the most part, an IP generally does a very good job (because of network design) is being able to track down to a single end connection point.

Not true these days, as due to IPv4 address shortages ISPs are now using NAT to share IP addresses, as well as dynamic address allocation. You need date, time and port as well for an ISP to locate an endpoint from their logs.

Anonymous Coward says:

Walking Dead country

I live on the south side of Atlanta’s metro area, near Macon – Walking Dead country in fact

Now, don’t be so pessimistic! The Hawks have a good chance of getting past the Celtics in the first round of the NBA playoffs, and they might even push the Cavs to, say, six games in the second round.

I mean, that is what you were referring to, right?

Anon E. Mous (profile) says:

Well this furthers my suspicion when it comes to copyright trolls suing alleged infringers of their clients’ movies/music and using geo location tools to locate the alleged infringer.

To listen to the copyright trolls’ claims in their submissions to the court, and as part of their application to get expedited discovery from the ISP for the subscriber’s information, they often (and I mean religiously) say how the geo location tools they use have a 99% accuracy in pinpointing that the alleged infringer lives in this area and is served by so and so ISP.

I have always thought that this was baloney, and much like Andrew has pointed out in his article, you shouldn’t believe in the baloney either. If you look at the data Andrew included and the test he ran, there is a lot of hocus pocus in the trolls’ claims of 99% accuracy.

Not only have there been cases where the trolls sued a small business because the geo location information led them to that IP address as being the infringer (mind you the business owner replied to the court with a WTF) and then promptly dismissed the business owner once it was pointed out to the trolls it was a business, but that points out how geo location tools are not as accurate as the trolls claim.

Yet to the courts as part of their boilerplate complaints you will see the trolls always always claim how their geo location tools are 99% accurate and their investigative methods point to the IP address of the ISP subscriber as the infringer.

It really is a joke how much the courts are fooled by the B.S. the trolls fill their submissions to the court with in their complaint to get the whole extortion scheme going, and the trolls continue to get away with this.

Coyne Tibbets (profile) says:

Am I Nukeable?

I started an “Am I Nukeable?” game on a blog, based on an early article on this problem. That particular game uses MaxMind as the geo-source (but you could use any locator service).

The basic idea is this:

1) Go to the locator service from your home computer;
2) Find out where it indicates you are;
3) If needed, use the Google distance measurement to find out how far the location is from your home.

If the distance is less than 1 mile (1.6 kilometers) then you are nukeable: someone could look up your IP address and route a missile to you to finish you off.

(The 1-mile radius is that of the Hiroshima blast and was chosen for its historic value. Today, a Topol SS-25 missile from Russia has a radius of 7 miles, approximately, so you could use that if you prefer.)

The game is instructive, for two reasons: First of all, it teaches you about the breadth and depth of the data stored online for each and every one of us. It also teaches about the shoddiness of that data, because many people playing the game find they are not nukeable…which means if someone tried to nuke them, it would be a miss, which wouldn’t help the people where it does land, one tiny bit.

(Such as that front yard in Kansas, destined to be a volcanic pit.)

K`Tetch (profile) says:

Re: Fun fact

If you want to be picky, and the earth has, as you say, been the center of our observable universe, then I can say I’ve certainly never been within 3 meters of the center of the observable universe ever. Maybe 3900 miles from it, yes, but not a handful of meters (I’d be cooked)

Also doesn’t count the fact that yesterday I tromped up and down Stone Mountain (elv. 1680ft, and the redneck version of the quentulus quazgar mountains) which is a lot more than 3 meters; and you don’t want to know about the flights to the UK, California, etc.

If you’re going to do facts, know your facts first!

Of course all this presupposes that the earth is the center of the observable universe, which is a hypothesis, not a fact. For instance, it’s a theoretical construct that we can see equally far in all directions, but all we really know is how far away certain objects that we can see are. And if we have objects that are further in a rough direction (say because all the scopes are better on that area – think north v south hemisphere) then the center is not ‘the center’. And of course, if we were a bit like the planet Krikkit, and had a massive dust cloud obscuring part of our view, that too would skew the center.

(claiming astronomical facts against someone that does astrophysics for fun, probably not the best idea either, doubly so

Anonymous Coward says:

Yes, it's bad. It should be bad. Please stop normalizing this.

GeoIP was good for diagnostics, much like ping was good for diagnostics before SMURF. It will be deprecated as protocols start being designed to constrain the meta data problem. Don’t encourage people to think this should get better. In all likelihood it will die a slow and painful death and get replaced with something voluntary that operates above OSI layer 4.

That has been all for this public service announcement.

Jeremy says:

Maxmind is a fraud of itself

Maxmind has me flagged as a high risk for fraud to their customers including Powerwerx a radio comms company out of California that our fire department purchases radio gear from.

I’m a firefighter and fire investigator, and yet incompetent Maxmind has ME blacklisted so we can’t purchase radios for firefighting!

I called up Powerwerx after my order was canceled, and powerwerx said that I was flagged as a high risk for fraud by Maxmind and I couldn’t purchase anything over the phone either.
