Scientists Forced To Change Names Of Human Genes Because Of Microsoft's Failure To Patch Excel
from the code-is-law dept
Six years ago, Techdirt wrote about a curious issue with Microsoft’s Excel. A default date conversion feature was altering the names of genes, because they looked like dates. For example, the tumor suppressor gene DEC1 (Deleted in Esophageal Cancer 1) was being converted to “1-DEC”. Hardly a widespread problem, you might think. Not so: research in 2016 found that nearly 20% of 3500 papers taken from leading genomic journals contained gene lists that had been corrupted by Excel’s re-interpretation of names as dates. Although there don’t seem to be any instances where this led to serious errors, there is a natural concern that it could distort research results. The good news is this problem has now been fixed. The rather surprising news is that it wasn’t Microsoft that fixed it, even though Excel was at fault. As an article in The Verge reports:
Help has arrived, though, in the form of the scientific body in charge of standardizing the names of genes, the HUGO Gene Nomenclature Committee, or HGNC. This week, the HGNC published new guidelines for gene naming, including for “symbols that affect data handling and retrieval.” From now on, they say, human genes and the proteins they expressed will be named with one eye on Excel’s auto-formatting. That means the symbol MARCH1 has now become MARCHF1, while SEPT1 has become SEPTIN1, and so on. A record of old symbols and names will be stored by HGNC to avoid confusion in the future.
So far, 27 genes have been re-named in this way. Modifying gene names in itself is not unheard of. The Verge article notes that, in the past, names that made sense to experts, but which might alarm or offend lay people, are also changed from time to time:
“We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene,? says [Elspeth Bruford, the coordinator of HGNC]. “For example, HECA [a cancer-related human gene] used to have the gene name ‘headcase homolog (Drosophila),’ named after the equivalent gene in fruit fly, but we changed it to ‘hdc homolog, cell cycle regulator’ to avoid potential offense.”
It is nice to know that we won’t need to worry about serious problems flowing from Excel’s habit of automatically re-naming cell entries. But it’s rather troubling that Microsoft doesn’t seem to have thought the problem worthy of its attention or a fix, despite it being known for at least six years. It shows once again how people are being forced to adapt to the software they use, rather than the other way around. Or, as Lawrence Lessig famously wrote: “code is law“?
Follow me @glynmoody on Twitter, Diaspora, or Mastodon.
Filed Under: autoconversion, dates, excel, gene names, genes, spreadsheets
Comments on “Scientists Forced To Change Names Of Human Genes Because Of Microsoft's Failure To Patch Excel”
Of course they could just highlight the column or row in question, and pick format cell, and choose plain text.
From what I understand, that may not work in all cases.
Depending on how the document is saved, that data may not be preserved. So if somebody else open up that document on another computer, it would change the names to dates before anybody could adjust the column… and changing the column type after the fact won’t change the data back.
Re: Re: Format
This as well. If it converts to date it’s actually saving as a number.
Re: Re: Format
You are correct. A very common means of sharing data is the CSV (comma seperated values) file format. The file is structured so that commas seperate columns, and line breaks seperate rows.
The CSV is a very file-size efficient way of sharing data sets, but it stores literally no formatting information.
Excel is usually registered as the default application to open CSVs, and because there is no meta information about the data in the cells, Excel tries to be clever with its parsing of the information.
This leads to all sorts of fun issues – strings being interpreted as dates like in the article above, phone numbers being interpreted as numbers (and consequently losing their leading 0 if they’re not in international format), fractional prices losing their precision when being rounded to 2 places, etc.
When you have large data sets this may be quite annoying. It should be the other way around, if you want Excel to treat stuff as dates then you tell it and it converts things otherwise it should treat cells with numbers as plain numbers and cells with text as text.
Except that Excel will drop that choice whenever possible.
I am an accountant. I use .csv files to upload large transactions into my accounting software, and requires very specific formatting to acomplish.
Notably I need to not use the ‘date’ format in excel which stores the date as a number instead of the actual mm/dd/yy formatting, and I needed leading zeros without any puncutation in one column.
If I used plain text and saved my CSV file, excel would helpfully save space by removing very critical leading zeros. If I loaded the file, it would strip the dates out of the file and force them into date format. It fought me. I have since switched to Libre office which gives me the option to set formatting before i load a file, and will accept leading zeros being saved to CSV format.
Its an issue with any data set saved as a CSV or other file type that doesn’t carry format. Its most critical with large data sets that benefit in size from a lack of formatting. Excel auto format on load kills many of its best uses for me. Just formatting the sheet once doesn’t help plain data files like CSV when excel gets its hands on it. All it takes is one intern opening the file and autosave catching the changes for you to lose lots of data.
Its a toggle, Microsoft. a free software company has better CSV auto-format code then you. Get it together.
Re: Re: Format
I was just bitching on another social media site today about Excel’s poor choices in everything. Today’s gripe was about the welcome screen that helpfully shows you all of your recent files but neglects to show the file extension so you can’t tell which is which.
Also high on my aggravation list, the ISO-8601 date format that the entire world uses (YYYY-MM-DD) is not a standard date format type and has to be manually added every time you want it to make your dates look like that.
And yes, every time you load any kind of file or copy/paste any data into it, it tries to auto-format the data using what could be best described as the "best guess of a 5th grader currently flunking math and science".
Yeah, this topic makes this developer a bit not happy.
Re: Re: Re: Format
Auto-format itself wouldn’t be much of a problem, if it weren’t for the completely brain-dead handling of the result: Excel tries to auto-format, AND DISCARDS PRE-FORMATTED DATA, with no way to get it back.
You’d think a ~35 years old application whose primary purpose is organizing data wouldn’t work so hard on actively destroying said data.
Re: Re: Format
Surely you know that every MS product is required to have a totally insane and broken solution to a problem which has either been solved sensibly for decades by everyone else, or which only exists as a problem because of other insane design decisions.its in their deign requirements, right next to the one forbidding compliance with the user interface standards.
Re: Re: Format
"Its a toggle, Microsoft. a free software company has better CSV auto-format code then you. Get it together."
Hey, don’t give them ideas. MS never managed to create the illusion of "choice" without including that damn paperclip to provide you with "helpful" suggestions. No one wants Clippy back.
Re: Re: Re: Format
And the good news is that you need not deal with Clippy. MS offered several options, mostly at about the same annoyance level, for people who did not like Clippy. Just adjust your user options.
Where is Microsoft Bob when you need it?
Re: Re: Format
MS and older formats..
the Amiga had a very neat trick. The First part of loading hte file, TOLD the prog/Data what format it was in.
If you know windows and dos, it can have any ext. and Still not be able to decide what the format is, unless its told. with The thought of Loading and excel deciding what to do…its MS..THEY decide what to do.
Iv loaded Standard files into MS office and other MS products and the prog tends to DO ITS OWN THING.. Where If I used an Alt. Program designed for the ORIGINAL Formats, it never had a problem. It was Like files in/exported back and forth to Apple.
Postscript was Wonderful for this..talk about a mess.
You’ve clearly never worked a day of data analytics in your life.
Me: spends 8 hours getting everything perfectly formatted
Excel: You meant to save this all in a European accounting format, right? Done!
Microsoft: Where do we want you to go Today?
I think it’s worthwhile to ask what percentage of users type in "SEPT1" and expect it to be interpreted as a date versus the percentage who type it in and don’t. I don’t know if there’s data for that, but I would expect the majority of people typing "SEPT1" to mean September 1st. In which case no, I don’t think the correct solution is for Microsoft to modify Excel’s behavior in a way that’s unexpected and inconvenient for a larger number of users for the sake of a smaller minority, I think it’s for the smaller minority to learn how to change the default settings in Excel.
Re: what percentage of users
It’s hardly Microsoft’s fault that geneticists chose names that look like dates.
When you decide what to call things, choose wisely.
Microsoft isn’t my favorite firm, but this is hardly their fault.
Re: Re: what percentage of users
It’s Microsoft’s fault their application mangles data in the course of trying to think for you.
Re: Re: what percentage of users
Microsoft should make this behaviour turn on and off-able. (And store that state in the spreadsheet, not just the options of the installation)
Then those that need the stability of their data to be maintain can get that behaviour, and those who like the convenience of using the most widely use spreadsheet application in the world can have it hold their hand for them.
Re: Re: Re: what percentage of users
Which is a trap, because they end up being limited by what the hand holding allows, rather than learning to get to where they want to be.
Re: Re: what percentage of users
"It’s hardly Microsoft’s fault that geneticists chose names that look like dates"
It’s hardly geneticists fault that Microsoft tries to double guess every decision they make and insert their own interpretation without permission. It should be easy to turn off Microsoft’s automatic alteration of your data, and it’s their fault if they make design choices that ensure that it’s not an easy choice to specify.
No, but the correct solution is for Microsoft to incorporate persistence in the .xls file format for retaining preferences set during document creation. If you designated a column as plaintext when creating the spreadsheet, it should REMAIN plaintext when opened on other systems by your coworkers.
Autoconverting to a datatype in a spreadsheet, whatever the type, is a defect, plain and simple. The fact that it’s often absurdly aggressive in its conversion makes it even worse. Your example – SEPT1 – should not even be considered valid for conversion into a date as it doesn’t have a year. Any assumption as to what year to use – i.e. this year, the next September to occur – is a very strong assumption. Plus Excel will interpret a whole bunch of separators as being an attempt at a date, even those that don’t fit the current locale.
A lack of a conversion can always be fixed after the fact. An unwanted conversion often cannot.
Excel isn’t alone in this defect, most spreadsheets mimic this defective behaviour. It’s an antifeature, and should be purged ASAP, from every piece of software with it.
Re: Re: Re:
Attempting to mangle anything vaguely date like into a date if the cell is date or general makes sense when tying in a single cell of input, but doing it to imported data is completely wrong.
Expect somewhere in the neighborhood of epsilon. That is a mighty odd way of typing a date. I would expect users to give a space, and most of them to leave out the `t’.
And remember, we are not actually looking at user keyed input here. We are looking at data from a file. What percentage of useful programs would store a date as "SEPT1"? I would expect roughly zero.
Someone at Microsoft is immune to brain injury.
They should have gone with LibreOffice.
just put a ‘ in front of it…
Yes, certainly–the way Microsoft does it. (Of course "the way Microsoft does it" is, in the programming world, a euphemism for "ostentatiously wrong".) Merely autoconverting would just be wrong with a capital "R". The ostentation consists in refusing not to preserve the error across saves.
I suspect Microsoft fell into the "premature optimization" trap–which is a temptation even to real programmers: Let’s make the buggy program fast (or in small memory), then (maybe) try to fix the bugs. The fix always turns out to be much harder (more likely to break something else) and generally cancels out the speedup of the optimization or worse. (Microsoft is not keen on fixing bugs that the user can theoretically work around–they are considered user errors. (Which isn’t WRONG … if you include the choice of software in the first place.)
Microsoft does not have customers.
It has hostages. Everyone, escape while you still can. If more people don’t jump on the Linux bandwagon, and soon, Microsoft’s incorporation of a Linux subsystem into Windows will wind up swallowing Linux completely, and there will be no escape.
Re: Microsoft does not have customers.
I have succeeded using Apple Numbers in place of Excel and it works like a charm.
Re: Re: Microsoft does not have customers.
Apple has hostages, too. Apple just goes about the hostage-taking process a bit differently. As has already been mentioned, LibreOffice / Linux works just fine, for free people, and for free.
I am reminded of the joke:
Q: why do programmers confuse Halloween and Christmas?
A: because OCT31 is DEC25.
Another example of "smart" things being dumb.
Just use Apple Numbers app instead of Excel. I finally got away from Microsoft after 30 years and I just chuckle at some of the bullshit they made me do that I did not want to do.
Apple exercises more control over what its users can do than Microsoft. Try Linux where you have a choice of spread sheets, Libreoffice calc, Gnumeric, Caligra sheets amongst others for a spread sheet.
Though in recent years MS has been gradually moving towards Apple-style totalitarian control (and combining it with Google-style adware and spyware, for good measure), I’d say at this point they’re still an order of magnitude better than Apple.
That advantage is shrinking, but sadly not because Apple is getting better…
Forgot to say: the problem in the article is Excel insisting it knows what you need better than you do.
Suggesting Apple as a solution for that is… well, ever heard of the phrase "out of the frying pan and into the fire"?
Meme I wish I could post.
Circle 1 – Excel
Circle 2 – Incel
Overlap – Thinking everything is a date.
Excel is not the problem; your browser is!
I feel one of the most common sources of corrupted data is when you find a cvs file on the internet and want to save it on your computer.
But unfortunately, in order to help you, the browser does not simply save the file locally. Instead it launches Excel and hands it the file. And as a naive user, you immediately save the file from within Excel, not realising you just saved a bunch of autoconverted and possibly corrupted data…
Had the browser just saved the file locally, that data would be fine and you could try to your hearts content to import or open it in a number of applications, formats, with or without any automatic conversion or datatype detection, until you were satisfied that your software of choice had interpreted the data correctly.
Main workaround: don’t let the browser open an application to handle csv files.
In fact, don’t let the computer in general (I’m looking at you, Explorer) use any ‘default’ application to open any csv file which attempts to automagically convert or interpret your data! It will probably fail at some point, and possibly without you realising at that time.
Re: Excel is not the problem; your browser is!
I’m not following… you’re saying that you have your OS set to open CVS files, then Excel corrupts the data, but the fault is the browser’s for using your OS specified defaults, and not Excel for corrupting the file?
Re: Excel is not the problem; your browser is!
In IT we like to refer to this problem with error codes like —
You can blame Microsoft for a lot of things, but the settings you have on your browser of what to automatically do (like opening an APP instead of saving the file) is totally your problem, not theirs.
Found out the hard way
I so hate MS excel. Sadly, my job only allows us to use it with the "SAS" solution thru 365. I could go on a 10-page rant about the never ending ribbon changes.
Back in the day, we had the option of using a cloth ribbon instead of a carbon ribbon. You did not have to change it very often, and it was really up to you as to how faint you were willing for your type to be as the ink was transferred from the ribbon to the paper.
Ask your office supply vendor about cloth ribbons. For some people, this may be an upgrade.
how to tell if a vietnamese girl likes you
online dating sites Email Tips
have you been dating online and are frustrated with poor responses to your emails?
Or not only that, Women not replying to initial emails at all? This will make you give up isn it? particularly if you are recently divorced, seperated and single. I know the feeling and have been there and felt the pain.
very, I have good news for you because consider I am giving away extremely valuable information for free. If you go through the steps I about to outline, you will have a lot of success with online dating. These online dating tips will triple your response and dates through the internet dating sites. Let go to them. these guidelines are specific to the FIRST contact email.
You must state at the beginning of your email
That her profile shows off.
Angel64, I Mick and I writing as you sound interesting to me and your profile stands right out of the others. Then mention a few phrases of what you liked. Without gushing or sucking up in fact.
This will obviously please the lady and create her own a fixation with you. She will feel special and rightly so as you have opted to contact her. She is the lucky one to get your attraction.
Be sure post and lightheartedly poke fun of something in [url=https://www.love-sites.com/signs-that-you-can-recognise-when-a-vietnamese-lady-is-into-you/%5Dhow to tell if a vietnamese girl likes you[/url] her profile.
continuously put a smiley or [Just joking] In brackets so as not to offend. This says you are playful and not a soft guy that afraid to have a great time. She appreciate your energy.
Mention something specific in her profile that you found significant.
You worked working in london for 12 months, I was there for two many years. When were you in that respect?
This proves to her you have read her profile in
Detail and have something in keeping. this creates repore and her interest immediately.
Ask something appealing about her.
Said you do dance dvds, What style and design. could it possibly be salsa? How often do you visit dance. I similar to the club down on East esplanade on Friday nights. there is a great Latin band and free lessons from 7pm until 8pm.
This gives her a specific question to deal to.
Avoid uncertainties that are mundane and common like, your own colour?
Pick something she mentioned on her profile and ask her particulars on that.
the guidelines presented are in sequence. wind up and keep it brief. Four or Five sentences is enough. Don go on about on your and spill your guts to her. Finish with a line that says something similar to this.
Don we chat on the phone for a while and become familiar each other. It saves time beginner’s guide to long emails. Here my total number, Mick
02 47476534. I home looking for 6.30 and you catch me in a moment. Or here is my cell phone 098975643.
that’s the plan. These are very helpful and valuable tips that can save you time, Money and misery. you’ll get THREE times the replies and dates by using these tips.