Using Spreadsheets In Bioinformatics Can Corrupt Data, Changing Gene Names Into Dates

from the careful,-now dept

A few years back, people were rather disturbed to find out about the famous Excel bug, whereby the multiplication of two numbers in Microsoft's spreadsheet gave the wrong number. It turns out there are other circumstances in which Excel (and, to be fair, presumably other spreadsheets) can give incorrect results, but they are unlikely to be encountered in typical everyday tasks. However, in the specialized world of bioinformatics, which uses computers to analyze data about genes and related areas, careless use of spreadsheets can throw up a significant numbers of errors, as this paper in BMC Bioinformatics explains:

Use of one of the research community's most valuable and extensively applied tools for manipulation of genomic data can introduce erroneous names. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] was being converted to '1-DEC.'
Here we have the interesting interaction of two very different fields, where the name of a gene involved in esophageal cancer, DEC1, was interpreted by Excel to mean the date, 1 December. As the paper points out, these kinds of substitution errors are already to be found in key public databases:
DEC1, a possible target for cancer therapy, was incorrectly rendered, and it could potentially be missed in downstream data analysis. The same type of error can infect, and propagate through, the major public data resources. For example, this type of error occurs several times in even the immaculately curated LocusLink database.
As that notes, a gene that might be relevant for treating cancer could well be missed because of this incorrect conversion to a date by Excel. Although it is unlikely that any serious harm has been caused by this -- yet -- it's a useful reminder of the dangers of depending a little too heavily on the results of software without checking for corruption of this kind.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+


Reader Comments (rss)

(Flattened / Threaded)

  1.  
    identicon
    Anonymous Coward, Jul 30th, 2014 @ 9:09pm

    Some software excels, some not so much.

     

    reply to this | link to this | view in thread ]

  2.  
    identicon
    Anonymous Coward, Jul 30th, 2014 @ 9:14pm

    "DAMN AUTOCORRECT!"

     

    reply to this | link to this | view in thread ]

  3.  
    identicon
    Zem, Jul 30th, 2014 @ 9:21pm

    A perfect example of why we need software patents. If Visicalc had had the proper patent protection MS would never have been able to produce this bug ridden product that now dominates the market.

    am I doing this right?

     

    reply to this | link to this | view in thread ]

  4.  
    icon
    sinsi (profile), Jul 30th, 2014 @ 9:30pm

    Select column, Format as Text.

     

    reply to this | link to this | view in thread ]

  5.  
    identicon
    Editor-In-Chief, Jul 30th, 2014 @ 9:41pm

    This problem occurs in many different data analysis situations using Microsoft spreadsheets

    I first came across this problem around 2001 with loading sub-contract data for analysis. it just so happened that some of the text data would be interpreted as floating point numbers, thereby rendering the data analysis completely wrong.

    The data had to be pre-processed to insert "" around the relevant fields before loading the data into the spreadsheet. This did stop the spreadsheet conversion routines from touching that specific data. Unfortunately, the data was supplied in varying formats so that a uniform method couldn't be developed at the time in the timeframe required.

    David Oliver Graeme Samuel Offenbach

     

    reply to this | link to this | view in thread ]

  6.  
    identicon
    Lawrence D’Oliveiro, Jul 30th, 2014 @ 11:09pm

    Spreadsheets Are Undebuggable

    As an experienced software developer, I just have to roll my eyes in disbelief at the number of people who trust complicated spreadsheets to give the right answers without the ability to check that they actually do.

    Even important economic research is being called into question because of spreadsheet errors.

     

    reply to this | link to this | view in thread ]

  7.  
    identicon
    Anonymous Coward, Jul 30th, 2014 @ 11:26pm

    So instead of spreadsheets...

    what should they use? Clearly spreadsheets are efficient but unreliable, so what other data software is out there that can do the same job?

     

    reply to this | link to this | view in thread ]

  8.  
    identicon
    Rekrul, Jul 31st, 2014 @ 12:19am

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it.

    It's the same thinking that lead to them adding a virus distribution mechanism to Windows and calling it AutoRun/AutoPlay.

     

    reply to this | link to this | view in thread ]

  9.  
    icon
    Woadan (profile), Jul 31st, 2014 @ 12:37am

    Re:

    Click on the square above row 1 and to the left of column a to highlight all cells, then right-click a cell, choose format, choose text, OK as many times as needed to get out and get back to the sheet.

     

    reply to this | link to this | view in thread ]

  10.  
    icon
    Woadan (profile), Jul 31st, 2014 @ 12:40am

    Re: So instead of spreadsheets...

    Sheets app for Android; OpenOffice or LibreOffice. (Not sure if they share similar issues.)

     

    reply to this | link to this | view in thread ]

  11.  
    icon
    Woadan (profile), Jul 31st, 2014 @ 12:50am

    Re:

    The likelihood is that MS responded to many Excel community requests to make data entry easier for some sorts of data, such as dates, and so Excel now "senses" a date has been entered and converts the data to its default date setting.

    Excel is used by many organizations for many purposes. You need to at least be aware of how the software functions. Otherwise why are you using it?

     

    reply to this | link to this | view in thread ]

  12.  
    icon
    Woadan (profile), Jul 31st, 2014 @ 12:57am

    Why is a major undertaking like what BMC Bioinformatics was doing not being entered into a database? Use the right tool for the job, and Excel is not the right tool for this job.

     

    reply to this | link to this | view in thread ]

  13.  
    icon
    MrTroy (profile), Jul 31st, 2014 @ 1:02am

    Re: Re:

    This looks to be a text import/entry bug rather than a formatting bug, so I'm not sure that this would help.

     

    reply to this | link to this | view in thread ]

  14.  
    icon
    MrTroy (profile), Jul 31st, 2014 @ 1:07am

    Re:

    Why do you think Excel isn't a database? And even if it isn't a sufficiently generic database, there's no guarantee that the front-end for any other custom-built database won't have any data entry issues.

    Why do you even think you know what the requirements for the job are? Presumably being able to share the document around is one of the requirements, and databases aren't usually very good at that.

    Also, from the article: For example, this type of error occurs several times in even the immaculately curated LocusLink database.... use of spreadsheets doesn't preclude the use of databases.

     

    reply to this | link to this | view in thread ]

  15.  
    icon
    MrTroy (profile), Jul 31st, 2014 @ 1:09am

    Re: Re: So instead of spreadsheets...

    Confirmed that OpenOffice does, and I think that Google Docs does as well.

     

    reply to this | link to this | view in thread ]

  16.  
    icon
    Eldakka (profile), Jul 31st, 2014 @ 1:49am

    Re: So instead of spreadsheets...

    It's not so much the spreadsheet that's the issue, it's how people are using it. If the data was imported in the correct way it wouldn't have been an issue. However, most of the people doing this sort of data analysis (geneticists, biologists, economists) aren't Computer Science people, therefore they may not even be aware that this type of issue can occur.

    It's the same as climate scientists stating "If you aren't a climate scientist, you have no authority to comment on our work." But the problem is, most of climate science (computer modelling, statistical analysis) isn't "Climate" Science, it's a mix of Computer Science (programming for computer modelling), Fluid Dynamics (also for computer modelling, you've got to accurately model fluid dynamics to have an accurate atmospheric model), Statistical analysis (all the climate chronologies that are done are based on Statistics) and so on.

    When a non-Computer Scientist programs an advanced model, without input from a physicist/Engineer (for fluid dynamics), and picks and uses statistical methods without an in-depth knowledge of statistics to know which method to use when, you are going to have problems.

    However, all these users (Climate Scientists, Biologist, Engineer and what have you) go: "Wonderful, I don't need all these specialists (Computer Scientist, programmer, Engineer, Climate Scientist, Biologist), I can just use an off the shelf spreadsheet, whip up a few formulas in it, and presto!".

    A Computer Scientist might go: "I'll just write my own program to do this climate modelling thing, I've got a fluid dynamics text book, I'll just read that and implement it in code." The code may be beautiful, concise, with the formula's implemented flawlessly...until a fluid dynamics specialist looks at it and goes "why did you use that formula/paradigm there? That's a rather exotic case, superheated plasmas, it's not very good when you are modelling a layer of air that's at -45C..."

    Of course, if you don't have an in-depth knowledge of spreadsheets (hell, a civil servant who lives, breathes, finance and the spreadsheets that are daily created probably knows how to use a spreadsheet program better than a 'scientist', whether that's an Engineer, Computer Scientist, Economist, Climate Scientist what have you), or Fluid Dynamics, or programming or Computer Science, then you'll probably end up with a balls-up.

    Unless you have expertise in the other field, you should engage an expert in that field to at least advise on what you are doing. If a Biologist needs to manipulate a lot of genes with a computer...consult a computer scientist on the best way to do what you want done, and have them implement it for you if necessary...and if the implementation requires expertise from another field, say fluid dynamics, then the computer scientist should probably consult a fluid dynamacist, or statistician or whatever else is necessary.

     

    reply to this | link to this | view in thread ]

  17.  
    icon
    Richard (profile), Jul 31st, 2014 @ 2:08am

    Once upon a time

    Once upon a time there was a spreadsheet that was different. The formulae were clearly visible (they weren't hidden in the cells). The data items had names - not cell addresses. In short it was to existing spreadsheets as high level languages are to assembly language.

    I bought a copy (one of my very few software purchases). Twenty-one years later I still use it. It is still hugely better than excel and its clones. It was (is) called Improv.

    The fact that it did not take over is a tragedy of market failure. The fact that I have increasing difficulty keeping it going (confined now to a virtual xp box) is a tragedy of copyright.

     

    reply to this | link to this | view in thread ]

  18.  
    icon
    MrTroy (profile), Jul 31st, 2014 @ 2:18am

    Re: Once upon a time

    I feel your pain, but things aren't so bad in the rest of the world now. I know they were just two examples, but you can specify names for cells and cell ranges in most spreadsheeting software now, and at least Excel allows showing formulas instead of results across all cells. It's entirely possible that you could use Excel (or a different competing product) in the same way that you're used to using Improv.

     

    reply to this | link to this | view in thread ]

  19.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 2:43am

    EXTRA EXTRA!!

    Journal publisher states the obvious. TD runs it.

    Get your copy now!

     

    reply to this | link to this | view in thread ]

  20.  
    identicon
    Donglebert The Needlessly Unready, Jul 31st, 2014 @ 3:01am

    Re: Re:

    True, there's no guarantee. But anyone writing a database has to define the field data types, whereas Excel by default makes assumptions for you.

    And no, Excel is not a database application. It's a spreadsheet application that can do some database type work.

    Excel could be used for this sort of data work, but Woadan is entirely correct that a database would be the better solution.

     

    reply to this | link to this | view in thread ]

  21.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 3:05am

    Old news, maybe?

    BMC Bioinformatics 2004, 5:80 doi:10.1186/1471-2105-5-80

    The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/5/80

    Received: 5 March 2004
    Accepted: 23 June 2004
    Published: 23 June 2004

     

    reply to this | link to this | view in thread ]

  22.  
    identicon
    Donglebert The Needlessly Unready, Jul 31st, 2014 @ 3:08am

    Re: Once upon a time

    Yup, Improv was great. More of a db data modeller posing as a spreadsheet.

     

    reply to this | link to this | view in thread ]

  23.  
    identicon
    Lawrence D’Oliveiro, Jul 31st, 2014 @ 3:37am

    Re: So instead of spreadsheets...

    How about proper stats analysis and data visualization software?

    * NumPy and SciPy
    * matplotlib
    * The R statistical language
    * SAGE
    * GNU Octave

    ... just to name a few.

     

    reply to this | link to this | view in thread ]

  24.  
    icon
    ottermaton (profile), Jul 31st, 2014 @ 3:41am

    Re:

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user.

    This. A thousand times over.

     

    reply to this | link to this | view in thread ]

  25.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:03am

    Maybe someone from the open source community needs to create a spreadsheet system for just data analysis it would make sense that a hammer is a hammer until you need a wrench , keep things very simple , I don't open VLC to edit photos I use GIMP.

     

    reply to this | link to this | view in thread ]

  26.  
    icon
    Zeissmann (profile), Jul 31st, 2014 @ 5:17am

    Idiocy

    If anything, this is a remainder that using spreadsheets for stuff they weren't meant to be used for (like maintaining databases) is a sign of idiocy. Especially if the spreadsheet is buggy and heavy on control-free spellchecking.

     

    reply to this | link to this | view in thread ]

  27.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:18am

    Re:

    Doesn't work. you import the data to excel and it changes the content. SEPT9 goes to 9-Sept, and when you "format as text", it doesn't get changed back to SEPT9.

     

    reply to this | link to this | view in thread ]

  28.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:36am

    Re: Spreadsheets Are Undebuggable

    The errors in Reinhart-Rogoff went beyond mere incompetence with Excel. But yes, they severely undercut their analysis with Excel based stupidity

     

    reply to this | link to this | view in thread ]

  29.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:41am

    Re:

    Yup, spot on - all it needs is a monkey dance.

     

    reply to this | link to this | view in thread ]

  30.  
    identicon
    Michael, Jul 31st, 2014 @ 5:43am

    Re:

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it.

    Yay for Microsoft bashing. However, this is the DEFAULT behavior of excel when pasting in data from the clipboard that is in certain common formats. Someone actually familiar with the software would have know this was a possibility and could have taken lots of different steps to avoid it - the simplest being to set the format of the column to text before pasting (all of three clicks), or properly using the import features of the software to identify data types correctly.

    This particular "issue" is not a software problem.

    If I had to select the individual data type and formatting of everything I ever pasted into Excel, I would never get anything done.

     

    reply to this | link to this | view in thread ]

  31.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:45am

    Re: Re:

    And they call this a feature because they design to the lowest common denominator. They might release a "professional" version for a few more Benjamins.

     

    reply to this | link to this | view in thread ]

  32.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:48am

    Re: Spreadsheets Are Undebuggable

    But if it is in a pie chart then it has to be correct, even when the total percentages of one pie add up to over 100.

     

    reply to this | link to this | view in thread ]

  33.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:54am

    Re: So instead of spreadsheets...

    "Clearly spreadsheets are efficient"

    I do not understand this statement. Are you saying that the software in question is efficient at producing erroneous output? How can this be considered efficient when the user then has to expend countless hours finding and correcting the introduced errors. I find this to be grossly inefficient. You would be better off writing your own scripts.

     

    reply to this | link to this | view in thread ]

  34.  
    identicon
    Michael, Jul 31st, 2014 @ 5:56am

    Re: Re: Re:

    Database:
    a collection of pieces of information that is organized and used on a computer


    Excel most certainly falls into the category of database applications - although I would argue that Excel is a database, a user interface, a calculation engine, and programming platform, a reporting tool, and an analysis toolset.

    But anyone writing a database has to define the field data types, whereas Excel by default makes assumptions for you.

    Absolutely not true. I have worked with databases in the past that were entirely made up of un-typed data, lots of them I have worked with have effectively been defined as nothing but character data or blobs, and Excel makes default assumptions, but they are only defaults and you can override them very easily and it stores everything as typed data.

    Woadan is entirely correct that a database would be the better solution

    This is the kind of thing that gets me furious. Without any specifications, you cannot possibly make this determination. For all we know, the data we are talking about is a single table with 10 columns and 300 rows. Excel is a great solution for that. Frankly, a "database" would be pretty useless for this "problem" as it was not the data storage engine that set the type and reformatted the values - that was the user interface.

    The best solution here would be a user that knows how to use whatever software they were using to import data - and setting the type properly when they did it.

     

    reply to this | link to this | view in thread ]

  35.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:57am

    Re: Re:

    Yes, it is sort of like using crayons to make your presentation to the board.

     

    reply to this | link to this | view in thread ]

  36.  
    identicon
    Adrian, Jul 31st, 2014 @ 5:58am

    Re:

    I thought it looked familiar - from 10yrs ago

    http://www.theregister.co.uk/2004/07/16/excel_vanishing_dna/

     

    reply to this | link to this | view in thread ]

  37.  
    icon
    Ninja (profile), Jul 31st, 2014 @ 6:01am

    Re:

    Yeah, it keeps trying to correct my cursing and swearing. I want to call people a bunch of aunts damn it! I want to use dock, not some fancy medical name! Duck it!

     

    reply to this | link to this | view in thread ]

  38.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:03am

    Re: Re:

    "Yay for Microsoft bashing."

    Well deserved bashing in this case.

    Rather than make excuses and describe workarounds for a poorly designed software application, one might simply use a more sophisticated suite from a different source that is more suited to their purpose. I realize that in many corporate environments obtaining such special treatment is an uphill battle, so good luck with that.

     

    reply to this | link to this | view in thread ]

  39.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:05am

    Re: Re: Re: Re:

    Technically, a flat file can be called a database.

     

    reply to this | link to this | view in thread ]

  40.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:06am

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user.

    this is exactly the situation, and it's not just microsoft but pretty much all software are certain they know infinitely more about what the user should be doing than the user. for instance, doubling spacing after a sentence-ending period. most websites refuse that even though at times it really aids readability. momma knows what's right for you.

    another real pain to me is headers when sorting. about half the time excel declares the top cell to be a header and doesn't include it in the sort. if you aren't vigilant about checking, you may get an error there.

    super annoying to have to do handstands to use the stupid software. if that auto-dating feature (which annoys me, too) causes a serious error, i hope the offended party sues the shit out of microsoft.

     

    reply to this | link to this | view in thread ]

  41.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:08am

    Re:

    Sounds like the voice of experience ... must've got butt hurt in a big meeting with pie chart enthusiasts.

     

    reply to this | link to this | view in thread ]

  42.  
    icon
    RadioactiveSmurf (profile), Jul 31st, 2014 @ 6:22am

    This is why I hate Microsofts "helpful" features. If I wanted December 1st I would type that. I know what I'm doing and it takes twice as long to go back through and correct the mistakes that Microsoft created for me.

     

    reply to this | link to this | view in thread ]

  43.  
    identicon
    Donglebert The Needlessly Unready, Jul 31st, 2014 @ 6:31am

    Re: Re:

    If you open excel, and then try to open a text or csv file, the Import Wizard will start and give the option to define each column's data type before import. It won't convert it.

    The wizard doesn't run if you click on a file and select to open it in Excel, or if you set your exporting application to directly open Excel, or if you paste the data into a sheet. It will then autocorrect the data into a date field that, at it's heart, is a formatted number. If you convert that back to text, you'll see the number.

     

    reply to this | link to this | view in thread ]

  44.  
    icon
    LduN (profile), Jul 31st, 2014 @ 6:31am

    Re: So instead of spreadsheets...

    Any database should do the trick... quite simple too

     

    reply to this | link to this | view in thread ]

  45.  
    icon
    nasch (profile), Jul 31st, 2014 @ 6:39am

    Re: Re: So instead of spreadsheets...

    You would be better off writing your own scripts.

    Because that will work right the first time?

     

    reply to this | link to this | view in thread ]

  46.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:42am

    Re: Re: Re:

    Excel sucks at importing CSV data, it is a well known feature.

     

    reply to this | link to this | view in thread ]

  47.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 6:43am

    Maybe they should have checked through the formatting on the spreadsheet. Maybe they should have used a different statistical analysis software. Maybe they shouldn't have assume that using a basic consumer level program in a basic manner would be fine for statistical analysis.

    We knew better than this in 2nd year stats classes in college. In 3rd year we were using alternative stat programs with more built in functionality. This isn't a failure of Microsoft. This is people not being careful, not doing work how they were supposed to, being cheap, and trying to find a scapegoat for their sloppy work.

     

    reply to this | link to this | view in thread ]

  48.  
    icon
    nasch (profile), Jul 31st, 2014 @ 6:43am

    Re: Re: Re:

    Rather than make excuses and describe workarounds for a poorly designed software application

    It's not a workaround, it's using the software correctly. Maybe the default behavior isn't right for these users, but that doesn't mean the software is flawed (in this particular way). It's not "Excel for Bioinformatics" after all. If a user doesn't know how to import data correctly, doesn't learn how before doing it, and ends up with incorrect data, why is that Microsoft's fault?

     

    reply to this | link to this | view in thread ]

  49.  
    identicon
    Donglebert The Needlessly Unready, Jul 31st, 2014 @ 6:51am

    Re: Re: Re: Re:

    I didn't say that Excel couldn't be a database, I said it was not explicitly a database application. That's why Microsoft also produce Access - which is a database app, and SQL server, which is a rdms.

    Excel is a spreadsheet app. Yes, like some database apps, it holds data in a tabular format. It also typically displays data in a tabular format, which most databases don't do (unless you run queries - an area where Excel is also weak (but, then, it's a spreadsheet app, not a db app). It can't cope with significantly large amounts of data. In day to day use it relies on the same formulas being duplicated again and again with different inputs. It doesn't create indexes effectively.

    A pen and paper can be a database too. So can Word. So can Photoshop. That doesn't mean they're database apps.

    And, if you come across databases where a field that requires a specific datatype has not been defined as that data type, then the developer didn't know what they were doing or the spec they were working from was flawed.

    And we're talking genome data here. It's not unreasonable to assume that the dataset was very large. Otherwise, this story wouldn't be a story.

     

    reply to this | link to this | view in thread ]

  50.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 7:01am

    Re: Re: Re: Re: Incorrect data

    If a user doesn't know how to import data correctly, doesn't learn how before doing it, and ends up with incorrect data, why is that Microsoft's fault?
    The data is mangled by a "feature" that is on-by-default in every new document, and as I understand it, the user is never proactively informed that the feature exists. I understand the appeal of a tool that is immediately ready for use on first install (after Click Once gets done downloading the multi-GB installer for 15+ minutes ;)), but presenting an interface that looks like it will always "do the right thing" is an attractive nuisance. It lures users into assuming Excel will do the right thing, without any hint that its idea of right differs from what the user intends.

    A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results. The burden would then be on the user to go back and reimport the data correctly. A fancier version might offer links to documentation that explains what kind of normalization Excel enables and options to re-run the import in place with those normalizations disabled. A very fancy version could even import the data in non-normalized form, but provide an option to have Excel interact with it (render, search, transform, etc.) in normalized or as-imported form.

     

    reply to this | link to this | view in thread ]

  51.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 7:04am

    Re: Re: Re: So instead of spreadsheets...

    No, because you control the outcome.

    It was an example of how inefficient a poorly designed application is, when doing it yourself is potentially better.

     

    reply to this | link to this | view in thread ]

  52.  
    icon
    Sheogorath (profile), Jul 31st, 2014 @ 7:13am

    A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates.

    Okay, which idiot didn't turn this 'default' feature off?

     

    reply to this | link to this | view in thread ]

  53.  
    icon
    nasch (profile), Jul 31st, 2014 @ 7:18am

    Re: Re: Re: Re: Re: Incorrect data

    A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results.

    Sadly, users rarely read dialogs. Reading documentation is probably even less likely. Given that they're not going to get much help from the user, software designers have to choose what is the best default behavior for the greatest number of users. Perhaps you disagree with the default behavior, but either way it's going to be wrong for someone.

     

    reply to this | link to this | view in thread ]

  54.  
    icon
    nasch (profile), Jul 31st, 2014 @ 7:24am

    Re: Re: Re: Re: So instead of spreadsheets...

    It was an example of how inefficient a poorly designed application is, when doing it yourself is potentially better.

    So you're saying making mistakes using Excel (because that's what happened here) is inefficient, and it would be a better use of time for that same person (who is a scientist, not a software developer, and can't use Excel properly) to write their own software from scratch?

     

    reply to this | link to this | view in thread ]

  55.  
    identicon
    Kyle, Jul 31st, 2014 @ 7:37am

    Re: Once upon a time

    Not all is lost, it can still be got from here:

    http://www.vetusware.com/download/Lotus%20Improv%202.1/?id=5797

     

    reply to this | link to this | view in thread ]

  56.  
    identicon
    Kyle, Jul 31st, 2014 @ 7:48am

    Re:

    A good drop in replacement for Excel and Access can be found at

    https://www.libreoffice.org/

    But as a number of posters above have said, nothing beats inputting the correct information and, just as important, make sure cell-formatting is done correctly.

    I suppose any spreadsheet / database program will fall flat if these two criteria aren't met.

    I remember the good old days of DBASE III. Currently, they can be found at

    http://www.dbase.com/

    and they are sitting at version 9

    Just my - stating the obvious - 2c worth. :-)

     

    reply to this | link to this | view in thread ]

  57.  
    icon
    The Groove Tiger (profile), Jul 31st, 2014 @ 8:07am

    Re: Re: Re:

    Nah, that's only for fixed width text import.

    CSV import "just works!"

     

    reply to this | link to this | view in thread ]

  58.  
    icon
    Avatar28 (profile), Jul 31st, 2014 @ 8:16am

    Re: Re: Re: Re:

    > It's not "Excel for Bioinformatics" after all.

    Exactly. People need to remember that Excel isn't marketed for that particular use. The vast VAST majority of users of this software benefit from these sorts of changes. Remember, most will be using it for things like financial data and stuff like that. For those people importing a date that displays in a different format can create issues. In the US we would write Dec 1 but in Europe 1 Dec is more common. I can pretty much guarantee that displaying gene abbreviations correctly is a niche use that NO designer ever had cross their mind. I'd also be surprised if the other packages on the market didn't do the same thing.

     

    reply to this | link to this | view in thread ]

  59.  
    icon
    Avatar28 (profile), Jul 31st, 2014 @ 8:20am

    Re:

    And for every person like you there are probably 10 that find that sort of feature helpful. Just like when Word first went WYSIWYG. A few people bitched about it. They wanted their old display where it just showed the document with a markup-style layout instead of WYSIWYG. Most people wanted the latter and that's what we get now.

     

    reply to this | link to this | view in thread ]

  60.  
    identicon
    Michael, Jul 31st, 2014 @ 8:37am

    Re: Re: Re: Re: Re: Incorrect data

    A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results.

    I'm not sure if you have noticed that software has really stopped popping up dialogs like this, but while they used to be standard practice, sometime in the late 80's software companies realized that their customers HATE THOSE THINGS MORE THAN GETTING SOMETHING WRONG NOW AND THEN.

     

    reply to this | link to this | view in thread ]

  61.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 9:34am

    both the date-whether-you-need-it-or-not issue and the imposition-of-headers-sometimes-for-sorting-when-you-don't-have-them issue are really annoying and force the user to check what the stupid software is doing rather than concentrating on his or her task.

    bad design really is bad design.

     

    reply to this | link to this | view in thread ]

  62.  
    identicon
    Rekrul, Jul 31st, 2014 @ 10:02am

    Re: Re:

    The likelihood is that MS responded to many Excel community requests to make data entry easier for some sorts of data, such as dates, and so Excel now "senses" a date has been entered and converts the data to its default date setting.

    That should be an option.

    I'm sure that there are probably quite a few people who wouldn't mind having all their figures rounded up or down to make things simpler. Should the software just go ahead and do that for everyone?

     

    reply to this | link to this | view in thread ]

  63.  
    icon
    mermaldad (profile), Jul 31st, 2014 @ 10:12am

    Re:

    Nope. The problem is that Excel converts the *value* of the cell to the numeric equivalent that represents the date, and also converts the cell format to Date. So if you format as text, you fix one problem, but not the other. "DEC1" becomes 41974, for example.

     

    reply to this | link to this | view in thread ]

  64.  
    icon
    John Fenderson (profile), Jul 31st, 2014 @ 10:21am

    Re:

    "This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it."

    A million times this. This is a consistent problem with Microsoft software.

     

    reply to this | link to this | view in thread ]

  65.  
    icon
    John Fenderson (profile), Jul 31st, 2014 @ 10:23am

    Re: Re:

    "This particular "issue" is not a software problem."

    I couldn't disagree more. This issue is most definitely a software problem. At the very least, conversion warnings should have been issued.

     

    reply to this | link to this | view in thread ]

  66.  
    icon
    mdpopescu (profile), Jul 31st, 2014 @ 10:51am

    Re: Re: So instead of spreadsheets...

    I wish I could give more votes to this comment. Bravo.

     

    reply to this | link to this | view in thread ]

  67.  
    icon
    mdpopescu (profile), Jul 31st, 2014 @ 10:57am

    Re: Re: Re:

    Yes. In general, the less options you present to the user, the better. See various articles on the subject, like Joel's or Jeff Atwood's.

     

    reply to this | link to this | view in thread ]

  68.  
    icon
    nasch (profile), Jul 31st, 2014 @ 11:15am

    Re: Re: Re:

    At the very least, conversion warnings should have been issued.

    That would merely be a CYA issue, since generally users would ignore them. It wouldn't have any substantive effect.

     

    reply to this | link to this | view in thread ]

  69.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 11:36am

    In general, the less options you present to the user, the better.

    some years ago i was employed using the design software catia v4, which was very amendable to make the environment the most useful to a designer that it could be.

    one day a fairly new user complained to me that he wished there weren't so many things that could be adjusted. i told him i agreed exactly and that all machines should be set up the way i like mine.

     

    reply to this | link to this | view in thread ]

  70.  
    identicon
    Michael, Jul 31st, 2014 @ 11:50am

    Re: Re: Re:

    It IS an option.

    You can turn this option off by selecting your worksheet and formatting the cells as text. Anything you add from that point forward will be treated as text.

    It's not rocket science, but sometimes it helps to RTFM.

     

    reply to this | link to this | view in thread ]

  71.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 11:54am

    The problem is that Excel converts the *value* of the cell to the numeric equivalent that represents the date, and also converts the cell format to Date. So if you format as text, you fix one problem, but not the other. "DEC1" becomes 41974, for example.

    yes, but it's even worse than that. if excel can't discern a year, it will insert this year into that date. so the spreadsheet you make that error on this year will equate that cell to 41974, but if you go back to a 2013 table, it won't be that number for dec1.

     

    reply to this | link to this | view in thread ]

  72.  
    identicon
    Michael, Jul 31st, 2014 @ 11:56am

    Re: Re: Re:

    These options are easily turned off.

    They are on by default because that is what most of the users want most of the time. They have gotten years of feedback, complaints, and requests to come up with the defaults - they didn't arbitrarily decide that automatic date detection should be on and automatic fraction detection should be off.

    However, the included documentation and the ability to control all of these things - just in case you don't want them on for a specific purpose.

    If they had gone your way, you would type in a list of numbers, highlight them, and wonder why it didn't show you the total.

     

    reply to this | link to this | view in thread ]

  73.  
    identicon
    Michael, Jul 31st, 2014 @ 12:02pm

    Re:

    he wished there weren't so many things that could be adjusted

    Damned if you do, damned if you don't. In my experience it has always been that each user has their handful of things that should be configurable and "everything else is just clutter that should be removed". And guess what, it was a different handful of things for each user.

     

    reply to this | link to this | view in thread ]

  74.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 12:36pm

    michael, i hope you realize that my comment #69 and your comment #73 are essentially saying the same thing.

    i'm pleased, of course, when someone adds heft to my argument, but i always hope that someone realizes it.

     

    reply to this | link to this | view in thread ]

  75.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 12:40pm

    Re: Re: Re: Re:

    It's not rocket science, but sometimes it helps to RTFM.

    But Micro$oft claims their software is intuitive to use, therefore users should not need to RTFM.
    /big myth

     

    reply to this | link to this | view in thread ]

  76.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 1:18pm

    Re: Re:

    It works fine IF you format the columns BEFORE the import. That takes a little planning and advance work, so maybe not.
    .

     

    reply to this | link to this | view in thread ]

  77.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:30pm

    It works fine IF you format the columns BEFORE the import.

    no, it doesn't.  i just tried it.

    now, it does if i set the format of a cell or group of cells to be TEXT and enter the value into the one-line input field up top or copy/paste from a text document one item of information into that field.

    problem is if i copy/paste a large amount of info, say 30 football players' info, from a document, i can't put it all in that one-line entry field.  i have to pick a location in the sheet and dump all that info with respect to the place i picked to put it.  when i do that it changes the format of those TEXT cells to CUSTOM and changes those 5-10 football players' heights to 10-May.  if i force that format back to either GENERAL or TEXT, i get 41769.

    it doesn't work unless i handpick some cells to format and transfer the information one item at a time and be careful to select the correct cell and dump the info into that upper entry field. ONE HEIGHT AT A TIME.

    let me spell it: S U C K S.

    i tried mass dumping three players' heights and weights into that one-line upper entry field, and it took it, but put all those heights and weights of three players into that one cell, not an array of cells like i need.

    like i say, S U C K S.

     

    reply to this | link to this | view in thread ]

  78.  
    identicon
    Lawrence D’Oliveiro, Jul 31st, 2014 @ 5:44pm

    Re: would be a better use of time for that same person (who is a scientist, not a software developer

    Hence the well-known saying, that there is no time to do it right, but there is time to do it over.

     

    reply to this | link to this | view in thread ]

  79.  
    identicon
    Anonymous Coward, Jul 31st, 2014 @ 5:58pm

    i found another way the transfer works fine.

    if i have info in a spreadsheet carefully formatted so that anything that excel would misread as a date is handled as TEXT, then i can select a bunch of cells, copy/paste that info anywhere in another spreadsheet, and it will reformat the cells to be TEXT in the new spreadsheet as needed.  i don't have to pre-format those cells.  so that's not too bad.

    it evens transfers the justification changes i had made in the original spreadsheet.  in other words, it brings the formatting with the data.

    transfers from any other source than another excel spreadsheet probably have to be very carefully handled, but if i painstakingly set up and excel spreadsheet, i can transfer without losing formatting.

     

    reply to this | link to this | view in thread ]

  80.  
    icon
    JP Jones (profile), Jul 31st, 2014 @ 7:09pm

    Re: Once upon a time

    Um, Excel can do all of that. You can show formulas instead of results, you can assign cells and ranges specific names, and has an actual (if basic) programming language behind its macros.

    The thing is that Excel is designed for simple stuff. It can do more complex things but it isn't really designed for it. Excel is great for small forms with a couple of automatically formatted outputs, or a home budget, or keeping track of documents in a small office. It is not designed for advanced statistical analysis of gene structures.

    Office is designed with several specialized tools for specific purposes rather than one swiss army knife that tries to do everything. You can make a presentation in Word, a simple spreadsheet in Access, and write a report in Powerpoint. It works. It just doesn't do it well.

    There was an easy way to avoid this, which a simple Google search would have revealed...click the upper left box to select the entire sheet, and change the drop down box from "General" to "Text." Problem solved.

    I personally use Excel extensively in my own job and am very familiar with both its strengths and its limitations. Is it perfect? Heck, no...I'd love for the ability to write formulas in multiple lines, sort of like how VBA is formatted, to help keep track of coding in a complex sheet when macros aren't an option (usually due to business rules). But overall the quick formatting of data is useful for prototyping and reformatting data, creating useful rosters and graphs, and other simple tasks when your boss asks for a quick analysis of something.

    This is like complaining that using the flat side of a power drill to hammer a nail isn't very easy. Why not use a hammer? Well, I want to use my power drill, screw hammers.

    Sorry. It's not the drill's fault you don't know how to use a hammer.

     

    reply to this | link to this | view in thread ]

  81.  
    icon
    MrTroy (profile), Jul 31st, 2014 @ 7:17pm

    Re: Re:

    Microsoft does a lot of bad things in a lot of software, and don't get me started on the APIs... but this is just a case of Microsoft providing what most of its customers want.

    As mentioned above, the target audience for Excel was never bio-informatics... and yet they are still catered for because steps can be taken to import data correctly, instead of blindly pasting stuff around, not checking the results, then blaming the software for getting it wrong.

    There are lots of suggestions that Excel should have warned that it had converted data to a date... but nobody has considered how much they would be pissed off when Excel issued a warning every time they entered a date, or a number - that's just training your users to ignore warnings.

    There are lessons to be learned from this story, but blaming it all on Microsoft is choosing not to learn.

     

    reply to this | link to this | view in thread ]

  82.  
    icon
    nasch (profile), Aug 1st, 2014 @ 8:00am

    Re:

    no, it doesn't. i just tried it.

    now, it does if i set the format of a cell or group of cells to be TEXT and enter the value into the one-line input field up top or copy/paste from a text document one item of information into that field.


    I don't have Excel to test it, but are you only testing with copy/paste, or are you importing from a file? Because the latter has all kinds of options that the former doesn't.

     

    reply to this | link to this | view in thread ]

  83.  
    icon
    John Fenderson (profile), Aug 1st, 2014 @ 9:24am

    Re: Re: Re:

    "but this is just a case of Microsoft providing what most of its customers want."

    I'm aware of that, but catering to what most of its customers want means that they produce software which is dumbed down and allows for stupid errors to be easily overlooked. This effect is one of the reasons I avoid Microsoft software as much as I can.

    "blaming it all on Microsoft is choosing not to learn."

    Don't misunderstand me -- I'm not blaming it all on Microsoft at all. This type of problem is hard to resolve in a way that pleases everyone. I'm just saying that Microsoft's approach is one that consistently fails to meet my needs.

     

    reply to this | link to this | view in thread ]

  84.  
    identicon
    Zonker, Aug 1st, 2014 @ 5:25pm

    It looks like you're trying to insert some dates into your spreadsheet.

    Would you like help?

    * Convert all text into date format.

    * Leave all my gene names alone, you fool!

    * Yes, I desperately in need of a date! Do you have any attractive single friends who have low enough standards to go on a date with me? (Forever alone!)

    [ ] Don't show me this tip again

     

    reply to this | link to this | view in thread ]

  85.  
    identicon
    Anonymous Coward, Aug 4th, 2014 @ 5:22am

    Re: Re: Re:

    Warning that it has performed inconsistent conversions on a column would be useful - it is very likely to mean that either it has munged something that happened to look date-like or number-like (phone numbers, SKU codes, etc.), or there was malformed data in the column which you'll need to fix.

     

    reply to this | link to this | view in thread ]


Add Your Comment

Have a Techdirt Account? Sign in now. Want one? Register here
Get Techdirt’s Daily Email
Save me a cookie
  • Note: A CRLF will be replaced by a break tag (<br>), all other allowable HTML will remain intact
  • Allowed HTML Tags: <b> <i> <a> <em> <br> <strong> <blockquote> <hr> <tt>
Follow Techdirt
Advertisement
Essential Reading
Techdirt Reading List
Techdirt Insider Chat
Advertisement
Recent Stories
Advertisement
Support Techdirt - Get Great Stuff!

Close

Email This

This feature is only available to registered users. Register or sign in to use it.