The author said it himself, these errors were caused by humans, not excel. I don't think making excel "smart" enough to prevent people from making dumb mistakes is a good idea. In fact I think that would transform excel into microsoft word, which is the only thing that could make life worse for your typical competent office worker.
Excel already has a lot of this crap. It already requires an act of god to prevent excel from changing your string of digits into a number or date. It already shows you little exclamation point icons when your formulas omit adjacent rows or are different from other formulas in the same row/column.
A tool will never make it possible for dumb people to solve hard problems easily. It's like trying to design a knife that makes it impossible to cut yourself. Nobody with any kind of a clue would want that knife.
A tool should be straightforward and intuitive, but it shouldn't aim to be smarter than it's user.
Making Excel "smart" is not exactly what he's talking about. He's playing with ideas. It's high level, but you can do high level sometimes. You need to.
Your analysis is too low-level and nitpicky for this high level idea playground.
Excel is still a dumb tool. He's absolutely right.
We have ways of making smart tools. It's called "programming" and "database engineering." That part's a little too hard right now, and part of why it's hard is because if you try to make it easy to mould a flexible program to try to define what your data means and how it should work, it becomes so generic as to be as unusable and amorphic as a cloud of smoke.
I've tried this, moving from a space where Excel was predominantly used, and trying to capture the process into an application. We tried to keep all the customizability and malleability as Excel: but I'm now convinced that was a huge mistake. It led us into genericland.
It is possible to build applications that work for these processes, using the tools of programming and good data design. But that's why we have thousands of different applications all trying to solve these different problems, all of which you could probably represent in Excel in some way. The fact that all those applications exist means that people want something smarter than Excel. It proves the case.
But we haven't yet bridged the gap between Excel's extreme flexibility and an Application's intelligence and process fit. That's because it's really difficult to make that work; to fit all the pieces together into something that makes sense in both realms. It's really hard!
The solution will be a user interface masterpiece. I'm convinced of that. And it will be layered, like an onion, allowing people to build any application they need just by telling the computer about their data and how it fits together and what it should allow them to do. Someday we'll have a system—some sort of super-Rails—that makes this so easy that anyone can do it. Someday Programming will be an ancient art, something that only your Grandfather did, like crafting your own tools or woodworking or making jams and jellies.
Someday that will be true. But the way we get there is by thinking about the difference between Dumb Flexible Excel and Smart Rigid Applications and how we bridge that gap, because it's difficult and it's possible. We have to think at this high level, way up in the clouds—and more people should, and there's no reason to shoot them down.
You can have adaptation or adaptiveness, but not both. Either a system is well-fitted to a purpose and needs professional intervention to change, or it is permissive and undemanding and can be turned to any purpose and do it poorly.
Life is full of tradeoffs. Sometimes the general, permissive loosey-goosey system is what you want. Sometimes it isn't. It takes judgement to decide which is which and when to switch.
Every time I point out that there are irreducible tradeoffs in life, somebody tells me in a hand-wavy way that some genius is all that is required to overcome it.
Einstein can't travel faster than light, Maxwell can't reverse entropy and I can't change that different people are good at different things.
Optimism is the grease of evolution and economics. Without lots of people exploring fruitless plains of minima we'd never find unexpected maxima. But almost all who try will fail and those who work with the known good maxima will probably succeed.
Convergence vs coverage -- another irreducible tradeoff.
Hey man, I'm a zen buddhist, I know about the balance and the tradeoffs.
I'm also a realistic optimist, and you're right, it's necessary but often fruitless. But the way I see it, we're headed in that direction pretty surely. Ten years ago we couldn't imagine software doing anything different from Excel, and Excel (or, er, Lotus 123) was the most powerful way to solve any problem, and they were indeed amazing. But they have this problem of being unstructured. Now we have Rails and other frameworks that make it easier to make custom logic to solve problems in specific controlled ways. They're making it easy to let programmers specify how a tool works. We're in the woodworking and craftsman stage of software: we need people who can build the tools and painstakingly design every detail.
We may always need that, and we may always value it. But I can envision a future where there's some in-between: some way to let the computer be extremely smart about the problems we're trying to solve with these tools. I think frameworks like Rails are one giant leap away from being usable not by programmers, but by people just illustrating rules and relationships.
Would it be too generic? Would the tradeoffs be too great? Maybe. But I am optimistic these are problems we can solve and not great unsurmountable rules of the universe.
I know because I did it once. I made something that worked for everyone and allowed you to define complex multi-dimensional relationships between data. It would cover almost any business need, and in a stricter smarter way than excel. There were details that we missed and my business partner was an asshole; those were the problems. The software and the idea were sound and, in fact, incredible. I think the trivial problems are solvable. So, forgive me my optimism, but it's based on experience.
There is low hanging fruit in the problems to solve though. For example, finding and highlighting outliers in a column. It's a relatively simple feature but yields big benefits to users.
Except it isn't that simple. Your definition of an outlier may be completely different from some one else's (95th percentile, median/regression method, or simple max/min, some of these are already possible in excel). Then, how is the information displayed? Highlighting cells? Well if you have +10k rows of data, than that won't be too useful. Creating a summary in another column/sheet will add to the already cluttered 'results' spreadsheet, etc...
This is why I recommend using programming based tools for data analysis. Any non-programming tool has to find a balance between the number of features offered and the complexity/ease-of-use of the tool. With programming tools, you merely have to find the right package (or build your own) which essentially results in getting the exact set of features that you need to solve your problem.
I know people bag on VBA, but I'm a big advocate of effectively utilizing Excels tools along side using VBA to customize where necessary. This solves the problem in your first comment (users can define their own statistical limits and methods), while keeping the simplicity of excel and its tools for other users.
The problem is that not everyone is a programmer. A "programming" tool might be great for you, but when you show it to co-workers for them to work with, they'll inevitably ask "Can I get this in Excel?". Excel+VBA allows for customization when the "complexity/ease-of-use" balance is out of sync with your needs, but to everyone else it's still just Excel.
With this, those that can program have the option to solve the problem their way, while allowing those that don't program the means to solve the problem the way their used to.
I have used VBA here and there, and try not to bag on it too much. There are some cases where using it can make sense, but in general, it seems to be a compromise.
I get a little confused about the programmer/non-programmer dichotomy. If you are capable of implementing a complex model in excel, you are probably more than capable of learning a programming language. If you are just tallying up a few numbers to throw into a report or presentation, then yeah, no need to switch. As I mentioned in another post, it probably has to do with exposure and motivation.
Excel is a tool, and it's a really good tool for what it does, which is for applications that require:
- very fast iteration cycles
- a particular data model: grid of numbers which is very common in business world
- support for everything under the sun: persistence (just hit save :), dialog UI, math formula, stats models, string functions, date/time functions (better than even Java's Joda), internationalization, localization, utf 8, and plenty more
Excel already does this to a degree. If a cell has a formula that is inconsistent with its neighbors, it will be flagged. If a formula omits cells adjacent to the range it is working with, it will be flagged.
Check out Options: Formulas: Error checking rules. It's almost as if Microsoft has been continually developing this software for years and years and they have seen many common mistakes. ;)
I work with an army of very smart people who need to build incredibly complex models to drive our business. Like most companies, we turn to Excel. It's the only widely used application with a low enough barrier to entry that "non-technical" people can build legitimately complex (if not always robust) models.
But the problems with Excel and general spreadsheet modeling limitations offer such a great opportunity for improvement that would impact just about every business out there. I wish I saw more people tackling this, or I had a great idea to do it myself. The problems I see are around barrier to entry, in that it must be usable in just a couple minutes, simple enough for non-programmers to learn, and equally or more efficient to get a basic model functioning. Otherwise I just don't see wide adoption despite possible maintenance, accuracy, and reliability benefits.
Interesting and agreed. What if excel was very basic in functionality, and then as soon as you get into complex modeling or advanced functions you pull from a user generated (either private or public) template library. The trick would be in making the implementation of templating simple and intuitive.
The goal behind templating would be to prevent complex formulas from being entered in tiny cells, and to leverage a userbase that has "been there, done that." Do you think that would shorten new hire ramp up in using existing complex models etc?
I think Data Nitro is really cool, and as a data geek the idea of having a real language available inside the Excel environment gets me very excited.
Unfortunately, by increasing the power and capability of what a careless user can do, while expanding the learning curve and moving Python code to a different place than any cell-level functions or global VBA functions are stored... all of this just makes the potential for issues of human failure far greater.
I use those languages frequently, when they are the right tool for the job. Well, R and Stata, at least — but also Processing and D3 and Google Maps...
Not sure how this is relevant to my concern, though.
I worked in a similar environment and came to similar conclusions. In my view, your three requirements (usable without a big learning curve, intuitive to non-programmers, equally or more efficient than Excel) point directly back to the spreadsheet UI. Spreadsheet users get spreadsheets in a way they don't get anything else. It's their mental model and they love it and there's no way they're going to give it up. (In particular, there's no way they're going to migrate to even the simplest of general-purpose scripting languages.) Therefore the only thing that will get them off Excel is a new kind of spreadsheet. The question is, what would that look like? If it's too different from Excel, it won't meet your three requirements. But if it's too like Excel, there won't be any reason to use it, nor will it address Excel's limitations which are the subject of the OP. I've been thinking about and working on this ever since—it's a deeper problem than it appears to be!
Edit: by the way, you're right (in my opinion) to emphasize barrier-to-entry as a great danger in this area. If you look at the history of attempts to innovate in the spreadsheet space (most famously Lotus Improv), that is the reef on which they tend to run aground.
I don't think scripting languages, of themselves, are the real barrier, after all, excel formulas are not too far off from what one would use in a language like R. I think the barrier is the data import/manipulation phase. Scripting languages loose out to the ease of copy/paste and data inspection that spreadsheets offer. It's not that using scripting languages for these tasks is very difficult, it's just that spreadsheets are, comparatively, much easier to use.
I worked in a place that used both, and the programmers were successful in preaching the benefits and converting some of the excel users, where it made sense. However, I imagine a large majority of analysts using excel in the corporate world do not have that exposure and don't realize there are other alternatives out there, or if they do know about the alternatives, they get quickly turned off from them when they try to import/manipulate the data. I don't think any tool, by itself, is going to push individuals to change. People need to be exposed to the benefits of approaching data analysis as a small software/programming project over the long term. It is short term thinking that pushes people to use excel in the first place.
I respectfully but strongly disagree. There is a chasm between Excel formulas and general-purpose programming languages. Very many (even very sophisticated) spreadsheet users have no interest in crossing it. These are the users I was talking about. They're not programmers-in-waiting. They think in spreadsheets.
The difference between the two paradigms is much deeper than data import/export, and exposing most spreadsheet users to scripting languages (even with good import/export) won't convert them, it will just turn them off. I know this from hard experience, having built a domain-specific scripting language on a previous project that was trying to make software for petroleum economic modelers (i.e., sophisticated but non-programmer business experts). Every feature request we got boiled down to, "make it be like Excel, complete with an interactive grid UI".
It isn't short-term thinking that drives this kind of user to build complex models in Excel. It's that Excel fits how their minds work. Partly that's because spreadsheets are intuitive for many people, and partly it's because spreadsheets been mass-adopted for so long that they're utterly entrenched. Trying to get spreadsheet users to think more like programmers is a sure way to keep them stuck on Excel forever.
Here's an example of how different the thinking is. In a programming language one first builds a computation abstractly (code), then instantiates it on concrete data (input) and looks at the results (output). Spreadsheet users don't do that. They start with a concrete example, build the formulas they need to get that particular example working, and then (if they need a similar example) copy what they did and tweak it. In other words, a spreadsheet user is never looking at a pure abstraction; they are always looking at a concrete example, even when the computations they're building up are complex. If you showed them all the formulas at once, they would recoil. I think this low tolerance for abstraction is one reason why spreadsheets fit the minds of non-programmers better, and also why it is so difficult for those of us who are comfortable with abstraction (and thus with programming languages) to appreciate how different spreadsheets are.
I agree about your point on abstraction, but I also think there are different methods to deal with it, and that those methods often get overlooked.
My experience with SAS and R suggests it is possible to somewhat re-create the typical excel workflow. The SAS IDE allows you to open the data sets created in your code, which means you can write code one line (well, more accurately, one data step or procedure, as it is called in SAS) at a time and inspect the intermediate results at each point. Similarly, RStudio allows you to inspect the values of data frames and variables, once again allowing the user to step through the code writing process and get relatively immediate feedback.
Regardless, there are going to be a large portion of excel users that won't or can't switch, and there are also a number of use cases where excel makes perfect sense (smaller data, simpler summaries), but I have personally 'converted' a few analysts, and think that, with more exposure, a not insignificant portion of excel users could be convinced to make the leap.
disagree completely. MS Excel is a masterpiece in software engineering, definitely the best piece of software MS ever created. Large pieces of the economy run on Excel, finance, controlling, etc. are unthinkable without it.
the flexibility, ease of use, speed is astounding. applying colors to cells and then being able to sort by color still blows my mind - not the feature per se, but the fact that they included it and solved a common problem (how many times did a client mark something by color in a big spreadsheet...).
excel is mobile, works offline. replacing it with some webapp or too much logic just shows the complete misunderstanding of its appeal.
Excel is a great tool, but it has its shortcomings. The article is highlighting the main one: it's easy to build huge, brittle systems with Excel.
This immediately brings up questions: What tools exist for building robust spreadsheets? How can we encourage people to use them? Should we make better ones?
This. It's important to remember that from the perspective of millions of corporate users (not the hacker crowd), this is is THE tool. When you think about it, Microsoft really is Excel. That is their core. Why is it difficult for corporations to switch from Windows? Office. Why Office? Excel. If anyone attempted to create something that not only matched Excel (with all the messy file compatibility issues and macros), but surpassed it, they would be striking at the heart of Microsoft.
Word isn´t the problem... Converting from Word to something else is some pain for users. Converting from Excel (especially Excel Macros) to something else is in the Millions of $ for developers/re-engineering of those Macros...
This is a problem I've spent a lot of time pondering. The author makes several suggestions; one in particular (a GitHub for datasets) is an idea I felt strongly enough about that I founded a start-up called BuzzData to address this concern. Unfortunately, we did not find our critical mass and the product has since successfully pivoted into an entirely different product called LookBookHQ.
I do have another idea, which is to bring the concept of testing to the Excel world.
I think that the combination of:
- a framework for building up test cases that can be run against the data as part of the saving process (think rSpec)
- an English-like, indented grammar for defining user stories that can potentially generate code for tests (think Cucumber)
- a strong push for a culture of testing in the Excel community that would have to include Microsoft
I dunno, honestly. Could be crazy. The culture might never accept testing.
But then again, one of the basic ways to sell something is to give people insurance against humiliating themselves. It's possible that this thinking would make a lot of sense to the twice shy, highly risk-averse board members of the major financial institutions and governments of the world.
Ever read "Specification By Example? That approach along with some of your suggestions could be used to prove your testable specification spreadsheet concept.
The fundamental problem with big spreadsheets is that they're complex pieces of software, and they're being designed by people who haven't studied software engineering.
Building a complex program is hard. It's untenable if you don't approach it properly. This is true whether you're using Excel, Python, LISP, or anything else. Better tools can go a long way to help, but the ultimate answer is educating people on how to properly engineer systems.
On that last point - at DataNitro[1], we let people script Excel with Python, which is a better tool than VBA or than doing things by hand. The result is that for a given level of complexity, you see fewer errors and more robust spreadsheets; but when you give people more power, they naturally build more sophisticated spreadsheets. All else being equal, this might result in more, not fewer errors.
The great thing about introducing a new tool, though, is that there's also a chance to introduce new ways of working. Suddenly, people start building spreadsheets that work with version control and run unit tests. This helps tremendously with robustness and reliability.
At a previous job, I tried to convince them (and succeeded in some cases) to build a domain-specific web app instead of sharing and e-mailing ever-changing Excel docs.
I used CakePHP, and it was really easy to whip up a complete CRUD web-app with domain-specific information and DB tables, allow people to collaborate through a web interface, and be happy users.
I dare say that the problem is less about Excel being dumb than that if you're interested in the underlying logic Excel hides all that from you the minute you click off a cell. There's black box programming and then there's just silliness.
As for data-entry, ideally, that's not something that humans should be doing for anything but the most trivial concern. If you have huge columns of financial data or test results or what have you, then having someone type it into a computer is silliness of the highest order.
I think it's impossible to replace Excel en masse, so lots of folks are starting to pick off bits of it to replace. And hopefully in doing so they'll invest in making those bits a lot smarter than they are in Excel.
Examples from the S12 YC batch alone:
. VBA isn't great, so DataNitro is letting you use Python.
. Grid is (mobiley) replacing the basic list-making use case of Excel.
. (Stretch example:) Financial modeling of big personal finance decisions is a nice use case of Excel, but too hard for most people, so SmartAsset bakes those kinds of models into an on-site calculator.
. Statwing is replacing pivot tables, which are clunky, devoid of statistics, do no error checking, etc.; in a few clicks Statwing will automatically choose, run, and interpret a statistical analysis, all the while looking for and alerting users to outliers or (some) other data issues. [I'm a cofounder. Also: https://www.statwing.com/demo]
I'd expect that soon someone will come along soon and pick off the financial modeling use case in a way that's a bit more generalized than SmartAsset. Really tough UI challenge, but I think doable.
Having worked in nyc financial world, I can say excel is here to stay - at least for foreseeable future. Heck, the only reason I have windows running in parallel in my MBA is excel. We use to have what we called "excel test" for all quants and analyst positions. We use to QA our excel sheets just like any other software application. I am surprised that many companies don't do that - it only makes sense.
You nailed it. I would use less generous language than you did to describe VBA's deficiencies in Mac Excel 2011. Huge parts of what could be considered the stdlib are totally missing in the Mac version. The language doesn't even have native dictionaries.
You can turn Excel spreadsheets into web applications quite easily with Google Apps Script. We wrote a REST interface to a massive multi tab Excel file recently and it only took like an hour, as opposed to days of reverse engineering the calculations (the client did not know what the equations in the file were because the employee who created it left).
Basically I'm saying it will get much, much worse before it gets better.
There are entire organisations and conferences about the dangers of spreadsheets[1] and there are now programming standards for develop spreadsheets[2].
Basically, the normal rules apply. If you want safe spreadsheets, you have to plan and verify.
The reason this doesn't happen is because firstly, spreadsheets are very accessible. Every working large system evolves from a working small system, as John Gall observed. Excel et al make that evolution relatively easy.
The second is that spreadsheets enable non-programmers to work with concrete concepts instead of the abstract ones that programming languages encourage. It's easy to visualise locations on a sheet when you are looking at the sheet.
Finally, spreadsheets support calculation by propagation. They are an excellent declarative environment. Most programming languages do not support this model of programming (though we're groping towards it with concepts of binding).
We need to think beyond the limitations of tools no matter how pervasive their use.
It reminded me of : http://vimeo.com/66085662
(you can skip to the demo)
I got theory. There is a limit what you can do with WYSIWYG editors. Once you reach the limit, the software starts getting in your way, hiding your errors and making you work harder.
It applies for many software. From HTML to 3D modeling.
Excel already has a lot of this crap. It already requires an act of god to prevent excel from changing your string of digits into a number or date. It already shows you little exclamation point icons when your formulas omit adjacent rows or are different from other formulas in the same row/column.
A tool will never make it possible for dumb people to solve hard problems easily. It's like trying to design a knife that makes it impossible to cut yourself. Nobody with any kind of a clue would want that knife.
A tool should be straightforward and intuitive, but it shouldn't aim to be smarter than it's user.