We often get questions about how to handle large datasets when building applications with our Developer SDK. The simple answer is that there are no hard-coded limits in the SDK that restrict how many items an application can load into a dataset and visualize. Of course, there are some practical limits when you display data in a visualization, including:
1. Intrinsic visualization capacity
Visualizations that utilize the pre-attentive processing capabilities of the human psycho-visual system, like Treemaps, can display tens of thousands of items simultaneously. Some time series visualizations can show thousands of time series values per data element on a single screen.
Other visualizations are not designed to be pre-attentive and can display a few thousand simultaneous values at most. For example, a tabular report can easily handle up to 50 rows with 50 attributes (2,500 values) in a single display. With more rows, however, the user needs to scroll and pan to see all the data, which makes it very difficult, almost impossible really, to compare values. With more than 50 attributes, a table becomes very confusing. In either case, a table with too much data is a very ineffective way to visualize data.
The lesson is that the developer must take into account the capabilities and limitations of the people who have to view and use the visualizations. A well-designed implementation will provide users with a fun, easy-to-use, easy-to-understand set of visualizations that allow them to very quickly understand their data, see patterns, compare values, and make well-informed decisions for their organization.
2. Hardware capacity
The computing capacity available to the software sets other practical limits. These include available memory, CPU speed, bus speeds, and so on. For example, these factors limit how much data the application can keep in memory and how many data updates it can process per second.
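As a rough illustration of the memory limit, the back-of-envelope estimate below multiplies rows by attributes by bytes per value. It is only a sketch: the function names are hypothetical, the 8-byte default assumes 64-bit numeric values, and real in-memory structures add overhead that this ignores.

```python
def estimated_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Lower-bound estimate of raw storage for a dense numeric dataset.

    Assumption: every value occupies bytes_per_value bytes (8 for a
    64-bit number); container and string overhead are not counted.
    """
    return rows * columns * bytes_per_value


def fits_in_memory(rows: int, columns: int, budget_bytes: int,
                   bytes_per_value: int = 8) -> bool:
    """Return True if the raw estimate stays within the given memory budget."""
    return estimated_bytes(rows, columns, bytes_per_value) <= budget_bytes
```

For example, one million rows with 50 numeric attributes need at least 400,000,000 bytes of raw storage, which fits a 512 MiB budget but not a 256 MiB one.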
3. System architecture and design
Applications need an architecture designed for the task at hand. Many systems use servers that keep huge datasets in repositories (such as OLAP servers), with middleware responsible for staging the relevant parts of the data, sometimes using clever caching strategies. Client applications can then retrieve, on an “as necessary” basis, only the parts of the datasets that they need at any given time.
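The staging-and-caching idea can be sketched in a few lines. The example below is not part of the SDK: `REPOSITORY` is a hypothetical in-process stand-in for a server-side data store, and `fetch_slice` plays the role of the middleware, using Python's standard `functools.lru_cache` to keep recently requested slices at hand.

```python
from functools import lru_cache

# Hypothetical stand-in for a large server-side repository (illustration only).
REPOSITORY = {f"region-{i}": [i * 10 + j for j in range(5)] for i in range(1000)}


@lru_cache(maxsize=128)
def fetch_slice(key: str) -> tuple:
    """Stage one slice of the repository on demand.

    The first request for a key reads from the repository; repeated
    requests for the same key are answered from the cache. At most
    128 slices are kept, and the least recently used one is evicted.
    """
    return tuple(REPOSITORY[key])
```

A client that asks for `fetch_slice("region-3")` twice reaches the repository only once; `fetch_slice.cache_info()` reports the hits and misses. In a real deployment the lookup inside `fetch_slice` would be a network or database call, which is where the cache pays off.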
It is generally not wise to send a complete copy of a very large database over the network to the client; it may not be possible given the hardware limitations of the client, and it can tie up a huge proportion of your network and server resources unnecessarily. Query languages exist to provide practical, high-speed access to large datasets.
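To make the point about query languages concrete, here is a minimal sketch that uses SQL through Python's built-in `sqlite3` module. The `sales` table, its columns, and the generated rows are invented for illustration. The client asks the database to aggregate 10,000 rows and receives only four summary rows, rather than pulling every row across.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with 10,000 made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [(f"region-{i % 4}", float(i)) for i in range(10_000)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_by_region(conn: sqlite3.Connection) -> list:
    """Let the database do the aggregation and return one row per region."""
    cursor = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) "
        "FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()
```

Calling `totals_by_region(build_demo_db())` returns a list of `(region, row_count, total_amount)` tuples, one per region, which is a far smaller payload to hand to a visualization than the full table.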
Having said that, some visualizations, like the interactive Treemaps included in our SDK, can handle and display very large amounts of data effectively. In the right circumstances, you can really push the limits of your hardware.
Markus Skyttner
CTO
Panopticon Software