Last week we started talking about scalability in terms of the API and that got me thinking about what we’d do if and when we need to process orders of magnitude more data than we do currently.
Currently, the largest single table of data any one customer has comprises just over thirty million records. That may be a lot of business data in some contexts, but in database terms it’s hardly ‘big data’. What if we needed to store and work with billions of records?
A natural place to look would be Citus (www.citusdata.com). As you may know, agileBase uses the open source PostgreSQL database. Citus transforms PostgreSQL into a distributed database, and it’s open source too. Their pitch is ‘Never worry about scaling again’.
Citus has recently been bought by Microsoft and is now available as Hyperscale (Citus), a built-in deployment option for Azure Database for PostgreSQL. It can also be used on AWS or your own hosting. So there are several ways to scale out across many nodes and handle billions, or hundreds of billions, of rows while maintaining high performance.
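For anyone curious what ‘distributed’ means in practice, here is a minimal Python sketch of the general idea of hash-based sharding: each row is assigned to a shard by hashing one column, and shards are spread across worker nodes. This is an illustration only, not Citus’s actual implementation. The names `SHARD_COUNT`, `NODES`, `shard_for`, `node_for` and `distribute` are all made up for the example, as is the choice of CRC32 as the hash.

```python
import zlib
from collections import defaultdict

# Hypothetical values chosen for illustration only.
SHARD_COUNT = 32
NODES = ["worker-1", "worker-2", "worker-3", "worker-4"]


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard number.

    CRC32 is used here only because it is deterministic and in the
    standard library; a real system would use its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % shard_count


def node_for(shard_id: int, nodes=NODES) -> str:
    """Assign shards to worker nodes in simple round-robin order."""
    return nodes[shard_id % len(nodes)]


def distribute(rows, key_column):
    """Group rows by the node that would store them."""
    placement = defaultdict(list)
    for row in rows:
        shard = shard_for(str(row[key_column]))
        placement[node_for(shard)].append(row)
    return dict(placement)


if __name__ == "__main__":
    # 1,000 made-up order rows belonging to 50 made-up customers.
    rows = [{"customer_id": i % 50, "amount": i} for i in range(1000)]
    for node, stored in sorted(distribute(rows, "customer_id").items()):
        print(node, len(stored))
```

Because the shard depends only on the chosen column, all rows for one customer land on the same node, so a query about a single customer only needs to touch one machine, while a query across all customers can be split up and run on every node in parallel.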
That’s for the future, but if anyone has an inkling of any projects they’d like to put forward that deal with high volumes of data, a.k.a. ‘big data’ in the IT world, then drop us a line. We’d be keen to be involved.