Handling very large data with MySQL

  • Can MySQL reasonably perform queries on billions of rows? — MySQL can ‘handle’ billions of rows. “Reasonably” depends on the queries; let’s see them.

  • Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? — 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 8 years old, and on the verge of no longer being supported.

  • Best data store for billions of rows — If you mean ‘Engine’, then InnoDB.

  • How big can a MySQL database get before the performance starts to degrade — Again, that depends on the queries. I can show you a 1K-row table that will melt down; I have worked with billion-row tables that hum along.

  • Why MySQL could be slow with large tables? — Range scans lead to I/O, which is the slow part.

  • Can MySQL handle tables which will hold about 300 million records? — Again, yes. The limit is somewhere around a trillion rows.

  • (for InnoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings here in the Percona blog — Yes.
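
For scale: on a hypothetical dedicated server with 64GB of RAM, the my.cnf entry would look something like this (the numbers are illustrative, not a recommendation for your hardware):

    [mysqld]
    # ~80% of RAM on an assumed dedicated 64GB box
    innodb_buffer_pool_size = 51G
    # optional speed-for-durability tradeoff during bulk ingestion
    innodb_flush_log_at_trx_commit = 2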

  • having proper indexes on the table (using EXPLAIN on queries) — Well, let's see them. There are a lot of mistakes that can be made in this critical area.
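
The classic pattern: one composite index, with the equality column first and the range column second, then verify with EXPLAIN that it is actually used. A sketch with made-up table and column names:

    -- composite index matching the WHERE clause:
    -- equality (customer_id) before range (order_date)
    ALTER TABLE orders ADD INDEX idx_cust_date (customer_id, order_date);

    EXPLAIN SELECT SUM(total)
        FROM orders
        WHERE customer_id = 123
          AND order_date >= '2016-01-01';
    -- 'key' should show idx_cust_date; 'type: ALL' would mean a full table scan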

  • partitioning the table — “Partitioning is not a panacea!” I harp on that in my blog.
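
The main thing partitioning is good for at this scale is purging old data fast. A sketch (the table and ranges are hypothetical):

    CREATE TABLE readings (
        sensor_id INT UNSIGNED NOT NULL,
        ts DATETIME NOT NULL,
        temp FLOAT,
        PRIMARY KEY (sensor_id, ts)       -- the partition column must be in the PK
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(ts)) (
        PARTITION p201601 VALUES LESS THAN (TO_DAYS('2016-02-01')),
        PARTITION p201602 VALUES LESS THAN (TO_DAYS('2016-03-01')),
        PARTITION pfuture VALUES LESS THAN MAXVALUE
    );

    -- dropping a month of data is then nearly instantaneous:
    ALTER TABLE readings DROP PARTITION p201601;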

  • MySQL Sharding — Currently this is DIY.

  • MySQL clustering — Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle’s “Group Replication” is a viable contender.

  • Partitioning does not support FOREIGN KEY or “global” UNIQUE.
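
That is, every UNIQUE key (including the PRIMARY KEY) must contain the partitioning column, so a uniqueness guarantee across the whole table on some other column is not available. A sketch of what fails:

    -- rejected with "A PRIMARY KEY must include all columns in the
    -- table's partitioning function":
    CREATE TABLE t (
        id INT UNSIGNED NOT NULL,
        ts DATETIME NOT NULL,
        PRIMARY KEY (id)                  -- ts is missing
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(ts)) (
        PARTITION p0 VALUES LESS THAN MAXVALUE
    );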

  • UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround.
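
The trick with type-1 UUIDs is that they embed a timestamp; rearrange the pieces so that consecutive inserts land next to each other in the index instead of randomly throughout it. MySQL 8.0 bakes this in as UUID_TO_BIN(..., 1); on older versions the same shuffle can be done by hand. A sketch (the events table is hypothetical, with a BINARY(16) id):

    -- MySQL 8.0: time-ordered, and 16 bytes instead of 36
    INSERT INTO events (id) VALUES (UUID_TO_BIN(UUID(), 1));

    -- pre-8.0 equivalent (type-1 UUIDs only): move the time-high
    -- and time-mid groups to the front
    SET @u = UUID();
    SELECT UNHEX(CONCAT(SUBSTR(@u, 15, 4), SUBSTR(@u, 10, 4),
                        SUBSTR(@u, 1, 8),  SUBSTR(@u, 20, 4),
                        SUBSTR(@u, 25)));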

  • Insert and index-build speed — There are too many variations to give a single answer. Let’s see your tentative CREATE TABLE and how you intend to feed the data in.
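
Whatever the schema turns out to be, bulk-loading beats row-at-a-time inserts by an order of magnitude. A sketch (the file name and columns are made up, reusing the readings table from above):

    -- one streaming pass from a CSV file; pre-sorting the file by the
    -- PRIMARY KEY makes the load nearly append-only for InnoDB
    LOAD DATA INFILE '/tmp/readings.csv'
        INTO TABLE readings
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        (sensor_id, ts, temp);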

  • Lots of joins — “Normalize, but don’t over-normalize.” In particular, do not normalize datetimes or floats or other “continuous” values.
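
That is: a long string that repeats millions of times deserves a small lookup id; a measurement does not. A hypothetical sketch:

    -- good: a repeated string shrinks to a 2-byte id
    CREATE TABLE makes (
        make_id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        make VARCHAR(50) NOT NULL,
        UNIQUE (make)
    ) ENGINE=InnoDB;

    -- bad: normalizing a datetime; nearly every row would get its own
    -- lookup row, and range scans on the time would then require a JOIN
    -- CREATE TABLE datetimes (dt_id INT ..., dt DATETIME ...);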

  • Do build summary tables.
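
A summary table pre-aggregates as the data arrives, so reports touch thousands of rows instead of billions. A sketch with hypothetical names, fed incrementally after each batch of raw rows:

    CREATE TABLE daily_summary (
        sensor_id INT UNSIGNED NOT NULL,
        dy DATE NOT NULL,
        ct INT UNSIGNED NOT NULL,
        sum_temp FLOAT NOT NULL,          -- store SUM and COUNT; derive AVG at read time
        PRIMARY KEY (sensor_id, dy)
    ) ENGINE=InnoDB;

    INSERT INTO daily_summary (sensor_id, dy, ct, sum_temp)
        SELECT sensor_id, DATE(ts), COUNT(*), SUM(temp)
            FROM readings
            WHERE ts >= '2016-01-01' AND ts < '2016-01-02'
            GROUP BY sensor_id, DATE(ts)
        ON DUPLICATE KEY UPDATE
            ct = ct + VALUES(ct),
            sum_temp = sum_temp + VALUES(sum_temp);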

  • 2,3 million transactions per day — If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc., may be necessary.
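
"Batching" means many rows per INSERT; 100 rows in one statement runs roughly 10 times as fast as 100 single-row statements. A sketch:

    -- one round trip, one transaction, many rows (values are made up)
    INSERT INTO readings (sensor_id, ts, temp) VALUES
        (1, '2016-01-01 00:00:00', 20.1),
        (2, '2016-01-01 00:00:00', 19.7),
        (3, '2016-01-01 00:00:00', 21.3);  -- ...aim for 100-1000 rows per batch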

  • deal with such volume of data — If most activity is with the “recent” rows, then the buffer_pool will nicely ‘cache’ the activity, thereby avoiding I/O. If the activity is “random”, then MySQL (or anyone else) will have I/O issues.

  • Shrinking the datatypes helps in a table like yours. I doubt if you need 4 bytes to specify fuel_type. There are multiple 1-byte approaches.
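
For example (assuming fuel_type is currently a 4-byte INT, and the table name is made up), either of these gets it down to 1 byte:

    ALTER TABLE vehicles
        MODIFY fuel_type TINYINT UNSIGNED NOT NULL;    -- code 0..255
    -- or, self-documenting and still 1 byte (up to 255 values):
    ALTER TABLE vehicles
        MODIFY fuel_type ENUM('gasoline','diesel','electric','hybrid','other') NOT NULL;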
