By Nawaz Sohail


2015-09-08 14:50:19 8 Comments

Problem:

Is there any way of calculating number of bytes occupied by the table creation, i know that you can get some information from information_schema.tables but that information is not accurate enough.

What actually required is the number of bytes according to the definition of the table for innodb only and collation could also be considered as utf-8-general-ci

For example a table test is as following

create table test
(
col1 varchar(25),
col2 int,
col3 varchar(3),
col4 char(15),
col5 datetime
);

Now would require to know the total row size that can be accumulated in one row according to the types of columns in the table.

Found a some sort of similar solution in MSSQL but need its MySQL version

Script to estimate row sizes for any table

Any help is much appreciated.

5 comments

@srcritical 2019-11-02 10:53:36

I've made a rough bash script to calculate row size and warn if it passes the limit based on schema:

#!/bin/bash
​
#
# usage: mysqldump --no-data | check_row_size.sh
#
​
#
#
# https://dev.mysql.com/doc/refman/8.0/en/column-count-limit.html#row-size-limits
#
# The maximum row size for an InnoDB table, which applies to data stored locally within a database page, is slightly less than half a page for 4KB, 8KB, 16KB, and 32KB innodb_page_size settings.
# For example, the maximum row size is slightly less than 8KB for the default 16KB InnoDB page size.
#
#
# MariaDB [(none)]> show variables like 'innodb_page_size';
#+------------------+-------+
#| Variable_name    | Value |
#+------------------+-------+
#| innodb_page_size | 16384 |
#+------------------+-------+
#1 row in set (0.00 sec)
#
#
# Options:
# 1. Change default innodb_page_size to 32k
# 2. Change storage engine to DYNAMIC for tables
# 3. ?
#
​
#===========================================================================================
# Functions
#===========================================================================================
RETVAL=0
​
calc_row_size() {
    local -n TABLE_FIELDS=$1
    local -n TABLE_CHARSET=$2
    local FIELD_TYPE=""
    local FIELD_SIZE=""
    local FIELD=""
    local ROW_SIZE=0
    local IFS=$'|' # To split the vars using set
    for FIELD in "${TABLE_FIELDS[@]}"  
    do  
        set $FIELD
        FIELD_NAME=$1
        FIELD_TYPE=$2
        FIELD_SIZE=$3        
        calc_field_size_in_bytes $FIELD_TYPE $FIELD_SIZE $TABLE_CHARSET
        ROW_SIZE=$((ROW_SIZE + RETVAL))
        [ $DEBUG -gt 0 ] && echo "DEBUG1: Field name: $FIELD_NAME type: $FIELD_TYPE lenght: $FIELD_SIZE size: $RETVAL bytes Row size: $ROW_SIZE"
    done  
    RETVAL=$ROW_SIZE
}
​
calc_field_size_in_bytes() {
    local TYPE=$1
    local SIZE=$2
    local CHARSET=$3

    case $FIELD_TYPE in
        varchar)
            # https://adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql/
            # Max 3 bytes per utf-8 chat in mysql
            case $CHARSET in
                utf8)
                    RETVAL=$((SIZE * 3))  # 3 bytes per character for utf8 
                ;;
                latin1)
                    RETVAL=$((SIZE))  # 1 byte per character for latin1
                ;;
                *)
                    echo "Unknown charset ($CHARSET), please fix the script"
                    exit 1
                ;;
            esac
        ;;
        smallint|int|bigint|tinyint|varbinary)
            RETVAL=$SIZE
        ;;
        blob)
            # https://dev.mysql.com/doc/refman/8.0/en/column-count-limit.html#row-size-limits
            # BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.
            RETVAL=9
        ;;
        text)
            RETVAL=12
        ;;
        timestamp)
            RETVAL=4 
        ;; 
        decimal)
            # https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-numeric
            # Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. 
            if [[ $SIZE =~ ([0-9]+),([0-9]+) ]] 
            then
              INTEGER_PART=${BASH_REMATCH[1]}
              FRACTIONAL_PART=${BASH_REMATCH[2]}
​
              INTEGER_BYTES=$((INTEGER_PART / 9 * 4))
              REMAINDER=$((INTEGER_PART % 9))
              case $REMAINDER in
                  0) INTEGER_BYTES=$((INTEGER_BYTES + 0)); ;;
                  1) INTEGER_BYTES=$((INTEGER_BYTES + 1)); ;;
                  2) INTEGER_BYTES=$((INTEGER_BYTES + 1)); ;;
                  3) INTEGER_BYTES=$((INTEGER_BYTES + 2)); ;;
                  4) INTEGER_BYTES=$((INTEGER_BYTES + 2)); ;;
                  5) INTEGER_BYTES=$((INTEGER_BYTES + 3)); ;;
                  6) INTEGER_BYTES=$((INTEGER_BYTES + 3)); ;;
                  7) INTEGER_BYTES=$((INTEGER_BYTES + 4)); ;;
                  8) INTEGER_BYTES=$((INTEGER_BYTES + 4)); ;;
              esac
​
              FRACTIONAL_BYTES=$((FRACTIONAL_PART / 9 * 4))
              REMAINDER=$((FRACTIONAL_PART % 9))
              case $REMAINDER in
                  0) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 0)); ;;
                  1) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 1)); ;;
                  2) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 1)); ;;
                  3) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 2)); ;;
                  4) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 2)); ;;
                  5) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 3)); ;;
                  6) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 3)); ;;
                  7) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 4)); ;;
                  8) FRACTIONAL_BYTES=$((FRACTIONAL_BYTES + 4)); ;;
              esac
              [ $DEBUG -gt 0 ] && echo "DEBUG1: Calulation of decimal: SIZE: $SIZE INTEGER_PART:$INTEGER_PART FRACTIONAL_PART:$FRACTIONAL_PART TOTAL = INTEGER_BYTES($INTEGER_BYTES) + FRACTIONAL_BYTES($FRACTIONAL_BYTES)"
              RETVAL=$((INTEGER_BYTES + FRACTIONAL_BYTES)) 
            else
                echo "Seems like SIZE ($SIZE) for a decimal field doesn't match pattern ([0-9]+),([0-9]+). Please investigate"
                exit 1
            fi
        ;;
        *)
            echo "Found a field type that is not handled: $TYPE. Please fix before proceeding."
            exit 1
        ;;
    esac
}
​
​
#===========================================================================================
# INIT
#===========================================================================================
INSIDE_CREATE_TABLE_STATEMENT=false # True if we are within a create table statement
TABLE_NAME=''  # Current table name
ROW_SIZE=0 # Current row size being calculated
DEBUG=0
VERBOSE=0
MAX_SIZE=8126 # Default
declare -a FIELDS # List of fields from the current CREATE TABLE statement
​
#===========================================================================================
# Parameters
#===========================================================================================
OPTIND=1         # Reset in case getopts has been used previously in the shell.
​
while getopts "hvdt:" opt; do
    case "$opt" in
    h)
        echo "Usage: mysqldump --no-data | ./check_row_size [-v|-d] [-t threshold]"
        exit 0
        ;;
    v) VERBOSE=1
        ;;
    d) DEBUG=2
        ;;
    t) MAX_SIZE=$OPTARG
        ;;
    esac
done
​
​
#===========================================================================================
# MAIN Loop - parses schema then calc row_size based on charset
#===========================================================================================
while IFS= read -r LINE
do
    [ $DEBUG -gt 1 ] && echo "DEBUG2: Read: $LINE"
    # Are we within a CREATE TABLE statement?
    if [ $INSIDE_CREATE_TABLE_STATEMENT == "false" ]
    then
        # Nope, is the current line a 'CREATE TABLE' statement?
        if [[ $LINE =~ ^"CREATE TABLE \`"([^\`]+) ]] 
        then
            [ $DEBUG -gt 0 ] && echo "CREATE TABLE FOUND!: $TABLE_NAME"
            TABLE_NAME=${BASH_REMATCH[1]} # What has been caught between pattern parenthesis
            INSIDE_CREATE_TABLE_STATEMENT='true'
            FIELDS=()
        fi
        continue # Ok, next line 
    fi
    # Is this a create table field definition line?
    if [[ $LINE =~ ^' '+'`'([^'`']+)'` '([a-z]+)'('([^')']+) ]]
    then
        FIELD_NAME=${BASH_REMATCH[1]}
        FIELD_TYPE=${BASH_REMATCH[2]}
        FIELD_SIZE=${BASH_REMATCH[3]}
        FIELDS+=( "$FIELD_NAME|$FIELD_TYPE|$FIELD_SIZE" )
        continue
    fi
    # Have we reached the end of the CREATE TABLE statement?
    if [[ $LINE =~ ^") ENGINE=InnoDB DEFAULT CHARSET="([^ ]+) ]] 
    then
        CHARSET=${BASH_REMATCH[1]}
        [ $DEBUG -gt 0 ] && echo "End of CREATE TABLE statement"
        calc_row_size FIELDS CHARSET
        ROW_SIZE=$RETVAL
        if [ $ROW_SIZE -gt $MAX_SIZE ]
        then
            echo "Table: $TABLE_NAME has a row size: $ROW_SIZE Bytes > $MAX_SIZE Bytes Charset: $CHARSET"
            # and is going to cause problem if the we upgrade to tables in ROW_FORMAT compact. See https://mariadb.com/kb/en/library/troubleshooting-row-size-too-large-errors-with-innodb/ for more details."
        fi
        INSIDE_CREATE_TABLE_STATEMENT='false'
    fi
done 

@Nawaz Sohail 2015-09-12 21:56:38

After much of thinking and researching found one answer which really helped in achieving what was required. It is a perl script and reference link is

http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html

#!/usr/bin/perl
use strict;
$| = 1;

my %DataType = (
"TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3, "INT"=>4, "INTEGER"=>4, "BIGINT"=>8,
"FLOAT"=>'$M<=24?4:8', "DOUBLE"=>8,
"DECIMAL"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"NUMERIC"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"BIT"=>'($M+7)>>3',
"DATE"=>3, "TIME"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "YEAR"=>1,
"BINARY"=>'$M',"CHAR"=>'$M*$CL',
"VARBINARY"=>'$M+($M>255?2:1)', "VARCHAR"=>'$M*$CL+($M>255?2:1)',
"ENUM"=>'$M>255?2:1', "SET"=>'($M+7)>>3',
"TINYBLOB"=>9, "TINYTEXT"=>9,
"BLOB"=>10, "TEXT"=>10,
"MEDIUMBLOB"=>11, "MEDIUMTEXT"=>11,
"LONGBLOB"=>12, "LONGTEXT"=>12
);

my %DataTypeMin = (
"VARBINARY"=>'($M>255?2:1)', "VARCHAR"=>'($M>255?2:1)'
);

my ($D, $M, $S, $C, $L, $dt, $dp ,$bc, $CL);
my $fieldCount = 0;
my $byteCount = 0;
my $byteCountMin = 0;
my @fields = ();
my $fieldName;
my $tableName;
my $defaultDbCL = 1;
my $defaultTableCL = 1;
my %charsetMaxLen;
my %collationMaxLen;

open (CHARSETS, "mysql -B --skip-column-names information_schema -e 'select CHARACTER_SET_NAME,MAXLEN from CHARACTER_SETS;' |");
%charsetMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <CHARSETS>);
close CHARSETS;

open (COLLATIONS, "mysql -B --skip-column-names information_schema -e 'select COLLATION_NAME,MAXLEN from CHARACTER_SETS INNER JOIN COLLATIONS USING(CHARACTER_SET_NAME);' |");
%collationMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <COLLATIONS>);
close COLLATIONS;

open (TABLEINFO, "mysqldump -d --compact ".join(" ",@ARGV)." |");

while (<TABLEINFO>) {
chomp;
if ( ($S,$C) = /create database.*?`([^`]+)`.*default\scharacter\sset\s+(\w+)/i ) {
$defaultDbCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : 1;
print "Database: $S".($C?" DEFAULT":"").($C?" CHARSET $C":"")." (bytes per char: $defaultDbCL)\n\n";
next;
}
if ( /^create table\s+`([^`]+)`.*/i ) {
$tableName = $1;
@fields = ();
next;
}
if ( $tableName && (($C,$L) = /^\)(?:.*?default\scharset=(\w+))?(?:.*?collate=(\w+))?/i) ) {
$defaultTableCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultDbCL);
print "Table: $tableName".($C||$L?" DEFAULT":"").($C?" CHARSET $C":"").($L?" COLLATION $L":"")." (bytes per char: $defaultTableCL)\n";
$tableName = "";
$fieldCount = 0;
$byteCount = 0;
$byteCountMin = 0;
while ($_ = shift @fields) {
if ( ($fieldName,$dt,$dp,$M,$D,$S,$C,$L) = /\s\s`([^`]+)`\s+([a-z]+)(\((\d+)(?:,(\d+))?\)|\((.*)\))?(?:.*?character\sset\s+(\w+))?(?:.*?collate\s+(\w+))?/i ) {
$dt = uc $dt;
if (exists $DataType{$dt}) {
if (length $S) {
$M = ($S =~ s/(\'.*?\'(?!\')(?=,|$))/$1/g);
$dp = "($M : $S)"
}
$D = 0 if !$D;
$CL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultTableCL);
$bc = eval($DataType{$dt});
$byteCount += $bc;
$byteCountMin += exists $DataTypeMin{$dt} ? $DataTypeMin{$dt} : $bc;
} else {
$bc = "??";
}
$fieldName.="\t" if length($fieldName) < 8;
print "bytes:\t".$bc."\t$fieldName\t$dt$dp".($C?" $C":"").($L?" COLL $L":"")."\n";
++$fieldCount;
}
}
print "total:\t$byteCount".($byteCountMin!=$byteCount?"\tleast: $byteCountMin":"\t\t")."\tcolumns: $fieldCount\n\n";
next;
}
push @fields, $_;
}
close TABLEINFO;

Thanks all for great help.

@srcritical 2019-11-01 03:12:20

I'm not getting any output when running that script. What am I missing?

@David Spillett 2015-09-08 16:35:40

There are a few questions of this type already, for instance this one: How to estimate/predict data size and index size of a table in MySQL

One difference between that question and your table is the presence of variable length strings in your's - remember to account for the maximum size that they can be.

Also remember that from version 5 onwards that varchar(25) is up to 25 characters not up to 25 bytes so if you are likely to see non-ASCII characters in your strings the column size could balloon up to a maximum of 100 bytes because some characters take four bytes to represent - for instance "pile of poo emoji" (which, I jest ye not, does exist - if your current browser+font support it it looks like: 💩) is 0xF0 0x9F 0x92 0xA9. Prior to v5 mySQL counted bytes not characters when specifying string type lengths.

Edit Regarding Automation

In terms of automating the process, you should be able to derive all the information needed from the INFORMATION_SCHEMA tables in a similar manner to the script you have found for MS SQL Server. See https://dev.mysql.com/doc/refman/5.0/en/information-schema.html for some documentation covering that.

@Nawaz Sohail 2015-09-08 16:45:03

I saw the shared answer already before posting this question, the thing i am looking for does not need to point information_schema.tables as it might not be accurate rather a solution which might check up the table structure and give me row size accordingly.

@David Spillett 2015-09-09 00:26:57

You could no doubt create a mySQL version of the script you've already found. The INFORMATION_SCHEMA tables should include the information you need. See dev.mysql.com/doc/refman/5.0/en/information-schema.html for some documentation covering that.

@srcritical 2019-11-01 03:22:12

@DavidSpillett The op is asking about max row size, information schema only provides information aobut average row size.

@Rick James 2015-09-08 22:13:35

Step 1:

col1 varchar(25),  2 + avg_byte_len
col2 int,          4
col4 char(15),     1*15 or 3*15 or...
col5 datetime      Pre-5.6: 8; then 5

SELECT AVG(LENGTH(col1)) as avg_byte_len,
       AVG(CHAR_LENGTH(col1) as avg_num_chars FROM ...;

20 English characters: 2 + 1*20
20 Mid-eastern / Slavic characters: 2 + 2*20
20 Asian characters: 2 + 3*20
20 Emoji characters: 2 + 4*20 (and you need utf8mb4)

Step 2: Add those up.

Step 3: Multiply by somewhere between 2 and 3 to allow for InnoDB overhead. I have found that that factor usually works. (But not for tiny tables, and not necessarily well for partitioned tables.)

I see no reason for taking the maximum size of each column.

You can get closer than SHOW TABLE STATUS or the equivalent information_schema data:

Step 1: SELECT COUNT(*) -- us this in place of Rows

Step 2: Get Data_length + Index_length + Data_free

Step 3: Divide.

@Nawaz Sohail 2015-09-09 13:39:38

thanks for your great help but what if a table has more than 100 columns with variant data types then how would we get the row size estimations?

@Rick James 2016-06-29 16:02:28

SELECT AVG(LENGTH(varchar_col)) -- Note: LENGTH is already bytes; no need to multiply by 2/3/4. (CHAR_LENGTH gets the length in characters.)

@dr01 2015-09-08 15:42:17

You need to know the size in bytes of each field according to the data type (MySQL reference here), then sum up these values together.

@Nawaz Sohail 2015-09-08 16:30:44

can't it be done using some dynamic query to check which columns are there of which length and sum it up?..that's why i asked for it..if you can share it would be great help

Related Questions

Sponsored Content

3 Answered Questions

[SOLVED] Index max row size error

2 Answered Questions

2 Answered Questions

[SOLVED] InnoDB create table error: "Row size too large"

  • 2011-01-26 16:02:27
  • Evan
  • 33142 View
  • 11 Score
  • 2 Answer
  • Tags:   mysql innodb

1 Answered Questions

2 Answered Questions

[SOLVED] InnoDB row metadata size (minimum row size)

  • 2015-03-02 22:03:01
  • Christopher McGowan
  • 416 View
  • 1 Score
  • 2 Answer
  • Tags:   mysql innodb metadata

1 Answered Questions

[SOLVED] Row size error with MySQL

  • 2014-08-29 15:49:05
  • geoffliu
  • 14747 View
  • 11 Score
  • 1 Answer
  • Tags:   mysql innodb

1 Answered Questions

1 Answered Questions

[SOLVED] Determining max database and table size supported and present size

  • 2011-09-13 16:33:20
  • RPK
  • 1695 View
  • 4 Score
  • 1 Answer
  • Tags:   mysql mysql-5

Sponsored Content