By Jason Whitish


2019-02-08 16:15:06 8 Comments

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

This is part of an OLTP database.

Probably 75% of the rows will have a control number eventually.

1 comments

@Solomon Rutzky 2019-02-11 16:17:54

I am a fan of separating items that truly have separate purposes (given that this is an OLTP system, as stated by the O.P.).

Please see my answer to the following DBA.SE question that covers this in more detail, and has links to more of my answers on this same topic showing various examples of implementing this:

Abstract classes in SQL Server. Are they even possible?

The general reasoning is this:

  1. Performance is generally better when normalizing to separate tables. After all, we are working in a "Relational" Database Management System, right? Yes, there will be JOINs require by some consumers of this data, but less data per row = more rows per data page = more data loaded into memory per each read = more efficient I/O and memory usage.
  2. Maintainability is better when normalizing out. It is easier to explain to new members of the team that data comes from related tables (explicitly denoted by the PK -> FK relationship(s)) than it is for them to remember all of the whacky rules for which columns pertain to which teams / consumers.
  3. It is not difficult to adjust the data model if / when business rules change. Yes, things always change. However, it is fairly easy to move a column from a related table back into the main table (simplified here):
    1. ALTER main table to ADD new column
    2. UPDATE main table with values from related table
    3. Adjust / create indexes, if new column should be indexed
    4. ALTER procs to reference new location
    5. ALTER related table to DROP column (might need to drop indexes that use this column beforehand)

Related Questions

Sponsored Content

2 Answered Questions

3 Answered Questions

[SOLVED] Distribute values equally in count and sum to a set of users

  • 2017-01-04 13:50:19
  • SPM
  • 2235 View
  • 3 Score
  • 3 Answer
  • Tags:   sql-server

0 Answered Questions

3 Answered Questions

[SOLVED] MySql Invoice and Subscription Schema

1 Answered Questions

1 Answered Questions

[SOLVED] Generating Invoices and Tracking

1 Answered Questions

1 Answered Questions

[SOLVED] Multiple Database Design or NOT?

  • 2013-07-02 11:40:31
  • Jacques
  • 574 View
  • 2 Score
  • 1 Answer
  • Tags:   database-design

1 Answered Questions

[SOLVED] Splitting a table based on business logic

Sponsored Content