The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.
Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.
For example, we have three columns that reflect control numbers (
varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).
Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.
Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).
This is part of an OLTP database.
Probably 75% of the rows will have a control number eventually.