Dirty data
From Wikipedia, the free encyclopedia
Dirty data is a problem that plagues most real world databases and ERP systems. Whenever you have more than a handful of tables, it becomes almost inevitable that some of the rows in one table will be inconsistent with data in another table. (Think of a spreadsheet if the concept of table is unfamiliar).
Before SQL server became common, the reason for this dirty data was often related to bad network cards, switches, cabling, even hard drives or motherboards. The bits simply did not record correctly in the database. Foxpro users sometimes see this as a record with a whole field or all fields full of vertical bars. Another reason is still prevalent today: it is impossible to check all the possible data validation rules if you have more than a few tables. For example, if you can update a General Ledger account called "Accounts Receivable" balance using a journal entry as well as by posting batches of receipts and invoices, it is possible that the AR balance in the GL is not the same as the sum of all the open balances in the sub ledger called "Accounts Receiveable". That is definitely inconsistent if you have just one AR account in the GL. Sage Pro (also known as SBT PRO and ACCPAC PRO)(now replaced by Sage 300 ERP) has the ability to uncheck the "allow posting from GL".
Classic Example of Dirty Data
Some well known systems keep their inventory GL data in one table, the total system counts in another table, the location counts in another table, the bin, store, and serial number counts in a different table, and the FIFO costing tiers in yet another table. With MS SQL Server, all these tables might be part of the same database, but they each have separate numbers of records, different field names, etc. So, when doing something that might seem to be simple like receiving a purchase order into a particular warehouse, you have to update all the tables consistently. The ERP software takes care of this, but it is very difficult to write a program which will handle all the different scenarios and error conditions correctly. So, in this case if you have a system count for a particular SKU which is different from the total of the location counts, you have "dirty data" because that condition can't exist in the real world.
Why does Dirty Data matter?
Sometimes it does not matter. If you have a bad address on a vendor record that is no longer used, it is unlikely to make a difference. But sometimes, the data can have a material effect on the financial statements or reports, especially if the item is an expensive inventory item with bad counts, or missing invoice or check data. Payroll data is also something that must be kept as accurate as possible because of tax issues. If a company is publicly traded or if ownership is applying for a loan, bad data can lead to legal problems, with the counterparties alleging fraud.
What can you do about it ?
Most ERP systems offer little in the way of fixing data inconsistencies, since it is not a feature that is very marketable. (Why did the data get dirty in the first place, the potential customer will ask). In that case, the end user will need to find a way to either work around the problem in the system or contact a company that will provide the service of cleaning up the data. Precision Computer Methods Inc. has over 20 years experience in cleaning up data errors and inconsistencies for our clients and for others. We have worked with Sage Pro, Vision Point, BPI, Accpac Plus, Advantage and Sage 300. We have also sometimes been able to fix Peachtree and Quickbooks inconsistencies. PCM will take a copy of the data on a weekend or other down time and work with it using proprietary tools and database manipulation to eliminate the inconsistency. Before we proceed, we can usually give a fixed price quote for the cost to fix it, using a copy of the data. The result is data that is consistent across the tables. Like washing dirty windows, cleaning up your data just makes good sense. You can't make good decisions if you don't have a clear view into your business data.