4 Steps To Better Data Hygiene


I once did a consulting project with a popular Indian fitness startup that wanted to track a bunch of churn-related statistics to better understand why their customers chose to leave. This was a technology company founded by people who understood the power of data, and their core product tracked a lot of information about their customers. But it was still an uphill struggle to get them into an analysis program, because their data hygiene was almost non-existent.

For one thing, the data was all over the place. Usage stats were in one database, subscription stats were in a different database, marketing and acquisition details were in Google Analytics, and so on. Just pulling the data from these disparate sources and tying together the key fields, so that we ended up with one or two files containing all the customer data, took a very long time and a lot of manual effort from senior people. Then we had to figure out how to do this on a regular basis, which meant automation, which meant even more time (and hence money). By the time a month had passed, they were no closer to getting any insights, despite the huge amount of effort put in.


Of course, once these initial problems are taken care of, the analytics more than make up for the lead time by delivering real, actionable insights. But it really would have been great to get rid of that first month’s worth of slog! Here are the things I learned that companies need to do from the very beginning to get their data hygiene right. These steps need to be implemented before the data is even needed, so that when you do need the data, it’s there for you at a moment’s notice, not a month’s notice! Basically, the time for planning is at the beginning.

You cannot plan your data strategy after you’ve started collecting data. Most companies realise early on that they need to track data, and they even identify the data that would be useful in the future. But, unfortunately, most companies stop there. What really needs to happen is step 1:

Establish A Data Hygiene Routine

Good data hygiene starts with a good routine. Just like brushing your teeth or remembering to drink water, routinely managing your data will have positive long-term effects on the health of your business. Whether you do this as a periodic, top-down exercise (a quarterly data review and cleanup) or as a constant, ongoing process (set all the data hygiene protocols at the beginning, then run quarterly checks to see that they are being followed) will depend on your organization. But you do need to establish some routine. At the same time, you need to:

Establish Clear Data Hygiene Ownership

It’s extremely important to establish clear ownership and accountability over data hygiene as soon as possible. In a startup, everyone wears different hats and all the key stakeholders are invariably very busy, so when nobody is directly responsible for the overall hygiene of the data, it’s easy to let it get out of control.

Some companies might want to split up data hygiene by department, with Finance looking after billing data, Marketing looking after acquisition data, and Customer Service looking after customer service data. Even then, you need to be sure that, no matter which source you’re pulling from, the data sets mesh with each other. It is of no use to have billing data that does not record which User ID is paying, because that will never mesh with the product usage data that the product team is recording. Acquisition data, which usually does not track either User IDs or email addresses, is a notorious place for this error to creep in.

Having a single person looking at data from a top-down perspective will eliminate this problem.
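To make the "meshing" problem concrete, here is a minimal sketch of the kind of check a data owner could run across two departmental exports. Everything in it is an assumption for illustration: the `billing` and `usage` records, the `user_id` field name, and the `key_coverage` helper are all hypothetical, not taken from any real system.

```python
# Hypothetical exports from two departments; field names and values are made up.
billing = [
    {"user_id": "u1", "amount": 499},
    {"user_id": "u2", "amount": 499},
    {"user_id": None, "amount": 999},  # a payment with no User ID recorded
]
usage = [
    {"user_id": "u1", "sessions": 12},
    {"user_id": "u3", "sessions": 4},
]


def key_coverage(rows, key="user_id"):
    """Return (set of key values present, number of rows missing the key)."""
    present = {row[key] for row in rows if row.get(key)}
    missing = sum(1 for row in rows if not row.get(key))
    return present, missing


billing_ids, billing_missing = key_coverage(billing)
usage_ids, usage_missing = key_coverage(usage)

# IDs that appear in only one source cannot be joined to the other.
only_in_billing = billing_ids - usage_ids
only_in_usage = usage_ids - billing_ids

print("billing rows without a user_id:", billing_missing)
print("paying users with no usage record:", sorted(only_in_billing))
print("active users with no billing record:", sorted(only_in_usage))
```

A check like this will not fix anything by itself, but running it as part of the routine shows how many records would silently drop out the day someone tries to combine the two sources.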

Prioritize Your Data Hygiene Efforts

While it would be great to have every bit of data on your customers, that’s just not going to be possible. There is no way you will have complete, deep information on each person who interacts with your product. Accept it and move on. Realise, though, that not all data is equal. For your business to work, some minimum level of data capture is required. As an example, if we were talking about a billing transaction, the key elements would be: the date, the name of the customer, the billing rate, and the account (User ID) they’re associated with. Focusing on these elements of your data will make it easier to keep track of the information that’s most useful to your business. Additional data like credit card expiry dates, addresses, etc. is nice to have but not critical.
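One way to enforce a "minimum level of data capture" is to check each record against a short list of critical fields before it is stored. The sketch below uses the four billing elements named above; the field names, the sample records, and the `missing_required_fields` helper are hypothetical and only meant to show the idea.

```python
# The four critical billing elements from the text; the field names are assumptions.
REQUIRED_BILLING_FIELDS = ("date", "customer_name", "billing_rate", "user_id")


def missing_required_fields(record, required=REQUIRED_BILLING_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in required if record.get(field) in (None, "")]


# Made-up sample records for illustration only.
complete = {
    "date": "2019-03-01",
    "customer_name": "A. Rao",
    "billing_rate": 499,
    "user_id": "u1",
}
incomplete = {
    "date": "2019-03-01",
    "customer_name": "A. Rao",
    "billing_rate": 499,
    "card_expiry": "12/21",  # nice to have, but not one of the critical fields
}

print(missing_required_fields(complete))    # no critical fields missing
print(missing_required_fields(incomplete))  # the User ID is missing
```

Nice-to-have fields are deliberately left out of the check, so a record is only flagged when something critical is missing.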

Establish Data Hygiene Documentation

One way to avoid confusion about data is through proper documentation. This includes defining what the data is, and the context of the data within the business.

One example where this can help is when organizations add a lot of manual, custom fields to their data, such as when tackling customer service requests. Instead of sorting the request into pre-selected categories, for example, it may be left up to the service executive to type it in on a case-by-case basis. The result? A non-sortable database of random text that gives you no insight into which categories of problems are most affecting your users. How can you fix something when you don’t know what’s broken? Thinking about this early and often can help reduce this issue later on.
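As a rough illustration of pre-selected categories versus free text, the sketch below defines a fixed, documented list of categories and maps whatever was typed in onto it. The category names, the `TicketCategory` enum, and the `parse_category` helper are all hypothetical; a real list would come from your own customer service team.

```python
from collections import Counter
from enum import Enum


class TicketCategory(Enum):
    """A fixed list of request categories (the names here are assumptions)."""
    BILLING = "billing"
    LOGIN = "login"
    APP_CRASH = "app_crash"
    OTHER = "other"


def parse_category(raw):
    """Map raw input to a known category, falling back to OTHER."""
    try:
        return TicketCategory(raw.strip().lower())
    except ValueError:
        return TicketCategory.OTHER


# Made-up inputs: some match the list, one is free text.
raw_inputs = ["billing", "Login ", "app_crash", "cant log in!!", "billing"]
counts = Counter(parse_category(raw) for raw in raw_inputs)

for category, count in counts.most_common():
    print(category.value, count)
```

With a fixed list you can at least count which problems come up most often, and a growing OTHER bucket is a signal that the list itself needs updating.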

While it can seem daunting, these are four practical steps that businesses of all sizes can use to build a data management program. As your organization grows, you can add dedicated personnel or tools, but you have to start building that foundation now.

There are also some data hygiene tools that might help you process your data better, but they are more about organizing your data once you’ve collected it than about helping with the collection itself.