Written in Livemark
(2022-06-18 16:49)

Find

Where is the data located and is it accessible?

This step entails knowing where to look for your data, finding it, and determining how accessible it is. How difficult this is depends on how well you defined your data problem.

Finding data also depends on your creativity and critical thinking. When data seems hard to find, consider looking at proxy indicators: indirect measures or signs that point to a phenomenon when no direct measure is available.

Things to consider

Asking the right questions

No single person can know where to find all the data a project needs, which is why experience, contextual knowledge, and contacts in the relevant fields are key assets that will help you find the right dataset for your project.

Be mindful that several sources may maintain similar datasets, and that one dataset may be a better fit for some projects than for others. Your task is to understand the precise data needs of your project so that you can compare them with all the available data that you find. This step is important because it may lead your team to review the scope or research question of the project.

When looking for data, you can:

Data sources

Many tools, techniques, and data sources can help you find data, both online and offline. These include:

Understanding data formats

Different types of digital files use different structures to hold information. For example, a text file is structured differently from an image file, which in turn is structured differently from a web page. At the same time, most computer applications can only open a few file types, since they are programmed to work only with specific structures; a word processor, for example, cannot open a spreadsheet file. It is important to know about the different file formats and how they relate to the data you require so that you can better plan how to get the data.
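To illustrate how the file format determines the tool you need, here is a minimal Python sketch that reads the same small table from CSV text and from JSON text using only the standard library. The function name `load_records`, the choice of these two formats, and the sample values are assumptions made for illustration; they do not come from any particular dataset.

```python
import csv
import io
import json


def load_records(text, file_extension):
    """Parse text into a list of dict records, picking a reader by extension.

    Hypothetical helper: only CSV and JSON are handled in this sketch.
    """
    ext = file_extension.lower().lstrip(".")
    if ext == "csv":
        # DictReader uses the first row as column names.
        return list(csv.DictReader(io.StringIO(text)))
    if ext == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("Expected a JSON array of records")
        return data
    # Any other structure needs a reader written for that structure.
    raise ValueError(f"No reader configured for .{ext} files")


# Made-up sample data, for illustration only.
csv_text = "region,value\nnorth,10\nsouth,20\n"
json_text = '[{"region": "north", "value": 10}, {"region": "south", "value": 20}]'

print(load_records(csv_text, ".csv"))
print(load_records(json_text, ".json"))
```

In this sketch the CSV reader returns every value as text, while the JSON reader keeps numbers as numbers. Differences like this are one reason to check the format of a dataset before planning how to work with it.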

Some of the most common file formats/file extensions that you might encounter when working with data include:

Common issues

Settling for a poor quality dataset

When looking for the data you need, it is not uncommon to find multiple datasets and sources covering the same data. Avoid the temptation to settle on the first dataset you find and adjust the project around it without investigating whether there are better options.

Sometimes it may be more useful to create the needed dataset out of several quality datasets rather than settling for the obvious choice.
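One way to build a dataset from several sources is to join them on a field they share. The following is a minimal Python sketch under stated assumptions: both datasets are lists of records, they share a key column (here called `region`), and the function name `merge_on_key` and all sample values are hypothetical.

```python
def merge_on_key(primary, secondary, key):
    """Attach fields from secondary to each primary record with the same key.

    Records in primary that have no match are kept unchanged, and values
    from primary win if both datasets define the same field.
    """
    lookup = {row[key]: row for row in secondary}
    merged = []
    for row in primary:
        extra = lookup.get(row[key], {})
        merged.append({**extra, **row})
    return merged


# Made-up sample data, for illustration only.
population = [
    {"region": "north", "population": 100},
    {"region": "south", "population": 200},
]
clinics = [{"region": "north", "clinics": 3}]

combined = merge_on_key(population, clinics, "region")
print(combined)
```

Records without a match in the second dataset stay in the result with their original fields only, which makes gaps in coverage easy to spot before you commit to the combined dataset.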
