Wednesday 19 August 2015

Dark Data: An Abnormal Challenge

It is no wonder that big data is big news. Organizations are keen to bring the right things to the right people at the right time. It is a promise of speed and specificity. Stores like Walmart handle over a million transactions an hour. Google processes tens of thousands of terabytes of data a day. Analysts work hard to find patterns in the digital clues we leave about our whereabouts and our desires.

But India faces a different challenge. Only 16% of individuals leave a digital footprint. What about the rest, the one billion others? Their desires and whereabouts remain dark to the world. This is the problem of dark data. Even as we move towards big data, we cannot ignore dark data; it represents too large a population to be ignored.

We can be effective only if we understand the dark data ecosystem. The main challenge for an analyst seeking dark data is finding any data at all. Addressing this will require an entirely different set of mechanisms.

For any organization serving this segment, every data point comes at a cost in time and accuracy.
Before pursuing this type of data, every organization should ask two questions about its relevance: 1. What are the human behaviors that matter? 2. Is it really worth it?
With dark data, you have to know ahead of time which data you want. This requires a deeper perspective and a different kind of exploration and research.

Even if you obtain information about people in the dark zone, obtaining it with useful accuracy is not simple. Say you want to know something as simple as occupation. Establishing a consistent classification that most of the population understands is difficult. Most people in the informal economy have multiple occupations. They may do daily-wage labor by day and sell flowers at a bus stop in the evening. Some might work on farmland in the morning and sell papads in the afternoon. Are they in agriculture or retail? Some people may tick one box and others the other.

Furthermore, there is the issue of reliability. In many organizations, filling in forms is a mere statutory formality with no checks. This may be due to a lack of sensitivity and awareness. If there are no checks, people take shortcuts and tick any box.

Without relevance, consistency and accuracy, the data cannot be analysed and is of very little value to organizations. The challenge of big data is not trivial, but it is the abnormality of dark data that makes it a challenge of a different kind. Dark data will soon play a major role in analysis. A combination of cleverness and technology can solve the problem of dark data, and organizations will soon be hunting for candidates who bring both.

2 comments:

  1. If the check boxes are left to people to tick, authenticity is questionable; but if the person seeking the data talks to people and ticks the boxes, authenticity is 100%.

    1. True. People should not tick the boxes themselves; the person seeking the data should be clever enough to draw the dark data out.
