The Dark Side of Big Data



What is Big Data?

Big data remains one of the fastest-growing segments across all industries, including the IT business. Big data is distinct from the internet, although the web makes it much easier to collect and share data. Big data refers to volumes of data beyond the normal processing, storage and analysis capacity of typical database application tools. Basically, it is a combination of data sets whose size, complexity and rate of growth make them difficult to capture, manage, process or analyze with conventional technologies.



The Three ‘V’s of Big Data:


  • Volume: This refers to the size of the data sets, from roughly 30-50 terabytes up to multiple petabytes. More sources of data generation in the digital age combine to increase the volume of data to a potentially unmanageable level.



  • Velocity: Another term for this is real-time growth. Data streams from sources such as social media sites at a virtually constant rate, and processing servers struggle to cope with this flow and generate meaningful real-time analysis.


  • Variety: Traditionally, data was structured and stored in similar, consistent formats such as Excel spreadsheets and standard databases. Data can now be generated unstructured and collected in a huge range of formats including rich text, web logs, RFID, sensor-embedded devices and GPS devices, among others.


Why is Big Data Important?

    When big data is effectively and efficiently captured, processed and analyzed, companies are able to gain a more complete understanding of their business, products and competitors. This leads to improvements such as increased sales, lower costs, better customer service and an improved brand image.
    Effective uses of big data include:
    • Improving IT troubleshooting and security breach detection, response speed and prevention of future occurrences.
    • Using social media content to understand customer sentiment better and faster.
    • Fraud detection and prevention in industries such as banking, shopping and investing.
    • Using financial market transaction information to assess risk more quickly and take corrective action.
    • Selling digital data that is valuable to other organizations, allowing a company to profit from its own data.
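    Fraud detection in the list above can start very simply: flag transactions that deviate sharply from a customer's usual behavior. The sketch below is a toy z-score rule; the threshold and the transaction history are illustrative assumptions, not a production method.

```python
import statistics

def flag_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds `threshold`.

    A crude stand-in for fraud screening: values far from the
    customer's mean spend are flagged for review.
    """
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# A made-up spending history with one suspicious transaction
history = [40, 55, 38, 60, 47, 52, 45, 2500]
print(flag_outliers(history))  # → [2500]
```

    Real fraud systems combine many such signals with trained models, but the principle of "flag what is statistically unusual" is the same.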





    The Issues of Big Data


     Where to store the data
          Even small and medium amounts of data can be difficult to manage, both technically in terms of how to store it and in terms of analyzing it. So, the more data companies have, the more complex the problems of managing it become. Do you buy hardware? Do you store it in the cloud? How often will you need to access it? Can you deal with latency?
         Uploading data into the cloud also generates problems of its own. Big data is about gathering all the data in the business and linking it together to extract information. That could be terabytes of data, uploading it to the cloud can take a long time, and the data set may be changing rapidly. The rate of change is what makes big data hard to upload in real time.
         One solution is to design the whole system to live in the cloud rather than uploading data after the fact. A system outside the cloud may be unable to get information in and out fast enough; inside the cloud, it has the connectivity to make data accessible to others.
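    To see why bulk uploads struggle to keep up, a back-of-the-envelope calculation helps. The figures below (10 TB over a 100 Mbit/s uplink) are illustrative assumptions:

```python
def upload_time_days(data_tb: float, bandwidth_mbps: float) -> float:
    """Days needed to push `data_tb` terabytes over a `bandwidth_mbps` link."""
    data_bits = data_tb * 1e12 * 8           # terabytes -> bits
    seconds = data_bits / (bandwidth_mbps * 1e6)
    return seconds / 86400                   # seconds -> days

# e.g. moving 10 TB over a 100 Mbit/s uplink takes over nine days
print(round(upload_time_days(10, 100), 1))  # → 9.3
```

    If the data set changes faster than that, the upload can never catch up, which is exactly the real-time problem described above.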
    Identification of inactive data

        One of the top challenges in handling big data is the growth of the data itself. Data grows day by day, so an enterprise capable of handling its data today may not be able to handle it tomorrow. The most important task is to identify active data. The ironic thing about data is that most enterprise data is inactive and no longer used by end users. For example, access patterns for corporate data typically follow a curve where data is used most often in the days after it is created and then less and less frequently thereafter.
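    The active/inactive split described above can be sketched as a simple rule on last-access timestamps. The 90-day threshold and the record format here are assumptions for illustration:

```python
import time

INACTIVE_AFTER = 90 * 86400  # seconds; illustrative 90-day cutoff

def split_active_inactive(records, now=None):
    """Partition (name, last_access_epoch) tuples into active and inactive names."""
    now = now if now is not None else time.time()
    active, inactive = [], []
    for name, last_access in records:
        target = active if now - last_access <= INACTIVE_AFTER else inactive
        target.append(name)
    return active, inactive

now = 1_000_000_000  # a fixed "current time" so the example is reproducible
data = [("sales_2024.csv", now - 5 * 86400),    # touched 5 days ago
        ("logs_2019.tar", now - 400 * 86400)]   # untouched for 400 days
print(split_active_inactive(data, now=now))
# → (['sales_2024.csv'], ['logs_2019.tar'])
```

    Once identified, inactive data can be moved to cheaper, slower storage tiers instead of competing for space with the data users actually touch.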

    Security & privacy challenges

          The use of NoSQL databases and other large-scale, non-relational data stores creates issues due to missing capabilities, including real authentication, encryption for data at rest or in transit, and data tagging and classification. Organizations need to consider using middleware layers to enforce authentication and data integrity. All passwords must be hashed or encrypted rather than stored in plain text. In particular, companies need to defend against unauthorized access while maintaining continuity and availability. Big data implementations can also lead to data leakage and exposure: when numerous endpoints submit data for processing and storage, false or malicious data may be submitted.
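    For the password point above, here is a minimal sketch of storing salted, hashed passwords with the Python standard library. The iteration count and salt size are illustrative; real deployments should follow current security guidance.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Derive a PBKDF2 hash; returns (salt, digest) to store instead of the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the hash and compare in constant time to resist timing attacks."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))  # → True
print(verify_password("wrong", salt, digest))   # → False
```

    The point is that even if the data store leaks, attackers get salted hashes rather than usable credentials.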
       Lack of the skills triangle
          All of this requires people with specialist skills and an understanding of the business. The key challenge is finding people with the “triangle of skills”: the right combination of business, computer science and statistics. Computer scientists and statisticians can go wild summarizing data, but that alone is not enough; the company wants to get insights out of the data as well, and that extra insight requires an understanding of the business. People with backgrounds in statistics, data mining, predictive modeling, natural language processing, content analysis and social network analysis are all in demand. These people work with structured and unstructured data to deliver new insights and intelligence to the business. Platform management professionals are also needed to implement, secure and optimize Hadoop clusters.

      Risks associated with big data technologies

          Big data implementations typically include open source code, with the potential for unrecognized back doors and default credentials. The attack surface of the nodes in a cluster may not have been reviewed, and servers may not be adequately hardened. User authentication and access to data from multiple locations may not be sufficiently controlled. Regulatory requirements may not be fulfilled, and access to logs and audit trails can be problematic. This creates opportunities for malicious data input and inadequate data validation. Companies also have to manage technology options such as SaaS, cloud, virtualization and mobile. Organizations need to be ready to invest in big data-specific training programs and to develop big data test automation solutions.

      Understanding the data

          For big data testing to be effective, testers need to continuously monitor and validate the three Vs. Understanding the data and its impact on the business is the real challenge faced by any big data tester. It is not easy to scope the testing effort and strategy without proper knowledge of the nature of the available data. Testers need to understand business rules and the relationships between different subsets of data. They also have to understand the statistical correlation between different data sets and its benefits for business users.
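    One way a tester might encode business rules against incoming records is rule-based validation. The field names and rules below are made-up examples, not a real schema:

```python
# Each rule maps a required field to a predicate it must satisfy.
RULES = {
    "user_id":  lambda v: isinstance(v, int) and v > 0,
    "amount":   lambda v: isinstance(v, (int, float)) and v >= 0,
    "currency": lambda v: v in {"USD", "EUR", "LKR"},
}

def validate(record: dict) -> list:
    """Return the names of failed fields; an empty list means the record is valid."""
    return [field for field, rule in RULES.items()
            if field not in record or not rule(record[field])]

print(validate({"user_id": 7, "amount": 12.5, "currency": "USD"}))  # → []
print(validate({"user_id": -1, "amount": 12.5}))  # → ['user_id', 'currency']
```

    Running such checks continuously against a sample of the stream gives the tester an early signal when the nature of the data drifts.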

    Dealing with sentiments and emotions

          In a big data system, unstructured data arrives from sources such as tweets and text documents. The biggest challenge testers face while dealing with unstructured data is the sentiment attached to it. For example, customers tweet and discuss a new product launched in the market. Testers need to capture those sentiments and transform them into insights for decision making and further business analysis.
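    A toy lexicon-based scorer illustrates the idea of turning tweet text into a sentiment signal. The word lists are tiny illustrative assumptions; real systems use trained models and much larger vocabularies.

```python
# Minimal illustrative word lists, not a real sentiment lexicon
POSITIVE = {"great", "love", "awesome", "good"}
NEGATIVE = {"bad", "hate", "broken", "slow"}

def sentiment(text: str) -> str:
    """Classify text by counting positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love the new phone, great battery"))  # → positive
print(sentiment("Update is slow and broken"))            # → negative
```

    Even this crude signal, aggregated over millions of tweets, hints at how customer sentiment can be folded into business analysis.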


    Dinakshee Saarangaa
    Viduni2@gmail.com 
                            CIS 2011/2012 

