In 1848, gold was discovered in California, leading to a massive influx of people from all over the world seeking their fortunes. Driven by their hopes of striking gold in the rivers and streams of California, these individuals were known as “Gold Rushers.” While most traveled west to mine for gold directly, a few savvy entrepreneurs realized there was another way to profit from the Gold Rush without prospecting at all: they recognized the significant demand for the tools and equipment that the Gold Rushers needed to mine effectively. This story of entrepreneurial success spawned the metaphor “selling pickaxes during a gold rush” that we know today. In the 21st century, we have tech gurus migrating west to strike it rich with groundbreaking AI algorithms. This parallel raises the question: how do you sell virtual pickaxes to data scientists and tech companies? The challenge is not about sifting gold from riverbeds but about providing the essential tools, technologies, and services that enable data miners to extract valuable insights and knowledge from vast streams of unstructured data. Below, we explore how these virtual pickaxe sellers play a vital role in shaping the landscape of technology.
Traditional Sources of Data
Before the internet, data was limited to paper, punch cards, floppy disks, and CD-ROMs. Then, with the introduction of Google in 1998, we entered a “new era of data analysis, collection, and storage, leading to increased efficiency and accuracy in data processing.” The value of data grew tremendously in this era. A few years later, the term “Big Data” gained traction, and little did anyone know that it would become the motto of the digital transformation over the next two decades. Big Data is commonly characterized by three attributes: volume, variety, and velocity. An ideal Big Data system contains data that is varied in source, large in quantity, and rapid in collection and analysis. The importance of Big Data has led businesses to consider the capacity to gather, store, and analyze substantial volumes of data a crucial aspect of operations. This capability empowers them to enhance decision-making processes for better outcomes.
Now that we understand how valuable data can be, let us investigate how data was traditionally collected. Traditional data offers a “high level of organization and structure, which makes it easy to store, manage, and analyze.” Data scientists can then draw value from that data by “using statistical methods and visualizations to identify patterns and trends in the data.” One of the most common traditional sources is internal data, found within sales records, customer information, financials, and so on. Each of these categories contains many fields, such as product names, units sold, purchase dates, shipping addresses, and customer purchase history. Companies can search this data for patterns that reveal relationships between these fields, then use statistical analysis to measure the strength of those relationships. If a relationship proves strong, the company can act on it with a data-driven decision. For example, suppose a candle company found that units sold spiked for orders placed in December and shipped to Sarasota addresses. The data would incentivize it to increase production and advertising in Sarasota during December to take advantage of this fruitful market. Beyond their internal data, companies can also gather data from other sources, such as market research, government databases, or publicly available data, to create similar insights. These sources gather data in different ways, such as surveys, observational studies, and industry analysis. The defining trait of traditional data is that it was “collected for a specific purpose.” Traditional data is also characterized by a consistent, predefined format that is easily stored in spreadsheets. While these methods alone were satisfactory for most companies, the digital revolution allowed others to search for more complex insights within non-traditional sources.
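To make the candle example concrete, here is a minimal sketch of how such a pattern might be surfaced from structured internal data using Python and pandas. The column names and figures are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical internal sales data in the predefined, spreadsheet-friendly
# format that characterizes traditional data.
sales = pd.DataFrame({
    "product":       ["Vanilla Candle", "Pine Candle", "Vanilla Candle", "Pine Candle"],
    "units_sold":    [120, 340, 95, 610],
    "purchase_date": pd.to_datetime(["2023-07-14", "2023-12-03", "2023-08-22", "2023-12-18"]),
    "shipping_city": ["Tampa", "Sarasota", "Orlando", "Sarasota"],
})

# Group by purchase month and shipping city to surface patterns such as
# the December/Sarasota relationship described above.
sales["month"] = sales["purchase_date"].dt.month_name()
summary = (
    sales.groupby(["month", "shipping_city"])["units_sold"]
         .sum()
         .sort_values(ascending=False)
)
print(summary)
```

In practice, a company would follow a grouping like this with a formal statistical test before committing resources, but the workflow is the same: structured fields in, measurable relationships out.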
Non-Traditional Sources of Data
Around 2005, the internet was taken over by a contagious new phenomenon: social media. For some, this change meant a new addiction to selfies, hashtags, and filters. To others, it signified “a shift in the way we look at data collection.” Social media brought in a whole new dimension of digital information, one in which internet data would now be dominated by “people using the service rather than the service themselves.” With this change, Big Data reached new levels in both volume and variety: social media introduced both massive amounts of data and entirely new types of collectible data.
In contrast to traditional data, non-traditional data is not collected for a specific purpose; instead, it is “repurposed for a new use case.” For example, imagine you are a talk show host and want to use non-traditional data to find a popular new guest for your show. To find the most in-demand celebrity, you use an algorithm to track the number of likes, comments, and shares each candidate has received across multiple social media platforms over the past month. Keep in mind that the original function of likes, comments, and shares was to measure engagement with a single post. You eventually select Dwayne Johnson because the analysis shows he received the most interaction over the past month. By repurposing these data points to reflect relative popularity among celebrities across multiple platforms, you have utilized non-traditional data. The digital revolution has brought us to the point where nearly every move we make and word we speak is tracked and monitored for business insights, a development closely tied to the Internet of Things. The concept describes how our world is filled with physical objects that are “embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet.” These “things” include phones, computers, TVs, healthcare devices, wearables, and any device with the word “smart” thrown in front of it. The data collected from these “things” ranges across location, motion, audio, text, images, and activity. In small quantities and with little processing, most of this non-traditional data is useless. With large amounts of it, however, the merit of your insights will be determined by your ability to convert that data into a usable format.
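Returning to the talk-show example, the short Python sketch below shows what this repurposing might look like in code: per-post engagement counts, each originally meant to measure one post’s activity, are aggregated into an overall popularity score. The records and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical engagement records pulled from several platforms.
# Each record was originally meant to measure activity on a single post;
# here it is repurposed to rank overall celebrity popularity.
posts = [
    {"celebrity": "Dwayne Johnson", "platform": "Instagram", "likes": 950_000, "comments": 12_000, "shares": 40_000},
    {"celebrity": "Dwayne Johnson", "platform": "X",         "likes": 480_000, "comments": 9_500,  "shares": 75_000},
    {"celebrity": "Taylor Swift",   "platform": "Instagram", "likes": 880_000, "comments": 30_000, "shares": 22_000},
]

# Aggregate total interactions per celebrity across all platforms
# over the tracking window.
totals = defaultdict(int)
for post in posts:
    totals[post["celebrity"]] += post["likes"] + post["comments"] + post["shares"]

# The most in-demand guest is the one with the highest combined engagement.
best_guest = max(totals, key=totals.get)
print(best_guest, totals[best_guest])
```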
The Future of Data Sources
The future of data will be defined by two questions: who has access to the most data, and who is best at converting that data into a usable format. Some of the most coveted data comes from search engines because of its volume and variety. People will search for anything and everything, from “Where to get cheap gas?” to “Is my house haunted?” In terms of search engine market share, Google dominates with a share of “93.12% worldwide.” Because Google is such a giant in data quantity, some companies have resorted to other methods of collecting data, such as data marketplaces. These data marketplaces are a “one-stop-shop for buying and selling external data.” They are optimal venues for data-rich companies to sell data they cannot process and for data-lacking companies to buy it. One of the top data marketplaces is Datarade. With a catchy name and “2000+ data provider companies,” including Google, Amazon, and SAP, Datarade is poised to remain one of the top data marketplaces in the future. At the end of 2022, the “global data marketplace market size was valued at USD 968 million.” With future technological improvements, global data marketplaces are destined to be a profitable new market.
The emergence of AI implementations over the past year has many data scientists interested in their data processing capabilities. Unfortunately, “the AI you implement is only as intelligent as the data you feed it.” Because of this, companies are racing to figure out the best ways to improve the quality of the data they feed their AI. Data quality can be difficult to improve because the methods for improving audio data, for example, differ vastly from those for improving image data. However, the reward for solving these difficulties is massive: “last year it was estimated 90% of all data created was unstructured, and its rate is growing at 60% per year.” AI’s ability to sift through massive amounts of data swiftly will surely pay off for the companies that solve these data quality challenges.
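As one small illustration of what “converting data into a usable format” can mean for text, here is a minimal Python sketch that normalizes and de-duplicates hypothetical scraped records before they reach a model. The records are invented, and real pipelines would differ by modality, as noted above.

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize one unstructured text record before it is fed to a model."""
    text = unicodedata.normalize("NFKC", raw)         # repair inconsistent encodings
    text = re.sub(r"<[^>]+>", " ", text)              # strip stray HTML tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace, normalize case
    return text

# Hypothetical scraped records: noisy, duplicated, inconsistently encoded.
raw_records = [
    "<p>Great   product!!</p>",
    "great product!!",
    "Ships\u00a0fast and reliable",
]

# De-duplicate after cleaning so the model is not fed the same record twice.
cleaned = sorted({clean_text(r) for r in raw_records})
print(cleaned)
```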
In this digital gold rush, those who can effectively harness the power of data will hold the key to success. The pickaxe in this gold rush could be data marketplaces, AI-driven data quality improvement, or non-traditional data itself. The advantage in this race will belong to the companies that make the most technical progress in any or all of these categories. With the right tools, techniques, and strategies, businesses can unlock the full potential of data and pave the way for a data-driven future. The quest for valuable insights continues, and as the data landscape expands, so do the opportunities for those with the vision to seize them.