Roadblocks to getting real-time AI right

Analysts estimate that by 2025, 30% of all generated data will be real-time data. That is 52 zettabytes (ZB) of real-time data per year – roughly the total amount of data produced in 2020 and, because data volumes have grown so rapidly, three times the total produced in 2015. With this exponential growth, it’s clear that conquering real-time data is the future of data science.

Over the last decade, the likes of Materialize, Deephaven, Kafka and Redpanda have developed technologies to work with these streams of real-time data. They can transform, transmit and persist data streams on the fly, providing the basic building blocks needed to construct applications for the new real-time reality. But to make such enormous volumes of data truly useful, artificial intelligence (AI) must be employed.

To keep up with this tidal wave of real-time data, enterprises need technology that can create knowledge and understanding with minimal human intervention. Putting the idea of applying AI algorithms to real-time data into practice is still in its infancy, though. Specialized hedge funds and big-name AI players – like Google and Facebook – make use of real-time AI, but few others have waded into these waters.

To make real-time AI ubiquitous, supporting software must be developed. This software needs to provide:

  1. An easy path to transition from static to dynamic data
  2. An easy path for cleaning static and dynamic data
  3. An easy path for going from model creation and validation to production
  4. An easy path for managing the software as requirements – and the outside world – change

An easy path to transition from static to dynamic data

Developers and data scientists want to spend their time thinking about important AI problems, not worrying about time-consuming data plumbing. A data scientist should not care if data is a static table from Pandas or a dynamic table from Kafka. Both are tables and should be treated the same way. Unfortunately, most current generation systems treat static and dynamic data differently. The data is obtained in different ways, queried in different ways, and used in different ways. This makes transitions from research to production expensive and labor-intensive.  

To really get value out of real-time AI, developers and data scientists need to be able to seamlessly transition between using static data and dynamic data within the same software environment. This requires common APIs and a framework that can process both static and real-time data in a UX-consistent way.
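To make this concrete, below is a minimal, hypothetical sketch of what such a framework could look like in Python. The Table, StaticTable and StreamingTable classes are illustrative assumptions – they are not the API of Deephaven, Materialize or any other product – but they show how a single piece of query logic can run unchanged over a fixed dataset or a stream of micro-batches.

```python
# Hypothetical sketch: one query interface over both static and streaming
# data. The class names are illustrative, not any vendor's actual API.
from typing import Callable, Iterable

import pandas as pd


class Table:
    """The common interface that user queries are written against."""

    def filter(self, predicate: Callable[[pd.DataFrame], pd.Series]) -> "Table":
        raise NotImplementedError


class StaticTable(Table):
    """Backed by a fixed DataFrame, e.g. loaded from CSV or Parquet."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def filter(self, predicate: Callable) -> "StaticTable":
        return StaticTable(self.df[predicate(self.df)])


class StreamingTable(Table):
    """Backed by an iterator of micro-batches, e.g. polled from Kafka."""

    def __init__(self, batches: Iterable[pd.DataFrame]) -> None:
        self.batches = batches

    def filter(self, predicate: Callable) -> "StreamingTable":
        return StreamingTable(b[predicate(b)] for b in self.batches)


def large_trades(t: Table) -> Table:
    # Written once: runs on a research file today and a live feed tomorrow.
    return t.filter(lambda df: df["price"] * df["size"] > 1_000_000)
```

In research, large_trades runs over a static table loaded from disk; in production, the identical function runs over each batch as it arrives. Nothing has to be rewritten on the way to production.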

An easy path for cleaning static and dynamic data

The sexiest work for AI engineers and data scientists is creating new models. Unfortunately, the bulk of an AI engineer’s or data scientist’s time is devoted to being a data janitor. Datasets are inevitably dirty and must be cleaned and massaged into the right form. This is thankless and time-consuming work. With an exponentially growing flood of real-time data, this whole process must require less human labor and must work on both static and streaming data.

In practice, easy data cleaning is accomplished by having a concise, powerful, and expressive way to perform common data cleaning operations that works on both static and dynamic data. This includes removing bad data, filling missing values, joining multiple data sources, and transforming data formats.
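As a sketch, those four operations might be collapsed into a single function like the one below. The column names (symbol, price, ts) and the reference table are assumptions made for the example; the point is that one cleaning function, written once, can be applied to a full static dataset in research and to each arriving micro-batch in production, so the two environments never drift apart.

```python
# Illustrative only: one cleaning function for both static and streaming use.
# Column names and the reference table are assumptions made for this example.
import pandas as pd


def clean(df: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["symbol"])                  # remove bad rows
    df["price"] = df["price"].ffill()                  # fill missing values
    df = df.merge(reference, on="symbol", how="left")  # join another source
    df["ts"] = pd.to_datetime(df["ts"], utc=True)      # normalize the format
    return df
```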

Currently, there are a few technologies that allow users to implement data cleaning and manipulation logic just once and use it for both static and real-time data. Materialize and ksqlDB both allow SQL queries of Kafka streams. These options are good choices for use cases with relatively simple logic or for SQL developers. Deephaven has a table-oriented query language that supports Kafka, Parquet, CSV, and other common data formats. This kind of query language is suited to more complex and more mathematical logic, or to Python developers.

An easy path for going from model creation and validation to production

Many – possibly even most – new AI models never make it from research to production. The holdup is that research and production are typically implemented in very different software environments. Research environments are geared toward working with large static datasets, model calibration, and model validation. Production environments, on the other hand, make predictions on new events as they come in. To increase the fraction of AI models that impact the world, the steps for moving from research to production must be extremely easy.

Consider an ideal scenario: First, static and real-time data would be accessed and manipulated through the same API. This provides a consistent platform to build applications using static and/or real-time data. Second, data cleaning and manipulation logic would be implemented once for use in both static research and dynamic production cases. Duplicating this logic is expensive and increases the odds that research and production differ in unexpected and consequential ways. Third, AI models would be easy to serialize and deserialize. This allows production models to be switched out simply by changing a file path or URL. Finally, the system would make it easy to monitor – in real time – how well production AI models are performing in the wild.
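The serialization step in that scenario can be as simple as the sketch below. The file path and the MODEL_PATH environment variable are illustrative assumptions, but they show why easy serialization makes promoting a new model a configuration change rather than a code change.

```python
# Sketch of the serialize/deserialize step described above. The paths and
# the MODEL_PATH environment variable are illustrative assumptions.
import os
import pickle


def save_model(model, path: str) -> None:
    """Serialize a trained model from the research environment."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_production_model():
    """Deserialize whichever model the configured path points to."""
    path = os.environ.get("MODEL_PATH", "/models/current.pkl")
    with open(path, "rb") as f:
        return pickle.load(f)
```

Swapping the production model is then just a matter of pointing MODEL_PATH at a new file, and the monitoring piece reduces to comparing the loaded model’s live predictions against outcomes as they stream in.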

An easy path for managing the software as requirements – and the outside world – change

Change is inevitable, especially when working with dynamic data. In data systems, these changes can be in input data sources, requirements, team members and more. No matter how carefully a project is planned, it will be forced to adapt over time. Often these adaptations never happen. Accumulated technical debt and knowledge lost through staffing changes kill these efforts. 

To handle a changing world, real-time AI infrastructure must make all phases of a project (from training to validation to production) understandable and modifiable by a very small team – and not just by the team that originally built it. It should be understandable and modifiable by the new people who inherit existing production applications.

As the tidal wave of real-time data strikes, we will see significant innovations in real-time AI. Real-time AI will move beyond the Googles and Facebooks of the world and into the toolkit of all AI engineers. We will get better answers, faster, and with less work. Engineers and data scientists will be able to spend more of their time focusing on interesting and important real-time solutions. Businesses will get higher-quality, timely answers from fewer employees, reducing the challenges of hiring AI talent.

When we have software tools that facilitate these four requirements, we will finally be able to get real-time AI right. 

Chip Kent is the chief data scientist at Deephaven Data Labs.
