Business Transformation & Operational Excellence Insights

CloudFactory Webinar - WEBINAR SPOTLIGHT: Maximizing the Value of your Autonomous Vehicle Data

Written by BTOES Insights Official | Jan 16, 2023 2:21:26 PM

Courtesy of CloudFactory's Alex Morales and Unstruk Data's Kirk Marple, below is a transcript of the webinar session on 'Maximizing the Value of your Autonomous Vehicle Data' to Build a Thriving Enterprise.

Session Information:

Maximizing the Value of your Autonomous Vehicle Data

How to Build Safe AV Applications from Large Datasets  

There is no shortage of data sources for training autonomous vehicle (AV) systems. From 2-D and 3-D images to LiDAR, you can easily have millions of data elements to manage. But it can be challenging to know what data is valuable, how to find it, and when to use it. Attend this webinar and learn how expert human-in-the-loop (HITL) annotators help you get the most out of your data by increasing its value and usability. You’ll walk away with a better understanding of how to: 

  • Prioritize quality data for AV applications

  • Identify accurate data for AV models

  • Interpret data in context

  • Improve the quality of your data and enrich it through accurate annotation

 

Session Transcript:

Hello, everyone, and thanks for joining us today.

I'm Michael ... and I'm the moderator for today's webinar, Maximizing the Value of Your Autonomous Vehicle Data.

When you're training an AV model, you have a wide variety of data sources, including 2D cameras, video, lidar, and radar, generating millions of data elements. This is great for covering typical driving situations and corner cases in your AI model, but now you've got a new challenge.

When you look at this vast amount of data, how do you know what's valuable? How can you find the data that's useful for training your AV model at the time that you need it?

So, today, our panelists are going to talk about these challenges: why now is the time to face them, and how you can get the most value from your data.

First of all, let me introduce Alex Morales, one of CloudFactory's industry executives for autonomous vehicles.

For over 15 years, Alex has worked with data analytics and autonomous systems to solve a variety of problems, leveraging computer vision.

He brings that experience to CloudFactory, helping our clients to scale their AV projects.

Joining Alex is Kirk Marple, founder and CEO of Unstruk Data.

Kirk is an entrepreneur and technologist with over 25 years of experience, including time at General Motors, where he built a prototype of a big data ingestion and processing platform for video, lidar, and vehicle telemetry for their AV development teams.

He'll share his approach and experience solving data challenges for AV development.

Now, at any time during this webinar, please feel free to submit any questions that you have. You'll see on your screen there's a questions chat window.

We'll do our best to answer all your questions by the end of the webinar.

Also, there's a document that you can download and look at during or after the webinar at your leisure.

So let's go ahead and get started. I'm going to pass things over to Alex.

Excellent, thank you, Michael. Again, Alex Morales, based out of Austin, Texas, excited to be here today to share some of the stories and information that we've seen. Joining me is Kirk, who I've actually had a relationship with for quite some time, working at different ventures. We're excited to share some of our stories and really dig into the data, because data really does drive everything.

Yeah, thanks for having me on. I mean, this is going to be really fun.

Excellent. So, just quick context here at CloudFactory; I want to give a 30-second overview. Founded over 12 years ago, we've worked with over 700 clients, and we come in at various stages of the AI/ML life cycle. We're very focused on a mission-driven approach: to provide meaningful work and develop leaders with highly technical labeling skills in developing nations.

At the same time, at the core is a human-in-the-loop data annotation service. We meet clients and become part of their team at the very beginning, training datasets to feed their models, and we see them in the later stages as well. Later in that life cycle, we operate more as a QA layer on models already in the wild.

It's a non-anonymous crowd, so it's the same workers showing up, day in, day out, focusing on high-quality data labeling.

So, as we jump in, I'll set the stage. It's funny, Kirk and I have had previous conversations just digging into data, and there are a million analogies about how it's almost like a forest: you have to be able to really see the forest for the trees. I'm sure I'll lean on some other analogies later. But the goal today is to keep it conversational. I don't want to over-PowerPoint anybody, so I'll set the stage, and from there we'll dig into the key points.

So, to set the stage: the market is strong. There's consumer demand across various markets: autonomous cars, robotaxis, and trucking.

Each provides value, varying from safety to efficiency to quality of life.

So, with these growing markets, we're at the most important time for success.

There's a host of positives, and a host of challenges as well, that are escalating in importance for the success of these models.

Hardware manufacturers and software development companies are improving on edge case navigation.

There's a wide variety of sensors available. We've seen fleets that have everything from nine cameras to lidar systems, radar, various microphones, and thermal IR cameras, and sometimes that's just what it takes, which is itself one of the biggest challenges.

On the consumer side, some of the challenges are around perception versus actual capability.

So that leads into some of those challenges. And then, overall, we're seeing some of the leaders really pulling ahead, and what they're doing is focusing on process to maintain their strength in the market.

Success for Level 4 and Level 5 autonomous vehicles is not just about developing safe navigation systems that perform better than humans. AV developers have to demonstrate effective operations quickly, ideally first to market as well, with safe operating models.

At the same time, we see companies halting operations for fully autonomous vehicles. It's business; it's economics.

And even some government officials, while they're open to trials in their cities and states, are shutting down testing pretty quickly if there's just one serious incident. So it's important to be very strategic and to treat that level of success as absolutely key.

And, you know, it's tough to see, but some of the most advanced systems are closing and stopping operations as well, even though they have some of the greatest talent on board.

So, let's kinda dive in a little bit.

You know, data does drive every stage of the model development process, and to simplify it, the movement of the data and the models can be thought of as a feedback cycle.

Yeah, that's very high level.

But, at the same time, when you design and build first, you have to pause and hear the feedback: how much data is actually needed, and what quality levels are required. A lot of times, context is key around that data.

So, what we're noticing right off the bat is the importance of data quality, but then also finding the right data; then deployment and operation, and being able to operationalize it.

Each model is different and can require unique datasets.

Then refining and optimizing in operation can be a challenge on its own.

We have clients that have been working for five-plus years, and they're still tuning their model.

It takes more data and more training; even with good-quality data, new use cases and new edge cases come up that require that support.

I think I can drop in a couple of points here. On the left-hand side, the volumes of data are growing, and really, just the acquisition sometimes gets overwhelming. How do you actually get the data into this process, off of the vehicle or from the test track? We've seen limitations in just the nuts and bolts of how you move hard drives around. So I think there are a lot of struggles even getting to this point, because the data sizes are growing so quickly.

Yeah, that's perfect, and actually the perfect lead-in here, where I would love to double-click on this and dig in, bringing back the forest analogy. The importance of the data is: if you sharpen your axe for six hours, you can chop down a tree in six minutes.

So, it's really about having the sharpest axe and having the data available. Kirk, I'll let you jump in with some of the stories about even getting that data initially.

Yeah. In my time at GM, it was very early on, after they had bought Cruise. They were capturing data, recording it into the rosbag format, and it was lidar, it was video, it was telemetry. But actually getting it from the vehicles back to the data center was something that was maybe overlooked a little bit at first. So we had to set up, basically, an automated ingestion system to say: look, how do you get those bits into the hands of these kinds of data pipelines? I was tasked a lot with taking the files and getting them into the hands of data scientists, but there was even a step before me.

Just, how does it even come into the data center? And so that data acquisition box that you showed there can actually be more complicated than is maybe obvious at first.
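
To make that concrete, here is a minimal sketch, not GM's actual system, of the kind of ingestion step Kirk describes: walking a recorded ROS1 bag and routing each message to a per-sensor handler. It assumes the standard `rosbag` Python package; the topic names and handler bodies are hypothetical placeholders.

```python
# Minimal ingestion sketch (illustrative only): read a recorded ROS1 bag
# and route each message to a per-sensor handler. Topic names are
# hypothetical examples, not from any real fleet.
import rosbag

def handle_image(msg, stamp):       # placeholder: e.g. extract frames for labeling
    print("image at", stamp)

def handle_pointcloud(msg, stamp):  # placeholder: e.g. convert to a columnar format
    print("lidar sweep at", stamp)

def handle_telemetry(msg, stamp):   # placeholder: e.g. append to a time-series store
    print("telemetry at", stamp)

def ingest(bag_path):
    with rosbag.Bag(bag_path) as bag:
        for topic, msg, stamp in bag.read_messages(
                topics=["/camera/front/image_raw",
                        "/lidar/points",
                        "/vehicle/telemetry"]):
            if topic.startswith("/camera"):
                handle_image(msg, stamp)
            elif topic.startswith("/lidar"):
                handle_pointcloud(msg, stamp)
            else:
                handle_telemetry(msg, stamp)
```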

Then, data cleansing, I think, is something that, as they say, takes 80% of the project time. There's dirty data: video frames fail to capture and they're black, or telemetry is all flatlined, things like that.

And so you can talk about a kind of data QC as part of this. Annotation QC downstream is so important for training, but really, getting good-quality data at the forest level, not the tree level, is, I think, a sometimes overlooked area. Because especially once you start to get years of data, and it's geospatially disparate, or time disparate, that's where some of the challenges come in.
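
To illustrate what that forest-level QC might look like, here is a minimal sketch assuming NumPy arrays for video frames and telemetry channels; the thresholds are arbitrary examples, not anyone's production values.

```python
# Minimal data-QC sketch: flag black video frames and flatlined telemetry
# before anything reaches the annotation stage. Thresholds are illustrative.
import numpy as np

def is_black_frame(frame: np.ndarray, max_mean: float = 5.0) -> bool:
    """A frame whose mean pixel intensity is near zero likely failed to capture."""
    return float(frame.mean()) <= max_mean

def is_flatlined(channel: np.ndarray, min_std: float = 1e-6) -> bool:
    """A telemetry channel with near-zero variance is probably stuck."""
    return float(channel.std()) <= min_std

# Example: a failed capture and a stuck sensor channel both get flagged.
assert is_black_frame(np.zeros((720, 1280, 3), dtype=np.uint8))
assert is_flatlined(np.full(1000, 42.0))
```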

Yeah.

Even digging into that: if you are able to acquire the data, something that was really interesting, that I know we've talked about in the past, is that moving the data becomes expensive.

Sometimes recapturing that data becomes really expensive, too, when you think about the different use cases.

One that I heard about in a deep conversation recently, explained at a very high level, was a construction worker waving somebody along in traffic. We don't think twice about that; it's a construction worker giving a gesture to move somebody through the site. But there are different conditions.

So it's about being able to understand those conditions. And Kirk, I know you've seen this over and over.

Yeah, it's really interesting to me, because I had worked in the broadcast video world, where you put a show on Netflix, but there's all this metadata around it: the cast and crew, who went into it. In this world, there's a similar model. You've captured the data, there's a video there that's being used, but there's a lot of context to it. There's metadata that needs to be tracked: where was the sun, what time of day was it, what was the weather like? That can actually be used as a filtering mechanism when you're training a model; you might want similar data that's clustered by time of day or by weather condition. And especially with AV projects, they're typically all tested right now in sunny areas, and at the start, they were trying to get everything clustered by weather and sun conditions.

But as you start to get into training in a snowy place or a rainy place, how would you go and look at a directory of 10,000 files and figure out, OK, which ones, where was it snowy? That's where the metadata comes into play. And that's where, I think, you see it's not just that every video is a video or every image is an image; there's a lot of context that has to be kept along with it. That's where I really get interested: how do you keep the knowledge and context around the media, with it, through this entire life cycle?

Yeah, and just to keep the conversation going with that, I think it's really interesting, because a lot of companies keep their data, whether it's in Azure or an S3 bucket, but it's unstructured; there's no context to it, from my seat.

You know, at CloudFactory, I could say: well, yeah, we could label the data and give it some structure, but there's much more to it as well.

Yeah, if you want to talk about providing that metadata as well, that's really what my company is all about: how do you manage the metadata for unstructured, or what some people are now calling complex, data files. To me, there's really a physical side and a logical side. The physical side is the MP4 or the JPEG file on disk, and there is some technical metadata in there: you can get the time, the GPS location, the camera information, make and model, even lens information. But then there are high-level, logical things, like: what is this observing?

Is this an outdoor photo? Is it a picture of a piece of equipment? There are ML algorithms that are easy to use for classification these days. So having a high-level, rough-cut browsing capability, across both technical metadata and that kind of logical metadata, is so important.

And that's really what I've been spending a lot of time on with our customers. Once you put 10,000 or 100,000 files into an S3 bucket, filenames aren't good enough to differentiate the data anymore, and folder names are just very difficult. So having, essentially, a database on top of this, to be able to find the physical files, is really key.
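
As one way to picture that, here is a minimal sketch of a metadata catalog, using SQLite and an illustrative schema (not Unstruk's actual product): the physical files stay in the bucket, while queries run against the database.

```python
# Minimal catalog sketch: the bytes live in object storage; this small
# database answers "which files?" without touching them. Schema and field
# names are illustrative.
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS assets (
        s3_key      TEXT PRIMARY KEY,  -- pointer to the physical file
        captured_at TEXT,              -- ISO-8601 timestamp
        lat         REAL,
        lon         REAL,
        weather     TEXT,              -- e.g. 'sunny', 'rain', 'snow'
        time_of_day TEXT               -- e.g. 'day', 'dusk', 'night'
    )""")

# "Which clips were captured in snow at night?" -- answered without
# downloading a single video from the bucket.
rows = conn.execute(
    "SELECT s3_key FROM assets WHERE weather = ? AND time_of_day = ?",
    ("snow", "night")).fetchall()
```

The same pattern is what lets labels flow back in later: annotations become just more queryable columns.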

And a story we were talking about the other day involved autonomous rovers; they happened to be undersea rovers. There were just hundreds of thousands of these files taken from the sea floor by an oil and gas company.

But once they get compiled, year after year and trip after trip, how do you go back and find where in the world they are, find them geospatially? That's where all the metadata comes into play. They even found they would have to go rerun ships because they couldn't answer: where have we scanned the sea floor?

And another quick story I didn't mention the other day: I'd worked with, basically, a public works kind of state department, and they had different divisions, different groups, that were flying drones.

But they wouldn't know; generically speaking, the water department didn't know where the fire department had flown a drone, and they may have flown a drone over the same spot in the last two weeks, but it was siloed data. So they end up with each person doing their own thing, and they spend more money sending out drone pilots. But if they had a view of this metadata, they could have reused data. That's really where this data acquisition, and what we call data cataloging, fits in at this point.

Yeah, and I think that's super key, because exactly what you're mentioning, not knowing what data you have available, is a key conversation that we have all the time:

I need to find specific data around specific edge cases, and it's not a little bit of data; there are petabytes and petabytes being created. I mentioned all of the sensors, and it's not just one vehicle collecting all of this, it's fleets.

So being able to set up a process to really tackle that, I think, is the sharpening-the-axe analogy: being able to set it up correctly in the first place.

The diversity of the data types is what causes a lot of the complexity, I think. Back in the day, everybody had a similar problem: hey, I have a bunch of documents in my organization and I'm trying to track them. Everybody jumps into SharePoint or something, and you can index and search them. But what does that mean when you have lidar data? When you have 3D data, when you have really high-resolution images, like orthomosaics? Not everybody, and not every tool, can even support them today.

A lot of the ML tools and libraries have restrictions on how big an image can be handed to them, so you have to manage things like tiling and rescaling. But the metadata is still important, because you have to know when to do that. So you get a lot of metadata-driven data prep that really has to come into play here. And so, like you said, it's a big kind of virtuous cycle.
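
For illustration, a minimal sketch of that tiling step, assuming NumPy image arrays; the tile size and overlap are arbitrary example values.

```python
# Minimal tiling sketch: split an image that is too large for the model
# into overlapping fixed-size tiles, keeping each tile's pixel offset so
# detections can be mapped back onto the source image.
import numpy as np

def tile_image(image: np.ndarray, tile: int = 1024, overlap: int = 128):
    step = tile - overlap
    h, w = image.shape[:2]
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield (x, y), image[y:y + tile, x:x + tile]

# Example: a 4096x8192 orthomosaic becomes a stream of manageable tiles.
big = np.zeros((4096, 8192, 3), dtype=np.uint8)
tiles = list(tile_image(big))
```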

But if you can start with a clean catalog, a very smart catalog, up front, it just makes all the downstream stuff more valuable. And what we see folks doing a lot, when you really start to get a mass of data, into the tens of terabytes and petabytes, is triage. You don't want to annotate everything.

You want to start carving that haystack down into smaller haystacks and then roll into the annotation phase, and that's really where the cataloging is important.

We've also seen some things now with finding similar data. You may have an image of something interesting, maybe a vehicle in a construction site. How do you look in your 20 terabytes of data to find other ones taken around it: geospatially, or maybe at the same time of day? You just want to start hacking that data down into some usable form, and I think that's where it also gets interesting. I haven't seen a lot of folks focus on that side of it, but it really only comes into play at really high volumes of data.
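
As a sketch of that "find me more like this one" idea: assuming every image has already been embedded once by some pretrained vision model into a unit-length vector, ranking the whole catalog is a single matrix product. The embedding step itself is out of scope here.

```python
# Minimal similarity-search sketch over precomputed image embeddings.
# catalog: (N, D) matrix of unit-normalized vectors, one row per image.
import numpy as np

def top_k_similar(query_vec: np.ndarray, catalog: np.ndarray, k: int = 10):
    q = query_vec / np.linalg.norm(query_vec)  # normalize the query vector
    scores = catalog @ q                       # cosine similarity via dot product
    return np.argsort(scores)[::-1][:k]        # indices of the k nearest images
```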

And I think you're exactly right: it's those high volumes of data. Going off your parallel, it's finding the needle in the haystack, the actual piece of data that you need.

So, going back to the very beginning stages: having that set up correctly, being able to get the context. I've seen a ton of memes about what the data actually looks like versus how it's interpreted.

So it's about making sure that it's as clear as possible and keeping clean data. Something that you mentioned: dirty data could end up impacting not just models, but it takes up space.

It becomes expensive and could just throw a wrench in the total operation process.

And there is a loop that I think we didn't even talk about: once you annotate the data in a human way, you can put that back into the catalog as well. Essentially, you're labeling the data, and that can be used for searching.

Sometimes I don't see that, where you're actually adding knowledge to the system via human labeling, but it doesn't circle back to this triaging and cataloging mechanism. If you can mix auto-labeling with human labeling, you're getting more and more knowledge injected into your system about your data, and it just makes the process easier for everyone.

My focus has really been on the upstream side of this, but on the downstream side there's a lot of commoditization and, I guess you could say, just a lot of great technology coming out for AutoML and better distributed training algorithms.

A lot of effort has been put into those areas, and we can make incredible models now, but I think we're going to have to start seeing more innovation move into the upstream side.

Absolutely, absolutely.

So, we'll dig in a little bit into the technology piece, shortly.

CloudFactory recently did acquire a piece of technology that I'll hint at; I won't dig too far, and I'm happy to have separate discussions around that.

But I think it's extremely interesting, and I guess the main point is just having the right data, being able to have it searchable, and being able to make use of it.

And, again, I've had several conversations where that's the moment people start pulling out their hair. So it's about getting that organized and in place at the right time.

Exactly.

So, imagine if there were a way to overcome some of these challenges, whether it's aggregating, cleaning, labeling, or augmenting the data, or reducing data errors. It's not only giving you the quality of data for safe and effective AV models; it reduces the time to market.

So, right now, that's even more key than ever.

Yeah. So, what we're talking about today really is prioritizing the quality of the data needed for the applications, and improving the quality of the data and enriching it. The enriching part, I would say, was overlooked before, but it's becoming more and more interesting to a lot of companies, because of the masses of data we hinted at. It's funny, it was actually an oil company in the past.

They said the data itself is becoming as valuable as the oil itself, and I think that's completely true, because the more of the data you can make usable, the more qualified, quality decisions you can make.

One thing I just thought of, and I don't think we reference it later: with new innovations in synthetic data, being able to build simulated data, having that knowledge of your actual captured data makes it a lot more valuable and more interesting to generate synthetic data from. I've worked with some of that, with a company doing capture of autonomous drives that wanted to create a simulation version of it. But you have to be able to say: OK, find me data that has, whatever, 35 cars in the picture.

How would you go back and search for that, to use for training and as examples to drive simulated data generation from? So, as we see the simulated-data technology evolve, having that quality to then, call it, enrich that data with some simulated data, I think we're going to see more tie-ins between those two. Also, I saw somebody building a model for hand tools and equipment; there's typically a SKU somewhere for each of those tools, living in another database.

You may want to filter your data by the SKU, not just by a picture that has a label that says 'hammer,' for example.

So that's where the enrichment comes in: creating those links between your business data and the media that you capture. Think of it for a vehicle: you may want to filter on the make and model of a car.

And you may have to go look in another database for that, and that's where that enrichment gets really important.

I haven't seen a lot of folks really go to that level of enriching the knowledge of their captured data.

Yeah, and that's interesting. You brought up synthetic data, and I have a feeling that some of the questions will come up regarding that as well.

Just a little plug to throw additional questions into the Q&A; Michael will be organizing those, and we can get to them at the end. But that question does come up a lot around synthetic data: you can create scenarios to test models that might not ever exist otherwise.

Whether it's a formation of traffic pylons that you wouldn't even think to create without just playing around with that synthetic data. But with a lot of the companies that we've had conversations with, and in a talk I heard recently from somebody at Waymo, sometimes it can be a blend: using the real data from out in the world and then being able to tie in a hint of synthetic can really strengthen those ML models and that success.

I've been working a lot in the geospatial data space, and being able to place objects into existing satellite imagery for training is where mixing the real world and the synthetic world comes in. I think we're going to see a lot of this, because, as we all know, it's really expensive and difficult to label enough data to make valuable models.

We're going to start to see a blend of those really crossing the chasm more in the coming years.

Absolutely.

So, I'll give CloudFactory just a quick plug, to go over some of the things I mentioned. We have these conversations all the time, and a lot of it has to do with how much data clients have and being able to manage it.

For over 12 years, we've been helping to train and analyze data, specifically for the AV market. We've served over 30 customers and worked with 2D, 3D, and lidar data; you name it, it's been there.

A variety of self-driving applications: object detection, hardware systems, just to name a few.

So, yeah, it speaks to the point, hearing what we hear day to day and being able to work with our customers as, I like to say, part of the team a little bit further down the hall.

And being able to provide that feedback cycle and keep it as constant as possible.

So, while there are changes that arise, whether it's hardware changes or different use cases, we've seen it. We're able to adjust quickly, scale extremely quickly, and help augment the team just to get things done.

Sorry, I lost control there for a second.

So, a couple of slides ago, we mentioned early data collection and movement. And, you know, it's expensive.

It is very expensive.

So, what Kirk was mentioning about being able to bring value to it: it helps you prioritize what to keep, it helps you organize it, it can help you define the goals, the safety, and the effectiveness, and there are ways to automate the process as well.

So, you know, maybe we dig in a little bit there as well. I hinted that there was technology that was acquired recently by CloudFactory.

And the idea is being able to blend the technology with the human aspect, and being able to give context that only a human can provide.

So, whether it's training for auto-labeling, that's part of the process of making the most of the 2D images, making it as efficient as possible, because with that much data, there has to be a way to automate part of the process as well.

I think the collaborative aspect of it matters too. You have subject matter experts who can identify things specifically, but you might need a review and approval process; maybe two or three people need to collaborate to validate: is this really the right label, or the right annotation or segmentation area? So looking at it as not a one-person problem but a collaborative problem is as important as the data side of it.

Also, I think the tooling matters. Think of all the productivity tools we now have for normal business IT capability: the ability to look over masses of data and quickly navigate. OK, I'm looking at one image; show me things that were collected within, like, a five-minute window. Show me things collected within a five-meter area.

Being able to navigate through that space of time and geography, I think, is going to be really important for this area and for organizing. One assumption I always make is that you can't expect humans to organize the data; it's just too much data.

So I think having a human in the loop is important, but you really have to do a first cut with auto-labeling and automatic organization.

But you want it to be tunable: you want to be able to have almost like a rules engine that can say, OK, here's how I want to organize my data, because every company might be a little bit different.
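
Going back to the five-minute window and five-meter radius Kirk mentioned, here is a minimal sketch of that kind of time-and-space filter over catalog records; the record fields are illustrative.

```python
# Minimal "nearby in time and space" sketch over catalog records, each a
# dict with 'captured_at' (datetime), 'lat', and 'lon'. Fields are illustrative.
import math
from datetime import timedelta

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby(anchor, records, minutes=5, meters=5):
    """Records captured within `minutes` and `meters` of the anchor capture."""
    window = timedelta(minutes=minutes)
    return [rec for rec in records
            if abs(rec["captured_at"] - anchor["captured_at"]) <= window
            and haversine_m(anchor["lat"], anchor["lon"],
                            rec["lat"], rec["lon"]) <= meters]
```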

Absolutely. And that's absolutely key.

It's funny, I keep hinting at the quality being so important. That's exactly it.

You know, being able to enrich the data is one of the things that I really wanted to touch on today, and just its importance. But I'm curious to hear from others as well: what is the main challenge today? Is it the management of the data? Is it giving it context?

Is it really just bringing value from it?

Because what I hinted at before is that having all that data can actually be extremely valuable. It can lead to further success within the ML models; it can be used in the future. We've worked with companies in the past where old data actually did have quite a bit of value when they had to go back and retrain models, or train new models on new cases.

So I think it's about finding that accurate data, and finding it for all scenarios. Going back to the construction worker, for example: what if it's raining one day? What if it's snowing one day? How do you make sure that data is easily available? How can you find it in the future?

Just for different scenarios and making sure that it's all available.

It's funny, I just came across a Reddit thread last night about someone building a model on, basically, video surveillance. They were labeling video surveillance footage for objects to identify, and they were talking about trying to understand: how long do we keep the data for? If you really need to go back a couple of years in the data to retrain a model, call it a kind of backfill, you may have inference running on current data that's coming in, but then as you train a new model, how do you run it back on other data? How do you find old data to involve in the training? It works both ways. But one of the things we've actually seen is that cost is a key factor: do you keep all your data in cloud storage, hot or cold? Do you keep all that data hot? Or can you say, OK, after 90 days we age this off to cold storage, at half the price, or whatever it is?

Or even put it in an archive tier, which is maybe, say, 25% of the cost.

You want to balance access to all your data, but you also don't want to break the bank in terms of storage. And that's where a catalog is actually much cheaper.

If you have a catalog of that metadata in a database, you can actually push some of your data off into colder storage and really optimize your data usage. Then, if you need to re-warm it, you're saying: oh, I need to retrain, now pull my data back into something that I can run compute on.

Having that ability becomes a really powerful value. It's really that kind of multi-tier approach to storage, and I think that's going to be another area, as these volumes keep growing, where just managing costs is going to be as big a part of what people focus on as the problem itself.

That's exactly it.

And kind of what I hinted at, the technology of being able to auto-label something: how do I get through as much data as possible and determine if there's value there as fast as possible? Which I think becomes key as well. Whether it's aged or not, determining the value quickly allows you to be more efficient with that cost.

Exactly. Yeah, because that's a really nice thing: you can do a query on a catalog without physically touching the images. That's separating the logical and the physical, and then you can optimize each one differently. That's really important.
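
As one concrete expression of that aging-off idea, here is a sketch of an S3 lifecycle rule set through boto3; the bucket name, prefix, and day counts are hypothetical examples, and other clouds have equivalent mechanisms. The catalog keeps answering queries regardless of tier, since the metadata never leaves the database.

```python
# Sketch of storage tiering as an S3 lifecycle rule: raw capture moves to
# an infrequent-access class after 90 days and an archive class after a
# year. Bucket, prefix, and day counts are illustrative.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-av-capture-bucket",  # hypothetical bucket name
    LifecycleConfiguration={"Rules": [{
        "ID": "age-off-raw-capture",
        "Status": "Enabled",
        "Filter": {"Prefix": "raw/"},
        "Transitions": [
            {"Days": 90,  "StorageClass": "STANDARD_IA"},  # colder, cheaper
            {"Days": 365, "StorageClass": "GLACIER"},      # archive tier
        ],
    }]},
)
```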

Absolutely, so, I'm going to fly through the rest of the slides, because I'd love to get to question and answer.

I think not just giving a simple answer, but double-clicking on some of those questions, could be fun as well. So, I'm going to move forward a little bit. I hinted at the technology and tools speeding up that data enrichment; it's really becoming a hybrid of what humans can do and what computers and machines can do to, again, bring value to that data and do it quickly. So I think that becomes extremely important.

And, again, I hinted at the capabilities of CloudFactory, and we're really on a mission to create meaningful work for one million talented people in developing nations, to give them the best work and opportunity available.

So I will let Michael jump on and pass over any questions.

OK, yeah, Alex, Kirk, thanks. That was really a great discussion, really insightful about increasing the value and usability of AI training data, and you talked especially about the importance of context. That was something that, certainly, I hadn't thought about.

I hope that our audience also found that valuable, given some of the recent changes in the AV market you talked about.

I'm sure our audience is going to be interested in these approaches, and in some of the strategies you talked about for making data management much more efficient, so they can get those projects to work faster.

OK, so as Alex said, we're going to open things up for question and answer.

As I said earlier, if you haven't already, go ahead and take a look at the questions window on your screen.

You can enter your questions there for Alex and Kirk. I'll also remind you that we have a document in the handouts that you can download.

So let's see.

The first question we've got, OK, this will be for Alex: what kinds of data prep and annotation can CloudFactory do, and what are the limitations?

Yeah. Good question.

So, for the prep itself, there's data management, with sorting, filtering, and search, and integrations to Azure, Amazon Web Services, and so on. The types of annotations are actually interesting.

It's funny: while we're talking about autonomous vehicle data, I'm noticing a lot of the AV teams are taking on additional work. So typically, my answer would be: bounding boxes, semantic segmentation, 3D cuboids, video tracking, things like that. But more and more types of annotation and data enrichment are coming into play, because a lot of these teams are being asked to take on additional tasks.

So while we've typically worked with teams working on L4, L5 automation and AI systems, they are being tasked with: hey, we need to see if we can use computer vision for supply chain. Is there damage on packages? Are packages where they should be? Being able to read QR codes and things like that.

So the list of annotation types really goes further. We hear a lot of use cases for the inside of the vehicle as well, like looking for drowsiness.

Some of the other things that, I'll say, flow over: we've done things like pose detection, where we're able to align human bodies to understand what positions they're in, plus video tracking and gaze tracking.

Sensor fusion is always a popular one. OCR and video transcription are interesting ones as well, being able to provide context for what's actually happening in a scene.

Also, being able to really complete the micro-task of giving an understanding of what's happening.

OK, yeah, I hadn't really thought about that as far as the opportunities beyond the actual driving itself, things that are going on within the vehicle and outside the vehicle.

Let's see, another question: can I send over sample data for testing?

Absolutely, yeah. What I'll do is actually go to the thank-you screen, where we can put up our information. Feel free to reach out to myself and Kirk, and we're happy to start that discussion: run an analysis, dig in. A lot of times, it's not just, hey, can you send me over the data? We want to understand what the goal is, whether it's providing context as to what's happening in a scene, whether it's enriching the data, what the file format is.

We'll return that analysis, and it's definitely a great start to the conversation, being able to dig in, because, as we started off saying, really being able to see that forest is key. Kirk, any thoughts there?

Yeah, for us it's a self-service platform, so if anybody wants to get started and just try out some data cataloging features, I'd be happy to put you in touch and give you the link to try it out. It's a really easy-to-use platform that I think works really nicely with all the other tooling that's out there today.

And I'd love to talk to anybody that has interest in there.

OK, thanks, guys. There's another question here: how does CloudFactory handle data security and privacy?

Good question. I do see a lot of news articles coming up about the privacy and security around data. Nobody wants to be in a news article about data that they've collected. And it's actually kind of funny.

You know, Kirk and I had a conversation around security and being able to keep data as private as possible.

Speaking for CloudFactory, we do have SOC 2 data security accreditation. It's something that we take very seriously. Our IT approach follows ISO 27001, and we're certified to protect data and project confidentiality.

We know that everything, not just the data but anything that happens within somebody's company, they want to keep as tight-lipped as possible.

But that hints at one other point. And Kirk, going back to a previous conversation we had around the privacy of the data, I think that was a really interesting one, if you want to touch on that.

Yeah, in addition to the security, which is so important, we even do things like physically partitioning one customer's data from another customer's data, so it's something we think about as well. But the privacy issue of identifying PII, anything identifying faces, and the ability to redact information, is really somewhat overlooked.

In some areas it's sort of an afterthought. Say you're capturing video from a webcam, a surveillance camera, a GoPro, or a drone; think of that flow we were talking about, from acquisition to data cleansing.

Part of that cleansing may be redaction, because you want to make sure that you're actually pulling out that private information as you go through that flow. It's a really key area that sometimes becomes an afterthought. By moving all that upstream and asking, what are my constraints, how do I want to treat security, how do I want to treat PII, and just letting the platform do it for you, can be really important.
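
As a sketch of redaction inside that cleansing flow, here is a minimal face-blurring pass using OpenCV's bundled Haar cascade; a production pipeline would use a stronger detector (and also cover license plates, screens, and so on), but the flow is the same.

```python
# Minimal redaction sketch: blur any detected faces in a frame before it
# moves downstream. Uses OpenCV's bundled Haar cascade for illustration.
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def redact_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in _cascade.detectMultiScale(gray, 1.1, 5):
        face = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(face, (51, 51), 0)
    return frame
```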

Absolutely, absolutely.

OK, I'm looking here at the Q I don't see any more questions.

So, again, Alex, Kirk, I really thank you for participating in this webinar and being panelists here. And I'd also like to thank our audience for participating.

Thank you. Have a great day.

Thank you so much.

Excellent, thank you.

 

About the Author

Alex Morales,
Industry Executive,
CloudFactory.

 

Alex Morales is an Industry Executive at CloudFactory. In his role, he helps innovators in the autonomous vehicle industry scale their AI initiatives. Alex has spent more than 15 years embracing big data analytics and autonomous systems, solving a variety of industry challenges by leveraging computer vision to mitigate risk, improve data quality, and accelerate innovation. Originally from Buffalo, NY, Alex holds a degree in Management Information Systems from the State University of New York at Buffalo. Outside of work, Alex is active in the fitness community, enjoys competitive motorsports, and loves to travel.

About the Author

Kirk Marple,
Founder and CEO,
Unstruk Data.

 

Kirk Marple, Founder and CEO of Unstruk Data, is an entrepreneur, operator, and technologist with over 25 years of experience. He started his career in media management and entertainment at Microsoft, and while at General Motors, Kirk built a prototype of a “big data” ingestion and processing platform for video, LIDAR, and vehicle telemetry, providing guidance for development teams on streaming for autonomous vehicles. He shares with us his entrepreneurial approach to collaborating with customers on product development and solutions for solving the biggest problems.