So the next thing we need to do, of course, is load the model. Because we imported COCO-SSD with our script tag on the previous slides, this cocoSsd variable will now be available to us, and we simply call cocoSsd.load. And because this is an asynchronous operation, we use the then keyword: when it's ready, it will call a function of our choosing and pass us the loaded model. This loaded model we can now assign to our global variable, model, so we know that the model is loaded and can use it in our other code, and we can then remove the invisible class in the CSS so that the demos render correctly. You'll see in the demo later on that it goes from a greyed-out state into a nice, colourful state, so you know when to click on things. Next, we're going to get all the images that we want to be clickable. I gave them a class of classifyOnClick in the HTML, and we're going to grab all the elements that have that class, so we get an array of elements here called imageContainers.
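As a minimal sketch of that load step, assuming a global variable called model, a demos container that carries an invisible CSS class, and images marked with the classifyOnClick class (the element ID and class names here are illustrative):

```js
// cocoSsd is the global object exposed by the @tensorflow-models/coco-ssd script tag.
let model = undefined;

cocoSsd.load().then(function (loadedModel) {
  model = loadedModel;
  // The model is ready, so un-grey the demo area to show it is now usable.
  document.getElementById('demos').classList.remove('invisible');
});

// Grab every element that should trigger classification when clicked.
const imageContainers = document.getElementsByClassName('classifyOnClick');
```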
And we're going to iterate over all of these image containers; essentially we get the child image node and add an event listener for a click event, associating a function called handleClick to run when that element is clicked. So, going into the handleClick function, you can see all this does is, when it executes, an event is passed in. If the model has not already loaded (because it does take a couple of seconds), then we return straight away, just in case someone tried to click on something before we are ready. Otherwise, we go ahead and call the model's detect function. This, of course, takes an image-like object as a parameter; in this case, we're passing the event target, which is the image that was clicked. And again, this is an asynchronous operation, so we wait for that to finish and then it will pass the predictions to a function of our choosing. In this case, I've named it handlePredictions.
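Roughly, that click wiring and guard might look like this (handleClick and handlePredictions are the illustrative names used here):

```js
// Attach a click listener to the child image of each container.
for (let i = 0; i < imageContainers.length; i++) {
  imageContainers[i].children[0].addEventListener('click', handleClick);
}

function handleClick(event) {
  // The model takes a few seconds to load, so bail out if someone clicks too early.
  if (!model) {
    console.log('Wait for the model to finish loading before clicking!');
    return;
  }
  // detect() accepts an image-like element; event.target is the image that was clicked.
  model.detect(event.target).then(function (predictions) {
    handlePredictions(predictions, event.target);
  });
}
```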
So let's dive into the handlePredictions function. Essentially you can see there's just a JSON-style object passed back, which you can log if you want to see what it looks like, but it's basically an array of objects for the things we think we found in the image, along with their confidence values and so on and so forth. We can then iterate over those objects using a for loop and start creating some new HTML elements so we can render the details to the screen. All I'm doing here is creating a paragraph tag and setting its text to contain the class name and the rounded score for the confidence, and I then add some style to this paragraph so it appears at the bottom left of the image. I then add a div element so I can actually have a bounding box; it just has a dashed border so the bounding box shows nicely, and I put it at the position at which the classifier recognized the object by setting the width, height, top and left. With that, we've got our bounding box. So all I need to do now is add these two elements I've created in memory to the actual DOM, the web page itself, and we're basically good to go. Finally, we have some CSS; this is up to you, how you want to style things, of course, but here I've got some nice transitions and some borders attached and so on and so forth to make it look pretty.
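A sketch of handlePredictions along those lines, assuming a highlighter CSS class that draws the dashed border and relatively positioned containers (the names are illustrative):

```js
function handlePredictions(predictions, imgElement) {
  for (let n = 0; n < predictions.length; n++) {
    // Each prediction has a class name, a confidence score, and a bbox of [x, y, width, height].
    const p = document.createElement('p');
    p.innerText = predictions[n].class + ' - with ' +
        Math.round(predictions[n].score * 100) + '% confidence.';
    // Place the label near the detected object; exact placement is up to your CSS.
    p.style.cssText = 'left: ' + predictions[n].bbox[0] + 'px; top: ' + predictions[n].bbox[1] + 'px;';

    // A div with a dashed border acts as the bounding box, sized and placed from the bbox values.
    const highlighter = document.createElement('div');
    highlighter.setAttribute('class', 'highlighter');
    highlighter.style.cssText = 'left: ' + predictions[n].bbox[0] + 'px; top: ' + predictions[n].bbox[1] +
        'px; width: ' + predictions[n].bbox[2] + 'px; height: ' + predictions[n].bbox[3] + 'px;';

    // Add both in-memory elements to the actual DOM so they render over the image.
    imgElement.parentNode.appendChild(highlighter);
    imgElement.parentNode.appendChild(p);
  }
}
```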
And if we go over to the demo, you can see what this is going to look like. So I'm going to stop sharing this screen and go over to my other window; one moment while I swap screens.
And hopefully I can now find my way. Good stuff.
So now, if we switch to my new window, I think I can still see myself.
So I'm now going to stop presenting this; one moment while I swap screens again, and then we go back to the slides, if you can present my slides again for me.
So next up we have Face Mesh, which is just three megabytes in size and essentially allows you to recognize 468 facial landmarks. It's just another premade model like the one you saw before, but this one is aimed at identifying parts of the face. And it can be used for many interesting things, including augmented reality, such as the image on the right-hand side. This is actually by ModiFace, which is part of the L'Oreal group, and the lady on the right is actually not wearing any lipstick: they're using Face Mesh to understand where her lips are and then using WebGL shaders to overlay pretty graphics on her face, so it looks like the lipstick is really there and very realistic. They can change the shade of the lipstick and so on and so forth, which is really great in today's world where people are stuck at home; they can't touch the product and maybe don't want to try products other people have touched. So in that case this is very useful and can help you still shop in these times.
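For reference, a hedged sketch of the basic Face Mesh flow, assuming the @tensorflow-models/facemesh script is included (the video element and function name are illustrative):

```js
async function trackFace(videoElement) {
  // Load the Face Mesh model (only a few megabytes) once.
  const model = await facemesh.load();

  // Estimate faces in the current video frame.
  const faces = await model.estimateFaces(videoElement);
  if (faces.length > 0) {
    // scaledMesh holds the 468 [x, y, z] keypoints; an AR effect like virtual
    // lipstick would map graphics onto the subset of points around the lips.
    console.log('Keypoints found:', faces[0].scaledMesh.length);
  }
}
```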
And I've got a demo for this as well to show you running live, because it is really cool. So if I just go ahead and swap my screen again (there's going to be a bit of screen swapping going on today), one moment please... So now you can see I'm actually doing this live in the Web browser. Moderators, if you can present my screen again. One moment while I swap my screen; back to the slides here.
Next up, we have body segmentation. This allows you to distinguish 24 body areas across multiple bodies in real time; this is our BodyPix model. You can see here on the right-hand side how this works in action, with multiple bodies being detected at the same time, and all the different colors represent different parts of the body. Even better, we can get an estimate of the pose as well: that's the blue lines you see in the image on the right-hand side. Now, with a bit of creativity, we can actually emulate many of the superpowers we see in the movies, and I'd like to share a few examples I've created to illustrate this. The first one is invisibility. This is running live in the Web browser, and I made it in just one day; I'm able to remove myself in real time from the webcam feed in the browser. But notice, as I get on the bed, how the bed still deforms. So it's not some cheap trick where I'm just replacing the background with a static image: the background is being updated in real time, and I'm removing the body pixels and calculating what the background is over time. So this enables some cool effects.
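A minimal sketch of multi-person part segmentation with BodyPix, assuming the @tensorflow-models/body-pix script is included (function and element names are illustrative):

```js
async function segmentPeople(videoElement) {
  // Load the BodyPix model once.
  const net = await bodyPix.load();

  // For each person found, this returns which of the 24 body parts (or background)
  // every pixel belongs to, along with estimated pose keypoints.
  const people = await net.segmentMultiPersonParts(videoElement);
  console.log('People detected:', people.length);
}
```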
But what about lasers? We can once again turn to WebGL. We have people from our community making things like this: this person uses WebGL shaders combined with Face Mesh to shoot lasers from his mouth and eyes, just like you can in Iron Man or something like that. And I thought, well, this is pretty cool, but let's go one step further. Some of you may have seen recently on social media that I created this kind of teleportation demo that allows me to segment myself in real time. I then transmit myself using WebRTC, web real-time communication, over the Internet to some remote location, and then, using WebXR, which is web mixed reality, I can place myself in a remote room. The person watching me in that remote room can walk up to me, hear me from the right angle, move around me, and so on and so forth. So now, instead of having a video conference where you're stuck in a rectangular box, you can actually be physically present, almost in 3D, which can be a much more fitting and personal feeling of meeting someone, especially in times when everyone is stuck at home and it's very hard to be out in large groups and so on and so forth. And of course, other creations can be made beyond this as well. Here's another one I created, for clothing size estimation.
I don't know about you, but I'm terrible at knowing what clothing size I am out in the wild. I always forget my sizes, and of course the body changes over time as well. So here, in about 15 seconds, I can use BodyPix to figure out what my size is, getting my measurements for chest and height. Using my height, I can work out my chest, inner leg and waist measurements, which allows the web page to automatically select whether I'm a small, medium or large, that kind of stuff. So now I can buy stuff without having to return it all the time, and save time and money because of that. And of course, we've seen the community do some great things too. This guy from Paris, France, from our community, has managed to combine it with WebXR and WebGL, so he can scan any magazine and bring the person from the magazine into his living room. So maybe you're interested in fashion or something like this, and you can now go and inspect that clothing in more detail in a way that is more meaningful to you. And note that he's using his mobile phone here; it's actually a two-and-a-half-year-old Android device, so it doesn't require the latest hardware, and it's all running in the Web browser on that device, which is pretty, pretty cool.
There we go.
So when you go to teachablemachine.withgoogle.com, you are presented with a page something like this. Teachable Machine allows you to train a model to recognize images, audio or specific poses, and these three are just the starting point; I'm sure more will be coming soon, as it says here. But for today we're going to go with images. So you click on Image, and by default you have two classes, but you can add more classes if you like at the bottom left here. Now, I'm going to go ahead and name these something more meaningful. The first thing I'm going to recognize is myself, and the second thing I'm going to recognize is a deck of playing cards I've got in my room right now. So all we need to do is click on webcam and allow access to our webcam, and we get a live preview of what's coming from the camera here.
And we can use this to take samples of the objects we're interested in. So I'm just going to sit here, move my head around and get a few samples; one moment. We get maybe a few more images of my face in various positions, and of course, if you were doing this more properly, you'd do this with more variety and more training data, but for the purposes of today's demo, that's all we need. Then I do exactly the same thing, holding up this deck of cards instead; I'm going to get roughly the same number of images, about thirty eight... forty one, that's close enough. And then I simply click train model. Now, what's going to happen here is that TensorFlow.js is going to retrain the top layers of this Teachable Machine model, which is actually a classifier built upon MobileNet, and we can then repurpose the understanding that MobileNet has already learned to classify these new objects here. You can see that in under 15 seconds it's already trained; we're done. On the right-hand side you can see a live preview of it in action. As you can see here, it predicts the output is Jason, which is correct, as shown in the live preview, and if I hold up my deck of cards, you can see it switches to cards straight away: Jason, cards, Jason, cards. And look how fast that was: in under three minutes we've managed to make something you could use for a prototype to demonstrate an idea, and you're good to go. If this is good enough for your needs, you can simply click on export model here; you can download the various files containing the model weights and such, host those on your website or a CDN, and then use them on any website you like, using code like what I showed you before. It's really not too hard to do.
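As a hedged sketch of how those exported files might be used on your own page, assuming the @teachablemachine/image helper script is included (global tmImage) and an illustrative hosting URL:

```js
// Hypothetical location of the exported model.json / metadata.json files.
const MODEL_URL = 'https://example.com/my-model/';

async function classify(imageOrVideoElement) {
  // Load the exported Teachable Machine image model.
  const model = await tmImage.load(MODEL_URL + 'model.json', MODEL_URL + 'metadata.json');

  // One prediction per class, e.g. { className: 'Jason', probability: 0.98 }.
  const predictions = await model.predict(imageOrVideoElement);
  console.log(predictions);
}
```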
And of course, if you're trying to make more robust models, you would need more training data to avoid any biases and so on and so forth, and for that, as already mentioned, Google Cloud AutoML is also there for those kinds of situations.
So I'll stop presenting this and go back to the slides screen.
So the next thing is Cloud AutoML, and of course the previous presenter has already gone into this in much detail, so I'm going to give a very high-level overview here. But essentially, as already mentioned, Cloud AutoML allows you to train custom models in the cloud; however, it can export to TensorFlow.js, which is pretty nice too, and that's why I'd just like to touch on this very quickly. You can see here how someone is trying to recognize different flowers. They just uploaded all their photos of flowers to cloud storage, and then you point Cloud AutoML at that, as you saw before. It then gives you various options, if you want higher accuracy for your predictions, and you can set various hyperparameters if you want to, and then you continue and wait for it to train. As you just saw, you then have an option to download at the end: you download that zip bundle and unzip it onto your server, or store it somewhere else even, and basically you can then use those files in your own website. And you might be wondering, well, how hard is it to use a model trained with Cloud AutoML if exporting is that easy? It's just one slide, even easier than the code I showed you before, and let me just quickly walk you through it.
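A hedged sketch of that one slide, assuming the @tensorflow/tfjs and @tensorflow/tfjs-automl scripts are included and the exported bundle is hosted at an illustrative URL:

```js
async function classifyFlower(imgElement) {
  // Load the image classification model exported from Cloud AutoML Vision.
  const model = await tf.automl.loadImageClassification('https://example.com/automl/model.json');

  // Each result is roughly of the form { label: 'daisy', prob: 0.97 }.
  const predictions = await model.classify(imgElement);
  console.log(predictions);
}
```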
Maybe you want to reach more people or get more people to try out your work, or something like this; as a researcher, that could be useful to you, and with the converter you can basically do that. There are some caveats: we don't have all the op support that TensorFlow core has right now. Obviously, we're playing catch-up; we're a team that's only two or three years old, whereas TensorFlow has been around for much longer. But we are open source and we welcome contributions to add those ops, if you want to help on that side as well. A lot of models do come over without any issues, and it's only if you're using some more exotic operators that you'll find there's an issue. And of course, it will tell you which operation is not supported, and you can even choose to implement it yourself or use a different op to get around that. Cool. So let's talk about performance then. This is for MobileNet v2, and you can see that running on the graphics card in Python it takes 7.98 milliseconds, and running on the graphics card in Node.js it's 8.81. So it's within the margin of error, pretty much, depending on which way the wind was blowing on the server on the day we recorded this result; it's pretty much the same. But the real beauty comes when you're doing a lot of pre- and post-processing.
That means any data coming from the sensors stays on the client's machine; at no point is the imagery or sound or whatever data you're grabbing sent to a third-party server. In today's world, privacy is top of mind, and that's really important, especially if you're making some kind of healthcare application, as we saw before, or if you have legal requirements, maybe GDPR rules in Europe, this kind of stuff. There are many reasons you might want to execute things on the client side. That leads to point number two: lower latency, because there's no round-trip time to the server and back again. Especially on mobile devices, the latency involved could be quite high; you could be looking at one hundred milliseconds or more just to talk to the server and get a result. That means you can get higher frames per second if you classify this stuff in real time directly at the source, the sensors, and the Web browser has direct access to all of those sensors. Third point: lower cost. If you're running a server, then of course you need to keep that server running all the time to do the inference, which means hiring CPUs and GPUs and lots of RAM running 24/7. If you're having tens of thousands of users per month, running client-side can lead to some serious cost savings, maybe tens of thousands of dollars per month if you have a popular site with lots of interactivity.
Compare that with a setup where just a few people from your team are the only ones using it. And of course, reach and scale: as we mentioned, on the GPU side we can support around eighty-four percent of devices via WebGL, and that means we can run on a MacBook Pro with an AMD graphics card, whereas I believe TensorFlow on the server side can only support NVIDIA, to the best of my knowledge, using the CUDA driver. So we don't care what hardware you have; as long as it can run WebGL, we're good to go. Now, the flip side is that on the server side there are also some benefits, and this applies both to Node.js and Python, whichever one you're using. Here you can see that you can potentially use your saved models without conversion, which is great if you're trying to integrate with a Web team; as I mentioned before, a lot of web developers out there are probably more familiar with JavaScript, and if you're working in Python on the machine learning side, this is a really nice way to collaborate with other teams. Second point: you can run larger models than on the client side, where you'll often be limited by the graphics memory of the client's device as to what size model you can load and execute.
Awesome, thank you so much. That was a wonderful presentation, lots of amazing demos; I really loved that. The conclusion is that we are democratizing machine learning as part of this revolution, the same way JavaScript came in and revolutionized the whole industry, and we are seeing the same thing happen again.
Yeah, that's amazing. One of the questions that we got on chat was why you would want to choose to use TensorFlow.js, and I think you've covered some of those things, but if you want to go into more detail...
Yeah, that is a great way to do it. You can not only build your new models with it, you can also use existing models that are built by other team members. Yeah, awesome. Thank you so much. On to the next presenter. Thank you.