RasPi - Room for Optimizations.... From 30,000 ms down to ~260 ms with web display

Canny edge filter - over video streamer

Another experiment: this is 640x480 color. Latency is still bad (2 seconds), and it's about 2 fps on my little Pi..
All previous experiments do a PyramidDown to 320x240 plus a Gray filter, which comes out at almost 4 fps .. all experiments are with VideoStreamer.  If you look at the logging you can see it has already completely sent frame 1167 and is working on converting 1168 to a SerializableImage. So if 1166 & 1167 have been sent (and I flushed the stream .. always flush before leaving ;) the remaining delay is either Java, OS, network, or render time in the browser .. and if the display is only a frame or 1.5 frames behind, that is going to be a second of latency at least...


 

2014.1.2  Got a few more optimizations - the bulk of it is now checked into the bleeding edge. Whenever there is a conversion from OpenCV to Java using the method IplImage.getBufferedImage there is potentially a 4 second delay on my raspi.  I have removed (almost) all of these calls.  Now OpenCVData contains only OpenCV data structures.  OpenCVData is the main "product" of the OpenCV service. A cycle of OpenCV's video processor will create a new OpenCVData structure and iterate through all the current filters in the pipeline, then it publishes this data so that other services can subscribe to it.

OpenCVData has become smarter in that it knows how to convert its data to other formats (BufferedImage, ByteBuffer, byte array).  A new feature of OpenCVData is that when it's asked for a converted data type (which takes time), it stores the result internally, caching it for subsequent requests.  If the same data type is requested again, it returns the already processed image.
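The caching idea can be sketched in a few lines. This is only an illustration in Python, not the MRL code: `FrameData`, `get` and `conversions_run` are made-up names, and `bytearray` stands in for an expensive conversion such as building a BufferedImage.

```python
from typing import Callable, Dict


class FrameData:
    """Hypothetical stand-in for OpenCVData: raw data plus cached conversions."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self._cache: Dict[str, object] = {}
        self.conversions_run = 0  # how many real (expensive) conversions happened

    def get(self, fmt: str, convert: Callable[[bytes], object]) -> object:
        # Convert only on the first request for this format, then reuse the result.
        if fmt not in self._cache:
            self._cache[fmt] = convert(self.raw)
            self.conversions_run += 1
        return self._cache[fmt]


frame = FrameData(b"\x01\x02\x03")
first = frame.get("bytearray", bytearray)   # pays the conversion cost
second = frame.get("bytearray", bytearray)  # served from the cache
```

The second request returns the very same object as the first, so a subscriber that asks for an already converted format pays nothing extra.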

Now I'm searching for latency.  This is a snapshot streaming from the raspi to the laptop. The logging on the left is an ssh session on the raspi.  You can see from the logging that frame #161 was sent and is currently being displayed in the browser.. the next frame, #162, is being written to the socket but has not finished..  I believe this is very quick..  Unfortunately, I did not print the timestamps in the logging DOH ! - there is a timestamp on the video frame now, so the next time I experiment with the raspi I can correlate the times & the frame numbers.  So far, it appears that the network is not the issue (or maybe not the largest issue).  The MRL messages appear to be working very fast too..  more info is needed !


2014.1.1  Have a display now !

 

There is good & bad news.  Using the same python script - the OpenCV image processing and converting to a displayable image takes between 250 & 450 ms.  Not "bad", so between 4 & 2 fps - very much improved from 30 seconds per frame :)

There are more things still to "tweak" to try to get more speed out of it.  Possibly more type conversion enhancements. Also not printing performance data helps performance :)

The "bad" part is the latency.  Moving the image data from OpenCV to the VideoStreamer moves the data between 2 threads too.  So there is context switching overhead. But I think the biggest problem is there is a queue between the two - and the queue gets filled up because the OpenCV thread is faster than the video streamer thread, so there's a large "lag". Next I'll force the max queue size to be only 1 - this will make it more "jittery" but should reduce the lag.
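Here is a rough sketch of the "max queue size of 1" idea, in Python rather than the Java used by MRL. `offer_latest` is a made-up helper, and the sketch assumes a single producer thread (the OpenCV side) feeding a single consumer (the streamer).

```python
import queue


def offer_latest(q: queue.Queue, frame) -> None:
    """Put frame on a size-1 queue, discarding any stale frame still waiting.

    Assumes exactly one producer thread calls this; with several producers
    the put below could still find the queue full.
    """
    try:
        q.get_nowait()  # drop the stale frame, if the consumer has not taken it
    except queue.Empty:
        pass
    q.put_nowait(frame)


frames = queue.Queue(maxsize=1)
for frame_number in range(1, 6):  # producer runs ahead of the consumer
    offer_latest(frames, frame_number)

latest = frames.get_nowait()  # the consumer only ever sees the newest frame
```

Frames 1 to 4 are thrown away here, which is the "jittery" part, but the consumer is never more than one frame behind the producer.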

 


2013.12.29

Here's an example from DJ's Odroid - this is WITH the crappy conversion which takes the Raspi ~3000 ms to do - the Odroid does it in 10 ms - so ~14 fps even with the bad conversion....

looks like the USB grab is taking the most time... wonder what kind of camera he has...
(camera is a logitech C270)

2013.12.26

WoooHoo !

Found the last copy for display, made it configurable.. you can choose to enable the display, or put it in fast non-display mode ... from 30 seconds per frame down to 140 ms :)

Down to 3 seconds.....

Got rid of 2 copies so far... One was in the filter processing, the other was in the frame grabber itself.
Now I think the IPCameraFrameGrabber is the fastest Java frame grabber at the moment.  

Details on the IPCameraFrameGrabber :
Previously, this framegrabber would pull the data off the network and convert it to a BufferedImage, then convert that to an OpenCV IplImage.  I wanted to go directly from the network to IplImage because the conversion on a raspi takes about 13 seconds !!! 

A problem existed in that the IplImage needs to know the dimensions, depth and number of channels of the data from the network.  It would be a big pain to work these out by decoding the network data - so I first create a "template" BufferedImage - which costs 13 seconds on the very first frame, but as long as the format from the camera doesn't change - the template's values are cached and used to decode subsequent frames directly !  - From 13,000 ms down to 25 ! :D
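The template trick boils down to "probe the format once, reuse it for every later frame". A minimal Python sketch of that pattern follows; `TemplateCache`, `format_for` and `fake_probe` are hypothetical names, and the probe here just returns fixed values instead of really decoding an image.

```python
from typing import Callable, Optional, Tuple

# (width, height, depth in bits, channels)
Format = Tuple[int, int, int, int]


class TemplateCache:
    """Runs the slow format probe on the first frame only, then reuses it."""

    def __init__(self, probe: Callable[[bytes], Format]):
        self._probe = probe
        self._template: Optional[Format] = None
        self.probe_calls = 0

    def format_for(self, data: bytes) -> Format:
        if self._template is None:  # first frame: pay the full cost once
            self._template = self._probe(data)
            self.probe_calls += 1
        return self._template

    def reset(self) -> None:
        """Call this if the camera format changes, to force a new probe."""
        self._template = None


def fake_probe(data: bytes) -> Format:
    # Stand-in for the expensive BufferedImage decode; the values are assumed.
    return (640, 480, 8, 3)


cache = TemplateCache(fake_probe)
formats = [cache.format_for(b"frame-bytes") for _ in range(3)]
```

Three frames go through but the probe runs only once. The catch is the one mentioned above: the cached values are only valid while the camera keeps sending the same format.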

now we are from ~30 seconds down to ~7 !  ... Got more copies to get rid of... a couple might be critical to "display" images to the GUI - so I think the best strategy is to have the display configurable - so on a fast computer you can still "see" the processing - but on a raspi you have the option of shutting down all display info for max performance....   more to do !!

First red square shows time reduction from 12900 ms down to 28 ms

Second red square shows display related conversions which need more optimizations


2013.12.26

Very Poor Performance 30 seconds per frame

Scenario #1

USB WebCam --> USB Camera Driver -> V4L Driver --> OpenCV --> JNI --> JavaCV --> MRL

1 frame per 30 seconds :(   

~35K ms :(

Whoa !   30 Seconds :((((

Uh.. a whole lot of room for improvement :D

Alright - now with our handy dandy performance logging (which is slightly detrimental to performance but necessary :)  we can begin to isolate where the issues are.  In this case the "grab" itself from the USB camera with the OpenCV camera frame grabber takes 12 seconds :(
Another big hit is the 13 seconds spent between the pre-filter & the preProcess filter for PyramidDown...

My first guess is it's related to "copying" - to verify I would need to find the copy and put performance monitors directly before and after ..
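Putting a monitor "directly before and after" a suspect step could look something like this. It is a generic Python sketch, not the MRL performance logging; `timed` and `timings` are made-up names and the buffer copy is just a stand-in for whatever copy is under suspicion.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed milliseconds


@contextmanager
def timed(label):
    """Record how long the wrapped block takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = (time.perf_counter() - start) * 1000.0


source = bytearray(640 * 480 * 3)  # one 640x480 frame, 3 bytes per pixel
with timed("copy"):
    duplicate = bytes(source)  # the suspected copy goes inside the block
```

After the run, `timings["copy"]` holds the cost of just that step, so it can be compared against the total time per frame.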

Scenario #2

USB WebCam --> RasPi --> MJPeg Streamer --> LAN --> Laptop --> MRL --> GUI

(5 fps or ~200 ms per frame)

The "grab" from the raspi takes the most amount of time - with the pre-filter & display copy taking a total of 10 ms - mjpeg streamer is a great program - but 5 fps or ~200 ms 

Let's think about what is happening in this scenario.  The camera is capturing an image from a CCD - the Wheezy webcam driver is pulling the data out of kernel space and sending it to mjpeg streamer (via V4L drivers) - an http multi-part boundary is added and it's sent over a network cable to a laptop - which copies it from its network buffers and drivers into MRL - it then becomes an MRL message and is sent to a display.  All this is happening in ~200 ms !

If control messages were sent back from the laptop to the raspi - this would be a good example of distributed processing working faster than local processing.  In this particular case it's faster to ship the data off and have it processed on the laptop, versus having the raspi do it all.


DJUltis's picture

Hey, but it crunched through successfully... take the accolade!

GroG's picture

More Power Scotty !

It would be nice if Orbous didn't effectively close his eyes for 7 seconds.. blinking can be a bit of a distraction 

DJUltis's picture

I'm gonna put in a good word to the Overlords and see if they'll raise your BEPSL quota, awesome work sir!