Georgia Tech researchers recently presented their work at leading programming and systems conferences, focusing on static ...
SOLE is highly generalizable and can segment corresponding instances with various language instructions, including but not limited to visual questions, attributes description, and functional ...
Abstract: It is always well believed that pre-trained vision-language foundation models (e.g., CLIP) would substantially facilitate vision-language tasks. Nevertheless, there has been less evidence in ...
Abstract: Segmenting affordance in 3D data is key for bridging perception and action in robots. Existing efforts mostly focus on the visual side and overlook the affordance knowledge from a semantic ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results