Hugging Face Learnings

Hugging Face is a French-American company with a focus on building tools and resources for developers and researchers working in the field of artificial intelligence, particularly natural language processing (NLP).

Image-to-text models

Pix2Struct
1. Pix2Struct is a pretrained image-to-text model developed by Google AI. It is designed for visually situated language understanding tasks.

2. Purpose:

Understand and interpret the structure of web pages based on screenshots.
Bridge the gap between visual information and textual understanding.

3. Functionality:

During pretraining, the model takes a screenshot of a web page in which parts of the image are masked (hidden).
It learns to predict a simplified HTML parse of the page shown in the screenshot.
This translates visual elements such as text, buttons, and images into a textual representation (HTML).
After pretraining, the model can be fine-tuned for downstream tasks such as image captioning.
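To make the masked-screenshot input more concrete, here is a minimal illustrative sketch in Python with NumPy. It is not Pix2Struct's actual preprocessing: it simply treats a screenshot as an image array and hides rectangular regions by filling them with a constant value. The helper name `mask_regions` and the `(top, left, height, width)` box format are hypothetical choices made for this example.

```python
import numpy as np


def mask_regions(screenshot: np.ndarray, boxes, fill_value: int = 0) -> np.ndarray:
    """Return a copy of `screenshot` with each (top, left, height, width) box hidden.

    Hypothetical helper for illustration only; it does not reproduce
    Pix2Struct's real pretraining pipeline.
    """
    masked = screenshot.copy()  # leave the original image untouched
    img_h, img_w = masked.shape[:2]
    for top, left, height, width in boxes:
        # Clip each box to the image bounds so partly out-of-range boxes are safe.
        bottom = min(top + height, img_h)
        right = min(left + width, img_w)
        # Works for both grayscale (H, W) and color (H, W, C) arrays.
        masked[max(top, 0):bottom, max(left, 0):right] = fill_value
    return masked


if __name__ == "__main__":
    # A tiny fake 4x6 RGB "screenshot" made of white pixels.
    page = np.full((4, 6, 3), 255, dtype=np.uint8)
    hidden = mask_regions(page, [(1, 2, 2, 3)])
    print(hidden[:, :, 0])  # first color channel, showing the hidden block
```

Returning a copy keeps the original screenshot available, which is convenient when comparing the masked and unmasked versions side by side.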

Input:

Image URL: "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRIzR7FkRTxtF7KKmAjylwIVcNQ3bjhJ0uN_pXXdL5Qlg&s"

Output: A can of Coca-Cola sits next to a glass of beer.

Input:

Image URL: "https://www.shutterstock.com/shutterstock/photos/1514415767/display_1500/stock-photo-elm-street-and-stop-sign-in-residential-community-at-road-intersection-on-a-clear-day-with-blue-1514415767.jpg"

Output: A stop sign is on the side of a street.

Input:

Image URL: "https://www.ilankelman.org/themes1/copenhagen.jpg"

Output: A phone that says 2139 on the front of it.
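The post does not say which checkpoint or code produced the captions above. One plausible way to get this kind of output is the Pix2Struct support in the Hugging Face `transformers` library. The sketch below assumes the `google/pix2struct-textcaps-base` checkpoint (Pix2Struct fine-tuned for image captioning) and requires `transformers`, `torch`, `Pillow`, and `requests`. The helper names `load_captioner` and `caption_image` are hypothetical, and the captions it produces may not match the ones shown above.

```python
from io import BytesIO

# Assumed checkpoint: Pix2Struct fine-tuned on TextCaps for image captioning.
CHECKPOINT = "google/pix2struct-textcaps-base"


def load_captioner(checkpoint: str = CHECKPOINT):
    """Download (on first use, then cached) and return the processor and model."""
    # Heavy third-party imports are done lazily so this module can be
    # imported even where they are not installed.
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(checkpoint)
    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def caption_image(url: str, processor, model, max_new_tokens: int = 50) -> str:
    """Fetch an image from `url` and return the model's generated caption."""
    import requests
    from PIL import Image

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")

    # The processor converts the image into flattened patches for the encoder.
    inputs = processor(images=image, return_tensors="pt")
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Typical use is to call `processor, model = load_captioner()` once and then `caption_image(url, processor, model)` for each image URL, so the model is loaded only a single time rather than once per image.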

Deep dive into understanding GenAI model working
