Hugging Face learnings
Hugging Face is a French-American company with a focus on building tools and resources for developers and researchers working in the field of artificial intelligence, particularly natural language processing (NLP).
Image-to-text models
- Pix2Struct
1. Pix2Struct is a pretrained image-to-text model developed by Google AI. It is designed for tasks involving visually situated language understanding.
2. Purpose:
- Understand and interpret the structure of web pages based on screenshots.
- Bridge the gap between visual information and textual understanding.
3. How it works (pretraining):
- Takes as input a screenshot of a webpage in which parts of the image are masked (hidden).
- Predicts the underlying HTML structure corresponding to the visible parts of the screenshot.
- In effect, this translates visual elements such as text, buttons, and images into a textual representation (HTML).
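To make the "masked screenshot in, HTML out" idea concrete, here is a toy sketch in Python. It is not Pix2Struct's actual preprocessing: the grid-of-integers "screenshot", the `mask_region` helper, `MASK_VALUE`, and the sample HTML string are all hypothetical, chosen only to show the shape of one (input, target) training pair.

```python
from typing import List

Grid = List[List[int]]

MASK_VALUE = 0  # hypothetical: masked pixels are set to 0 (black)


def mask_region(screenshot: Grid, top: int, left: int, height: int, width: int) -> Grid:
    """Return a copy of `screenshot` with a rectangle replaced by MASK_VALUE.

    The rectangle is clipped to the grid, and the input grid is not modified.
    """
    masked = [row[:] for row in screenshot]  # copy rows so the original stays intact
    for r in range(top, min(top + height, len(masked))):
        for c in range(left, min(left + width, len(masked[r]))):
            masked[r][c] = MASK_VALUE
    return masked


# A tiny 3x4 "screenshot": each row stands in for a band of the rendered page.
screenshot = [
    [9, 9, 9, 9],  # e.g. a heading
    [5, 5, 5, 5],  # e.g. a paragraph
    [7, 7, 7, 7],  # e.g. a button
]

# Hypothetical training pair: the masked image is the input, the page's HTML is the target text.
masked_input = mask_region(screenshot, top=1, left=0, height=1, width=4)
target_html = "<html><body><h1>Title</h1><p>Text</p><button>Go</button></body></html>"
training_pair = (masked_input, target_html)
```

Real screenshots are RGB images that the model splits into patches, and the masking strategy used in Pix2Struct's pretraining is not reproduced here. The sketch only shows the data flow: a partially hidden image goes in, and text (HTML) comes out.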
Input:
Image URL: "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRIzR7FkRTxtF7KKmAjylwIVcNQ3bjhJ0uN_pXXdL5Qlg&s"
Input:
Image URL: "https://www.shutterstock.com/shutterstock/photos/1514415767/display_1500/stock-photo-elm-street-and-stop-sign-in-residential-community-at-road-intersection-on-a-clear-day-with-blue-1514415767.jpg"
Input:
Image URL: "https://www.ilankelman.org/themes1/copenhagen.jpg"
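The example images above can be captioned with Pix2Struct through the Hugging Face `transformers` library. This is a minimal sketch under assumptions: the post does not say which checkpoint it used, so `google/pix2struct-textcaps-base` (a Pix2Struct checkpoint fine-tuned for image captioning) is assumed, and `load_image`, `load_model`, and `caption_image` are hypothetical helper names. Running it needs `transformers`, `torch`, `Pillow`, and `requests`; these are imported inside the functions so the module itself loads with only the standard library.

```python
import io
from functools import lru_cache

# Assumed checkpoint: Pix2Struct fine-tuned for image captioning (TextCaps).
DEFAULT_CHECKPOINT = "google/pix2struct-textcaps-base"

# The three example inputs from the post.
EXAMPLE_URLS = [
    "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRIzR7FkRTxtF7KKmAjylwIVcNQ3bjhJ0uN_pXXdL5Qlg&s",
    "https://www.shutterstock.com/shutterstock/photos/1514415767/display_1500/stock-photo-elm-street-and-stop-sign-in-residential-community-at-road-intersection-on-a-clear-day-with-blue-1514415767.jpg",
    "https://www.ilankelman.org/themes1/copenhagen.jpg",
]


def load_image(url: str):
    """Download an image over HTTP and return it as an RGB PIL image."""
    import requests
    from PIL import Image

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 403/404 instead of decoding an error page
    return Image.open(io.BytesIO(response.content)).convert("RGB")


@lru_cache(maxsize=None)
def load_model(checkpoint: str = DEFAULT_CHECKPOINT):
    """Load the processor and model for a checkpoint, caching them for reuse."""
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(checkpoint)
    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def caption_image(url: str, checkpoint: str = DEFAULT_CHECKPOINT, max_new_tokens: int = 50) -> str:
    """Generate a text caption for the image at `url`."""
    processor, model = load_model(checkpoint)
    image = load_image(url)
    # The processor converts the image into flattened patches plus an attention mask.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

For example, `caption_image(EXAMPLE_URLS[2])` downloads the checkpoint on first use and returns a caption string for the Copenhagen image; `lru_cache` keeps the loaded model, so captioning the other URLs reuses it. Some image hosts may refuse direct downloads, in which case `raise_for_status()` reports the HTTP error.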