Hugging Face learnings


Hugging Face is a French-American company with a focus on building tools and resources for developers and researchers working in the field of artificial intelligence, particularly natural language processing (NLP).

Image-to-text models

  • Pix2Struct

  1. Pix2Struct is a pre-trained image-to-text model developed by Google AI. It is designed for tasks involving visually-situated language understanding.

  2. Purpose:
  • Understand and interpret the structure of web pages based on screenshots.
  • Bridge the gap between visual information and textual understanding.

  3. Functionality (pretraining):
  • Takes a masked screenshot of a web page as input (parts of the image are hidden).
  • Predicts the underlying HTML structure corresponding to the visible parts of the screenshot.
  • In effect, this translates visual elements such as text, buttons, and images into a textual representation (HTML).

  Fine-tuned checkpoints apply the same model to downstream tasks such as image captioning, as in the examples below.

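
To make the idea of a textual (HTML) representation of visual elements more concrete, here is a toy serializer. It is only an illustration of the kind of output described above, not how Pix2Struct works internally: the model learns this mapping from pixels, while the sketch starts from a hand-written description. The `Element` class, the element kinds, and the tag choices are all hypothetical.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    """Hypothetical description of one visible element in a screenshot."""
    kind: str                      # "text", "button", "image", or "container"
    content: str = ""              # visible text, or alt text for an image
    children: list = field(default_factory=list)


def to_html(element):
    """Serialize an element (and its children) into a small HTML string."""
    text = escape(element.content)
    if element.kind == "text":
        return f"<p>{text}</p>"
    if element.kind == "button":
        return f"<button>{text}</button>"
    if element.kind == "image":
        return f'<img alt="{text}">'
    if element.kind == "container":
        inner = "".join(to_html(child) for child in element.children)
        return f"<div>{inner}</div>"
    raise ValueError(f"unknown element kind: {element.kind!r}")


# A made-up page with the three element types mentioned in the text.
page = Element("container", children=[
    Element("text", "Welcome"),
    Element("image", "Company logo"),
    Element("button", "Sign up"),
])
print(to_html(page))
```
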
Input:
Image URL: "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRIzR7FkRTxtF7KKmAjylwIVcNQ3bjhJ0uN_pXXdL5Qlg&s"

Output: A can of Coca-Cola sits next to a glass of beer.

Input:
Image URL: "https://www.shutterstock.com/shutterstock/photos/1514415767/display_1500/stock-photo-elm-street-and-stop-sign-in-residential-community-at-road-intersection-on-a-clear-day-with-blue-1514415767.jpg"

Output: A stop sign is on the side of a street.

Input:
Image URL: "https://www.ilankelman.org/themes1/copenhagen.jpg"

Output: A phone that says 2139 on the front of it.
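
Captions like the ones above could be generated with the Hugging Face `transformers` library. The following is a minimal sketch, not necessarily the setup used for this post. It assumes the `google/pix2struct-textcaps-base` captioning checkpoint (the post does not say which checkpoint produced the outputs) and that `transformers`, `torch`, and `Pillow` are installed. The helper names `load_captioner`, `caption_image`, `format_example`, and `main` are made up for illustration.

```python
from io import BytesIO
from urllib.request import urlopen

# Assumed checkpoint; the post does not name the one it used.
CHECKPOINT = "google/pix2struct-textcaps-base"


def load_captioner(checkpoint=CHECKPOINT):
    """Load the processor and model (downloads weights on first use)."""
    # Imported here so the heavy dependencies are only needed when captioning.
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(checkpoint)
    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def caption_image(image_url, processor, model, max_new_tokens=50):
    """Download an image and return the caption generated for it."""
    from PIL import Image

    with urlopen(image_url) as response:
        image = Image.open(BytesIO(response.read())).convert("RGB")

    # The processor turns the image into the patch inputs the model expects.
    inputs = processor(images=image, return_tensors="pt")
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(generated_ids[0], skip_special_tokens=True)


def format_example(image_url, caption):
    """Format one result in the Input/Output layout used in this post."""
    return f'Input:\nImage URL: "{image_url}"\nOutput: {caption}'


def main():
    processor, model = load_captioner()
    url = "https://www.ilankelman.org/themes1/copenhagen.jpg"
    print(format_example(url, caption_image(url, processor, model)))
```

Calling `main()` downloads the checkpoint on first use and prints one Input/Output pair. The exact caption may differ from the outputs listed above, depending on the checkpoint and library version, and some image hosts may reject plain `urlopen` requests.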






