Create an app with the Gemini API and LangChain

Burhan Amjad
5 min read · Jun 8, 2024


Photo by Chris Ried on Unsplash

In this project I am creating a simple chatbot using LangChain and Google’s Gemini API. I am also experimenting with Project IDX, Google’s approach to a cloud-based code editor. If you have used Visual Studio Code, it will feel very familiar, because it is essentially Visual Studio Code running in your browser; you can now code on an iPad through Safari. Project IDX has templates for different kinds of projects, such as web, mobile, and AI/ML. You also get inline AI assistance inside any file by pressing Cmd/Ctrl + I: simply describe the changes you want to make, and Gemini in IDX will provide real-time error correction, code suggestions, and auto-completion.

Let’s begin. From the template gallery, select the AI/ML template > LangChain with Gemini.

Once you have selected the template, name your project and choose an environment. I have named the project ChatBot and set the environment to Python notebook (.ipynb).

Once done, you will see boilerplate code in your main.ipynb file.

You can go through the code; it provides an image to the Gemini API along with a prompt asking for a recipe based on the image.

Unfortunately, you will not be able to run it yet, as the pip environment is not configured.

Let’s set up the pip environment.

First, open your dev.nix file.

You are probably wondering: what is Nix? Nix is an OS-level package manager that is well suited to reproducible, experimental environments. If you would like to know more, check out this link: https://wbg.gg/blog/nixos/.

Add pkgs.python311 and pkgs.python311Packages.pip to your Nix packages and rebuild the environment (this assumes Python 3.11.x; adjust to match your Python version).
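As a sketch, the packages list in dev.nix might look like the snippet below. The surrounding attribute names follow IDX’s default template and may differ slightly in your file:

{ pkgs, ... }: {
  packages = [
    pkgs.python311
    pkgs.python311Packages.pip
  ];
}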

Go back to your main file and try running the project. Note that you will need a Google Gemini API key.

We are going to create a chatbot instead, so you can erase all of the code in the main.ipynb file.

You can get a Gemini API key from here: https://aistudio.google.com/app/apikey?utm_source=project_idx&utm_medium=referral&utm_campaign=in_product&utm_content=

Let’s create the chatbot. The code below reads your Gemini API key from the environment, prompting for it if it is not set:

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API Key")

Once you have created and entered the API key, you may have to wait anywhere from 15 minutes to an hour before it can be used.

Import the Google Generative AI chat models API:

from langchain_google_genai import ChatGoogleGenerativeAI

Create a variable llm, select the model, and invoke it with a prompt:

llm = ChatGoogleGenerativeAI(model="gemini-pro")
result = llm.invoke("Write a ballad about LangChain")
print(result.content)

You can use this in your web and mobile apps as a chatbot or an AI assistant. It can also be used for in-app context tasks, such as summarizing a passage or explaining it in another language.
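As a sketch of the summarize-or-translate idea, you can wrap the model call behind a task-specific prompt builder. The helper name and prompt wording below are hypothetical, not part of LangChain:

```python
def make_summary_prompt(text: str, target_language: str = "French") -> str:
    """Build a reusable prompt asking the model to summarize in a given language."""
    return (
        f"Summarize the following passage in {target_language}, "
        f"in no more than three sentences:\n\n{text}"
    )

prompt = make_summary_prompt("LangChain connects language models to tools and data.")
# result = llm.invoke(prompt)  # requires a configured GOOGLE_API_KEY
# print(result.content)
```

The same pattern works for translation or explanation tasks: keep the instructions in the builder and pass only the user’s text at call time.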

Streaming and Batching

ChatGoogleGenerativeAI natively supports streaming and batching. Below are examples.

Streaming yields the response in chunks as the model generates it, which is ideal for interactive use cases, such as a chat UI where you want text to appear as soon as it is produced. Batching sends several prompts in a single call and returns all the responses together, which is convenient when you need to process many independent inputs at once, such as classifying a list of documents.

for chunk in llm.stream("Write a limerick about LLMs."):
    print(chunk.content)
    print("---")
# Note that each chunk may contain more than one "token"

results = llm.batch(
    [
        "What's 2+2?",
        "What's 3+5?",
    ]
)
for res in results:
    print(res.content)

Multimodal support

To provide an image, pass a human message whose content is of type List[dict], where each dict contains either an image value (type image_url) or a text value (type text). The value of image_url can be any of the following: a public image URL, a Google Cloud Storage URI, a base64-encoded image, a local file path, or a PIL image.

import requests
from IPython.display import Image

image_url = "https://picsum.photos/seed/picsum/300/300"
content = requests.get(image_url).content
Image(content)

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")
# example
message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "What's in this image?",
        },  # You can optionally provide text parts
        {"type": "image_url", "image_url": image_url},
    ]
)
llm.invoke([message])
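If the image lives on disk rather than at a URL, one option (assuming base64-encoded images are accepted as image_url values, per the list above) is to inline it as a data URL. A minimal sketch, with a hypothetical file name:

```python
import base64

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Read a local image file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Usage (hypothetical local file):
# data_url = to_data_url("photo.png")
# message = HumanMessage(content=[
#     {"type": "text", "text": "What's in this image?"},
#     {"type": "image_url", "image_url": data_url},
# ])
# llm.invoke([message])
```

This keeps everything local: no need to host the image anywhere before sending it to the model.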

There are many things you can build into your app with the Gemini API. This project was a quick overview of how to use the Gemini API, but you can of course create a dynamic web or mobile app where users describe a prompt and get an answer.
