ABOUT US
OpenBB is a leading open-source investment research software platform that gives you access to high-quality financial market data and analytical tools. Our mission is to make investment research effective, powerful and accessible to everyone.
LlamaIndex is a data framework that allows users to unlock the full capabilities of Large Language Models (LLMs) over their private, external data. Over the past few months, it has become one of the most popular open-source frameworks for LLM data augmentation (context-augmented generation), for a variety of use cases: question-answering, summarization, structured queries, and more.
BACKGROUND / PROBLEM STATEMENT
With the rise of LLMs, it was only a matter of time before LlamaIndex became prevalent in the finance space. While emergent technologies like Bloomberg GPT are trained on financial information and financial documents, OpenBB is taking a different approach. With over 900 different commands accessing data from almost 100 different sources, we wanted to map natural language to these commands to lower the barrier to entry for newcomers.
While this may not sound like a complex problem, since we just need to classify natural language queries into one of 900 possible options, there are quite a few intricacies. Among these is the tree-like structure of OpenBB Terminal commands. For example, if you want to view a candle chart looking at the so-called “Golden Cross” (where a 50-day and a 200-day EMA cross), you would use the command stocks/ta/ema -l 50,200. However, this requires some previous context, i.e. the data must already be loaded. So the full command would actually be stocks/load <TICKER>/ta/ema -l 50,200, with <TICKER> being the data of interest.
Another intricacy is that many different asset classes are covered, each with its own load command. Loading a cryptocurrency, such as Bitcoin, uses a different command than loading an equity like AAPL. This means that the language model needs to understand the difference between mapping load bitcoin to crypto/load btc and mapping load AAPL to stocks/load AAPL.
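To make the tree structure concrete, here is a minimal sketch of how a full command path could be assembled from an asset class, a symbol, and any follow-up steps. The `LOAD_MENUS` table and the `build_command` helper are hypothetical names introduced for illustration only; they are not part of the OpenBB Terminal, and the table contains only the two asset classes mentioned above.

```python
# Hypothetical lookup from asset class to the menu that owns its load command.
# Only the two pairings named in the text are included.
LOAD_MENUS = {"equity": "stocks", "crypto": "crypto"}


def build_command(asset_class: str, symbol: str, *subcommands: str) -> str:
    """Join the load step and any follow-up steps into one slash-separated path."""
    menu = LOAD_MENUS[asset_class]
    steps = [menu, f"load {symbol}", *subcommands]
    return "/".join(steps)


# Golden Cross on an equity: load the data first, then run the EMA command.
golden_cross = build_command("equity", "AAPL", "ta", "ema -l 50,200")
# Loading a cryptocurrency goes through a different menu.
load_btc = build_command("crypto", "btc")

print(golden_cross)  # stocks/load AAPL/ta/ema -l 50,200
print(load_btc)      # crypto/load btc
```

The sketch only shows why the load step depends on the asset class. In AskOBB this mapping is left to the language model and is not hard-coded.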
OpenBB x LlamaIndex
So how do we accomplish this? This is where LlamaIndex comes into play. Each of our OpenBB Terminal commands has an associated usage string in the form of a typical CLI help argument. An example for the stocks load command:
You can find this on our docs here: OpenBB Docs.
If this help string is provided to an LLM, we can ask it for a command based on the context. So what we did was copy all of these command help strings into txt files. To help the model, we added a few examples to each command's file. In our stocks load file, we add the following examples:
This process is repeated for EVERY function in the OpenBB Terminal. Once we have these, we can pass them to a Vector Index in two lines of code:
documents = SimpleDirectoryReader(GPT_INDEX_DIRECTORY / "data/").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
The Vector Index loads each file and creates an embedding for it using the OpenAI embeddings API. In essence, this means each file is associated with a vector (a series of numbers such as [0.001, 0.002, .2, ..., 0.03]), and these are all saved in memory through LlamaIndex.
Now that we have the context of our documentation and examples saved, it is time to query the LLM. For AskOBB, we are using OpenAI’s gpt-3.5-turbo model by default, but allow users to specify others (e.g. GPT-4). We provide a prompt string to the query, instructing the model to return only a command and to follow certain rules when querying. The prompt string we provide is:
To get the LLM response, it is just another 2 lines of code with LlamaIndex:
query_engine = index.as_query_engine()
response = query_engine.query(prompt_string)
And that is all we need!
This process takes our previously defined Vector Index, which stores all the relevant embeddings, and creates an embedding from our prompt, which includes the query. To determine which txt file to use, the Vector Index finds the file with the “closest” embedding and provides it as context to the LLM.
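The idea of a “closest” embedding can be illustrated with a small, self-contained sketch. It assumes cosine similarity as the distance measure, which is a common choice for embedding retrieval. The file names and the three-dimensional vectors below are made up for illustration; real embeddings from the OpenAI API have far more dimensions, and LlamaIndex handles this step internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, one per help file (hypothetical file names).
file_embeddings = {
    "stocks_load.txt": [0.9, 0.1, 0.0],
    "crypto_load.txt": [0.1, 0.9, 0.0],
    "stocks_ta_ema.txt": [0.6, 0.0, 0.8],
}


def closest_file(query_embedding, embeddings):
    """Return the name of the file whose embedding is most similar to the query."""
    return max(
        embeddings,
        key=lambda name: cosine_similarity(query_embedding, embeddings[name]),
    )


# Pretend this vector is the embedding of a prompt such as "load bitcoin".
query_embedding = [0.2, 0.8, 0.1]
best = closest_file(query_embedding, file_embeddings)
print(best)  # crypto_load.txt
```

The contents of the selected file would then be passed to the LLM as context alongside the prompt.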
More on LlamaIndex and OpenBB can be found in this webinar with Jerry Liu (CEO of LlamaIndex) and Didier Lopes (CEO of OpenBB):