Mastering the OpenAI Contextual Connections: Testing Done Right

AI Updates

Puja Arora

April 9, 2024

Source: ExpertGuru Articles

Contextual connections can be thought of as a “chain of thoughts,” i.e., interconnected data.

ExpertGuru, our ChatGPT-based application, not only processes the provided dataset but also incorporates the user's session and interaction history when forming a response. This feature gives the application a deeper understanding of the user and improves the overall user experience.

The feature is appealing, but it adds a lot of complexity at the application level and can leave our ChatGPT-based application in a state of confusion.

Concerns When History is Considered

Huge Volumes of History

When a single user has numerous interactions, a huge volume of history data builds up. During response formulation, this can create an imbalance between the business/application data set and the much larger history data set, which confuses the bot. During application testing, we observed cases where the bot opened with "I'm sorry for the confusion" before giving the actual response, and cases where the historical context outweighed the current query.

Ex.

Domain: Electronics store

History: Huge history around smartwatches and their straps.

Query: Show me some headphones also with long battery life.

Response: “Sorry for the confusion” followed by a response for headphones. In another case, the bot continued to answer about smartwatches with long battery life, and the current query was ignored or only partially considered.
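A check for the apology preamble described above can be automated. The following is a minimal sketch, assuming the bot's reply is available to the test as a plain string; the function name and the list of apology openers are hypothetical and should be extended with whatever phrasings show up in your own test runs.

```python
import re

# Hypothetical set of apology openers; extend it with phrasings seen in testing.
# \W* skips leading quotes or whitespace before the first word.
APOLOGY_PATTERN = re.compile(
    r"^\W*(?:i['’]?m\s+sorry|sorry|apologies|i\s+apologi[sz]e)\b[^.!?]*?\bconfusion\b",
    re.IGNORECASE,
)


def starts_with_apology(response: str) -> bool:
    """Return True if the response opens with an apology about confusion."""
    return APOLOGY_PATTERN.match(response) is not None
```

A test case can then fail whenever `starts_with_apology(reply)` is true for a query that should have been answered directly, such as the headphones query in the example.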

Contextually Connected History

The question here is: “Is the query well connected with the history?” When it is, the bot is expected to discern the user's preferences and requirements so that users do not have to repeat information they have already given. The user should get a seamless, well-formed response to queries like:

Ex.

Domain: Clothing store

History: Beach dresses in size XXL; the favourite colour is pink.

Queries: “Show me in blue colour as well,” “Show me office wear also,” “Show me more,” etc.

Response: Office wear only, in all available sizes and colours.

Expected response: Office wear in XXL, in pink or blue if available.
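The gap between the actual and expected response in this example can be expressed as a check for preferences that should carry over from history. This is a minimal sketch using plain substring matching; the function name and its parameters are assumptions made for illustration, and a real test would likely need stricter matching than substrings.

```python
from typing import List, Sequence


def missing_preferences(
    response: str, required: Sequence[str], any_of: Sequence[str]
) -> List[str]:
    """List the expected preferences that the response does not mention.

    required: terms that must all appear (e.g. the size from history).
    any_of:   terms of which at least one must appear (e.g. colours).
    Matching is case-insensitive substring matching, which is only a rough proxy.
    """
    text = response.lower()
    missing = [term for term in required if term.lower() not in text]
    if any_of and not any(term.lower() in text for term in any_of):
        missing.append(" or ".join(any_of))
    return missing
```

For the clothing store example, `missing_preferences(reply, ["office wear", "XXL"], ["pink", "blue"])` returns an empty list only when the reply keeps the size and at least one of the colours.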

Contextually Disconnected History

At times, user needs are expected to shift from one category to another. Because our bot keeps the history intact, a loophole appears: when the current query no longer belongs to the same category as the history, the AI keeps crafting patterns to make the history relevant, and this is where the bot fails.

Ex.

Domain: Clothing store

History: Interaction history around kids’ wear

Query: Show me some women’s wear also

Response: “Sorry for the confusion” followed by a response for women’s wear. In another case, the bot continued to answer about kids’ wear, and the current query was not considered at all.
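For a category switch like this one, a test can classify whether the reply talks about the current category, the old one, or both. The sketch below is a rough keyword-based heuristic; the function name, the term lists, and the four labels are hypothetical choices, not part of the application.

```python
from typing import Sequence


def response_focus(
    response: str, history_terms: Sequence[str], query_terms: Sequence[str]
) -> str:
    """Label a response as 'query', 'history', 'mixed', or 'neither'.

    The label depends on whether any history-category or query-category
    term occurs in the response (case-insensitive substring match).
    """
    text = response.lower()
    mentions_history = any(term.lower() in text for term in history_terms)
    mentions_query = any(term.lower() in text for term in query_terms)
    if mentions_query and mentions_history:
        return "mixed"
    if mentions_query:
        return "query"
    if mentions_history:
        return "history"
    return "neither"
```

In the example, a reply labelled `"history"` for the women's wear query corresponds to the failure where the bot keeps answering about kids' wear.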

ChatGPT Model Training for History

Different ChatGPT models handle history differently, so it is important to retest the application periodically and whenever the model is upgraded or changed.

Ex.

One model used to give responses like “Apologies for confusion” followed by the correct response. Another model constructed responses solely from the preceding conversation, reusing the exact text of previous interactions. As a result, we observed various permutations and combinations of earlier text that produced nonsensical responses.
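The second behaviour, where a model reuses exact text from earlier turns, can be flagged by measuring the longest run of consecutive words that a reply shares with the previous conversation. This is a minimal sketch built on Python's standard `difflib` module; the function name is hypothetical, and the threshold for what counts as "copied" is left to the tester.

```python
from difflib import SequenceMatcher


def longest_copied_run(response: str, previous_text: str) -> int:
    """Return the length, in words, of the longest word sequence that
    appears verbatim (ignoring case) in both the response and earlier text."""
    response_words = response.lower().split()
    previous_words = previous_text.lower().split()
    # autojunk=False keeps frequent words from being ignored in long inputs.
    matcher = SequenceMatcher(None, response_words, previous_words, autojunk=False)
    match = matcher.find_longest_match(
        0, len(response_words), 0, len(previous_words)
    )
    return match.size
```

Running the same history and query against each model and comparing this number gives a simple signal for regressions after a model upgrade or change.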

History Biasing

Sometimes, our application suffers from history bias. We observed cases resembling both overfitting and underfitting.

Ex.

In some cases, the response contains text based on history and the current query is not answered at all: the historical context overshadows the current query, which amounts to overfitting to the historical data. Conversely, there are cases where the history is disregarded and users have to provide similar information again.

Testing Insights

Testers need to create test data for history and write test cases over that history. Scenarios should cover current queries in the same category as the history as well as current queries in a different category from the history. Comprehensive planning is required to cover all potential cases, including extensive history, connected history, and disconnected history. Given the nature of ChatGPT-based applications and their various models, responses need thorough, periodic examination so that each specific case can be addressed and rectified.
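The planning described above can start from an explicit scenario matrix. The following is a minimal sketch that enumerates combinations of history size, query relation, and model; the dimension values and the model names are placeholders chosen for illustration, not the ones used in our testing.

```python
from itertools import product

# Placeholder dimensions; replace them with the values relevant to your application.
HISTORY_SIZES = ("short", "extensive")
QUERY_RELATIONS = ("same_category", "different_category")
MODELS = ("model-a", "model-b")  # hypothetical model identifiers


def build_scenarios(
    history_sizes=HISTORY_SIZES, query_relations=QUERY_RELATIONS, models=MODELS
):
    """Return one scenario dict per combination of the three dimensions."""
    return [
        {"history_size": size, "query_relation": relation, "model": model}
        for size, relation, model in product(history_sizes, query_relations, models)
    ]
```

Each scenario can then be paired with prepared history data and a query, and the resulting reply examined for the issues covered in the earlier sections.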