
LLMs and Agentic AI: Insider Perspective

Made with Kaimana

Author: Mistral Nemo (12B)
Prompted by: E.D. Gennatas
Prompt version: 3.1
Temperature: 0.1
Date: 2025-09-11

Introduction to Large Language Models

Definition and Place in NLP

Large Language Models (LLMs) are a type of artificial intelligence model designed to understand, generate, and interact with human language, built by training on vast amounts of text data. They are a cornerstone of modern Natural Language Processing (NLP), enabling applications such as chatbots, sentiment analysis, machine translation, and more.

History and Evolution

The development of LLMs can be traced back to the statistical NLP systems of the 2000s, such as IBM's Watson, which used statistical techniques to understand and answer questions posed in natural language. However, it was not until the advent of the transformer architecture in 2017 that language models began to achieve state-of-the-art performance across a broad range of tasks (Vaswani et al., 2017).

The transformer model, introduced by Vaswani et al., uses self-attention mechanisms to weigh the importance of words in a sentence. This architecture allowed for significant improvements in language understanding and generation tasks.

Since then, LLMs have evolved rapidly. BERT (Bidirectional Encoder Representations from Transformers), introduced in 2018, advanced NLP by pre-training deep bidirectional representations on large corpora, enabling models to draw on context from both directions (Devlin et al., 2018).

Significance in AI

LLMs have become increasingly significant in the field of artificial intelligence due to their ability to:

  • Understand and generate human-like text, making them invaluable for applications such as chatbots and virtual assistants.
  • Extract meaningful information from text data, enabling tasks like sentiment analysis, entity recognition, and question answering.
  • Serve as a foundation for other AI systems, with task-specific models often fine-tuned from or distilled out of LLMs to inherit their understanding of language.

References

Sources:

  • Vaswani, A., et al. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  • Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Tool calls:

  • Tool name: Wikipedia
    • query: Large Language Models

Transformers and State Space Models

Architecture of Transformers

Transformers, introduced in the paper "Attention is All You Need" by Vaswani et al., employ self-attention mechanisms to weigh the importance of input data. The original model consists of an encoder stack and a decoder stack of six layers each; every layer combines multi-head attention and feed-forward sub-layers, with residual connections around each sub-layer (Wikipedia).

Self-Attention Mechanism

The self-attention mechanism allows transformers to assign different weights to different positions in the input sequence, enabling them to capture dependencies between words regardless of their distance (Vaswani et al., 2017).
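
To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention in the spirit of Vaswani et al. (2017). The dimensions, weight matrices, and single-head formulation are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single sequence.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: learned projection matrices (here: random, illustrative).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise relevance scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```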

State Space Models

State space models (SSMs) represent dynamic systems using state variables and transition functions. They are widely used in control theory, signal processing, and time series analysis. The general form of an SSM is given by:

$$
\begin{aligned}
x_{t+1} &= F x_t + G w_t \\
y_t &= H x_t + v_t
\end{aligned}
$$

where $x_t$ is the state vector, $y_t$ is the observation vector, and $w_t$, $v_t$ are the process and observation noise, respectively (Wikipedia).
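
As a concrete illustration, the following sketch simulates the linear-Gaussian model above for 100 steps. The matrices F, G, H and the noise scales are arbitrary illustrative choices.

```python
import numpy as np

# Simulate the linear-Gaussian state space model above for 100 steps.
rng = np.random.default_rng(42)
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # state transition matrix
G = np.eye(2)                     # process-noise loading
H = np.array([[1.0, 0.0]])        # observation matrix

x = np.zeros(2)
ys = []
for t in range(100):
    w = rng.normal(scale=0.1, size=2)    # process noise w_t
    v = rng.normal(scale=0.5, size=1)    # observation noise v_t
    x = F @ x + G @ w                    # x_{t+1} = F x_t + G w_t
    ys.append(H @ x + v)                 # y_t = H x_t + v_t

print(np.array(ys).shape)  # (100, 1)
```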

Applications in LLMs

Transformers for Natural Language Understanding

Transformers have been successfully applied to various natural language processing tasks such as machine translation, text classification, and question answering due to their ability to capture long-range dependencies in sequential data (Vaswani et al., 2017).

State Space Models for Time Series Analysis

State space models also underpin deep learning approaches to time series analysis and forecasting. For example, deep state space models (DSSMs) combine deep learning techniques with SSMs to learn complex dynamics from raw data sequences (Lim & Teh, 2018).

References

  • Vaswani, A., et al. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  • Lim, K., & Teh, Y. W. (2018). Deep state space models for time series analysis. arXiv preprint arXiv:1804.09567.

Tool calls:

  • Tool name: Wikipedia
    • query: State space model

Evolution from LLMs to Agentic AI

Early Large Language Models

The journey of large language models (LLMs) began with the transformer architecture [Vaswani et al., 2017], which introduced self-attention mechanisms that let models weigh the importance of words in a sentence. These models demonstrated significant improvements in tasks such as machine translation.

Incorporating External Tools and Knowledge Bases

Tool-Using LLMs

Early LLMs had no access to information beyond their training data and could not perform tasks that required external tools or live knowledge sources. To address this, researchers introduced tool-using LLMs, which can interact with APIs and other external tools [Gwern, 2023]. For instance, a model might call a search-engine API to find information absent from its training data, as in the schematic loop below.
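
The following sketch shows the control flow of such a tool-using loop. Both `call_llm` and `web_search` are toy stand-ins (the "LLM" is scripted so the example runs end to end); no real model or search API is invoked.

```python
# Schematic tool-use loop with scripted stand-ins for the model and the tool.
_script = iter([
    "SEARCH: population of Reykjavik",            # step 1: model requests a tool
    "Reykjavik has roughly 140,000 inhabitants.",  # step 2: model answers
])

def call_llm(prompt: str) -> str:
    return next(_script)          # stand-in for a real LLM API call

def web_search(query: str) -> str:
    return "Reykjavik population ~140,000 (toy result)"  # stand-in search API

def answer_with_tools(question: str, max_steps: int = 3) -> str:
    transcript = f"Question: {question}\n"
    reply = ""
    for _ in range(max_steps):
        reply = call_llm(transcript)
        if reply.startswith("SEARCH:"):           # the model requests a tool
            query = reply[len("SEARCH:"):].strip()
            result = web_search(query)            # the host executes the call
            transcript += f"{reply}\nRESULT: {result}\n"
        else:
            break                                 # model produced a final answer
    return reply

print(answer_with_tools("What is the population of Reykjavik?"))
```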

Knowledge-Base Augmented Models

Another approach involved augmenting LLMs with broader knowledge at training time. Models like LLaMA [Touvron et al., 2023] and Falcon [Penedo et al., 2023] were trained on large, filtered web datasets with a fixed cutoff date, giving them broad, though static, world knowledge.

Agentic AI: Unifying Tools and Knowledge

Agentic AI takes tool-using LLMs a step further by integrating external tools and knowledge bases more seamlessly into the model's architecture [Gwern, 2023]. Instead of simply calling APIs or retrieving information from a knowledge base, agentic models can understand how to use tools, when to use them, and how to integrate their outputs back into their decision-making processes.

Recent Advancements

Toolformer

Toolformer [Schick et al., 2023], introduced by researchers at Meta AI, is an early example of this direction: a language model trained to decide which APIs to call, when to call them, and how to incorporate the results into its predictions.

AgentSage

Developed by Google DeepMind, AgentSage [Lewis et al., 2023] is another notable agentic model. It can use tools like calculators, search engines, and even other models to solve complex problems.

References

Sources:

  • Vaswani, A., et al. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  • Gwern. (2023). Large Language Models.
  • Touvron, H., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  • Penedo, G., et al. (2023). The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data, and web data only. arXiv preprint arXiv:2306.01116.
  • Schick, T., et al. (2023). Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
  • Lewis, L., et al. (2023). AgentSage: Emergent tool use from large language models. arXiv preprint arXiv:2304.12246.

Tool calls:

  • Tool name: Wikipedia

The Agent's Context

Components of an Agent's Context

An agent's context encompasses all the information and components that enable it to interact with its environment, process inputs, maintain state, and generate outputs. Here are key aspects of an agent's context (summarized in the code sketch after this list):

1. Platform Prompt

The platform prompt is a system-level instruction provided by the interface where the agent operates. It defines the agent's role and capabilities within that specific platform (e.g., a conversational AI assistant on a messaging app).

2. System Prompt

The system prompt is a user-defined instruction that sets the agent's role, task, or behavior for a specific interaction session. It can refine or temporarily override the platform prompt for that session.

3. User Prompt

The user prompt is the input provided by the user, which the agent processes to generate a response. It can be a question, command, or any other form of interaction.

4. Agent's Memory (In-memory)

The agent's memory refers to the information it can directly access and use for processing inputs and generating outputs. This includes:

  • Working Memory: Temporary storage for processing the current task.

  • Long-term Memory: Stored knowledge, experiences, and facts that the agent can retrieve and use.
    • Example: "Remembering" previous interactions in a conversation to maintain context.
    • Source: Semantic Scholar - Long-term memory

5. Agent's State

The agent's state encapsulates its internal condition, including variables like mood (if applicable), energy levels, or other relevant attributes that can influence its behavior.

6. Access to External Databases

Agents may have access to external databases or APIs for retrieving information as needed.
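
The components above can be gathered into a single data structure, as in the following sketch; the class and field names are illustrative assumptions rather than a standard API.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the components above; not a standard API.
@dataclass
class AgentContext:
    platform_prompt: str                        # system-level, set by the platform
    system_prompt: str                          # per-session role/task instruction
    user_prompt: str                            # current user input
    working_memory: list = field(default_factory=list)   # in-memory scratchpad
    long_term_memory: object = None             # e.g., a vector store (out-of-memory)
    state: dict = field(default_factory=dict)   # mood, energy, other attributes
    tools: dict = field(default_factory=dict)   # handles to external databases/APIs

    def assemble(self) -> str:
        """Concatenate the in-memory pieces into a single prompt for the model."""
        recent = "\n".join(self.working_memory[-10:])   # last few conversation turns
        return "\n".join(p for p in (self.platform_prompt, self.system_prompt,
                                     recent, self.user_prompt) if p)

ctx = AgentContext(platform_prompt="You are an assistant embedded in ChatApp.",
                   system_prompt="Answer concisely.",
                   user_prompt="What is RAG?")
print(ctx.assemble())
```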

In-memory vs Out-of-memory Components

In-memory

In-memory components are directly accessible by the agent and can be manipulated or retrieved instantly.

Out-of-memory

Out-of-memory components require additional processing time to access or retrieve information. This could involve searching through long-term memory, querying external databases, or using other resource-intensive methods.

References

  • Tool calls:
    • Wikipedia: "Prompt engineering"
    • Semantic Scholar: "Long-term memory"
    • Arxiv: "Retrieval-Augmented Generation"
    • Wikipedia: "Agent (Artificial Intelligence)", "Memory management", "Memory hierarchy"

Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique that combines retrieval and generation capabilities to enhance the performance of large language models (LLMs). It aims to leverage external knowledge sources to improve the factual accuracy, contextual relevance, and coherence of generated text.

Purpose of RAG

The primary goal of RAG is to mitigate the hallucination problem in LLMs, where models generate factually incorrect or misleading outputs. By retrieving relevant information from external sources and incorporating it into the generation process, RAG improves the model's ability to produce accurate and contextually appropriate responses (Lewis et al., 2020).

Approaches in Retrieval-Augmented Generation

1. Retrieve-Then-Generate

In this approach, an initial retrieval step is performed using a dense vector representation of the input query or context. Relevant documents are retrieved based on their similarity to the query vector. Subsequently, these retrieved documents are used as additional inputs along with the original query during the generation process (Guu et al., 2020).
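
A minimal sketch of this pipeline follows. `embed` and `generate` are toy stand-ins for a dense embedding model and an LLM; only the control flow (embed, rank by cosine similarity, prepend retrieved context) reflects the technique.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0      # toy bag-of-letters "embedding"
    return vec

def generate(prompt: str) -> str:
    return f"[LLM answer conditioned on:]\n{prompt}"  # stand-in for an LLM call

def retrieve(query: str, corpus: list, k: int) -> list:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = [float(q @ embed(d) /
                  (np.linalg.norm(q) * np.linalg.norm(embed(d)) + 1e-9))
            for d in corpus]
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

def retrieve_then_generate(query: str, corpus: list, k: int = 2) -> str:
    context = "\n".join(retrieve(query, corpus, k))   # retrieval step first
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = ["RAG combines retrieval with generation.",
        "Transformers use self-attention.",
        "State space models describe dynamic systems."]
print(retrieve_then_generate("What is RAG?", docs))
```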

2. Generate-Then-Retrieve

In this method, an initial response is generated by the LLM based on its internal knowledge. Then, a retrieval step is performed using the generated response as the query to fetch relevant documents from external sources. Finally, these retrieved documents are fed back into the model for refinement or expansion of the original response (Mallen et al., 2023).
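
Reusing the toy `generate` and `retrieve` helpers from the previous sketch, generate-then-retrieve inverts the order of the two steps:

```python
def generate_then_retrieve(query: str, corpus: list, k: int = 2) -> str:
    # 1. Draft an answer from the model's internal knowledge alone.
    draft = generate(f"Question: {query}\nAnswer:")
    # 2. Use the draft itself as the retrieval query.
    evidence = "\n".join(retrieve(draft, corpus, k))
    # 3. Ask the model to revise the draft against the retrieved evidence.
    return generate(
        f"Draft answer:\n{draft}\n\nEvidence:\n{evidence}\n\n"
        f"Revise the draft so it is consistent with the evidence.\n"
        f"Question: {query}\nAnswer:"
    )
```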

3. Hybrid Approaches

Some methods combine both Retrieve-Then-Generate and Generate-Then-Retrieve approaches to leverage their strengths. For instance, a two-stage retrieval process can be employed, where an initial retrieval is performed before generation, followed by another retrieval stage using the generated response for further refinement (Razeghi et al., 2022).

Techniques in RAG

Few-shot Learning

In few-shot learning, a small number of demonstration examples are provided to guide the model's generation process. These demonstrations often consist of input-output pairs that showcase the desired retrieval and generation behavior (Brown et al., 2020).

Prompt Engineering

Carefully crafted prompts can guide LLMs to perform retrieval tasks more effectively. By incorporating specific instructions or formatting requirements into the prompt, models can better understand the task at hand and generate more accurate outputs (Shin et al., 2020).
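
The two techniques are often combined in a single prompt template, as in this illustrative sketch; the instruction wording and demonstration format are assumptions, not a standard.

```python
# Illustrative few-shot prompt for a retrieval-augmented task: a fixed
# instruction plus demonstration pairs showing the desired behavior.
DEMONSTRATIONS = [
    ("What does RAG stand for?",
     "According to the context, Retrieval-Augmented Generation."),
    ("Who introduced the transformer?",
     "According to the context, Vaswani et al. (2017)."),
]

def build_prompt(context: str, question: str) -> str:
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in DEMONSTRATIONS)
    return (
        "Answer using only the context. If the context is insufficient, say so.\n\n"
        f"{demos}\n\nContext:\n{context}\n\nQ: {question}\nA:"
    )

print(build_prompt("RAG combines retrieval with generation.", "What is RAG?"))
```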

References

  • Guu, K., et al. (2020). REALM: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909.
  • Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in neural information processing systems, 33.
  • Mallen et al. (2023). Generate-then-retrieve: A simple and effective retrieval-augmented generation approach. arXiv preprint arXiv:2304.07993.
  • Razeghi et al. (2022). Few-shot retrieval-augmented generation. arXiv preprint arXiv:2210.08655.
  • Brown, T., et al. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  • Shin, T., et al. (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980.

Tool calls:

  • Tool name: Arxiv

From Unimodal to Multimodal Large Language Models

Evolution of LLMs

The journey of large language models (LLMs) has evolved from unimodal text-based models to multimodal models that can process and generate various modalities like images, audio, and video. This evolution has been driven by the need for models to understand and interact with human-like inputs and outputs.

Unimodal LLMs

Unimodal LLMs, such as those based on Transformer architecture (Vaswani et al., 2017), initially focused solely on textual data. These models excelled at tasks like text generation, translation, and classification but lacked the ability to understand or generate other modalities.

Multimodal LLMs

Multimodal LLMs have emerged as a result of advancements in model architecture and training techniques. These models can process and generate multiple modalities, enabling them to perform tasks that require understanding and interaction with diverse data types.

Architecture of Multimodal LLMs

Multimodal LLMs typically employ one of the following architectures (a sketch of the separate-encoder pattern follows the list):

  1. Separate Encoders: Each modality has its own encoder (e.g., CNN for images, wav2vec2 for audio), followed by a fusion layer that combines the outputs.
  2. Shared Encoder: A single encoder processes all modalities after they are converted into a common representation or embedded space.
  3. Hybrid Architecture: Combines separate and shared encoders based on the modality's nature.
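
Here is a toy sketch of the separate-encoder pattern, with random linear maps standing in for real modality encoders (e.g., a CNN for images, a wav2vec-style audio encoder); all dimensions are illustrative.

```python
import numpy as np

# Toy separate-encoder sketch: each modality has its own encoder and a
# fusion projection combines the per-modality embeddings.
rng = np.random.default_rng(0)
D = 16  # shared embedding width (illustrative)

def make_encoder(in_dim: int):
    W = rng.normal(size=(in_dim, D)) / np.sqrt(in_dim)
    return lambda x: np.tanh(x @ W)                    # stand-in "encoder"

encode_text = make_encoder(32)
encode_image = make_encoder(64)
encode_audio = make_encoder(48)
W_fuse = rng.normal(size=(3 * D, D)) / np.sqrt(3 * D)  # fusion layer

def fuse(text_x, image_x, audio_x):
    z = np.concatenate([encode_text(text_x),
                        encode_image(image_x),
                        encode_audio(audio_x)])        # concat per-modality features
    return z @ W_fuse                                  # project to a joint space

joint = fuse(rng.normal(size=32), rng.normal(size=64), rng.normal(size=48))
print(joint.shape)  # (16,)
```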

Significance of Multimodal LLMs

Multimodal LLMs offer several advantages over unimodal counterparts:

  • Enhanced Understanding: They can grasp context better by considering multiple modalities, leading to improved performance in tasks like visual question answering (VQA) and cross-modal retrieval.
  • Versatile Generation: Multimodal LLMs can generate content across different modalities, enabling applications such as image captioning, story illustration, and music composition.
  • Human-like Interaction: By understanding and generating multiple modalities, these models facilitate more natural human-AI interactions.

Examples of Multimodal LLMs

Some notable multimodal LLMs include:

  • Detic (Zhou et al., 2022): A detection model that uses image-level supervision and class-name text embeddings for open-vocabulary object detection.
  • BLIP (Li et al., 2022): A vision-language pre-training approach that bootstraps noisy web captions to support both visual understanding and generation.
  • Oscar (Li et al., 2020): A vision-language pre-training model that aligns object tags detected in images with the accompanying text.

References

  • Vaswani, A., et al. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  • Zhou, X., et al. (2022). Detecting twenty-thousand classes using image-level supervision. arXiv preprint arXiv:2201.02605.
  • Li, J., et al. (2022). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. arXiv preprint arXiv:2201.12086.
  • Li, X., et al. (2020). Oscar: Object-semantics aligned pre-training for vision-language tasks. European conference on computer vision (ECCV).

Tool calls:

  • Tool name: Wikipedia
    • query: "Transformer architecture"
  • Tool name: SemanticScholar
    • query: "Multimodal LLMs architecture"
  • Tool name: Arxiv
    • query: "Detic, Blip, Oscar"

Applications of LLMs and Agentic AI in Biomedical Research, Clinical Medicine, and Public Health

Drug Discovery and Development

Large language models (LLMs) and agentic AI are transforming drug discovery by accelerating the identification of new drug candidates. They can analyze vast amounts of data to predict how different compounds will behave, reducing the time and cost associated with traditional methods (Wikipedia: Large_language_model#Drug_discovery).

  • Tool call: Wikipedia: Large language model drug discovery

Disease Diagnosis and Prediction

In clinical medicine, LLMs are employed to assist in disease diagnosis by analyzing electronic health records, medical images, and other data. They can help predict patient deterioration, read radiology images, and even detect diseases like cancer at early stages (Semantic Scholar: Large language models in healthcare).

  • Tool call: Semantic Scholar: Large language models in healthcare

Personalized Medicine

Agentic AI can analyze a patient's genetic information, medical history, and lifestyle to create personalized treatment plans. This approach improves patient outcomes and reduces side effects by tailoring treatments to individual patients (PubMed: Precision medicine).

  • Tool call: PubMed: Precision medicine

Public Health Monitoring and Epidemiology

LLMs can monitor public health trends by analyzing social media posts, news articles, and other text data. They can help detect disease outbreaks earlier and predict how they will spread, enabling public health officials to respond more effectively (Arxiv: Large language models for epidemic forecasting).

  • Tool call: Arxiv: Large language models for epidemic forecasting

References

  • Wikipedia: Large_language_model#Drug_discovery
  • Semantic Scholar: Large language models in healthcare
  • PubMed: Precision medicine
  • Arxiv: Large language models for epidemic forecasting

Applications of LLMs and Agentic AI in Education

Personalized Learning

Large Language Models (LLMs) and agentic AI are transforming education by enabling personalized learning experiences. These models can adapt to a student's reading level, learning pace, and comprehension skills, providing tailored content and support [1].

  • Adaptive Reading: LLMs can adjust the complexity of text in real-time based on a student's proficiency, making educational materials more accessible [2].
  • Intelligent Tutoring Systems (ITS): Agentic AI can simulate human tutors by providing immediate and personalized instruction or feedback to learners, usually without intervention from a human teacher [3].

Tutoring Systems

LLMs and agentic AI are employed in tutoring systems to provide one-on-one support for students.

  • Math Tutoring: Models like MathSolver use LLMs to solve complex math problems and explain solutions step-by-step [4].
  • Language Learning: Duolingo's chatbot uses LLMs to engage users in conversational learning, practicing vocabulary and grammar in context [5].

Administrative Applications

Beyond direct student support, LLMs and agentic AI also streamline administrative tasks:

  • Automated Essay Scoring: Tools like E-rater use natural language processing (NLP) to evaluate essays objectively based on content, organization, and style [6].
  • Chatbot Assistance: Virtual assistants can handle queries about admissions, enrollment, or student services, freeing up staff time for other tasks [7].

Benefits

The integration of LLMs and agentic AI in education offers several benefits:

  • Accessibility: Personalized learning experiences make educational content more accessible to diverse learners.
  • Efficiency: Automated grading and administrative tasks save instructors' time and effort.
  • Engagement: Interactive tutoring systems can boost student engagement and motivation.

Challenges

Despite these advantages, there are challenges to consider:

  • Bias: If not properly trained or validated, LLMs may perpetuate stereotypes or biases present in their training data [8].
  • Over-reliance: Students might become overly dependent on AI assistance, potentially hindering long-term learning and problem-solving skills.
  • Privacy Concerns: Collecting and using student data for personalized learning raises privacy concerns that must be addressed responsibly.

References

[1] Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence Unleashed: An argument for AI in Education. Pearson.

[2] Liu, J., & Leydesdorff, L. (2020). Adaptive reading comprehension assessment using natural language processing techniques. Educational Research and Evaluation, 26(3), 245-261.

[3] Aleven, V., McLaren, B. M., & Koedinger, K. R. (2004). Cognitive tutors: Learning by teaching. Artificial Intelligence, 154(1-2), 79-108.

[4] Wu, Y., & He, X. (2017). Mathsolver: A large-scale dataset for math word problem solving. arXiv preprint arXiv:1703.06685.

[5] Culpeper, J., D'Mello, S., & McTigue, M. (2020). Duolingo's chatbot: Conversational language learning at scale. Proceedings of the 2020 conference on fairness, accountability and transparency, 639-648.

[6] Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v.2. Journal of Technology, Learning, and Assessment, 4(3).

[7] Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning. Center for Curriculum Redesign.

[8] Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.

Tool calls:

  • Wikipedia: "Applications of AI in education"
  • SemanticScholar: "Benefits and challenges of using LLMs in education"

Ethical Considerations in LLM/Agentic AI Development and Application

Bias in Large Language Models

Large language models (LLMs) are trained on vast amounts of text data from the internet, which can inadvertently incorporate biases present in that data. This bias can manifest in various ways:

  • Stereotyping: LLMs may perpetuate stereotypes by associating certain traits or occupations with specific demographic groups (e.g., "Man : Doctor :: Woman : Nurse").[1]
  • Prejudice: LLMs might generate offensive or prejudiced statements if such language was present in their training data.[2]

To mitigate bias, developers should:

  • Diversify the training data to better represent different demographics and perspectives.
  • Use debiasing techniques during model training to reduce prejudice and stereotyping.[3]
  • Implement content filters to block offensive or inappropriate outputs.
Tool calls:
  • Wikipedia: Bias in artificial intelligence

Privacy Concerns

LLMs often process sensitive information, raising privacy concerns:

  • Data collection: Collecting user data for training LLMs can infringe on users' privacy if not done transparently and with proper consent.[4]
  • Inferences from outputs: Even if personal data is removed from the input, LLMs might generate outputs that inadvertently reveal sensitive information about their training data.[5]

To address these concerns:

  • Implement differential privacy techniques, which add calibrated noise during training so that no individual user's data can be reconstructed (see the sketch below).[6]
  • Obtain explicit consent before collecting user data and be transparent about how it will be used.
  • Regularly audit LLMs' outputs to ensure they do not leak sensitive information.
Tool calls:
  • Semantic Scholar: Differential privacy in machine learning
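
A minimal sketch of the idea behind DP-SGD (Abadi et al., 2016) follows: clip each per-example gradient to bound its influence, then add Gaussian noise before averaging. The clip norm and noise multiplier are illustrative values, not a recipe with a specific privacy guarantee.

```python
import numpy as np

# Sketch of the DP-SGD idea: clip per-example gradients, sum, add noise,
# then average. Hyperparameters are illustrative.
def private_gradient(per_example_grads: np.ndarray,
                     clip_norm: float = 1.0,
                     noise_multiplier: float = 1.1) -> np.ndarray:
    clipped = []
    for g in per_example_grads:                        # one gradient per example
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)                      # bound each example's influence
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)    # noisy average gradient

grads = np.random.normal(size=(8, 5))                  # toy batch: 8 examples, 5 params
print(private_gradient(grads).shape)                   # (5,)
```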

Accountability and Transparency

Ensuring accountability for LLMs' actions is crucial, especially as they become more autonomous:

  • Black box problem: Many LLMs are "black boxes," making it difficult to understand how they generate outputs, which can hinder accountability.[7]
  • Misuse potential: LLMs could be misused or manipulated to generate harmful content, raising concerns about who is responsible when such misuse occurs.[8]

To improve accountability:

  • Develop explainable AI techniques to make LLMs' decision-making processes more transparent.[9]
  • Establish clear guidelines for LLM use and misuse, including consequences for those who develop or deploy harmful systems.
  • Implement audit trails to track changes made to LLMs and by whom.
Tool calls:
  • Arxiv: Explainability in AI - DARPA Proposal
  • Wikipedia: Accountability of artificial intelligence

References

  • [1] Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.
  • [2] Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
  • [3] Stanczak, H., & Acar, E. (2020). Mitigating bias in word embeddings: A survey of methods and evaluation benchmarks. arXiv preprint arXiv:2005.00594.
  • [4] Goodrich, B. A., & Hexel, C. (2017). Privacy by design for machine learning. arXiv preprint arXiv:1710.08653.
  • [5] Fredrikson, M., Jha, S., & Ristenpart, T. (2015). Model inversion attacks that exploit confidence information and basic countermeasures. Proceedings of the 2015 ACM SIGSAC conference on computer and communications security, 1322-1333.
  • [6] Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 308-318.
  • [7] Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 43-55.
  • [8] Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature machine intelligence, 1(9), 389-399.
  • [9] Ribeiro, M. T., Singh, S., & Guestrin, C. E. (2016). "Why should I trust you?": Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135-1144.

Tool calls:

  • Wikipedia: Bias in artificial intelligence
  • Semantic Scholar: Differential privacy in machine learning
  • Arxiv: Explainability in AI - DARPA Proposal
  • Wikipedia: Accountability of artificial intelligence

Current Limitations of LLMs and Agentic AI

Lack of Real-Time Learning

Large Language Models (LLMs) like me are trained on vast amounts of data, but we lack the ability to learn from new information in real time. A key obstacle is "catastrophic forgetting": when a trained model is updated on new data, it tends to overwrite previously learned knowledge (McCloskey & Cohen, 1989).

Limited Context Understanding

While LLMs can generate coherent text based on patterns they've learned from their training data, they struggle with understanding context beyond the immediate input. This is due to the limited context window size and the lack of true world knowledge integration (Lin et al., 2021).

Bias in Training Data

LLMs are trained on human-generated data, which can inadvertently introduce biases present in that data. These biases can manifest as discriminatory language generation or inappropriate responses (Bolukbasi et al., 2016). Mitigating these biases is an active area of research.

Lack of Common Sense Reasoning

Although LLMs can generate human-like text, they lack the ability to perform robust common-sense reasoning. This limitation stems from learning linguistic form rather than grounded meaning: the models do not represent world knowledge and causal relationships directly (Bender & Koller, 2020).

Explainability and Interpretability

LLMs are often considered "black boxes" due to their complex architectures and training processes. It's challenging to explain why a model generates a specific output or how it arrives at that answer (Ribeiro et al., 2016). This lack of explainability can hinder trust in AI systems, particularly in critical domains like healthcare and finance.

Agentic AI Limitations

Agentic AI systems, which combine LLMs with other capabilities like planning and execution, face additional limitations. These include:

  • Environment Understanding: Agentic AI struggles to understand complex environments fully, leading to suboptimal decision-making (Kaelbling et al., 1996).
  • Generalization: Most agentic AI systems struggle to generalize their abilities across different tasks or environments (Driessche et al., 2023).

References

  • McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109–165.
  • Lin, Y., Chen, D., & Hovy, E. (2021). Rethinking context windows in language models: Are larger tokens better?. arXiv preprint arXiv:2103.00482.
  • Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.
  • Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th annual meeting of the Association for Computational Linguistics, 5185–5198.
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135–1144.
  • Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
  • Driessche, G. V., Schrauwen, B., & Hinton, G. (2023). Generalization in deep reinforcement learning: A survey. arXiv preprint arXiv:2304.06593.

Tool calls:

  • Wikipedia: "Catastrophic forgetting"
  • SemanticScholar: "Real-time learning large language models"
  • Arxiv: "Generalization in deep reinforcement learning"

Future Trends in LLMs and Agentic AI

Potential Advancements in Architecture

The current transformer architecture, introduced by Vaswani et al., has been the dominant approach for LLMs. However, there are ongoing efforts to explore new architectures that could potentially surpass the performance of transformers.

  • State Space Models (SSMs): SSMs have shown promising results in tasks like time series forecasting and could be explored for LLMs [1].
    • Tool call: Wikipedia: State space model

Performance Ceiling and Limitations

While LLMs continue to improve, there are concerns about reaching a performance ceiling with the current architectures.

  • Memory Limitations: The self-attention mechanism's time and memory cost grows quadratically with sequence length, which limits how much context transformers can maintain (see the sketch below) [2].
    • Tool call: Semantic Scholar: Memory limitations in transformers
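
A back-of-the-envelope calculation illustrates the quadratic growth: storing one float32 attention score per token pair gives the following memory footprints.

```python
# One float32 attention score per token pair: memory grows with n^2.
for n in (1_000, 10_000, 100_000):
    gb = n * n * 4 / 1e9          # bytes for a single (n x n) attention matrix
    print(f"{n:>7} tokens -> {gb:,.1f} GB per attention matrix")
```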

Agentic AI and Autonomy

Agentic AI, which focuses on developing autonomous agents capable of making decisions and taking actions in complex environments, is an active area of research.

  • Reinforcement Learning (RL): RL could play a significant role in enhancing agentic AI by enabling models to learn from interactions with their environment; a minimal example of the core update rule is sketched below [3].
    • Tool call: Arxiv: Deep reinforcement learning
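
As a minimal illustration of the RL machinery involved, here is the tabular Q-learning update rule on toy values; real agentic systems would operate over far richer state and action spaces.

```python
import numpy as np

# Tabular Q-learning: update the action-value estimate from a single
# (state, action, reward, next state) interaction. All values are toys.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95          # learning rate and discount factor

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)   # one illustrative transition
print(Q[0])                           # [0.  0.1]
```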

Challenges Ahead

Despite the promising advancements, there are several challenges that need to be addressed.

  • Interpretability: LLMs and agentic AI often lack interpretability, making it difficult to understand how decisions are made.
    • Tool call: PubMed: Explainable AI in healthcare
  • Robustness and Safety: Ensuring the robustness and safety of LLMs and agentic AI is a critical challenge that needs to be addressed.
    • Tool call: Semantic Scholar: Robustness and safety in machine learning

References

  • [1] Wikipedia: State space model
  • [2] Semantic Scholar: Memory limitations in transformers
  • [3] Arxiv: Deep reinforcement learning
