Part 3: Knowledge Base Building - Creating a Robust Data Foundation for AI Assistants


For over a year, I wrote the Prompt Entrepreneur email newsletter entirely by hand, without any AI assistance. It's only recently that I've started using Claude Projects as a co-writer.

This is possible because of DATA. Lots of it.

I have over a year's worth of manually written newsletters to work with.

We're talking about 6 newsletters a week for 50 weeks, totalling over 300 issues.

Each newsletter is 1000+ words, which means we have a corpus of over 300,000 words. To put that in perspective, it's equivalent to about three average-length novels.

That's a lot of high-quality, relevant data about my writing style.

This massive amount of pre-existing content is the key to the process. When we combine this rich knowledge base with effective priming (which we covered in Part 2), we get top-notch results.

Let's focus on this critical step: uploading your knowledge base.


Let's get started:

✍️ Summary: Uploading knowledge

  • Understanding the importance of a robust knowledge base
  • Types of data to include (and avoid) in your knowledge base
  • How to upload data to Claude Projects and handle limitations 

Why is a Robust Knowledge Base Crucial?

Just as a human expert draws upon years of experience and knowledge, your AI assistant needs a wealth of information to produce high-quality outputs. Your knowledge base is essentially your AI's "brain" - the more relevant, high-quality information it contains, the better your AI can perform.

Remember in Part 1 how we talked about one limitation of Claude Projects being its inability to connect to the internet to gather up information? That's alleviated by the fact that we are going to be giving it the knowledge it needs.

This is super powerful because it allows us to focus the model. When a model can draw on everything, it's more likely to return generic rubbish. But when we give it a high-quality, demarcated, focused knowledge base? It'll be better prepared for the task we've primed it for.

Key benefits of a well-built knowledge base:

  • More accurate and relevant outputs
  • Ability to handle complex, domain-specific tasks
  • Consistency with your brand voice and past content
  • Reduced need for extensive editing
  • Improved ability to understand context and nuance

This is all about focus: taking a wide, general AI and harnessing it to the specific task.

Types of Data to Include in Your Knowledge Base

The specific data you'll want to include depends on your AI assistant's purpose. We'll use a prompt below to help us with this. Before that, though, here are some general categories to consider:

  • Your own content: Blog posts, articles, newsletters, social media posts
  • Brand guidelines: Style guides, tone of voice documents, brand values
  • Product information: Descriptions, specifications, FAQs
  • Customer data: Frequently asked questions, common pain points (anonymised, of course)
  • Industry knowledge: Relevant studies, reports, or articles (be mindful of copyright)
  • Examples of successful outputs: Your best-performing content or responses

Be creative here. For instance, when people join the waitlist for my AI Workshop Kit, one of the questions asks applicants what their current role or job is. I feed that information back into my newsletter and social post writing AIs because it's valuable info on who my customers are, even if it's not directly from a newsletter poll or social media comments. So think laterally about any and all data you can add.

It's equally important to know what NOT to include in your knowledge base:

  • Irrelevant or outdated information
  • Confidential or sensitive data
  • Low-quality or poorly written content
  • Copyright-protected material you don't have permission to use

Remember, your AI assistant can only be as good as the data you feed it.

Before you upload your data, it's crucial to prepare and structure it properly.

For example, clean your data by removing any irrelevant or outdated information. If you are importing text from your website or blog, clean out any extra HTML/CSS. Or, if you have a large spreadsheet of customer information, delete any irrelevant columns that will just take up space.

That said, don't get too precious about cleaning your data. While clean, well-structured data is ideal, Claude is pretty smart and can parse through less-than-perfect information. Give it a hand with some basic cleaning and organisation, but don't kill yourself trying to make everything perfect. The AI is often capable of extracting valuable insights even from somewhat messy data.
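
If you'd rather script that basic cleaning than do it by hand, here is a minimal Python sketch using only the standard library. Treat it as an illustration under assumptions, not a required step: the function names `strip_html` and `keep_columns` and the sample column names are made up for this example. The first pulls the visible text out of a saved web page, and the second keeps only the columns you name from a CSV export of a spreadsheet.

```python
import csv
import io
from html.parser import HTMLParser

# Tags whose closing marks the end of a line of text (an illustrative, partial list).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.split("\n"))
    return "\n".join(line for line in lines if line)


def keep_columns(csv_text, wanted):
    """Return CSV text containing only the `wanted` columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For example, `strip_html("<h1>Issue 12</h1><p>Hello <b>world</b></p>")` returns the two lines `Issue 12` and `Hello world`, and `keep_columns(text, ["role"])` drops every column except a hypothetical `role` column. This is deliberately rough: in keeping with the advice above, it aims for "clean enough", not perfect.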

How to Upload Data to Claude Projects and Handle Limitations

When uploading data to Claude Projects, keep in mind these important limitations:

  • Context Window Size: Claude Projects provides a percentage-filled bar to show how much of the context window you've used. Always keep an eye on this to ensure you're not overloading the system. This is the primary reason to clean and cut data in the previous step.
  • File Limit: You can only upload 5 files at a time. Need more? Upload 5, then 5 more, then another 5 etc.
  • File Types: Ensure your files are in a compatible format. Text files and PDFs usually work best. Remember that certain file types (like PDFs and images) have larger file sizes, so where possible extract their contents into another format (e.g. extract the text from a PDF to .txt).

If you're having issues with uploads, don't hesitate to ask Claude itself for help. It's a bit meta, but Claude can often provide insights into why certain uploads might be failing or how to optimise your files!
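
To plan around the five-files-at-a-time limit, a short script can group your files into upload batches and give you a rough word count before you start. This is only a sketch under assumptions: the folder name, the `*.txt` pattern, and the helper names `batches`, `word_count` and `plan_upload` are hypothetical. The word count is a rough guide to corpus size, not a prediction of what the percentage bar in Claude Projects will show.

```python
from pathlib import Path

BATCH_SIZE = 5  # files per upload, per the limit described above


def batches(items, size=BATCH_SIZE):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def word_count(path):
    """Count whitespace-separated words in a text file."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len(text.split())


def plan_upload(folder, pattern="*.txt"):
    """Return (total_words, list_of_batches) for matching files in `folder`."""
    files = sorted(Path(folder).glob(pattern))
    total_words = sum(word_count(f) for f in files)
    return total_words, batches(files)


if __name__ == "__main__":
    # "newsletters" is a placeholder folder name for this example.
    total, groups = plan_upload("newsletters")
    print(f"{total} words across {sum(len(g) for g in groups)} files")
    for number, group in enumerate(groups, start=1):
        print(f"Batch {number}: {[f.name for f in group]}")
```

If the total looks far larger than the task needs, that is a hint to cut or narrow the data before uploading anything.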

Again, and sorry to sound like a broken record, focus is the key. If you find yourself maxing out the memory and hitting file upload limits, chances are you are trying to get Claude to do too wide a task and giving it everything and the kitchen sink in order to do so.

If this turns out to be the case, step back and see whether the task can be broken down into subcomponents:

Social media assistant → Social media post creator → Twitter post creator → Twitter threads post creator

An all-singing, all-dancing social media assistant is a lovely idea, but it's too wide. Focus the scope down for better results.

A Prompt for Effective Knowledge Base Creation

Now that you've used the prompt from Part 2 to create priming instructions for your AI assistant, let's build on that to determine what data you should upload.

Paste it below your previous work so that it can draw on the priming instructions.

Here's the prompt to help you plan your knowledge base:

```
You are an AI consultant specialising in knowledge base creation for AI assistants. Your task is to help the user plan an effective knowledge base for their specific AI assistant, based on the priming instructions they've already created.

Ask the user to provide their priming instructions for their AI assistant. Based on these instructions, provide:

  1. A list of 5-7 key categories of information to include in their knowledge base, specifically tailored to support the assistant's defined role and tasks.
  2. For each category, suggest 2-3 specific types of documents or data sources to upload. These should directly relate to the instructions and expected outputs of the AI assistant.
  3. Recommend 3-5 best practices for preparing this specific data for upload, keeping in mind the balance between data cleanliness and effort required.

Present your recommendations in a clear, actionable format.
```

I've had to talk in generalities about how to build your knowledge base because _it will depend entirely_ on your goals for your AI assistant.

This prompt will combine the above guidelines with information about _your_ AI assistant and come up with tailored suggestions.

Remember, the quality and relevance of your data will directly impact the quality of your AI's outputs, but don't let perfect be the enemy of good. Your AI can work with less-than-perfect data, so focus on getting the most relevant information uploaded efficiently.

This is particularly true because of the next step - Narrowing.

In the next part, we'll explore the "Narrowing" phase - how to refine and focus your AI assistant's outputs through *feedback and iteration*.

Get ready to take your AI assistant from good to great!