Part 3: How to Setup an Automated Batch Processing Pipeline for the OpenAI API (GPT-4o-mini)?
Introduction
I am back with part 3, where we look at how to check whether our batch processing is complete, how to automate that check using a cron job in Supabase, and how to save all of our processed data into Supabase.
If you have not gone through part 1 and part 2 yet, I suggest reading them first; this part will be much easier to follow afterwards.
Step 1: Setup a table for saving batch outputs
You will have to set up the table where you will save the outputs returned by the batch processing job (OpenAI API, GPT-4o-mini).
If you are using completely different data, set up the table columns based on what your data needs.
For the IMDB data that I shared in part 1, create a table with the columns id, created_at, description, categories and summary. Set up the id column without Is Identity, as given below.
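If you prefer SQL over the Supabase table editor, here is a minimal sketch of such a table. The table name imdb_outputs is just my placeholder, and id is a plain bigint (no identity) because it will come from the custom_id of each batch request rather than being auto-generated:

```sql
-- Sketch of an outputs table matching the columns described above.
-- "id" is deliberately NOT an identity column: we supply it ourselves
-- so each output row can be matched back to its source row.
create table if not exists imdb_outputs (
  id bigint primary key,
  created_at timestamptz default now(),
  description text,
  categories text,
  summary text
);
```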
Let’s move to the next step once you’re done setting up the outputs table.
Step 2: Setup database function for the cron job
Now we set up the database function that picks up the batch_id values from the batch_processing_detail table, which we created in part 2.
Run this code in the SQL editor and your database function will be created.
Replace your_table_name with the table name you have created in Supabase. Here is how the code above operates: it goes to the table where I have stored the batch IDs for the batch jobs created in OpenAI, passes each batch ID to the API, and uses that API to check whether the batch job is completed; if it is, the API pushes the data into the new outputs table.
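The exact function from the original snippet is not reproduced here, but based on the description above, a rough sketch could look like the following. It assumes the pg_net extension is enabled, that batch_processing_detail has batch_id and status columns, and that https://your-api.example.com/process-batch is a placeholder for your own FastAPI endpoint:

```sql
-- Sketch of the database function the cron job will call.
-- For every batch that is not yet marked completed, it fires an HTTP
-- request (via pg_net) to the FastAPI endpoint, which checks the batch
-- status with OpenAI and inserts the outputs if it is done.
create or replace function public.check_pending_batches()
returns void
language plpgsql
as $$
declare
  rec record;
begin
  for rec in
    select batch_id from batch_processing_detail where status <> 'completed'
  loop
    perform net.http_post(
      url     := 'https://your-api.example.com/process-batch',  -- replace with your endpoint
      body    := jsonb_build_object('batch_id', rec.batch_id),
      headers := '{"Content-Type": "application/json"}'::jsonb
    );
  end loop;
end;
$$;
```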
Step 3: Setup the cron job to run the database function
Now set up the cron job, which will run the database function every 2 hours.
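Supabase supports this kind of scheduling through the pg_cron extension; a sketch of such a schedule (the job name is just an illustrative label, and check_pending_batches is the function sketched above) might look like:

```sql
-- Sketch: run the database function every 2 hours with pg_cron.
select cron.schedule(
  'check-openai-batches',   -- any job name you like
  '0 */2 * * *',            -- every 2 hours, at minute 0
  $$ select public.check_pending_batches(); $$
);
```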
Step 4: Setting up the FastAPI code for saving the outputs to the DB
Utilise the above code to fetch all the data from the completed batch and insert it into the relevant columns of the Supabase database.
The snippet above checks whether the batch process is completed and then hands the work of inserting the data into the database to a background task, so that the API response is not delayed and the database function calling it does not hit a timeout.
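As a rough sketch under my own assumptions (the /process-batch path, environment variable names and the save_batch_output helper are placeholders, not the author's exact code), the endpoint could be structured like this:

```python
# Sketch of a FastAPI endpoint that checks the batch status and defers
# the actual insert to a background task so the response returns quickly.
import os

from fastapi import BackgroundTasks, FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class BatchRequest(BaseModel):
    batch_id: str


def save_batch_output(output_file_id: str) -> None:
    """Download, parse and insert the batch results (sketched in the next snippet)."""
    ...


@app.post("/process-batch")
async def process_batch(payload: BatchRequest, background_tasks: BackgroundTasks):
    batch = client.batches.retrieve(payload.batch_id)
    if batch.status != "completed":
        # Nothing to save yet; the cron job will simply try again later.
        return {"batch_id": payload.batch_id, "status": batch.status}

    # Defer the heavy work (download + parse + insert) to a background task
    # so the HTTP caller (the database function) is not kept waiting.
    background_tasks.add_task(save_batch_output, batch.output_file_id)
    return {"batch_id": payload.batch_id, "status": "completed", "saving": True}
```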
The code snippet above fetches the output of the batch, loads it as JSON in memory, parses it to extract the relevant fields from the batch output, and inserts the data into the Supabase outputs table.
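A sketch of what that background task could look like is below. The environment variable names, the imdb_outputs table and the column mapping are my own assumptions; adjust them to whatever columns you created in Step 1:

```python
# Sketch of the background task: download the batch output file (JSONL),
# parse each line, pull out the custom_id and the model's response, and
# insert the rows into the Supabase outputs table.
import json
import os

from openai import OpenAI
from supabase import create_client

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_ROLE_KEY"])


def save_batch_output(output_file_id: str) -> None:
    # Each line of the output file is one JSON object per original request.
    raw = client.files.content(output_file_id).text
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        content = record["response"]["body"]["choices"][0]["message"]["content"]
        rows.append({
            "id": int(record["custom_id"]),  # assumes custom_id was the source row id (part 1)
            "summary": content,              # adjust this mapping to your own columns
        })
    if rows:
        supabase.table("imdb_outputs").insert(rows).execute()
```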
Conclusion
This completes the series and shows you exactly how to set up an automated batch processing pipeline using the OpenAI API. The overall process stays largely the same for any database: set triggers to check whether you have a significant number of rows, pass that data to an API endpoint that creates a batch job, and save the batch_job_id into your database tables as logs. Then create a cron job that checks at regular intervals whether the batch processing is completed.