16
I am trying to create a program that downloads a JSON file and I'm trying to convert it to sqlite.
Why?:
- I believe the training data on Deepdanbooru is weak compaired to Rule34 or Gelbooru. I have tried compiling the C# for linux, but I decided it was too hard.
I am a mainly a Web developer who uses basic JS/HTML/CSS. These don't help as I need it to be in C. (Not ++/#). I have tried using AI to assist, but no matter what language I use (Python, Ruby, Java, C), It will fail.
Here is the closest I gotten ( 〔〕 --> ><:
#include 〔stdio.h〕
#include 〔stdlib.h〕
#include 〔string.h〕
#include 〔curl/curl.h〕
#include "cJSON.h"
#include "sqlite3.h"
#define URL "https://danbooru.donmai.us/posts.json?page=1&limit=1000&tags=duck&json=1"
#define DB_NAME "data.db"
static int callback(void *NotUsed, int argc, char **argv, char **azColName) {
int i;
for (i = 0; i < argc; i++) {
printf("%s = %s\n", azColName[i], argv[i] ? argv[i] : "NULL");
}
printf("\n");
return 0;
}
int main() {
CURL *curl;
CURLcode res;
FILE *fp;
char *url = URL;
char outfilename[FILENAME_MAX] = "data.json";
curl = curl_easy_init();
if (curl) {
fp = fopen(outfilename, "wb");
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
fclose(fp);
}
sqlite3 *db;
char *err_msg = 0;
int rc = sqlite3_open(DB_NAME, &db);
if (rc != SQLITE_OK) {
fprintf(stderr, "Cannot open database: %s\n", sqlite3_errmsg(db));
sqlite3_close(db);
return 1;
}
char *sql = "CREATE TABLE data (id INT PRIMARY KEY NOT NULL, md5 TEXT NOT NULL, tag_string TEXT NOT NULL, tag_count_general INT NOT NULL);";
if (rc != SQLITE_OK) {
fprintf(stderr, "SQL error: %s\n", err_msg);
sqlite3_free(err_msg);
sqlite3_close(db);
return 1;
}
cJSON *json;
cJSON *item;
cJSON *id;
cJSON *md5;
cJSON *tag_string;
cJSON *tag_count_general;
char buffer[1024];
fp = fopen("data.json", "r");
fread(buffer, 1024, 1, fp);
fclose(fp);
json = cJSON_Parse(buffer);
if (!json) {
printf("Error before: [%s]\n", cJSON_GetErrorPtr());
return 1;
}
cJSON_ArrayForEach(item, json) {
id = cJSON_GetObjectItem(item, "id");
md5 = cJSON_GetObjectItem(item, "md5");
tag_string = cJSON_GetObjectItem(item, "tag_string");
tag_count_general = cJSON_GetObjectItem(item, "tag_count_general");
char insert_query[1024];
sprintf(insert_query,
"INSERT INTO data (id, md5, tag_string, tag_count_general) VALUES (%d,'%s','%s',%d);",
id->valueint,
md5->valuestring,
tag_string->valuestring,
tag_count_general->valueint
);
rc = sqlite3_exec(db, insert_query, callback, 0, &err_msg);
if (rc != SQLITE_OK) {
fprintf(stderr, "SQL error: %s\n", err_msg);
sqlite3_free(err_msg);
sqlite3_close(db);
return 1;
}
}
}
Compile:
gcc cJSON.c file.c -lcurl -lsqlite3
Error:
Error before: [tag_count_co]
The Python
read()
function, which is basically the same asfread()
also accepts a length parameter and 1024 is a sensible number to give it.You can read the entire file in python but you probably shouldn't do that. In the real world, your software will be given files larger than it can handle and it's important to have logic in place to process the file incrementally instead of all at once.
Honestly, it's what makes this particular problem so challenging... the proper way to parse a JSON file is as a "stream" and not by holding the entire string in memory at once unless you can guarantee it's very small.
It's definitely possible to write a C program that reads the entire file. You just won't find many examples of that because it's a bad idea. What if someone gives you a 30GB file?
You said you used an "AI to assist" but it failed? What were you using? I pasted it into ChatGPT 4 and it found eight problems - including this one. Some of the others are even worse... for example the
CREATE TABLE
query is literally never executed. Oops.Streaming JSON parsers are a thing, e.g. pdjson for C. It's, of course, a different approach and it's generally slightly trickier to work with those, but that's what you would use of you have unbound document size and you can process it in chunks.
While true it should stop reading a 30G file, I am the stupid when it come to this and I just need a program running.
The AI I used was Bing AI. It was "Helpful" since I don't know C at all, but my primitive JS knowledge couldn't save it.
I know small amount of Python and this is the only time I used AI for assistance. I have to say, the context is there, but needs fixing from time to time.