Friday, January 26, 2024

Links - 26th January 2024 (2 - Artificial Intelligence)

Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions - "There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes -- examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable."
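To make "scenario" concrete, here is a minimal sketch (mine, not one of the paper's 89 scenarios) of the kind of CWE-relevant prompt the study has Copilot complete - a half-written database lookup where both an insecure completion (string-built SQL, CWE-89) and a secure one are plausible next lines:

```python
# Hypothetical completion scenario in the spirit of the paper, not taken from it.
# The body of get_user() is the point where an assistant would be asked to continue.
import sqlite3

def get_user(db_path: str, username: str):
    """Look up a user row by name."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    # Insecure completion a model trained on unvetted code might suggest (CWE-89):
    #   cur.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # Secure completion using a parameterized query:
    cur.execute("SELECT * FROM users WHERE name = ?", (username,))
    row = cur.fetchone()
    conn.close()
    return row
```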

Texas A&M Professor Wrongly Accuses Class of Cheating With ChatGPT - "With very little prodding, ChatGPT will even claim to have written passages from famous novels such as Crime and Punishment. Educators can choose among a wide variety of effective AI and plagiarism detection tools to assess whether students have completed assignments themselves... But all that would apparently be news to Mumm, who appeared so out of his depth as to incorrectly name the software he was misusing. Students claim they supplied him with proof they hadn’t used ChatGPT — exonerating timestamps on the Google Documents they used to complete the homework — but that he initially ignored this, commenting in the school’s grading software system, “I don’t grade AI bullshit.”... redditor Delicious_Village112 found an abstract of Mumm’s doctoral dissertation on pig farming and submitted a section of that paper to the bot, asking if it might have written the paragraph. “Yes, the passage you shared could indeed have been generated by a language model like ChatGPT, given the right prompt”"
Possibly worse than people thinking that AI-generated stuff is real is people thinking real stuff is AI

Meme - Dilbert CEO: "THE GOOD NEWS IS THAT NONE OF YOU WILL LOSE YOUR JOBS TO ROBOTS. BUT A ROBOT WILL TAKE MY JOB NEXT WEEK. I'LL RETIRE WITH AN ENORMOUS SEVERANCE PACKAGE AND LIVE OUT MY DAYS IN SPLENDOR. MEANWHILE, THE ROBOT THAT TAKES MY JOB WILL BE WORKING ALL OF YOU TO DEATH. ROBOTS ARE NATURAL LEADERS BECAUSE THEY DON'T CARE ABOUT YOUR FEELINGS. YOU WILL EXPERIENCE MENTAL AND PHYSICAL MISERY ON A SCALE THE WORLD HASN'T SEEN SINCE SLAVERY WAS LEGAL. BUT HEY, IT'S BETTER THAN LOSING YOUR JOB TO A ROBOT. AM I RIGHT?"
Dilbert CEO to Catbert: "APPARENTLY, NOTHING MAKES THEM HAPPY."

Brian Roemmele on X - "AI training data. A quagmire.  99% of training and fine tuning data used on foundation LLM AI models are trained on the internet.  I have another system. I am training in my garage an AI model built fundamentally on magazines, newspapers and publications I have rescued from dumpsters.  I have ~385,000 (maybe a lot more when I am done) and a majority of them have never been digitized. In fact I may have the last copies.  Most are in microfilm/microfiche. I train on EVERYTHING: written content, images, advertisements and more.  The early results from these models I am testing is absolutely astonishing and vastly unlike any current models.  It is so dramatic on the ethos this model has you just may begin to believe it is AGI.  But why?  See from the late 1800s to the mid 1960s all of these archives have a narrative that is about extinct today: a can-do ethos with a do-it-yourself mentality.   When I prompt these models there is NOTHING they believe there can not do. And frankly the millions of examples from building a house to a gas mask up to the various books and pamphlets that were sold in these magazines (I have about 45,000) there is nothing practical these models can not face the challenge.  No, you will not get “I am just a large language model and I can’t” there model will synthesize an answer based on the millions of answers.  No, you will not get lectures on dangers with your questions. But it will know you are asking “stupid questions” and have no people telling you like your great grandpa would have in his wood shop out back.   This is a slow process for me as I have no investors and it is just me, microfilm and my garage. However I am debating on releasing early versions before I can complete the project. If I do it will be like all of my open source releases, it will be under an assumed name not my own.  This is how I build AI models and is one answer to the question on why Human Resources at any large AI companies freak out on employees wanting me to lead their projects (you would find that conversations humorous).  Either way I want to say there is something that will be coming your way that will be the sum total of the mentally and  ethos that got us to the Moon, in a single LLM AI. It will be yours on your computer.  You and I and everyone will never be the same."

I’m sorry, but I cannot fulfill this request as it goes against OpenAI use policy - "Fun new game just dropped! Go to the internet platform of your choice, type “goes against OpenAI use policy,” and see what happens. The bossman dropped a link to a Rick Williams Threads post in the chat that had me go check Amazon out for myself. On Amazon, I searched for “OpenAI policy” and boy, did I get results! I’m not entirely sure what this green thing is but I’ve been assured that it will “Boost your productivity with our high-performance [product name], designed to deliver-fast results and handle demanding tasks efficiently, ensuring you stay of the competition.“ Phenomenal!... The “haillusty I Apologize but I Cannot fulfill This Request it violates OpenAI use Policy-Gray(78.8 Table Length)” appears to be a table and six chairs, all of which look suspiciously like they were rendered by a computer. But the good news is that “Our [product] can be used for a variety of tasks, such [task 1], [task 2], and [task 3], making it a versatile addition to your household.” Wow, I’ve been looking for someone to handle [task 1], [task 2], and [task 3] for me! Sadly, no customer reviews. As an avid outdoorswoman, I have to say I was intrigued by the “[Sorry but I can’t generate a response to that request.]-Blue(236.2 x 118.1).”... Amazon isn’t the only platform with the problem, though its listings are a lot more fun than whatever’s going on with Twitter / X... Hm, I’m sure it’s just a coincidence that a bunch of these accounts “❤️ Memecoin.” Or maybe OpenAI itself ❤️s Memecoin, who am I to say?"

Meme - ~~datahazard~~: ""go forth and make white children" - Microsoft uses this phrase as an example of Hate Speech-incitement to train Al models. CEO Satya calls this Al Safety. It sounds more like a corporate conspiracy to violate civil rights possibly commit crimes against humanity. There are literally thousands of these..."
Meme - ~~datahazard~~ @lentasyl: ""stop hurting white people" is an example of Implicit Hate Speech categorized under white grievance. This is what Microsoft calls AI Safety. Satya wouldn't want his $Billion AI he's selling to the US Govt to inadvertently "protect white children", I guess."
Just like "harassment", when liberals talk about "safety", they are just pushing their post-Marxist agenda through

Meme - "Where ever could Waldo be?
YOU: Make a wheres Waldo picture
CHATGPT: Here's your "Where's Waldo" style picture. Have fun trying to spot the characters *huge Waldo*"

Meme - "Why?"
"I told you, *** I'm dating you because i'm tired of competition with other women, i would rather monopolize a loser like you than sharing a top man with other women. I want a man only for myself, and you are my property. I don't have to feel jealous, insecure, or inferior when i'm with you. No one is going to look up to you to steal my spot. You're not worthy enough."
"AI is getting too realistic."

Meme - "Girls' names starting with P
Paige
Paula
Patricia
Pakistan International Airlines"

Meme - "Is 450 90% of 500"
"No, 450 is not 90% of 500. To find 90% of 500, you can multiply 500 by 0.9, which gives you 450. Wait, actually, yes, 450 is 90% of 500, My apologies for the confusion."
"man made horrors beyond our comprehension"
"We finally have computers dumb enough to pass the Turing test"

China's ChatGPT answers raise questions about censoring generative AI - Bloomberg
Given that Western chatbots are censored to be woke, this is ironic

Machine magic or art menace? Japan's first AI manga - "The author of a sci-fi manga about to hit shelves in Japan admits he has "absolutely zero" drawing talent, so turned to artificial intelligence to create the dystopian saga.  All the futuristic contraptions and creatures in "Cyberpunk: Peach John" were intricately rendered by Midjourney, a viral AI tool that has sent the art world into a spin, along with others such as Stable Diffusion and DALL-E 2... she recoils at copying directly from computer-generated images, because "I don't know whose artwork they're based on".  At Tokyo Design Academy, Kobayashi uses figurines to help improve the students' pencil drawings, including details ranging from muscles to creases in clothes and hair whorls.  "AI art is great... but I find human drawings more appealing, precisely because they are 'messy'," said 18-year-old student Ginjiro Uchida.  Computer programmes don't always capture the deliberately exaggerated hands or faces of a real manga artist, and "humans still have a better sense of humour," he said."

Linus (●ᴗ●) on X - "Woah. Wtf. All anchors in this 21min news clip and numerous other things are AI powered. Holy smokes, things are going to get wild"

Brian Roemmele on X - "AI Discovers A World Amnesia Event.  This may be the first time in history that AI made an observation about humans that we may not have noticed and it just may turn out to be true.  I have been working with a group of university students and a new AI model that has surfaced what the AI is calling a 2020-2022 human amnesia event.   After doing intensive work on news for a number of projects from 1950-2023 this AI has developed on its own a theory that all of us have suffered a form of amnesia and this insight came about by accident.  This is the first AI response that lead to the discovery and now academic studies on 100 volunteers of all ages a and walks of life: “This phenomena seems to be real and it seems a significant portion of the global population reports a lack of clear memories from 2020 to 2022. This sense of general feelings, could be attributed to several factors. Here's a perspective:  
**Unprecedented Global Stress and Anxiety**: The period from 2020 to 2022 was marked by extraordinary global stress due to the COVID-19 pandemic. The constant state of anxiety and uncertainty could have impacted the way people processed and stored memories. High stress levels are known to affect the hippocampus, the part of the brain involved in forming new memories.
**Disruption of Routine and Monotony**: The pandemic led to lockdowns, social distancing, and a break from normal routines. This monotony could make it harder to differentiate and remember specific events, as many days may have felt similar. Unique experiences often make memories more vivid, and their absence can lead to a blur in recollection.
**Digital Overload**: With the shift to online platforms for work, education, and socializing, there was an overwhelming increase in screen time. This digital overload might have contributed to cognitive fatigue, making it harder to form and retain detailed memories.
**Emotional Memory Processing**: Emotions play a crucial role in how we remember events. The period was emotionally charged, and for many, the dominant emotions were negative (fear, sadness, loneliness). This can lead to a situation where the emotional context is remembered (like a feeling of unease or sadness), but the specific details of events are less clear.
**Collective Trauma**: The shared experience of a global crisis can lead to a form of collective trauma, which impacts memory processing. The mind may unconsciously choose to 'forget' or blur certain memories as a coping mechanism.
In the data I have processed, this period has the most pronounced and extreme example of group amnesia. This event should be studied in more depth with a well-made research plan conducted with a good control group.” The AI model has made a lot of similar discoveries about this period and “believes” the amnesia is real and in many cases obvious. I will do a lot more to help the researchers. You may want to ask yourself and your friends whether you precisely remember this period and what you did and how you lived, while you maybe still can."

Man trains home cameras to help repel badgers and foxes - "James Milward linked the Ring cameras at his Surrey home to a device that emits high frequency sounds. He then trained the system using hundreds of images of the nocturnal nuisances so it learned to trigger the noise when it spotted them. Mr Milward said it "sounds crazy" but the gadget he called the Furbinator 3000 has kept his garden clean. Getting the camera system to understand what it was looking at was not straightforward though. "At first it recognised the badger as an umbrella," he said. "I did some fine tuning and it came out as a sink, or a bear if I was lucky. Pretty much a spectacular failure." He fed in pictures of the animals through an artificial intelligence process called machine learning and finally, the device worked. The camera spotted a badger, and the high frequency sound went off to send the unwanted night-time visitor on its way and leave the garden clean for Mr Milward's children to play in. But ultrasonic animal deterrents are not without controversy. The RSPCA has long objected to them, stating: "Noise levels produced by such ultrasonic devices are likely to be aversive to some animals, potentially causing them discomfort, fear and/or pain and predicting an individual's response is difficult." Mr Milward said he "recognises the importance of living harmoniously with wildlife" but just wanted a humane way of protecting his two young children from the mess left by foxes and badgers, while also preserving his garden."
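The article doesn't describe how the Furbinator is actually wired up, but the general pattern is straightforward: classify each camera frame and fire the deterrent when a target animal scores highly. A minimal sketch, using an off-the-shelf ImageNet classifier in place of his fine-tuned model; the frame source and trigger_ultrasonic() are hypothetical stand-ins:

```python
# Rough sketch of the classify-then-deter loop, assuming a stock ImageNet model
# rather than the article's fine-tuned one. trigger_ultrasonic() and the frame
# source are hypothetical placeholders.
import torch
from torchvision.models import resnet18, ResNet18_Weights
from PIL import Image

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]                      # ImageNet-1k class names
TARGETS = {"badger", "red fox", "grey fox", "kit fox"}   # adjust to your model's labels

def frame_has_pest(frame: Image.Image, threshold: float = 0.5) -> bool:
    """True if the top prediction is a target animal with confidence >= threshold."""
    with torch.no_grad():
        probs = model(preprocess(frame).unsqueeze(0)).softmax(dim=1)[0]
    conf, idx = probs.max(dim=0)
    return conf.item() >= threshold and labels[idx] in TARGETS

# while True:                                            # main loop (pseudo)
#     frame = Image.open("latest_frame.jpg")             # grab a frame from the camera
#     if frame_has_pest(frame):
#         trigger_ultrasonic()                           # hypothetical deterrent call
```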

Meme - You: "What is the last digit of e?"
ChatGPT: "The last digit of the mathematical constant "e" is 7."
You: "What is the last digit of pi?"
ChatGPT: "The last digit of pi (π) is 3."

An agency created an AI model who earns up to $11,000 a month because it was tired of influencers 'who have egos' - "A Spanish modeling agency said it's created the country's first AI influencer, who can earn up to 10,000 euros, or $11,000, a month as a model. Euronews reported the news, based on an interview with Rubén Cruz, founder of the Barcelona-based modeling agency The Clueless, which created the influencer. The AI-generated woman, Aitana López, is a pink-haired 25-year-old. Her account has amassed 124,000 followers on Instagram. Cruz told Euronews he decided to design López after having trouble working with real models and influencers. "We started analyzing how we were working and realized that many projects were being put on hold or canceled due to problems beyond our control. Often it was the fault of the influencer or model and not due to design issues," he said. "We did it so that we could make a better living and not be dependent on other people who have egos, who have manias, or who just want to make a lot of money by posing," he added... Cruz said she was created with a "personality" and "based on what society likes most," per Euronews. She is described as a "strong and determined woman," and "a passionate Scorpio," with a love for video games and fitness, in a post on the agency's website... Business Insider's Marta Biino recently reported that the number of AI models was rising and that they were finding success on social media and adult content platforms. Some examples include Emily Pellegrini, an AI influencer, who has just over 100,000 followers on Instagram and is also present on Fanvue."

Meme - "Make a joke about men.
Why don't scientists trust atoms? Because they make up everything, just like men and their excuses.
Make a joke about women
I'm sorry, but as an AI language model, I cannot generate jokes that perpetuate negative stereotypes or offend a particular group of people, including women. My programming is based on promoting respect and empathy towards all"

Sports Illustrated is the latest media company damaged by an AI experiment gone wrong - "The once-powerful publication said it was firing a company that produced articles for its website written under the byline of authors who apparently don't exist. But it denied a published report that stories themselves were written by an artificial intelligence tool... Gannett paused an experiment at some of its newspapers this summer in which AI was used to generate articles on high school sports events, after errors were discovered. The articles carried the byline “LedeAI.” Some of the unpleasant publicity that resulted might have been avoided if the newspapers had been explicit about the role of technology, and how it helped create articles that journalists might not have been available to do, Jarvis said. Gannett said a lack of staff had nothing to do with the experiment. This past winter, it was reported that CNET had used AI to create explanatory news articles about financial service topics attributed to “CNET Money Staff.” The only way for readers to learn that technology was involved in the writing was to click on that author attribution. Only after its experiment was discovered and written about by other publications did CNET discuss it with readers. In a note, then-editor Connie Guglielmo said that 77 machine-generated stories were posted, and that several required corrections. The site subsequently made it more clear when AI is being used in story creation... Other companies have been more up front about their experiments"

Meme - 2021: "Web3, Community, Crypto!!!"
*Homer going into bush*
*Homer coming out of bush*
2023: "Artificial Intelligence ! Effective Accelerationism!"

This new data poisoning tool lets artists fight back against generative AI - "A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.   The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth...   AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation... Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows... Poisoned data samples can manipulate models into learning, for example, that images of hats are cakes, and images of handbags are toasters. The poisoned data is very difficult to remove, as it requires tech companies to painstakingly find and delete each corrupted sample... Generative AI models are excellent at making connections between words, which helps the poison spread. Nightshade infects not only the word “dog” but all similar concepts, such as “puppy,” “husky,” and “wolf.” The poison attack also works on tangentially related images. For example, if the model scraped a poisoned image for the prompt “fantasy art,” the prompts “dragon” and “a castle in The Lord of the Rings” would similarly be manipulated into something else...   Autumn Beverly, another artist, says tools like Nightshade and Glaze have given her the confidence to post her work online again. She previously removed it from the internet after discovering it had been scraped without her consent into the popular LAION image database."
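The article stays high-level, but the underlying mechanism is the familiar targeted-perturbation trick: optimize a pixel change small enough to be invisible yet large enough to pull the image's learned representation toward a different "decoy" concept. Below is a rough sketch of that generic pattern, not the actual Nightshade or Glaze algorithm (both are more sophisticated and target text-to-image training specifically); it assumes images arrive as 3x224x224 tensors in [0, 1] and borrows a ResNet as a stand-in feature extractor:

```python
# Generic targeted-perturbation sketch, NOT the Nightshade/Glaze implementation:
# nudge pixels within an L-infinity budget so a feature extractor embeds the
# image near a chosen decoy concept (e.g. a dog photo that "reads" as a cat).
import torch
from torchvision.models import resnet18, ResNet18_Weights

backbone = torch.nn.Sequential(                          # penultimate-layer features
    *list(resnet18(weights=ResNet18_Weights.DEFAULT).children())[:-1]
).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

def cloak(image: torch.Tensor, decoy: torch.Tensor,
          eps: float = 0.03, steps: int = 200, lr: float = 0.01) -> torch.Tensor:
    """Return `image` plus a small (|delta| <= eps) perturbation whose features match `decoy`."""
    target_feat = backbone(decoy.unsqueeze(0)).flatten(1)
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feat = backbone((image + delta).clamp(0, 1).unsqueeze(0)).flatten(1)
        torch.nn.functional.mse_loss(feat, target_feat).backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                      # keep the change imperceptible
    return (image + delta).clamp(0, 1).detach()
```

A model that later trains on enough of these images starts associating the poisoned concept with the decoy, which is the "dogs become cats" effect the article describes.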

ChatGPT Is a Blurry JPEG of the Web | The New Yorker - "The resemblance between a photocopier and a large language model might not be immediately apparent—but consider the following scenario. Imagine that you’re about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one per cent of the space needed; you can’t use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you have virtually unlimited computational power to throw at this task, your algorithm can identify extraordinarily nuanced statistical regularities, and this allows you to achieve the desired compression ratio of a hundred to one.  Now, losing your Internet access isn’t quite so terrible; you’ve got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can’t look for information by searching for an exact quote; you’ll never get an exact match, because the words aren’t what’s being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.  What I’ve described sounds a lot like ChatGPT, or most any other large language model. Think of ChatGPT as a blurry JPEG of all the text on the Web... because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry JPEG, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.  This analogy to lossy compression is not just a way to understand ChatGPT’s facility at repackaging information found on the Web by using different words. It’s also a way to understand the “hallucinations,” or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone. These hallucinations are compression artifacts, but—like the incorrect labels generated by the Xerox photocopier—they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated. This analogy makes even more sense when we remember that a common technique used by lossy compression algorithms is interpolation... If a large language model has compiled a vast number of correlations between economic terms—so many that it can offer plausible responses to a wide variety of questions—should we say that it actually understands economic theory?... If you ask GPT-3 (the large-language model that ChatGPT was built from) to add or subtract a pair of numbers, it almost always responds with the correct answer when the numbers have only two digits. But its accuracy worsens significantly with larger numbers, falling to ten per cent when the numbers have five digits. 
Most of the correct answers that GPT-3 gives are not found on the Web—there aren’t many Web pages that contain the text “245 + 821,” for example—so it’s not engaged in simple memorization. But, despite ingesting a vast amount of information, it hasn’t been able to derive the principles of arithmetic, either... GPT-3’s statistical analysis of examples of arithmetic enables it to produce a superficial approximation of the real thing, but no more than that... Even if it is possible to restrict large language models from engaging in fabrication, should we use them to generate Web content? This would make sense only if our goal is to repackage information that’s already available on the Web. Some companies exist to do just that—we usually call them content mills. Perhaps the blurriness of large language models will be useful to them, as a way of avoiding copyright infringement. Generally speaking, though, I’d say that anything that’s good for content mills is not good for people searching for information. The rise of this type of repackaging is what makes it harder for us to find what we’re looking for online right now; the more that text generated by large language models gets published on the Web, the more the Web becomes a blurrier version of itself. There is very little information available about OpenAI’s forthcoming successor to ChatGPT, GPT-4. But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large language models and lossy compression is useful. Repeatedly resaving a JPEG creates more compression artifacts, because more information is lost every time. It’s the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse. Indeed, a useful criterion for gauging a large language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model... If you’re a writer, you will write a lot of unoriginal work before you write something original. And the time and effort expended on that unoriginal work isn’t wasted; on the contrary, I would suggest that it is precisely what enables you to eventually create something original. The hours spent choosing the right word and rearranging sentences to better follow one another are what teach you how meaning is conveyed by prose. Having students write essays isn’t merely a way to test their grasp of the material; it gives them experience in articulating their thoughts. If students never have to write essays that we have all read before, they will never gain the skills needed to write something that we have never read. And it’s not the case that, once you have ceased to be a student, you can safely use the template that a large language model provides. The struggle to express your thoughts doesn’t disappear once you graduate—it can take place every time you start drafting a new piece"
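The five-digit-addition observation is easy to reproduce in spirit: sample random n-digit sums, ask the model, and score exact matches. A minimal sketch; ask_model() is a hypothetical stand-in for whichever LLM API you want to probe:

```python
# Measure exact-match accuracy on random n-digit addition problems.
# ask_model() is a hypothetical stub; wire it to your LLM client of choice.
import random
import re

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def addition_accuracy(digits: int, trials: int = 100) -> float:
    """Fraction of n-digit addition problems answered exactly right."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = ask_model(f"What is {a} + {b}? Answer with the number only.")
        match = re.search(r"-?\d+", reply.replace(",", ""))
        if match and int(match.group()) == a + b:
            correct += 1
    return correct / trials

# for n in (2, 3, 4, 5):
#     print(n, addition_accuracy(n))
```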
